Deploying Spiders

这一节提供了若干种部署Scrapy爬虫的方法,以供日常使用。Running Scrapy spiders in your local machine is very convenient for the (early) development stage, but not so much when you need to execute long-running spiders or move spiders to run in production continuously. This is where the solutions for deploying Scrapy spiders come in.

Popular choices for deploying Scrapy spiders are:

部署到Scrapyd服务器

Scrapyd is an open source application to run Scrapy spiders. It provides a server with HTTP API, capable of running and monitoring Scrapy spiders.

To deploy spiders to Scrapyd, you can use the scrapyd-deploy tool provided by the scrapyd-client package. Please refer to the scrapyd-deploy documentation for more information.

Scrapyd is maintained by some of the Scrapy developers.

Deploying to Scrapy Cloud

Scrapy Cloud is a hosted, cloud-based service by Scrapinghub, the company behind Scrapy.

Scrapy Cloud removes the need to setup and monitor servers and provides a nice UI to manage spiders and review scraped items, logs and stats.

To deploy spiders to Scrapy Cloud you can use the shub command line tool. Please refer to the Scrapy Cloud documentation for more information.

Scrapy Cloud is compatible with Scrapyd and one can switch between them as needed - the configuration is read from the scrapy.cfg file just like scrapyd-deploy.