Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
Scrapyd doesn't include any provision for password protecting itself. This container packages Scrapyd with an nginx proxy in front of it providing basic HTTP authentication. The username and password are configured through environment variables.
For more about Scrapyd, see the Scrapyd documentation.
```shell
$ docker run -d -e USERNAME=my_username -e PASSWORD=hunter123 cdrx/scrapyd-authenticated
```

You can then use the Scrapyd client to deploy the scraper from your machine to the running container.
Support for HTTP authentication is built into the Scrapyd client. Add the `username` and `password` fields to your `scrapy.cfg` file and then deploy as you normally would:
```ini
[deploy]
url = http://scrapyd:6800/
username = my_username
password = hunter123
```
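With those credentials in place, deploying and scheduling work the usual way. Here's a minimal sketch; the project name `myproject`, spider name `myspider`, and the assumption that the `scrapyd` host on port 6800 is reachable from your machine are all illustrative:

```shell
# Deploy the project from its root directory (the -p flag supplies the
# project name if it isn't set in scrapy.cfg).
$ scrapyd-deploy -p myproject

# Schedule a spider run through the JSON API, passing the same credentials
# as HTTP basic auth.
$ curl -u my_username:hunter123 http://scrapyd:6800/schedule.json \
      -d project=myproject -d spider=myspider
```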
If your scraper depends on third-party Python packages (Redis or MySQL drivers, etc.), you can install them when the container launches by adding the PACKAGES environment variable.
```shell
$ docker run -d -e USERNAME=my_username -e PASSWORD=hunter123 -e PACKAGES=requests,simplejson cdrx/scrapyd-authenticated
```

Installing packages at launch makes the container slower to boot, so if you're starting and stopping the container regularly you would be better off forking this repository and adding the packages to requirements.txt.
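As an alternative to forking, you could bake the packages in by extending the published image with your own Dockerfile. This is just a sketch; the package list is an example and the exact pip executable available inside the image is an assumption:

```dockerfile
# Hypothetical Dockerfile extending the published image so packages are
# installed at build time instead of at container start.
FROM cdrx/scrapyd-authenticated
RUN pip install requests simplejson
```

Build it with `docker build -t my-scrapyd .` and then run it with only the USERNAME and PASSWORD variables.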
| Variable | Required | Example | Description |
|---|---|---|---|
| USERNAME | Yes | my_user | The username for authentication with the Scrapyd server |
| PASSWORD | Yes | hunter123 | The password for authentication with the Scrapyd server |
| PACKAGES | No | simplejson,requests | Comma-separated list of Python packages to install before starting Scrapyd |
To persist data between launches, you can mount the volume /scrapyd somewhere on your Docker host.
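For example, a sketch of a run command that keeps the Scrapyd data on the host (the host path `/srv/scrapyd-data` is an arbitrary example):

```shell
# Mount a host directory over /scrapyd so eggs, logs and items survive
# container restarts.
$ docker run -d \
      -e USERNAME=my_username -e PASSWORD=hunter123 \
      -v /srv/scrapyd-data:/scrapyd \
      cdrx/scrapyd-authenticated
```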
If you have any problems with or questions about this image, please file an issue on the GitHub repository.
Pull requests welcome :-)