Double Boiler is a work-in-progress demonstration of how to pack extra processes into every Heroku (or Heroku-alike) process, as well as an attempt to squeeze as much throughput as possible out of a single Heroku process.
It currently uses Python, Django, Celery, and Foreman, a desktop implementation of the Procfile format preferred by Heroku.
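For local development, Foreman can start everything declared in the Procfile from the project root:

$ foreman start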
Because there is a second Procfile not interpreted by Heroku (`LocalProcfile`) that is used to start more processes within a single Heroku process (seen in `Procfile`), there is a double-layering of Procfiles. It is this technique that gives the project its name.
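As a concrete sketch of the layering (the entries here are illustrative assumptions, not copied from the repository):

```
# Procfile -- the only file Heroku reads; its single process type
# just launches Foreman, which fans out into the real processes.
web: foreman start -f LocalProcfile

# LocalProcfile -- read by Foreman inside the one Heroku process.
web: gunicorn hellodjango.wsgi:application -k gevent
worker: python hellodjango/manage.py celeryd
```

Heroku sees only the outer process; Foreman multiplexes the rest inside it.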
The intention is that hobbyists who want some asynchronous execution via Celery but cannot justify two full Heroku processes can still get all the expressivity they desire, so long as all their required processes fit into one of the containers furnished by their platform-as-a-service provider. In a system large enough to keep multiple process types busy, this trick is probably just extra complexity.
Finally, each of the processes started in `LocalProcfile` is itself multi-process: Celery starts multiple forked workers to service asynchronous jobs, and Gunicorn starts multiple web-serving backends. To get even more throughput from each forked Gunicorn worker, the gevent backend is used to process multiple requests per backend in a cooperatively interleaved fashion, using the time that would normally be spent waiting for I/O to do even more work.
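To illustrate the cooperative interleaving gevent provides, here is a minimal standalone sketch (not code from this project): three fetches overlap their network waits instead of running back to back.

```python
import gevent
from gevent import monkey

# Patch blocking stdlib I/O so a wait on one socket yields control to
# other greenlets instead of stalling the whole process.
monkey.patch_all()

from urllib.request import urlopen

def fetch(url):
    # While this greenlet blocks on network I/O, the others run.
    with urlopen(url) as resp:
        return resp.read()

# Spawn three fetches; their network waits interleave cooperatively.
greenlets = [gevent.spawn(fetch, 'http://example.com/') for _ in range(3)]
gevent.joinall(greenlets)
print([len(g.value) for g in greenlets])
```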
The project's Git history attempts to be reasonably well documented, but there is no reason why its latest revision cannot be used as a hello-world template.
Because this project is intended to be a template of sorts, no stable Django `SECRET_KEY` (in `settings.py`) is defined. Instead, a random one is created upon every startup. It is completely necessary to set a stable `SECRET_KEY` in production settings, and the current code reads an environment variable called `DJANGO_SECRET_KEY`. One can set it in a Heroku environment easily:
$ heroku config:add DJANGO_SECRET_KEY='my-long-random-thing-here'
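The fallback pattern described above might look roughly like this in `settings.py` (a sketch under assumptions; the repository's actual code may differ):

```python
import os
import random
import string

# Prefer a stable key from the environment (set in production); fall
# back to a random throwaway key so the template boots unconfigured.
# Note: a key that changes on every startup invalidates sessions and
# anything else signed with it.
SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY') or ''.join(
    random.choice(string.ascii_letters + string.digits)
    for _ in range(50))
```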
Also, as with a normal Django project, it is necessary to run `syncdb` for things to work correctly, especially the first time. You can do this on Heroku with:

$ heroku run hellodjango/manage.py syncdb
Heroku (and some other services with a similar model) may idle out processes on their platforms. That means things like scheduled tasks will not work so well unless one's application receives a steady stream of traffic. Some creativity (look at Heroku's cron addon, for example) can assist with these use cases.
- There has been no attempt to tune the concurrency (aka the number of forked workers) of Celery and Gunicorn to be sensible for Heroku. If you see out-of-memory errors (Python raises these as `MemoryError` exceptions), some of your workers or web processes may be using too much memory, and reducing the number of forked workers may help; see the sketch after this list.
- gevent monkey-patches the underlying Python libraries so that I/O done through them also becomes an opportunity for cooperative task-switching. This very often works, but not always (bugs and misbehavior between gevent and other common packages are tracked by the gevent project and its mailing lists). One can opt out of gevent by using the 'sync' backend (as detailed in the Gunicorn documentation) instead of the gevent one.
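A hedged sketch of what tuning the concurrency down might look like in `LocalProcfile` (the flags are standard Gunicorn and Celery options, but the entry names and module paths are assumptions):

```
# Two forked Gunicorn workers instead of the default, still gevent-based;
# swap "-k gevent" for "-k sync" to opt out of gevent entirely.
web: gunicorn hellodjango.wsgi:application -w 2 -k gevent

# Limit Celery to two forked worker processes.
worker: python hellodjango/manage.py celeryd --concurrency=2
```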
The current and (currently) very unexciting demonstration can be seen at http://double-boiler.heroku.com. It just shows a hello-world Django page, indistinguishable from a normal Django "hello world".
I'd like to have a more compelling demo than hello world (say, generating jobs rapidly via many, many AJAX calls and then serving them) to show gevent, Gunicorn, and Celery in action. A mini-benchmark would be nice. I would welcome contributions in this area, because I'm neither a great nor an enthusiastic web developer.