Switching to pip for Python deployments

Andy McKay


For the last year or so, addons.mozilla.org and marketplace.firefox.com have been deployed primarily by pulling the entire project out of GitHub. Required libraries were placed in a repository called vendor, which was itself a git submodule of our project and held the libraries as submodules of its own, meaning we had recursive submodules. At deployment time we recursively pulled all the modules from GitHub to our master server.

This created a few issues. The recursive pull from GitHub was quite slow, because we pulled down an awful lot of code. Updating something in vendor meant a tortuous series of steps that drew quite a few expletives from most developers the first few times they tried it. Scripts were written to make that easier, but they only addressed the symptom. Multiple commits landed in zamboni, our main project, with accidental changes to the vendor submodule, and more expletives were uttered.

Because everything ended up underneath the main project, anything that recursively searched directories took longer. Some git commands, greps, test runs and so on got slower and slower as the vendor library grew.

Finally, building packages and using PyPI and the existing community infrastructure is a good thing. It keeps our setup as close to normal as possible, and our libraries are more likely to be reused if we package them properly from the start.

How we build

We don’t want any surprises about what pip is installing on our servers. A package’s setup.py often declares its own dependencies, which pip would otherwise install too. To prevent that, we invoke pip with the --no-deps flag, so all dependencies have to be specified manually in the requirements files.
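As a minimal sketch, a Python deploy script might assemble that pip command like this; `pip_install_command`, `install` and the `requirements/prod.txt` path are hypothetical, not our actual deployment code. Running pip as `sys.executable -m pip` makes it install into whichever Python (for example, the virtualenv’s) runs the script.

```python
import subprocess
import sys


def pip_install_command(requirements_file):
    """Build a pip command that installs exactly what is listed.

    --no-deps stops pip from following the dependencies declared in each
    package's setup.py, so every dependency must appear explicitly in
    the requirements file.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--no-deps",
        "-r", requirements_file,
    ]


def install(requirements_file):
    # Raises subprocess.CalledProcessError if pip exits non-zero.
    subprocess.check_call(pip_install_command(requirements_file))
```

For instance, `pip_install_command("requirements/prod.txt")` returns a list ending in `"--no-deps", "-r", "requirements/prod.txt"`, which can be logged before it is run.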

For each package we pin a specific version, for example Django==1.4.3. This is a little faster, since pip doesn’t have to find the relevant version. It means we have to update manually, but it ensures we don’t get a version we didn’t expect, and updates are much simpler and easier to read in source control.
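Because every version is pinned by hand, a small check can catch requirement lines that slipped in without an exact version. This is a minimal sketch, not our actual tooling; `unpinned_requirements` is hypothetical, and it deliberately skips option lines such as `-e` eggs, which are handled separately.

```python
import re

# A pinned line names one package and one exact version, e.g. "Django==1.4.3".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^=<>!~\s]+$")


def unpinned_requirements(lines):
    """Return requirement lines that are not pinned with ==.

    Blank lines, comments and option lines (starting with "-", such as
    "-e git+..." or "-r other.txt") are skipped, since they are not
    simple package pins.
    """
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Run against a requirements file before deploying, a non-empty result (say `["requests>=1.0"]`) would flag a dependency that could silently change version.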

Unfortunately, not everything is a package. A few packages are pulled directly from GitHub as eggs (as of writing we have 80 Python packages and 20 eggs from GitHub). For those we set --exists-action=w, so that if an existing checkout no longer matches what is requested, pip wipes it and fetches it again instead of stopping at an interactive prompt, and the deployment continues smoothly.
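A sketch of how a deploy script might add that flag only when the requirements contain VCS eggs; `is_vcs_requirement` and `install_command` are hypothetical helpers. The `-e git+https://...#egg=name` form they look for is pip’s editable VCS syntax, but the exact lines in our requirements files are not shown here.

```python
import sys

VCS_SCHEMES = ("git+", "hg+", "svn+", "bzr+")


def is_vcs_requirement(line):
    """True for editable VCS lines such as "-e git+https://...#egg=name"."""
    line = line.strip()
    for prefix in ("--editable", "-e"):
        if line.startswith(prefix):
            line = line[len(prefix):].lstrip(" =")
            break
    return line.startswith(VCS_SCHEMES)


def install_command(requirements_file, lines):
    """Build a pip command, adding --exists-action=w when VCS eggs are present.

    With --exists-action=w, pip wipes an existing checkout that does not
    match instead of asking what to do, so an unattended deployment does
    not hang on a prompt.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--no-deps"]
    if any(is_vcs_requirement(line) for line in lines):
        cmd.append("--exists-action=w")
    cmd += ["-r", requirements_file]
    return cmd
```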

Security

Installing a package from PyPI on the production servers raises more security issues than pulling code from GitHub does, and we wanted to be sure that we were being as safe as possible.

So we set up a server to store local copies of our packages. It is a simple HTTP server that serves files from the file system, and developers have to scp packages over to it. The deployment pulls from that server instead of from PyPI, which gives us accountability and traceability for which packages reach our production servers.
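A server like that really can be tiny. As a minimal sketch, assuming Python 3.7 or later, the standard library’s http.server can serve a directory of package files; `make_package_server` and any directory you pass it are hypothetical, not the actual server we run.

```python
import functools
import http.server


def make_package_server(directory, host="127.0.0.1", port=8000):
    """Return an HTTP server that serves files from `directory`.

    Hypothetical helper: developers copy package files into `directory`
    (e.g. with scp) and pip fetches them over plain HTTP. The handler
    also produces an HTML directory listing with links to each file.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer((host, port), handler)
```

Calling `make_package_server("/srv/packages").serve_forever()` would serve until interrupted. One way to point pip at such a server is `pip install --no-deps --no-index --find-links=http://<host>:<port>/ -r requirements.txt`, where `--no-index` stops pip from consulting PyPI at all; whether a given deployment uses exactly these flags is an assumption.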

Before developers get access to upload packages, they have to read and sign off on a policy for uploading packages.

Results

Deployments are now slower, because we delete the existing virtualenv and reinstall the packages from scratch on each deployment. This was a temporary measure to cope with the large number of package changes over the first few deployments. This step takes about 2 minutes, making it the slowest part of the deployment, and it should get faster once we stop deleting the virtualenv.
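A minimal sketch of the delete-and-recreate step, using the standard library venv module as a stand-in for the virtualenv tool; the helper `prepare_env` and its `fresh` switch are hypothetical. Turning `fresh` off is the change that should speed things up, since pip can then skip requirements already installed at the pinned version.

```python
import os
import shutil
import venv


def prepare_env(path, fresh=True):
    """Create the deployment environment at `path`.

    With fresh=True any existing environment is deleted first, so every
    package is reinstalled from scratch (slow but predictable). With
    fresh=False an existing environment is reused as-is.
    """
    if fresh and os.path.isdir(path):
        shutil.rmtree(path)
    if not os.path.isfile(os.path.join(path, "pyvenv.cfg")):
        # with_pip=False keeps the sketch fast; a real deploy needs pip.
        venv.create(path, with_pip=False)
    return path
```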

The vendor submodule still exists to look after legacy code that hasn’t been put into packages or is unlikely ever to be. It also contains one JavaScript library.

The number of expletives issued by developers during library bumps has dropped, and commits with stray vendor changes have vanished, although a few expletives have been uttered at packages. Updates are faster, and git and other command line tools are faster. Overall, life is a whole lot better.

We’ve now got at least four projects using and deploying with pip and we won’t be going back. Hopefully more Mozilla web sites will now also be able to deploy with pip.

Thanks to Jason, Jeremy and Guillaume for their help in making this happen.

2 responses

  1. Anders Pearson wrote:

    This is pretty much the approach that I’ve been employing and advocating for the last few years. (see this post: http://ccnmtl.columbia.edu/compiled/sysadmin/deploying_django_and_deploying.html)

    I go one step further and actually check in the packages for the dependencies (as tarballs) in the same repo as my project and use (essentially) “pip -E ve --index-url="" --requirement requirements.txt” to force it to install *only* from those packages and never try to contact any remote server during the deploy.

    That bloats the repos a little bit (30-40 MB for a typical Django app), which I’m OK with, but it has the advantage that deploys have no dependencies on external servers and I can bootstrap a project on my laptop without an internet connection, which has proved invaluable many times.

    So congrats on getting away from dreaded recursive submodules.

  2. Luper Rouch wrote:

    I made a script to ease the creation of ‘frozen’ requirements files: https://github.com/Stupeflix/freeze-requirements