This Week in Glean: Glean for Python on Windows

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

One of the top-level goals of Glean is to support many different platforms and application types with a single core of code written in Rust. We’ve had Android and iOS support for a while, and are building out the pieces necessary to support desktop Firefox, but today I’d like to talk about how we support Python applications, specifically on Windows.

Python has a rich C API for writing extensions, which are libraries used from Python, but written in compiled languages. This is used by some popular libraries, such as NumPy, to achieve higher performance than you could get from Python alone. There are even Rust bindings to this API in PyO3, which also automates the reference counting you would otherwise have to handle manually. However, for Glean, we don’t use it.

One of the downsides of using the Python/C API is that each minor version of Python has its own ABI, so extensions need to be built specially for each version of Python we need to support. In our case, we need to support Python 3.5, 3.6, 3.7 and 3.8. Multiply that by the platforms we need to support (Windows 32-bit, Windows 64-bit, macOS 64-bit and Linux 64-bit), and that would be 16 separate binaries we would need to build and distribute to support our user base.
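To make the arithmetic concrete, here is a small sketch that enumerates the combinations listed above. The variable names are only for illustration.

```python
from itertools import product

python_versions = ["3.5", "3.6", "3.7", "3.8"]
platforms = ["Windows 32-bit", "Windows 64-bit", "macOS 64-bit", "Linux 64-bit"]

# With the Python/C API, every (Python version, platform) pair needs its own binary.
per_version_builds = list(product(python_versions, platforms))

print(len(per_version_builds))  # number of binaries with the Python/C API
print(len(platforms))           # number of binaries if only the platform matters
```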

Instead, we compile the Rust code into a standard shared object/dynamically linked library with a C API. We then use the cffi library to interact with it directly from Python. This means we only have to ship one build of Glean for each of the platforms we need to support, not one for each version of Python. The shared object is just included as a data file inside the wheel that we build, and the cffi library handles opening it and calling functions in it.
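As a rough sketch of what this looks like from the Python side, the snippet below picks a library file name for the current platform and opens it with cffi. The base name `glean_ffi`, the helper names, and the `example_add` declaration are all assumptions made for this illustration, not Glean's actual API.

```python
import sys
from pathlib import Path


def shared_library_name(platform: str = sys.platform) -> str:
    """Return the shared library file name for a given sys.platform value.

    The base name "glean_ffi" is an assumption made for this sketch.
    """
    if platform.startswith("win"):
        return "glean_ffi.dll"
    if platform == "darwin":
        return "libglean_ffi.dylib"
    return "libglean_ffi.so"


def load_library(package_dir: Path):
    """Open the shared library shipped as a data file next to the Python code."""
    # Imported here so the helper above can be used without cffi installed.
    from cffi import FFI

    ffi = FFI()
    # Hypothetical C declaration; a real header would list the library's actual API.
    ffi.cdef("int example_add(int a, int b);")
    lib = ffi.dlopen(str(package_dir / shared_library_name()))
    return ffi, lib
```

Because `ffi.dlopen` loads the library at runtime rather than compiling an extension module, nothing here depends on which minor version of Python is running.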

Building Glean for Python on Windows on Linux

How we build these wheels for Windows is a bit unusual. We use CircleCI for our continuous testing and building of release products, which doesn’t have Windows support. We could use another CI service with full Windows support, such as AppVeyor. However, adding another service would bring complexity and cost that aren’t really worth it now, though they might be at some point in the future.

Fortunately, it’s really easy to build Rust DLLs for Windows on a Linux container.  You can see how we do it by looking at our CircleCI configuration, and the basic steps are described below.

First, you need to install the Rust target for Windows that works with the GNU (mingw) toolchain:

rustup target add x86_64-pc-windows-gnu

Then you need to install the mingw toolchain (which is used to get the C standard library and the linker).  Since we’re using a Debian VM on CircleCI, we just install the Debian packages:

sudo apt install -y gcc-mingw-w64

We then need to tell Rust’s cargo tool to use this linker when building for Windows by adding the following to the ~/.cargo/config file:

[target.x86_64-pc-windows-gnu]
linker = "/usr/bin/x86_64-w64-mingw32-gcc"

Now we can build Glean the way we normally do, except that passing --target x86_64-pc-windows-gnu makes cargo build it for Windows:

cargo build --target x86_64-pc-windows-gnu

Of course, we also want to test this.  In order to do that, we need some way to run the result.  For that, we can use Wine, which lets you run Windows executables on Linux.  It’s easy to install this on Debian:

sudo apt install wine

Through the magic of Wine installing a handler for Windows executables, it’s then easy to run the Rust unit tests in the regular way, just by telling cargo to build and test for the Windows target:

cargo test --target x86_64-pc-windows-gnu

That tests the shared object written in Rust.  Of course, we should also test that shared object running inside of our Python wrappers.  For that, we can install the Windows version of Python inside of Wine.  Python for Windows is normally shipped as a Windows installer, but since we’re running this on CircleCI where no one is around to click the buttons in the installer, we instead use the zip file distribution of Python, and extract it.

wget https://www.python.org/ftp/python/3.7.7/python-3.7.7-embed-amd64.zip
mkdir winpython
unzip python-3.7.7-embed-amd64.zip -d winpython

This distribution of Python doesn’t include pip, so we have to install pip into it before we can install Glean’s dependencies in it.  The details are a bit involved, so I refer the reader to our CircleCI config for that bit.
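One piece that usually comes up here: the embeddable distribution ships with a `python37._pth` file in which the `import site` line is commented out, so `site-packages` is not on the import path until that line is enabled. The sketch below shows one way to flip it. The `enable_site` helper is hypothetical, and this is not necessarily how our CircleCI config does it.

```python
def enable_site(pth_text: str) -> str:
    """Uncomment the 'import site' line of an embeddable Python ._pth file."""
    lines = []
    for line in pth_text.splitlines():
        stripped = line.strip()
        # Only touch the commented-out import, not the explanatory comment above it.
        if stripped.startswith("#") and stripped.lstrip("#").strip() == "import site":
            line = "import site"
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    from pathlib import Path

    # Path assumes the zip was extracted into ./winpython as shown above.
    pth_file = Path("winpython") / "python37._pth"
    if pth_file.exists():
        pth_file.write_text(enable_site(pth_file.read_text()))
```

With `site` enabled, running the standard `get-pip.py` bootstrap script with the Windows `python.exe` under Wine is one way to get pip into the extracted directory.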

Now we can build a wheel for Glean that contains the Rust code compiled for Windows. This cross-compiling configuration (building compiled code for a target other than the one you’re building on) is not well-supported by Python’s distutils building infrastructure. So Glean just has a hack: if the GLEAN_PYTHON_MINGW_X86_64_BUILD environment variable is set, Glean’s setup.py script knows it should put the shared object built for Windows inside the wheel, instead of the one for Linux as it would normally do. Then we hard code the name of the wheel, which pip uses to know what platforms the wheel will work with.
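The shape of that hack can be sketched roughly as follows. Only the environment variable name comes from the real setup.py; the `select_build` helper, the `glean_ffi` library names, and the use of the `release` profile are assumptions for illustration.

```python
import os
import sys
from pathlib import Path

MINGW_ENV_VAR = "GLEAN_PYTHON_MINGW_X86_64_BUILD"

# Assumed library file names, keyed by the kind of host platform.
NATIVE_NAMES = {
    "win": "glean_ffi.dll",
    "darwin": "libglean_ffi.dylib",
    "linux": "libglean_ffi.so",
}


def select_build(environ=os.environ, platform=sys.platform, profile="release"):
    """Return (path to the shared object, hard-coded wheel platform tag or None)."""
    if environ.get(MINGW_ENV_VAR):
        # Cross-compiling: cargo writes output to target/<triple>/<profile>/,
        # and the wheel is tagged for 64-bit Windows regardless of the host.
        dll = Path("target") / "x86_64-pc-windows-gnu" / profile / "glean_ffi.dll"
        return dll, "win_amd64"
    # Native build: use the host's library and let the packaging tools pick the tag.
    for prefix, name in NATIVE_NAMES.items():
        if platform.startswith(prefix):
            return Path("target") / profile / name, None
    raise RuntimeError("unsupported platform: " + platform)
```

The wheel file name ends with the platform tag (here `win_amd64`), which is the part pip inspects when deciding whether a wheel is compatible with the machine it is installing on.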

Once we’ve done that, we can install the Glean wheel inside Python for Windows running on Wine and run all of Glean’s unit tests.

Stay tuned

With those pieces in place, we are able to build wheels to support our Windows users, while still using the Linux-based CI infrastructure we use for the rest of the project.

In a future installment, I’ll share how we get around some of the multiprocess limitations on Windows by building a super-simple subprocess work manager.