Migrating Air Mozilla from MySQL to PostgreSQL

Before we dig into the how let’s take a look at the why.

From the beginning, Air Mozilla has been a straight forward Django project that uses the ORM without requiring any database specific features. It didn’t really matter what database you used. Here at Mozilla, we currently prefer MySQL because we have a rock solid and mature infrastructure set up around running it. (Thank you database team!)

This week we launched full-text search in Air Mozilla. Here’s an example search. PostgreSQL supports very powerful features specifically for full-text search, including stemming, highlighting, ranking and custom dictionaries. (Note: MySQL has full-text search indexing too as of MySQL 5.6 but it does not yet support stemming or highlighting).

So how did we migrate the database? In short, this tool: py-mysql2pgsql. What the tool does internally is that it connects to both MySQL and PostgreSQL and reads one table and record at a time to convert over to PostgreSQL. You can check out the code on github.com/philipsoutham/py-mysql2pgsql/.

To run it, all you have to do is fill in the connection details into a YAML file for both MySQL and PostgreSQL and you should now have a working “clone”.

There was one caveat though that irked me. MySQL does not support timestamps with time zone and PostgreSQL does. Django can work around this by applying the time zone with a Django settings variable. By having the time zone information in the database, we don’t have to fake the time zone information any more. It’s also bound to be more performant because you’ve moved the conversion to aware date times nearer to the database. To make this change, we wrote a simple conversion script that you can afterwards throw into PostgreSQL. You can see the rest of the instructions for the migration here.

And here’s a little bonus; the time it took to actually run the migration was approximately 10-15 seconds to migrate over 25,000 rows across 42 tables. That’s connecting to a MySQL and a PostgreSQL in two separate different locations in the same data center.

In a follow-up post I will try to explain more about how we do the full-text search in PostgreSQL with Django.