from zope import django: persistence

Having recently come from the Plone world to join Mozilla, I am in the delicious but fleeting position to compare two major Python web frameworks with some pretension of familiarity. Through a series of articles focusing on specific features, I will compare the Zope family of frameworks (as they are used in Plone) with the Django framework, which is gaining popularity at Mozilla and currently runs support.mozilla.com and addons.mozilla.org. My hope is that users of and contributors to each framework will learn from the mistakes and triumphs of the other, and everyone will come out with a broader apprehension of the possibilities in this space.

Zope and Django have vastly different design philosophies, but they both come from a common—yet surprising—place: the newspaper industry. Django was born of an internal content management system at the Lawrence Journal-World; Zope, from a classified ad search system for the InfiNet consortium. It could be just dumb luck, but perhaps the quick deadline cycles and sheer volume of daily papers have a natural tendency to produce agile software for managing large amounts of content. Today, we’ll have a look at each framework’s facilities for storing and accessing that content: Django’s object–relational mapper and the Zope Object Database.

Django: relational to its core

Django assumes a relational datastore at its very lowest levels, and that assumption percolates unashamedly up to its public API. All the typical DBs are supported—PostgreSQL, MySQL, SQLite, and Oracle—and more are available from third parties. To store data in Django, one first defines a schema in a Pythonic manner…

    from django.db.models import CharField, Model

    class Hamster(Model):
        first_name = CharField(max_length=50, db_index=True)
        middle_name = CharField(max_length=50)
        last_name = CharField(max_length=50)

…which Django automatically turns into a DB table. These “model” objects are then created or fetched explicitly:

    # Make a new Hamster:
    Hamster.objects.create(first_name='Aloysius', last_name='Fearsomepants')

    # Get the Hamster with the given first and last names:
    h = Hamster.objects.get(first_name='Mister', last_name='Fluffypants')
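For readers who think in SQL, here is a rough, stdlib-only sketch (using Python's sqlite3 module) of the kind of table and statements the Hamster model and the two calls above boil down to. The table name hamster and the exact column types are assumptions. Django really derives the table name from the app label (e.g. appname_hamster), adds an auto-incrementing id primary key, and generates backend-specific SQL.

```python
import sqlite3

# An in-memory database standing in for whichever backend Django is configured with.
conn = sqlite3.connect(":memory:")

# Roughly what the Hamster model declares: an implicit auto-incrementing id,
# three bounded text columns (SQLite itself does not enforce the 50-character
# limit), and an index on first_name (from db_index=True).
conn.execute(
    """
    CREATE TABLE hamster (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name VARCHAR(50) NOT NULL,
        middle_name VARCHAR(50) NOT NULL,
        last_name VARCHAR(50) NOT NULL
    )
    """
)
conn.execute("CREATE INDEX hamster_first_name ON hamster (first_name)")

# Hamster.objects.create(...) amounts to an INSERT. The omitted middle_name
# becomes an empty string, which is what a CharField without a default gets.
conn.execute(
    "INSERT INTO hamster (first_name, middle_name, last_name) VALUES (?, ?, ?)",
    ("Aloysius", "", "Fearsomepants"),
)
# Mister Fluffypants is inserted too, so the lookup below has something to find.
conn.execute(
    "INSERT INTO hamster (first_name, middle_name, last_name) VALUES (?, ?, ?)",
    ("Mister", "", "Fluffypants"),
)

# Hamster.objects.get(...) amounts to a SELECT that must match exactly one row;
# Django raises DoesNotExist or MultipleObjectsReturned otherwise.
rows = conn.execute(
    "SELECT id, first_name, middle_name, last_name FROM hamster"
    " WHERE first_name = ? AND last_name = ?",
    ("Mister", "Fluffypants"),
).fetchall()
if len(rows) != 1:
    raise LookupError(f"expected exactly one row, got {len(rows)}")
h = rows[0]
print(h)  # (2, 'Mister', '', 'Fluffypants')
```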

Table-wide queries go through so-called “managers”, like objects in the examples above, which are special objects typically stored as attributes on the pertinent model classes. Individual row operations, meanwhile, are performed directly on instances, whose attributes generally map to DB fields:

    # Set Mister Fluffypants' middle name:
    h.middle_name = 'Fuzzy'

Saves are done explicitly:

    # Save the Hamster. This sends an UPDATE query.
    h.save()
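The split between table-wide manager methods and per-instance row operations is essentially the Active Record pattern. Below is a toy version of that split over sqlite3. ToyManager, ToyHamster, and their methods are hypothetical and deliberately tiny; Django's real Manager and Model.save() do far more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hamster (id INTEGER PRIMARY KEY,"
    " first_name TEXT, middle_name TEXT, last_name TEXT)"
)
FIELDS = ("first_name", "middle_name", "last_name")


class ToyManager:
    """Table-wide operations; reachable from the model class, not its instances."""

    def __set_name__(self, owner, name):
        self.model = owner  # remember which class this manager belongs to

    def __get__(self, instance, owner):
        if instance is not None:
            # Django's managers likewise refuse access through instances.
            raise AttributeError("Manager isn't accessible via instances")
        return self

    def create(self, **fields):
        obj = self.model(**fields)
        obj.save()
        return obj

    def get(self, **fields):
        # Field names come from trusted code in this sketch; values are bound safely.
        where = " AND ".join(f"{name} = ?" for name in fields)
        rows = conn.execute(
            f"SELECT id, {', '.join(FIELDS)} FROM hamster WHERE {where}",
            tuple(fields.values()),
        ).fetchall()
        if len(rows) != 1:
            raise LookupError(f"expected exactly one row, got {len(rows)}")
        pk, *values = rows[0]
        obj = self.model(**dict(zip(FIELDS, values)))
        obj.id = pk
        return obj


class ToyHamster:
    objects = ToyManager()

    def __init__(self, first_name="", middle_name="", last_name=""):
        self.id = None  # no row yet
        self.first_name = first_name
        self.middle_name = middle_name
        self.last_name = last_name

    def save(self):
        """Row-level operation: INSERT a new row or UPDATE the existing one."""
        values = tuple(getattr(self, name) for name in FIELDS)
        if self.id is None:
            cur = conn.execute(
                f"INSERT INTO hamster ({', '.join(FIELDS)}) VALUES (?, ?, ?)", values
            )
            self.id = cur.lastrowid
        else:
            assignments = ", ".join(f"{name} = ?" for name in FIELDS)
            conn.execute(
                f"UPDATE hamster SET {assignments} WHERE id = ?", values + (self.id,)
            )


ToyHamster.objects.create(first_name="Mister", last_name="Fluffypants")
h = ToyHamster.objects.get(first_name="Mister", last_name="Fluffypants")
h.middle_name = "Fuzzy"  # changes only the Python object...
h.save()                 # ...until save() sends an UPDATE
```

The descriptor is what keeps table-wide operations on the class: ToyHamster.objects works, while h.objects raises AttributeError.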

Django hijacks Python’s keyword argument syntax in a creative way to express more complex concepts such as substring matching and cross-table joins. Note the double underscores in the following:

    hamsters = Hamster.objects.filter(last_name__startswith='F')

If we had added a self-referencing mother attribute to the Hamster class, we could even join the table to itself and find all the hamsters whose mothers’ last names contain “pants”…

    hamsters = Hamster.objects.filter(mother__last_name__contains='pants')

…or whose mother is a specific hamster:

    hamsters = Hamster.objects.filter(mother=some_specific_hamster)
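Roughly speaking, a lookup such as mother__last_name__contains becomes a self-join in SQL. Here is a stdlib-only sqlite3 sketch of comparable SQL. It assumes the hypothetical mother attribute is a nullable self-referencing foreign key stored in a mother_id column, following Django's <name>_id column naming. The sample hamsters are invented, and Django's actual generated SQL differs in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hamster (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL,
        mother_id INTEGER REFERENCES hamster (id)  -- nullable self-reference
    )
    """
)
conn.executemany(
    "INSERT INTO hamster (id, first_name, last_name, mother_id) VALUES (?, ?, ?, ?)",
    [
        (1, "Matilda", "Fluffypants", None),
        (2, "Agatha", "Whiskerton", None),
        (3, "Mister", "Fluffypants", 1),
        (4, "Pip", "Whiskerton", 2),
    ],
)

# filter(mother__last_name__contains='pants'): join each hamster (h) to its
# mother's row (m) and match a substring of the mother's last name.
children_of_pants = conn.execute(
    """
    SELECT h.first_name
    FROM hamster AS h
    INNER JOIN hamster AS m ON h.mother_id = m.id
    WHERE m.last_name LIKE '%' || ? || '%'
    """,
    ("pants",),
).fetchall()

# filter(mother=some_specific_hamster): no join needed, just compare the key.
children_of_agatha = conn.execute(
    "SELECT first_name FROM hamster WHERE mother_id = ?", (2,)
).fetchall()

print(children_of_pants)   # [('Mister',)]
print(children_of_agatha)  # [('Pip',)]
```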

In each of the three filter() examples above, hamsters is a QuerySet, a lazy representation of a DB query. QuerySets can be combined in set-mathy ways (unions, intersections, exclusions) and even limited using Python’s slice syntax, all without hitting the DB. If a framework is going to expose relational semantics, this is a beautiful, natural way to do it. Relational datastores are a mature technology, and SQL is ubiquitous; by mapping Python closely onto SQL, Django spares its users a new query model to learn and avoids a complex, leaky abstraction layer.
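To make that laziness concrete, here is a toy LazyQuery class over sqlite3. It is a hypothetical sketch, not Django's QuerySet implementation. filter(), | and slicing only build up a WHERE clause and LIMIT/OFFSET, and SQL runs only when the query is iterated. The queries_run counter exists purely to show when that happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hamster (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO hamster (last_name) VALUES (?)",
    [("Fearsomepants",), ("Fluffypants",), ("Nibbles",), ("Whiskerton",)],
)


class LazyQuery:
    """Accumulates SQL conditions and LIMIT/OFFSET; runs nothing until iterated."""

    queries_run = 0  # counts actual round trips to the database

    def __init__(self, where="1 = 1", params=(), limit=-1, offset=0):
        self.where, self.params = where, tuple(params)
        self.limit, self.offset = limit, offset  # in SQLite, LIMIT -1 means no limit

    def _require_unsliced(self):
        if self.limit != -1 or self.offset:
            raise TypeError("cannot refine a query once it has been sliced")

    def filter(self, **lookups):
        """Supports field=value and field__startswith=prefix; field names are trusted."""
        self._require_unsliced()
        clauses, params = [f"({self.where})"], list(self.params)
        for key, value in lookups.items():
            field, _, op = key.partition("__")
            if op == "":
                clauses.append(f"{field} = ?")
            elif op == "startswith":
                clauses.append(f"{field} LIKE ? || '%'")  # toy: % and _ are not escaped
            else:
                raise ValueError(f"unsupported lookup: {op}")
            params.append(value)
        return LazyQuery(" AND ".join(clauses), params)

    def __or__(self, other):
        """Union: OR the two conditions together into a new, still-unexecuted query."""
        self._require_unsliced()
        other._require_unsliced()
        return LazyQuery(f"({self.where}) OR ({other.where})", self.params + other.params)

    def __getitem__(self, s):
        """Only q[start:stop] is supported; it becomes OFFSET and LIMIT."""
        self._require_unsliced()
        if not isinstance(s, slice) or s.step is not None:
            raise TypeError("this sketch only supports q[start:stop]")
        start = s.start or 0
        limit = -1 if s.stop is None else max(s.stop - start, 0)
        return LazyQuery(self.where, self.params, limit, start)

    def __iter__(self):
        """The only place the database is actually queried."""
        LazyQuery.queries_run += 1
        sql = f"SELECT last_name FROM hamster WHERE {self.where} ORDER BY id LIMIT ? OFFSET ?"
        rows = conn.execute(sql, self.params + (self.limit, self.offset))
        return (row[0] for row in rows)


f_names = LazyQuery().filter(last_name__startswith="F")
n_names = LazyQuery().filter(last_name__startswith="N")
combined = (f_names | n_names)[1:3]  # still no SQL has run
print(LazyQuery.queries_run)         # 0
print(list(combined))                # ['Fluffypants', 'Nibbles']
print(LazyQuery.queries_run)         # 1
```

Every refinement returns a new object instead of mutating in place. That is what makes it safe to reuse f_names and n_names as building blocks. Django's QuerySet methods such as filter() likewise return new QuerySets.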

Zope: transparent Python persistence

The Zope Object Database (ZODB), on the other hand, eschews any concept of schema or explicit reference to a storage layer at all, instead pursuing transparent object persistence über alles. The ZODB is essentially a hierarchy of pickled Python objects. There is a single root object, and others hang off it via hashable keys, just like a Python dictionary (though in practice, most ZODB applications keep large collections in BTrees rather than hash-based mappings, to allow better concurrency).
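The “hierarchy of pickled objects hanging off a root” idea can be sketched with nothing but the standard library. This is a toy, not ZODB. Real ZODB pickles each persistent object as its own record and loads objects on demand, whereas this sketch pickles the whole graph in one go. The nested layout and keys are invented for illustration.

```python
import pickle
from types import SimpleNamespace

# The root is a mapping; everything else hangs off it by key, possibly nested.
# SimpleNamespace stands in for any pickleable object; no schema is declared.
root = {"hamsters": {}}
root["hamsters"]["mister-fluffypants"] = SimpleNamespace(
    first_name="Mister", last_name="Fluffypants"
)
root["hamsters"]["aloysius-fearsomepants"] = SimpleNamespace(
    first_name="Aloysius", last_name="Fearsomepants", favorite_food="kale"
)
# A second path to the same object: an object graph, not a tree of copies.
root["favorites"] = [root["hamsters"]["mister-fluffypants"]]

# "Store" the whole graph; this bytes blob stands in for the database file.
blob = pickle.dumps(root)

# "Reopen" the database and navigate from the root again.
reloaded = pickle.loads(blob)
fluffy = reloaded["hamsters"]["mister-fluffypants"]
print(fluffy.last_name)                    # Fluffypants
print(reloaded["favorites"][0] is fluffy)  # True: shared references survive
```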

The hamster example might look like this in ZODB parlance:

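    import transaction
    # Make a new Hamster (root is the root mapping, from connection.root()):
    root['aloysius-fearsomepants'] = dict(first_name='Aloysius', last_name='Fearsomepants')
    transaction.commit()  # changes are persisted only when the transaction commits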

First, notice that there is no schema: any pickleable Python object can be stored directly. In this case, I store a plain dict just to make a point, but one typically stores instances of classes that inherit from provided base classes, such as persistent.Persistent, so the ZODB can notice changes. This schema-less-ness is fantastic for prototyping, but it does ultimately allow cruft to accumulate in the DB, leading to branching code and annoying database-walking migrations as data formats change.
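Why inherit from a provided base class at all? ZODB learns that an object is dirty when one of its attributes is assigned, through hooks in persistent.Persistent that set its _p_changed flag. The hypothetical ToyPersistent class below is not the real persistent package. It mimics only that attribute-assignment hook, and it shows the classic pitfall: mutating a plain list or dict in place assigns no attribute, so nothing notices.

```python
class ToyPersistent:
    """Toy stand-in for persistent.Persistent: marks itself changed on attribute assignment."""

    def __init__(self):
        self._toy_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_toy_"):  # bookkeeping attributes don't count
            object.__setattr__(self, "_toy_changed", True)


class Hamster(ToyPersistent):
    def __init__(self, first_name, last_name):
        super().__init__()
        self.first_name = first_name
        self.last_name = last_name
        self.nicknames = []        # a plain list: changes inside it are invisible
        self._toy_changed = False  # pretend the object was just loaded, unmodified


h = Hamster("Mister", "Fluffypants")

h.nicknames.append("Fuzzy")  # in-place mutation: no attribute assignment happens
print(h._toy_changed)        # False: a real commit would skip this object

h.nicknames = h.nicknames    # reassigning the attribute is noticed
print(h._toy_changed)        # True
```

With the real persistent package, the usual remedies are to reassign the attribute, set obj._p_changed = True by hand, or store a PersistentList or PersistentMapping instead of a plain list or dict.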

To avoid having to walk the world to find an object, one uses the ZCatalog, the canonical indexing solution in the Zope world:

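    # Get the Hamster with the given first and last names
    # (assuming the catalog has first_name and last_name indexes):
    brains = catalog.searchResults(first_name='Mister', last_name='Fluffypants')
    hamster = brains[0].getObject()  # results are lightweight "brains"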
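Conceptually, a field index maps each value to the set of object paths that have it. A query can then intersect a few sets instead of walking every object. The ToyCatalog class below is a hypothetical sketch of that idea. The real ZCatalog stores its indexes as BTree-based persistent objects and returns metadata “brains” rather than bare paths.

```python
from collections import defaultdict


class ToyCatalog:
    """Toy field indexes: for each indexed name, value -> set of object paths."""

    def __init__(self, index_names):
        self.indexes = {name: defaultdict(set) for name in index_names}

    def add(self, path, obj):
        for name, index in self.indexes.items():
            if name in obj:
                index[obj[name]].add(path)

    def search(self, **query):
        """Intersect the path sets for each queried index; no stored object is touched."""
        result = None
        for name, value in query.items():
            paths = self.indexes[name].get(value, set())
            result = set(paths) if result is None else result & paths
        return sorted(result or ())


root = {
    "aloysius-fearsomepants": {"first_name": "Aloysius", "last_name": "Fearsomepants"},
    "mister-fluffypants": {"first_name": "Mister", "last_name": "Fluffypants"},
    "missus-fluffypants": {"first_name": "Missus", "last_name": "Fluffypants"},
}
catalog = ToyCatalog(["first_name", "last_name"])
for path, obj in root.items():
    catalog.add(path, obj)

paths = catalog.search(first_name="Mister", last_name="Fluffypants")
print(paths)                         # ['mister-fluffypants']
print(root[paths[0]]["first_name"])  # Mister (the object itself is fetched by path)
```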

The ZCatalog is in no way integral to the ZODB; it rides on top, storing its indexes as objects in the database just like everyone else. The ZODB has no particular hooks that update ZCatalog; instead, client code is responsible for notifying it when an object is changed. Zope’s object modification and creation event hooks come in handy here, but it is not uncommon, when using third-party Plone add-ons for instance, for indexes to get out of sync and to require a database-walking rebuild.
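This division of labour can be sketched as a tiny publish/subscribe loop. Code that modifies an object fires a “modified” event, and a subscriber reindexes the object. Everything here is a hypothetical stand-in: subscribers, notify, ObjectModified, and the dictionary-based index are loosely modeled on the idea behind zope.event and zope.lifecycleevent, not copied from their APIs. The sketch also shows what happens when some code forgets to fire the event, and the database-walking rebuild that repairs the damage.

```python
subscribers = []  # callables invoked for every event


def notify(event):
    for subscriber in subscribers:
        subscriber(event)


class ObjectModified:
    def __init__(self, path, obj):
        self.path, self.obj = path, obj


last_name_index = {}  # toy index: last name -> set of paths


def reindex(event):
    """Subscriber that keeps the toy index in step with modified objects."""
    if isinstance(event, ObjectModified):
        for paths in last_name_index.values():
            paths.discard(event.path)  # forget the old value, whatever it was
        last_name_index.setdefault(event.obj["last_name"], set()).add(event.path)


def rebuild(root):
    """The database-walking rebuild: reindex every object from scratch."""
    last_name_index.clear()
    for path, obj in root.items():
        notify(ObjectModified(path, obj))


subscribers.append(reindex)
root = {"mister-fluffypants": {"first_name": "Mister", "last_name": "Fluffypants"}}
rebuild(root)

# Well-behaved code fires the event after changing an object...
root["mister-fluffypants"]["last_name"] = "Fuzzbottom"
notify(ObjectModified("mister-fluffypants", root["mister-fluffypants"]))
print(last_name_index.get("Fuzzbottom"))  # {'mister-fluffypants'}

# ...but code that forgets to call notify() leaves the index out of sync.
root["mister-fluffypants"]["last_name"] = "Whiskerton"
print(last_name_index.get("Whiskerton"))  # None

rebuild(root)  # walk everything again to repair it
print(last_name_index.get("Whiskerton"))  # {'mister-fluffypants'}
```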

ZCatalog has a much smaller API than Django’s managers, as it cannot leverage a whole RDBMS underneath. Typically, a simple query like the above will suffice, or data resides at a predictable path in the object hierarchy. If not, one must write Python code to walk the DB and find what’s needed. Similarly, there is no widely supported protocol for accessing the ZODB from non-Python languages, a continual challenge that leads to a lot of ad hoc XML-RPC or REST interfaces (though wsapi4plone fills many simple needs).
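At its simplest, walking the DB is a recursive traversal. Here is a minimal sketch, with nested dicts standing in for the object hierarchy and walk() as a hypothetical helper. Against a real ZODB, every object visited may have to be loaded from storage, which is why this is the slow path.

```python
def walk(node, path=()):
    """Yield (path, object) for every object reachable through nested mappings."""
    yield path, node
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))


root = {
    "hamsters": {
        "mister-fluffypants": {"first_name": "Mister", "last_name": "Fluffypants"},
        "pip-whiskerton": {"first_name": "Pip", "last_name": "Whiskerton"},
    },
    "archive": {
        "2009": {
            "aloysius-fearsomepants": {"first_name": "Aloysius", "last_name": "Fearsomepants"},
        },
    },
}

# Find every record whose last name ends in "pants": a full linear scan.
matches = [
    "/".join(path)
    for path, obj in walk(root)
    if isinstance(obj, dict) and obj.get("last_name", "").endswith("pants")
]
print(matches)  # ['hamsters/mister-fluffypants', 'archive/2009/aloysius-fearsomepants']
```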

The ZODB provides ACID properties, though it’s light on the C: referential and type consistency are the application’s responsibility. It sports multiversion concurrency control and is reasonably fast, churning through several hundred transactions per second on a modest laptop.
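Multiversion concurrency control lets a reader keep a consistent snapshot while other transactions commit. Below is a toy, in-memory sketch of the idea. ToyMVCCStore, ToyTransaction, ConflictError, and the conflict rule are simplifications for illustration; ZODB's real machinery lives in its storages and connections. Every committed write adds a new version tagged with a transaction id. Readers see only versions committed at or before their snapshot. A writer whose keys were changed by someone else after its snapshot fails at commit time.

```python
class ConflictError(Exception):
    """Raised when a key was committed by someone else after our snapshot."""


class ToyMVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (tid, value), oldest first
        self.last_tid = 0   # id of the most recently committed transaction

    def begin(self):
        return ToyTransaction(self, snapshot=self.last_tid)


class ToyTransaction:
    def __init__(self, store, snapshot):
        self.store, self.snapshot, self.writes = store, snapshot, {}

    def read(self, key):
        if key in self.writes:  # a transaction sees its own pending writes
            return self.writes[key]
        history = self.store.versions.get(key, [])
        visible = [value for tid, value in history if tid <= self.snapshot]
        if not visible:
            raise KeyError(key)
        return visible[-1]  # newest version committed at or before the snapshot

    def write(self, key, value):
        self.writes[key] = value  # buffered; invisible to others until commit

    def commit(self):
        store = self.store
        for key in self.writes:
            history = store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot:
                raise ConflictError(key)
        store.last_tid += 1
        for key, value in self.writes.items():
            store.versions.setdefault(key, []).append((store.last_tid, value))
        return store.last_tid


store = ToyMVCCStore()
setup = store.begin()
setup.write("mister-fluffypants", {"middle_name": ""})
setup.commit()

reader = store.begin()
writer = store.begin()
writer.write("mister-fluffypants", {"middle_name": "Fuzzy"})
writer.commit()

# The reader's snapshot predates the writer's commit, so it keeps seeing the old state.
print(reader.read("mister-fluffypants"))         # {'middle_name': ''}
print(store.begin().read("mister-fluffypants"))  # {'middle_name': 'Fuzzy'}

# Writing a key that someone else changed after your snapshot is a conflict.
reader.write("mister-fluffypants", {"middle_name": "Squeaky"})
try:
    reader.commit()
except ConflictError as exc:
    print("conflict on", exc)  # conflict on mister-fluffypants
```

ZODB raises its own ConflictError in comparable situations, and application code typically aborts and retries the transaction.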

Conclusion

I’ve had a bit of a persistence odyssey these past 15 years: from custom serializers to bare SQL, from writing my own ORMs to wrestling the almost unusably ambitious peak.storage, and finally spending 4 years in the Zope world. After all that, I must admit that Django’s object–relational mapper impresses me greatly. By coupling storage and modeling concerns, it gains simplicity. This is a good tradeoff, since relational storage is mature and its limitations known—if you need a different kind of store, go somewhere else (though there are rumblings of making Django work atop key-value stores). It’s also easy on the eyes: its one-liner field definitions are about the same number of tokens one would need to initialize state in a new plain-Python object, making relational persistence almost a break-even proposition.
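As a rough illustration of that token comparison, here is a hypothetical plain-Python class that holds the same three fields as the Hamster model. Each field costs about one line either way. The model's lines additionally carry length limits and storage.

```python
class PlainHamster:
    """Same three fields as the Hamster model, but held in memory only."""

    def __init__(self, first_name="", middle_name="", last_name=""):
        self.first_name = first_name
        self.middle_name = middle_name
        self.last_name = last_name


h = PlainHamster(first_name="Mister", last_name="Fluffypants")
h.middle_name = "Fuzzy"  # nothing to save(): this object lives only in memory
```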

Zope retains a slight advantage for reckless, pedal-to-the-metal prototyping, and it’s more natural than a relational store for representing hierarchies, polymorphism, and inheritance relationships. It requires discipline, since there is no constraint-enforcing DB to keep you honest, but nothing beats it for transparency and for feeling at home in a Python program. Transparency, of course, is a double-edged sword: once it is no longer obvious whether you’re performing a linear scan, or even touching the DB at all, performance can go out the window for the unwary. But such are the hazards of leaky abstractions, and both Django and Zope carry this warning.

If you want to learn more about Django’s modeling and storage framework, check out its Model and QuerySet documentation. For an introduction to the ZODB, see the ZODB Tutorial. I also expose some hard-to-find facts about ZODB’s on-disk format in chapter 10 of Plone 3 for Education.

Come back next time for a comparison of Django and Zope’s templating or component coupling systems!

1 response

  1. Matthew Schinckel wrote:

    Check out South as a way to get database migrations within Django: it means you can change schema quickly, and migrate to or from these changes. Makes prototyping faster (no need to delete/recreate schemas).