With the launch of our Mozillians.org community phonebook, I wanted to talk about its unusual data access model.
Typical Web Apps
Most web applications use a single shared authentication account to access data.
Users authenticate to the site as themselves, but the web app has business logic to control who sees what from the database. The code uses a shared username, say web-rw, and a shared password to connect to a MySQL database.
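For contrast, here is a minimal sketch of that shared-credential pattern. It uses Python's built-in sqlite3 as a stand-in for MySQL (SQLite has no user accounts, so the single module-level connection plays the role of the shared web-rw login); the table, columns, and the `visible_profiles` helper are hypothetical. The point it shows is that the only thing keeping private rows private is a `WHERE` clause in application code.

```python
import sqlite3

# One connection shared by every request: the stand-in for the web-rw account.
SHARED_DB = sqlite3.connect(":memory:")
SHARED_DB.execute(
    "CREATE TABLE profile (username TEXT, name TEXT, address TEXT, public INTEGER)"
)
SHARED_DB.executemany(
    "INSERT INTO profile VALUES (?, ?, ?, ?)",
    [
        ("alice", "Alice", "1 Main St", 1),
        ("bob", "Bob", "2 Side St", 0),
    ],
)


def visible_profiles(viewer):
    """Return the (username, name) rows the viewer may see.

    The database would return every row and column to this connection;
    the access rule lives only in this query, in application code.
    """
    rows = SHARED_DB.execute(
        "SELECT username, name FROM profile WHERE public = 1 OR username = ?",
        (viewer,),
    )
    return sorted(rows.fetchall())
```

A single typo in that `WHERE` clause would expose every profile, which is exactly the failure mode the rest of this post is trying to avoid.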
With the phonebook, we want to eventually support fine-grained privacy controls, much like G+ Circles or Facebook profile settings. Profile information is sensitive, since we will eventually add t-shirt size, home address, etc. These details should be shared only with trusted groups within the phonebook, not with every phonebook user, nor with the public.
Gerv chose OpenLDAP as the backend data store for its ability to provide fine-grained permissions through its Access Control List (ACL) feature. Instead of using a shared credential, we connect to the LDAP directory as the user on every request.
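A minimal sketch of binding as the user for a single request, written against the python-ldap connection methods `simple_bind_s`, `search_s`, and `unbind_s`. To keep it testable without a running server, the connection factory is passed in; `search_as_user` and its parameters are hypothetical names, not part of LARPER.

```python
SCOPE_SUBTREE = 2  # same value as ldap.SCOPE_SUBTREE in python-ldap


def search_as_user(connect, user_dn, password, base_dn, filterstr, attrs=None):
    """Open a fresh connection, bind as the end user, search, then unbind.

    `connect` is any zero-argument callable returning a python-ldap style
    connection, e.g. lambda: ldap.initialize("ldap://localhost").
    Because slapd evaluates its ACLs against user_dn, the results contain
    only what this particular user is allowed to read.
    """
    conn = connect()
    try:
        conn.simple_bind_s(user_dn, password)
        return conn.search_s(base_dn, SCOPE_SUBTREE, filterstr, attrs)
    finally:
        # Never leave a connection bound as this user lying around.
        conn.unbind_s()
```

Opening and binding a connection per request is the simplest correct shape; it is also the cost that the performance section below worries about.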
So what does this have to do with my favorite Saturday pastime – LARPing?
To make this all work, we had to roll our own data access model: LDAP Authentication for Resources Per Each Request aka LARPER.
A typical usage looks something like this:
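    import jingo
    from larper import UserSession


    def search(request):
        ...  # (elided in the original; defines query)
        # Bind to the directory as the logged-in user, for this request only.
        directory = UserSession.connect(request)
        results = directory.search(query)
        return jingo.render(request, 'search.html', dict(people=results))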
Under the covers, LARPER manages LDAP connections, binding, and marshaling results into person objects.
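As an illustration of the marshaling step, this sketch turns python-ldap style search results, which are lists of `(dn, {attribute: [bytes, ...]})` tuples, into simple person objects. The `Person` fields and the attribute names (`uid`, `cn`, `mail`) are assumptions for the example, not LARPER's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    dn: str
    uid: Optional[str]
    full_name: Optional[str]
    emails: List[str] = field(default_factory=list)


def _first(attrs, name):
    """Decode the first value of a multi-valued LDAP attribute, or None."""
    values = attrs.get(name)
    return values[0].decode("utf-8") if values else None


def to_people(results):
    people = []
    for dn, attrs in results:
        if dn is None:
            # Entries without a DN (such as search references) are skipped.
            continue
        people.append(
            Person(
                dn=dn,
                uid=_first(attrs, "uid"),
                full_name=_first(attrs, "cn"),
                emails=[v.decode("utf-8") for v in attrs.get("mail", [])],
            )
        )
    return people
```

A nice property of pushing ACLs into slapd is visible here: an attribute the current user may not read is simply absent from the result, so the object ends up with `None` or an empty list rather than leaked data.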
Breaking Modern Frameworks
Django and many other modern web app frameworks assume a single shared credential for the database. They break down badly when we want per-user, per-request access: connection pooling, API design, etc. are hosed. At best, these frameworks support a small fixed set of connection credentials, not a per-user database connection model.
So why would we go against the grain and inflict so much pain onto ourselves?
Defense in Depth
If there is a bug in our web application code, we are less likely to leak data, because the data store itself enforces the ACL. That responsibility lies with the data store rather than the middleware.
Our OpenLDAP server, “slapd”, has a single config file for managing the Access Control List. This file is a single, clear place to capture static authorization rules.
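To show why one ordered ACL file is easy to reason about, here is a deliberately simplified Python model of first-match evaluation, loosely following slapd's `access to <what> by <who> <level>` structure: the first rule whose target matches is used, within it the first matching `by` clause decides, and anything unmatched is denied. The rules, attribute names such as `shirtSize`, and the `trusted` group are hypothetical, and real slapd ACLs support far more selectors than this model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

LEVELS = ["none", "auth", "read", "write"]  # ordered weakest -> strongest
Who = Callable[[str, str, Set[str]], bool]   # (requester, target, groups) -> match?


@dataclass
class Rule:
    attrs: Optional[Set[str]]  # None means "any attribute"
    by: List[Tuple[Who, str]]  # ordered (who, level) clauses


def is_self(requester, target, groups):
    return requester == target


def in_group(name):
    return lambda requester, target, groups: name in groups


def anyone(requester, target, groups):
    return True


RULES = [
    Rule({"userPassword"}, [(is_self, "write"), (anyone, "auth")]),
    Rule({"homePostalAddress", "shirtSize"},
         [(is_self, "write"), (in_group("trusted"), "read")]),
    Rule(None, [(is_self, "write"), (anyone, "read")]),
]


def access_level(requester_dn, target_dn, groups, attr):
    for rule in RULES:
        if rule.attrs is None or attr in rule.attrs:
            for who, level in rule.by:
                if who(requester_dn, target_dn, groups):
                    return level
            return "none"  # like slapd's implicit trailing "by * none"
    return "none"  # no rule matched at all: deny


def can(requester_dn, target_dn, groups, attr, needed):
    granted = access_level(requester_dn, target_dn, groups, attr)
    return LEVELS.index(granted) >= LEVELS.index(needed)
```

Because evaluation is first-match, rule order matters: the catch-all rule must come last, or it would grant read access to the home address before the stricter rule is reached.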
A final reason we need a new layer is that Django’s data access is built around an object-relational mapper (ORM). We don’t need one, since LDAP directories are already object-oriented rather than relational.
Okay, LARPing is not all rainbows and unicorns…
There are some cracks in LARPER, which may be ironed out over time.
Shared Authentication Credentials
Okay, I lied. There are several non-user accounts:
* LDAPAdmin – Can delete an account
* regAgent – Can create a new account
* replicationAgent – Read only access to everything in the directory for replication
* monitor – Useful for operations agents to monitor server health
We’ve tried hard to keep “admin” type accounts to a minimum. Actions like vouching, inviting others, etc. are done via LARPER as the current user, but this model breaks down for some tasks. Do you give these capabilities to a set of users? Which ones? How do you get the first user into the system? Who vouches for that first user?
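One way to keep those few service accounts contained is to make the choice of bind identity explicit and narrow: ordinary actions bind as the current user, and only a short, named list of operations may use a service account. This is a hypothetical sketch; the operation names, DNs, and the `bind_identity` function are illustrative, not taken from the Mozillians code.

```python
from typing import NamedTuple, Optional


class BindIdentity(NamedTuple):
    dn: str
    is_service_account: bool


# Hypothetical service-account DNs, one per privileged capability.
SERVICE_ACCOUNTS = {
    "create_account": "cn=regAgent,dc=example,dc=org",
    "delete_account": "cn=LDAPAdmin,dc=example,dc=org",
}


def bind_identity(operation: str, user_dn: Optional[str]) -> BindIdentity:
    """Pick who to bind as for an operation.

    Anything not explicitly listed (vouching, inviting, editing a
    profile, ...) runs as the logged-in user, so slapd's ACLs still apply.
    """
    service_dn = SERVICE_ACCOUNTS.get(operation)
    if service_dn is not None:
        return BindIdentity(service_dn, True)
    if user_dn is None:
        raise PermissionError(f"{operation!r} requires a logged-in user")
    return BindIdentity(user_dn, False)
```

Keeping the list in one dictionary makes it easy to audit exactly which code paths escape the per-user ACL model.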
We have web analytics, but to get deep community metrics, we’ll need access to some aggregate information. For now we’re keeping a copy of some data in MySQL to be aggregated and analyzed.
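A minimal sketch of such a reporting copy, using sqlite3 in place of MySQL so it runs anywhere. The choice of fields (vouched status and join month) is an assumption for illustration, not a description of our actual tables; the idea is to copy only coarse, aggregate-friendly fields and never the private profile attributes.

```python
import sqlite3


def build_metrics_db(people):
    """Copy only non-sensitive, aggregate-friendly fields into a reporting DB.

    `people` is an iterable of dicts with "vouched" (bool) and "joined"
    ("YYYY-MM-DD"); any other keys (addresses, t-shirt sizes, ...) are dropped.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE member (vouched INTEGER, joined_month TEXT)")
    db.executemany(
        "INSERT INTO member VALUES (?, ?)",
        ((int(p["vouched"]), p["joined"][:7]) for p in people),
    )
    return db


def vouched_per_month(db):
    """Return (month, vouched_count, total_count) rows, oldest month first."""
    return db.execute(
        "SELECT joined_month, SUM(vouched), COUNT(*) FROM member "
        "GROUP BY joined_month ORDER BY joined_month"
    ).fetchall()
```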
Code Paths without a Request
There are places in the code where Django’s framework doesn’t make the current request available, so we don’t use the LARPER abstraction. We can patch these cracks in the future.
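One possible patch, not what the current code does, is a small middleware that stores the active request in a `contextvars.ContextVar` so deeper code can still reach LARPER. It follows the shape of Django's newer-style middleware (a class constructed with `get_response` and called with `request`); `CurrentRequestMiddleware` and `current_request` are hypothetical names.

```python
import contextvars

_current_request = contextvars.ContextVar("current_request", default=None)


def current_request():
    """Return the request being handled, or None outside a request."""
    return _current_request.get()


class CurrentRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        token = _current_request.set(request)
        try:
            return self.get_response(request)
        finally:
            # Restore the previous value so no request's state outlives it.
            _current_request.reset(token)
```

The trade-off is that this reintroduces hidden global state, so it is best kept as a narrow escape hatch for the few code paths that genuinely have no request passed in.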
Performance and Scalability
A LARPER-style architecture is inherently harder to optimize. Because we break Django’s assumptions, we also don’t get its optimizations for free.
- One cannot blindly use caching and connection pooling from the web app layer to the data storage layer
- We cannot rely on Django-style connection pooling, nor create a PgBouncer-style shared pool, but must instead create a per-user connection pool
- Caching must include the current user as part of the cache key, so that data meant to be seen only by User A doesn’t accidentally leak into User B’s view
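The last two points can be sketched together: a connection pool keyed by the exact credentials a connection was bound with, so a connection bound as User A is never handed to User B (or to someone presenting A's DN with the wrong password), and a cache key that includes the viewer's DN. The `connect` factory and the class and function names are hypothetical; a real pool would also need size limits and eviction of idle users.

```python
import hashlib
import threading
from collections import defaultdict


def _pool_key(user_dn, password):
    # Key on the DN *and* the credential, so a wrong password never
    # matches a connection that was bound with the right one.
    return (user_dn, hashlib.sha256(password.encode("utf-8")).hexdigest())


class PerUserPool:
    """Reuse LDAP connections, but only for the credentials they were bound with."""

    def __init__(self, connect):
        # connect(user_dn, password) must return an already-bound connection.
        self._connect = connect
        self._idle = defaultdict(list)  # pool key -> idle connections
        self._lock = threading.Lock()

    def acquire(self, user_dn, password):
        key = _pool_key(user_dn, password)
        with self._lock:
            if self._idle[key]:
                return self._idle[key].pop()
        return self._connect(user_dn, password)

    def release(self, user_dn, password, conn):
        with self._lock:
            self._idle[_pool_key(user_dn, password)].append(conn)


def user_cache_key(user_dn, view_name, query):
    """Cache key that varies on the viewer, so cached results never cross users."""
    raw = f"{user_dn}\0{view_name}\0{query}".encode("utf-8")
    return f"larper:{view_name}:{hashlib.sha256(raw).hexdigest()}"
```

The cost is plain from the structure: connections and cache entries are only reused within one user's activity, so hit rates are much lower than with a shared pool or a shared cache.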
Currently, profile images are checked against the OpenLDAP ACL before being sent to the client. This is under discussion as a product decision, but again it has performance ramifications in multiple tiers of the application (caches must vary on the Cookie header, etc.).
We think the extra expense and difficulty are worth it, given the privacy and security requirements of a phonebook application.
Is LARPing Right for Me?
The next time you look at storing user data, it’s worth re-evaluating shared credentials to access the data.
Like all of our code, the larper module is open source. Today it is not abstract enough to reuse directly, but it may serve as an example of how to push ACL enforcement down into your data store, or of how to use LDAP with Django without the django-ldap ORM library.