(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)
Recently, I had the pleasure of working with our wonderful iOS developers here at Mozilla in instrumenting Lockwise, one of our iOS applications, with the Glean SDK. At this point, I’ve already helped integrate it with several other applications, all of which went pretty smoothly, and Lockwise for iOS held true to that. It wasn’t until later, when unexpected things started happening, that I realized something was amiss…
Integrating the Glean SDK with a new product is a fairly straightforward process. On iOS, it amounts to adding the dependency via Carthage and adding a couple of build steps to get it to do its thing. After this is done, we generally smoke test the data using the built-in debugging tools. If everything looks good, we submit a request for data review for collecting the new metrics. Once a data steward has signed off on our request to collect new data, we can then release a new version of the application with its Glean SDK-powered telemetry. Finally, we collect a few weeks of data to validate that everything looks good. We check things such as user counts and the distribution of locales, and we look for anything that might indicate that the data isn’t getting collected like we expected, such as holes in sequence numbers or missing fields. In Lockwise for iOS’s case, all of this went just as expected.
One part of the Glean SDK integration that I haven’t mentioned yet is enabling the application in our data ingestion pipeline via the probe-scraper so that we can accept data from it. On iOS, the Glean SDK makes use of the application bundle identifier to uniquely identify the app to our pipeline, so enabling the app means letting the pipeline know about this id so that it won’t turn away the data. This identifier also determines the table that the data ultimately ends up in, so it’s a key identifier in the process.
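To make the role of that identifier concrete, here is a minimal sketch of the kind of check described above. It is not the actual pipeline code: the `ALLOWED_APP_IDS` set, the `table_for` function, and the rule that turns an id into a table name are all assumptions made up for illustration.

```python
from typing import Optional

# Hypothetical allow-list of application ids the pipeline accepts.
# The real pipeline learns about applications via the probe-scraper;
# this set only stands in for that knowledge.
ALLOWED_APP_IDS = {"org-mozilla-ios-lockbox"}


def table_for(app_id: str) -> Optional[str]:
    """Return the destination table for an allowed app id, or None if rejected.

    The id-to-table naming rule (hyphens become underscores) is an
    assumption for this sketch, not the pipeline's documented behavior.
    """
    if app_id not in ALLOWED_APP_IDS:
        # Unknown ids are turned away: exact matches only.
        return None
    return app_id.replace("-", "_")
```

The point of the sketch is that the match is exact: any id that is not literally on the list gets turned away, and the id that does match decides where the data lands.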
So, here’s where I learned something new about iOS architecture, especially as it relates to embedded application extensions. Application extensions are a cool and handy way of adding additional features and functionality to your application in the Apple ecosystem. In the case of Lockwise, they are using a form of extension that provides credentials to other applications. This allows the credentials stored in Lockwise to be used to authenticate in websites and other apps installed on the device. I knew about extensions but hadn’t really worked with them much until now, so it was pretty interesting to see how it all worked in Lockwise.
Here’s where a brick smacks into the story. Remember that bundle identifier that I said was used to uniquely identify the app? Well, it turns out that application extensions in iOS modify this a bit by adding to it to uniquely identify themselves! We realized this when we started to see our pipeline reject this new identifier, because it wasn’t an exact match for the identifier that we expected and had allowed through. The id we expected was org-mozilla-ios-lockbox, but the extension was reporting org-mozilla-ios-Lockbox-CredentialProvider.
Using a different bundle identifier totally makes sense, since extensions run as a separate process within their own application sandbox container. The OS needs to see them differently because an extension can run even if the base application isn’t running. Unfortunately, the Glean SDK is purposefully built not to care about, or even know about, different processes, so we had a bit of a blind spot in the application extension.
Not only that, but remember I mentioned that the extension’s storage container is a separate sandbox from the base application? Well, since the extension runs in a different process from the base application and has separate storage, the Glean SDK running in the extension acted just as if the extension were a completely separate application. With separate storage, it happily generates a different unique identifier for the client, which does not match the id generated for the base application. So there was no way to attribute the information from the extension to the base application that contained it: the ingestion pipeline saw these as separate applications, with no way to associate the client ids between the two. These were two sandboxes that just couldn’t interact with each other.
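The client id problem is easier to see in a toy model. The sketch below is not how the Glean SDK stores its data; `get_or_create_client_id`, the `client_id` file name, and the use of temporary directories as stand-ins for the two sandboxes are all assumptions for illustration.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Tuple


def get_or_create_client_id(storage_dir: Path) -> str:
    """Return the client id stored in storage_dir, creating one on first use."""
    id_file = storage_dir / "client_id"  # hypothetical file name
    if id_file.exists():
        return id_file.read_text().strip()
    client_id = str(uuid.uuid4())
    id_file.write_text(client_id)
    return client_id


def simulate_two_sandboxes() -> Tuple[str, str, str]:
    """Model the base app and its extension as two unrelated storage directories."""
    with tempfile.TemporaryDirectory() as app_dir, \
            tempfile.TemporaryDirectory() as ext_dir:
        app_first = get_or_create_client_id(Path(app_dir))
        # A second launch of the base app reuses what is already in its storage.
        app_again = get_or_create_client_id(Path(app_dir))
        # The extension has its own empty storage, so it generates a new id.
        ext = get_or_create_client_id(Path(ext_dir))
        return app_first, app_again, ext
```

Within one sandbox the id is stable across calls, but the second sandbox starts from empty storage and ends up with an unrelated id, which is the mismatch we saw between the base application and its extension.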
To be fair, Apple does provide a way to share data between extensions and applications, but it requires creating a completely separate shared sandbox, and this doesn’t solve the problem that the same Glean SDK instance just shouldn’t be used directly by multiple processes at the same time.
Well, that wasn’t ideal, to say the least, so we began an investigation to determine what course of action we should (or could) take. We went back and forth over the details, but ultimately we determined that the Glean SDK shouldn’t know about processes. There wasn’t much we could do aside from blocking it from running in extensions and documenting that it is up to the application using the Glean SDK to ensure that metrics are only collected by the main application process. I was a bit sad that there wasn’t much we could do to make the experience better for Glean SDK consumers, but sometimes you just can’t predict the challenges you will face when implementing a truly cross-platform thing. I still hold out hope that a way will open up to make this easier. The lesson I learned from all of this is that sometimes you can’t win, but it’s important to stick to the design and do the best you can.