(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)
One of the things I’ve spent a fair amount of time helping with on the Glean Team is the migration from legacy telemetry to Glean being performed by different Mozilla products. This includes everything from internal tools and libraries to the mobile browsers and even desktop Firefox. As it turns out, there were quite a few projects and products that needed to be migrated. While we have started migrating all of our active products, each of them is at a different stage and has a different timeline for completion. I thought it might be helpful to take a little narrative look through what the migration process looks like, and so here we go!
Add Glean To Your Project
The first step in migration is adding Glean to the project. The precise steps vary slightly depending on the platform, but they ultimately result in adding Glean as a project dependency and initializing the SDK on startup. Libraries using Glean follow a slightly different path since they aren’t responsible for initializing Glean. Instead, libraries just add the dependency at this point and rely on the base application’s integration with Glean to initialize the SDK. Oh, and don’t forget to get a data review for adding Glean. This is an important step in ensuring that we are following the guidelines and policies.
Enable Ingestion
Now that the app or library can send Glean telemetry (while still sending legacy telemetry), we need to inform the ingestion pipeline about the application so that the data will be handled properly by the telemetry endpoint. The process for this involves filing a bug, and someone on the data-engineering team will add the appropriate application IDs to the Probe Scraper so that ingestion of the data can occur. For libraries, Probe Scraper also needs to know which applications depend on the library so that its metrics will be included in the datasets for those applications.
Verify The Initial Integration
Once the basic Glean integration is complete and live, the first steps of verifying the data begin. For applications migrating to Glean, this involves making sure that the baseline pings are flowing in. There are a few ways to do this, from writing some SQL to using Looker to explore the data. This is the opportunity to ensure that the application is showing up in our data tools, such as our Looker instance and the Glean Dictionary, and possibly to check for the data in GLAM. The important things to check here are that we are getting the data without unexpected ingestion errors, that the client and ping counts appear reasonable, and that the technical information in the baseline ping matches expected values. This is also the time to check that data is being received from the different distribution channels for the application. Typically we have “nightly”, “beta”, and “release” channels that need to be verified, so some of these analysis steps may need to be repeated for each channel. It’s also a good idea to look at some of the metrics that are critical in filtering, such as language/locale, ping start and end times, and duration, to ensure that everything matches our expectations.
Being confident that the integration is correct is the ultimate goal, but don’t be alarmed if you see some differences in things like client and ping counts between legacy and Glean. This is often expected due to differences in how each system is designed and works. Product teams are experts in their product, so if the differences between legacy and Glean data seem wrong, don’t be afraid to find a Glean Team member or your friendly neighborhood data scientist to take a look and advise you.
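To make the per-channel comparison a little more concrete, here is a minimal sketch in Python. It assumes you have already pulled daily client counts for each channel from both systems (for example with SQL or Looker). The names `ChannelCounts`, `relative_difference`, and `channels_to_review`, the 10% tolerance, and the sample numbers are all made up for illustration; none of them come from Glean or from real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelCounts:
    """Client counts for one distribution channel (hypothetical structure)."""
    channel: str
    legacy_clients: int
    glean_clients: int


def relative_difference(legacy: int, glean: int) -> float:
    """Return |glean - legacy| / legacy, treating a zero legacy count specially."""
    if legacy == 0:
        return 0.0 if glean == 0 else float("inf")
    return abs(glean - legacy) / legacy


def channels_to_review(counts, tolerance=0.10):
    """List the channels whose counts differ by more than the tolerance.

    The tolerance is an arbitrary assumption. Some difference between the
    two systems is expected, so a flagged channel is a prompt to look
    closer, not proof that something is broken.
    """
    return [
        c.channel
        for c in counts
        if relative_difference(c.legacy_clients, c.glean_clients) > tolerance
    ]


# Illustrative numbers only, not real telemetry.
SAMPLE = [
    ChannelCounts("nightly", legacy_clients=1000, glean_clients=980),
    ChannelCounts("beta", legacy_clients=5000, glean_clients=5200),
    ChannelCounts("release", legacy_clients=100000, glean_clients=70000),
]

print(channels_to_review(SAMPLE))  # prints ['release']
```

A channel that shows up in this list is a good candidate to bring to a Glean Team member or data scientist, along with what you know about how the product behaves on that channel.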
Enable Deletion Requests For Legacy Telemetry (if needed)
The next step in the process is to add the legacy telemetry identifier to the Glean deletion-request ping. The deletion-request ping is a special Glean ping that gets sent when a user opts out of telemetry collection. It informs the pipeline to delete the data associated with the identifier. Glean can also handle this step for legacy telemetry, but we need to add the legacy ID as a secondary identifier in the deletion-request ping to make it work. Note that this is typically only a requirement for applications that are integrating Glean and not for libraries, unless those libraries contain identifiers that reference data that may need to be deleted as part of a user’s request. It is also only required if the legacy system doesn’t already have a mechanism for sending its own “deletion-request”.
Plan The Migration
At this point we should have basic usage information coming from Glean, so the next step is to migrate the metrics that are specific to the application (or library). The Glean Team provides a spreadsheet to assist in this process. The product team starts by filling in all of the existing legacy metric collections, along with some information about where they are collected in the code, who owns each collection, and what questions it answers. Once the product team fills in this information, the Glean Team will advise on the correct Glean metric types to ensure that the metric collections in Glean record the information needed to answer those questions. This is a good place for product teams to audit their existing telemetry collections to ensure that each one is answering the questions that they have, and that it is really needed. This can reduce the overall work required for the migration by eliminating unnecessary and unused metric collections, and it promotes lean data collection overall.
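As a rough illustration of the audit described above, the spreadsheet can be thought of as a list of records that gets split into collections worth migrating and collections that need a second look. The sketch below is in Python and is not the Glean Team’s actual spreadsheet. The `LegacyCollection` fields, the `split_inventory` helper, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyCollection:
    """One row of a hypothetical migration inventory."""
    name: str
    code_location: str
    owner: Optional[str]
    question: Optional[str]
    # Filled in later, once the Glean Team has advised on a metric type.
    proposed_glean_type: Optional[str] = None


def split_inventory(rows):
    """Separate rows that have an owner and a question from those that do not.

    Rows missing either one are candidates for removal rather than
    migration, which is an assumption of this sketch.
    """
    keep, audit = [], []
    for row in rows:
        if row.owner and row.question:
            keep.append(row)
        else:
            audit.append(row)
    return keep, audit


# Illustrative rows only.
INVENTORY = [
    LegacyCollection("TABS_OPENED", "tabs/manager", "tabs-team",
                     "How many tabs do users open per session?"),
    LegacyCollection("OLD_PANEL_CLICKS", "ui/panel", None, None),
    LegacyCollection("STARTUP_TIME_MS", "app/startup", "perf-team",
                     "How long does startup take?"),
]

keep, audit = split_inventory(INVENTORY)
print([row.name for row in audit])  # prints ['OLD_PANEL_CLICKS']
```

Anything that lands in the `audit` list is worth a conversation before any migration work is spent on it.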
Migrate The Metrics
Now that the list of metrics to migrate has been settled upon, the work of instrumenting the metrics in Glean begins. This involves adding metrics to the metrics.yaml and instrumenting the recording of each metric in the code. There are likely several strategies that could be used here, but I would recommend migrating metrics in logical units, such as a feature at a time, in order to better plan, prioritize, implement, and verify. An entire application likely has a lot of metrics as a whole, but looking at it feature by feature makes the process more manageable. The instrumentation isn’t complete without adding test coverage for the metrics. Glean provides a testing API for writing these tests: for every metric type, it lets you check that a valid value was recorded. The API extends to checking for errors in recording, as well as to testing custom pings. These tests should be a first line of validation and can help catch things that could potentially cause issues with the data. As each feature is migrated from legacy to Glean, the product team should look at the legacy and Glean data side by side to ensure that the Glean data is correct and complete. Part of this process should include verifying that any ETL (extract, transform, load) job that processes the data is also correct. This may require some data-science help if any questions arise, but it’s important to ensure that the collection is correct before calling it complete.
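The shape of such a test can be sketched without the SDK itself. The Python example below records one event to both a legacy store and a stand-in counter, then checks the recorded value and the error count. `StandInCounter` is not a Glean class, and its method names are only loosely modeled on the idea of a testing API, so check the Glean SDK documentation for the real names on your platform. `record_tab_opened` and the `tabs_opened` key are hypothetical as well.

```python
class StandInCounter:
    """A stand-in for a counter metric. This is NOT the Glean API."""

    def __init__(self):
        self._value = None
        self._errors = 0

    def add(self, amount=1):
        # In this sketch, a non-positive amount counts as a recording error.
        if amount <= 0:
            self._errors += 1
            return
        self._value = (self._value or 0) + amount

    def test_get_value(self):
        """Return the recorded value, or None if nothing was recorded."""
        return self._value

    def test_get_num_recorded_errors(self):
        """Return how many invalid recordings were attempted."""
        return self._errors


def record_tab_opened(legacy_counts, glean_counter):
    """Record the same event in both systems while the migration is underway."""
    legacy_counts["tabs_opened"] = legacy_counts.get("tabs_opened", 0) + 1
    glean_counter.add(1)


def check_dual_recording():
    """A first-line check: both systems saw the same events, with no errors."""
    legacy = {}
    counter = StandInCounter()
    assert counter.test_get_value() is None  # nothing recorded yet

    for _ in range(3):
        record_tab_opened(legacy, counter)

    assert counter.test_get_value() == legacy["tabs_opened"] == 3
    assert counter.test_get_num_recorded_errors() == 0


check_dual_recording()
```

Recording in both systems for a while is what makes the later side-by-side comparison of legacy and Glean data possible.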
Validate The Migration And Reporting
Once all the legacy collections have been migrated, there will be two telemetry systems instrumenting the application or library. While it might be tempting to remove the legacy system now, there is one more important consideration that must be taken into account before that work can proceed. All of the data that is collected goes somewhere, ending up in a query or dashboard that (we hope) someone is using to make decisions. Part of the migration work includes migrating those consumers to the new Glean data so that there is continuity of information and the ability to answer those business questions isn’t interrupted. Don’t forget that it may not just be the product team that is consuming this data. It could also be other teams that rely on it for management dashboards, revenue dashboards, and so on. It is important to understand all of the stakeholders in the legacy system’s data so that they can migrate to the Glean data along with the product. Finally, once everything from the instrumentation to the reporting infrastructure is migrated, and with the okay of data science that everything looks good, it should be safe to remove the legacy telemetry instrumentation.
There are a lot of steps and nuances to the migration process that might not be clear at first glance. My intention with this post is to illuminate the overall migration process a bit more, and perhaps to help you find where you are in it and where to go if you are feeling a bit lost. The Glean Team is always around to advise and help, but no one knows each product better than the product teams themselves, so understanding this process will hopefully help those teams have better command and ownership of their telemetry collections and the questions they can answer with them.