
This Week in Glean: Metric lifetimes

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

One of the goals of the Glean SDK is to handle many of the details of recording telemetry correctly on the client. We want a world in which developers using the Glean SDK to record telemetry merely have to pass the appropriate data, and the SDK takes care of storing it, submitting it in the appropriate container (or ping), and automatically resetting the data at the appropriate time.

This last part is related to the concept of the “metric lifetime”, which is the period for which a metric’s value is kept before it is reset. While this concept is critical to creating reliable and trustworthy data, it has been one of the most difficult concepts to communicate to our users.

Glean supports three different metric lifetimes:

  • ping (default): The metric is cleared each time it is submitted in the ping. This is the most common case, and should be used for metrics that are highly dynamic, such as things computed in response to the user’s interaction with the application.
  • application: The metric is related to an application run, and is cleared after the application restarts and any Glean-owned ping, due at startup, is submitted. This should be used for things that are constant during the run of an application, such as the operating system version. In practice, these metrics are generally set during application startup.
  • user: The metric is part of the user’s profile. This should be used for things that change only when the user’s profile is created. It is rare to use this lifetime outside of some metrics that are built in to Glean, such as first_run_date.
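The clearing rules above can be summarized in a small model. This is a minimal sketch for illustration only, not Glean's implementation: the names `Lifetime`, `Event`, `CLEARED_BY`, and `is_cleared` are hypothetical, and it assumes that a profile reset clears every lifetime and that a ping-lifetime value survives an application restart until its ping is submitted.

```python
from enum import Enum


class Lifetime(Enum):
    PING = "ping"
    APPLICATION = "application"
    USER = "user"


class Event(Enum):
    PING_SUBMITTED = "ping submitted"
    APP_RESTARTED = "application restarted"
    PROFILE_RESET = "user profile reset"


# Which events clear a stored value, per lifetime (simplified assumption).
CLEARED_BY = {
    Lifetime.PING: {Event.PING_SUBMITTED, Event.PROFILE_RESET},
    Lifetime.APPLICATION: {Event.APP_RESTARTED, Event.PROFILE_RESET},
    Lifetime.USER: {Event.PROFILE_RESET},
}


def is_cleared(lifetime: Lifetime, event: Event) -> bool:
    """Return True if a metric with this lifetime is reset by the event."""
    return event in CLEARED_BY[lifetime]


if __name__ == "__main__":
    for lifetime in Lifetime:
        events = sorted(e.value for e in CLEARED_BY[lifetime])
        print(f"{lifetime.value}: cleared by {', '.join(events)}")
```

The model deliberately ignores the detail that an application-lifetime metric is cleared only after any Glean-owned ping due at startup has been submitted.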

While lifetimes are important to understand for all of Glean’s metric types, they are particularly important for the metric types that record single values and don’t aggregate on the client. These metrics send only the “last known” value, so missing the earlier values could be a form of unintended data loss.

Let’s work through an example to see how these lifetimes play out in practice. Suppose we have a user preference, “turbo mode”, which defaults to false, but the user can turn it to true at any time. We want to know when this flag is true so we can measure its effect on other metrics in the same ping. In the following diagram, we look at a time period that sends five pings across two separate runs of the application. We assume here that, as with Glean’s built-in metrics ping, the developer writing the metric isn’t in control of when the ping is submitted.

In this diagram, the ping measurement windows are represented as rectangles, and the moment the ping is “submitted” is represented by its right edge. The user changes the “turbo mode” setting from false to true in the first run, and then toggles it again twice in the second run. For each example metric A-F, a row shows the lifetime of the metric and when it’s recorded.

[Figure: metric lifetime timeline]

  • A. Ping lifetime, set on change: The value isn’t included in Ping 1, because Glean doesn’t know about it yet.  It is included in the first ping after being recorded (Ping 2), which causes it to be cleared.
  • B. Ping lifetime, set on init and change: The default value is included in Ping 1, which clears it, and the changed value is included in Ping 2, which again clears it.  It misses Ping 3, but when the application is started, it is recorded again and it is included in Ping 4. However, this causes it to be cleared again and it is not in Ping 5.
  • C. Application lifetime, set on change: The value isn’t included in Ping 1, because Glean doesn’t know about it yet. After the value is changed, it is included in Pings 2 and 3, but then due to application restart it is cleared, so it is not included until the value is manually toggled again.
  • D. Application lifetime, set on init and change: The default value is included in Ping 1, and the changed value is included in Pings 2 and 3. Even though the application startup causes it to be cleared, it is set again, and all subsequent pings also have the value.
  • E. User lifetime, set on change: The default value is missing from Ping 1, but since user lifetime metrics aren’t cleared unless the user profile is reset (e.g. on Android, when the product is uninstalled), the changed value is included in all subsequent pings.
  • F. User lifetime, set on init and change: The default value is included in Ping 1, and since user lifetime metrics aren’t cleared unless the user profile is reset, the value is included in all subsequent pings. This would be true even if the “turbo mode” preference were never changed again.

Note that for all of the metric configurations, the toggle of the preference off and on during Ping 4 is completely missed.
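To make the six cases concrete, here is a small simulation of the scenario. It is a sketch under stated assumptions rather than anything from the Glean SDK: the `Metric` class, the `TIMELINE` list, and `simulate` are hypothetical names, and the event order (Ping 1, the change to true, Pings 2 and 3, a restart, two toggles, then Pings 4 and 5) is reconstructed from the descriptions of A-F above. `None` stands for "the metric is not in the ping".

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    lifetime: str            # "ping", "application", or "user"
    set_on_init: bool        # also record the preference at startup?
    value: object = None     # None means nothing is stored
    sent: list = field(default_factory=list)

    def on_start(self, pref, restart):
        # Application-lifetime values do not survive a restart.
        if restart and self.lifetime == "application":
            self.value = None
        if self.set_on_init:
            self.value = pref

    def on_change(self, pref):
        self.value = pref

    def on_ping(self):
        # A ping carries whatever is stored at submission time.
        self.sent.append(self.value)
        if self.lifetime == "ping":
            self.value = None


# Assumed timeline, reconstructed from the A-F descriptions.
TIMELINE = [
    ("start", None),     # run 1 begins; turbo mode defaults to False
    ("ping", None),      # Ping 1
    ("change", True),    # user turns turbo mode on
    ("ping", None),      # Ping 2
    ("ping", None),      # Ping 3
    ("restart", None),   # run 2 begins
    ("change", False),   # toggled off...
    ("change", True),    # ...and on again within the Ping 4 window
    ("ping", None),      # Ping 4
    ("ping", None),      # Ping 5
]


def simulate(lifetime, set_on_init):
    """Return the value each of the five pings would carry."""
    metric = Metric(lifetime, set_on_init)
    pref = False
    for event, arg in TIMELINE:
        if event == "start":
            metric.on_start(pref, restart=False)
        elif event == "restart":
            metric.on_start(pref, restart=True)
        elif event == "change":
            pref = arg
            metric.on_change(pref)
        elif event == "ping":
            metric.on_ping()
    return metric.sent


if __name__ == "__main__":
    cases = {
        "A": ("ping", False), "B": ("ping", True),
        "C": ("application", False), "D": ("application", True),
        "E": ("user", False), "F": ("user", True),
    }
    for name, (lifetime, set_on_init) in cases.items():
        print(name, simulate(lifetime, set_on_init))
```

In this model, case B reports a value in Pings 1, 2, and 4 only, while case D reports one in every ping. In all six cases Ping 4 carries `True`: the intermediate `False` from the second run never reaches any ping.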

If you need to make the lifetime of the value being recorded exactly match the lifetime of the ping it is in, Glean provides a facility for custom pings, where the developer using Glean can control exactly when pings are sent.

Alright, folks, that’s the end of the blog lifetime.  Time to clear all the metrics and see you next week.