Categories: Data Engineering

This Week in Glean: GeckoView + Glean = Fenix performance metrics

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

This week in Glean we tell a tale of components, design, performance and ponies (I promise!): how to bridge different telemetry worlds, with different semantics and principles? How can we get the data to answer the question “is Fenix loading pages faster or slower compared to Fennec”?

To better understand the problem, let’s take a very high level look at how Fenix is structured: it is based off a mix of Android Components’ components, each covering different features. One of these components is engine-gecko-nightly which, together with engine-gecko-beta and engine-gecko, wraps GeckoView, Mozilla’s Gecko engine packed for Android. Different engine-gecko-* components represent different GeckoView release channels, following Mozilla’s train model. The Gecko engine is responsible for loading web pages and rendering them. Key performance metrics, for example page load time and how much time it takes to compose a page, are collected inside of Gecko, using Telemetry (note the capital T!). However Gecko doesn’t know too much about the product embedding it. How would Fenix know that these metrics were collected? Their layout? Their schedule?

A measurement as it traverses the whole Fenix stack.

In Fennec a legacy telemetry system was in place: Telemetry was tightly coupled with both Gecko and the product. In Fenix, we had the opportunity to introduce Glean, a modern telemetry framework by Mozilla that encourages lean data practices. Glean comes with an SDK that provides a variety of tools for engineers to measure timespans, timing distributions, counts and so on in contrast with Telemetry which offered lower level facilities such as histograms and scalars. Moreover, the Glean SDK defines a few pings out of the box, with the ‘metrics’ ping containing the bulk of metrics being sent only once per day.

With the two systems being so radically different, we got creative and slightly changed the Telemetry system in Gecko when it’s packed by GeckoView: instead of storing data internally as done for Firefox Telemetry, the new version would only bubble up histograms or scalars calls to the recording API through the technological stack, traversing language differences and boundaries (how romantic!), eventually reaching the engine-gecko-* component to be dispatched through the Glean SDK. Welcome streaming telemetry!

In order for this to work, Gecko engineers need to declare their desired Glean SDK metrics in a registry file. This is different than the Histogram.json or Scalars.yaml used by Gecko Telemetry, as it requires a different and extended set of metadata (e.g. data_review!). Moreover, the definitions in this file are required to mention which Telemetry metric connects to them, through the “gecko_datapoint” property.

To glue it all together, the engine-gecko-* component provides an implementation of GeckoViews’s RuntimeTelemetry.Delegate. For each metric type received by the GeckoView layer, this implementation looks up the corresponding Glean SDK metric types in the build-time generated lookup table. Such a table is produced by the Glean SDK by parsing the metrics.yaml file declared in mozilla-central, which is extracted from the GeckoView AAR.

Gecko streaming telemetry, together with the Glean SDK, enable Glean-supported Mozilla Android products to collect GeckoView metrics out of the box, in the ‘metrics’ ping with very few changes:

val builder = GeckoRuntimeSettings.Builder()
val runtimeSettings = builder
                      .telemetryDelegate(GeckoGleanAdapter())
                      .build()
// Create the Gecko runtime.
GeckoRuntime.create(context, runtimeSettings)

And, as a nice bonus point, exfiltrated Gecko metrics get documentation, automatically generated by the Glean SDK, here (Nightly channel).

This was a project spanning multiple months that saw the collaboration of many different engineers: huge thanks to @chuttenc, the GeckoView and the Android Components teams!

Sorry, I lied. No ponies in this article! But stay tuned for the next week: a lot is happening in the Glean world!

(( This is a syndicated copy of the original post. ))