
How does the Glean SDK send gzipped pings

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)

Last week’s blog post: This Week in Glean: mozregression telemetry (part 2) by William Lachance.

All “This Week in Glean” blog posts are listed in the TWiG index (and on the Mozilla Data blog).

In the Glean SDK, when a ping is submitted it gets internally persisted to disk and then queued for upload. The actual upload may happen later on, depending on factors such as the availability of an Internet connection or throttling. To save users’ bandwidth and reduce the costs to move bytes within our pipeline, we recently introduced gzip compression for outgoing pings.

This article goes through some details of our upload system and what it took to enable ping compression.

How does ping uploading work?

Within the Glean SDK, the glean-core Rust component does not provide any specific implementation to perform the upload of pings. This means that either the language bindings (e.g. Glean APIs for Android in Kotlin) or the product itself (e.g. Fenix) have to provide a way to transport data from the client to the telemetry endpoint.

Before our recent changes (by Beatriz Rizental and Jan-Erik) to the ping upload system, the language bindings needed to understand the format with which pings were persisted to disk in order to read and finally upload them. This is not the case anymore: glean-core will provide language bindings with the headers and the data (ping payload!) of the request they need to upload.
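Conceptually, a language binding now only needs to ask glean-core for the next upload task and perform the HTTP request it is handed. The following sketch illustrates that contract; the type and function names here (`UploadRequest`, `UploadTask`, `get_upload_task`) are illustrative stand-ins, not the exact glean-core API:

```rust
use std::collections::VecDeque;

// Illustrative shape of an upload request handed to a language binding
// (not the exact glean-core API).
struct UploadRequest {
    path: String,
    body: Vec<u8>,
    headers: Vec<(String, String)>,
}

enum UploadTask {
    Upload(UploadRequest),
    Done,
}

// Toy stand-in for glean-core's internal ping queue.
struct Core {
    queue: VecDeque<UploadRequest>,
}

impl Core {
    fn get_upload_task(&mut self) -> UploadTask {
        match self.queue.pop_front() {
            Some(req) => UploadTask::Upload(req),
            None => UploadTask::Done,
        }
    }
}

fn main() {
    let mut core = Core {
        queue: VecDeque::from(vec![UploadRequest {
            path: "/submit/org/app/baseline/1/some-doc-id".into(),
            body: b"{}".to_vec(),
            headers: vec![("Content-Type".into(), "application/json; charset=utf-8".into())],
        }]),
    };

    // The binding just loops, asking the core for work; it never needs
    // to know how pings are persisted on disk.
    let mut uploaded = 0;
    while let UploadTask::Upload(req) = core.get_upload_task() {
        // A real binding would perform an HTTP POST here using req.headers.
        let _ = &req.headers;
        println!("POST {} ({} bytes)", req.path, req.body.len());
        uploaded += 1;
    }
    assert_eq!(uploaded, 1);
}
```

The point of the design is visible in the loop: the binding receives ready-made headers and body bytes, so all ping-format knowledge stays in the Rust core.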

The new upload API empowers the SDK to provide a single place in which to compress the payload to be uploaded: glean-core, right before serving upload requests to the language bindings.

gzipping: the implementation details

The implementation of the function to compress the payload is trivial, thanks to the `flate2` Rust crate:

/// Attempt to gzip the provided ping content.
fn gzip_content(path: &str, content: &[u8]) -> Option<Vec<u8>> {
    let mut gzipper = GzEncoder::new(Vec::new(), Compression::default());

    // Attempt to add the content to the gzipper.
    if let Err(e) = gzipper.write_all(content) {
        log::error!("Failed to write to the gzipper: {} - {:?}", path, e);
        return None;
    }

    // Attempt to finish the gzipping process.
    gzipper.finish().ok()
}
And an even simpler way to use it to compress the body of outgoing requests:

pub fn new(document_id: &str, path: &str, body: JsonValue) -> Self {
    let original_as_string = body.to_string();
    let gzipped_content = Self::gzip_content(path, original_as_string.as_bytes());
    let add_gzip_header = gzipped_content.is_some();
    let body = gzipped_content.unwrap_or_else(|| original_as_string.into_bytes());
    let body_len = body.len();

    Self {
        document_id: document_id.into(),
        path: path.into(),
        headers: Self::create_request_headers(add_gzip_header, body_len),
        body,
    }
}

What’s next?

The new upload mechanism and its compression improvement are currently only available in the Android and iOS Glean SDK language bindings. Our next step (currently in progress!) is to bring the newer APIs to the Python bindings as well, moving the complexity of handling the upload process into the shared Rust core.

In the future, the new upload mechanism will additionally provide a flexible constraint-based scheduler (e.g. “send at most 10 pings per hour”), along with pre-defined rules for products to use.
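To give an idea of what a constraint like “at most 10 pings per hour” could mean in code, here is a toy fixed-window rate limiter. This is purely hypothetical; the actual scheduler design is still in progress and may look nothing like this:

```rust
use std::time::{Duration, Instant};

// Toy fixed-window rate limiter illustrating a "10 pings per hour"
// style constraint. Purely hypothetical, not the Glean scheduler.
struct RateLimiter {
    window: Duration,
    max_count: u32,
    window_start: Instant,
    count: u32,
}

impl RateLimiter {
    fn new(window: Duration, max_count: u32) -> Self {
        Self { window, max_count, window_start: Instant::now(), count: 0 }
    }

    /// Returns true if one more upload is allowed right now.
    fn allow(&mut self) -> bool {
        let now = Instant::now();
        if now.duration_since(self.window_start) >= self.window {
            // A new window has started: reset the counter.
            self.window_start = now;
            self.count = 0;
        }
        if self.count < self.max_count {
            self.count += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_secs(60 * 60), 10);
    // Only the first 10 of 15 attempts pass within the same window.
    let allowed = (0..15).filter(|_| limiter.allow()).count();
    assert_eq!(allowed, 10);
    println!("allowed {} of 15", allowed);
}
```

A real scheduler would combine several such constraints with product-specific rules, but the core idea is the same: the upload loop asks a policy object whether the next request may go out now or must wait.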