{"id":459,"date":"2025-12-09T15:40:28","date_gmt":"2025-12-09T15:40:28","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=459"},"modified":"2025-12-09T15:40:28","modified_gmt":"2025-12-09T15:40:28","slug":"incident-report-a-compiler-bug-and-json","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2025\/12\/09\/incident-report-a-compiler-bug-and-json\/","title":{"rendered":"Incident Report: A compiler bug and JSON"},"content":{"rendered":"<p>It all started rather inconspicuously: The Data Engineering team filed <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1999791\">a bug report<\/a> about a sudden increase in schema errors at ingestion of telemetry data from Firefox for Android. At that point in time about 0.9% of all incoming pings were not passing our schema validation checks.<\/p>\n<p>The data we were seeing was surprising. Our ingestion endpoint received valid JSON that contained snippets like this:<\/p>\n<div id=\"cb1\" class=\"sourceCode\">\n<pre class=\"sourceCode json\"><code class=\"sourceCode json\"><span id=\"cb1-1\"><span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb1-2\">    <span class=\"dt\">\"metrics\"<\/span><span class=\"fu\">:<\/span> <span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb1-3\">        <span class=\"dt\">\"schema: counter\"<\/span><span class=\"fu\">:<\/span> <span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb1-4\">            <span class=\"dt\">\"glean.validation.pings_submitted\"<\/span><span class=\"fu\">:<\/span> <span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb1-5\">                <span class=\"dt\">\"events\"<\/span><span class=\"fu\">:<\/span> <span class=\"dv\">1<\/span><\/span>\r\n<span id=\"cb1-6\">            <span class=\"fu\">}<\/span><\/span>\r\n<span id=\"cb1-7\">        <span class=\"fu\">},<\/span><\/span>\r\n<span id=\"cb1-8\">        <span class=\"er\">...<\/span><\/span>\r\n<span id=\"cb1-9\">    <span class=\"fu\">},<\/span><\/span>\r\n<span id=\"cb1-10\">    
<span class=\"er\">...<\/span><\/span>\r\n<span id=\"cb1-11\"><span class=\"fu\">}<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>What we would expect and would pass our schema validation is this:<\/p>\n<div id=\"cb2\" class=\"sourceCode\">\n<pre class=\"sourceCode json\"><code class=\"sourceCode json\"><span id=\"cb2-1\"><span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb2-2\">    <span class=\"dt\">\"metrics\"<\/span><span class=\"fu\">:<\/span> <span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb2-3\">        <span class=\"dt\">\"labeled_counter\"<\/span><span class=\"fu\">:<\/span> <span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb2-4\">            <span class=\"dt\">\"glean.validation.pings_submitted\"<\/span><span class=\"fu\">:<\/span> <span class=\"fu\">{<\/span><\/span>\r\n<span id=\"cb2-5\">                <span class=\"dt\">\"events\"<\/span><span class=\"fu\">:<\/span> <span class=\"dv\">1<\/span><\/span>\r\n<span id=\"cb2-6\">            <span class=\"fu\">}<\/span><\/span>\r\n<span id=\"cb2-7\">        <span class=\"fu\">},<\/span><\/span>\r\n<span id=\"cb2-8\">        <span class=\"er\">...<\/span><\/span>\r\n<span id=\"cb2-9\">    <span class=\"fu\">},<\/span><\/span>\r\n<span id=\"cb2-10\">    <span class=\"er\">...<\/span><\/span>\r\n<span id=\"cb2-11\"><span class=\"fu\">}<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>The difference? 8 characters:<\/p>\n<div id=\"cb3\" class=\"sourceCode\">\n<pre class=\"sourceCode patch\"><code class=\"sourceCode diff\"><span id=\"cb3-1\"><span class=\"st\">-        \"schema: counter\": {<\/span><\/span>\r\n<span id=\"cb3-2\"><span class=\"va\">+        \"labeled_counter\": {<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>8 different characters that still make up valid JSON, but break validation.<\/p>\n<p>A week later the number of errors kept increasing, affecting up to 2% of all ingested pings from Firefox for Android Beta. That&#8217;s worryingly high. 
That&#8217;s enough to drop other work and call an incident.<\/p>\n<h2 id=\"aside-telemetry-ingestion\">Aside: Telemetry ingestion<\/h2>\n<p>In Firefox the data is collected using the <a href=\"https:\/\/github.com\/mozilla\/glean\">Glean SDK<\/a>. Data is stored in a local database and eventually assembled into what we call a <a href=\"https:\/\/mozilla.github.io\/glean\/book\/appendix\/glossary.html#ping\">ping<\/a>: A bundle of related metrics, gathered in a JSON payload to be transmitted. This JSON document is then <code>POST<\/code>ed to the <a href=\"https:\/\/docs.telemetry.mozilla.org\/concepts\/pipeline\/http_edge_spec\">Telemetry edge server<\/a>. From there the decoder eventually picks it up and processes it further. One of the early things it does is verify the received data against one of the <a href=\"https:\/\/github.com\/mozilla-services\/mozilla-pipeline-schemas\">pre-defined schemas<\/a>. When data is coming from the Glean SDK it must pass <a href=\"https:\/\/github.com\/mozilla-services\/mozilla-pipeline-schemas\/blob\/main\/schemas\/glean\/glean\/glean.1.schema.json\">the pre-defined <code>glean.1.schema.json<\/code><\/a>. This essentially describes which fields to expect in the nested JSON object. One thing it is expecting is <a href=\"https:\/\/github.com\/mozilla-services\/mozilla-pipeline-schemas\/blob\/1f4e1dada6a32f7ff1718034c74427b1a351a1df\/schemas\/glean\/glean\/glean.1.schema.json#L299-L310\">a <code>labeled_counter<\/code><\/a>. A thing it is NOT expecting is <code>schema: counter<\/code>. In fact, keys other than the listed ones <a href=\"https:\/\/github.com\/mozilla-services\/mozilla-pipeline-schemas\/blob\/1f4e1dada6a32f7ff1718034c74427b1a351a1df\/schemas\/glean\/glean\/glean.1.schema.json#L173\">are forbidden<\/a>.<\/p>\n<h2 id=\"the-missing-schema_\">The missing schema:_<\/h2>\n<p>The data we were receiving from a growing number of clients contained 8 bytes that we didn&#8217;t expect in that place: <code>schema: <\/code>. 
That 8-character string didn&#8217;t even show up in the <a href=\"https:\/\/searchfox.org\/glean\/search?q=schema%3A+&amp;path=*.rs&amp;case=true&amp;regexp=false\">Glean SDK source code<\/a>. Where does it come from? Why was it showing up now?<\/p>\n<p>We did receive entirely valid JSON, so it&#8217;s unlikely to be simple memory corruption<a id=\"fnref1\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn1\"><sup>1<\/sup><\/a>. More like memory confusion, if that&#8217;s a thing.<\/p>\n<p>We know where the payload is constructed. The nested object for labeled metrics is constructed <a href=\"https:\/\/github.com\/mozilla\/glean\/blob\/88e30a21f6bf621757c4139c271afda8c8a6123e\/glean-core\/src\/storage\/mod.rs#L35\">in its own function<\/a>. It starts with string formatting:<\/p>\n<div id=\"cb4\" class=\"sourceCode\">\n<pre class=\"sourceCode rust\"><code class=\"sourceCode rust\"><span id=\"cb4-1\"><span class=\"kw\">let<\/span> ping_section <span class=\"op\">=<\/span> <span class=\"pp\">format!<\/span>(<span class=\"st\">\"labeled_{}\"<\/span><span class=\"op\">,<\/span> metric<span class=\"op\">.<\/span>ping_section())<span class=\"op\">;<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>There&#8217;s our 8-character string <code>labeled_<\/code> that gets swapped. The Glean SDK is embedded into Firefox inside mozilla-central and compiled with all the other code together. A single candidate for the <code>schema: <\/code> string <a href=\"https:\/\/searchfox.org\/firefox-main\/search?q=%22schema%3A&amp;path=&amp;case=false&amp;regexp=false\">exists in that codebase<\/a>. That&#8217;s another clue it could be memory confusion.<\/p>\n<h2 id=\"my-schema-confused\">My schema? 
Confused.<\/h2>\n<p>I don&#8217;t know much about how string formatting in Rust works under the hood, but luckily <a href=\"https:\/\/marabos.nl\/\">Mara<\/a> blogged about it 2 years ago: <a href=\"https:\/\/blog.m-ou.se\/format-args\/\">Behind the Scenes of Rust String Formatting: format_args!()<\/a> (and then <a href=\"https:\/\/hachyderm.io\/@Mara\/115542621720999480\">recently improved the implementation<\/a><a id=\"fnref2\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn2\"><sup>2<\/sup><\/a>).<\/p>\n<p>So the <code>format!<\/code> from above expands into something like this:<\/p>\n<div id=\"cb5\" class=\"sourceCode\">\n<pre class=\"sourceCode rust\"><code class=\"sourceCode rust\"><span id=\"cb5-1\"><span class=\"pp\">std::fmt::<\/span>format(<\/span>\r\n<span id=\"cb5-2\">    <span class=\"co\">\/\/ Simplified expansion of format_args!():<\/span><\/span>\r\n<span id=\"cb5-3\">    <span class=\"pp\">std::fmt::<\/span>Arguments <span class=\"op\">{<\/span><\/span>\r\n<span id=\"cb5-4\">        template<span class=\"op\">:<\/span> <span class=\"op\">&amp;<\/span>[<span class=\"bu\">Str<\/span>(<span class=\"st\">\"labeled_\"<\/span>)<span class=\"op\">,<\/span> Arg(<span class=\"dv\">0<\/span>)]<span class=\"op\">,<\/span><\/span>\r\n<span id=\"cb5-5\">        arguments<span class=\"op\">:<\/span> <span class=\"op\">&amp;<\/span>[<span class=\"op\">&amp;<\/span>metric<span class=\"op\">.<\/span>ping_section() <span class=\"kw\">as<\/span> <span class=\"op\">&amp;<\/span><span class=\"kw\">dyn<\/span> <span class=\"bu\">Display<\/span>]<span class=\"op\">,<\/span><\/span>\r\n<span id=\"cb5-6\">    <span class=\"op\">}<\/span><\/span>\r\n<span id=\"cb5-7\">)<span class=\"op\">;<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>Another clue: the <code>labeled_<\/code> string is referenced all by itself, and swapping out the pointer to it would be enough to produce the corrupted data we were seeing.<\/p>\n<h2 id=\"architecturing-more-clues\">Architecturing 
more clues<\/h2>\n<p>Whenever we&#8217;re faced with data anomalies we <a href=\"https:\/\/mozilla.github.io\/glean\/book\/user\/howto\/investigating-data-issues\/investigating-data-issues.html\">start by dissecting the data<\/a> to figure out if the anomalies are from a particular subset of clients. The hope is that identifying the subset of clients where it happens gives us more clues about the bug itself.<\/p>\n<p>After initially focusing too much on actual <em>devices<\/em> colleagues helpfully pointed out that the actual split was the device&#8217;s architecture<a id=\"fnref3\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn3\"><sup>3<\/sup><\/a>:<\/p>\n<div id=\"attachment_460\" style=\"width: 2530px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-460\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-460 size-full\" src=\"http:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture.png\" alt=\"Data since 2025-11-11 showing a sharp increase in errors for armeabi-v7a clients\" width=\"2520\" height=\"970\" srcset=\"https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture.png 2520w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture-300x115.png 300w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture-600x231.png 600w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture-768x296.png 768w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture-1536x591.png 1536w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture-2048x788.png 2048w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-architecture-1000x385.png 1000w\" sizes=\"(max-width: 2520px) 100vw, 2520px\" \/><p id=\"caption-attachment-460\" 
class=\"wp-caption-text\">Data since 2025-11-11 showing a sharp increase in errors for armeabi-v7a clients<\/p><\/div>\n<p>ARMv8, the 64-bit architecture, did not run into this issue<a id=\"fnref4\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn4\"><sup>4<\/sup><\/a>. ARMv7, purely 32-bit, was the sole driver of this data anomaly. Another clue that something in the code specifically for this architecture was causing this.<\/p>\n<h2 id=\"logically-unchanged\">Logically unchanged<\/h2>\n<p>With a hypothesis about what was happening, but no definite answer as to why, we turned to speculative engineering: Let&#8217;s avoid the code path that we think is problematic.<\/p>\n<p>By explicitly listing out the different strings we want to have in the JSON payload we avoid the formatting and thus hopefully any memory confusion.<\/p>\n<div id=\"cb6\" class=\"sourceCode\">\n<pre class=\"sourceCode rust\"><code class=\"sourceCode rust\"><span id=\"cb6-1\"><span class=\"kw\">let<\/span> ping_section <span class=\"op\">=<\/span> <span class=\"cf\">match<\/span> metric<span class=\"op\">.<\/span>ping_section() <span class=\"op\">{<\/span><\/span>\r\n<span id=\"cb6-2\">    <span class=\"st\">\"boolean\"<\/span> <span class=\"op\">=&gt;<\/span> <span class=\"st\">\"labeled_boolean\"<\/span><span class=\"op\">.<\/span>to_string()<span class=\"op\">,<\/span><\/span>\r\n<span id=\"cb6-3\">    <span class=\"st\">\"counter\"<\/span> <span class=\"op\">=&gt;<\/span> <span class=\"st\">\"labeled_counter\"<\/span><span class=\"op\">.<\/span>to_string()<span class=\"op\">,<\/span><\/span>\r\n<span id=\"cb6-4\">    <span class=\"co\">\/\/ &lt;snip&gt;<\/span><\/span>\r\n<span id=\"cb6-5\">    _ <span class=\"op\">=&gt;<\/span> <span class=\"pp\">format!<\/span>(<span class=\"st\">\"labeled_{}\"<\/span><span class=\"op\">,<\/span> metric<span class=\"op\">.<\/span>ping_section())<span class=\"op\">,<\/span><\/span>\r\n<span id=\"cb6-6\"><span 
class=\"op\">};<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>This was implemented in <a href=\"https:\/\/github.com\/mozilla\/glean\/commit\/912fc8063575df48c5b3d838944036a1a37d6fc3\">912fc80<\/a> and shipped in <a href=\"https:\/\/github.com\/mozilla\/glean\/releases\/tag\/v66.1.2\">Glean v66.1.2<\/a>. It landed in Firefox the same day as the SDK release and made it to Firefox for Android Beta the Friday after. The data shows: It&#8217;s working, no more memory confusion!<\/p>\n<div id=\"attachment_461\" style=\"width: 1874px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-461\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-461\" src=\"http:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards.png\" alt=\"The number of errors has been on a downturn ever since the fix landed on 2025-11-26\" width=\"1864\" height=\"614\" srcset=\"https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards.png 1864w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards-300x99.png 300w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards-600x198.png 600w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards-768x253.png 768w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards-1536x506.png 1536w, https:\/\/blog.mozilla.org\/data\/files\/2025\/12\/2025-12-01-schema-counter-error-downwards-1000x329.png 1000w\" sizes=\"(max-width: 1864px) 100vw, 1864px\" \/><p id=\"caption-attachment-461\" class=\"wp-caption-text\">The number of errors has been on a downturn ever since the fix landed on 2025-11-26<\/p><\/div>\n<h2 id=\"a-bug-gone-but-still-there\">A bug gone but still there<\/h2>\n<p>The immediate incident-causing data anomaly was mitigated; the bug is not making it to the <a 
href=\"https:\/\/whattrainisitnow.com\/release\/?version=146\">Firefox 146 release<\/a>.<\/p>\n<p>But we still didn&#8217;t know why this was happening in the first place. My colleagues Yannis and Serge kept working and searching and were finally able to track down what exactly is happening in the code. <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=2003320\">The bug<\/a> contains more information on the investigation.<\/p>\n<p>While I was trying to read and understand the disassembly of the broken builds, they went ahead and wrote a tiny emulator (based on the <a href=\"https:\/\/www.unicorn-engine.org\/\">Unicorn engine<\/a>) that runs just enough of the code to find the offending code path<a id=\"fnref5\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn5\"><sup>5<\/sup><\/a>.<\/p>\n<pre><code>&gt; python .\/emulator.py libxul.so\r\nPath: libxul.so\r\nGNU build id: 1b9e9c8f439b649244c7b3acf649d1f33200f441\r\nSymbol server ID: 8F9C9E1B9B43926444C7B3ACF649D1F30\r\nPlease wait, downloading symbols from: https:\/\/symbols.mozilla.org\/try\/libxul.so\/8F9C9E1B9B43926444C7B3ACF649D1F30\/libxul.so.sym\r\nPlease wait, uncompressing symbols...\r\nPlease wait, processing symbols...\r\nProceeding to emulation.\r\nResult of emulation: bytearray(b'schema: ')\r\nThis is a BAD build.<\/code><\/pre>\n<p>The relevant section of the code boils down to this:<\/p>\n<div id=\"cb8\" class=\"sourceCode\">\n<pre class=\"sourceCode asm\"><code class=\"sourceCode fasm\"><span id=\"cb8-1\">ldr   r3<span class=\"op\">,<\/span> <span class=\"op\">[<\/span>pc<span class=\"op\">,<\/span> <span class=\"op\">#<\/span><span class=\"bn\">0x20c<\/span><span class=\"op\">]<\/span><\/span>\r\n<span id=\"cb8-2\"><span class=\"bu\">add<\/span>   r3<span class=\"op\">,<\/span> pc<\/span>\r\n<span id=\"cb8-3\">strd  r3<span class=\"op\">,<\/span> r0<span class=\"op\">,<\/span> <span class=\"op\">[<\/span><span class=\"kw\">sp<\/span><span class=\"op\">,<\/span> <span 
class=\"op\">#<\/span><span class=\"bn\">0xd0<\/span><span class=\"op\">]<\/span><\/span>\r\n<span id=\"cb8-4\"><span class=\"bu\">add<\/span>   r1<span class=\"op\">,<\/span> <span class=\"kw\">sp<\/span><span class=\"op\">,<\/span> <span class=\"op\">#<\/span><span class=\"bn\">0xd0<\/span><\/span>\r\n<span id=\"cb8-5\">bl    alloc<span class=\"op\">::<\/span>fmt<span class=\"op\">::<\/span>format_inner<\/span><\/code><\/pre>\n<\/div>\n<blockquote><p>The first two instructions build the pointer to the slice in r3, by using a pc-relative offset found in a nearby constant. Then we store that pointer at <code>sp+0xd0<\/code>, and we put the address <code>sp+0xd0<\/code> into <code>r1<\/code>. So before we reach <code>alloc::fmt::format_inner<\/code>, <code>r1<\/code> points to a stack location that contains a pointer to the slice of interest. The slice lives in <code>.data.rel.ro<\/code> and contains a pointer to the string, and the length of the string (8). The string itself lives in <code>.rodata<\/code>.<\/p><\/blockquote>\n<p>In good builds the <code>.data.rel.ro<\/code> memory that <code>r3<\/code> points to looks like this:<\/p>\n<pre><code>0x06f0c3d4: 0x005dac18  --&gt;  \"labeled_\"\r\n0x06f0c3d8:        0x8\r\n0x06f0c3dc: 0x0185d707  --&gt;  \"\/builds\/&lt;snip&gt;\/rust\/glean-core\/src\/storage\/mod.rs\"\r\n0x06f0c3e0:       0x4d<\/code><\/pre>\n<p>In bad builds however it points to something that has our dreaded <code>schema: <\/code> string:<\/p>\n<pre><code>0x06d651c8: 0x010aa2e8  --&gt;  \"schema: \"\r\n0x06d651cc:        0x8\r\n0x06d651d0: 0x01a869a7  --&gt;  \"maintenance: \"\r\n0x06d651d4:        0xd\r\n0x06d651d8: 0x01a869b4  --&gt;  \"storage dir: \"\r\n0x06d651dc:        0xd\r\n0x06d651e0: 0x01a869c8  --&gt;  \"from variant of type \"\r\n0x06d651e4:       0x15\r\n0x06d651e8: 0x017f793c  --&gt;  \": \"\r\n0x06d651ec:        0x2<\/code><\/pre>\n<p>This confirms the suspicion that it&#8217;s a compiler\/linker bug. 
Now the question was how to fix that.<\/p>\n<p>Firefox builds with a variety of Clang\/LLVM versions. Mozilla uses its own build of LLVM and Clang to build the final applications; the exact version used is <a href=\"https:\/\/firefox-source-docs.mozilla.org\/build\/buildsystem\/toolchains-update-policy.html#clang\">updated as soon as possible, but never on release<\/a>. Sometimes additional patches are applied on top of the Clang release, like some backports fixing other compiler bugs.<\/p>\n<p>After identifying that this is indeed a bug in the linker and that it has already been patched in later LLVM versions, Serge did all the work to bisect the LLVM release to find which patches to apply to Mozilla&#8217;s own Clang build. Ultimately he tracked it down to these two patches for LLVM:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/llvm\/llvm-project\/pull\/151346\">[InstCombine] Don&#8217;t handle non-canonical index type in icmp of load fold<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/llvm\/llvm-project\/pull\/150639\">[InstCombine] Make foldCmpLoadFromIndexedGlobal more resiliant to non-array geps.<\/a><\/li>\n<\/ul>\n<p>With those patches applied, the old code, without our small code rearrangement, does not lead to broken builds anymore.<\/p>\n<p>With the Glean code patched, the ingestion errors dropping and the certainty that we have identified and patched the compiler bug, we can safely ship the next release of Firefox (for Android).<\/p>\n<h2 id=\"collaboration\">Collaboration<\/h2>\n<p>Incidents are stressful situations, but a great place for collaboration across the whole company. 
The list of people involved in resolving this is long.<\/p>\n<p>Thanks to Eduardo &amp; Ben from Data Engineering for raising the issue.<br \/>\nThanks to Alessio (my manager) for managing the incident.<br \/>\nThanks to chutten and Travis (from my team) for brainstorming what caused this and suggesting solutions\/workarounds.<br \/>\nThanks to Donal (Release Management) for fast-tracking the mitigation into a Beta release.<br \/>\nThanks to Alex (Release Engineering) for some initial investigation into the linker bug.<br \/>\nThanks to Brad (Data Science) for handling the data analysis side.<br \/>\nThanks to Yannis and Serge (OS integration) for identifying, finding and patching the linker bug.<\/p>\n<hr \/>\n<p><em>Footnotes:<\/em><\/p>\n<section id=\"footnotes\" class=\"footnotes footnotes-end-of-document\" role=\"doc-endnotes\">\n<ol>\n<li id=\"fn1\">Memory corruption is never &#8220;simple&#8221;. But if it were memory corruption I would expect data to be broken worse or in other places too. Not just a string swap in a single place.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref1\">\u21a9\ufe0e<\/a><\/li>\n<li id=\"fn2\">That improvement is not yet available to us. The application experiencing the issue was compiled using Rust 1.86.0.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref2\">\u21a9\ufe0e<\/a><\/li>\n<li id=\"fn3\">Our checklist initially omitted architecture. 
<a href=\"https:\/\/github.com\/mozilla\/glean\/commit\/8693a13ff9057454984cc4cbff08a1ff712d87ff\">A mistake we since fixed<\/a>.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref3\">\u21a9\ufe0e<\/a><\/li>\n<li id=\"fn4\">Apparently we do see <em>some<\/em> errors, but they are so infrequent that we can ignore them for now.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref4\">\u21a9\ufe0e<\/a><\/li>\n<li id=\"fn5\">Later Yannis wrote a script that can identify broken builds much quicker, purely by searching for the right string patterns.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref5\">\u21a9\ufe0e<\/a><\/li>\n<\/ol>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>It all started rather inconspicuously: The Data Engineering team filed a bug report about a sudden increase in schema errors at ingestion of telemetry data from Firefox for Android. At &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/data\/2025\/12\/09\/incident-report-a-compiler-bug-and-json\/\">Read 
more<\/a><\/p>\n","protected":false},"author":1756,"featured_media":422,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[315988],"tags":[],"coauthors":[448350],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/459"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/users\/1756"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/comments?post=459"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/459\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media\/422"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media?parent=459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/categories?post=459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/tags?post=459"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/coauthors?post=459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}