{"id":239,"date":"2020-09-25T15:43:00","date_gmt":"2020-09-25T15:43:00","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=239"},"modified":"2020-09-25T15:43:00","modified_gmt":"2020-09-25T15:43:00","slug":"this-week-in-glean-glean-core-to-wasm-experiment","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2020\/09\/25\/this-week-in-glean-glean-core-to-wasm-experiment\/","title":{"rendered":"This Week in Glean: glean-core to Wasm experiment"},"content":{"rendered":"<p>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)<\/p>\n<p>All \u201cThis Week in Glean\u201d blog posts are listed in the<a href=\"https:\/\/mozilla.github.io\/glean\/book\/appendix\/twig.html\"> TWiG index<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>In the past week <a href=\"https:\/\/github.com\/Dexterp37\">Alessio<\/a>, <a href=\"https:\/\/github.com\/mdboom\">Mike<\/a>, <a href=\"https:\/\/github.com\/hamilton\">Hamilton<\/a> and I got together for the Glean.js workweek. Our purpose was to build a proof-of-concept of a Glean SDK that works on Javascript environments. You can expect a TWiG in the next few weeks about the outcome of that. Today I am going to talk about something that I tried out in preparation for that week: attempting to compile glean-core to Wasm.<\/p>\n<p>&nbsp;<\/p>\n<h2><b>A quick primer<\/b><\/h2>\n<p>&nbsp;<\/p>\n<h3><b>glean-core<\/b><\/h3>\n<p>&nbsp;<\/p>\n<p>The <a href=\"https:\/\/github.com\/mozilla\/glean\">glean-core<\/a> is the heart of the Glean SDK where most of the logic and functionality of Glean lives. It is written in Rust and communicates with the language bindings in C#, Java, Swift or Python through an FFI layer. 
For a comprehensive overview of the Glean SDK\u2019s architecture, please refer to Jan-Erik\u2019s great <a href=\"https:\/\/blog.mozilla.org\/data\/2020\/09\/01\/twig-leveraging-rust\/\">blog post<\/a> and <a href=\"https:\/\/www.youtube.com\/watch?v=j5rczOF7pzg\">talk<\/a> on the subject.<\/p>\n<h3><b>Wasm<\/b><\/h3>\n<p>&nbsp;<\/p>\n<p>From <a href=\"https:\/\/webassembly.org\/\">the WebAssembly website<\/a>:<\/p>\n<blockquote><p>\u201cWebAssembly (abbreviated <i>Wasm<\/i>) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.\u201d<\/p><\/blockquote>\n<p>Or, from Lin Clark\u2019s <a href=\"https:\/\/hacks.mozilla.org\/2017\/02\/a-cartoon-intro-to-webassembly\/\">\u201cA cartoon intro to WebAssembly\u201d<\/a>:<\/p>\n<blockquote><p>\u201cWebAssembly is a way of taking code written in programming languages other than JavaScript and running that code in the browser.\u201d<\/p><\/blockquote>\n<p>&nbsp;<\/p>\n<h2><b>Why did I decide to do this?<\/b><\/h2>\n<p>&nbsp;<\/p>\n<p>On the Glean team we make an effort to move as much of the logic as possible to glean-core, so that we avoid code duplication in the language bindings and guarantee standardized behaviour across all platforms.<\/p>\n<p>Since that is the case, it seemed counterintuitive to me that, when we set out to build a version of Glean for the web, we wouldn\u2019t rely on the same glean-core as all our other language bindings. 
The hypothesis was: let\u2019s make JavaScript just another language binding, by making our Rust core compile to a target that runs in the browser.<\/p>\n<p>Rust is well known for making an effort to provide a great Rust-to-Wasm experience, and the <a href=\"https:\/\/github.com\/rustwasm\/team\">Rust and WebAssembly working group<\/a> has built awesome tools that make boilerplate for such projects much leaner.<\/p>\n<p>&nbsp;<\/p>\n<h2><b>First try: compile glean-core \u201cas is\u201d to Wasm<\/b><\/h2>\n<p>&nbsp;<\/p>\n<p>Since this was my first time doing anything with Wasm, I started by following MDN\u2019s guide <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/WebAssembly\/Rust_to_wasm\">\u201cCompiling from Rust to WebAssembly\u201d<\/a>, but instead of using their example \u201cHello, World!\u201d Rust project, I used <a href=\"https:\/\/github.com\/mozilla\/glean\/tree\/main\/glean-core\/src\">glean-core<\/a>.<\/p>\n<p>From that guide I learned about <a href=\"https:\/\/rustwasm.github.io\/docs\/wasm-pack\/\">wasm-pack<\/a>, a tool that deals with the complexities of compiling a Rust crate to Wasm, and <a href=\"https:\/\/github.com\/rustwasm\/wasm-bindgen\">wasm-bindgen<\/a>, a tool that exposes, among many other things, the <code>#[wasm_bindgen]<\/code> attribute which, when added to a function, makes that function accessible from JavaScript.<\/p>\n<p>The first thing that became obvious was that compiling glean-core directly to Wasm would be very hard. Passing complex types to Wasm has many limitations, and I was not able to add the <code>#[wasm_bindgen]<\/code> attribute to <a href=\"https:\/\/doc.rust-lang.org\/reference\/types\/trait-object.html\">trait objects<\/a> or structs that contain trait objects or lifetime annotations. I needed a simpler API surface to make the connection between Rust and JavaScript. 
Fortunately, I had that in hand: <code>glean-ffi<\/code>.<\/p>\n<p>Our FFI crate exposes functions that rely on a global Glean singleton and have relatively simple signatures. These functions are the ones accessed by our language bindings through a C FFI. This layer hides most of the complex Rust structures from the consumers.<\/p>\n<p>Perfect! I proceeded to add the <code>#[wasm_bindgen]<\/code> attribute to one of our entrypoint functions: <a href=\"https:\/\/docs.rs\/glean-ffi\/29.0.0\/glean_ffi\/fn.glean_initialize.html\">glean_initialize<\/a>. This uncovered a limitation I didn\u2019t know about: you can\u2019t add this attribute to functions that are <code><a href=\"https:\/\/doc.rust-lang.org\/book\/ch19-01-unsafe-rust.html\">unsafe<\/a><\/code>, which unfortunately this one is.<\/p>\n<p>My assumption that I would be able to just expose the API of <code>glean-ffi<\/code> to JavaScript by compiling it to Wasm without making any changes to it was not holding up. I would have to go through some refactoring to make that work. But so far I hadn\u2019t even reached the actual compilation step: the error I was getting was a syntax error. I wanted to go through compilation and see if that completed before diving into any refactoring work, so I removed the <code>#[wasm_bindgen]<\/code> attribute for now and made a new attempt at compiling.<\/p>\n<p>Now I got a new error. Progress! 
If you clone <a href=\"https:\/\/github.com\/mozilla\/glean\">the Glean repository<\/a>, install <code>wasm-pack<\/code>, and run <code>wasm-pack build<\/code> inside the <code><a href=\"https:\/\/github.com\/mozilla\/glean\/tree\/main\/glean-core\/ffi\">glean-core\/ffi\/<\/a><\/code> folder right now, you are bound to get this same error. Here is one important excerpt of it:<\/p>\n<pre>&lt;...&gt;\r\n\r\nfatal error: 'sys\/types.h' file not found\r\ncargo:warning=#include &lt;sys\/types.h&gt;\r\ncargo:warning= \u00a0 \u00a0 \u00a0\u00a0\u00a0 ^~~~~~~~~~~~~\r\ncargo:warning=1 error generated.\r\nexit code: 1\r\n\r\n--- stderr\r\n\r\nerror occurred: Command \"clang\" \"-Os\" \"-ffunction-sections\" \"-fdata-sections\" \"-fPIC\" \"--target=wasm32-unknown-unknown\" \"-Wall\" \"-Wextra\" \"-DMDB_IDL_LOGN=16\" \"-o\" \"&lt;...&gt;\/target\/wasm32-unknown-unknown\/release\/build\/lmdb-rkv-sys-5e7282bb8d9ba64e\/out\/mdb.o\" \"-c\" \"&lt;...&gt;\/.cargo\/registry\/src\/github.com-1ecc6299db9ec823\/lmdb-rkv-sys-0.11.0\/lmdb\/libraries\/liblmdb\/mdb.c\" with args \"clang\" did not execute successfully (status code exit code: 1)<\/pre>\n<p>One of glean-core\u2019s dependencies is <a href=\"https:\/\/github.com\/mozilla\/rkv\">rkv<\/a>, a storage crate we use for persisting metrics before they are collected and sent in pings. This crate depends on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Lightning_Memory-Mapped_Database\">LMDB<\/a>, which is written in C, hence the <code>clang<\/code> error.<\/p>\n<p>I do not have extensive experience in writing C\/C++ programs, so this was not familiar to me. I figured out that the file this error points to as \u201cnot found\u201d, <code>&lt;sys\/types.h&gt;<\/code>, is a header file that should be part of libc. 
This compiles just fine for our usual targets, so I had a hunch that maybe I just didn\u2019t have the proper libc files for compiling to Wasm targets.<\/p>\n<p>Searching the internet pointed me to <a href=\"https:\/\/github.com\/WebAssembly\/wasi-libc\">wasi-libc<\/a>, a libc for WebAssembly programs. Promising! With this, I retried compiling glean-ffi to Wasm. I just needed to run the build command with added flags:<\/p>\n<pre>CFLAGS=\"--sysroot=\/path\/to\/the\/newly\/built\/wasi-libc\/sysroot\" wasm-pack build<\/pre>\n<p>This didn\u2019t work immediately, and the error messages told me to add some extra flags to the command, which I did without thinking much. The final command was:<\/p>\n<pre>CFLAGS=\"--sysroot=\/path\/to\/wasi-sdk\/clone\/share\/wasi-sysroot -D_WASI_EMULATED_MMAN -D_WASI_EMULATED_SIGNAL\" wasm-pack build<\/pre>\n<p>I would advise the reader now not to get too excited. This command still doesn\u2019t work. It will return yet another set of errors and warnings, mostly related to \u201cusage of undeclared identifiers\u201d or \u201cimplicit declaration of functions\u201d. Most of the identifiers that were erroring started with the <code>pthread_<\/code> prefix, which reminded me of something that I had read in the README of <a href=\"https:\/\/github.com\/WebAssembly\/wasi-sdk\">wasi-sdk<\/a>, a toolkit for compiling C programs to WebAssembly that includes wasi-libc:<\/p>\n<blockquote><p>\u201cSpecifically, WASI does not yet have an API for creating and managing threads yet, and WASI libc does not yet have pthread support\u201d.<\/p><\/blockquote>\n<p>That was it. I was done with trying to approach the problem of compiling glean-core to Wasm \u201cas is\u201d and I decided to try another way. 
I could try to abstract away our usage of rkv so that depending on it didn\u2019t block compilation to Wasm, but that is such a big refactoring task that I considered it a blocker for this experiment.<\/p>\n<p>&nbsp;<\/p>\n<h2><b>Second try: take a part of glean-core and compile that to Wasm<\/b><\/h2>\n<p>&nbsp;<\/p>\n<p>After learning that it would require way too much refactoring of <code>glean-core<\/code> and <code>glean-ffi<\/code> to get them to compile to Wasm, I decided to try a different approach and just take a small self-contained part of <code>glean-core<\/code> and compile that to Wasm.<\/p>\n<p>Earlier this year I had a small taste of trying to rewrite part of <code>glean-core<\/code> in JavaScript for the distribution simulators that we added to <a href=\"https:\/\/mozilla.github.io\/glean\/book\/user\/metrics\/memory_distribution.html#simulator\">The Glean Book<\/a>. To make the simulators work I essentially had to reimplement the histogram code and part of the distribution metrics code in JavaScript.<\/p>\n<p>The histogram code is very self-contained, so it was a perfect candidate to single out for this experiment. I did just that and I was able to get it working without errors fairly quickly as a standalone thing (you can check out the <a href=\"https:\/\/github.com\/brizental\/glean-wasm-experiment\/tree\/main\/src\/histogram\">histogram code<\/a> on the glean-to-wasm repo vs. the <a href=\"https:\/\/github.com\/mozilla\/glean\/tree\/main\/glean-core\/src\/histogram\">histogram code<\/a> on the Glean repo).<\/p>\n<p>After getting this to work I created three accumulation functions that mimic how each one of the distribution metric types works. These functions would then be exposed to JavaScript. 
The resulting API looks like this:<\/p>\n<pre>#[wasm_bindgen]\r\npub fn accumulate_samples_custom_distribution(\r\n    range_min: u32,\r\n    range_max: u32,\r\n    bucket_count: usize,\r\n    histogram_type: i32,\r\n    samples: Vec&lt;u64&gt;,\r\n) -&gt; String\r\n\r\n#[wasm_bindgen]\r\npub fn accumulate_samples_timing_distribution(\r\n    time_unit: i32,\r\n    samples: Vec&lt;u64&gt;\r\n) -&gt; String\r\n\r\n#[wasm_bindgen]\r\npub fn accumulate_samples_memory_distribution(\r\n    memory_unit: i32,\r\n    samples: Vec&lt;u64&gt;\r\n) -&gt; String\r\n\r\n<\/pre>\n<p>Each one of these functions creates a histogram, accumulates the given samples to this histogram and returns the resulting histogram as a JSON encoded string. I tried getting them to return <code>HashMap&lt;u64,u64&gt;<\/code> at first, but that is not supported.<\/p>\n<p>For this I was still following MDN\u2019s guide <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/WebAssembly\/Rust_to_wasm\">\u201cCompiling from Rust to WebAssembly\u201d<\/a>, which I can\u2019t recommend enough, and after I got my Rust code to compile to Wasm it was fairly straightforward to call the functions imported from the Wasm module inside my Javascript code.<\/p>\n<p>Here is a little taste of what that looked like:<\/p>\n<div>\n<pre>import(\"glean-wasm\").then(Glean =&gt; {\r\n    const data = JSON.parse(\r\n        Glean.accumulate_samples_memory_distribution(\r\n            unit, \/\/ A Number value between 0 - 3\r\n            values \/\/ A BigUint64Array with the sample values\r\n        )\r\n    )\r\n    \/\/ &lt;Do something with <code>data<\/code>&gt;\r\n})<\/pre>\n<div><\/div>\n<\/div>\n<p>The only hiccup I ran into was that I needed to change my code to use the <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/BigInt\">BigInt<\/a> number type instead of the default <a 
href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/Number\">Number<\/a> type from JavaScript. That is necessary because, in Rust, my functions expect a u64 and BigInt is the type that maps to that from JavaScript.<\/p>\n<p>This code can be checked out at: <a href=\"https:\/\/github.com\/brizental\/glean-wasm-experiment\">https:\/\/github.com\/brizental\/glean-wasm-experiment<\/a><\/p>\n<p>And there is a demo of it working at: <a href=\"https:\/\/glean-wasm.herokuapp.com\/\">https:\/\/glean-wasm.herokuapp.com\/<\/a><\/p>\n<p>&nbsp;<\/p>\n<h2><b>Final considerations<\/b><\/h2>\n<p>&nbsp;<\/p>\n<p>This was a very fun experiment, but does it validate my initial hypothesis:<\/p>\n<blockquote><p>Should we compile glean-core to Wasm and have JavaScript be just another language binding?<\/p><\/blockquote>\n<p>We definitely can do that. Even though my first try did not succeed, if we abstract away all the dependencies that can\u2019t be compiled to Wasm, refactor out the unsafe functions, and clear any other roadblocks we find along the way, we can do it. I believe, though, that the effort this would take is not worth it. It would take us much less time to rewrite glean-core\u2019s code in JavaScript. Spoiler alert for our upcoming TWiG about the Glean.js workweek, but in just a week we were able to get a functioning prototype of that.<\/p>\n<p>Our requirements for Glean on the web are different from our requirements for a native version of Glean. Different enough that the burden of maintaining two versions of glean-core, one in Rust and another in JavaScript, is probably smaller than the amount of work and hacks it would take to build a single version that serves both platforms.<\/p>\n<p>Another issue is compatibility: Wasm <a href=\"https:\/\/caniuse.com\/?search=wasm\">is very well supported<\/a>, but there are environments that still don\u2019t have support for it. 
It would be suboptimal if we went through the trouble of changing glean-core for it to compile to Wasm and then still had to make a Javascript only version for compatibility reasons.<\/p>\n<p>My conclusion is that although we <b>can<\/b> compile glean-core to Wasm, it doesn\u2019t mean that we <b>should<\/b> do that. The advantages of having a single source of truth for the Glean SDK are very enticing, but at the moment it would be more practical to rewrite something specific for the web.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/data\/2020\/09\/25\/this-week-in-glean-glean-core-to-wasm-experiment\/\">Read more<\/a><\/p>\n","protected":false},"author":1754,"featured_media":197,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[448297],"tags":[448312,283010,282895],"coauthors":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/239"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/users\/1754"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/comments?post=239"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/239\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media\/197"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media?parent=239"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/
wp-json\/wp\/v2\/categories?post=239"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/tags?post=239"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/coauthors?post=239"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}