25
Apr 19

an unexpected benefit of standardizing on clang-cl

I wrote several months ago about our impending decision to switch to clang-cl on Windows.  In the intervening months, we did that, and we also dropped MSVC as a supported compiler.  (We still build on Linux with GCC, and will probably continue to do that for some time.)  One (extremely welcome) consequence of the switch to clang-cl has only become clear to me in the past couple of weeks: using assembly language across platforms is no longer painful.

First, a little bit of background: GCC (and Clang) support a feature called inline assembly, which lets you write little snippets of assembly code directly in your C/C++ program.  The syntax is baroque, it’s incredibly easy to shoot yourself in the foot with it, and it’s enormously useful for a variety of low-level things.  MSVC supports inline assembly as well, but only on x86, and with a completely different syntax from GCC’s.
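
To make the contrast concrete, here’s a minimal example of GCC-style extended inline assembly (the constraint strings are where the baroqueness lives); the trailing comment sketches roughly what the x86-only MSVC equivalent looks like:

/* GCC/Clang-style extended inline assembly (x86-64, AT&T syntax): add two
 * integers.  The constraint strings wire C variables to registers. */
static inline int add_asm(int a, int b) {
  int result;
  __asm__("addl %2, %0"
          : "=r"(result)       /* output: any general-purpose register */
          : "0"(a), "r"(b));   /* inputs: operand 0 starts as a; b goes in a register */
  return result;
}

/* The MSVC inline assembler (32-bit x86 only) uses an entirely different,
 * Intel-syntax block form, roughly:
 *
 *   __asm {
 *     mov eax, a
 *     add eax, b
 *     mov result, eax
 *   }
 */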

OK, so maybe you want to put your code in a separate assembly file instead.  The complementary assembler for GCC (courtesy of binutils) is called gas, with its own specific syntax for various low-level details.  If you give gcc an assembly file, it knows to pass it directly to gas, and will even run the C preprocessor on the assembly before invoking gas if you request that.  So you only ever need to invoke gcc to compile everything, and the right thing will just happen.  MSVC, by contrast, requires you to invoke a separate, differently-named assembler for each architecture, each with its own assembly syntax (e.g. directives for the x86-64 assembler are quite different from those for the arm64 assembler), and preprocessing files beforehand requires you to jump through hoops.  (To be fair, a number of these details are handled for you if you’re building from inside Visual Studio; the differences are only annoying to handle in cross-platform build systems.)

In short, dealing with assembler in a world where you have to support MSVC is somewhat painful.  You have to copy-and-paste code, or maybe you write Perl scripts to translate from gas syntax to whatever flavor of syntax the Microsoft assembler you’re using expects.  Your build system needs to handle Windows and non-Windows differently for assembly files, and may even need to handle different architectures on Windows differently.  Things like our ICU data generation have been made somewhat more complex than necessary to support Windows platforms.

Enter clang-cl.  Since clang-cl is just clang under the hood, it handles assembly files passed on the command line the same way clang does, and will even preprocess them for you.  Additionally, clang-cl contains a gas-compatible assembly parser, so those assembly files are parsed by clang-cl itself; you can therefore write a single assembly syntax that works on both Unix-y and Windows platforms.  (You do, of course, have to handle differing platform calling conventions and the like, but that’s simplified by having a preprocessor available all the time.)  Finally, clang-cl supports GCC-style inline assembly, so you don’t even have to drop into separate assembly files if you don’t want to.

In short, clang-cl solves every problem that made assembly usage painful on Windows. Might we have a future world where open source projects that have to deal with any amount of assembly standardize on clang-cl for their Windows support, and declare MSVC unsupported?


28
Mar 19

a thousand and one quite modest ones

From The Reckoning, by David Halberstam:

Shaiken’s studies showed that the Japanese had made their great surge in the sixties and seventies, by which time the financial men had climbed to eminence within America’s industrial companies and had successfully subordinated the power of the manufacturing men. When the Japanese advantage in quality became obvious in the early eighties, it was fashionable among American managers to attribute it to the Japanese lead in robots, and it was true that Japanese were somewhat more robotized than the Americans. But in Shaiken’s opinion the Japanese success had come not from technology but from manufacturing skills. The Japanese had moved ahead of America when they were at a distinct disadvantage in technology. They had done it by slowly and systematically improving the process of the manufacturing in a thousand tiny increments. They had done it by being there, on the factory floor, as the Americans were not.

In that opinion Shaiken was joined by Don Lennox, the former Ford manufacturing man who had ended up at Harvester. Lennox had gone to Japan in the mid-seventies and been dazzled by what the Japanese had achieved in modernizing their factories. He was amazed not by the brilliance and originality of what they had done but by the practicality of it. Lennox’s visit had been an epiphany: He had suddenly envisioned the past twenty years in Japan, two decades of Japanese manufacturing engineers coming to work every day, busy, serious, being taken seriously by their superiors, being filled with the importance of the mission, improving the manufacturing in countless small ways. It was not that they had made one giant breakthrough, Lennox realized; they had made a thousand and one quite modest ones.


04
Jan 19

arm64 windows update #1

A month ago, we formally announced that we were working to bring Firefox to ARM64 Windows.  The last month has seen significant progress on our journey to that release.

The biggest news is that we have dogfoodable (auto-updating) Nightly builds available!  As that message states, these Nightlies are even nightlier than our normal Nightlies, as they have not gone through our normal testing processes. But Firefox is perfectly usable on ARM64 Windows in its present state, so if you have an ARM64 device, please give it a try and file any bugs you find!

Since that announcement, native stack unwinding has been implemented.  That in turn means the Gecko Profiler can now capture native (C++/Rust) stack frames, which is an important step towards making the Gecko Profiler functional.  We also enabled WebRTC support, though there’s a known issue that WebRTC video doesn’t yet work on ARM64 Windows.

We’re currently working on porting our top-tier JavaScript JIT (IonMonkey) to ARM64.  We’re also working on enabling the crashreporter, which is a pretty important feature for getting bug reports from the field!  From my low-level tools perspective, the most interesting bug discovered via dogfooding is a WebRender crash caused by obscure ARM64-specific parameter passing issues in Rust itself.

Ideally, I’ll be writing updates every two weeks or so.  If you see something I missed, or want to point out something that should be in the next update, please email me or come find me on IRC.


29
May 18

when an implementation monoculture might be the right thing

It’s looking increasingly likely that Firefox will, in the not-too-distant future, build with a single C++ compiler across the four major platforms we support.  I’m uneasy with this, but I think I’ve made my peace with it, partly as a result of writing the piece below.

Firefox currently builds with three major C++ compilers across four platforms: Microsoft’s Visual C++ compiler (MSVC), GCC, and Clang.  A fair amount of work has been done to deal with peculiar bugs in all three compilers: you can go search the source code and/or Bugzilla to find hacks that were needed for one reason or another.  A fair amount of work has also been stalled or shelved because one or two compilers don’t quite measure up in some required area (e.g. standards support).  As you might imagine, many a Firefox engineer has bemoaned the need for cross-compiler compatibility.

Cross-implementation compatibility is something that Mozilla expends a lot of effort on in a different context.  We have a Tech Evangelism bugzilla component for outreach to sites that use techniques that don’t translate across browsers.  When new sites appear that deliberately block Firefox (whether because the launch team took the time to test with Firefox and determined the user experience wouldn’t be acceptable, or because cross-browser compatibility was an explicit non-goal), Firefox engineers go find the performance cliffs and fix them.  Mozilla has a long history of promoting the benefits of multiple implementations of the web platform; some of the old guard might remember “Works best in all browsers” campaigns and the like.  If you squint properly, you can even see this promotion in the manifesto (principles 2, 5, 6, 7, and 9, by my reckoning).

So as nice as a single implementation might be, dealing with multiple implementations was a fact of life in building a high-quality open-source browser.  We dealt with it, because it seemed like we would always need to support MSVC; who would invest the time to create an open-source, MSVC-compatible compiler?

Well, Google, mostly, and a host of other people, because the past several releases of Clang have included an MSVC-compatible frontend, clang-cl.  (Indeed, Firefox has been using clang-cl for Windows static analysis builds for some time.)  And now that we have a usable non-MSVC compiler on Windows, we can contemplate using an open-source compiler to create our release Windows builds.  And once we have that, we can consider using (and potentially only supporting) a single compiler (Clang) for all of the major platforms we support; Linux would be the remaining holdout.  (Chrome already ships on Windows with clang and requires clang everywhere, FWIW.)

We might continue to require that things build with MSVC and GCC on relevant platforms, even if we’re not shipping those builds; but even then, such requirements seem unlikely to last very long, for all the reasons that made us want to drop those compilers in the first place.  I imagine we’d probably continue to accept patches to make things build with non-Clang compilers, as long as the patches were not intrusive, just like we accept patches for non-tier-1 platforms.

Supporting a single compiler has a number of advantages:

  • Cross-language LTO (i.e. inlining) between Rust and C++ (we could, of course, do this today, but we wouldn’t get the win on all platforms);
  • Mozilla engineers can fix bugs in Clang/LLVM if need be;
  • Fixes can be more easily backported from the Clang/LLVM development tree;
  • Contributors have fewer compiler quirks to hold up their patches;
  • Integrating and/or upgrading local copies of upstream projects becomes easier;
  • Performance tuning becomes somewhat more straightforward when you have a single compiler to worry about.

I am probably forgetting some along the way.  (I don’t think it’s true that we’ll be able to entirely eliminate hacks to pacify the compiler; you push on C++ hard enough and long enough, and you find yourself doing all manner of unusual things.  We might even find ourselves doing more hacks, since we can justify it via, “Since we can/can’t rely on the compiler to do X…”)

I can see all the advantages.  I can even admire the sheer coolness of some of them; cross-language inlining sounds fantastic!  But the analogy between the Web situation and the C++ compiler situation makes me uneasy: we ask web developers to write cross-browser compatible websites, with all the time and energy that requires.  We tout the goodness of supporting multiple implementations of the web platform.  However, in the implementation of that web platform, we are in the process of deciding that the benefits of supporting a single C++ implementation are greater than whatever benefits (engineering, philosophical, etc.) might accrue from supporting multiple implementations.

To be explicit: we are making the exact style of decision that we ask web development teams not to make.

After having proposed this and thought about it for a while, I think the analogy is a bit strained.  We make the argument that websites should be cross-browser compatible because we support the freedom of users to access those sites with whatever browser they like.  Firefox engineering, by contrast, is the only “consumer” of the compiler(s), so we should optimize for that single consumer.  Indeed, we don’t really concern ourselves with cross-engine compatibility for the JavaScript that lies behind our UI.  Firefox users (generally) don’t care too much what compiler gets used to build Firefox, and they’d probably support a switch to a compiler monoculture if that meant the browser got faster!

(I’m not completely at ease with calling the two situations dissimilar; it’d be all too easy for a website to say they only care about a single “user”, viz. users of $BROWSER, and dispense with cross-browser support.  I want to have a stronger argument for this case, but I don’t at the moment…)

At the end of the day, I think I’m mostly in support (0.6 on the Apache voting scale?).  I think it will be cool when it’s done, and I will probably wind up doing some work in support of the project.  But I can’t completely shake my uneasiness.  What do you think?


25
May 17

applying amazon’s lessons to mozilla (part 1)

Several days ago, somebody pointed me at Why Amazon is eating the world and the key idea has been rolling around in my head ever since:

[The reason that Amazon’s position is defensible is] that each piece of Amazon is being built with a service-oriented architecture, and Amazon is using that architecture to successively turn every single piece of the company into a separate platform — and thus opening each piece to outside competition.

The most obvious example of Amazon’s [service-oriented architecture] structure is Amazon Web Services (Steve Yegge wrote a great rant about the beginnings of this back in 2011). Because of the timing of Amazon’s unparalleled scaling — hypergrowth in the early 2000s, before enterprise-class SaaS was widely available — Amazon had to build their own technology infrastructure. The financial genius of turning this infrastructure into an external product (AWS) has been well-covered — the windfalls have been enormous, to the tune of a $14 billion annual run rate. But the revenue bonanza is a footnote compared to the overlooked organizational insight that Amazon discovered: By carving out an operational piece of the company as a platform, they could future-proof the company against inefficiency and technological stagnation.

…Amazon has replaced useless, time-intensive bureaucracy like internal surveys and audits with a feedback loop that generates cash when it works — and quickly identifies problems when it doesn’t. They say that money earned is a reasonable approximation of the value you’re creating for the world, and Amazon has figured out a way to measure its own value in dozens of previously invisible areas.

Open source is the analogue of this strategy in the world of software.  You have some small collection of code that you think would be useful to the wider world, so you host your own repository or post it on Github/Bitbucket/etc.  You make an announcement in a couple of different venues where you expect to find interested people.  People start using it, express appreciation for what you’ve done, and begin to generate ideas on how it could be made better, filing bug reports and sending you patches.  Ideally, all of this turns into a virtuous cycle: your internal code gets better, and you provide a useful service to external contributors.  The point of the above article is that Amazon has applied an open-source-like strategy to its business relentlessly, and it’s paid off handsomely.

Google is probably the best (unintentional?) practitioner of this strategy, exporting countless packages of software, such as GTest, Go, and TensorFlow, not to mention tools like their collection of sanitizers. They also do software-related exports like their C++ style guide. Facebook opens up in-house-developed components with React, HHVM, and Buck, among others. Microsoft has been charging into this arena in the past couple of years, with examples like Visual Studio Code, TypeScript, and ChakraCore.  Apple doesn’t really play the open source game; their opensource site and the software available there are practically the definition of “throwing code over the wall”, even if having access to the source is useful in a lot of cases.  To the best of my knowledge, Amazon doesn’t really play in this space either.  I could also list examples of exported code from other smaller but still influential technology companies: Github, Dropbox, Twitter, and so forth, as well as companies that aren’t traditional technology companies, but have still invested in open-sourcing some of their software.

Whither Mozilla in the above list?  That is an excellent question.  I think in many cases, we haven’t tried, and in the Firefox-related cases where we did try, we decided (incorrectly, judging through the above lens) that the open source approach wasn’t worth the risks.  Two recent cases where we have tried exporting software and succeeded wildly have been asm.js/WebAssembly and Rust, and it’d be worth considering how to translate those successes into Firefox-related ones.  I’d like to make a follow-up post exploring some of those ideas soon.


19
Apr 17

on customer service; or, how to treat bug reports

From United: Broken Culture, by Jean-Louis Gassée, writing on his time as the head of Apple France:

Over time, a customer service theorem emerged. When a customer brings a complaint, there are two tokens on the table: It’s Nothing and It’s Awful. Both tokens are always played, so whoever chooses first forces the other to grab the token that’s left. For example: Customer claims something’s wrong. I try to play down the damage: It’s Probably Nothing…are you sure you know what you’re doing? Customer, enraged at my lack of judgment and empathy, ups the ante: How are you boors still in business??

But if I take the other token first and commiserate with Customer’s complaint: This Is Awful! How could we have done something like this? Dear Customer is left with no choice, compelled to say Oh, it isn’t so bad…certainly not the end of the world.

It’s simple, it works…even in marriages, I’m told.

There’s no downside to taking the It’s Awful position. If, on further and calm investigation, the customer is revealed to be seriously wrong, you can always move to the playbook’s Upon Further Review page.


29
Mar 17

on mutex performance and WTF::Lock

One of the things I’ve been doing this quarter is removing Gecko’s dependence on NSPR locks.  Gecko’s (non-recursive) mutexes and condition variables now use platform-specific constructs, rather than indirecting through NSPR.  This change makes things smaller, especially on POSIX platforms, and uses no dynamic memory allocation, so there are fewer untested failure paths.  I haven’t rigorously benchmarked things yet, but I suspect various operations are faster, too.
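
To make the size and allocation point concrete, here’s a rough illustration; these are not Gecko’s actual classes, just a sketch of the shape of the difference between indirecting through NSPR and embedding the platform primitive directly:

#include <pthread.h>

typedef struct PRLock PRLock;             /* NSPR's opaque lock type */
extern PRLock* PR_NewLock(void);          /* heap-allocates the lock; can fail */
extern void PR_DestroyLock(PRLock* lock);

/* NSPR-backed mutex: holds a pointer to a separately allocated PRLock, so
 * every lock pays for an extra allocation and inherits an out-of-memory
 * failure path that rarely gets exercised. */
struct NSPRBackedMutex {
  PRLock* mLock;
};

/* Platform-backed mutex (POSIX flavor): the pthread_mutex_t is embedded
 * directly in the object, so there is no separate allocation to fail. */
struct PlatformBackedMutex {
  pthread_mutex_t mMutex;
};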

As I’ve done this, I’ve fielded questions about why we’re not using something like WTF::Lock or the Rust equivalent in parking_lot.  My response has always been some variant of the following: the benchmarks for the WTF::Lock blog post were conducted on OS X.  We have anecdotal evidence that mutex overhead can be quite high on OS X, and that changing locking strategies on OS X can be beneficial.  The blog post also says things like:

One advantage of OS mutexes is that they guarantee fairness: All threads waiting for a lock form a queue, and, when the lock is released, the thread at the head of the queue acquires it. It’s 100% deterministic. While this kind of behavior makes mutexes easier to reason about, it reduces throughput because it prevents a thread from reacquiring a mutex it just released.

This is certainly true for mutexes on OS X, as the measurements in the blog post show.  But fairness is not guaranteed for all OS mutexes; in fact, fairness isn’t even guaranteed in the pthreads standard (which OS X mutexes follow).  Fairness in OS X mutexes is an implementation detail.

These observations are not intended to denigrate the WTF::Lock work: the blog post and the work it describes are excellent.  But it’s not at all clear that the conclusions reached in that post necessarily carry over to other operating systems.

As a partial demonstration of the non-cross-platform applicability of some of the conclusions, I ported WebKit’s lock fairness benchmark to use raw pthreads constructs; the code is available on GitHub.  The benchmark sets up a number of threads that all contend for a single lock, and counts how many times each thread acquires the lock over a given period of time.  While the thread count and the measurement period are configurable via command-line parameters in WebKit’s benchmark, they are fixed at 10 threads and 100ms in mine, mostly because I was lazy.  The output I get on my Mac mini running OS X 10.10.5 is as follows:

1509
1509
1509
1509
1509
1509
1509
1508
1508
1508

Each line indicates the number of lock acquisitions performed by a given thread.  Notice the nearly-identical output for all the threads; this result follows from the fairness of OS X’s mutexes.
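
(For reference, the core of the ported benchmark looks roughly like the sketch below; the real code on GitHub differs in the details, but the contended loop and the per-thread counting are the important parts.)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 10

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int keep_going = 1;              /* read and written under |lock| */
static unsigned long counts[NTHREADS];  /* one acquisition count per thread */

static void* contend(void* arg) {
  unsigned long* count = arg;
  for (;;) {
    pthread_mutex_lock(&lock);
    int done = !keep_going;
    pthread_mutex_unlock(&lock);
    if (done)
      break;
    (*count)++;
  }
  return NULL;
}

int main(void) {
  pthread_t threads[NTHREADS];
  for (int i = 0; i < NTHREADS; i++)
    pthread_create(&threads[i], NULL, contend, &counts[i]);
  usleep(100 * 1000);                   /* let the threads fight over the lock for ~100ms */
  pthread_mutex_lock(&lock);
  keep_going = 0;
  pthread_mutex_unlock(&lock);
  for (int i = 0; i < NTHREADS; i++) {
    pthread_join(threads[i], NULL);
    printf("%lu\n", counts[i]);
  }
  return 0;
}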

The output I get on my Linux box is quite different (aside from each thread performing significantly more lock acquisitions because of differences in processor speed, etc.):

108226
99025
103122
105539
101885
104715
104608
105590
103170
105476

The counts vary significantly between threads: Linux mutexes are not fair by default–and that’s perfectly OK.

What’s more, the developers of OS X have recognized this and added a way to make their mutexes non-fair.  In <pthread_spis.h>, there’s an OS X-only function, pthread_mutexattr_setpolicy_np.  (pthread mutex attributes control various qualities of pthread mutexes: normal, recursively acquirable, etc.)  This particular function, supported since OS X 10.7, enables setting the fairness policy of mutexes to either _PTHREAD_MUTEX_POLICY_FAIRSHARE (the default) or _PTHREAD_MUTEX_POLICY_FIRSTFIT.  The firstfit policy is not documented anywhere, but I’m guessing that it’s something akin to the “barging” locks described in the WTF::Lock blog post: the lock is made available to whatever thread happens to get to it first, rather than enforcing a queue to acquire the lock.  (If you’re curious, the code dealing with firstfit policy can be found in Apple’s pthread_mutex.c.)
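
Opting in is straightforward; here’s a minimal sketch (OS X-only, naturally) of initializing a mutex with the firstfit policy:

#include <pthread.h>
#include <pthread_spis.h>  /* OS X-specific; declares pthread_mutexattr_setpolicy_np */

/* Initialize |mutex| with the firstfit policy rather than the default
 * fairshare policy.  Returns 0 on success, an error code otherwise. */
static int firstfit_mutex_init(pthread_mutex_t* mutex) {
  pthread_mutexattr_t attr;
  int rv = pthread_mutexattr_init(&attr);
  if (rv != 0)
    return rv;
  rv = pthread_mutexattr_setpolicy_np(&attr, _PTHREAD_MUTEX_POLICY_FIRSTFIT);
  if (rv == 0)
    rv = pthread_mutex_init(mutex, &attr);
  pthread_mutexattr_destroy(&attr);
  return rv;
}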

Running the benchmark on OS X with mutexes configured with the firstfit policy yields quite different numbers:

14627
13239
13503
13720
13989
13449
13943
14376
13927
14142

The variation in these numbers is more akin to what we saw with the non-fair locks on Linux, and what’s more, the counts are almost an order of magnitude higher than with the fair locks. Maybe we should start using firstfit locks in Gecko!  I don’t know how firstfit policy locks compare to something like WTF::Lock on my Mac mini, but it’s clear that simply saying “OS mutexes are slow” doesn’t tell the whole story. And of course there are other concerns, such as the size required by locks, that motivated the WTF::Lock work.

I have vague plans of doing more benchmarking, especially on Windows, where we may want to use slim reader/writer locks rather than critical sections, and evaluating Rust’s parking_lot on more platforms.  Pull requests welcome.


29
Nov 16

accessibility tools for everyone

From The Man Who Is Transforming Microsoft:

[Satya Nadella] moves to another group of kids and then shifts his attention to a teenage student who is blind. The young woman has been working on building accessibility features using Cortana, Microsoft’s speech-activated digital assistant. She smiles and recites the menu options: “Hey Cortana. My essentials.” Despite his transatlantic jet lag Nadella is transfixed. “That’s awesome,” he says. “It’s fantastic to see you pushing the boundaries of what can be done.” He thanks her and turns toward the next group.

“I have a particular passion around accessibility, and this is something I spend quite a bit of cycles on,” Nadella tells me later. He has two daughters and a son; the son has special needs. “What she was showing me is essentially how she’s building out as a developer the tools that she can use in her everyday life to be productive. One thing is certain in life: All of us will need accessibility tools at some point.”


15
Nov 16

efficiently passing the buck with needinfo requests

A while back, Bugzilla added this great tool called needinfo requests: you set a flag on the bug indicating that you would like input from a particular person, X. X will then get something dropped into their requests page and a separate email notifying them of the needinfo request. Then, when X responds and clears the needinfo request, you get an email notifying you that the request has been dealt with. This mechanism works much better than merely saying “X, what do you think?” in a bug comment and expecting that X will see the comment in their bugmail and respond.

My needinfo-related mail, along with all review-related mail, gets filtered into a separate folder in my email client.  It is then very obvious when I get needinfo requests, or when needinfo requests that I have made have been answered.

Occasionally, however, when you get a needinfo, you will not be the correct person to answer the question, and you will need to needinfo someone else who has the appropriate knowledge…or is at least one step closer to providing the appropriate knowledge.

There is a right way and a wrong way to accomplish this. The wrong way is to clear your own needinfo request and request needinfo from someone else:

[screenshot: the wrong way]

Why is this bad? Because the original requester will receive a notification that their request has been dealt with, when it has not! So now they have to remember to watch the bug, poll their bugmail, or similar, to figure out when their request has actually been dealt with.  Additionally, you’ll get an email notification when the needinfo request you just filed is answered, which you don’t necessarily want.

The right way (which I just discovered this week) is to uncheck the “Clear the needinfo request” box, which turns the second checkbox into a “Redirect my needinfo request”:

[screenshot: the right way]

This method redirects the needinfo without prematurely notifying the original requester; they will (ideally) now receive a notification only when their request has actually been dealt with.


29
Jul 16

a git pre-commit hook for tooltool manifest checking

I’ve recently been uploading packages to tooltool for my work on Rust-in-Gecko and Android toolchains. The steps I usually follow are:

  1. Put together tarball of files.
  2. Call tooltool.py from build-tooltool to create a tooltool manifest.
  3. Upload files to tooltool with said manifest.
  4. Copy bits from said manifest into one of the manifest files automation uses.
  5. Do try push with new manifest.
  6. Admire my completely green try push.

That would be the ideal, anyway.  What usually happens at step 4 is that I forget a comma, or I forget a field in the manifest, and so step 5 winds up going awry, and I end up taking several times as long as I would have liked.

After running into this again today, I decided to implement some minimal validation for automation manifests.  I use a fork of gecko-dev for development, as I prefer Git to Mercurial. Git supports running programs when certain things occur; these programs are known as hooks and are usually implemented as shell scripts. The hook I’m interested in is the pre-commit hook, which is looked for at .git/hooks/pre-commit in any git repository. Repositories come with a sample hook for every hook supported by Git, so I started with:

cp .git/hooks/pre-commit.sample .git/hooks/pre-commit

The sample pre-commit hook checks for trailing whitespace in files, which I sometimes leave around, especially when I’m editing Python, and can also check for non-ASCII filenames being added.  I then added the following lines to that file:

# Validate any staged releng.manifest files as JSON before allowing the commit.
if git diff --cached --name-only | grep -q releng.manifest; then
    for f in $(git diff --cached --name-only | grep releng.manifest); do
        # The unquoted EOF delimiter lets the shell substitute $f into the script.
        if ! python - <<EOF
import json
import sys

try:
    with open("$f", 'r') as f:
        json.loads(f.read())
except (IOError, ValueError):
    # Unreadable or unparseable: signal failure back to the shell.
    sys.exit(1)
sys.exit(0)
EOF
        then
            echo "$f is not valid JSON"
            exit 1
        fi
    done
fi

In prose, we’re checking to see if the current commit changes any releng.manifest files in any way. If so, we try parsing each of those files as JSON, and throw an error if one doesn’t parse.

There are several ways this check could be more robust:

  • The check will error if a commit is removing a releng.manifest, because that file won’t exist for the script to check;
  • The check could ensure that the unpack field is set for all files, as the manifest file used for the upload in step 3, above, doesn’t include that field: it needs to be added manually.
  • The check could ensure that all of the digest fields are the correct length for the specified digest in use.
  • …and so on.

So far, though, simple syntax errors are the greatest source of pain for me, so that’s what’s getting checked for.  (Mismatched sizes have also been an issue, but I’m unsure of how to check that…)

What pre-commit hooks have you found useful in your own projects?