25
May 17

applying amazon’s lessons to mozilla (part 1)

Several days ago, somebody pointed me at Why Amazon is eating the world and the key idea has been rolling around in my head ever since:

[The reason that Amazon’s position is defensible is] that each piece of Amazon is being built with a service-oriented architecture, and Amazon is using that architecture to successively turn every single piece of the company into a separate platform — and thus opening each piece to outside competition.

The most obvious example of Amazon’s [service-oriented architecture] structure is Amazon Web Services (Steve Yegge wrote a great rant about the beginnings of this back in 2011). Because of the timing of Amazon’s unparalleled scaling — hypergrowth in the early 2000s, before enterprise-class SaaS was widely available — Amazon had to build their own technology infrastructure. The financial genius of turning this infrastructure into an external product (AWS) has been well-covered — the windfalls have been enormous, to the tune of a $14 billion annual run rate. But the revenue bonanza is a footnote compared to the overlooked organizational insight that Amazon discovered: By carving out an operational piece of the company as a platform, they could future-proof the company against inefficiency and technological stagnation.

…Amazon has replaced useless, time-intensive bureaucracy like internal surveys and audits with a feedback loop that generates cash when it works — and quickly identifies problems when it doesn’t. They say that money earned is a reasonable approximation of the value you’re creating for the world, and Amazon has figured out a way to measure its own value in dozens of previously invisible areas.

Open source is the analogue of this strategy in the world of software.  You have some small collection of code that you think would be useful to the wider world, so you host your own repository or post it on GitHub/Bitbucket/etc.  You make an announcement in a couple of different venues where you expect to find interested people.  People start using it, express appreciation for what you’ve done, and begin to generate ideas on how it could be made better, filing bug reports and sending you patches.  Ideally, all of this turns into a virtuous cycle: your internal code gets better, and you provide a useful service to external users and contributors.  The point of the above article is that Amazon has applied an open-source-like strategy to its business relentlessly, and it’s paid off handsomely.

Google is probably the best (unintentional?) practitioner of this strategy, exporting countless packages of software, such as GTest, Go, and TensorFlow, not to mention tools like their collection of sanitizers.  They also make software-related exports like their C++ style guide.  Facebook opens up in-house-developed components with React, HHVM, and Buck, among others.  Microsoft has been charging into this arena in the past couple of years, with examples like Visual Studio Code, TypeScript, and ChakraCore.  Apple doesn’t really play the open source game; their open source site and the software available there are practically the definition of “throwing code over the wall”, even if having access to the source is useful in a lot of cases.  To the best of my knowledge, Amazon doesn’t really play in this space either.  I could also list examples of exported code from other smaller but still influential technology companies (GitHub, Dropbox, Twitter, and so forth), as well as companies that aren’t traditional technology companies but have still invested in open-sourcing some of their software.

Whither Mozilla in the above list?  That is an excellent question.  I think in many cases, we haven’t tried, and in the Firefox-related cases where we tried, we decided (incorrectly, judging through the above lens) that the risks of the open source approach weren’t worth it.  Two recent cases where we have tried exporting software and succeeded wildly have been asm.js/WebAssembly and Rust, and it’d be worth considering how to translate those successes into Firefox-related ones.  I’d like to make a follow-up post exploring some of those ideas soon.


19
Apr 17

on customer service; or, how to treat bug reports

From United: Broken Culture, by Jean-Louis Gassée, writing on his time as the head of Apple France:

Over time, a customer service theorem emerged. When a customer brings a complaint, there are two tokens on the table: It’s Nothing and It’s Awful. Both tokens are always played, so whoever chooses first forces the other to grab the token that’s left. For example: Customer claims something’s wrong. I try to play down the damage: It’s Probably Nothing…are you sure you know what you’re doing? Customer, enraged at my lack of judgment and empathy, ups the ante: How are you boors still in business??

But if I take the other token first and commiserate with Customer’s complaint: This Is Awful! How could we have done something like this? Dear Customer is left with no choice, compelled to say Oh, it isn’t so bad…certainly not the end of the world.

It’s simple, it works…even in marriages, I’m told.

There’s no downside to taking the It’s Awful position. If, on further and calm investigation, the customer is revealed to be seriously wrong, you can always move to the playbook’s Upon Further Review page.


29
Mar 17

on mutex performance and WTF::Lock

One of the things I’ve been doing this quarter is removing Gecko’s dependence on NSPR locks.  Gecko’s (non-recursive) mutexes and condition variables now use platform-specific constructs, rather than indirecting through NSPR.  This change makes things smaller, especially on POSIX platforms, and uses no dynamic memory allocation, so there are fewer untested failure paths.  I haven’t rigorously benchmarked things yet, but I suspect various operations are faster, too.

As I’ve done this, I’ve fielded questions about why we’re not using something like WTF::Lock or the Rust equivalent in parking_lot.  My response has always been some variant of the following: the benchmarks for the WTF::Lock blog post were conducted on OS X.  We have anecdotal evidence that mutex overhead can be quite high on OS X, and that changing locking strategies on OS X can be beneficial.  The blog post also says things like:

One advantage of OS mutexes is that they guarantee fairness: All threads waiting for a lock form a queue, and, when the lock is released, the thread at the head of the queue acquires it. It’s 100% deterministic. While this kind of behavior makes mutexes easier to reason about, it reduces throughput because it prevents a thread from reacquiring a mutex it just released.

This is certainly true for mutexes on OS X, as the measurements in the blog post show.  But fairness is not guaranteed for all OS mutexes; in fact, fairness isn’t even guaranteed in the pthreads standard (which OS X mutexes follow).  Fairness in OS X mutexes is an implementation detail.

These observations are not intended to denigrate the WTF::Lock work: the blog post and the work it describes are excellent.  But it’s not at all clear that the conclusions reached in that post necessarily carry over to other operating systems.

As a partial demonstration of the non-cross-platform applicability of some of the conclusions, I ported WebKit’s lock fairness benchmark to use raw pthreads constructs; the code is available on GitHub.  The benchmark sets up a number of threads that are all contending for a single lock.  The number of lock acquisitions for each thread over a given period of time is then counted.  While both of these quantities are configurable via command-line parameters in WebKit’s benchmark, they are fixed at 10 threads and 100ms in mine, mostly because I was lazy. The output I get on my Mac mini running OS X 10.10.5 is as follows:

1509
1509
1509
1509
1509
1509
1509
1508
1508
1508

Each line indicates the number of lock acquisitions performed by a given thread.  Notice the nearly-identical output for all the threads; this result follows from the fairness of OS X’s mutexes.
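
For concreteness, the core of the benchmark looks roughly like the following.  This is a minimal sketch rather than the actual code from the GitHub repository (the names here are mine): ten threads hammer a single pthread mutex for 100ms, and each thread counts how many acquisitions it managed.

#include <pthread.h>
#include <unistd.h>

#include <atomic>
#include <cstdint>
#include <cstdio>

const int kNumThreads = 10;

static pthread_mutex_t gLock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> gRunning(true);

// Each thread repeatedly acquires and releases the shared lock, counting
// how many acquisitions it performed before time ran out.
static void* Worker(void* aArg) {
  uint64_t* count = static_cast<uint64_t*>(aArg);
  while (gRunning.load(std::memory_order_relaxed)) {
    pthread_mutex_lock(&gLock);
    ++*count;
    pthread_mutex_unlock(&gLock);
  }
  return nullptr;
}

int main() {
  pthread_t threads[kNumThreads];
  uint64_t counts[kNumThreads] = {0};

  for (int i = 0; i < kNumThreads; i++) {
    pthread_create(&threads[i], nullptr, Worker, &counts[i]);
  }

  usleep(100 * 1000);  // Let the threads contend for 100ms.
  gRunning.store(false);

  for (int i = 0; i < kNumThreads; i++) {
    pthread_join(threads[i], nullptr);
    printf("%llu\n", static_cast<unsigned long long>(counts[i]));
  }
  return 0;
}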

The output I get on my Linux box is quite different (aside from each thread performing significantly more lock acquisitions because of differences in processor speed, etc.):

108226
99025
103122
105539
101885
104715
104608
105590
103170
105476

The counts vary significantly between threads: Linux mutexes are not fair by default, and that’s perfectly OK.

What’s more, the developers of OS X have recognized this and added a way to make their mutexes non-fair.  In <pthread_spis.h>, there’s an OS X-only function, pthread_mutexattr_setpolicy_np.  (pthread mutex attributes control various qualities of pthread mutexes: normal, recursively acquirable, and so on.)  This function, supported since OS X 10.7, sets the fairness policy of a mutex to either _PTHREAD_MUTEX_POLICY_FAIRSHARE (the default) or _PTHREAD_MUTEX_POLICY_FIRSTFIT.  The firstfit policy is not documented anywhere, but I’m guessing that it’s something akin to the “barging” locks described in the WTF::Lock blog post: the lock is made available to whichever thread happens to get to it first, rather than enforcing a queue of waiters.  (If you’re curious, the code dealing with the firstfit policy can be found in Apple’s pthread_mutex.c.)
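
Configuring a mutex this way is straightforward.  Here’s a minimal sketch (error handling omitted; it assumes you’re building on OS X, where <pthread_spis.h> and the policy constants are available):

#include <pthread.h>
#include <pthread_spis.h>  // OS X-only; declares pthread_mutexattr_setpolicy_np.

// Initialize aMutex with the firstfit policy instead of the default
// fairshare policy.  Sketch only; real code should check return values.
static void InitFirstFitMutex(pthread_mutex_t* aMutex) {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  // Hand the lock to whichever thread grabs it first, rather than
  // maintaining a FIFO queue of waiting threads.
  pthread_mutexattr_setpolicy_np(&attr, _PTHREAD_MUTEX_POLICY_FIRSTFIT);
  pthread_mutex_init(aMutex, &attr);
  pthread_mutexattr_destroy(&attr);
}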

Running the benchmark on OS X with mutexes configured with the firstfit policy yields quite different numbers:

14627
13239
13503
13720
13989
13449
13943
14376
13927
14142

The variation in these numbers is more akin to what we saw with the non-fair locks on Linux, and what’s more, the counts are almost an order of magnitude higher than with the fair locks.  Maybe we should start using firstfit locks in Gecko!  I don’t know how firstfit policy locks compare to something like WTF::Lock on my Mac mini, but it’s clear that simply saying “OS mutexes are slow” doesn’t tell the whole story.  And of course there are other concerns, such as the size required by locks, that motivated the WTF::Lock work.

I have vague plans of doing more benchmarking, especially on Windows, where we may want to use slim reader/writer locks rather than critical sections, and evaluating Rust’s parking_lot on more platforms.  Pull requests welcome.
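
For reference, here’s roughly what the two Windows options look like.  This is a minimal sketch of the Win32 APIs involved, not code from Gecko; part of the appeal of SRW locks is that they’re pointer-sized, statically initializable, need no explicit destruction, and support shared (reader) acquisition.

#include <windows.h>

// Critical sections must be explicitly initialized before use and deleted
// when no longer needed.
void CriticalSectionExample() {
  CRITICAL_SECTION cs;
  InitializeCriticalSection(&cs);

  EnterCriticalSection(&cs);
  // ... exclusive access ...
  LeaveCriticalSection(&cs);

  DeleteCriticalSection(&cs);
}

// An SRW lock is a single pointer-sized word, can be statically initialized,
// and has no destroy function.
static SRWLOCK gSRWLock = SRWLOCK_INIT;

void SRWLockExample() {
  AcquireSRWLockExclusive(&gSRWLock);
  // ... exclusive (writer) access ...
  ReleaseSRWLockExclusive(&gSRWLock);

  AcquireSRWLockShared(&gSRWLock);
  // ... shared (reader) access; multiple readers may hold the lock at once ...
  ReleaseSRWLockShared(&gSRWLock);
}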


29
Nov 16

accessibility tools for everyone

From The Man Who Is Transforming Microsoft:

[Satya Nadella] moves to another group of kids and then shifts his attention to a teenage student who is blind. The young woman has been working on building accessibility features using Cortana, Microsoft’s speech-activated digital assistant. She smiles and recites the menu options: “Hey Cortana. My essentials.” Despite his transatlantic jet lag Nadella is transfixed. “That’s awesome,” he says. “It’s fantastic to see you pushing the boundaries of what can be done.” He thanks her and turns toward the next group.

“I have a particular passion around accessibility, and this is something I spend quite a bit of cycles on,” Nadella tells me later. He has two daughters and a son; the son has special needs. “What she was showing me is essentially how she’s building out as a developer the tools that she can use in her everyday life to be productive. One thing is certain in life: All of us will need accessibility tools at some point.”


15
Nov 16

efficiently passing the buck with needinfo requests

A while back, Bugzilla added this great tool called needinfo requests: you set a flag on the bug indicating that a particular person’s input is desired. That person, call them X, then gets something dropped into their requests page and a separate email notifying them of the needinfo request. Then, when X responds and clears the needinfo request, you get an email notifying you that the request has been dealt with. This mechanism works much better than merely saying “X, what do you think?” in a bug comment and expecting that X will see the comment in their bugmail and respond.

My needinfo-related mail, along with all review-related mail, gets filtered into a separate folder in my email client.  It is then very obvious when I receive needinfo requests, or when needinfo requests that I have made have been answered.

Occasionally, however, when you get a needinfo, you will not be the correct person to answer the question, and you will need to needinfo someone else who has the appropriate knowledge…or is at least one step closer to providing the appropriate knowledge.

There is a right way and a wrong way to accomplish this. The wrong way is to clear your own needinfo request and request needinfo from someone else:

[Screenshot: clearing your own needinfo request and requesting needinfo from someone else]

Why is this bad? Because the original requester will receive a notification that the request has been dealt with, when it has not!  So now they have to remember to watch the bug, poll their bugmail, or something similar to figure out when their request has actually been dealt with.  Additionally, you’ll get an email notification when your new needinfo request has been answered, which you don’t necessarily want.

The right way (which I just discovered this week) is to uncheck the “Clear the needinfo request” box, which turns the second checkbox into a “Redirect my needinfo request”:

[Screenshot: unchecking “Clear the needinfo request” so the request can be redirected]

This method appropriately redirects the needinfo without notifying the original requester, and the original requester will (ideally) now receive a notification only when the request has been dealt with.


29
Jul 16

a git pre-commit hook for tooltool manifest checking

I’ve recently been uploading packages to tooltool for my work on Rust-in-Gecko and Android toolchains. The steps I usually follow are:

  1. Put together tarball of files.
  2. Call tooltool.py from build-tooltool to create a tooltool manifest.
  3. Upload files to tooltool with said manifest.
  4. Copy bits from said manifest into one of the manifest files automation uses.
  5. Do try push with new manifest.
  6. Admire my completely green try push.

That would be the ideal, anyway.  What usually happens at step 4 is that I forget a comma, or I forget a field in the manifest, and so step 5 winds up going awry, and I end up taking several times as long as I would have liked.

After running into this again today, I decided to implement some minimal validation for automation manifests.  I use a fork of gecko-dev for development, as I prefer Git to Mercurial. Git supports running programs when certain things occur; these programs are known as hooks and are usually implemented as shell scripts. The hook I’m interested in is the pre-commit hook, which is looked for at .git/hooks/pre-commit in any git repository. Repositories come with a sample hook for every hook supported by Git, so I started with:

cp .git/hooks/pre-commit.sample .git/hooks/pre-commit

The sample pre-commit hook checks for trailing whitespace in files (which I sometimes leave around, especially when I’m editing Python) and can check for non-ASCII filenames being added.  I then added the following lines to that file:

if git diff --cached --name-only | grep -q releng.manifest; then
    for f in $(git diff --cached --name-only | grep releng.manifest); do
        if ! python - <<EOF
import json
import sys
try:
    with open("$f", 'r') as f:
        json.loads(f.read())
    sys.exit(0)
except:
    sys.exit(1)
EOF
        then
            echo "$f is not valid JSON"
            exit 1
        fi
    done
fi

In prose, we’re checking to see if the current commit has any releng.manifest files being changed in any way. If so, then we’ll try parsing each of those files as JSON, and throwing an error if one doesn’t parse.

There are several ways this check could be more robust:

  • The check will error if a commit is removing a releng.manifest, because that file won’t exist for the script to check.
  • The check could ensure that the unpack field is set for all files, as the manifest file used for the upload in step 3, above, doesn’t include that field: it needs to be added manually.
  • The check could ensure that all of the digest fields are the correct length for the specified digest in use.
  • …and so on.

So far, though, simple syntax errors are the greatest source of pain for me, so that’s what’s getting checked for.  (Mismatched sizes have also been an issue, but I’m unsure of how to check that…)

What pre-commit hooks have you found useful in your own projects?


06
Jul 16

on the usefulness of computer books

I have a book, purchased during my undergraduate days, entitled Introduction to Algorithms. Said book contains a wealth of information about algorithms and data structures, has its own Wikipedia page, and even a snappy acronym people use (“CLRS”, for the first letters of its authors’ last names).

When I bought it, I expected it to be both an excellent textbook and a book I would refer to many times throughout my professional career.  I cannot remember whether it was a good textbook in the context of my classes, and I cannot remember the last time I opened it to find some algorithm or verify some subtle point.  Mostly, it has served two purposes: as an excellent support for my monitor, raising it closer to eye level, and as extra weight to move around when I have had to transfer my worldly possessions from place to place.

Whether this reflects on the sort of code I have worked on, or the rise of the Internet for answering questions, I am unsure.

I have another book, also purchased during my undergraduate days, entitled Programming with POSIX Threads.  Said book contains a wealth of information about POSIX threads (“pthreads”), is only mentioned in “Further Reading” on the Wikipedia page for POSIX threads, and has no snappy acronym associated with it.

I purchased this book because I thought I might assemble a library of programming knowledge, and of course threads would be a part of that.  Mostly, it would sit on the shelves to show people I was a Real Programmer(tm).

Instead, I have found it to be one of those books to always have close at hand, particularly when working on Gecko.  Its explanations of the basic concepts of synchronization are clear and extensive, its examples of how to structure multithreaded algorithms are excellent, and its secondary coverage of “real-world” things such as memory ordering and signals + threads (short version: “don’t”) has been helpful when people have asked me for opinions or to review multi-threaded code.  When I have not followed the advice of this book, I have found myself in trouble later on.

My sense when searching for some of the same topics the book covers is that finding the same quality of coverage for those topics online is rather difficult, even taking into account that topics might be covered by disparate people.

If I had to trim my computer book library down significantly, I’m pretty sure I know what book I would choose.

What book have you found unexpectedly (un)helpful in your programming life?


31
May 16

why gecko data structures should be preferred to std:: ones

In light of the recent announcement that all of our Tier-1 platforms now have a C++11-supporting standard library, I received some questions about whether we should continue encouraging the use of Gecko-specific data structures. My answer was “yes”, and as I was writing the justification for said answer, I felt that the justification was worth broadcasting to a wider audience. Here are the reasons I came up with; feel free to agree or disagree in the comments.

  • Gecko’s data structures can be customized extensively for our purposes, whereas we don’t have the same control over the standard library.  Our string classes, for instance, permit sharing structure between strings (whether via something like nsDependentString or reference-counted string buffers); that functionality isn’t currently supported in the standard library.  While the default behavior on allocation failure in Gecko is to crash, our data structures provide interfaces for failing gracefully when allocations fail (see the sketch after this list).  Allocation failures in standard library data structures are reported via exceptions, which we don’t use.  If you’re not using exceptions, allocation failures in those data structures simply crash, which isn’t acceptable in a number of places throughout Gecko.
  • Gecko data structures can assume things about the environment that the standard library can’t.  We ship the same memory allocator on all our platforms, so our hashtables and our arrays can attempt to make their allocation behavior line up with what the memory allocator efficiently supports.  It’s possible that the standard library implementations we’re using do things like this, but it’s not guaranteed by the standard.
  • Along similar lines as the first two, Gecko data structures provide better visibility for things like debug checks and memory reporting.  Some standard libraries we support come with built-in debug modes, but not all of them, and not all debug modes are equally complete. Where possible, we should have consistent support for these sorts of things across all our platforms.
  • Custom data structures may provide better behavior than standard data structures by relaxing the specifications provided by the standard.  The WebKit team had a great blog post on their new mutex implementation, which optimizes for cases that OS-provided mutexes aren’t optimized for, either because of compatibility constraints or because of outside specifications.  Chandler Carruth has a CppCon talk where he mentions the non-ideal interfaces in many of the standard library data structures.  We can do better with custom data structures.
  • Data structures in the standard library may provide inconsistent performance across platforms, or disagree on the finer points of the standard.  Love them or hate them, Gecko’s data structures at least provide consistent behavior everywhere.
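
To make the first point concrete, here’s a rough sketch of the fallible-allocation style Gecko’s containers support.  The exact signatures live in nsTArray.h and mozilla/fallible.h, so treat this as illustrative rather than authoritative:

#include <stdint.h>

#include "mozilla/fallible.h"  // mozilla::fallible
#include "nsTArray.h"

// Infallible append: if the allocation fails, Gecko aborts immediately,
// since we build without exceptions and cannot throw std::bad_alloc.
void AppendOrCrash(nsTArray<uint32_t>& aArray, uint32_t aValue) {
  aArray.AppendElement(aValue);
}

// Fallible append: the caller decides what to do on allocation failure,
// which matters for large or attacker-controllable allocations.
bool AppendOrReject(nsTArray<uint32_t>& aArray, uint32_t aValue) {
  if (!aArray.AppendElement(aValue, mozilla::fallible)) {
    // Allocation failed; recover gracefully (e.g. reject the input).
    return false;
  }
  return true;
}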

Most of these arguments are not new; if you look at the documentation for Facebook’s open-source Folly library, for instance, you’ll find a number of these arguments, if not expressed in quite the same way.  Browsing through WebKit’s WTF library shows they have a number of the same things that we do in xpcom/ or mfbt/ as well, presumably for some of the same reasons.

All of this is not to say that our data structures are perfect: the APIs for our hashtables could use some improvements, our strings and nsTArray do a poor job of separating “data structure” from “algorithm”, nsDeque serves as an excellent excuse to go use the standard library instead, and XPCOM’s synchronization primitives should stop going through NSPR and use the underlying OS’s primitives directly (or simply be rewritten to use something like WebKit’s locking primitives, above).  This is a non-exhaustive list; I have more ideas if people are interested.

Having a C++11 standard library on all platforms brings opportunities to remove dead polyfills; MFBT contains a number of these (Atomics.h, Tuple.h, TypeTraits.h, UniquePtr.h, and so forth).  But we shouldn’t flock to the standard library’s functionality just because it’s the standard.  If the standard library’s functionality doesn’t fit our use cases, we should definitely write our own replacement(s) and use them widely.


18
Apr 16

rr talk post-mortem

On Wednesday last week, I gave an invited talk on rr to a group of interested students and faculty at Rose-Hulman. The slides I used are available, though I doubt they make a lot of sense without the talk itself to go with them. Things I was pleased with:

  • I didn’t overrun my time limit, which was pretty satisfying.  I would have liked to have an hour (40 minutes talk/20 minutes for questions or overrun), but the slot was for a standard class period of 50 minutes.  I also wanted to leave some time for questions at the end, of which there were a few. Despite the talk being scheduled for the last class period of the day, it was well-attended.
  • The slides worked well.  My slides are inspired by Lawrence Lessig’s style of presenting, which I also used for my lightning talk in Orlando.  It forces you to think about what you’re putting on each slide and to make each slide count.  (I realize I didn’t use this for my Gecko onboarding presentation; I’m not sure if the Lessig method would work for things like that.  Maybe at the next onboarding…)
  • The level of sophistication was just about right, and I think the story approach to creating rr helped guide people through the presentation.  At least, it didn’t look as though many people were nodding off or completely confused, despite rr being a complex systems-heavy program.

Most of the above I credit to practicing the talk repeatedly.  I forget where I heard it, but a rule of thumb I use for presentations is 10 hours of prep time minimum (!) for every 1 hour of talk time.  The prep time always winds up helping: improving the material, refining the presentation, and boosting my confidence giving the presentation.  Despite all that practice, opportunities for improvement remain:

  • The talk could have used some amount of introduction on “here’s how debuggers work”.  This is kind of old hat to me, but I realized after the fact that to many students (perhaps even some faculty), blithely asserting that rr can start and stop threads at will, for instance, might seem mysterious.  A slide or two on the differences between how rr record works and how rr replay works and interacts with GDB would have been clarifying as well.
  • The above is an instance where a diagram or two might have been helpful.  I dislike putting diagrams in my talks because I dislike the thought of spending all that time to find a decent, simple app for drawing things, actually drawing them, and then exporting a non-awful version into a presentation.  It’s just a hurdle that I have to clear once, though, so I should just get over it.
  • Checkpointing and the actual mechanisms by which rr can run forwards or backwards in your program got short shrift and should have been explained in a little more detail.  (Diagrams again…)  Perhaps not surprisingly, the checkpointing material got added later during the talk prep and therefore didn’t get practiced as much.
  • The demo received very little practice (I’m sensing a theme here) and, while it was able to show off a few of rr’s capabilities, it wasn’t very polished or impressive.  Part of that was due to rr mysteriously deciding to cease working on my virtual machine, but part of it was just my own laziness and assuming things would work out just fine at the actual talk.  Always practice!

28
Jan 16

for-purpose instead of non-profit

I began talking with a guy in his midforties who ran an investment fund and told me about his latest capital raise. We hit it off while discussing the differences between start-ups on the East and West Coasts, and I enjoyed learning about how he evaluated new investment opportunities. Although I’d left that space a while ago, I still knew it well enough to carry a solid conversation and felt as if we were speaking the same language. Then he asked what I did.

“I run a nonprofit organization called Pencils of Promise.”

“Oh,” he replied, somewhat taken aback. “And you do that full-time?”

More than full-time, I thought, feeling a bit judged. “Yeah, I do. I used to work at Bain, but left to work on the organization full-time.”

“Wow, good for you,” he said in the same tone you’d use to address a small child, then immediately looked over my shoulder for someone new to approach…

On my subway ride home that night I began to reflect on the many times that this scenario had happened since I’d started Pencils of Promise. Conversations began on an equal footing, but the word nonprofit could stop a discussion in its tracks and strip our work of its value and true meaning. That one word could shift the conversational dynamic so that the other person was suddenly speaking down to me. As mad as I was at this guy, it suddenly hit me. I was to blame for his lackluster response. With one word, nonprofit, I had described my company as something that stood in stark opposition to the one metric that his company was being most evaluated by. I had used a negative word, non, to detail our work when that inaccurately described what we did. Our primary driver was not the avoidance of profits, but the abundance of social impact…

That night I decided to start using a new phrase that more appropriately labeled the motivation behind our work. By changing the words you use to describe something, you can change how others perceive it. For too long we had allowed society to judge us with shackling expectations that weren’t supportive of scale. I knew that the only way to win the respect of our for-profit peers would be to wed our values and idealism to business acumen. Rather than thinking of ourselves as nonprofit, we would begin to refer to our work as for-purpose.

From The Promise of a Pencil by Adam Braun.