Mozilla has a policy against unlawful harassment in place with respect to employment. I believe its conditions should be considered as criteria for participation in communities and communication channels that we, as an organization, own, operate, aggregate or re-publish under our name and trademarks. I’m surprised and saddened that there is any question over this.

This has nothing to do with separating “work” from “non-work” or “mozilla-related” from “personal”, or having a “political” difference of opinion. It has to do with denoting behavior that is not acceptable within a community. There’s no regulation against relating a funny story from outside of work, or showing your vacation pictures to a co-worker. There is a regulation against harassing someone over their sexual orientation (or race, religion, gender, and several other protected characteristics). That’s the line and you should not cross it.

It is not difficult to have a code of conduct. State it clearly, make sure everyone understands why you have it, and enforce it when it is violated. Here is a popular example.

Rust 0.1 release

January 20th, 2012

Another six months and we have our first release. Rust is now at a point where we’d like to invite people to write code in it. It’s not set in stone — things will change, code will break — but we’re comfortable expanding the set of users a little beyond the set of compiler hackers. You can write interesting programs and libraries in it now; we’d like to see adventurous people try to do so, and see how it goes.

Since my last post, Rust has gained some new features:

  • Expanded documentation, including a tutorial and library API reference.
  • Working stack growth, stack unwinding and destructors.
  • Improved safe reference analysis and argument-passing logic.
  • Extension of the closure system to handle capture clauses and uniqueness.
  • Maturation of the system of unique pointers and move semantics, to the point where spawning unique closures directly as subtasks works reliably.
  • An x86-64 port in addition to the x86 port, including multi-target compilation.
  • New tools, rustdoc and cargo, to document and manage rust packages.
  • A system of dynamic dispatch, constrained generics and static overloading using decoupled interfaces and implementations, similar to Haskell typeclasses.

Release downloads are available at the Rust website.

Rust progress

August 17th, 2011

It’s been a while since the last update here, and a lot has happened.

At the end of April we successfully bootstrapped, freeing ourselves to focus on rustc, the self-hosted compiler. Over the next few months the compiler was repeatedly rewritten and upgraded by our amazing crew of volunteers, interns and full-time employees.

In terms of “features”:

  • Brian Anderson (now full-time!) implemented a far more reasonable metadata system (with C#-style per-item attributes) and, based on that, a complete integrated unit-testing system.
  • Patrick Walton rewrote the type system, introduced interior vectors and unique pointers, tuned the compiler’s performance substantially (see numbers below) while replacing any parts that seemed unsalvageable, and implemented a new shape-based scheme for type glue.
  • Marijn Haverbeke overhauled our syntax, implemented a pretty printer, added support for many functional idioms (function literals, pattern matching with destructuring), made logging dynamically controllable, wrote an emacs mode and a new scheme for destructors.
  • Rafael Ávila de Espíndola implemented the beginnings of an ‘unsafe’ subdialect, and removed much of rustboot’s nonstandard system interfaces in favour of a more conventional approach. Our compiler driver, linkage model, runtime startup code, memory model and ABI all work much more like “normal” C++ compilers now.
  • Rob Arnold implemented an async IO system based on libuv, and brought a lot of sanity and debugging machinery to our runtime system.
  • Erick Tryzelaar made a number of additional syntax overhauls and build system improvements.
  • Kelly Wilson contributed library code and LLVM interface and driver fixes.
  • Josh Matthews made our error messages show you what’s wrong.
  • Jesse Ruderman added a fuzzer.
  • Tim Chevalier wrote the typestate system. Completely!
  • Lindsey Kuper wrote most of the interesting parts of the object system (self-dispatch, extension and overriding).
  • Eric Holk rewrote our task-switching system to use standard ucontexts, wrote a new multithreaded scheduler for these tasks, and rewrote the communication system as a library that works under the new task semantics.
  • Michael Sullivan implemented move semantics and environment-capturing closures of various sorts (including an argument-type-inferring one for “blocks”).
  • Paul Stansifer wrote a full, multi-grammar, safe and powerful macro system.
  • Everyone on this list did absolutely amazing amounts of cleanup, bug-fixing, documentation, testing, redesign, dead code elimination, and generally high-quality software engineering. I’m consistently humbled by the volume, thoroughness and precision of the work landing all summer.

More “numerically”:

  • We’ve seen nearly 3000 commits during this period.
  • The compiler doubled in lines-of-code.
  • Its binary image has shrunk to a quarter of its initial size.
  • Build time has improved by a factor of 40 or so since the initial bootstrap; it now takes around a minute (depending on hardware) despite the doubling of input.

We are now in the process of winding down summer internships and shifting focus to producing a somewhat more-stable “release” that can be used by a wider audience, sometime in the fall or early winter.

Rust status

March 22nd, 2011

The past few months have involved a long, slow, feature-at-a-time climb toward bootstrapping; we’ve made over 1000 commits since the last update and produced some 20,000 additional lines of rust code. We can now compile the standard library with rustc and are just a few remaining pieces away from compiling rustc with itself. In the process we’ve developed a good set of ideas about where the language could use improvement and are excited to get going on actually evolving it. This summer should be exciting.

In the meantime the team has grown. Patrick Walton has returned from contributing to Firefox 4 to do rust full time, and we’ve been joined by new full-timers Rafael Espindola and Marijn Haverbeke, as well as our first two interns of the year, Lindsey Kuper and Tim Chevalier, and our remarkably effective volunteer Brian Anderson. Everyone’s producing code at a healthy rate now; the combined pace of development is amazing to watch.

Welcome everyone!

Rust progress

October 2nd, 2010

Three months ago we introduced the Rust programming language, a new lower-level language that Mozilla is developing. At that point the bootstrap compiler was just beginning to support interesting constructs, the runtime system only worked correctly in single-threaded mode, and library code was mostly nonexistent.

There has been a fair bit of work since then:

  • We’ve made over 500 commits to the code.
  • The bootstrap compiler and runtime have grown by 6000 lines.
  • The self-hosted compiler — the second version of the compiler, written in Rust and compiled with the bootstrap compiler — has grown by 3800 lines and is now capable of lexing, parsing, and translating some minimal programs (hello world, various expressions) through LLVM to executable code.
  • The standard library has grown by 1000 lines, including a small selection of nontrivial data structures and helpers.
  • We’ve discussed and agreed on a number of careful shifts to the language semantics and pragmatics:
    • Compile-time constant value items.
    • A rewrite of the tag system in nominal terms.
    • Shifting to toolchain-supported symbols and self-encoded metadata rather than the existing heavy reliance on DWARF metadata.
    • Changes to the effect system and its classification of stateful values, as well as a system for “freezing” snapshots of mutable data (and thawing them later).
    • Reclassification of various syntactic forms from statements to expressions.

I had a hand in a lot of the above work, but several others have been getting involved as well, including interns, MoCo employees and community members:

  • Michael Bebenita overhauled much of the runtime system to work better with threads, including rewriting the lockless communication queues, implementing a proxy model for inter-thread references, and correcting a number of races and lifecycle bugs in the task model.
  • Roy Frostig fixed a large number of semantic translation bugs — particularly concerning vector manipulation and type descriptors — and implemented many important standard library features (containers, I/O functions, debugging helpers).
  • Patrick Walton rewrote the typechecker and fixed several bugs in the DWARF generator and Win32 object file generator, and also helped with the semantics of recursive types and multiply imported modules.
  • Or Brostovski implemented a number of missing language constructs and contributed several tests.
  • Jeffrey Yasskin added code to the standard library, the LLVM bootstrap backend and its debug information, the tasking system, destructors and the OSX object file generator.
  • Peter Hull contributed translations of various benchmarks.

Of course there remains a very large amount of work ahead. I’m looking forward to watching the self-hosted compiler mature. The fact that it requires rewriting the existing front- and middle-ends means that we have a chance to adjust a number of subtle, systemic implementation choices with the benefit of hindsight; choices that would otherwise be a bit too costly to revisit in the bootstrap compiler, and not obviously worth the effort. The self-hosted compiler also has the pleasant advantage of targeting LLVM alone, which frees us from spending much (or any) time on backend bugs. I expect progress on it to be much quicker.

Error analysis

January 14th, 2009

I also have something interesting to mention: I’ve been working with Taras and Dave’s static analysis system, writing an error-code checking analysis.

The notion of my particular analysis is that you should be able to tell the compiler that some function like:

  bool tryToDoSomethingScary();

is the sort of function that uses its return value as a possible error code, where for example a return value of false means an error occurred and someone ought to handle it. If you do so, via a gcc attribute annotation like __attribute__((user("SETS_ERR"))), you can then run my analysis script on code that calls the error-code-signaling function, and gcc will enforce a bunch of subtle-but-important rules:

  1. If you call the function you can’t ignore its return value. Someone has to check it.
  2. If you call the function, and check the return code (as you must), all subsequent possible code paths after the call must satisfy a bunch of checks:
    1. On a path corresponding to a false return — indicating an error being signaled — you must either call a matching function annotated as “handling” the error in question or you must annotate the calling function to return the same type of error itself, and you must return false on the error path.
    2. On a path corresponding to a true return — indicating a lack of error — you must not call to clean up the error, since no error is pending.

This results in something similar to the model of checked exceptions in Java — the compiler won’t let you get away with ignoring, mis-handling or mis-propagating an error code — only using explicit error codes rather than try/throw/catch and exception specifiers.
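
To make the rules concrete, here is a hypothetical sketch of code shaped the way the analysis expects. Only the __attribute__((user("SETS_ERR"))) spelling comes from the description above; the handler annotation and all of the function names are invented for illustration.

  // Hypothetical illustration only: the "HANDLES_ERR" annotation and every
  // function name are invented; only "SETS_ERR" appears in the post above.
  bool tryToDoSomethingScary() __attribute__((user("SETS_ERR")));
  void recoverFromScaryFailure() __attribute__((user("HANDLES_ERR")));

  // A caller that handles the error locally.
  void handleLocally()
  {
      if (!tryToDoSomethingScary()) {   // rule 1: the return value must be checked
          recoverFromScaryFailure();    // rule 2.1: the false path must handle the error
          return;
      }
      // rule 2.2: no error is pending on the true path, so no handler call here
  }

  // A caller that propagates the error instead: it carries the same annotation
  // and returns false on its own error path, as rule 2.1 allows.
  bool propagateInstead() __attribute__((user("SETS_ERR")));
  bool propagateInstead()
  {
      if (!tryToDoSomethingScary())
          return false;
      return true;
  }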

I’m impressed that this flow-sensitive feature can be implemented in 300 lines of javascript and bolted into a production C++ compiler with a week’s work (and a lot of hand-holding from the pros). Everyone who writes C++ should try it.

Back from the grave

January 14th, 2009

Memetics

Oh fine. Seven facts of minimal significance, but probably less well known:

  1. I’ve broken both my hands and both my feet. More recently I’ve had a tooth knocked out and my face torn open. Been broken a lot.
  2. I walked 1300km last year and biked 2500km. Including a single ride of 600km. Despite this, I look very low-key and not sporty at all. I just don’t know how to drive (no points about “cars I’ve owned”) so I generally transport myself “the long way”. I like it that way.
  3. I haven’t eaten meat since I was 13 and I’ve done sitting meditation off and on since about 16. In some practical respects I’m Buddhist, but probably not enough to call it a religious affiliation.
  4. I spent a solid portion of the 90s happily lost in rave culture and flirting with life on communes.
  5. In my childhood I too acquired the skills of figure skating (what the hell?) and, in fact, curling. But not hockey. I don’t even like hockey. This means I cannot run for public office in Canada, but that’s ok.
  6. I have owned and operated a 300bps acoustic-coupler modem and, later, a Fidonet BBS. Me and computers seem to have known each other a while. My first, like that of many here, was a cassette-tape-fed Z80 machine: though mine was a glorious Timex/Sinclair ZX-81. We’re back down to its form factor now with netbooks, only they have a million times as much memory and run a thousand times faster. And don’t come with a ROM BASIC interpreter.
  7. I dislike television and really anything that moves around too fast. I like things that move slowly or, better yet, sit still. My first and possibly favorite job — incidentally where I learned the most about programming — was at a bookstore, where I got to quietly organize shelves and read things. I should probably be a gardener or a librarian, but I’m somewhat hooked on programming.

I don’t like propagating pyramid-scheme internet memes so I’ll stop with that. No further tagging. I think this blog subspace is fully saturated anyways.

Cycle collector landed

January 5th, 2007

The previous attempt at landing the XPCOM cycle collector failed due to some unacceptable performance regressions. Yesterday, after tidying up a few of the more obvious offenders, we appear to have accomplished a landing that only hurts Tp2 by 5-15%, depending on platform and noise. I know this sounds like a big number and I will endeavor to make it smaller, but it was a big change that causes a lot of new pointer operations and a big new cost center in the GC itself. So getting it that low is mildly satisfying. Thanks to jst, vlad, brendan and others for all the hand-holding.

I’ll repeat my previous qualifications of this work, though. Despite the claims in the literature, no real GC system is free — they all cost time for the scanning and space for the transient garbage — and this is possibly one of the least satisfying GC systems because you have to manually add every class you want to participate in it. In the short term, you can expect: some performance loss, some heap increase, and a good number of leak-analysis tools to complain that memory is being leaked due to assumptions they make about the lifecycle of objects.

It is also reasonably likely that the collector will trigger new crashes; gecko still carries a lot of assumptions about pointer lifetimes, and it’s easy to accidentally write a traversal method that violates one. As we bring more classes into the collection regime and make more pointers strong, such opportunities should decrease.

In coming weeks I’ll try to work through each such problem as it is reported. Please let me know if you have a specific result that’s worrying you, and make sure to CC me on any bug that has a cycle collector frame in its stack.

Cycle collector landing

November 21st, 2006

Tonight I’ve submitted a mostly-final version of my XPCOM cycle collector for general consumption on the development CVS trunk. It’s a big patch — 230k — and has been in review for several months. In addition to the patch itself, testing builds have been available for a couple of weeks. Alas, by sheer number of files touched, it is likely to still cause lots of regressions I haven’t noticed while testing. There’s also a fair bit of logic in there that I touched but only partly understood. I’ve done my best to integrate reviewers’ comments, but more review and feedback this week would be great. Mostly I’ve been trying to keep it from crashing; unfortunately it’s doing something very delicate that is very easy to get wrong and crash.

This work removes all the existing “DOM GC” code and replaces it with a general-purpose, multi-language device that can find and disconnect a conservative subset of cyclic garbage in the browser, if the classes involved in the cycle have been modified to play along. This includes both pure XPCOM cycles, and cycles that cross between script language runtimes such as spidermonkey with independent heaps and pointers connected both ways to XPCOM objects. The modifications required to make an XPCOM class participate are reasonably easy to make, and I’ve included a dozen or so examples throughout the DOM and content classes, which should cover all the types previously involved in DOM GC.

There are several runtime knobs, controlled by environment variables.

  • The variable XPCOM_CC_DO_NOTHING, if set, disables the collector entirely: the code is still present — so any regressions caused by reorganization of code may still exist — but it is inert, so it should be possible to isolate regressions due to new dynamic behavior using this variable.
  • The variable XPCOM_CC_REPORT_STATS, if set, causes the collector to print a fair amount of runtime chatter to the standard error stream, describing its activities.
  • The variable XPCOM_CC_FAULT_IS_FATAL, if set, causes runtime logic errors in the collector to trigger a process abort, rather than the standard behavior of disabling the cycle collector for the remainder of the program’s execution.
  • The variable XPCOM_CC_DRAW_GRAPHS, if set, attempts to spawn the graphviz program “dotty” as a subprocess and feed it a graphic display of any garbage cycle the collector finds, just prior to unlinking the nodes.
  • The variable XPCOM_CC_HOOK_MALLOC, if set, inserts a small additional amount of sanity checking into the collector by ensuring that freed regions are never simultaneously present in the “purple” buffer that the collector ages pointers in.
  • The variable XPCOM_CC_LOG_POINTERS, if set, writes an extremely verbose pointer log to the file “pointer_log” in your current directory. It should only be used in extreme cases such as malfunctioning debuggers.
  • The variable XPCOM_CC_EVENT_DIVISOR, if set to an integer value, controls the resolution of the collector (which is driven from the top-level event loop). If you set this number smaller, more time is spent in the collector but collection increments will be smaller and thus cause shorter pauses. The default value is 64.
  • The variable XPCOM_CC_SCAN_DELAY, if set to an integer value, controls the size of the “purple” buffer pointers are aged in. If you set this number smaller, more time is spent in the collector but there is a shorter delay between forming a garbage cycle and collecting it, so the heap overhead might go down a bit.
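
As a purely illustrative sketch, and not code from the patch, this is roughly what consulting one of the integer-valued knobs amounts to, using the documented default of 64 for XPCOM_CC_EVENT_DIVISOR:

  // Hypothetical sketch, not from the patch: read XPCOM_CC_EVENT_DIVISOR and
  // fall back to the documented default of 64 when it is unset or nonsensical.
  #include <cstdlib>

  static int GetEventDivisor()
  {
      const char* value = std::getenv("XPCOM_CC_EVENT_DIVISOR");
      if (!value) {
          return 64;                    // documented default
      }
      int divisor = std::atoi(value);   // smaller values: more frequent, shorter increments
      return divisor > 0 ? divisor : 64;
  }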

In general, unfortunately, you should expect this patch to cause the browser to use somewhat more memory and somewhat more CPU time than before the patch lands. It is not a panacea. The purpose of the patch is only to provide infrastructure for simpler ownership rules. Rather than try to reason about which pointer ought to own which other pointer inside gecko, and which pointers are safe to consider weak or raw, you now get a simple and universal rule. When in doubt you can now:

  • Modify your class to participate in cycle collection.
  • Change all the pointer members of your class to owning nsCOMPtrs.
  • Ensure that the pointers you can reach transitively through those nsCOMPtrs are also cycle collection participants, or else obviously acyclic subgraphs.
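
For concreteness, here is a rough sketch of what such a participating class can look like, written against the cycle-collection helper macros found in present-day Gecko; those macro names postdate this 2006 patch, and the class itself (MyThing) is purely illustrative.

  // Hypothetical example, not from the patch: a reference-counted XPCOM class
  // whose owning edges are visible to the cycle collector via nsCOMPtr members.
  #include "nsCOMPtr.h"
  #include "nsISupports.h"
  #include "nsCycleCollectionParticipant.h"

  class MyThing final : public nsISupports
  {
  public:
    NS_DECL_CYCLE_COLLECTING_ISUPPORTS
    NS_DECL_CYCLE_COLLECTION_CLASS(MyThing)

  private:
    ~MyThing() = default;

    // Owning pointers become nsCOMPtr members so Traverse/Unlink can see them.
    nsCOMPtr<nsISupports> mOther;
    nsCOMPtr<nsISupports> mAnother;
  };

  // Generates Traverse and Unlink implementations covering the listed members.
  NS_IMPL_CYCLE_COLLECTION(MyThing, mOther, mAnother)
  NS_IMPL_CYCLE_COLLECTING_ADDREF(MyThing)
  NS_IMPL_CYCLE_COLLECTING_RELEASE(MyThing)
  NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(MyThing)
    NS_INTERFACE_MAP_ENTRY(nsISupports)
  NS_INTERFACE_MAP_END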