Great work coming out of Mozilla’s Taiwan office

You knew that Mozilla has an office in Taipei now, right? πŸ˜‰

We’ve got a great engineering manager and three great engineers working out of Taipei, so far mostly on Boot2Gecko. I want to briefly call out what they’ve been working on lately. They’ve all just started within the last couple of months and are already kicking butt!

Shian-Yow Wu (shianyow on IRC): Shian-Yow fixed a long-standing bug preventing Wifi from working on b2g phones. He’s lately been hacking on virtual qemu devices that will allow us to test new hardware-interaction DOM APIs, and he continues to debug nasty issues, most recently an infinite loop in a stack dumper and a media crash on startup.

Thinker Lee (thinker): Thinker hit the ground running at Mozilla by setting up a qemu build that runs Firefox on Android (and b2g). He went on to add a fallback path to our WebGL home screen that lets it work when WebGL isn’t available (it uses a 2D canvas instead). Lately he’s been implementing the lower-level parts of a proposed DOM “Sensor” API, allowing web apps to access proximity and ambient-light sensors, possibly among others.

Kan-Ru Chen (kanru): Kan-Ru jumped right into porting apitrace, a GL tracing debugger, to Android/EGL. He’s got it up and running already! Now he’s working on getting it working with Firefox for Android :).

And last but not least, James Ho has been managing our team in Taiwan, and continuously finding more good folks for Mozilla to hire in Taipei.

Please join me in welcoming our Taipei office, if you haven’t already.

pdf.js reached its first milestone

Last Friday, pdf.js reached the state we wanted it to be in before announcing it loudly: it renders the Tracemonkey paper perfectly*. So, we’re announcing it!

Try out version 0.2.

We’re very excited about the progress since the cat was let out of the bag two weeks ago. Below is a comparison of some pages as rendered by the version of pdf.js initially covered by the press and our v0.2 release. In each pair of screenshots, the rendering of the older version is on top, and the rendering of 0.2 is on the bottom.

This is the most dramatic demonstration of pdf.js’s biggest feature in 0.2: loading Type 1 fonts. (In fact, the difference between the captures above should have been even more dramatic, except that we had hard-coded into pdf.js the selection of the font used for most body text in the paper, so that we could more easily focus on other unimplemented features.) Dynamically loading Type 1 fonts into a web application was a big challenge. We’re trying to get Vivien to write about it; stay tuned. It’s hard to overstate how important this feature is for pdf.js.

Figure 2 on this page shows off several parts of pdf.js’s renderer:

  • The very obvious improvement in the labels on elements in the figure in 0.2 is due to pdf.js loading TrueType fonts properly.
  • The shadows under the rounded boxes are masked images, which Shaon implemented.
  • The dashed lines are drawn using a new API we’ve added to Firefox’s <canvas> and are in the process of standardizing (see the sketch below this list).
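
To give a rough idea, here’s a minimal sketch of drawing a dashed stroke with the canvas line-dash API, written against the form that was eventually standardized (ctx.setLineDash); the spelling Firefox shipped at the time differed:

// Minimal sketch: a dashed stroke via the canvas line-dash API.
var canvas = document.createElement("canvas");
canvas.width = 200;
canvas.height = 100;
document.body.appendChild(canvas);
var ctx = canvas.getContext("2d");
ctx.setLineDash([6, 3]);   // 6px of dash, 3px of gap, repeating
ctx.beginPath();
ctx.moveTo(10, 50);
ctx.lineTo(190, 50);
ctx.stroke();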

Figure 4 is another dramatic demonstration of the difference made by loading Type 1 fonts and measuring them accurately.

The prettily colored, filled bars in Figure 10 are also thanks to Shaon; they’re “shading patterns” (custom, parameterized functions) that pdf.js evaluates entirely in JS, drawing the resulting pixel values to canvas. These particular bars are “axial shading” patterns, aka “linear gradients”. The text in the description of Figure 10 also looks vastly better in 0.2, as it mixes several font faces that are now being loaded thanks to Vivien.
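
For the curious, here’s a minimal sketch of the idea behind evaluating an axial shading in JS and blitting the result to canvas. pdfFunction below is a hypothetical stand-in for the parameterized function a PDF supplies; pdf.js’s real implementation differs:

// Evaluate an axial ("linear gradient") shading per pixel, in JS.
function drawAxialShading(ctx, x0, y0, x1, y1, pdfFunction) {
  var width = ctx.canvas.width, height = ctx.canvas.height;
  var img = ctx.createImageData(width, height);
  var dx = x1 - x0, dy = y1 - y0;
  var len2 = dx * dx + dy * dy;
  for (var y = 0; y < height; ++y) {
    for (var x = 0; x < width; ++x) {
      // Project the pixel onto the gradient axis to get t in [0, 1].
      var t = ((x - x0) * dx + (y - y0) * dy) / len2;
      t = Math.min(1, Math.max(0, t));
      var rgb = pdfFunction(t);   // hypothetical: returns [r, g, b] in 0..255
      var i = (y * width + x) * 4;
      img.data[i] = rgb[0];
      img.data[i + 1] = rgb[1];
      img.data[i + 2] = rgb[2];
      img.data[i + 3] = 255;      // fully opaque
    }
  }
  ctx.putImageData(img, 0, 0);
}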

In Figure 12, we see a nice demonstration of a couple more new features in 0.2: the labels in the figure are being drawn because pdf.js now loads TrueType fonts. The hatched segments of bars in the graph are being rendered faithfully now because Shaon implemented tiled fills of patterns.
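
The core trick behind tiled fills is easy to sketch with standard canvas APIs: render one tile to an offscreen canvas, then repeat it as a fill pattern. pdf.js’s real tiling-pattern code is more involved (it has to honor the pattern’s own coordinate space), but the rough shape is:

// Build a diagonal-hatch fill pattern from one small offscreen tile.
function makeHatchPattern(ctx) {
  var tile = document.createElement("canvas");
  tile.width = tile.height = 8;
  var tctx = tile.getContext("2d");
  tctx.strokeStyle = "black";
  tctx.beginPath();
  tctx.moveTo(0, 8);   // one diagonal stroke per 8x8 tile
  tctx.lineTo(8, 0);
  tctx.stroke();
  return ctx.createPattern(tile, "repeat");
}

// Usage:
//   ctx.fillStyle = makeHatchPattern(ctx);
//   ctx.fillRect(10, 10, 80, 40);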

And last but not least, it’s obvious in all the screenshots above that the user interface in version 0.2 is much more usable and prettier than in the initial version. That’s the work of justindarc. This screenshot shows off a really cool new feature of pdf.js’s new viewer: a “preview panel” that pops out when the mouse hovers over the dark bar on the left side of the page. You’ll also notice that the lower screenshot, from 0.2, shows the viewer straddling two pages; the first version, shown above, could only display one page at a time.

We chose the pixel-perfect rendering of this paper as our first milestone because getting there required solving some hard problems, and because it’s easier to focus attention on one target. We want to prove that a competitive HTML5 PDF renderer really is feasible, and not just fun talk. Many more hard problems remain, but we haven’t come across any so far that are so much harder than what we’ve already solved as to make us rethink the viability of pdf.js.

Community

pdf.js has a great and growing community. As we noted above, justindarc totally overhauled the viewer UI. notmasteryet implemented support for encrypted PDFs and embedded JPEGs (among other things). jvierek added a Web Workers backend (among other things) that will be one of the biggest features of our next milestone. sayrer has greatly improved our testing infrastructure. Everyone has done their fair share of bug fixing. The list of contributors will probably have grown between the time we write this and the time you read it, so be sure to check out the current list.

More browsers/OSes, more problems

We intend pdf.js to work in all HTML5-compliant browsers. And that, by definition, means pdf.js should work equally well on all operating systems that those browsers run on.

Reality is different. pdf.js produces different results on pretty much every element in the browser×OS matrix. We said above that pdf.js renders the Tracemonkey paper “perfectly” … if you’re running a Firefox nightly. On a Windows 7 machine where Firefox can use Direct2D and DirectWrite. If you ignore what appears to be a bug in DirectWrite’s font hinting.

The paper is rendered less well on other platforms and in older Firefoxen, and even worse in other browsers. But such is life on the bleeding edge of the web platform.

pdf.js has now reached the point where a significant portion of its issues are actually browser rendering-engine bugs, or missing features. Finding these gaps and filling some of them has been one of the biggest returns on our investment in pdf.js so far.

What’s next?

For our next release, we have two big goals: first is to continue adding features needed to render PDFs (of course!). Our next target is a bit more ambitious: pixel-perfect rendering of the PDF 1.7 specification itself. Work has already begun on this, during the stabilization period for the 0.2 release. Second is to improve pdf.js’s architecture. This itself has two parts: use Web Workers to parallelize computationally-intensive tasks, and allow pdf.js’s main-thread computations to be interrupted to improve UI responsiveness. (Ideally the web platform would allow us to do all computationally-intensive tasks like drawing to <canvas> off the UI thread, but that’s a hard and unsolved problem.)
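
To make the Workers part concrete, here’s a rough sketch of the kind of split we have in mind, under the constraint that workers can’t touch the DOM or a canvas: the worker does the CPU-heavy parsing and posts back a plain list of drawing commands for the main thread to replay. The file name, message shapes, and parsePdf are illustrative, not pdf.js’s actual interfaces; a 2D context ctx and the document’s raw bytes pdfBytes are assumed to be in scope.

// Main thread: hand the heavy lifting to a worker, then replay its output.
var worker = new Worker("parse-worker.js");
worker.onmessage = function (e) {
  var commands = e.data.commands;
  for (var i = 0; i < commands.length; ++i) {
    var cmd = commands[i];              // e.g. { op: "lineTo", args: [x, y] }
    ctx[cmd.op].apply(ctx, cmd.args);   // replay onto the canvas context
  }
};
worker.postMessage({ pdfBytes: pdfBytes });

// parse-worker.js: all parsing happens off the UI thread.
// onmessage = function (e) {
//   var commands = parsePdf(e.data.pdfBytes);   // hypothetical parser
//   postMessage({ commands: commands });
// };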

We can keep moving fast towards rendering the PDF spec because we’re not worried about regressions, thanks mostly to sayrer’s work on testing.

Contribute!

We want pdf.js to be a community-driven and community-governed open-source project. We want to use it for Firefox, but we think there are many cool applications for it. We would love to see it embedded in other browsers or web applications; because it’s written only in standards-compliant web technologies, the code will run in any compliant browser. pdf.js is licensed under a very liberal 3-clause BSD license and we welcome external contributors. We are looking forward to your ideas or code to make pdf.js better! Take a look at our github and our wiki, talk to us on IRC in #pdfjs, and sign up for our mailing list.

Andreas Gal and Chris Jones (and the pdf.js team)

Overview of pdf.js guts

Andreas posted a general overview of pdf.js. I’d like to briefly cover some more-technical parts of the renderer.

pdf.js (currently) parses raw arrays of bytes into streams of PDF “bytecode”, compiles the bytecode into JavaScript programs, then executes the programs. (Yes, it’s a sort of PDF JIT. :) The side effect of those programs is to draw on an HTML5 <canvas>. The commands in the bytecode include simple things like “draw a curve” and “draw this text here”, and more complicated things like filling areas with “shading patterns” specified by custom functions (see Shaon’s post). Additionally, the stream of commands itself, and other data embedded within it like fonts and images, might be compressed and/or encrypted in the raw byte array. pdf.js has basic support for decompressing some of these streams, with all the code written in JavaScript.
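
As a toy illustration of the “PDF JIT” idea, here’s one way to compile a parsed operator list into a single JavaScript function whose side effect is drawing on a canvas context. The operator set here is a tiny invented subset; pdf.js’s real intermediate form is richer:

// Compile a list of { op, args } records into one drawing function.
function compileOps(ops) {
  var body = "ctx.beginPath();\n";
  for (var i = 0; i < ops.length; ++i) {
    var op = ops[i];
    switch (op.op) {
    case "moveTo": body += "ctx.moveTo(" + op.args.join(",") + ");\n"; break;
    case "lineTo": body += "ctx.lineTo(" + op.args.join(",") + ");\n"; break;
    case "stroke": body += "ctx.stroke();\n"; break;
    default: throw new Error("unknown op: " + op.op);
    }
  }
  return new Function("ctx", body);   // the "compiled program"
}

// var draw = compileOps([{ op: "moveTo", args: [0, 0] },
//                        { op: "lineTo", args: [50, 50] },
//                        { op: "stroke", args: [] }]);
// draw(someCanvasContext);   // executing it paints the path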

The rendering of fonts embedded in PDFs using web technologies is a big enough topic to merit its own blog post. A post might eventually appear on Vivien’s, but if not, somewhere else ;).

There are several ways to write a PDF renderer on top of the web platform; pdf.js’s current implementation, drawing to a <canvas>, is just one way. We chose canvas initially because it is (should be) the fastest way for us to draw to the screen. We want the first-paint of pages to be quick so that pdf.js startup feels zippy, and users can start reading content ASAP. Canvas is great for this.

Canvas has problems though. First, it’s missing many features needed to render PDFs. We’ve started adding some of these to the web platform, when it makes sense to do so, and there’s more to come. A second, and much bigger, problem is that while canvas’s immediate-mode design allows us to render with minimal overhead, it means that the user agent doesn’t have enough information to allow users to select text, or to navigate using accessibility interfaces (through a screen reader, e.g.). Printing canvases with high fidelity is another issue.

The web platform already offers a (potential) solution to these problems, however: SVG. SVG is richly featured, retained-mode, and has its own DOM. In theory, user agents should have text-selection, a11y, and printing support for SVG. (If they don’t, it needs to be added.) So, SVG provides (in theory) the features missing from <canvas>, just at a higher cost in the form of more overhead.

Putting all this together, we currently plan on doing a fast first-paint of pages using canvas, concurrently building an SVG document for the page in the background, and when the SVG document is ready, switching to that. Or if that doesn’t work well, we could implement text-selection (and hopefully a11y) in pdf.js itself, on top of canvas, possibly creating new web APIs along the way. Or if, say, font loading dominates the critical path to first-paint, we might only use SVG and forget canvas. It’s great to have these options available.
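
In sketch form, that first plan might look like the following, where paintToCanvas and buildSvgForPage are hypothetical stand-ins for the two renderer backends:

// Fast first paint to canvas, then swap in a richer SVG rendering.
function renderPage(container, page) {
  var canvas = document.createElement("canvas");
  container.appendChild(canvas);
  paintToCanvas(canvas.getContext("2d"), page);   // fast first paint
  setTimeout(function () {
    // Later, off the critical path: build the selectable, a11y-friendly
    // representation and swap it in.
    var svg = buildSvgForPage(page);
    container.replaceChild(svg, canvas);
  }, 0);
}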

There’s a ton of work left on pdf.js, from implementing features to improving the user interface to exploring crazy ideas (like for example using WebGL to speed up rendering). The project is open and the code is libre: we’d love for you to get involved! Have a look at our github and our wiki, or talk to us on IRC in #pdfjs.

Followup to: Filter your tryserver builds, a bit more easily

I made a mistake in not hosting this script from an hg repository; it’s already changed a few times since I originally posted. So, I fixed that. Now one can

hg clone http://hg.mozilla.org/users/cjones_mozilla.com/tryselect
cd $repo
../tryselect/tryselect

Permissions and so forth should be set correctly. (Protip: I symlinked tryselect/tryselect into a directory in my $PATH.)

If you find bugs, please fix them!

Filter your tryserver builds, a bit more easily

(Next in my series of posts designed to make this the most boring blog on p.m.o πŸ˜‰ .)

The new tryserver has the (awesome!) ability to customize builds for each platform, using mozconfig-extra-$platform scripts. For me (and most others?), the common case will be to use the extra files to disable builds for platforms I don’t care about. However, keeping a tryserver-filter patch or patches in my mq, and customizing for each filtered push, didn’t appeal to me. So instead, I wrote a little bash script to automate the process for me.

To use it, download the script linked above to somewhere in your $PATH, then chmod +x tryselect. The script expects you to have a try = ssh://[user]@hg.mozilla.org/try alias in your .hgrc, which you probably already do. Then, change directory to your tree, apply all the patches you want to push, and run tryselect. The script will ask you to list the platforms you want tryserver to build, then confirm the ones it will disable. If all goes well, you should see something like the following (which submitted a tryserver job I only wanted built on win32).

$ tryselect
Enter platforms to build, out of android-r7 linux linux64 macosx macosx64 maemo4 maemo5-gtk maemo5-qt win32
win32
Disabling  android-r7  linux  linux64  macosx  macosx64  maemo4  maemo5-gtk  maemo5-qt , is that right? (y/n): y
pushing to ssh://cjones@mozilla.com@hg.mozilla.org/try
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 12 changesets with 1 changes to 27 files
remote: Trying to insert into pushlog.
remote: Please do not interrupt...
remote: Inserted into the pushlog db sucessfully.
now at: 582057-use-createchild

The usual caveat about this script possibly eating your mq and/or your children, and my lack of liability thereof, applies. Enjoy!

Print nsTArrays in gdb

This code was shamelessly stolen and modified from the pvector command in gdb-stl-views, by Dan Marinescu et al.

##
## nsTArray
##
define ptarray
	if $argc == 0
		help ptarray
	else
		set $size = $arg0.mHdr->mLength
		set $capacity = $arg0.mHdr->mCapacity
		set $size_max = $size - 1
		set $elts = $arg0.Elements()
	end
	if $argc == 1
		set $i = 0
		while $i < $size
			printf "elem[%u]: ", $i
			p *($elts + $i)
			set $i++
		end
	end
	if $argc == 2
		set $idx = $arg1
		if $idx < 0 || $idx > $size_max
			printf "idx is not in acceptable range: [0..%u].\n", $size_max
		else
			printf "elem[%u]: ", $idx
			p *($elts + $idx)
		end
	end
	if $argc == 3
	  set $start_idx = $arg1
	  set $stop_idx = $arg2
	  if $start_idx > $stop_idx
	    set $tmp_idx = $start_idx
	    set $start_idx = $stop_idx
	    set $stop_idx = $tmp_idx
	  end
	  if $start_idx < 0 || $stop_idx < 0 || $start_idx > $size_max || $stop_idx > $size_max
	    printf "idx1, idx2 are not in acceptable range: [0..%u].\n", $size_max
	  else
	    set $i = $start_idx
		while $i <= $stop_idx
			printf "elem[%u]: ", $i
			p *($elts + $i)
			set $i++
		end
	  end
	end
	if $argc > 0
		printf "nsTArray length = %u\n", $size
		printf "nsTArray capacity = %u\n", $capacity
		printf "Element "
		whatis *$elts
	end
end

document ptarray
	Prints nsTArray information.
	Syntax: ptarray <tarray> [idx1 [idx2]]
	Note: idx, idx1 and idx2 must be in acceptable range [0...size()-1].
	Examples:
	ptarray a - Prints tarray content, size, capacity and T typedef
	ptarray a 0 - Prints element[idx] from tarray
	ptarray a 1 2 - Prints elements in range [idx1..idx2] from tarray
end 

Example session

(gdb) ptarray arr
elem[0]: $1 = 1
elem[1]: $2 = 2
elem[2]: $3 = 3
nsTArray length = 3
nsTArray capacity = 4
Element type = int
(gdb) ptarray arr 1
elem[1]: $4 = 2
nsTArray length = 3
nsTArray capacity = 4
Element type = int
(gdb) ptarray arr 1 2
elem[1]: $5 = 2
elem[2]: $6 = 3
nsTArray length = 3
nsTArray capacity = 4
Element type = int

Save yourself some license template copypasta (in emacs)

Add this to your .emacs

(defun insert-mpl-tri-license () (interactive)
  (insert 
"/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*-\n"
" * vim: sw=2 ts=8 et :\n"
" */\n"
"/* ***** BEGIN LICENSE BLOCK *****\n"
" * Version: MPL 1.1/GPL 2.0/LGPL 2.1\n"
" *\n"
" * The contents of this file are subject to the Mozilla Public License Version\n"
" * 1.1 (the \"License\"); you may not use this file except in compliance with\n"
" * the License. You may obtain a copy of the License at:\n"
" * http://www.mozilla.org/MPL/\n"
" *\n"
" * Software distributed under the License is distributed on an \"AS IS\" basis,\n"
" * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License\n"
" * for the specific language governing rights and limitations under the\n"
" * License.\n"
" *\n"
" * The Original Code is Mozilla Code.\n"
" *\n"
" * The Initial Developer of the Original Code is\n"
" *   The Mozilla Foundation\n"
" * Portions created by the Initial Developer are Copyright (C) 2010\n"
" * the Initial Developer. All Rights Reserved.\n"
" *\n"
" * Contributor(s):\n"
" *   [YOUR NAME/EMAIL HERE]\n"
" *\n"
" * Alternatively, the contents of this file may be used under the terms of\n"
" * either the GNU General Public License Version 2 or later (the \"GPL\"), or\n"
" * the GNU Lesser General Public License Version 2.1 or later (the \"LGPL\"),\n"
" * in which case the provisions of the GPL or the LGPL are applicable instead\n"
" * of those above. If you wish to allow use of your version of this file only\n"
" * under the terms of either the GPL or the LGPL, and not to allow others to\n"
" * use your version of this file under the terms of the MPL, indicate your\n"
" * decision by deleting the provisions above and replace them with the notice\n"
" * and other provisions required by the GPL or the LGPL. If you do not delete\n"
" * the provisions above, a recipient may use your version of this file under\n"
" * the terms of any one of the MPL, the GPL or the LGPL.\n"
" *\n"
" * ***** END LICENSE BLOCK ***** */\n"))
(global-set-key "\C-xg" 'insert-mpl-tri-license)

(replacing “[YOUR NAME/EMAIL HERE]” of course), then “C-x g” to victory.

Helping ld link libxul more quickly

I used to do most of my development in a Linux virtual machine with 1.5GB of RAM. There, --enable-libxul builds were … painful: linking libxul itself took about 2 minutes and 30 seconds. Ideally, ld or gold would know how to re-link incrementally, and the problem would vanish. But until then, you can help ld out by adding a flag in your mozconfig:

...
export LDFLAGS="-Wl,--no-keep-memory"

This tells ld to optimize for memory usage rather than speed. On my 1.5GB VM, this made a big difference because it kept ld from hitting swap as much; the link time went from ~2:30 to ~1:30. Also see https://bugzilla.mozilla.org/show_bug.cgi?id=494068.

CAVEAT EMPTOR: if you have a machine with “lots of RAM”, this flag might actually hurt link times. Also, it’s worth pointing out that the best way to speed up libxul link times is to add more RAM to your build machine, if possible. (My bright shiny new build machine can link libxul in about 5 seconds, so I don’t use --no-keep-memory anymore.)

Introducing porky.py: Low-fat pork

Porky.py (pronounced “porky pie”) is a simple C++ rewriting tool built on top of pork. Porky.py aims to make pork usable for a larger class of code rewriting problems by lowering pork’s high learning curve and making it easier to code up rewrite passes.

If you want to skip the exposition and play with the code, it’s available here. Just follow the pork install instructions to get started.

Background

Porky.py started back in April when I wanted to rewrite a bunch of code. I was (well, am still, sigh) replacing a C API with a new C++ API. Basically, code that looked like this

PRLock* lock = PR_NewLock();
PR_Lock(lock);
PR_Unlock(lock);
PR_DestroyLock(lock);

was going to be changed into this

Mutex* lock = new Mutex();
lock->Lock();
lock->Unlock();
delete lock;

I quickly estimated that these old APIs were used in O(1000) places in our code, which was way more than I wanted to edit by hand. (I’m lazy, so sue me.) So I wanted an automated tool. However, this rewrite task is just beyond the reach of regular-expression-based tools like sed; existing code could do something like PR_Lock(GetStruct().GetPRLock()), which causes sed to barf. Of course there’s the pork tool, which can eat this kind of rewrite for breakfast, but looking at existing pork tools convinced me that, for this relatively simple rewrite, it was going to be a waste of work to (i) learn pork’s AST classes; (ii) learn its AST visitor idioms; (iii) learn the non-standard utility libraries pork depends on (sm::string, FakeList, …); (iv) learn pork’s patch generation library; (v) code the tool in C++ … sigh.

So I was stuck with either doing a half-assed rewrite with sed and spending a few days fixing up its mistakes, or wasting a week or so coding a pork tool. Half-assed rewrites suck. But in the bug report I filed, I realized that I was naturally describing the rewrite in a way that an automated tool could understand. I have some background in programming languages, so in the spirit of Terence Parr

“Why program by hand in five days what you can spend five years (er, days) of your life automating?”

I decided to write my own tool on top of pork.

Porky.py’s specification language

Porky.py’s domain-specific language (DSL) for rewrites was designed with this use case in mind

  • a C++ developer not familiar with porky.py wants to do something like my API rewrite example above
  • doesn’t want to spend several days learning pork
  • and doesn’t want to learn an obscure DSL
  • (it’d be nice if the tool were fast, too)

So these requirements to me implied that, first, the porky.py DSL should be “minimal” in the sense of minimal additional syntax beyond C++’s — less to learn. And second, porky.py should target expression-level rewrites (as API changes usually are) rather than statement level. Statement-level rewrites complicate things.

Below is a working porky.py solution to the rewrite problem posed above. You can decide for yourself whether it meets my criteria.

rewrite SyncPrimitiveUpgrade {
  type PRLock* => Mutex*
  call PR_NewLock() => new Mutex()
  call PR_Lock(lock) => lock->Lock()
  call PR_Unlock(lock) => lock->Unlock()
  call PR_DestroyLock(lock) => delete lock
}

The rewrite rule type PRLock* => Mutex* means: everywhere the type “PRLock*” appears, change it into “Mutex*”. The second kind of rule here, PR_Lock(lock) => lock->Lock(), is more interesting; it means that, at any callsite matching

PR_Lock($lock$)

where $lock$ is any expression, change this line into

$lock$->Lock()

These kinds of rules are porky.py’s big advantage over sed et al.: because porky.py has access to a C++ AST through pork, it can match patterns that require strictly more power than regular expressions provide. One can write rules like call Foo(a, b, c) => c.Method(b, a), and the rule will rewrite call sites like Foo(x().y().z(), r(s(t)), u.v.w()) into u.v.w().Method(r(s(t)), x().y().z()).

And finally, porky.py provides the creature comfort of one-liner shell invocations for really simple rewrites

porkyc -e 'call SomeFun(a, b) => OtherFun(b, a)'

After which the compiled pork tool can be invoked. (Docs forthcoming on MDC.)

Code rewriting workflow when using porky.py

After writing porky.py, I used it to edit a large quantity of code in a couple of hours. These patches haven’t all made it into mozilla-central yet (for a variety of reasons), but I wanted to show the steps I took to generate them. This will eventually find its way into an MDC guide.

$ porkyc -m sync_primitive_upgrade.porky
  (outputs and compiles code in |SyncPrimitiveUpgrade.code/|)
$ SyncPrimitiveUpgrade.code/dorewrite ~/mozilla-code/*.ii -x *nspr* > mozilla-code.patch
   (does n-way parallel rewrite on matching files; n depends on your system)
   (doesn't include files matching the pattern *nspr* in the patch)
   (writes patch to stdout)

Next, I would apply this patch and compile, fixing up problems by hand (hey, porky.py is a prototype). Then it was hg qdiff and the patch was up for review. I could generate these patches much faster than they could be reviewed; review ended up being the bottleneck.

Eventual goal for porky.py

I’d like it to support this rewrite

rewrite FooToBar {
  class Foo => Bar {
    member mMember => member_
    method Method(a1, a2) => method(a2, a1)
  }
}

which would entail

  • rename class Foo into class Bar
  • rename type “Foo” into type “Bar” (including Foo*, Foo&, …)
  • change calls to Foo constructors into Bar constructors
  • rename declaration of Foo.mMember into Bar.member_
  • convert accesses of ((Foo)inst).mMember into ((Bar)inst).member_ (and similarly for inst->mMember, …)
  • rename declaration of Foo::Method into Bar::method
  • rename implementation of Foo::Method into Bar::method
  • convert calls to ((Foo)inst).Method(a1, a2) into ((Bar)inst).method(a2, a1) (and similarly for inst->Method(a1, a2), …, and similarly for subclasses of Foo, …)

I should note that having porky.py rewrite declarations and definitions is not so important (though it would be nice!): there is only one declaration/definition. Rewriting uses is much more important, since there are any number of uses.

I won’t implement this kind of rewrite until I need it. Sorry! But please feel free to dive into the porky.py code and do it yourself!

How porky.py fits into the “rewrite tool space”

Rewrite tools have to trade off several factors. It’s good to have a small, familiar DSL, as these are easier to learn and remember. But it’s also good to have a large and expressive DSL, for raw rewrite power. The table below is my attempt to fit porky.py into the space of relevant rewrite approaches I’m aware of. It compares porky.py with

  • pork. Rewrites specs are written in C++, which is not obscure to C++ programmers. Rewrite specs are very verbose. Any possible rewrite of C++ code can be expressed in pork. Pork is very fast.
  • XML+XSLT. Rewrite specs are written in XML/XSLT modeled on a particular C++ AST; very obscure. Rewrite specs are relatively concise. Any possible rewrite can be expressed. Very slow.
  • Tree transformation (e.g. in ANTLR). Very obscure DSL. Rewrite specs are relatively concise. Can express (usually) any possible rewrite. Usually relatively slow.
  • Coccinelle/SmPl. DSL relatively familiar. Rewrite specs concise. Can express most statement-level rewrites. Can be fast.
  • porky.py. DSL familiar. Rewrite specs concise. Express some expression-level rewrites. Fast.
  • sed. DSL familiar. Specs concise. Extremely limited power. Very fast.
                     <---- More LoC (bad) ----- Fewer LoC (good) --->
                     <-- Less obscure (good) -- More obscure (bad) -->

                       No DSL  |  Some DSL  |  More DSL  |  All DSL
                     +---------+------------+------------+-----------
        All possible |  Pork   |    ???     |    ???     | XML+XSLT / tree trans.
          ~Statement |         |            | coccinelle*|
         ~Expression |         |  porky.py  |            |
               Names |         |            |            |
 Crappy rename hacks |  sed**  |            |            |
 -------------
  * only works for C code
 ** assumption: regular expressions are well-known enough that
    negative "DSL" connotations don't apply.

To be honest, the biggest lesson I learned from this project is that pork can be a good lower-level “engine” for higher-level tools. I didn’t know about Coccinelle when I wrote porky.py; if I had, I might have tried retargeting it to pork instead of starting from scratch. I prefer some of porky’s syntax/semantics to Coccinelle’s, but I think the additional complexity of SmPL adds a compelling amount of power over porky.py’s simpler DSL. Upgrading SmPL to parse C++ and retargeting it to pork might be a good project for someone else.

But of course, since Coccinelle doesn’t understand C++, we’re “stuck” with porky.py for the foreseeable future ;).

Porky.py for language nerds

Porky.py is implemented as a relatively simple source-to-source translator written in Python. It converts porky.py specifications into a C++ header containing rewrite rules defined in a sort of “bytecode.” This header is included by a general porky.py C++ tool that uses pork. This tool “interprets” the rules, and if a rule matches part of the C++ AST, the tool generates a patch hunk according to the porky.py spec. This is similar to how a regular-expression engine implements a “replace” function, although the matching is obviously quite different.

There were a few interesting problems that arose while I designed the porky.py language. The first was what the semantics of rule matching should be. The issue is that multiple rules can match the same program text. For example, in the spec

call Foo => Bar
  (means "rewrite all calls to Foo into Bar, regardless of arguments")
call Foo(a, b) => Bar(b, a)
  (means "rewrite only the two-argument version of Foo into Bar, reverse the args")

both rules will match Foo(1, 2). Which should be used? My solution was to use the most “specific” rule. I defined what “specific” means by the following rules. First, a call pattern with arguments is “more specific than” a call pattern without arguments. And second, a “literal” pattern is “more specific than” a wildcard pattern; for example, Foo::kSomething is more specific than f. Porky.py implements these heuristics by ordering all the rewrite rules at compile time by decreasing “specificity” and then attempting matches in that order. The first rule to match is chosen for the rewrite.
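
Here’s an illustrative sketch of that ordering (in JavaScript for brevity; porky.py itself is Python and C++). The specificity scoring and the matches predicate are simplified stand-ins for the real logic:

// Score a rule: an argument list beats none; each literal argument
// beats a wildcard.
function specificity(rule) {
  if (!rule.args)
    return 0;
  var score = 1;
  for (var i = 0; i < rule.args.length; ++i) {
    if (rule.args[i].literal)
      ++score;
  }
  return score;
}

// Sort rules by decreasing specificity; the first match wins.
function pickRule(rules, callsite) {
  var ranked = rules.slice().sort(function (a, b) {
    return specificity(b) - specificity(a);
  });
  for (var i = 0; i < ranked.length; ++i) {
    if (matches(ranked[i], callsite))   // matching logic elided
      return ranked[i];
  }
  return null;
}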

A second issue that arose was how rewrite specifications should express “literal” patterns (i.e., match this exact text) vs. “wildcard” patterns (i.e., match any expression in this syntactic slot). The problem arises in a rule like call Foo(Bar, a) => Baz(a, Bar): are “Foo”, “a”, and “Bar” literals or wildcards? I didn’t want to add special syntax for wildcards because of the “minimal DSL” design goal. So, my solution was to be as “greedy” as possible about choosing wildcard variables; I think this is likely to be least surprising. In the example above, “Foo” shouldn’t be a wildcard because if it were, it would match any function call with two arguments. That would be silly. But both “a” and “Bar” are wildcards; in fact, any identifier used as a C++ “expression variable” (other than a function name) is treated as a wildcard. The “escape hatch” is C++ namespace qualification; if in the example above “Bar” were meant to match only some global symbol “Bar”, then the pattern could have been written as call Foo(::Bar, a). There are numerous other heuristics possible here, and I may change porky.py’s depending on feedback.

A third issue was how porky.py expressions should be typed, i.e., what their C++ type should be. For example, the porky.py rule call foo->Bar() => foo->Baz() seems innocent enough when one is thinking “foo is a Foo instance,” but how can porky.py glean that information? (Note that this is not a problem for the RHS “foo”, since its type is already known by the time porky.py is ready to generate a patch hunk.) My solution was to have porky.py programmers write these LHS method calls in desugared form; for the example above, call Foo::Bar(foo) => foo->Baz(). Eventually, though, I would like to use the class syntax I introduced above to resolve this ambiguity, for example: class Foo { call Bar => call Baz } (note, though, that this isn’t implemented yet). C++ inheritance also complicates things here. When implementing porky.py I punted on inheritance (hey, it was five days’ work!), but I think the problems it presents have reasonable solutions.

Hacking porky.py

If you’re interested in taking up any of the porky.py extensions I suggested here, or just want some porky.py technical support, send me an e-mail at cjones@mozilla.com or drop by the #static channel at irc.mozilla.org.

Multi-process Firefox, coming to an Internets near you

Benjamin Smedberg recently discussed the motivation for splitting Firefox into multiple processes, so I won’t recap that here. Instead, I want to demonstrate what we’ve accomplished so far. The video below shows our nearly Phase I-complete browser. (Back and Forward don’t work yet, but are relatively easy to add.)

First, I browse around. Nothing particularly exciting there, except that two Firefox programs are running — Firefox itself, and gecko-iframe. The second program is new: it’s drawing the web pages to the screen. Currently in Firefox, this is all done within the same program.

The fun comes when I kill -9 this gecko-iframe, the “tab” containing mozilla.com. To the non-geeky, invoking kill -9 on a program causes it to crash IMMEDIATELY. This simulates what would happen if, say, you tried to run a buggy plugin and it got itself into trouble. Notice that only the “content” disappears when the page crashes; the user interface itself keeps running as if nothing happened. This is a big step forward! If I were to kill -9 the current version of Firefox, everything would die, user interface and tabs.

With Firefox protected from buggy pages and plugins, more fun is possible. This video shows me pressing a “Recover” button that relaunches the page that just crashed. There are many more possibilities for recovering from these errors, and I’m excited to see what our user interface folks cook up.

This demo shows off a lot of hard work from Ben Turner, Benjamin Smedberg, Boris Zbarsky, and Joe Drew. (We also have plugins running in their own, separate processes, in an incomplete way. However, the plugins still refuse to draw to the screen and so wouldn’t make for a very good demo.) Drop by #content on IRC or read mozilla.dev.tech.dom if you want to find out more details of what the Electrolysis team is up to.