Console for hacking Mozilla on Windows

This post is just a little tip for hacking Mozilla on Windows. It’s well known that the basic Windows console program is, uh, not good, compared to a Unix console. But it turns out there’s a really nice project on SourceForge named Console that provides multiple tabs, good fonts and scrollback, keyboard shortcuts, and a few other goodies.

Chris Leary asked me to write this up so that other people know about it (and people have been asking lately). Justin Dolske was the person who pointed me to the project in the first place.

Anyway, a few tips for using Console for Mozilla hacking:

  1. It’s just a zip file containing a program. There’s no install, you just run it.
  2. For mozilla-build, you want to create a new “Tab” in the settings. For example, I have one called “Mozilla ia32”, with Shell set to “C:\mozilla-build\start-msvc9.bat” and Startup dir set to “C:\mozilla-build”.
  3. With the latest version of mozilla-build, it doesn’t work right if you just do that. What will happen is that the tab will start up, then shut down as a standard Windows console opens. (Very annoying, because if it’s the only tab, it’ll even close down Console, so you can’t easily fix it.)

What you need to do is find your startup batch file (e.g., start-msvc8.bat) and change the last line from:

    start /d "%USERPROFILE%" "" "%MOZILLABUILD%"\msys\bin\bash --login -i

to

    "%MOZILLABUILD%"\msys\bin\bash --login -i
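
If you would rather script that edit than do it by hand, here is a minimal sketch in Python. It assumes the last non-empty line of your batch file starts with exactly the `start /d ...` prefix shown above. The function and constant names are made up for illustration and are not part of mozilla-build.

```python
# Hypothetical helper: drop the `start ...` wrapper from the final launch
# line of a mozilla-build start-msvc*.bat, so bash runs in the current
# console instead of opening a new one.
START_PREFIX = 'start /d "%USERPROFILE%" "" '
BASH_COMMAND = r'"%MOZILLABUILD%"\msys\bin\bash --login -i'

def patch_batch_text(text):
    """Return text with START_PREFIX removed from the last non-empty line."""
    lines = text.splitlines(keepends=True)  # keep the original line endings
    # Walk backwards to the last non-empty line; that one launches bash.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip():
            if lines[i].startswith(START_PREFIX):
                lines[i] = lines[i][len(START_PREFIX):]
            break
    return "".join(lines)
```

To use it, read the batch file into a string, pass it through `patch_batch_text`, and write the result back (after keeping a backup copy). Running it on an already-patched file changes nothing.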

Console project on SourceForge

Grill Engineering

Time for a weekend-themed blog post.

Background: I always found getting steak done right to be one of the hardest cooking feats. My typical result was to cook the steak for a while, until the outside is kind of burned and it looks done, then serve it up to find the center is cold and red. And there seems to be a lot of contradictory, context-free, unvalidated advice out there, so it’s hard to learn better by reading. Recently I took an engineering approach to grilling and I think I’ve got it figured out. I used lessons from the JägerMonkey project: start by reading up on different designs, analyze their differences, try to figure out who’s got it right, and do some experiments to clear up unanswered questions. I ended up learning 4 big things:

1. Let the steak thaw before cooking. I’ve found it much easier to get the inside done right without charring the outside if the steak is near room temperature. I leave the steaks in the refrigerator overnight or during the day, and then on a cutting board for about 30 minutes before cooking.

2. Season liberally. I used to season steak lightly or not at all, but I’ve found it’s much better if I use plenty of salt and pepper. Natalie swears by sea salt, and I insist on freshly ground pepper. Watching restaurant chefs seems to be the best way to get a baseline for how much to add, and then you can add more or less as you like it.

Adding a little bit of oil before cooking also seems to be standard restaurant practice. I use a little bit of macadamia oil because it does well with high heat and some chef told me to in a YouTube video I can’t find right now. I think it’s mostly to just help the steak not stick to the grill.

Some sources say that putting salt on meat before cooking dries it out, but that is completely wrong.

3. Cook by temperature. This is the most important point: With an accurate thermometer, you can repeatably cook steak to the exact doneness that you like. Some people say poking steaks with a thermometer makes the juices run out, and instead recommend prodding it with a finger to test firmness, or various other things. But I’ve found that I can stick the steaks with a thermometer as many times as I want without harming them, and the thermometer is much more precise than any other method. Maybe if you are a restaurant chef and have cooked 3000 steaks you can do it better with a finger-squeeze, but I sure can’t.

As with everything else, the web disagrees on what temperature is medium rare, medium, and so on. Part of the problem is that the temperature at the center will continue to increase for a few minutes after taking the steak off the grill, so when a temperature is quoted, it’s not clear whether it is the temperature you cook to on the grill or the final temperature.

The critical decision is when to take the steak off the grill, so that’s the temperature you need to figure out. I’ve found that stopping cooking at 125 °F yields a good medium rare steak for me, and 133 °F a medium steak for Natalie. It probably goes up by 5-10 °F afterward, but I haven’t done that experiment yet.

For a one-inch-thick steak, it takes my grill about 8 minutes to cook to 125 °F, so I set the heat on high, cook for 2 minutes, rotate 45° (to create a nice cross-hatched grill mark pattern) and cook for 2 minutes, flip and cook for 2 minutes, rotate 45° again. After another minute or so, I start taking temperature readings with an “instant-read” meat thermometer. Once it’s close to done, I hold the thermometer in and remove the steak from the grill immediately once it reaches the desired temperature.

The keys are: (a) experiment to find out at what temperature it’s done the way you like it, (b) experiment to find out about how long it takes to get to that temperature, (c) monitor the temperature continuously once it is almost done, and (d) use a high-quality thermometer. I use this one, because as an engineer I like fancy instruments and I enjoy taking the temperature of random objects using the infrared function.

4. Rest the steaks. This one isn’t controversial, but not everyone has heard about it. In general, meat should be left to “rest” for a few minutes after cooking. I haven’t figured out exactly how long it should be, but 10 minutes seems a little too long, and I recently saw a random web page that recommended half of cooking time. That would be 4 minutes for my one-inch-thick steaks, so I might try that, but usually it just ends up being however long it takes to finish setting the table and serve the other dishes.

That’s it. That’s all I know, and it’s enough to cook steaks just how we like them. The same basic idea works for pork, chicken, and fish–the main difference is the target temperature.

Crankshaft

The V8 team has dropped Crankshaft, a new JIT system for JavaScript, into their bleeding-edge repo. According to their blog entry, it doubles their speed on 3 of 8 V8 benchmarks, and improves page load time by 12% on JS-heavy pages.

First off: Congratulations to the V8 team. It looks like great work, pushing forward what kinds of things JS can do in the browser. I look forward to checking out the code.

Analysis. I haven’t looked into the details yet, but their blog post has a good summary and I can make some guesses based on my own knowledge of the subject. I think the key features are:

  • Dynamic recompilation. Crankshaft introduces an optimizing compiler that does complex optimizations, such as register allocation and loop-invariant code motion. These optimizations take time, so they would make startup slow if that was the only compiler. But Crankshaft also has a base compiler that starts fast but doesn’t optimize very much: probably less than the V8 compiler, in fact. Only if the code is predicted to run many times will it be compiled with full optimization.
  • Profile-driven type specialization. That means Crankshaft records the types of variables and the targets of function calls at runtime, and then recompiles methods specialized to those types and targets.
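
To make those two ideas concrete, here is a toy model in Python. It is only a sketch of the general technique, not Crankshaft's actual design: the class name, the hot threshold, and the idea of handing in pre-written "specialized" functions are all assumptions for illustration. A function starts on a generic path while the argument types are recorded. Once it is hot and has seen only one type combination, it switches to a specialized version protected by a type guard.

```python
class TieredFunction:
    """Toy model: run a generic path first, then switch to a version
    specialized for the argument types seen while profiling."""

    HOT_THRESHOLD = 3  # hypothetical; real engines tune this carefully

    def __init__(self, generic, specializations):
        self.generic = generic                  # slow path, handles any types
        self.specializations = specializations  # {type tuple: fast function}
        self.calls = 0
        self.seen_types = set()                 # profile gathered at runtime
        self.fast = None                        # (guard types, fast function)

    def __call__(self, *args):
        types = tuple(type(a) for a in args)
        if self.fast is not None:
            guard, fn = self.fast
            if types == guard:       # type guard: assumption still holds
                return fn(*args)
            self.fast = None         # guard failed: back to the generic path
            self.calls = 0
            self.seen_types.clear()
        self.calls += 1
        self.seen_types.add(types)
        # Hot and only one type combination seen: "recompile" specialized.
        if self.calls >= self.HOT_THRESHOLD and len(self.seen_types) == 1:
            if types in self.specializations:
                self.fast = (types, self.specializations[types])
        return self.generic(*args)


def generic_add(a, b):
    return a + b

def int_add(a, b):
    return a + b  # stands in for machine code compiled for ints only

add = TieredFunction(generic_add, {(int, int): int_add})
```

Calling `add(1, 2)` three times makes it hot and installs the int-specialized version. A later `add("a", "b")` fails the guard and falls back to the generic path.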

I have to point out that runtime type specialization for JavaScript was pioneered by Mozilla’s TraceMonkey project. It looks to me like Crankshaft adds three new things to the type specialization mix: (1) profiling over multiple iterations, instead of recording a trace once and then doing type specialization, so they can gather more information, (2) compiling whole methods instead of linear traces, which gives the optimizations a somewhat larger scope and reduces code size, and (3) using ICs along with type specialization.

Another indication that Crankshaft and TraceMonkey are fundamentally related: Crankshaft gets a big boost over V8 on the benchmarks deltablue, richards, and crypto. TraceMonkey gets a big boost over JaegerMonkey on deltablue, richards, and splay.

(Historical note: Most of the fundamentals of JIT optimization were established in the research language Self in the 80s and 90s. Subsequent work has typically focused on porting those techniques to new languages, adapting them to modern processors, and making various incremental improvements. In the 90s and 00s that was done with Java, which gave us our modern high-performance Java JITs. It looks like the 00s and 10s will see it done for JavaScript. It’ll be interesting to see how close JS perf gets to Java.)

Response. The Mozilla JavaScript team and developer community definitely have the skills and resources to enhance our dynamic type specialization system with ICs, more profiling data, wider compilation scope, and whatever else we can think of. So we won’t get left behind.

Also, we’ve already been working on static-analysis-driven type specialization. This means using static analysis to discover the types and targets ahead of time and then compiling with type specialization. The Self researchers found static and dynamic analysis to be about equally effective for optimization, but we won’t know whether that’s true for JS until we’ve tried it.

Brian Hackett created and implemented the type inference project, which is documented in bug 557407 and bug 608741. The code is in the JaegerMonkey repository. Brian is currently fixing bugs and integrating his work into the JaegerMonkey engine, and he already has some very promising performance results.

So, plenty to do after Firefox 4 comes out. In the words of David Anderson, “The game’s back on.”

Kraken Benchmark Update

colinpj tweeted me “Can you post kraken results using the same browser versions + setup?” Here you go:

Browser Sunspider update

IEBlog recently showed some Sunspider results that show them currently ahead of us. I just want to correct that a little bit. :-)

The latest Firefox shown on their graph is Firefox 4.0 beta 6, which is well behind our nightly releases at this point, including in performance. Their graph shows the Chrome 8 nightly, but nightlies for no other browsers. So I decided to do a comparison test using Sunspider 0.9.1 on the latest version I could get for each of the big 5 browsers. Results:

So, not terribly different from their results, except that it shows we’re a hair faster as of now, at least on this machine. We are currently finishing up a last few perf projects before we turn to focus on getting it ready for release. I’m expecting us to get 3-10% faster by then, but that’s just a guess. I’m sure IE is still working on perf as well.

A final point is that the graph shows pretty clearly that Sunspider scores are converging: the days when one browser would be 2x or 6x as fast as another are gone. Now it’s more like 1.1x or 1.2x. I know there are a few more tricks we can play that are too complicated to do before Firefox 4.0, but they would make only a small dent in Sunspider scores. By next year, either we’ll all be fast and basically done with JS perf work, or else it will be time to update the benchmark to help drive the next round of perf improvements.

Update: sdwilsh pointed out to me that the IEBlog post shows a date of “10/25” for Firefox, which isn’t the build date of beta 6. So it might actually be a nightly build from a week or two ago. Not clear.

Jägermonkey: it’s in ur browser!!!

At the beginning of this year, the Mozilla JavaScript team started a new project, code-named JägerMonkey, with a simple goal: make us fast.

Our previous major engine upgrade, TraceMonkey, gave Firefox 3.5 a big speed boost. But while the technology inside TraceMonkey makes it faster than any other engine on certain programs (then and now), it doesn’t help other programs as much. And the web has grown more complex, with more and more JavaScript-intensive demos, apps, and games. And the competition has been getting a lot tougher, with engines that could run fast on bigger and prettier web apps. We knew we needed another major upgrade for Firefox 4.0, to make us fast all around.

So, we went off for 8 months of studying the classic research, reverse engineering the competition, measuring, experimenting, designing, prototyping, analyzing performance, scrutinizing assembly code, redesigning, coding, and lots and lots of debugging. In our copious free time in the weeks between now and release, David Anderson and I intend to blog more about the techniques and technologies we used.

As part of the project, we revamped the JavaScript engine’s fundamental value type, touching about 20,000 lines of code, which I compared to a vascular system transplant in an earlier post. We imported a couple of basic components, the assembler and the regular expression compiler, from WebKit’s JavaScriptCore. We created a new cross-platform whole-method JIT compiler in about 23,000 lines of code. It supports x86, x86-64, and ARM in an almost entirely shared compiler code base, the only JS engine that does so (to our knowledge). And it all works together with the existing TraceMonkey trace JIT compiler.

You can try the new JavaScript engine now in Firefox 4 beta 7. If you try it, you should see:

  • Big improvements in benchmark scores. Those aren’t the main goal–but they are a really convenient target for us to aim at.
  • Things just feel faster, especially big JavaScript-heavy things like Gmail and Facebook. That’s subjective, so as an engineer I feel a bit funny touting it, but that’s what early users are saying, anyway. :-]
  • Cool demos and games work great now. You can play a good game of Super Mario Bros in JavaScript now. Or play some Gameboy. Or try a fluid simulator.

Keep in mind that these are only preview builds, and we are not done yet, which means:

  • We should be a little bit faster yet by the time Firefox 4 is released. In particular, we’re still working on making function calls faster, which should speed up pretty much every non-tiny JavaScript program.
  • If you come across something where our speed is not up to scratch, let us know! (For that matter, if you come across something that Jägermonkey works great on, we’d feel good hearing about that too!) We still have time to fix performance issues or add a key optimization or two. Filing a bug is the most convenient way for us (and this link should save you from any need to dig through Bugzilla). But the important thing for us is to find out, so always feel free to just send us an email.

For me, one of the most satisfying parts of this project has been working together as a team. You can just feel it when a team really comes together, each person knowing what their teammates are up to and naturally supporting and depending on each other. Both the Jägermonkey team and the larger JavaScript team really came together this year–it’s been great.

Another cool thing is that we dared to give key pieces of the project to our interns this summer, and they all came through!

The rest of the JavaScript team made our project possible: they kept the lights on in the rest of the JS world, and put together some critical components Jägermonkey needs to work correctly. We got some nice help integrating with the rest of the browser from mrbkap and peterv. The community helped us out, especially with testing, finding performance problems, and, most of all, cheerleading and moral support. I’d especially like to acknowledge the people outside the core JM team who contributed code: platform engineer Brian Hackett, who created many excellent optimizations; Julian Seward, who needs no introduction, and did us a big favor in porting over WebKit’s assembler; Bill McCloskey, who just started but already wrote a JM patch and is optimizing integration with the tracer; and contributor Jan de Mooij, who wrote several optimization patches.

Finally, here’s the Jägermonkey team: from left to right, Andrew Drake, Alan Pierce, Sean Stangl, David Anderson, Luke Wagner, Chris Leary, and Dave Mandelin.

JaegerTweets

Just a quick note: I’m pretty busy these days getting JaegerMonkey ready to land. But I’m now sending out small status updates on my twitter account ‘dmandelin’.

JägerMonkey Update: Getting Faster

Time for another JägerMonkey update: how far we’ve come, what’s happening next, and our plan to bring it all together in time for Firefox 4.

How far we’ve come. So far this year, we’ve done two huge things:

  • Switched the JavaScript engine’s basic value representation from a pointer-sized value to 2 new 64-bit representations, one for x64, the other for x86 and ARM. This is a huge patch, touching 20,000+ lines of code–think of doing a full vascular system transplant surgically. Luke Wagner just landed this change to the TraceMonkey repository. The main reason for doing this is to enable better JIT code generation, but we are already seeing some small-to-medium speedups on certain programs.
  • Written the JägerMonkey method JIT compiler for x86 (with ARM support mostly there). One of the key challenges was generating good code from SpiderMonkey’s stack-based bytecode. Stack-based bytecodes tend to spend a lot of time reading from and writing to the stack compared to register-based bytecodes like Nitro’s. We designed a compilation strategy that works with our register allocator to boil away most of the stack traffic. We simulate stack operations during compilation and then generate “equivalent” code that keeps things in registers instead of in stack memory. The compiler also has fast paths for arithmetic, PICs, and all the other usual dynamic language JIT stuff. David Anderson led this effort, ably assisted by Sean Stangl.
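
The stack-simulation idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not the JägerMonkey compiler: the opcode names and the output instruction format are invented, locals are assumed to already live in registers, and there is an unlimited supply of registers. The point is that the compiler keeps a compile-time stack of register names, so pushes and pops never turn into memory traffic in the generated code.

```python
def compile_stack_to_registers(bytecode):
    """Translate stack bytecode into three-address register code by
    simulating the operand stack at compile time."""
    sim_stack = []   # compile-time stack: holds register names, not values
    code = []        # emitted register instructions
    next_reg = 0

    def fresh():
        nonlocal next_reg
        reg = f"r{next_reg}"
        next_reg += 1
        return reg

    for op, *operands in bytecode:
        if op == "push_const":
            reg = fresh()
            code.append(("loadi", reg, operands[0]))
            sim_stack.append(reg)
        elif op == "get_local":
            # Assumption: locals already sit in registers, so "pushing"
            # one only updates the simulated stack and emits no code.
            sim_stack.append(f"local{operands[0]}")
        elif op in ("add", "mul"):
            rhs = sim_stack.pop()
            lhs = sim_stack.pop()
            reg = fresh()
            code.append((op, reg, lhs, rhs))
            sim_stack.append(reg)
        elif op == "set_local":
            code.append(("move", f"local{operands[0]}", sim_stack.pop()))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return code


# x = a + b * 2, with locals 0 = a, 1 = b, 2 = x
example = [
    ("get_local", 0), ("get_local", 1), ("push_const", 2),
    ("mul",), ("add",), ("set_local", 2),
]
register_code = compile_stack_to_registers(example)
```

Six stack bytecodes become four register instructions here, and none of them touches a stack in memory.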

At this point, our JIT can generate code about as good as Nitro or V8, except for a few optimizations that we are missing, such as fast paths for the mod operator or comparing floating-point numbers. We also need to make a few more improvements to our register allocator. And, of course, we need to bring up the x64 version of the JIT. But overall, the JIT code is looking very good.

All in all, JägerMonkey is now about 3x faster than the baseline interpreter we started with.

Remaining Performance Work. The areas where our performance is still really hurting are in the runtime: function calls, strings, and regular expressions:

  • Regular expressions. The benchmarks are kind of heavy on regular expressions. We started with a simple regular expression compiler, created by me, extended by Luke Wagner. But there are still a lot of regular expressions it doesn’t compile, which run in the slower regular expression interpreter. Because we now use the same assembler that Nitro does, we can use their regular expression compiler, Yarr, too. Chris Leary took on the job of porting over Yarr to SpiderMonkey, and it’s about to land.
  • Strings. SunSpider is pretty heavy on strings, especially string concatenation and string replace operations. I would imagine those are pretty common for web code, too, so it’s a good thing to optimize. Ropes help a lot with string concatenation, so JavaScript intern Alan Pierce coded up some ropes, which are also about to land. Alan is now working on replacement and other string operations that need performance help.
  • Function calls. As of today, function calls in SpiderMonkey are very slow compared to the competition. One of the main problems there is that we have a very large stack frame that encodes all kinds of optional elements and duplicate copies of information available elsewhere. The current design is convenient for a basic interpreter, but it can’t deliver the kind of fast JavaScript people now expect. Luke Wagner will be applying his surgical skills to the “Stack Frame Evisceration” subproject, which will leave us with a lean stack frame. That plus some JIT improvements should give us fast function calls.
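
For anyone who hasn’t met ropes before, here is a minimal Python sketch of why they help with concatenation. It is a generic textbook-style rope, not Alan’s SpiderMonkey implementation, and the class and method names are made up. Concatenating two ropes just creates a small tree node, and the characters are copied only once, when the full string is actually needed.

```python
class Rope:
    """Toy rope: concatenation builds a tree node instead of copying."""

    def __init__(self, left, right=None):
        if right is None:
            self.leaf = left            # leaf node: wraps a plain str
            self.left = self.right = None
            self.length = len(left)
        else:
            self.leaf = None            # inner node: two child ropes
            self.left, self.right = left, right
            self.length = left.length + right.length

    def __add__(self, other):
        return Rope(self, other)        # constant time, no character copying

    def flatten(self):
        """Materialize the whole string once, left to right, without recursion."""
        parts, stack = [], [self]
        while stack:
            node = stack.pop()
            if node.leaf is not None:
                parts.append(node.leaf)
            else:
                stack.append(node.right)  # push right first so left pops first
                stack.append(node.left)
        return "".join(parts)


greeting = Rope("ab") + Rope("cd") + Rope("ef")
```

Building a long string from many small pieces this way costs one small node per `+`, plus a single pass in `flatten()` at the end.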

Once we get these items and the JIT improvements already discussed, we should be fast. There are about 30 bugs on file for JM performance, some easy, some hard, some compiler, some runtime, some big wins, some small–anyone who wants to help make us fast should check out the list.

Real Artists Ship. Being fast only counts if it ships. Getting us ready to ship is the priority focus right now. Shipping is mostly about getting out a beta and finding and fixing bugs. There are a couple of big chunks we need to do for a beta JM:

  • x64 JIT compiler. Sean Stangl is moving right along from the x86 JIT compiler to the x64 version. It will be basically the same design, but should be even more effective because x64 has so many more registers.
  • Integration with the trace JIT. Of course, we already have a JIT, the tracing JIT, which generates excellent code for certain programs, especially simple math kernels. So we need to be able to switch back and forth between the method JIT and the trace JIT, hopefully at the ideal times. David Anderson is now working on this. Combining the two systems most effectively will take a lot of tuning, which will have to be ongoing as the performance characteristics of both JITs are still improving.
  • Debugging JIT code. Running 4x slower as soon as you turn on Firebug is not as good as, say, not running 4x slower. We are going to make it possible to debug jitted JavaScript code, so there should be minimal slowdown during debugging. JägerMonkey intern Andrew Drake is taking care of this part. So far, he has already solved the hard problem, setting breakpoints in JIT code, by adding a dynamic recompilation feature to JM. Once he fills in the API implementations, it should be ready to go.

That’s our plan. Right now, things are going well–we are actually having a kind of traffic jam landing JavaScript patches (including both our perf work and other JS work). We’ll update as work continues.

JägerMonkey: the “halfway” point

JägerMonkey has reached a halfway point: we’ve closed about half the performance gap between our baseline performance (with no tracing) and the competition. You can see this on arewefastyet.com, a site David Anderson created to track our progress. Thanks also to the anonymous contributor who gave us an improved page design.

So far. That improvement represents about 6-8 weeks of work. Major performance improvements we did during that time:

  • Polymorphic inline caching (PICs) for object property access. We actually had a pretty good system for optimizing properties before, the property cache. But the property cache requires calls to C++ functions, taking us off the super-fast native jit code. PICs are similar to the property cache, but are more amenable to jit code.
  • More compiler “fast paths”. There are two basic ways to implement an operation in a compiler like JM: either calling out to a C++ “stub function”, or inline with the jit code in a “fast path”. We added fast paths for more operations, so we can potentially run about 80% of the operations in the SunSpider/v8 benchmarks in pure jit code. (We’re aiming for 95-99%.)
  • Register allocation and local optimizations. We’ve enhanced the compiler so that it uses machine registers more efficiently, trying to hold values in registers and reuse them instead of always loading from and storing to memory.
  • Improving global variables. This one is still in progress, but we’ve already posted some perf wins from it. We’re completely overhauling the way global variable accesses are resolved and compiled to make them the fastest they can be in a JM-style system.
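
Here is a rough Python model of a polymorphic inline cache for a single property-access site. It is a sketch of the general technique only: `Shape`, `Obj`, `PropertyIC`, and the entry limit are hypothetical names and numbers, not SpiderMonkey’s. In a real JIT the fast path is a few inline machine instructions, and the slow path is a call out to a C++ stub function.

```python
class Shape:
    """Hypothetical stand-in for an engine's hidden object-layout descriptor."""
    def __init__(self, slots):
        self.slots = {name: i for i, name in enumerate(slots)}

class Obj:
    def __init__(self, shape, values):
        self.shape = shape
        self.values = list(values)

class PropertyIC:
    """Toy polymorphic inline cache for one property-access site."""
    MAX_ENTRIES = 4   # hypothetical limit before we stop adding entries

    def __init__(self, name):
        self.name = name
        self.entries = []      # (shape, slot index) pairs seen at this site
        self.slow_calls = 0

    def get(self, obj):
        # Fast path: compare shapes by identity, then load from a fixed slot.
        for shape, slot in self.entries:
            if obj.shape is shape:
                return obj.values[slot]
        return self._slow_get(obj)

    def _slow_get(self, obj):
        # Slow path ("stub"): do the full lookup, then remember the answer.
        self.slow_calls += 1
        slot = obj.shape.slots[self.name]
        if len(self.entries) < self.MAX_ENTRIES:
            self.entries.append((obj.shape, slot))
        return obj.values[slot]


point2d = Shape(["x", "y"])
point3d = Shape(["z", "x"])
ic = PropertyIC("x")
a = Obj(point2d, [1, 2])
b = Obj(point2d, [10, 20])
c = Obj(point3d, [7, 8])
```

Reading `x` from `a` takes the slow path once. Reading it from `b`, which has the same shape, then hits the cache. `c` has a different shape, so it adds a second entry, which is the “polymorphic” part.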

I want to add that we’ve referred to JSC (WebKit’s JS engine) and V8 frequently. We’ve been striving to build on what’s been figured out already rather than rediscovering everything. In particular, we took a lot of the design ideas for PICs and globals from JSC, and some more design ideas for PICs and the concept for register allocation from V8. So, credit and thanks to the JSC and V8 teams and their open source efforts.

Next. We have a ton of work left to do, and it’s not easily summarized, so I’ll just mention some highlights.

The biggest ongoing piece of work is our new JavaScript value representation. In the old interpreter, a value is represented by a machine word with 1-3 tag bits and 29-31 bits (on a 32-bit machine) of value payload. The biggest cost of this scheme is that floating-point numbers require 64 bits, so they don’t fit. Instead, floating-point numbers are stored on the heap, and the tagged value contains a pointer. This makes creating, reading, writing, and cleaning up floating-point values much more expensive.

The new values are planned to be 128 bits, with a full 64-bit payload. Thus, floating-point numbers can be stored directly in the value. Also, the tag bits are off to the side so they don’t have to be added or removed with bit operations.
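
The difference between the two schemes can be sketched in Python. The layout details are simplified assumptions: a single low tag bit for integers (the real scheme uses 1-3 tag bits), a Python list standing in for the heap, and a `(tag, payload)` pair standing in for a value with the tag off to the side. None of the names come from the engine.

```python
import struct

INT_TAG = 1   # hypothetical layout: low bit set means "31-bit integer"

def old_box(value, heap):
    """Pack a value into one 32-bit word; doubles spill to the heap."""
    if isinstance(value, int) and -(1 << 30) <= value < (1 << 30):
        return ((value << 1) | INT_TAG) & 0xFFFFFFFF
    # A double needs 64 bits, so the word can only hold a "pointer" to it.
    heap.append(float(value))
    return (len(heap) - 1) * 8        # 8-byte aligned, so the low bit is 0

def old_unbox(word, heap):
    if word & INT_TAG:
        if word & 0x80000000:         # restore the sign of a negative int
            word -= 1 << 32
        return word >> 1              # strip the tag with a shift
    return heap[word // 8]            # extra memory access for doubles

def new_box(value):
    """Keep the tag beside a full 64-bit payload: (tag, 8 payload bytes)."""
    if isinstance(value, int):
        return ("int", struct.pack("<q", value))
    return ("double", struct.pack("<d", value))

def new_unbox(boxed):
    tag, payload = boxed
    fmt = "<q" if tag == "int" else "<d"
    return struct.unpack(fmt, payload)[0]
```

In the old scheme every double costs a heap allocation and an extra load, and integers need shifting to add or remove the tag. In the new one the payload is used as-is.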

Strings and regular expressions are also scheduled to get some attention soon.

Finally, we are going to teach the debug API (behind Firebug and Venkman) to debug compiled JavaScript. So, with JägerMonkey, it will be possible to run a debugger but still run JS fast.

Final thought. The other big piece of work starting now is to get JägerMonkey to work inside the browser. You can build a browser with JM today, but you probably won’t get too far before crashing. Fixing that is next on my list.

JägerMonkey & Nitro Components

After our recent blog posts about JägerMonkey, some articles out there gave the impression that we were removing nanojit, throwing everything away, or doing all sorts of other radical things that we are not in fact doing. I don’t have a huge problem with that–this is complicated stuff and it’s hard to get right. So Chris Blizzard made a post correcting some of the misconceptions. I thought it might also be easier to see what we’re doing in a picture of the major system components of SpiderMonkey, TraceMonkey, JägerMonkey, and Nitro:

The size of each box is proportional to the number of lines of code in that component. (I just measured the file size, including comments and whitespace. LOC is a rough measure of complexity anyway, so it doesn’t matter much.)

(By the way, I might have the JSC/Nitro terminology wrong; I’m not sure exactly what they call the different parts. I’m using “Nitro” to refer to the method JIT, and JSC to refer to the other parts of the engine, but Nitro might actually be a name for the whole engine that includes the method JIT. There is a similar ambiguity in the use of SpiderMonkey/TraceMonkey in Mozilla.)

The key point is that the only piece of WebKit that we are importing is the assembler, which we are using beneath our method JIT. We need our own method JIT because the method JIT’s job is to translate bytecode to cross-platform assembly, and SpiderMonkey and WebKit each have their own bytecode. Also, the generated code interfaces heavily with the giant green box (which implements strings, arrays, regular expressions, dates, etc), which is different for each system.

Note also that we are definitely keeping TraceMonkey, because it has excellent performance on the right kind of code. There is no simple description of the “right kind of code”; one of the main reasons we are doing JägerMonkey is to provide more predictable high performance. JägerMonkey and TraceMonkey are going to work together, TraceMonkey running when it can and JägerMonkey when it must. We’re working on that right now–they work together for basic cases already, and we should have even advanced tracing, such as recursion, working with JägerMonkey soon.