24
Aug 11

hg qedit

On his blog, Paul O’Shannessy came up with an ‘hg qedit’ alias that opens up an editor on your .hg/patches/series file for reordering your patch queue. It’s a nice simple solution to a common problem, so obviously I felt compelled to muck it up.

Here’s my version, for insertion into your ~/.hgrc:

[alias]
qedit = !S=$(hg root)/.hg/patches/series; cp $S $S.bak && perl -pale 'BEGIN { chomp(@a = qx(hg qapplied -q)); die if $?; @a{@a}=(); }; s/^/# (applied) / if exists $a{$F[0]}' $S > $S.new && ${EDITOR-vim} $S.new && sed -e 's/^# .applied. //' $S.new > $S
                                                                                                                                                                                                                   # Did you see this by scrolling over?
                                                                                                                                                                                                                   # I want better code snippet support

This fixes the main problem with zpao’s solution, which is that it’s too clean and simple.

No, wait, that’s not a problem.

The problem is that when I edit my series file, I often forget that I have some patches applied and end up reordering applied patches, which makes a complete mess. The above alias opens up an editor on your series file, only it also inserts comments showing which patches are already applied. (If you really, really want to mess yourself up, go ahead and reorder the commented lines. You’ll get what you deserve.)

Here’s what my queue looks like when editing the series file:

# (applied) better-dtrace-probes
# (applied) try-enable-dtrace
# (applied) bug-650078-no-remote
bug-677985-callouts
bug-677949-gc-roots
hack-stackiter

Come to think of it, mq really shouldn’t let you mess up that way in the first place. It knows the original patch names for your applied patches (unless you are really determined to make your life difficult, and commit things on top without going through mq at all). It could detect when you reordered applied patches, and just undo what you did. And call you names. But maybe that would slow things down.

Update: it wasn’t working for jlebar, which turned out to be because he had added qapplied=-v to his [defaults] section. The above is now fixed for that scenario by adding a -q flag to hg qapplied.


02
Aug 11

Zombie Hunting

Armless Zombies?

I’ve been looking at bug 669730 where enabling Firebug on a page (http://nytimes.com/ to be precise) results in the page’s compartment living forever. This is easy to see, now that we have the incredibly useful about:memory and its per-compartment breakdown. (What’s a compartment? It’s a memory space to keep related garbage collectable objects in. See the compartment paper, or for some more detail about how they are used in Firefox, try my contexts vs compartments writeup, though it’s more about contexts than compartments.)

I managed to find the object keeping the compartment alive, and I thought I should document what I did to either help other people hunt down zombie compartments, or beg for better tools, or both. (Note that I haven’t actually fixed the bug, so this is a little premature, but the hunting process is far more likely to be reusable and of general interest than the specifics of this leak.) Oh, and I didn’t actually figure everything out; I just kind of stumbled across the right answer.

I have a zombie compartment, so something in the compartment isn’t getting collected. That means there’s at least one GCthing alive in the compartment that shouldn’t be. The “inner” objects aren’t of much interest, so what really matters is that there’s at least one unwanted root. I want to figure out what that root is.

Things I know of that can be roots are pointers gathered by the conservative stack scanner, cross-compartment wrappers, and explicitly added GC roots.

The conservative roots are unlikely to matter here, because this leak survives returning to the event loop. The cross-compartment wrappers are what I initially suspected. I think that whenever XPCOM points to a JS object, or at least a JS object in a content compartment as this one is, it goes through a cross-compartment wrapper and the wrapped object is considered to be a GC root. So I want to see the objects rooted by cross-compartment wrappers.

I guess the wrappers I care about will actually live in a different compartment and point into the nytimes.com compartment. But it doesn’t seem to matter in this case, because the only function I could see to set a breakpoint in is JSCompartment::markCrossCompartmentWrappers() and in my test runs, it never seemed to hit. It looks like maybe it only gets called when doing a compartmental GC, and we probably don’t do many of those on a zombie compartment. Still, how do those roots get marked? I still don’t know, because while wandering around the code trying to figure out what was going on, I stumbled across the right place to watch for the third set of roots — explicitly added roots — and I took a detour to check those out. (Thinking about it, it’s not the cross-compartment wrappers that are the roots, it’s the objects they point to. Maybe those end up in the explicitly-added roots list? Dunno.)

Specifically, MarkRuntime() in jsgc.cpp iterates through a runtime’s gcRootsHash and calls gc_root_traversal on each one. That grabs out the pointer value and name (yay!) of each root and scans it. So all I needed to do was check each of these roots to see which compartment it’s in, and stop when the compartment is the one I care about. Fortunately, gc_root_traversal calls MarkIfGCThingWord and it already computes the compartment. (It’s just a bit of bit masking and pointer chasing to do manually, so it’s not a big deal anyway.)

Conditional breakpoints are great and everything, but from the name of the function it sounded like it might get called a lot, so I just crammed my own debug code into the routine:

    static JSCompartment *interesting_compartment = NULL;
    if (aheader->compartment == interesting_compartment)
        printf("root: %p kind %u\n", (void*)addr, thingKind);

Then I reran under gdb. I still needed the address of the compartment (unfortunately, about:memory only shows compartment pointers for chrome compartments). So I looked for something looping over compartments, found several, and set a breakpoint in one of them. I’m not sure which one. They all loop over rt->compartments. For each compartment, I displayed the principals->codebasevalue:

(gdb) display (*c)->principals ? *(*c)->principals : 0

Then I ‘n’exted through until I found the nytimes.com one. With that pointer in hand, I set a breakpoint on my added code, above, and set ‘interesting_compartment’ to my magic pointer value. This printed out the address of the root in question, together with its ‘thingKind‘ which was zero. A quick look at jsgc.h showed me that zero means FINALIZE_OBJECT0, and the code just after my printf showed that I can cast that to JSObject*. A call to js_DumpObject((JSObject*)0x7fffd0a99d78) told me that this was an ‘Error’ object. Even better, when I walked up the stack one level, I could see that the root was labeled “JSDValue”.

So JSD is hanging onto a content Error object, probably one that it grabbed from hooking exception throwing or catching. Is it JSD not discarding something when you turn it off, or Firebug holding onto the object itself? I don’t know yet.


Learnings:

  • We need an easy way to get the pointer value of all compartments. Maybe in the ?verbose=1 output of about:memory?
  • Enumerating roots is handy. We should expose a function to dump out the roots given a compartment, so we could do this whole analysis via the chrome-privileged Web Console.
  • That means we need a way to refer to compartments from JS. Perhaps a weak map from principals’ codebases to JSCompartment* objects?

Related: see Jim Blandy’s bug 672736 for adding a findReferences JS call that gives all of the incoming edges to a JS object. I originally misinterpreted that to mean displaying the full path from a GC root to an object, and I started out trying to use findReferences by grabbing any object in the zombie compartment and calling findReferences on it. But I stopped when I realized that, knowing as little about the memory layout as I do, it was probably easier for me to find the roots themselves than figuring out how to look into the chunks/arenas/arena pools/whatever for the compartment to grab out random GCthings. And all I wanted was the root, so findReferences wouldn’t be of interest unless it crossed the XPConnect boundary and told me what was keeping the JS object alive via some sort of wrapper.

Now please, someone comment and tell me how I could have done this much more easily…


26
Jun 11

Heredity puzzle

I know this is wrong, but I’m going to use my privileged status as a source for Planet Mozilla (ok, it’s not that privileged) to point people to a non-Mozilla-related puzzle on my personal blog, because the few friends I have who read it are lame and haven’t come up with any answers yet. (Did I mean “the few friends I have, who read it” or “the few who read it”? None of your business.) And I really want somebody to come up with a simple, understandable proof.

Warning: it involves incest, intergenerational sexual relations, and time travel. But what doesn’t, these days?

 


26
Jun 11

Firefox at work

As with several other people, I’m going to put my opinions on the whole “Firefox hates enterprise users” kerfuffle here. Why here? Because this way I don’t have to pretend to have read everyone else’s thoughtful and incisive comments on the mailing list thread, and because the conversation isn’t going to move from the mailing list to here so I can safely jot out my unconsidered opinions and then back away, quickly. (I am in fact not reading every message in the thread. I scan through it once every 1-2 days and look for messages written by a handful of users who tend to say things I find worth hearing.)

The conversation seems to be generating far more heat than light. One thing that strikes me is that some questions are worth considering, and others aren’t. For example: “should Mozilla treat enterprises as a priority?” is not worth considering. It’s nearly meaningless. Consider these answers: “Yes, of course it should! Mozilla cares about everybody!” “No, it costs too much given the small slice of our user base that it represents.” Is anybody happy with either of them? Can anyone figure out what to do based on either conclusion? I can’t.

So I wanted to write out some questions that I think are worth considering. But first, a pet peeve: I’m not going to use the word “enterprise”. It barely means something as a noun, and means everything and nothing as an adjective (which is worse than meaning nothing, because then at least readers don’t imbue it with whatever meaning it holds in their own heads.) Anyway, here goes:

  • If corporations abandoned Firefox, what impact would it have on Mozilla’s mission?

Forget for now why they’re abandoning FF. Maybe we decided to heighten security by rendering all HTTP pages in unselectable ROT-13 text, requiring our users to learn to decode them from memory or switch to HTTPS. Maybe we punish bad JS/server side coding practices by automatically detecting them and posting unencrypted passwords to a public server. Whatever it is, what impact would it have? How many people would not want to use a different browser at home and at work, and what would be the impact on our market share? What are the influence patterns that matter to us (eg new-to-the-Web users pick their browser based on what they or their friends know about from work)? How many add-on authors would support multiple platforms? Standardize on a single non-Firefox platform? Are corporate users already comfortable with using multiple browsers (eg IE6 for the intranet, something else for everything else)?

  • What are we actually changing that is relevant to corporate users?

I think the answer is something like: we’re releasing features at a faster cadence. The average feature per unit time metric isn’t intentionally being changed, though MoCo is doing a lot of hiring so that’ll probably ramp up too for reasons other than our release policy. And we’re no longer separating features from fixes.

IM(naive)O, business users care about the first (frequency of features becoming available), but not enough to override other concerns. That’s why long-term support versions exist. Actually, that’s an oversimplification: business users are the same as anyone else, and would much prefer to have the most features the soonest, but those damn IT departments get mad at them when they install the latest version of FirewallBusterSupreme the day before it’s officially released. Forgoing shiny new features is the cost of ensuring stability and predictability, and you never want the scheduling and cost of upgrades to be any more at the whim of your vendor than absolutely necessary. You have a core business function to worry about — unless of course your core business happens to be advising other businesses about the impact of software upgrades. (Which is a sucky business, by the way; it’s another one of those where your customers are pretty much guaranteed to hate you. Your only function is to tell them “no”.) So the key bit really is the separate of features from fixes — or really, “stuff that is likely to break my users” from “stuff that is relatively safe and likely to keep me closer to the status quo than not having it would be”. An obvious example of the latter is security fixes — the risk from the software change is less than the risk from people starting to exploit some new vulnerability.

  • What level of support would make the difference to “enough” corporate users?
  • What are the relevant types of support?

If 60 months of long term support for selected versions isn’t enough for an IT department, then nothing will be, so forget about those users. And what does “long term support” mean, anyway? It isn’t a binary distinction. 60 months of “all security fixes we ever make for any version” is obviously unsustainable. I’m not sure “critical security fixes” is adequately complete as a description either. The severity of a security fix leaves out many relevant factors: backwards compatibility, divergence from trunk, maintainability, etc.

  • What do various support options cost the Mozilla organization?

Cost, by the way, isn’t just measured in money coming out of MoCo’s pockets. Focus, maintainability, goodwill, brand, freshness, risk, etc.

  • What could we do to make it easier for someone (perhaps us) to better support IT department-blessed use?

If we offered up a package (access to sensitive bugs + cash + agreement to support specific components + testing infrastructure + …) to attempt to lure 3rd parties into maintaining older versions “for us”, would anyone bite? (“For us” in quotes because we’re the Mozilla community, and they’d immediately become part of “us” if they accepted.)

  • What is the connection between a rapid release cadence and long term support?

They’re not diametrically opposed, but rapid releases obviously introduce difficulties for long term support. You could do rapid releases without affecting long term support at all, if all you’re talking about is the frequency of releases. But we’re not; we want to release new features quickly. In the gray area are incompatible changes to existing functionality that aren’t critical for moving the Web forward.

  • What other groups are effected by the same long term support issues as “enterprise” (sorry) users?

Add-on authors have already been brought up. We at least have a defensible story there (mainly, “use the add-on SDK”.) That community could be usefully subdivided, but who else is affected?

Them’s all the thoughts that’ve leaked out of my brain so far. I’ll try to do a better job of keeping them inside where they belong.