What is the point of non-fatal assertions?

When you run a debug build of Firefox you get a lot of stuff printed to stderr.  Some of it looks like this:

 ++DOCSHELL 0x7f53d836b800 == 3
 ++DOMWINDOW == 5 (0x7f53d836c078) [serial = 5] [outer = (nil)]

and I imagine it’s occasionally useful.

You also see a lot of warnings like these, which I just saw (with duplicates removed) when I just started up Firefox, loaded www.mozilla.com, and then exited:

 WARNING: 1 sort operation has occurred for the SQL statement '0x7f53e3d32208'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file /home/njn/moz/mc2/storage/src/mozStoragePrivateHelpers.cpp, line 144
 WARNING: GetDefaultCharsetForLocale: need to add multi locale support: file /home/njn/moz/mc2/intl/locale/src/unix/nsUNIXCharset.cpp, line 146
 WARNING: Ignoring duplicate observer.: file /home/njn/moz/mc2/modules/libpref/src/nsPrefBranch.cpp, line 594
 WARNING: not an nsIRDFRemoteDataSource: 'remote != nsnull', file /home/njn/moz/mc2/rdf/datasource/src/nsLocalStore.cpp, line 312
 WARNING: NS_ENSURE_SUCCESS(rv, 0) failed with result 0x8000FFFF: file /home/njn/moz/mc2/content/base/src/nsContentUtils.cpp, line 2527
 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /home/njn/moz/mc2/content/base/src/nsFrameLoader.cpp, line 415
 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8053000C: file /home/njn/moz/mc2/content/base/src/nsGenericElement.cpp, line 5484
 WARNING: NS_ENSURE_TRUE(aURI) failed: file /home/njn/moz/mc2/content/base/src/nsContentUtils.cpp, line 4706
 WARNING: NS_ENSURE_TRUE(!(mAsyncExecutionThread)) failed: file /home/njn/moz/mc2/storage/src/mozStorageConnection.cpp, line 784
 WARNING: NS_ENSURE_TRUE(pusher.Push(aBoundElement)) failed: file /home/njn/moz/mc2/content/xbl/src/nsXBLProtoImplMethod.cpp, line 327
 WARNING: nsExceptionService ignoring thread destruction after shutdown: file /home/njn/moz/mc2/xpcom/base/nsExceptionService.cpp, line 197
 WARNING: SQLite returned error code 1 , Storage will convert it to NS_ERROR_FAILURE: file /home/njn/moz/mc2/storage/src/mozStoragePrivateHelpers.cpp, line 113
 WARNING: Subdocument container has no content: file /home/njn/moz/mc2/layout/base/nsDocumentViewer.cpp, line 2398
 WARNING: Subdocument container has no frame: file /home/njn/moz/mc2/layout/base/nsDocumentViewer.cpp, line 2418
 WARNING: Subdocument container has no frame: file /home/njn/moz/mc2/layout/base/nsDocumentViewer.cpp, line 2418

The NS_ENSURE_SUCCESS and NS_ENSURE_TRUE ones in particular look like assertion failures.  I’ve heard before that Firefox assertions (outside the JS engine) are non-fatal.  I was wondering about this last week and had planned to ask someone about it in more detail.

But before I did that, I spent several hours yesterday debugging the crash in bug 654573.  Turns out the problem is that this assertion in mozStorageConnection.cpp fails:

 NS_ENSURE_FALSE(mAsyncExecutionThread, NS_ERROR_UNEXPECTED);

This causes this warning to be printed out at run-time:

 WARNING: NS_ENSURE_TRUE(!(mAsyncExecutionThread)) failed: file /home/njn/moz/mc1/storage/src/mozStorageConnection.cpp, line 799

Oh, and just to make things really complicated, here’s the definition of NS_ENSURE_FALSE, from nsDebug.h:

 #define NS_ENSURE_TRUE(x, ret)                                \
   PR_BEGIN_MACRO                                              \
     if (NS_UNLIKELY(!(x))) {                                  \
        NS_WARNING("NS_ENSURE_TRUE(" #x ") failed");           \
        return ret;                                            \
     }                                                         \
   PR_END_MACRO
 #define NS_ENSURE_FALSE(x, ret)
   NS_ENSURE_TRUE(!(x), ret)

Oh look;  if the condition fails it not only prints out a warning message, it also does an early return.  And that’s exactly what was causing the crash, because some code that followed the assertion wasn’t run, which meant that a database connection wasn’t closed properly, which caused a use-after-free error.

If this assertion had been fatal, I wouldn’t have spent several hours yesterday debugging this crash, because there wouldn’t have been a crash;  there would have been an assertion failure, and the problem would have been immediately obvious.  (As it happens, I eventually worked out the problem when I noticed that warning message among the pages of output that Firefox produces.)  If the assertion had been non-fatal but did not have the early return, there also probably wouldn’t have been a crash, but it’s hard to know, because once an assertion fails you’re pretty much in no-man’s land.

So, in summary, here are my thoughts about assertions:

  • Non-fatal assertions are close to useless.  They don’t get fixed.  The warnings just get lost among the noise of all the messages that go to stderr.
  • Non-fatal assertions that return early are worse than useless.  Especially since the early-return behaviour is entirely non-obvious from their names.

One suggestion I heard on IRC yesterday was that there just aren’t enough developers in general to fix all the assertion failures if we were to make them fatal.  But in contrast, the JS engine is able to use fatal assertions because it has a higher developer-to-code ratio than the rest of Firefox.  (This makes me also think of compiler warnings, where there’s a similar disparity between the JS engine and the rest of Firefox.)

Because this is Mozilla, I’m sure there is a ton of history behind this current situation that I’m unaware of.  I’d love to know what that is, and why we let obvious defects — because an assertion failure unambiguously indicates there’s a defect, be it in the code or the assertion itself — remain.  Thanks!

 

27 Responses to What is the point of non-fatal assertions?

  1. Benoit Jacob

    The only point I know for non-fatal assertions is to set XPCOM_DEBUG_BREAK=break to make them raise(SIGTRAP). But I would wholeheartedly support a change making them default to SIGABRT. If someone needs something like the current default behavior of NS_ASSERTION, that’s fine, but I think that ‘assertion’ is a misnomer for that. (see e.g. the standard assert() macro).

  2. I am not up on the rationale, but my understanding is that NS_ENSURE_* are not considered assertions; they’re meant to be one-line early-return-on-error-condition macros for contexts where what we really ought to use is exceptions. I don’t know why they print messages on failure; it does seem inappropriate for that use case.

    I am with you 100% on NS_ASSERTION and its relatives (NS_NOTREACHED is the only one I remember right now), which really are non-fatal assertions for supposedly can’t-happen conditions. My understanding of that is that we can’t just flip them to fatal because some of them are triggered a lot, but dbaron has a Plan.

  3. NS_ASSERTION is a non-fatal assertion. NS_ENSURE_X is not meant to be an assertion, fatal or otherwise – it’s a construct that’s used all over the codebase to signify that if certain condition does not hold, the function will return the provided value. It’s also noisy (ie. it has console spew). As such, I think the target of this post is a bit misdirected – whoever added the problem line was certainly aware of the early return, since that’s an integral part of it.

    • Nicholas Nethercote

      As such, I think the target of this post is a bit misdirected – whoever added the problem line was certainly aware of the early return, since that’s an integral part of it.

      It’s difficult to know if they were aware of it, but an early return was certainly the wrong thing to do here — it’s what indirectly caused the crash.

  4. Nicholas, those are warnings, not assertions. They’re an entirely different beastie from XPCOM assertions (which are also not fatal). For one thing, they don’t trigger the XPCOM_DEBUG_BREAK hook you mention.

    The NS_ENSURE_* macros whole purpose is to be early returns. In fact, there have been proposals to remove the NS_WARNING from them and only leave the early return part. There have also been proposals to stop using them altogether because those not familiar with the code might not realize they’re early returns.

    So in this case the real issue was just some code that did an incorrect early return. It would have been just as bad if it had done:

    if (mAsyncExecutionThread)
    return NS_ERROR_UNEXPECTED;

    except maybe easier to spot for someone not familiar with NS_ENSURE_*.

    • Nicholas Nethercote

      Zack, Boris, Josh: interesting, thanks. Indeed, the ENSURE macros have terrible names.

  5. Gecko distinguishes between “warnings” (unusual cases) and “assertions” (which indicate bugs). What you’re seeing here are warnings, not assertions.

    The primary purpose of NS_ENSURE_FALSE is to have an early return. (See https://developer.mozilla.org/En/Developer_Guide/Coding_Style#Use_the_nice_macros .) The warning is a side effect for the benefit of debugging.

    If you want to argue against the NS_ENSURE_* macros, argue that it’s bad to hide early returns, or that the entire failure model of Gecko C++ is busted because it relies too heavily on early returns.

  6. I, too, find it annoying that many warnings are printed during normal startup and shutdown. Warnings are for unusual cases, and startup and shutdown are not unusual.

    See https://bugzilla.mozilla.org/show_bug.cgi?id=fx-noise for a list of noisy warnings.

  7. I don’t think it matters much whether NS_ASSERTION becomes fatal.

    Tinderbox regression tests will catch bugs either way. So will fuzz-testing: over 500 of my NS_ASSERTION bugs have been fixed. And so will Bob Clary’s crash repro farm.

    Perhaps it matters for the few developers who do their normal web browsing with debug builds. But there you have a tradeoff between ensuring bugs are reported and scaring people away from browsing with debug builds.

    Brendan and Waldo have argued for a move to fatal assertions. Like you, they both primarily work on the JS engine, which has fatal assertions. I wonder if any Content or Layout developers feel strongly either way.

    I only really care in the case of assertions that correlate with memory safety.

  8. In addition to the above…

    There’s been talk of making assertions fatal (at least for when running tests). The problem, aiui, is twofold:

    * There are number of asserts hit in old crufty code, but no one is sure if they’re an indication of a real problem or just an unusual but ok situation.

    * There are enough asserts that it’s impractical to fix them all quickly enough to make them fatal, especially if more asserts get added/triggered in the meantime.

    I think dbaron’s make some progress in the last year or so with adding expected-assert counts to some tests (reftest? xpcshell?) so that we at least stop backsliding. It’s a start!

  9. Steve Fink

    I had no idea of the true nature of NS_ENSURE_* either. Now that I’m educated, I still dislike it.

    First, if the main point is to be an early return, then it should be called NS_RETURN_IF/NS_RETURN_UNLESS. Or NS_RETURN_IF_TRUE/NS_RETURN_IF_FALSE. Whichever. I have no problem with an early return, since RAII makes it a fine thing to do. But it’s a total footgun if its name doesn’t make that really, really obvious.

    Except you can’t, because it’s *not* just syntactic sugar for an early return. That wouldn’t require any sugar.

    if (!mAsyncExecutionThread) return NS_ERROR_UNEXPECTED;

    is way more clear and no longer. Ok, fine, make that

    if (!mAsyncExecutionThread)
    return NS_ERROR_UNEXPECTED;

    but if you’re making a macro just to avoid a coding convention that insists on

    if (!mAsyncExecutionThread) {
    return NS_ERROR_UNEXPECTED;
    }

    and that bothers you for a common case, then your coding convention is breaking your common case.

    So clearly it’s not just syntactic sugar for an early return. The only added functionality is the warning, and that repeats the condition, so suddenly using the macro makes sense. But only if the warning is really desired.

    Hence NS_ENSURE_TRUE really means NS_WARN_AND_RETURN_UNLESS. That’s pretty verbose, so the funky “ensure” name isn’t totally unreasonable.

    But it seems to me like there are 3 different ways that NS_ENSURE_* could be used:

    1. The user is lazily using it as an early return when the failure condition is reasonable and expected, even if uncommon. These uses should be replaced with ‘if (!cond) return RV’.

    2. The user is using it like an assertion. These should be replaced by something that is fatal in DEBUG builds.

    3. The user is correctly using it for situations that need to be reported to a human so that they may take some action, but which are expected enough that they should still allow things to proceed.

    What are these latter situations? They seem a little weird. And assuming they exist, and in the great numbers that our current code suggests, wouldn’t it be better to do

    static bool ns_inline_warn_if(bool cond, const char* msg, bool wantTrue) { … }

    #define NS_WARN_IF(cond) ns_inline_warn_if((cond), #cond, true)
    #define NS_WARN_UNLESS(cond) ns_inline_warn_if(!(cond), #cond, false)

    and then replace NS_ENSURE_TRUE(mycond, NS_ERROR_FOO) with

    if (NS_WARN_UNLESS(expect-true-cond))
    return NS_ERROR_FOO;

    and likewise

    if (NS_WARN_IF(expect-false-cond))
    return NS_ERROR_BAR;

    ? Or some similar reshuffling that exposes the return statement and declares what it’s going to do? Those can be changed mechanically, too, and then it would be easy to convert ‘NS_WARN_UNLESS’ to a plain ‘!’ if you later realize the warning is unnecessary.

    I don’t understand the attitude that it’s fine to spam stderr with crap. When someone is working on an unfamiliar part of the code base, watching stderr is a universally accepted way to discover that something horrible yet detectable is going on, and perhaps that the poor newbie is doing something boneheaded. Universally accepted where it hasn’t been screwed up, that is, as we have done so here.

    I really don’t see why we can’t adopt a scorched earth policy of shutting everything up and then re-adding things believed to be useful. And if something is useful to only a specialized audience, then tag those warnings so they only show up in a separate log. Is our logging facility inadequate for that purpose? PR_LOG makes you create your diagnostics channel; is that too heavyweight? If so, we need an alternative.

  10. I’m completely unfamiliar with Firefox codebase, but from the other comments, i’d like to show how the Glib used by GTK handles the same thing.

    For assertions: g_assert (and other g_assert_* derivatives)
    http://developer.gnome.org/glib/stable/glib-Testing.html#g-assert

    For unusual cases:
    g_return_if_fail, g_retun_val_if_fail, g_return_if_reached, g_return_val_if_reached.
    From the name, you immediately understand they do an early return.
    http://developer.gnome.org/glib/stable/glib-Warnings-and-Assertions.html#g-return-if-fail

    Your whole ambiguous stuff should be renamed.

  11. I occasionally run into JS assertions (often caused by trying to use Venkman, but there have been other reasons). Over the years there have been enough of them that I have made them non-fatal in my tree. Sometimes the browser crashes anyway, but that’s what just-in-time debuggers are for.

  12. Robert O'Callahan

    I have no love for NS_ENSURE_*. They could go away or stop warning as far as I’m concerned.

    Having NS_ASSERTION be non-fatal is valuable for debugging. It means you can run a test or set of tests and discover a number of bugs (possibly related), instead of just one followed by an instant crash.

    Now, if the first assertion indicates memory corruption, it’s quite likely that subsequent reports are meaningless or bogus, so I’m fine with asserts that indicate memory corruption being fatal. But that’s probably where the JS experience doesn’t extend to the rest of Gecko: in the JS engine, an assertion failure very often does mean bad code was generated or memory was corrupted and “all bets are off”. But in layout, for example, an assertion often only means that an element was laid out in the wrong place. It’s not the end of the world and there is no need to abort the test suite or crash the user’s browser.

    I used to dogfood debug builds. One reason I stopped was that I kept hitting fatal JS assertions (yes, I filed bugs, but it got tedious).

  13. Robert O'Callahan

    > I’d love to know what that is, and why we let obvious
    > defects — because an assertion failure unambiguously
    > indicates there’s a defect, be it in the code or the
    > assertion itself — remain.

    As you now know, the warnings you see aren’t assertion failures and aren’t unambiguously indicating defects.

    But still, you’re right that we aren’t trying to fix all known assertion failures — indeed, “we let obvious defects remain”. The reason is simply priorities. It’s the same reason we don’t simply stop adding new features and performance improvements for ten years while we fix all valid Bugzilla bugs. I hope it’s obvious that that would not be the best path for Mozilla or our users.

  14. First let me say that I love fatal assertions as they make it clear when I’ve found an exceptional condition which should have a bug filed in order for it to be fixed.

    As roc mentioned, the draw back to fatal assertions is their stopping the browser and making it impossible to find or diagnose other bugs which are hidden by the fatal assertion. For the most part, the JavaScript engine is a good example of how to use fatal assertions: fatal assertions are to be immediately fixed and not tolerated.

    Other components aren’t quite as conscientious about their fatal assertions. During crash testing, there have been cases where common, frequently occurring fatal Aborts have gone unfixed for significant periods of time. This is even the case in the JavaScript engine (Assertion failure: !p I’m looking at you!).

    These unfixed fatal assertions have the effect of hiding other bugs from detection which may be far more serious than the condition flagged by the unfixed fatal assertion. In my opinion, any condition which causes a fatal assertion should be one which is *unacceptable* and which will result in an immediate fix or back out of the offending patch. If the condition does not meet that criteria, it should not be fatal.

    As Jesse mentioned, the crash test automation does detect and record non-fatal ASSERTIONs though not much has been done with the results so far. As we get our new UI (thanks to Mark Cóté) I hope that it will become a useful tool in finding interesting bugs. It would be possible for the crash automation to detect and record WARNINGs in a similar fashion to its treatment of ASSERTIONs. Making WARNINGs more visible to analysis may be of help in diagnosing bugs.

    My viewpoint is:

    fatal assertions should only be used in cases where the condition reflects an unacceptable state which must be fixed immediately. In other words, they are automatically blockers. If it is not important enough to be fixed immediately, it should be immediately converted to a non-fatal assertion.

    fatal assertion messages should have a clear format. Currently we have at least two families: The JavaScript Engine with “Assertion failure: ….” and Abort with “ABORT: …”. We should have a common fatal assertion message format.

    non fatal assertions should be only used for conditions which reflect the existence of a bug which should be fixed but are not blockers.

    warnings should only be used for informational messages and not to indicate the presence of a bug.

    log messages should indicate which to which process they belong. It is increasingly unclear to me whether a particular message is coming from the content process or the plug-container process. The situation will become even more confusing as we go forward with e10s.

  15. 1) It looks from the comments like nobody really likes the NS_ENSURE_* macros. Probably that means that they should be replaced by something people like. Of course, in our codebase, that’s a huge project (and I forget the probably long discussions to find likable replacements – but maybe that can be solved by sitting bz, roc and others into a room at an all-hands and have them discuss until white smoke is rising).

    2) I thought that NS_ASSERTION was actually fatal in debug builds nowadays – and remember that we run tests on debug builds for every checkin…

  16. I actually like NS_ENSURE_*, especially NS_ENSURE_STATE. I try to use it when I really want to see the warning in the console.

  17. Given that there are two categories of NS_ASSERTION, we should make that plain in the code. Anything that hits in practice should become NS_FAKE_ASSERTION. That allows us to prevent regressions from known-good assertions, and to add new assertions which won’t regress.

  18. I’ve certainly worked mostly on the JS engine, but I’ve worked on other code, including layout and content code, and have used and seen fatal assertions there too. They usually work pretty well, too, as long as you’re careful about using them only when you really understand the code (or being willing to pounce quickly if they break).

  19. I would support a global search-and-replace change of NS_ASSERTION to something that makes it more obvious that it’s not fatal; I have seen actual bugs caused by someone forgetting that it isn’t.

    I’d personally like NS_NOTREACHED to act like NS_RUNTIMEABORT – i.e. fatal even in non-debug builds – because in the contexts where it is or should be used, you get *better* code that way, not to mention fewer spurious uninitialized variable warnings. Of course, dbaron has this thing about not putting default cases in switch statements over enums (which is where the NS_NOTREACHED would go) so we’d have to convince him he’s wrong, first. (The uninitialized variable spew in layout has caused me to miss bugs in my new code that I’d have caught if there were fewer ignorable warnings.)

    • Nicholas Nethercote

      Of course, dbaron has this thing about not putting default cases in switch statements over enums (which is where the NS_NOTREACHED would go) so we’d have to convince him he’s wrong, first.

      So when you add a new element to the enum you have no protection? Ugh, he is wrong. I’ve written lots of code in a language (Mercury) which catches missing enums in switch statements at compile-time. It’s wonderful. Run-time checks are the next-best thing, and no checks is the worst.

  20. Robert O'Callahan

    I completely agree with what bc says.

    Paul: NS_FAKE_ASSERTION is a silly name. NS_NONFATAL_ASSERTION would be better, although personally I don’t think we should change the names. We already have NS_ASSERTION (non-fatal) and NS_ABORT_IF_FALSE (fatal), and both are extremely widely used. Also, you may not be aware of this but we do have infrastructure to keep NS_ASSERTIONs in reftests from regressing. Assertions don’t need to be fatal for tests to detect them. That work does still need to be extended to mochitests.

    Zack: the idea that “assertion” means “fatal” and non-fatal assertions are “wrong” is simply a product of your upbringing :-). The bare meaning of “assertion” is simply “something that is true at this point in the code”.

    This discussion should really have been in dev.platform.

    • Nicholas Nethercote

      Zack: the idea that “assertion” means “fatal” and non-fatal assertions are “wrong” is simply a product of your upbringing :-) . The bare meaning of “assertion” is simply “something that is true at this point in the code”.

      libc’s assert() is fatal. That’s enough of a precedent that people (me included) are surprised that NS_ASSERTION is not fatal. You can argue that we’re reading too much into the name, or you can accept it’s a problem in practice and try to think of ways to avoid it.

  21. So when you add a new element to the enum you have no protection? Ugh, he is wrong. I’ve written lots of code in a language (Mercury) which catches missing enums in switch statements at compile-time. It’s wonderful. Run-time checks are the next-best thing, and no checks is the worst.

    He wants there to be no default case because (some) C++ compilers will throw you a warning if a switch doesn’t exhaust an enumeration *and* there’s no default case. Which I can see the point, but I’m never going to find those warnings amid the uninitialized variable chatter so by me it doesn’t actually accomplish the goal.

    … It occurs to me that the way we do assertions now, routing all classes of diagnostic through one function (NS_DebugBreak) whose behavior is unpredictable at compile time, we don’t actually get to take advantage of optimizations based on knowing that a particular control path terminates the program.