DXR gets more correct, less case-sensitive

DXR, Mozilla’s evolving codebase search engine, has been taking patches at a furious rate these last two months. A great deal of work has gone into a UI refit, still in progress, which will improve discoverability, consistency, and power. Meanwhile, we have kept pushing more immediately enjoyable enhancements into production.

Cleaning Out The Pipes

One of these is a complete rewrite of our HTML generation pipeline. DXR pulls metadata about code from a number of disparate sources: syntax coloring from Pygments, locations of functions and macros from clang, links to Bugzilla bugs from a glorified regex. It then encodes those as begin-end pairs of text offsets, which it stitches together to make the final markup. However, the stitching was previously handled by a teetering state machine, stuffed info a single monolithic function with zero test coverage, replete with terrible mystery. As it turned out, it had been generating grossly invalid markup for some time. Fortunately, modern browsers are equally replete with terrible mystery and managed to make some semblance of sense out of things like </a></a></a>.

But now that’s gone away. The rewrite brings…

  • Correct markup
  • Support for line-spanning regions, as for multi-line comments or strings
  • Support for Windows line-endings (of which we did have a few in mozilla-central)
  • Full test coverage
  • And, perhaps most importantly in the long term, it modernizes our plugin contract by supporting annotation regions which overlap. This lets us enjoy truly decoupled plugins which no longer have to care if they’re used with others that emit overlapping regions. We can add plugins that support more languages and more types of analysis without having to worry about whether they’ll play nicely with the existing ecosystem. It also makes development of plugins outside the DXR codebase more practical.

    Other Improvements

    Other user-visible improvements include…

  • Case-insensitive searching for plain text. This is now the default.
  • Exposing values of constants using tooltips
  • Results now show in alphabetical order by path rather than in random order, so you can rule out entire directory trees more easily.
  • Searching for Layers.cpp:45 takes you straight to that line of the file.
  • Lexing .h files as C++ rather than C means we now highlight all those pesky C++ keywords.
  • We now syntax-color preprocessor directives in JS.
  • We’ve introduced override and overridden queries.
  • No more “l” in line-number anchors means no more mistaking them for “1”.
  • Fixed an off-by-one in line annotation position.
  • No longer consider uninitialized struct or class members to be var refs.
  • Support non-UTF-8 encodings of source files.
  • Distinguish identically named functions in different anonymous namespaces.
  • Thanks to James Abbatiello for lots of analysis improvements, Nick Cameron for the handy line-number search and syntax coloring, jonasac for several great fixes, and Schalk Neethling for a huge amount of work toward getting the UI refit out the door. If you’d like to join the DXR hacking community, we’ve got a nice ramp-up paved out for you and some easy bugs tagged.

    New UI Teasers

    As for the upcoming UI refit, there’s plenty in store:

  • A natural integration of the now fairly disjoint browse and search modes
  • Easy discoverability of all 26-or-so search filters: no more figuring them out through hearsay or by spelunking through the code
  • No more unpredictability of interface elements like the Advanced Search panel
  • First-class support for multiple trees, to be followed by more actual trees
  • A real query parser. You can express quotation marks without resorting to regexes, and you can use quoted strings as arguments to filters.
  • No more astonishing, disappearing splash page
  • Check out our mockups and our sometimes-broken staging site, and do keep the feedback coming. All of the above work was motivated by the comments you’ve already given us.

    Happy hacking!

    3 responses

    1. Nicholas Nethercote wrote on :

      “Easy discoverability of all 26-or-so search filters”

      Call ’em A to Z… amirite?

    2. Erik Rose wrote on :

      Optional Fortran-77-based query parser. Got it! 🙂

    3. Martin Hoggard wrote on :

      Now that’s a load of information to get the ball rolling. Keep writing in.