06
May 14

my code search engine

Christian Legnitto wrote a blog post where he mentioned Firefox developers being forced to deal with “crufty code-search tools” (and many other perceived suboptimalities in the development process).  I’m looking forward to reading his followup, but I also thought it was worth blogging about what I use for my day-to-day code search needs.

I use Emacs’s rgrep. rgrep (and its cousin commands grep and lgrep) executes grep on a group of files in a given directory, recursively. The grep results are then displayed in a hyperlinked format for easy browsing between matches. Well, being Emacs, I use it with a few modifications of my own.  Here’s my setup.

First, a small utility I wrote for making quick selections with a single keypress:

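(defun character-prompt (alist description)
  "Prompt for a single keypress and return the value ALIST maps it to.
ALIST entries look like (CHARACTER . VALUE).  DESCRIPTION names what is
being chosen and is used in the message shown for an unknown key."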
  (message "Select [%s]: "
           (apply #'string (mapcar #'car alist)))
  (let* ((ch (save-window-excursion
               (select-window (minibuffer-window))
               (read-char)))
         ;; `assq' is built in, so this works without loading `cl'.
         (x (assq ch alist)))
    (cond
      ((null x)
       ;; %c prints the character itself rather than its numeric code.
       (message "No %s for character: %c" description ch)
       (sleep-for 1)
       (discard-input)
       (character-prompt alist description))
      (t (cdr x)))))
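
To make the expected argument shape concrete, here is a minimal sketch of the two alist operations the utility depends on: collecting the keys into the prompt string, and looking up the value for a pressed key (done here with the built-in assq). The example-choices alist and both helper names are made up for illustration.

```elisp
;; Hypothetical choices; each entry is (CHARACTER . VALUE).
(defvar example-choices
  '((?d . "debug")
    (?o . "optimized")))

(defun example-prompt-keys (alist)
  "Return the characters offered by ALIST, collected into one string."
  (apply #'string (mapcar #'car alist)))

(defun example-lookup (ch alist)
  "Return the value ALIST maps CH to, or nil if CH is not offered."
  (cdr (assq ch alist)))
```

A key that is not in the alist yields nil, which is the case where character-prompt complains and asks again.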

This function gets used in the small wrapper I wrote around rgrep. Some preliminaries first, like where the Firefox tree lives, files that contain overly long lines and therefore mess with Emacs’s hyperlinking, and directories that I generally don’t deal with in my day-to-day work.

(defvar froydnj-mozilla-srcdir (expand-file-name "~/src/gecko-dev.git/"))
(defvar froydnj-mozilla-ignored-files
  (list "searchindex.js"
        "jquery.js"
        "jquery.min.js"
        "interfaces.js"
        "socket.io.min.js"
        "jquery-ui-1.7.1.custom-min.js"
        "edit_area_full.js"
        "php.js"
        "packed.js"
        "named-character-references.html"
        "edit_area_full_with_plugins.js"
        "string-tagcloud.js"
        "string-unpack-code.js"
        "check-string-tagcloud.js"
        "check-string-unpack-code.js"
        "string-unpack-code.html"))
(defvar froydnj-mozilla-ignored-directories
  (list "nss" "nsprpub" "js/src/tests" "intl/icu"))

Next, the way I select subsets of files to search in. I learned after writing all this that rgrep already has built-in functionality for this (see the grep-files-aliases variable), but I like my setup better.

(defvar froydnj-mozilla-files
  '((?a . "*")                    ; All of it
    (?c . "*.[cm]*")              ; C/C++/Obj-C
    (?C . "*.[cmh]*")             ; Same, plus headers (and HTML, sadly)
    (?h . "*.h")
    (?H . "*.html")
    (?i . "*.idl")
    (?j . "*.js*")
    (?l . "*.lisp")
    (?m . "Makefile.in")
    (?p . "*.py")
    (?v . "*.java")
    (?w . "*.webidl")))
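
For comparison, the built-in mechanism could be extended along these lines. This is only a sketch: grep-files-aliases is an alist of (ALIAS . FILES) strings, and the alias names "mozidl" and "mozwebidl" are hypothetical ones picked for the example.

```elisp
(require 'grep)

;; Hypothetical alias names; the globs mirror two entries from the alist above.
(add-to-list 'grep-files-aliases '("mozidl" . "*.idl"))
(add-to-list 'grep-files-aliases '("mozwebidl" . "*.webidl"))
```

With that in place, typing the alias name at rgrep's file prompt selects the corresponding glob, at the cost of typing a word instead of pressing a single key.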

Finally, the wrapper itself, which prompts for the search pattern and the filename pattern, makes sure the directories and files above are ignored, and executes the search.

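(defun froydnj-mozilla-rgrep ()
  "Run rgrep over the Firefox tree, skipping the ignored files and directories."
  (interactive)
  ;; Load grep.el and set up its command templates; `rgrep' normally does
  ;; this itself only when it is called interactively.
  (require 'grep)
  (grep-compute-defaults)
  (let ((regexp (grep-read-regexp))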
        (files (character-prompt froydnj-mozilla-files "filename pattern"))
        (grep-find-ignored-files (append grep-find-ignored-files
                                         froydnj-mozilla-ignored-files))
        (grep-find-ignored-directories (append grep-find-ignored-directories
                                               froydnj-mozilla-ignored-directories)))
    (rgrep regexp files froydnj-mozilla-srcdir)))
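
Since the wrapper is an interactive command, it can be run with M-x or bound to a key. A minimal sketch, assuming C-c g is free in your configuration (the key is an arbitrary choice for illustration, not something the setup above prescribes):

```elisp
;; C-c followed by a letter is reserved for user bindings, so this
;; should not collide with anything Emacs itself defines.
(global-set-key (kbd "C-c g") #'froydnj-mozilla-rgrep)
```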

One other bit that I find useful is a custom name for each buffer. By default, the rgrep results are deposited in a buffer named *grep* (likewise for grep and lgrep; the following customization applies to them automatically), and future greps overwrite this buffer. Having records of your greps lying around is occasionally useful, so I’ve changed the function that determines the buffer name for rgrep. The comments that refer to compilation are from a previous job where it was useful to launch compilation jobs from within Emacs. I keep meaning to tweak those bits to launch mach in various configurations instead, but I haven’t done so yet.

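(defun froydnj-compilation-buffer-name (major-mode-name)
  ;; `command' and `regexp' are not arguments: this function relies on its
  ;; callers having bound them dynamically.  If `command' is not bound,
  ;; fall back to an empty string so the matches below simply fail.
  (let ((cfg-regexp "\\([-0-9_.a-z/+]+\\.cfg\\)")
        (command (or (bound-and-true-p command) "")))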
    (cond
      ;; We're doing local compilation, stick the name of the release
      ;; configuration in the buffer name.
      ((or (string-match "^cd /scratch/froydnj/\\([^ ;]+\\)" command)
           (string-match "^build-config" command))
       (string-match cfg-regexp command)
       (concat "*compilation " (match-string 1 command) "*"))
      ;; We're doing remote compilation, note the machine name and
      ;; the release configuration name.
      ((string-match "^ssh \\([^ ]+\\)" command)
       (let ((machine (match-string 1 command)))
         (string-match cfg-regexp command)
         (concat "*compilation@" machine " " (match-string 1 command) "*")))
      ;; grep.el invokes compile, we might as well take advantage of that.
      ((string-equal major-mode-name "grep")
       (if (boundp 'regexp)
           (concat "*grep for " regexp "*")
           "*grep*"))
      ;; We have no idea, just use the default.
      (t
       "*compilation*"))))

(setq compilation-buffer-name-function 'froydnj-compilation-buffer-name)
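
If only the grep naming matters, a smaller variant is possible. The sketch below is not the function above: it assumes the search pattern is stored in a hypothetical variable, example-last-grep-regexp, instead of being picked up from the caller's bindings, and it handles nothing but grep.

```elisp
(defvar example-last-grep-regexp nil
  "Hypothetical variable holding the most recent search pattern, or nil.")

(defun example-grep-buffer-name (mode-name)
  "Return a buffer name for MODE-NAME, which is a string such as \"grep\"."
  (if (and (string-equal mode-name "grep")
           example-last-grep-regexp)
      ;; One buffer per pattern, so earlier results are not overwritten.
      (format "*grep for %s*" example-last-grep-regexp)
    ;; Otherwise use the conventional name, e.g. *grep* or *compilation*.
    (format "*%s*" (downcase mode-name))))
```

A wrapper would set or let-bind example-last-grep-regexp before calling rgrep, and the function would be installed by assigning it to compilation-buffer-name-function, as above.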

Search times are comparable to web-based tools, and browsing results is more convenient. It has its shortcomings (overloaded C++ method names can be a pain to deal with, for instance), but it works well enough for 95%+ of my searching needs.