Avoiding Catastrophic Backtracking in Apache RewriteRule Patterns

In the 0.9.2 release of MDN, we are going to fix some integration issues between Django, Deki, and phpBB using Apache mod_rewrite.

A horrible thought occurred to me… What if one of our RewriteRule directives was written in a way that would cause catastrophic backtracking? Ouch. This could lead to slow performance or no page rendering. Symptoms may include hours of headdesking and painful mod_rewrite debugging.

What is Catastrophic Backtracking?

Regular expression engines work by walking through a string trying to match a pattern. They may backtrack when they test the next character and the pattern doesn’t match, but other permutations exist that could form a valid match still exist. The engine backs up and retries the next possible path towards a match.

A few extra steps is always acceptable for the magic that is regular expressions, but pattern matching gets catastrophic when we’re talking about thousands of steps for inputs that are obviously invalid. We want our regular expressions to not match as quickly as possible to avoid catastrophic backtracking.

A poor regexp profiled in RegEx Buddy

Making this mistake is easier to do that one would think. The screenshot above is a simple contrived example. We try to find sentences that end with the letter "d". "hello world" should match and "hello worldz" should not. We can see several backtracking steps in the output on the right. The best thing to do is to test your regexp for negative cases with long input. Maybe you are Mike Shaver and can guess from timing your code, code reviewing your regexp, or writing unit test that exhibits catastrophic backtracking… but for us mere mortals this voodoo approach to finding backtracking issues is error prone. Problems may not surface until your system is under heavy load.

I tried a different tact, which is much more empirical. I installed RegEx buddy, a windows only tool (don’t worry, I have Windows safely caged in a VM on Ubuntu). This tool is marketed as a great resource for learning regexps, but it is also perfect for profiling your regular expressions. It will execute a pattern against a sample input and show you the total number of steps it took to match or fail.

Screenshot of RegexBuddy testing Rewrite Rule patternI put in my candidate Apache mod_rewrite patterns into the UI and feed it good and bad urls. I made sure that good urls matched in a reasonable amount of steps, but more importantly I tested how many steps a “bad” url took. Fake input like /en-US/some/wacky/page/should/fail/fast?foo=bar&baz=buz should stop within a dozen or less steps. Don’t just test for successful matches and call it a day. How does a small amount of non-matching input do? How about a large amount of non-matching input? If the engine takes thousands of steps before calling it quits, you know you’re gonna have issues – a slow web server… or worse.

RegEx Buddy is closed source and commercial, so I’d love to hear about FOSS tools that profile regexps.

6 responses

  1. Daniel Einspanjer wrote on :

    There are other Regex IDEs out there and at least one free (for personal use) one, http://www.weitz.de/regex-coach/ but I’ve never found one that compares to RegexBuddy.

    Also worth noting that RegexBuddy will run under Wine / Crossover in addition to a VM. That is much more comfortable for me.

  2. Toni Hermoso Pulido wrote on :

    I don’t know whether it’s comparable to RegexBuddy, but there is one based on Mozilla technology, commercial though, Komodo Rx Toolkit (http://www.activestate.com/blog/2010/11/rx-20-komodo-60)
    And I just found: https://addons.mozilla.org/firefox/addon/2077/ as well.

  3. Neil Rashbrook wrote on :

    So, did you discover any bad RegExes?

  4. Austin King wrote on :

    @Daniel – thanks, I had seen that. Edi Weitz writes a lot of awesome Open Source software

    @Toni – I’ll have to check to see if Komodo can profile regexp, thanks for the tip.

    @Neil – Nope, more of a reality check and I was found temporarily sane.

  5. Wladimir Palant wrote on :

    You can flip the javascript.options.relimit preference in Firefox to make it test for exponential regexps (switched on by default in Firefox 4). This might not be quite as comfortable as RegEx Buddy but if you test your regexp in error console on bad input and it throws you know that something is wrong.

  6. hector wrote on :

    Thanks for the post. Find it very useful. Thanks a lot for sharing it up. Great work!