Gecko’s new image cache

Now that I’ve finally checked bug 466586 in to the mozilla-1.9.1/Firefox 3.5 development branch, I consider the design of Imagelib’s cache finished. I planned on blogging about this a while ago, but other problems distracted me.

When I joined the Mozilla Corporation’s gfx group in February of 2008, I was tasked with what seemed like a simple job: create a hashtable-based cache for imagelib, so it no longer had to use necko’s memory cache. (The work to implement this new cache was tracked in bug 430061.) While this seemed like unnecessary reimplementation, I was assured by Stuart and Vlad that necko’s memory cache was meant for an entirely different class of object, and that the large images stored in it were crowding out those objects (such as pages loaded over SSL).

Initially, this seemed like a simple job, but it turned out to be a multi-month effort that involved a lot of rewriting, debugging, collaboration, and patience. The last two attributes were especially embodied by Boris Zbarsky, who went out of his way to help me debug problems I didn’t understand, reviewed far too many iterations of patches, and was generally helpful in a way that I think exemplifies Mozilla’s community spirit. Thank you, Boris.

The most important fruit of all this labour is the reduction in memory use it made possible: a clever eviction policy lets us halve the size of the cache while maintaining the same real-world performance.

The remainder of this post will be a detailed explanation of the cache’s design, how it is implemented, and how I came to the decisions I made. I plan on rolling this into into an MDC article at some point, so if you have questions, please ask them.

Image loading is managed by two classes: imgRequest and imgRequestProxy. imgRequest is the class that takes care of the network load and decoding images, but since the same image can be loaded multiple times, when you call imgLoader::LoadImage() you get a reference to an imgRequestProxy. imgRequestProxy is a lightweight handle that implements nsIRequest and imgIRequest; it gets events from, and forwards events to, its imgRequest.

Since imgRequest does the actual work, it’s the class that needs to be kept in the cache. To make doing this a bit easier, I created a data class, imgCacheEntry, that holds a strong reference to imgRequest (which also holds a strong reference to imgCacheEntry, creating a cycle that is broken only once the imgRequest has no more listeners), and also contains useful information, such as last touched time and size. It is these imgCacheEntries that the cache manipulates.

The cache itself consists of three data structures:

  1. A hash table from URI to imgCacheEntry. Every image in memory is accessible from this hash table unless it has been explicitly removed from the cache.
  2. An instance of my favourite data structure of all time, the heap. (Dijkstra invented the heap when he needed a faster way to implement his shortest-path graph algorithm, because that is the sort of thing that Real Computer Scientists do.)

    Once there are no more listeners for a given image – say, once a user has navigated away from a page – we add the image to this heap, and evaluate whether anything needs to be removed from the cache due to space restrictions. If so, the image at the top of the heap is evicted, and (since there are no other references to the image) it is deallocated. (I plan a blog post on precisely how I came to the eviction scheme used in imagelib in the near future.)

    (A brief aside about performance: Even though heaps are relatively quick, updating one on every change to the cache turned into a 2x performance hit. The main problem is that updating the heap implemented in the STL (I believe a binary heap) is an O(n) operation. I investigated a couple of different replacement data structures for solving this – one, described in a paper called “Implicit dictionaries with O(1) modifications per update and fast search” turned out to not be applicable to the problem; another, Fibonacci heaps, looked promising, but its attributes weren’t precisely right. In the end, it turned out that lazy evaluation – marking the heap as dirty, and re-heapifying only when necessary – solved the performance problem.)

  3. Our memory tests showed that we weren’t freeing up as much data after all tabs had been closed, so I added an nsExpirationTracker to the mix.

    nsExpirationTracker runs a timer to forcibly expire elements from the cache after some length of inactive time. (This is currently set to 10 seconds, which feels a little quick. I suspect we’d be better off if the image cache expiry time was the same as the bfcache expiry time, 30 minutes.) Like the heap, images are only added to the expiration tracker once they have no listeners, since evicting a cache entry that has a reference count above 1 makes website authors depending on previously loaded images being available cry.

These three structures combine with imgRequest and imgRequestProxy in probably-overly-complex ways to implement our image caching strategies. I had lots of problems along the way – from many, many leaks that took weeks cumulatively to debug, to trying to understand necko’s twisty passages, to debugging topcrashes without ever having seen the crash myself – but it’s made me a better developer, and given me a chance to understand a lot more about this code which I have come to own.

7 comments

  1. Joedrew wrote:
    > The main problem is that updating the heap implemented
    > in the STL (I believe a binary heap) is an O(n)
    > operation.

    Are you sure? push_heap and pop_heap should be O(log n):

    http://www.cppreference.com/wiki/stl/algorithm/push_heap

  2. Yes, that’s the whole point! I worded this poorly.

    Whenever I removed an element, though, through Cancel() or other reasons, or – and this is much more important – an element’s weight got changed, either because it got another cache hit, or we loaded more information from the network, we had to heapify all over again. Heapification is O(N).

    Fibonacci heaps have an O(1) |decrease key| operation, but unfortunately that doesn’t map directly to the changes that happen while images are loading and being used.

  3. pop_heap, decrease_key, then push_heap is still O(log n), compared with make_heap of O(n). I agree that esoteric heaps may have better run-times. Do you need to change multiple weights at once? How big is n?

  4. n is small, on the order of a couple hundred elements at the most (5 MB cap on the heap’s contents). There were two problems that I saw: at at least one point, I was convinced that I needed access to both decrease_key _and_ increase_key – which isn’t available in most heaps, and isn’t implemented on Mozilla’s target platforms’ STLs as far as I can tell.

    The second problem was that Fibonacci heap, as I saw it, didn’t support increase_key in an O(1) way. And doing even O(log(n)) work on each of these elements (small n at any one time, but large churn) was, in my mind, a non-starter.

    And, when it came to choosing between implementing a heap on my own and just trying lazy updating, _of course_ I chose the latter. :)

  5. Joedrew wrote:
    > And doing even O(log(n)) work on each of these elements
    > (small n at any one time, but large churn) was, in my
    > mind, a non-starter.

    Let’s ignore esoteric heaps and concentrate on STL heaps. I don’t understand the use case well enough, but many O(log n) may beat a few O(n) even for small n. Probably best to benchmark push_heap and pop_heap instead of guessing. O(log n) may be faster than you think!

  6. I admit that I haven’t benchmarked this at all, but there is honestly no reason to – image cache manipulation went from the #1 hotspot to not even in the profile for a Tp run when I changed to dirty bits from always updating.

    (One problem is that we can get lots of “change key” notifications in a row as an image incrementally downloads and decodes. This is a lot of useless churn. I suppose I could combine both push/pop and dirty bits, but as I said, it’s not a hotspot at all.)

  7. Measure twice, cut once. ;-)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>