Rant on Library IO

So I’ve been trying to figure out how to optimize disk IO at startup. I looked into the IO caused by libraries, and it turns out that apps with big libraries are screwed. Here is how I came to this conclusion:

Gnomer’s research on startup pointed out that dumb readahead leads to wins in terms of file IO. So I wrote some code and sure enough, reading in libxul at the top of our main() function does indeed result in a significant, measurable speed-up on both Linux and OSX.

From the gnome page I found a link to some diskstat stuff. There lay a presentation with graphs that appear to show that OpenOffice has a much better cold IO pattern than Firefox. Given that there are some strong similarities between our application layouts, I went digging to see if OpenOffice does something funny. And oh boy, it does do funny page reordering on Windows and “slightly-smarter-than-dumb-readahead-style library prefetch” on Linux…

So here is an innocent question: Why is page reordering not done as a PGO step? I mean, shouldn’t you fire up your app, feed some info back to the linker and be done with it? Another question: Why can’t we mark certain files as “keep this whole file in RAM if someone asks for part of it to be paged in”?

So is static linking the only way to get fast application startup? It sure is easy to

posix_fadvise(open(argv[0], O_RDONLY), 0, 0, POSIX_FADV_WILLNEED);

Are these hacks still the state of the art in making apps with large libraries start up fast?

Update: Found some mentions of GNU Rope unfinishedware and a relatively recent blog post


  1. someone did attempt reordering at some point: http://mxr.mozilla.org/mozilla-central/source/tools/reorder/

  2. Hilarious, you found a moz version of G[nu]rope!

  3. Funny that I had a look at the grope paper yesterday, but could not find the source anywhere. The mxr link may be useful, I’m sure you can make it work! 🙂

  4. Not convinced the mxr link is any use; this thing is awfully old and hasn’t been touched since check-in:
    1.1 waterson%netscape.com 2001-11-30 First checked in.

  5. Visual C++ PGO does do block reordering:
    but I’m not sure if it’s exactly what you’re asking for here.

    Note that we’re not building with PGO on Linux or OS X right now. Last time we tried, it wasn’t a big win, plus it always seems to hit GCC bugs.

  6. There is a lot to be said for static linking.

    A lot of the things (limited disk space, limited RAM) that made shared libraries so attractive are no longer an issue.

    Static linking (with link-time optimisation) allows for nice compact code that cold-loads very quickly.