30
Jul 10

What I learned at SIGGRAPH 2010: Wrap-up

Because I apparently don’t know how to read a calendar, I booked my SIGGRAPH 2010 travel to start one day before SIGGRAPH began and end on its last day, meaning I missed everything that happened on that last day. I have been to SIGGRAPH several times, and I have learned the hard way, more than once, that you definitely want to arrive on the first day (when the technical papers “fast-forward” session of ultra-lightning talks is held) and leave after the last day, so you can attend whatever technical papers sessions and courses are held on that day. (At minimum, take the red-eye out on the last day, but I don’t do that because red-eyes are for suckers.)

Even so, I learned a lot this year. It’s pretty amazing what gradient-domain filtering can do to images and video. GPU techniques are still everywhere, but from my point of view they are less focused on the pattern of past years: “precompute for 17 hours and you can play with this in real time!” In the past, there was a lot of “we have this hammer and by God are we going to use it for every nail”; people are more realistic now.

There was also a lot more research focusing on fully automatic results. I saw a paper presented that offered a way of automatically figuring out how a set of gears worked just from the geometric model; all the user had to do was select the part that drove the system (like a drive shaft or hand crank). In past years, I suspect that system would have required the user to specify what each type of gear was, and maybe even how it turned. This is really the holy grail for people like me who are passionate about things that Just Work; we have all sorts of research that makes automatic solutions possible, and using it is immensely satisfying.

There was a lot of focus on validating research results with user studies. Of course, most of these studies involve very small groups (around 20 people, from what I saw), but they provided a lot of good input on the applicability of the methods these researchers developed.

Overall, I was very impressed with this year’s SIGGRAPH. A lot of researchers have spent a lot of time combining several years’ worth of work on various topics, and created some very compelling user experiences out of it. I highly recommend searching for “SIGGRAPH 2010” on YouTube or the like; there is guaranteed to be something that’s up your alley there. (For example, Sony’s 360° 3D display, or automatically generated sound from rigid body fractures, or perhaps best of all, what I’d like to call “Photosynth for video,” except it gives you 3D animation from one pose to another. Seriously, check this out.)

I’ve enjoyed summarizing some of the interesting things I saw at SIGGRAPH, and I hope that people on Planet Mozilla have found it interesting and useful. There’s a lot of fantastic research out there; SIGGRAPH is just the tip of the iceberg. The ACM hosts many conferences on many different topics, and it’s one of many international bodies dedicated to research in computer science. I highly recommend people interested in computer science find a field they’re interested in and attend a conference on it. Research is great!


29
Jul 10

What I learned at SIGGRAPH 2010, day 4

Image/video modification and presentation

Ever since the image retargeting/seam carving paper was published in 2007, it seems like the research world has been on fire with methods of retargeting images with increasingly better results, later extended to retargeting video. If you haven’t seen what can be done with this increasingly important method of image manipulation, I encourage you to look on YouTube for “image retargeting” or “liquid resizing.”
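
For anyone who wants to see the core of seam carving, here is a minimal sketch in Python with NumPy of the dynamic-programming formulation from that 2007 paper: compute a gradient-magnitude energy, find the connected top-to-bottom seam with the lowest total energy, and delete it. It assumes a grayscale image and the simple |dI/dx| + |dI/dy| energy; the function names are my own, and the pure-Python loops are slow, so treat it as an illustration rather than a usable implementation.

```python
import numpy as np

def energy(gray: np.ndarray) -> np.ndarray:
    """Gradient-magnitude energy |dI/dx| + |dI/dy|."""
    gy, gx = np.gradient(gray.astype(float))   # derivatives along rows, then columns
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(e: np.ndarray) -> np.ndarray:
    """Column index, per row, of the 8-connected vertical seam with least total energy."""
    h, w = e.shape
    cost = e.copy()
    back = np.zeros((h, w), dtype=int)
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            j = lo + int(np.argmin(cost[y - 1, lo:hi]))   # best of the 3 pixels above
            back[y, x] = j
            cost[y, x] += cost[y - 1, j]
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 1, 0, -1):                          # walk the seam back up
        seam[y - 1] = back[y, seam[y]]
    return seam

def remove_vertical_seam(img: np.ndarray, seam: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1, *img.shape[2:])

def carve_width(gray: np.ndarray, n: int) -> np.ndarray:
    """Remove n columns' worth of seams, recomputing the energy each time."""
    out = gray
    for _ in range(n):
        out = remove_vertical_seam(out, find_vertical_seam(energy(out)))
    return out
```

For example, on an image whose left half is flat and whose right half is textured, the seams come out of the flat half, which is exactly the “content-aware” behaviour that makes retargeting look so good.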

The important insight from retargeting is that the gradient operator is incredibly important in image manipulation, especially when combining multiple images. By using a least-squares solver to minimize gradient differences across the seams of an image, you can get shockingly good results.
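
A one-dimensional sketch shows the least-squares idea: take gradients from signal `a` on one side of a seam and from signal `b` on the other, then solve for the signal whose gradients best match them, pinning the first value so the solution is unique. This is my own illustration, not any particular paper’s solver. In 1D the target gradients can always be matched exactly; in 2D the stitched gradient field usually can’t be, which is why a least-squares (Poisson-style) solve is needed there.

```python
import numpy as np

def gradient_domain_paste(a, b, seam, anchor_weight=10.0):
    """Paste b right of `seam` onto a by matching gradients instead of values.

    Solves min_f sum_i (f[i+1] - f[i] - g[i])**2 + anchor_weight**2 * (f[0] - a[0])**2,
    where g[i] comes from a for i < seam and from b otherwise.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    g = np.where(np.arange(n - 1) < seam, np.diff(a), np.diff(b))

    A = np.zeros((n, n))
    rhs = np.zeros(n)
    i = np.arange(n - 1)
    A[i, i] = -1.0                       # rows 0..n-2 encode f[i+1] - f[i] = g[i]
    A[i, i + 1] = 1.0
    rhs[: n - 1] = g
    A[n - 1, 0] = anchor_weight          # last row pins f[0] to a[0]
    rhs[n - 1] = anchor_weight * a[0]

    f, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return f

a = np.arange(10.0)                      # slope 1
b = 100.0 + 2.0 * np.arange(10.0)        # slope 2, offset by 100
naive = np.concatenate([a[:5], b[5:]])   # value paste: jumps from 4 to 110
blended = gradient_domain_paste(a, b, seam=5)
```

Up to floating-point error, `blended` comes out as 0, 1, 2, 3, 4, 5, 7, 9, 11, 13: it keeps `b`’s slope of 2 on the right but no longer jumps at the seam.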

I don’t have a reference for this, but I have been told that human vision is more sensitive to gradient differences than to absolute pixel values; that is, we can compare things that are directly adjacent much better than things that are far apart. In a discussion I had with a presenter after a session on the first day of SIGGRAPH, he brought up a corollary to this: we are more sensitive to temporal changes than to side-by-side differences, meaning that we notice a change as it happens over time much better than we can by looking back and forth between two images. (This has great implications for photo editing software, which often presents “Before” and “After” shots side-by-side, or even on different monitors.)

One of the most interesting things I saw in the video sessions today was a method of generating a dense temporal “film strip” that lets you very easily and intuitively scrub through video. Imagine a film strip showing the most salient frames from a video, only instead of sitting in separate rectangles, the frames are blended together using gradient optimization, and if you want a more fine-grained search, you can zoom in to this “film strip.”

Unfortunately this method required a significant amount of offline processing. When I asked the presenter whether it was applicable to streaming video, he thought it would work best if a simple uniform sampling of frames (rather than a method of finding the “most important” frames in a given time range) was used, and the “film strip” was filled in from the right as more video was downloaded. I’m still not convinced that the performance would be acceptable, but the user experience for searching through videos was especially compelling.


27
Jul 10

What I learned at SIGGRAPH 2010, day 3

GPU rendering

GPUs sample pixels in a perhaps non-intuitive way. With multi-sample anti-aliasing (MSAA), a pixel covered by only one triangle is sampled for coverage several times but shaded only once, because texture lookups are already appropriately filtered and shading is the most expensive part of the GPU pipeline. The shaded colour is then effectively weighted by the coverage value when the samples are resolved. The only time shading happens multiple times per pixel is when multiple triangles cover a single pixel; in that case, the GPU shades each triangle separately.
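
To make the sampling-versus-shading distinction concrete, here is a toy software model of one pixel with four samples, in Python. It is a simplification under stated assumptions: there is no per-sample depth test, triangles are submitted back to front, and real GPUs actually shade 2×2 pixel quads rather than single pixels. `resolve_pixel` and its arguments are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Color = Tuple[float, float, float]

def resolve_pixel(coverage: Dict[str, List[bool]],
                  shade: Callable[[str], Color],
                  background: Color = (0.0, 0.0, 0.0)) -> Tuple[Color, int]:
    """Toy MSAA for one pixel.

    coverage maps a triangle id to its per-sample coverage mask. Triangles are
    assumed to arrive back to front, so later ones overwrite earlier samples.
    Returns the resolved colour and the number of shader invocations.
    """
    n = len(next(iter(coverage.values())))
    samples: List[Color] = [background] * n
    invocations = 0
    for tri, mask in coverage.items():
        if not any(mask):
            continue
        color = shade(tri)            # shaded once per triangle, not once per sample
        invocations += 1
        for i, covered in enumerate(mask):
            if covered:
                samples[i] = color    # the same colour goes into every covered sample
    # Resolve: averaging the samples weights each colour by its coverage.
    resolved = tuple(sum(s[c] for s in samples) / n for c in range(3))
    return resolved, invocations

colors = {"A": (1.0, 0.0, 0.0), "B": (0.0, 0.0, 1.0)}
print(resolve_pixel({"A": [True, True, False, False]}, colors.__getitem__))
print(resolve_pixel({"A": [True, True, False, False],
                     "B": [False, False, True, True]}, colors.__getitem__))
```

The first call shades once and resolves to half-intensity red; the second, where two triangles meet inside the pixel, costs two shader invocations.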

In [1], the paper’s authors propose adding another step to the GPU pipeline that merges the shading work (quad fragments) of adjacent triangles that share an edge. This makes it possible to shade pixels along that edge only once, reducing the amount of work necessary.
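
Here is a toy count of shader invocations with and without merging, again in Python. Each fragment is a (pixel, triangle, coverage bitmask) triple, and fragments at the same pixel are merged when their triangles share an edge and their coverage doesn’t overlap. This is my simplification for illustration only: the paper works on 2×2 quad fragments inside a modified hardware pipeline, and its real merge rules differ.

```python
from collections import defaultdict
from typing import FrozenSet, Hashable, List, Set, Tuple

Fragment = Tuple[Hashable, str, int]   # (pixel, triangle id, coverage bitmask)

def count_shading(fragments: List[Fragment],
                  adjacent: Set[FrozenSet[str]],
                  merge: bool = True) -> int:
    """Return how many shader invocations the fragments would cost."""
    by_pixel = defaultdict(list)
    for pixel, tri, mask in fragments:
        by_pixel[pixel].append((tri, mask))

    total = 0
    for frags in by_pixel.values():
        if not merge:
            total += len(frags)          # one invocation per (pixel, triangle)
            continue
        groups = []                      # each group: [set of triangles, combined mask]
        for tri, mask in frags:
            for group in groups:
                shares_edge = any(frozenset((tri, other)) in adjacent for other in group[0])
                if shares_edge and not (group[1] & mask):
                    group[0].add(tri)
                    group[1] |= mask
                    break
            else:                        # no compatible group: start a new one
                groups.append([{tri}, mask])
        total += len(groups)             # one invocation per merged group
    return total

# Two triangles, T1 and T2, share an edge running through pixels (0, 0) and (1, 0).
frags = [((0, 0), "T1", 0b0011), ((0, 0), "T2", 0b1100),
         ((1, 0), "T1", 0b0001), ((1, 0), "T2", 0b1110),
         ((2, 0), "T2", 0b1111)]
adjacent = {frozenset(("T1", "T2"))}
```

Without merging these fragments cost five invocations; with merging, the two edge pixels are shaded once each, for three in total.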

“2.5D” cartoons

[2] detailed a new technique for generating 2D cartoons that have some of the behaviour of 3D cartoons while maintaining the simple 2D look. It amounted to separating the strokes of the drawing into different layers, each of which could be moved and occluded so that the view appears to rotate around the character. The key part of this is that each separate part of the character was a “billboard” that always faced the viewer: parts could be occluded by other parts, but you couldn’t look behind them. Further, once you defined what your character looked like from, say, the front and the side (perhaps his nose changes and one of his ears becomes invisible, but his mouth probably looks the same), the system automatically lets you rotate between those two views by interpolating between the drawings. Since it knows the relative ordering of your character’s parts, you can even rotate all the way around the character, and each part will disappear as it’s occluded by the character’s body.
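
To illustrate the interpolate-then-reorder idea, here’s a heavily simplified sketch in Python. I’m assuming each part is reduced to a single anchor point plus a depth value per key view (the real system in [2] interpolates whole drawings), with linear interpolation by viewing angle that wraps around 360°. Names like `interpolate_part` and `draw_order` are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical simplification: each part stores, per key view angle (degrees),
# a 2D anchor position (x, y) and a depth used only to decide drawing order.
KeyViews = Dict[float, Tuple[float, float, float]]

def interpolate_part(keys: KeyViews, angle: float) -> Tuple[float, float, float]:
    """Linearly interpolate a part's (x, y, depth) between the two nearest key views."""
    angles = sorted(keys)
    n = len(angles)
    a = angle % 360.0
    i = bisect.bisect_right(angles, a)
    k0, k1 = angles[(i - 1) % n], angles[i % n]   # neighbours, wrapping past 360
    span = (k1 - k0) % 360.0 or 360.0
    t = ((a - k0) % 360.0) / span
    p0, p1 = keys[k0], keys[k1]
    return tuple((1.0 - t) * u + t * v for u, v in zip(p0, p1))

def draw_order(parts: Dict[str, KeyViews], angle: float) -> List[str]:
    """Back-to-front order: parts with larger depth are farther away and drawn first."""
    return sorted(parts, key=lambda name: -interpolate_part(parts[name], angle)[2])

parts = {
    "head": {0.0: (0.0, 0.0, 0.0)},
    "nose": {0.0: (0.0, 0.1, -1.0), 90.0: (0.9, 0.1, -0.2), 180.0: (0.0, 0.1, 1.0)},
}
```

At 0° the nose is drawn after (on top of) the head; at 180° it’s drawn first, so the head hides it, which mimics a part disappearing as the body occludes it.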

ASCII art

[3] was a great improvement on the libaa of old, though it relied on a way of turning images into outlines/vectors that wasn’t detailed. The authors overlaid the vectorized image with a grid (the size of the ASCII art image you want to generate), and then matched the line segments in each grid cell against the known shapes of the characters in their font. Because these matches are sometimes only approximate, they then perturb the lines (in a very controlled way) to try to get a better fit. Iterating on this produces some pretty great results that exceed the ability of ASCII artists to reproduce images, though the artists’ results were still preferred by a small majority for overall look.
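
The grid-matching step can be sketched in a few lines of Python. This toy version assumes the image has already been rasterized to a binary line drawing, uses 3×3 cells and a hand-drawn five-character “font,” and scores each character by plain pixel mismatch; [3] uses a more tolerant shape-similarity measure plus the line-deformation step described above, neither of which is shown here.

```python
from typing import Dict, List
import numpy as np

CELL = 3
# A hypothetical 3x3 "font": '#' marks an inked pixel.
GLYPHS: Dict[str, List[str]] = {
    " ": ["...", "...", "..."],
    "-": ["...", "###", "..."],
    "|": [".#.", ".#.", ".#."],
    "/": ["..#", ".#.", "#.."],
    "\\": ["#..", ".#.", "..#"],
}

def to_bitmap(rows: List[str]) -> np.ndarray:
    return np.array([[ch == "#" for ch in row] for row in rows], dtype=bool)

GLYPH_BITMAPS = {ch: to_bitmap(rows) for ch, rows in GLYPHS.items()}

def to_ascii(binary: np.ndarray) -> List[str]:
    """Replace each CELL x CELL block with the glyph that differs in the fewest pixels."""
    h, w = binary.shape
    lines = []
    for y in range(0, h - h % CELL, CELL):
        row = []
        for x in range(0, w - w % CELL, CELL):
            block = binary[y:y + CELL, x:x + CELL]
            best = min(GLYPH_BITMAPS,
                       key=lambda ch: np.count_nonzero(GLYPH_BITMAPS[ch] != block))
            row.append(best)
        lines.append("".join(row))
    return lines

drawing = to_bitmap(["..#...",
                     ".#.###",
                     "#....."])
print("\n".join(to_ascii(drawing)))
```

This prints `/-`. A slightly off diagonal still maps to `/` because it is the closest glyph, which is where the paper’s line perturbation would step in to improve the fit.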

1. Fatahalian, K., Boulos, S., Hegarty, J., Akeley, K., Mark, W., Moreton, H., Hanrahan, P. 2010. Reducing Shading on GPUs using Quad-Fragment Merging. ACM Trans. Graph. 29, 4, Article 67 (July 2010), 8 pages. DOI = 10.1145/1778765.1778804 http://doi.acm.org/10.1145/1778765.1778804.

2. Rivers, A., Igarashi, T., Durand, F. 2010. 2.5D Cartoon Models. ACM Trans. Graph. 29, 4, Article 59 (July 2010), 7 pages. DOI = 10.1145/1778765.1778796 http://doi.acm.org/10.1145/1778765.1778796.

3. Xu, X., Zhang, L., Wong, T. 2010. Structure-based ASCII Art. ACM Trans. Graph. 29, 4, Article 52 (July 2010), 9 pages. DOI = 10.1145/1778765.1778789 http://doi.acm.org/10.1145/1778765.1778789.


26
Jul 10

What I learned today at SIGGRAPH 2010, day 2

User interfaces for tweakable settings

For those unfamiliar with the field, tweaking a computer graphics rendering often involves playing with dozens of values, some with obvious meanings (colour), some less obvious (randomness). Lots of people have spent lots of time trying to refine the interfaces for tweaking these values, and today at SIGGRAPH 2010 a paper was presented describing user studies of a subset of these parameter-tweaking interfaces.

In this study, three types of interfaces were evaluated: two which amounted to tweaking numeric values using sliders, and one which involved searching for the type of effect you’re looking for visually. The study consisted of users trying to match given outputs using the three methods as well as creating an entirely new rendering to fit in with an existing scene. The users who were participating in this study were all novices: they’d never done any rendering before this study.

Everyone involved followed the same pattern: playing with the controls to get a handle on what each of them does, then “blocking out” the values (getting them in the neighbourhood of correct) and moving on to the next set. Then, once each of the values was roughly correct, they went back and tweaked the rendering by smaller and smaller amounts until they converged on the correct output.
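
The block-out-then-refine pattern is essentially a coarse-to-fine coordinate search. Here’s a small Python model of it: adjust one parameter at a time by a fixed step, keep changes that reduce the error, and halve the step once no single change helps. The `error` function stands in for the user’s visual judgement and is a hypothetical assumption; this is my analogy, not something from the paper.

```python
from typing import Callable, List

def coarse_to_fine(error: Callable[[List[float]], float],
                   start: List[float],
                   step: float = 0.25,
                   shrink: float = 0.5,
                   min_step: float = 1e-3) -> List[float]:
    """Block out with big steps, then refine with progressively smaller ones.

    Assumes the error function has a minimum to converge towards.
    """
    params = list(start)
    best = error(params)
    while step >= min_step:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                trial_error = error(trial)
                if trial_error < best:          # keep a change only if it helps
                    params, best, improved = trial, trial_error, True
                    break
        if not improved:                        # nothing helps at this scale: refine
            step *= shrink
    return params

target = [0.3, 0.7, 0.55]   # hypothetical "correct" settings the user is matching

def error(p: List[float]) -> float:
    return sum((x - t) ** 2 for x, t in zip(p, target))

result = coarse_to_fine(error, [0.5, 0.5, 0.5])
```

The first passes with a step of 0.25 are the “blocking out” phase; the later, smaller steps are the fine tweaking that converges on the target.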

Interestingly, though, users found the two slider interfaces about equally easy to use, and much easier than the visual search; the time they took to reach the correct values agreed with them. In many cases, users artificially constrained their visual search to a slider-like couple of results in order to make their search easier. Overall, the visual search was found too cumbersome: blocking out was about as easy, but tweaking was much more difficult.

However, precisely the opposite was found when the task was to generate something new to fit in with an existing scene. While the visual search was still just as difficult to tweak, its more unconstrained nature made it much easier for users to find something they liked as a starting point, and in the end users were happier with their results than with the slider-based approach.

What is the take-away message? In my mind, it’s that, when you need to make small changes iterating towards a goal, providing a highly granular and more easily tweakable interface is of utmost importance; however, when you’re just starting to create something, providing a more visceral, less controllable interface gives users a good starting point. Ideally, you’d provide a hybrid of both approaches, allowing users to define their direction in broad strokes and then tweak it quickly using more detailed controls.

Computer Graphics in history

I went to a presentation given by Richard Chuang (formerly from PDI) and Ed Catmull (from Pixar, and CG lore). It focused on a course that Catmull and Jim Blinn taught at Berkeley in 1980, and which Chuang audited via a microwave link to his workplace at HP. There was a lot of history of computer graphics in this course, and it was quite transformational in Chuang’s life; a year later he helped found PDI, which was later bought by DreamWorks.

The most important thing I took from this panel presentation was that you should always start with the hardest part of your project, because the easier parts will be informed by the choices you make. The specific example given was the choice many initial implementers of hidden-surface algorithms (occlusion, depth buffers, etc.) made to ignore anti-aliasing, figuring it was a simple extension of their work. As it turns out, anti-aliasing is hard, and it is made even harder if your hidden-surface algorithm throws away data that you’d need to anti-alias, like exactly where your polygons’ edges fall within each pixel.


25
Jul 10

What I learned today at SIGGRAPH 2010

  • Sharp has a prototype 5-colour display (RGBYC) that is simply gorgeous, and will make people who just bought their 4-colour (RGBY) display jealous.

Image Statistics

  • If you make your image’s histogram flat or, equivalently, make its cumulative histogram a straight line with a roughly 45-degree slope, its contrast will be higher and people will like how it looks.
  • However, matching histograms between images, while it can make one image have the colour palette of another, doesn’t produce pleasing results.
  • If you calculate the x and y gradients of a large volume of natural images (pictures of the real world), there is a large peak at a gradient value of zero. This implies that most of the natural world is composed of large homogeneous surfaces instead of having lots of edges.
  • Further, the distribution of those gradients, both in x and y, falls off very quickly, implying that those edges that exist are mostly low contrast.
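  • Finally, the distribution of those gradients is symmetric about 0: an edge going from dark to light is about as common as one going from light to dark. This suggests that surfaces mostly sit on top of the same background, like a window in a house, so the edge on one side of a surface is mirrored by the edge on the other.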
  • If you take the power spectrum of an image, on average it follows a power law: $A \propto 1/f^{\beta}$. If you plot these spectra on log-log axes, you get a straight line with a slope of $-\beta$.
  • Human beings are most sensitive to slopes of β = 2.8 to 3.2, but the average spectral slope of images is about 2.0. This implies that we’re tuned to see things that are really coarse, even though the average scene isn’t that way.
  • If you control for orientation, you can figure out what an image is a picture of very effectively just by looking at its 2D power spectrum!
  • PCA, Principal Component Analysis, is simply a method of computing the eigenvalues and eigenvectors of the covariance matrix of a set of data. This lets you figure out (by looking at which eigenvalue is largest) which direction accounts for most of the variance in the data; by sorting the eigenvalues, you can rank your components by importance.
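
The histogram-flattening trick in the first point is classic histogram equalization: use the image’s own cumulative histogram as the mapping from old grey levels to new ones, which makes the output’s cumulative histogram approximately a straight line. A minimal Python/NumPy sketch, assuming an 8-bit greyscale image:

```python
import numpy as np

def equalize(img: np.ndarray, levels: int = 256) -> np.ndarray:
    """Histogram-equalize an integer image with values in [0, levels)."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:                       # constant image: nothing to stretch
        return img.copy()
    # The normalized CDF is the lookup table from old levels to new ones.
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(img.dtype)
    return lut[img]

img = np.array([[100, 100, 101, 101],
                [102, 102, 103, 103]], dtype=np.uint8)
out = equalize(img)
```

Four nearly identical grey levels (100 to 103) get spread across the whole range as 0, 85, 170 and 255, which is where the extra contrast comes from.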
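
To see why the power law turns into a straight line, take logs of $A \propto 1/f^{\beta}$: $\log A = -\beta \log f + c$. The sketch below is a 1D simplification in Python/NumPy (for images you’d average the 2D spectrum over orientations first). It synthesizes a test signal with a known β, then recovers β by fitting a line to its log-log power spectrum; the synthetic signal is my own test input, not data from the course.

```python
import numpy as np

def synth_power_law_signal(n: int, beta: float, seed: int = 0) -> np.ndarray:
    """Real 1-D test signal whose power spectrum is exactly 1/f**beta, with random phases."""
    rng = np.random.default_rng(seed)
    k = np.arange(n // 2 + 1, dtype=float)
    amplitude = np.zeros_like(k)
    amplitude[1:] = k[1:] ** (-beta / 2.0)        # power = amplitude**2 = k**-beta
    phases = rng.uniform(0.0, 2.0 * np.pi, size=k.shape)
    return np.fft.irfft(amplitude * np.exp(1j * phases), n=n)

def spectral_slope(signal: np.ndarray) -> float:
    """Estimate beta from a straight-line fit to log power versus log frequency."""
    n = len(signal)
    power = np.abs(np.fft.rfft(signal)) ** 2
    k = np.arange(1, n // 2)                      # skip the DC and Nyquist bins
    slope, _intercept = np.polyfit(np.log(k), np.log(power[k]), 1)
    return -slope                                 # log P = -beta * log f + c

signal = synth_power_law_signal(4096, beta=2.0)
estimate = spectral_slope(signal)
```

Here `estimate` recovers β = 2.0 up to floating-point error. On a real photograph the points scatter around the line, so the fitted slope is only an average.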
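
And a minimal PCA sketch in Python/NumPy, following the last point: centre the data, form the covariance matrix, and sort its eigenvectors by eigenvalue. The example data (points on the line y = 2x) is made up for illustration.

```python
import numpy as np

def pca(data: np.ndarray):
    """Rows are observations, columns are variables.

    Returns the eigenvalues (variance along each component, largest first)
    and the matching unit eigenvectors as columns.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending for symmetric matrices
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

t = np.linspace(-1.0, 1.0, 50)
data = np.column_stack([t, 2.0 * t])        # every point lies on y = 2x
values, vectors = pca(data)
explained = values / values.sum()           # fraction of variance per component
```

Since all the points lie on one line, the first component points along (1, 2)/√5 (up to sign) and carries essentially all of the variance, while the second eigenvalue is zero.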

25
May 10

Hardware accelerating Firefox

Bas Schouten has posted a lot about hardware accelerating Firefox, specifically with Direct2D and OpenGL. However, there’s been some confusion as to what it is we’re going to hardware accelerate, and how we’re going to do it. This blog post aims to be the definitive reference for hardware acceleration in Firefox.

Please note: We have committed to turning on certain parts of hardware acceleration in developer previews, but that’s not a guarantee of those bits shipping in Firefox 4. We’re going to try hard to ship some form of acceleration, though.

Layers

Layers is a technology that lets conceptually simple, but computationally expensive, parts of the web be offloaded to GPUs. Examples of this include transparency, scaling, composition, and simple animations of those attributes. Since we’re dealing mostly with images on the GPU, we can also accelerate operations on these images, like converting colours; this is especially useful in the case of video, because we can get the GPU to convert from video’s native YCbCr to the GPU’s native RGB, and then scale the result, for example to fullscreen. Both of these operations are expensive on the CPU, but nearly free on the GPU.
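
For a sense of the per-pixel work involved, here is that colour conversion written out in Python/NumPy. I’m assuming 8-bit BT.601 “limited range” video with full-resolution chroma planes and using the commonly quoted approximate coefficients; Gecko’s actual shaders may differ (real video usually has subsampled chroma, for one thing).

```python
import numpy as np

def ycbcr_to_rgb_bt601(y, cb, cr) -> np.ndarray:
    """Convert 8-bit limited-range BT.601 YCbCr planes to 8-bit RGB (last axis = R, G, B)."""
    y = np.asarray(y, dtype=float) - 16.0
    cb = np.asarray(cb, dtype=float) - 128.0
    cr = np.asarray(cr, dtype=float) - 128.0
    r = 1.164 * y + 1.596 * cr
    g = 1.164 * y - 0.392 * cb - 0.813 * cr
    b = 1.164 * y + 2.017 * cb
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.round(rgb), 0, 255).astype(np.uint8)

black = ycbcr_to_rgb_bt601(16, 128, 128)
white = ycbcr_to_rgb_bt601(235, 128, 128)
```

Every output pixel depends only on the three input values at the same position, which is exactly the kind of independent, uniform work a GPU fragment shader handles well.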

The downside of layers is that, in order to get the sort of benefit that’s possible from GPU acceleration, a lot of analysis of the document needs to be done to identify the parts that need to be separated into their own layers. Some of this is relatively easy, like background images; other parts are much harder. Also, layers is designed to accelerate only portions of the web; this means that, in the common case, we will render most of the web page using software, then do only the hardest/slowest part using the GPU directly.

Code currently in mozilla-central supports layers in three modes: a basic software-only mode (using Cairo); a mode that combines software with OpenGL; and a mode that combines software with Direct3D 9 (D3D9). At the time of this writing, only full-screen HTML5 video is rendered using our GPU accelerated (OpenGL and D3D9) layers backends, but in the future we plan to accelerate all of our rendering. To test this hardware acceleration, right-click on a video and select “Full Screen.” GPU accelerated full-screen video is turned on by default; you can check its status in your error console.

Direct2D

The Direct2D (D2D) Cairo backend uses Microsoft’s Direct2D API to GPU-accelerate everything displayed by Firefox. Some parts of the web are more amenable to acceleration than others; for example, moving and scaling images around (like in photos.svg) is much faster. The average page that displays a lot of text and perhaps a few images won’t see nearly as big a speedup, though. The Direct2D Cairo backend doesn’t require any special analysis like the layers backends require, because everything in Firefox is already rendered using Cairo.

The downside to Direct2D is that it’s only supported on Windows Vista + Platform Update and Windows 7, and only then on relatively new GPUs (generally, GPUs that support DirectX 10).

Direct2D support is currently turned off by default, but can be turned on manually using about:config.


30
Mar 09

Gecko’s new image cache

Now that I’ve finally checked bug 466586 in to the mozilla-1.9.1/Firefox 3.5 development branch, I consider the design of Imagelib’s cache finished. I planned on blogging about this a while ago, but other problems distracted me.

When I joined the Mozilla Corporation’s gfx group in February of 2008, I was tasked with what seemed like a simple job: create a hashtable-based cache for imagelib, so it no longer had to use necko’s memory cache. (The work to implement this new cache was tracked in bug 430061.) While this seemed like unnecessary reimplementation, I was assured by Stuart and Vlad that necko’s memory cache was meant for an entirely different class of object, and that the large images stored in it were crowding out those objects (such as pages loaded over SSL).

It turned out to be a multi-month effort that involved a lot of rewriting, debugging, collaboration, and patience. The last two qualities were especially embodied by Boris Zbarsky, who went out of his way to help me debug problems I didn’t understand, reviewed far too many iterations of patches, and was generally helpful in a way that I think exemplifies Mozilla’s community spirit. Thank you, Boris.

The most important fruit of all this labour is the reduction in memory use it made possible: a clever eviction policy lets us halve the size of the cache while maintaining the same real-world performance.

The remainder of this post will be a detailed explanation of the cache’s design, how it is implemented, and how I came to the decisions I made. I plan on rolling this into an MDC article at some point, so if you have questions, please ask them.



26
Feb 09

How you can help with recent image library changes on mozilla-central

Over the course of Mozilla’s 1.9.1 branch development, I’ve made a number of pretty important changes to Gecko/Firefox’s image library. I plan on blogging more about those changes in the future, but in a nutshell, instead of using necko’s memory cache to store decoded images, imagelib (libpr0n) now uses its own hash table, eviction criteria, and resurrection methods to cache decoded images.

As part of the changes for bug 466586, I’ve made it so that images that are still in use – for example, an image displayed in a webpage that you have open – are always accessible from the cache. Only once the image isn’t being used anymore do we start thinking about removing it from the cache (and freeing up its memory). Unfortunately, my current patch (available on the bug, and checked in to mozilla-central) has some latent bugs that I have never been able to reproduce. I do have proof that these bugs exist: in particular, several crashes have shown up when expiring images from the cache since I checked in the patch.
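
The policy described here, where in-use images are always reachable and only unused ones are eviction candidates, can be sketched in Python as below. This is an illustration under my own assumptions (a byte budget for unused images, evicted least-recently-released first, and resurrection when an unused image is requested again); the class and method names are hypothetical and are not imagelib’s real classes such as imgCacheEntry.

```python
from collections import OrderedDict
from typing import Dict, Tuple

class ImageCache:
    """Hypothetical sketch: in-use entries are never evicted; unused entries are
    kept within a byte budget, least recently released evicted first."""

    def __init__(self, max_unused_bytes: int):
        self.max_unused_bytes = max_unused_bytes
        self.entries: Dict[str, Tuple[int, int]] = {}     # url -> (size, users)
        self.unused: "OrderedDict[str, int]" = OrderedDict()  # url -> size, oldest first
        self.unused_bytes = 0

    def acquire(self, url: str, size: int) -> bool:
        """A document starts using an image. Returns True on a cache hit."""
        if url in self.entries:
            sz, users = self.entries[url]
            if users == 0:                      # resurrect it from the unused list
                self.unused_bytes -= self.unused.pop(url)
            self.entries[url] = (sz, users + 1)
            return True
        self.entries[url] = (size, 1)
        return False

    def release(self, url: str) -> None:
        """A document stops using an image; only now can it become evictable."""
        sz, users = self.entries[url]
        users -= 1
        self.entries[url] = (sz, users)
        if users == 0:
            self.unused[url] = sz
            self.unused_bytes += sz
            self._evict()

    def _evict(self) -> None:
        while self.unused_bytes > self.max_unused_bytes:
            url, sz = self.unused.popitem(last=False)
            self.unused_bytes -= sz
            del self.entries[url]
```

Because in-use entries never enter the eviction list, a page that is still open can always find its images in the cache, no matter how large they are.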

Currently, this code is only checked in to mozilla-central (the 1.9.2 branch). I don’t have any test cases for these bugs (though I’m trying to get URLs for the crashes), but if you see a crash in your up-to-date Minefield build (2009-02-18 or newer) that contains “img” in the stack trace, or you get an abort in your debug build in imgCacheEntry, please file a bug with what you were doing, whether it’s reproducible, what URL you were at, and any other details you can find. (MDC has some good information about how to report and get information on crashes in Firefox.)

The bug this patch fixes is a Firefox 3.1 beta 3 blocker, so the more users this code has (and consequently the more eyes we have on it), the better.

UPDATE: Bug 480352 has been filed on this unknown crash, along with some bare-bones instructions on how you can help.


09
Jan 09

Seven things

Chris Blizzard tagged me, and this is as reasonable an introduction to Planet as any other. I’m Joe Drew, otherwise named JOEDREW! \o/ (apparently by Deb originally, but these things take on a life of their own). I currently work on imagelib in the Mozilla Corporation’s graphics group, out of MoCo’s Toronto office. I plan to blog about pictures on the open web; the exploits of Jeff Muizelaar, my partner in (graphics) crime here in the Toronto office; and non-Euclidean geometry.

Here are the rules for this particular meme:

  1. Link to your original tagger(s) and list these rules in your post.
  2. Share seven facts about yourself in the post.
  3. Tag seven people at the end of your post by leaving their names and the links to their blogs.
  4. Let them know they’ve been tagged.

My seven things:

  1. I’ve had five non-retail jobs in my life:
    • In Grade 11, I had a summer job doing Microsoft Access and VB consulting at a small, now-defunct shop called Idea People. Surprisingly, this was my first job, even before any retail work I eventually did.
    • Three of my jobs came through the University of Waterloo‘s co-op program:
    • After graduating, I continued working at Side Effects Software for a couple of years, on all manner of 3D-related things. (I consider this the “same job” as my co-op terms at Side Effects.)
    • Finally, I found myself applying to and being hired by the Mozilla Corporation, and my experiences there are more or less what this entire blog will be about.
  2. Without my wife, who at the time was my girlfriend, I might not have gone to Waterloo at all. It was between U of T and Waterloo; U of T had offered me a substantial scholarship, on-campus residence, and had a professor call me to sell me on the school. Waterloo gave me no money and told me to find my own place to live. Lisa convinced me that Waterloo was still the right choice, which turned out to be true: without my co-op job experience, I would be in a very different place, career-wise.
  3. I’ve had precisely one significant other throughout my life: Lisa, my wife. We are literally high-school sweethearts who survived the separation of university to get married after we graduated.
  4. Before I wised up and moved to the city, I owned a car – a 2005 Saturn Ion. I wrecked it in a hilariously poorly-thought-out attempted U-turn to avoid a long line for left turns.
  5. I have never left North America, but Lisa was born in Malta.
  6. In the past, I was a fairly active Debian developer; I even organized DebConf 2 at York University here in Toronto. (These days, I am pretty disgusted with Debian; it seems it’s more about forcing other people to do what you want through voting than working out the best technical solution for users.) I have not used a Linux system at home since 2004.
  7. I wrote mpg321 as a Free alternative to mpg123 (back when mpg123 was still non-Free). I had all manner of plans to extend that software, but they fell by the wayside, as things do. I now use iTunes and my iPhone to listen to music.

And now, as is part of this meme, I get to tag folk.

  1. Vladimir Vukićević, my manager, because he doesn’t blog enough.
  2. Ryan North, because he is awesome and draws Dinosaur Comics.
  3. Shawn Wilsher, who doesn’t eat shawarma for unknown reasons.
  4. Dave Townsend, who now looks like a little boy.
  5. Jenny Boriss, because she makes me laugh.
  6. Katie Bonnar, who I beat out for Prime Minister of Student Council because of damn dirty tricks.

    And finally,

  7. Jeff Muizelaar, because he told me not to tag him.