Jul 10

What I learned at SIGGRAPH 2010: Wrap-up

Because I apparently don’t know how to read a calendar, I booked my SIGGRAPH 2010 travel to start one day before SIGGRAPH began, and end on the last day of SIGGRAPH, meaning I missed everything that happened on the last day. Having been to SIGGRAPH several times, I have repeatedly learned the hard way that you definitely want to arrive on the first day (when the technical papers “fast-forward” ultra-lightning talks session is held) and leave after the last day, so you can attend whatever technical papers sessions and courses might be held on that day. (At minimum, take the red-eye out on the last day, but I don’t do that because red-eyes are for suckers.)

Even so, I learned a lot this year. It’s pretty amazing what gradient-domain filtering can do to images and video. GPU techniques continue, but from my point of view, they are less focused on the pattern of past years: “precompute for 17 hours and you can play with this in real time!” In the past, there was a lot of “we have this hammer and by God are we going to use it for every nail”; people are more realistic now.

There was also a lot more research focusing on fully automatic results. I saw a paper presented that offered a way of automatically figuring out how a set of gears worked just from the geometric model; all the user had to do was select the part that drove the system (like a drive shaft or hand crank). In past years, I suspect that system would have required the user to specify what each type of gear was, and maybe even how it turned. This is really the holy grail for people like me who are passionate about things that Just Work; we have all sorts of research that makes automatic solutions possible, and using it is immensely satisfying.

There was a lot of focus on validating research results with user studies. Of course, most of these user studies comprise very small groups (around 20 people, from what I saw), but they provided a lot of good input on the applicability of the methods these researchers discovered.

Overall, I was very impressed with this year’s SIGGRAPH. A lot of researchers have spent a lot of time combining several years’ worth of work on various topics, and created some very compelling user experiences out of it. I highly recommend searching for “SIGGRAPH 2010” on YouTube or the like; there is guaranteed to be something that’s up your alley there. (For example, Sony’s 360° 3D display, or automatically generated sound from rigid body fractures, or perhaps best of all, what I’d like to call “Photosynth for video,” except it gives you 3D animation from one pose to another. Seriously, check this out.)

I’ve enjoyed summarizing some of the interesting things I saw at SIGGRAPH, and I hope that people on Planet Mozilla have found it interesting and useful. There’s a lot of fantastic research out there; SIGGRAPH is just the tip of the iceberg. The ACM hosts many conferences on many different topics, and it’s one of many international bodies dedicated to research in computer science. I highly recommend people interested in computer science find a field they’re interested in and attend a conference on it. Research is great!

Jul 10

What I learned at SIGGRAPH 2010, day 4

Image/video modification and presentation

Ever since the image retargeting/seam carving paper was published in 2007, it seems like the research world has been on fire with methods of retargeting images with increasingly better results, later being extended to retargeting video. If you haven’t seen what can be done with this increasingly important method of image manipulation, I encourage you to look on YouTube for “image retargeting” or “liquid resizing.”

The important insight from retargeting is that the gradient operator is incredibly important in image manipulation, especially when combining multiple images. By using a least-squares solver to minimize the gradient differences across the seams of an image, you can get shockingly good results.
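To make the least-squares idea concrete, here is a minimal one-dimensional sketch. It is an illustration only, not code from any of the papers. It joins two hypothetical scanlines, `left` and `right`, by asking for each piece's own gradients plus a flat seam, with both end values pinned. The function name `stitch_1d` and the choice to pin the two ends are assumptions made for the example; real systems solve the 2D analogue with a sparse solver.

```python
def stitch_1d(left, right):
    """Join two 1D scanlines in the gradient domain (illustrative sketch).

    Minimises the sum of (x[i+1] - x[i] - g[i])**2 with x[0] and x[-1]
    pinned to the original end values. In 1D the least-squares minimiser
    has a closed form: the total mismatch is shared equally by every step.
    """
    # Target gradients: each piece keeps its own differences,
    # and we ask for no change at all across the seam.
    grads = [b - a for a, b in zip(left, left[1:])]
    grads.append(0.0)
    grads += [b - a for a, b in zip(right, right[1:])]

    start, end = left[0], right[-1]

    # How far the integrated target gradients miss the pinned end value.
    mismatch = (end - start) - sum(grads)
    share = mismatch / len(grads)

    # Integrate the corrected gradients from the pinned start value.
    out = [float(start)]
    for g in grads:
        out.append(out[-1] + g + share)
    return out


if __name__ == "__main__":
    print(stitch_1d([10, 12, 14, 16], [30, 31, 32, 33]))
```

With these inputs, naive concatenation has a jump of 14 at the seam. The sketch returns `[10.0, 14.0, 18.0, 22.0, 24.0, 27.0, 30.0, 33.0]`, whose largest step is 4: the seam error has been spread thinly over the whole signal instead of sitting in one visible place.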

I don’t have a reference for this, but I have been told that human vision is more sensitive to gradient differences than to absolute pixel values; that is, we can compare things that are directly adjacent much better than we can compare things that are apart. In a discussion I had with a presenter after a session on the first day of SIGGRAPH, he brought up a corollary to this: we are more sensitive to temporal changes than to side-by-side comparisons, meaning that we can see changes as they occur over time much better than we can by comparing pixel values manually, looking back and forth. (This has great implications for photo editing software, which often presents “Before” and “After” shots side-by-side, or even on different monitors.)

One of the most interesting things I saw in the video sessions today was a method of generating a dense temporal “film strip” allowing you to very easily and intuitively scrub through video. Imagine a film strip showing the most salient frames from a video, only instead of in separate rectangles, the frames were blended together by using gradient optimization, and if you want to do a more fine-grained search, you can zoom in to this “film strip.”

Unfortunately, this method required a significant amount of offline processing; when I asked the presenter whether it was applicable to streaming video, he thought it would work best if a simple uniform sampling of frames (rather than a method of finding the “most important” frames in a given time range) was used, and the “film strip” was filled in from the right as more video was downloaded. I’m still not convinced that the performance can be acceptable, but the user experience for searching through videos was especially compelling.

Jul 10

What I learned today at SIGGRAPH 2010, day 2

User interfaces for tweakable settings

For those unfamiliar with the field, tweaking a computer graphics rendering often involves playing with dozens of values, some with obvious meanings (colour), some less obvious (randomness). Lots of people have spent lots of time trying to refine these models, but today at SIGGRAPH 2010 a paper about user studies on a subset of these parameter-tweaking interfaces was presented.

In this study, three types of interfaces were evaluated: two which amounted to tweaking numeric values using sliders, and one which involved searching for the type of effect you’re looking for visually. The study consisted of users trying to match given outputs using the three methods as well as creating an entirely new rendering to fit in with an existing scene. The users who were participating in this study were all novices: they’d never done any rendering before this study.

Everyone involved followed the same pattern: playing with the controls to get a handle on what each of them does, then “blocking out” the values (getting them in the neighbourhood of correct) and moving on to the next set. Then, once each of the values was sort of correct, you go back and tweak the rendering by smaller and smaller amounts until you converge on the correct output.

Interestingly, though, users found (and the amount of time spent to get the correct value agreed with them) the slider interfaces about equally easy, and much easier than the visual search. In many cases, users artificially constrained their visual search to a slider-like couple of results in order to make their search easier. The visual search was found too cumbersome; while blocking was around as easy, tweaking was much more difficult.

However, precisely the opposite was found when the task was to generate something new to fit in with an existing scene. While the visual search was still just as difficult to tweak, its more unconstrained nature made it much easier for users to find something they liked as a starting point, and in the end users were happier with their results than with the slider-based approach.

What is the take-away message? In my mind, it’s that, when you need to make small changes iterating towards a goal, providing a highly granular and more easily tweakable interface is of utmost importance; however, when you’re just starting to create something, providing a more visceral, less controllable interface gives users a good starting point. Ideally, you’d provide a hybrid of both approaches, allowing users to define their direction in broad strokes and then tweak it quickly using more detailed controls.

Computer Graphics in history

I went to a presentation given by Richard Chuang (formerly from PDI) and Ed Catmull (from Pixar, and CG lore). It focused on a course that Catmull and Jim Blinn taught at Berkeley in 1980, and which Chuang audited via a microwave link to his workplace at HP. There was a lot of history of computer graphics in this course, and it was quite transformational in Chuang’s life; a year later he helped found PDI, which was later bought by DreamWorks.

The most important thing I took from this panel presentation was that you should always start with the hardest part of your project, because the easier parts will be informed by the choices you make. The specific example given was the choice many initial implementers of hidden-surface algorithms (occlusion, depth buffers, etc) made to ignore anti-aliasing, figuring it was a simple extension of their work. As it turns out, anti-aliasing is hard, and it is made even harder if your hidden-surface algorithm throws away data that you’d need to anti-alias, like the specific points in your polygons.

Jul 10

What I learned today at SIGGRAPH 2010

  • Sharp has a prototype 5-colour display (RGBYC) that is simply gorgeous, and will make people who just bought their 4-colour (RGBY) display jealous.

Image Statistics

  • If you make your image’s histogram flat, or its cumulative histogram have a roughly 45-degree slope (these are equivalent), its contrast will be higher and people will like how it looks.
  • However, matching histograms between images, while it can make one image have the colour palette of another, doesn’t produce pleasing results.
  • If you calculate the x and y gradients of a large volume of natural images (pictures of the real world), there is a large peak at a gradient value of zero. This implies that most of the natural world is composed of large homogeneous surfaces instead of having lots of edges.
  • Further, the distribution of those gradients, both in x and y, falls off very quickly, implying that those edges that exist are mostly low contrast.
  • Finally, the distribution of those gradients is symmetric about 0, meaning that surfaces are mostly on top of the same background, like a window in a house.
  • If you take the power spectrum of an image, on average it follows the power law: A = 1 / f^β. If you plot these spectra on a log-log scale, you get a straight line whose slope is −β (β is usually called the spectral slope).
  • Human beings are most sensitive to slopes of β = 2.8 to 3.2, but the average spectral slope of images is about 2.0. This implies that we’re tuned to see things that are really coarse, even though the average scene isn’t that way.
  • If you control for orientation, you can figure out what an image is a picture of very effectively just by looking at its 2D power spectrum!
  • PCA, Principal Component Analysis, is at its core a method of computing the eigenvalues and eigenvectors of the covariance matrix of a set of data. This lets you figure out (by looking at which eigenvalue is the largest) the direction along which most of the variance of the data lies; by sorting the eigenvalues, you can figure out what your most important components are.
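
Three of the points above are easy to try out in a few lines. What follows is a rough, standard-library-only sketch rather than anything shown in the course: `equalize` flattens a histogram via the cumulative histogram, `spectral_slope` estimates β by fitting a straight line in log-log space, and `pca_2d` does PCA on 2D points using the closed-form eigendecomposition of a 2×2 covariance matrix. All three function names, and the restriction to flat pixel lists and 2D points, are assumptions made to keep the example short.

```python
import math


def equalize(pixels, levels=256):
    """Histogram-equalise a flat list of integer grey levels in [0, levels)."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram, in pixel counts.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:  # constant image: nothing to stretch
        return list(pixels)
    # Remap levels so the cumulative histogram becomes roughly a straight line.
    lut = [round(max(c - cdf_min, 0) * (levels - 1) / (n - cdf_min)) for c in cdf]
    return [lut[p] for p in pixels]


def spectral_slope(freqs, amps):
    """Estimate beta in A = 1 / f**beta with a least-squares line in log-log space."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(a) for a in amps]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # the fitted line has slope -beta


def pca_2d(points):
    """PCA of 2D points: eigenpairs of the covariance matrix, largest first."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    # Angle of the principal axis (eigenvector of the larger eigenvalue).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return [(mean + radius, v1), (mean - radius, v2)]


if __name__ == "__main__":
    print(equalize([100, 101, 102, 103]))
    print(spectral_slope([1, 2, 4, 8], [1 / f ** 2 for f in [1, 2, 4, 8]]))
    print(pca_2d([(0, 0), (1, 1), (2, 2), (3, 3)]))
```

For example, `equalize([100, 101, 102, 103])` stretches four nearly identical grey levels out to `[0, 85, 170, 255]`, and points lying along the line y = x give a first principal component pointing along the diagonal with essentially all of the variance.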