Is it possible to prove that one user interface is quantitatively better than another user interface? Is interaction design all about personal opinion? How does one tell what is a good interface, and what is a bad interface? If two people disagree on a design, how do you tell who is right?
These are all questions that have been coming up around Mozilla, in threads on mozilla.dev.apps.firefox, and on IRC.
Here are some of the approaches the Mozilla UX team has found useful for bringing some science to a field normally thought of as predominantly art. Unfortunately, we don’t have all the answers: each of these approaches to quantifying usability has particular drawbacks that have to be taken into account.
Your Mom is Not Statistically Significant
…of course that doesn’t mean she isn’t a very nice person, simply that there is only one of her. It’s important not to generalize from your own personal usage patterns when designing interfaces or discussing proposed designs. For instance, if you use a del.icio.us extension every day, and have tagged several thousand pictures on Flickr, it’s still important to remember that some studies have found that only 7% of internet users say they tag information online. Or, you may personally use the history sidebar all the time, but you also need to consider that some studies have found that only 0.02% of page visits originate from history, and only 8% of users know about the feature. Often people will realize that they skew a bit more technical than the average user, so they will discuss the Web browsing habits of their mom (serving the role of non-technical archetype), but that’s just trading one sample size of one for another.
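To make the "sample size of one" point concrete, here is a minimal sketch that computes a Wilson score interval for a proportion. The survey of 1,000 people is a hypothetical chosen to match the 7% tagging figure above, not data from any of the studies mentioned.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# One person who tags (you, or your mom) versus a hypothetical
# survey of 1,000 people in which 70 say they tag.
for successes, n in [(1, 1), (70, 1000)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: plausible share of users is {low:.1%} to {high:.1%}")
```

With a single observation the interval covers most of the range from 0 to 1, which is another way of saying that one person tells you almost nothing about the population.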
So the first way to bring science to user experience design is usage data. However, usage data has its problems. First, it’s easy to assume that the data is true because it is expressed quantitatively, but the data could still be wrong for a wide variety of reasons. Second, the data doesn’t capture everything. Certain observations, like “sometimes auto-completing URLs can lead to embarrassing situations,” are impossible to pull out of usage logs. Third, this data only tells you how users currently use an interface, not how they would prefer to use an interface.
In particular, usage data isn’t very helpful when designing brand new features. Studies about Web page revisitation might help a designer make more informed decisions about a new bookmarking and history interface, but what about microformat detection? Since it is new, there are simply no usage studies to take into account.
Cognitive Performance Modeling
One way of quickly evaluating the efficiency of a new interface is through cognitive performance modeling, using a program like CogTool. By evaluating an interface with a simulated user, designers can rapidly iterate on their design without having to constantly bring in real users for testing. For example, cognitive modeling was used extensively by Google and NASA to improve the user interface of tabbed browsing in Firefox 2. For a feature we are considering for Firefox 3, microformat detection, we could use cognitive modeling to demonstrate how much time the average user would save when completing a task like adding an event to their calendar. However, cognitive performance modeling is based on the assumption that the user already knows how to interact with the interface. Additionally, it can’t tell you whether users will actually want to use a particular feature, just how efficient the feature’s UI is.
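As a rough illustration of the idea, here is a sketch using the Keystroke-Level Model, a much simpler relative of what tools like CogTool automate. It is not CogTool's own model. The operator times are the commonly cited textbook values, and both operator sequences are hypothetical guesses at how the calendar task might break down, not measurements of any real design.

```python
# Commonly cited Keystroke-Level Model operator times, in seconds.
# Real modeling tools calibrate these far more carefully.
OPERATOR_SECONDS = {
    "K": 0.28,  # press a key (average typist)
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # press or release a mouse button (a click is "BB")
    "H": 0.40,  # move a hand between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}


def estimate_seconds(operators: str) -> float:
    """Sum the operator times for a task written as a string such as 'MPBB'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical breakdown: create the event by hand in a calendar.
# Think, click "new event", move to the keyboard, think, type 20 characters.
manual_entry = "M" + "PBB" + "H" + "M" + "K" * 20

# Hypothetical breakdown: the browser detects the event on the page.
# Think, click the detected event, think, click "add to calendar".
detected_event = "MPBB" + "MPBB"

manual = estimate_seconds(manual_entry)
detected = estimate_seconds(detected_event)
print(f"manual entry:   {manual:.1f} s")
print(f"detected event: {detected:.1f} s")
print(f"estimated time saved: {manual - detected:.1f} s")
```

Like any model of this kind, the estimate assumes an error-free user who already knows the interface, which is exactly the limitation described above.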
You Can’t Ask Users What They Want
Analyzing the efficiency of a UI for a new feature like microformat detection doesn’t answer the question “do users actually want to quickly send events to their calendar?” Unfortunately, getting quantitative data for a question like this isn’t very easy. One direct way is to literally ask users if they find this type of feature interesting. Unfortunately, this isn’t a very reliable source of data, because we can’t assume they actually know what they want. For instance, in a study conducted in the 1950s, people were asked whether they would prefer lighter telephone handsets, and on average, they said they were happy with the handsets they had (which at the time were made rather heavy for durability). Yet an actual test of telephone handsets, identical except for weight, revealed that people preferred the handsets that were about half the weight that was normal at the time (MIT 6.831 lecture).
Another problem with asking users what they want is that they have preconceived notions of what is possible. For instance, Dean Kamen conducted a number of user interviews with people in wheelchairs. Often, when people were asked how they would improve their wheelchair, they would focus on mundane improvements, like having some type of mud flaps to stop water from flying up from the wheels in the rain. Dean Kamen described the interviews, saying “I thought they would have wanted to be able to travel up and down stairs.”
A Mozilla employee (who shall remain nameless) once told me about their involvement in one of the first user studies of tabbed browsing. This unnamed Mozilla employee remembered: “I didn’t like them, and I said I didn’t think other people would find them very useful. And tabbed browsing turned into one of our most popular features…”
So it would appear that the only way to get really accurate data on features is to let people actually use the feature for a while, long enough for the novelty to wear off and long enough to hammer out teething problems, and then analyze whether people feel the feature is useful.
I should also note that just because you can’t always directly ask users what they want, that doesn’t mean that you can’t learn an incredible amount about their needs through observation. Often you can get a better sense of their problems from watching them complete tasks than from literally asking “what problems do you think we should solve?”
Also, just because asking users what they want isn’t always a reliable source of data, that doesn’t mean we don’t love hearing your opinions on what we should improve :)
Quantitative Aspects of Visual Design
Visual design is not purely art; there are different aspects of it that can be quantified. For instance, leveraging specific visual variables:

And Gestalt principles:

Aesthetics are subjective, but complexity is not. We can produce quantitative measures of how complicated an interface is, both in terms of interactive elements and in terms of visual elements. And these are really not the same thing. For instance, back and forward in Firefox consists of 4 visual elements and 4 interactive elements:

Back and forward in Safari consists of 1 visual element, and 4 interactive elements (including press and hold):

Unfortunately, simplicity is not necessarily king. Pressing and holding the back button isn’t exactly discoverable, but the controls are noticeably simpler.
Quantitative Aspects of Usability: Heuristic Evaluation
There are a lot of usability heuristics we can use to evaluate an interface, identifying usability problems without ever having to rely on our personal views of what makes something easy or hard to use. These heuristics serve as great guidelines for identifying specific usability problems.
Intuition and Experience
While all of these quantitative aspects of interface design can help during the design process, decisions will often still come down to the designer’s intuition, which is gained through experience and training. Just as developing software can be full of trade-offs (like using compression to trade processing power for storage space), interface design is also full of trade-offs (like simplicity versus discoverability). Knowing the right choice when dealing with these types of trade-offs often comes from experience, but having all of the above sources of quantitative data certainly can help.