When we were receiving feedback during the development of the Windows themes for Firefox 3, one question tended to come up a lot:
Why are there three distinct visual styles of controls in the main toolbar?
This is a really good question, and something that I don’t think I adequately explained during the design process. Here are the current main toolbar icons:

And as you may have noticed if you customized your toolbar configuration, there are some intentional height differences in the set when using large icons:

So you might be wondering (or regularly emailing me):
How did we end up with those?
It’s kind of a long story. Going back to some of the earliest sketches of Firefox 3 about a year ago (before my post about the wireframes), we were interested in using the control layout shown below for the Windows theme. The original design never got past the whiteboard stage, but here is a recreation of it using some of the final graphics for Firefox 3.

In my notes I was referring to this design as “Solar,” as a reference to the back button being the Sun and smaller controls orbiting around it. What I personally really liked about this design was that it structurally grouped controls like back, forward, reload and stop with the contents of the location bar. This visual design made it conceptually clearer that controls like back and forward would affect the contents of the location bar, because they were physically joined. Some alternate designs by Mike Beltzner and Madhava Enros that I also really liked moved the stop button inside of the location bar, having it only appear during page load, along with some Safari-style progress bar feedback replacing the throbber.
Solar was modified to what eventually became the keyhole design primarily due to implementation considerations. We couldn’t create multiple versions of the same controls in the customization palette, so that made it impossible to provide users with both the combined control set of Solar and the individual controls that would give users the flexibility to create any configuration they wanted (“Mr. Potato Browser, and his bucket of parts”).

Another implementation consideration was that for performance reasons we couldn’t make modifications to the toolbar graphics based on the current control layout. This would have allowed us to draw the etch in the background on Solar, and to group reload and stop into a single visual form on OS X’s Proto theme (I should note this is now possible using adjacent selectors without the performance problems):

Ultimately we still wanted to ship the visual hierarchy aspects of the original Solar design in Firefox 3, as opposed to releasing a design with equally weighted controls (similar to Firefox 1 and 2). However, we had to make several modifications and compromises that hurt both the original design and users’ ability to customize their control scheme in Firefox 3:
-We dropped the etch that was grouping the controls on Windows
-We shipped the keyhole shape for the navigation controls, but we unfortunately made it impossible for users to separate the back and forward buttons when customizing their configuration
-We kept the size of controls like reload and stop a little smaller to maintain the visual hierarchy of controls, but not so small that they would look incredibly strange when the user customized their control scheme
-We returned the home button to the main toolbar (it had been placed on the bookmarks toolbar), due to some negative feedback from beta testers who didn’t like us modifying the traditional core set of browser controls.
That’s the long answer for why the icons on Windows are currently different heights: they allow us to have some visual hierarchy in the default control scheme, without totally breaking customization in the process.
What’s so special about visual hierarchy anyway?
Visual hierarchy is a term more often used in the context of static design work, like print and some forms of Web design. However, it can also bring some real benefits to interactive interfaces. To create a visual hierarchy a designer uses various visual variables like color, contrast, texture, shape, position, orientation and size to draw the user’s attention to some elements, while simultaneously drawing the user’s attention away from other elements. The designer can also choose to visually group related controls together.
To move the discussion to a more theoretical level, let’s consider both physical interfaces like remote controls and stuff that Apple makes, as well as software interfaces like a Web browser.
Here are two extremes when it comes to visual hierarchy in a physical interface: the TiVo remote vs. the PS2 remote (or really the TiVo remote vs. any number of remotes designed by a variety of consumer electronics companies for several decades). The TiVo remote designers went through literally hundreds of design iterations, while the PS2 remote’s design appears to be primarily based on the most logical layout for the underlying circuit board.
The two remotes are similar in that they both avoid an extensive use of color. However, they differ greatly in the way that they leverage the shape, size, and location of different controls.

This is a stupid comparison, physical interfaces are totally different from software!
Well, yes and no. A good physical interface will not directly translate well to a good software interface, but I believe a lot of the underlying principles of what makes any interface good or bad are still the same. For instance, Fitts’s law is applicable in a variety of different physical environments, using different limbs (or eyes), and different types of input devices, as long as you adjust the constants a (start / stop time) and b (speed of the device). Similarly, when it comes to creating a visual hierarchy, there are a few underlying principles that, if applied, can improve any interface, regardless of whether it is a remote control or a Web browser toolbar. These principles are universal, and here are four benefits that come to mind:
1. The Ability to Create a Lighter and Simpler Appearance (Better Visual Design)
By combining shapes and visually grouping related components, you can create interfaces that are simpler. For instance, the view of the TiVo remote above and the section of the Sony remote from the play / pause / stop controls on up both contain 28 controls, but the Sony remote seems more overwhelming. Or to look at a more specific example, the directional pad on the TiVo remote looks like 1 control, while the directional pad on the Sony remote looks like 4 controls. We see this visual collapsing together of controls in Apple products like the iPod and the new MacBook Pro trackpad, as well as in other Web browsers.

2. The Ability to Give Dominant Controls More Weight (Better Interactive Design)
Something I heard pretty often in comments and message board posts when we were proposing design work for Firefox 3 was:
I’m not dumb, I know where the back button is, it isn’t like I’ve had any trouble finding it in the past.
That’s true, but how many milliseconds does it take for you to visually target the control you want to click on? Can you find it with very early visual processing? Making the most important control visually dominant still makes the control easier to locate, even if you already know where to look, and makes the interface as a whole feel simpler because you can more effectively ignore all of the other controls every time you are on your way to hitting the most important one.
One way to tell how visually dominant a control is, is to blur the design (or close one eye and squint) in order to roughly approximate early visual processing. If a control passes this test the user should be able to visually target it at a very quick glance.
Here are a few examples. You can still see the navigation controls in Firefox, and the pause button on the TiVo remote.

So I, like you, am not dumb. I know exactly where the pause button on my PS2 remote is. However, the graphics card in my brain has to devote far more resources to helping me target and hit it compared to when I am trying to hit the pause button on my TiVo remote. The same goes for OK buttons on dialogs in Windows compared to OS X, or when choosing which Web browser to open.
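For the curious, the blur test described above can be roughly approximated in code. This is only a minimal Python sketch under some loud assumptions, not anything we actually used during the design process: the toolbar is a tiny grid of pixels, a plain box blur stands in for early visual processing, and the control names, positions and sizes are all made up for illustration. It blurs the toolbar and reports the strongest value that survives inside each control.

```python
# Hypothetical toolbar layout: name -> (x, y, width, height) in pixels.
# These numbers are illustrative assumptions, not Firefox's real metrics.
CONTROLS = {
    "back": (2, 1, 10, 10),
    "reload": (20, 4, 4, 4),
    "stop": (30, 4, 4, 4),
}
TOOLBAR_WIDTH, TOOLBAR_HEIGHT = 40, 12


def draw_controls(controls, width, height):
    """Rasterize each control as a solid rectangle (1.0) on a blank toolbar (0.0)."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y, w, h in controls.values():
        for row in range(y, y + h):
            for col in range(x, x + w):
                grid[row][col] = 1.0
    return grid


def box_blur(grid, radius):
    """Average every pixel over a (2 * radius + 1) square window.

    Pixels outside the toolbar are treated as blank background.
    """
    height, width = len(grid), len(grid[0])
    window_area = (2 * radius + 1) ** 2
    blurred = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    r, c = row + dy, col + dx
                    if 0 <= r < height and 0 <= c < width:
                        total += grid[r][c]
            blurred[row][col] = total / window_area
    return blurred


def peak_after_blur(controls, width, height, radius):
    """Return the strongest blurred value found inside each control's bounds."""
    blurred = box_blur(draw_controls(controls, width, height), radius)
    peaks = {}
    for name, (x, y, w, h) in controls.items():
        peaks[name] = max(
            blurred[row][col]
            for row in range(y, y + h)
            for col in range(x, x + w)
        )
    return peaks


# The blur radius is another assumption; larger values mimic a harder squint.
peaks = peak_after_blur(CONTROLS, TOOLBAR_WIDTH, TOOLBAR_HEIGHT, radius=3)
```

With these made-up sizes, the large back button keeps a fully solid core after blurring while the smaller reload and stop buttons wash out, which is roughly the effect the squint test is looking for.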
3. Easier Control Differentiation (Better Interactive Design)
When it comes to the slow and fast forward buttons on my PS2 remote, I actually am dumb. I can never remember exactly where they are, and the glyphs look similar enough in low light that I regularly hit the wrong one accidentally. But it’s very different with my TiVo remote. Even without looking at one or holding one, just by picturing the remote in my mind, I can quickly recall where the slow button is: it’s directly south of the pause button (which is sort of the capital city).

We mentally encode locations based off of directions from memorable landmarks. In the case of the PS2 remote, the nearest major landmark for the slow button is the play button, which is large and pretty easy to both remember and find. But from there on in, there aren’t any more landmarks: was the slow button in the first row or the second row? On the TiVo remote, literally every single control is in a memorable direction from a major landmark (with the exception of the numeric keypad). The result of this is that after learning all of the button positions, you can pretty effectively use the remote without ever actually looking at it, just by feeling the shapes of the controls and moving from known landmarks to the more obscure controls like Mute or Info. It isn’t impossible to both learn and remember the position of all the controls in the PS2 remote’s grid, but it is considerably harder.
So how do varying control shapes and memorable landmarks affect Web browser design? Now, unlike a remote control, you probably aren’t going to try to use a browser without directly looking at it, as that would kind of defeat the purpose. But what if we tested a browser design by taking away a key visual cue, the glyphs or symbols drawn on the controls?

With the glyphs removed as a possible indicator, the shape of the keyhole can still be used to identify back and forward. Also, landmarks like the keyhole and the location bar can be used as reference points for more obscure controls like reload and stop. Reload can be mentally encoded as “east of keyhole” instead of “third from the left, or second from the right.” Even merging reload and stop together helps a little bit, since it gives each of them a more distinctive shape. When merged, stop and reload can even use each other as landmarks.
4. Creating an Iconic Form (Better Branding)
This final consideration isn’t related to improving visual or interactive design but rather the larger realm of product design. Creating a visual hierarchy based on things like size, shape, position and color results in more memorable and recognizable products. For instance, the TiVo remote is incredibly iconic: it’s shaped like a peanut and has a large bright yellow button in the middle. The PS2 remote in contrast looks very similar to other remotes by companies like JVC, Philips, Panasonic, Pioneer, Onkyo, Samsung, and the many others that have a tendency to place small black buttons in a grid.
We decided to use the keyhole shape for our navigation buttons because we wanted to have a visual element that made Firefox consistent between different operating systems, and also to visually differentiate us from other browsers on each system.

One final question I’ve gotten a lot:
Aren’t Asymmetrical Forms Ugly and Symmetrical Forms Innately Beautiful?
Yes. But, there is an important caveat: this only applies to anthropomorphic forms. For instance, a good way to design a disturbing looking robot is to make one eye larger than the other (and believe me, we also seriously put a lot of thought into robot design here).

Laputa Robot by Hayao Miyazaki
While not related to symmetry, it also helps if the robot is very tall.
…Back to Toolbar Customization
Ok, so maybe there are a few reasons why you can build a better interface if you use visual hierarchy, but I want to be able to totally customize my toolbar and not have my icons appear at all sorts of different shapes and sizes that look really silly when you move them into a completely different order!
I totally agree (with that statement that I just wrote to myself). Designing a great default user interface that leverages visual hierarchy and letting users completely customize the configuration of their toolbar should not be mutually exclusive goals. Unfortunately, kind of like how the user interface design of the PS2 remote was based on the most logical configuration of the underlying circuit board, our platform doesn’t (currently) allow us to do things like placing multiple instances of the same control, or a group of related controls, in the toolbar customization palette. We also are somewhat limited in our ability to adapt the visual style of controls based on their placement relative to other controls.
Toolbar customization is really important, and so is visual hierarchy, so we need to start fixing the limitations of our platform.