Mozilla at SCALE 11x

Sheeri

Mozilla has a great presence at SCALE11x, the 11th Annual Southern California Linux Expo. This annual conference is completely volunteer run and one of the best Linux conferences around. Mozilla sponsored Friday’s DevOps Day LA and were part of the selection committee for MySQL Community Day.

See Firefox OS phones
Want to see a FirefoxOS mobile phone in action? Want to learn how to write apps for FirefoxOS? Come visit us at booth #14 and we will show you the phones and how to write a great app.

Giveaways
Also at booth #14, we are giving away swag! Get the traditional Firefox logo sticker or the new blue rocketship Firefox Marketplace stickers. We also have Firefox lanyards and pens, Mozilla stickers and “I support the Open Web” wristbands.

Want a T-shirt?
We have a limited supply of T-shirts. If you want one, tweet this sentence, filling in why you love Firefox:

I

Then come to the Mozilla booth (#14) and show us your tweet! Choose between the traditional Firefox logo tee, as modeled by Casey on the right, or the new Firefox Marketplace soft blue T-shirt modeled by me on the left:

Sheeri and Casey at the Scale11x Booth

Mozilla Talks at SCALE
Mozilla presented three informative talks at SCALE11x to full audiences:
Sheeri Cabral (Database Engineering) presented “Are You Getting the Best Out of Your MySQL Indexes?” at the MySQL Community Day, and the PDF slides are available.

Brandon Burton (Web Operations) presented “Puppet at Mozilla”. Slides are available on speakerdeck.

Chris Turra and Brandon Burton (Web Operations) presented “Simple Patterns for Scaling Websites: Some Lessons Learned at Mozilla”, and the slides are available here.

We look forward to seeing you this weekend at SCALE 11x!

Improving FTP Cluster Availability

jakem

2

Over the last several months we’ve had intermittent trouble with our FTP cluster that powers ftp.mozilla.org, as well as acting as the origin host for the primary product delivery CDNs, download.cdn.mozilla.net. Despite the name, the vast majority of the traffic is actually Apache-powered HTTP and HTTPS, not FTP.

The Problem

This problem manifested as a simple cessation of HTTP services on one node at a time. The quick fix was to restart Apache which usually worked. Sometimes, it still wouldn’t come back up upon restarting it, or would fail to start – rare, but common enough to be recognized. Generally speaking, simply trying 2 or 3 times would eventually restore everything to a working state. Fortunately, breakage during normal operations was rare.

Making the problem much worse, any time we changed a relevant config in Puppet, Puppet would attempt to restart Apache. Since it doesn’t always come back, this was frequently a fatal blow to one or more nodes. Even worse, they’d generally all get the same treatment within 30 minutes of each other… so one change would generally be fatal to at least one node, if not multiple ones.

Of course our load balancer would certainly notice that a node has bit the dust and would pull it out, but much damage is already done- any in-flight connections are lost and have to be re-established. This is very disruptive to some things, especially automation and testing, which frequently don’t respond well to service outages or disconnections. Or rather, they respond perfectly, by screaming loudly.

Mitigation

The first (and simplest) fix was to reduce the extra pain caused by Puppet. Statistically the vast majority of events were coming from this situation as far as we could tell. This was very straightforward. The relevant docs are here: http://docs.puppetlabs.com/references/latest/type.html#service.

There are 2 relevant flags: “restart” and “hasrestart“. The former defines a custom command that puppet should use to restart the service in question, when needed. The latter is simply a flag, indicating if the service in question already has a “restart” command built in that puppet can use directly.

We had set both of these. The former was attempting to change the restart command to “graceful“, which is an Apache-specific restart mode that is much gentler to existing connections when restarting (they’re allowed to finish, rather than being abruptly cut off). The problem here is, the latter overrides the former. If you tell puppet that a service has a restart command (hasrestart => true), it will use that and ignore your custom one.

So that’s the first half of the fix – we simply removed the “hasrestart => true” line, making graceful restarts work properly. This way, at least new config changes won’t break all in-progress operations.

The Real Problem

This was a puzzling issue because it’s one that we haven’t run into anywhere else (and we run a lot of Apache nodes). Clearly something was unique about the configuration on these nodes.

It didn’t take long to turn up a very promising suspect: this is the only RHEL6 clusters where we make heavy use of the “worker” MPM in Apache, rather than “prefork”. It’s also one of the most heavily-loaded cluster we run, in terms of requests-per-second.

Most of the errors generated by Apache were unhelpful, like this:

[Thu Jan 03 15:23:23 2013] [crit] (22)Invalid argument: ap_queue_pop failed

This is obscure enough as to be virtually unactionable. Hundreds of lines like this (and other similarly-obscure errors) across the cluster. Buried in the muck, however, were some very interesting gems like this:

[Thu Jan 03 15:23:23 2013] [alert] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread

These happened often even on start-up, although typically Apache would still start. This is what ultimately led us to the real problem.

Our Apache config for the worker MPM looked like this:

    ThreadLimit          100
    ThreadsPerChild      100
    StartServers          30
    ServerLimit           50
    MaxClients          4000
    MinSpareThreads       75
    MaxSpareThreads       75
    MaxRequestsPerChild  500

Let’s break this down:

  • ThreadLimit is the maximum number of threads a worker process can have. Default is 64. A restart is needed to raise this.
  • ThreadsPerChild is how many threads are actually created per worker process. A worker always has precisely this many threads within it (plus 1 additional thread that doesn’t process requests). Default is 25. This can be raised up to ThreadLimit with just a “graceful” restart.
  • StartServers is the number of worker processes that spawned at start.
  • ServerLimit is the maximum number of worker processes that can be spawned. A restart is needed to raise this.
  • MaxClients is the maximum number of simultaneous requests that can be served.
  • MaxSpareThreads is the maximum number of idle threads Apache will attempt to keep on hand. If it has more than this, it will kill off worker processes (each with #ThreadsPerChild threads) until it has less than this many idle.
  • MinSpareThreads is the minimum number of idle threads Apache will attempt to keep on-hand. If it has fewer than this, it will spawn new processes (each with #ThreadsPerChild threads) until it has enough.
  • MaxRequestsPerChild is how many requests a worker process will serve (across all its threads) before it will automatically perish and be respawned.

One noteworthy point that’s not immediately obvious when starting to use the worker MPM is that Apache does not ever change the number of threads per process. That is fixed at the value of ThreadsPerChild. If Apache needs more or less processing threads, it will spawn or kill off whole worker processes, with that fixed number of threads per process.

Now, some observations:

  • 30 servers spawned at start * 100 threads each = 3000 available threads. Given a MaxSpareThreads of only 75, on the surface this seems like a large mismatch. Most of these 30 processes will be killed off immediately in an attempt to reduce the number of spare threads, unless we really do need that many (spoiler: we don’t).
  • ServerLimit of 50 * 100 threads per process = 5000 max possible threads, but MaxClients is only 4000. This is okay but slightly odd, and misleading if you don’t work with “worker” very often. Side note: The other way around won’t work.
  • MaxRequestsPerChild of only 500 means that each thread will serve (on average) only 5 requests before being killed. In all but the most horrible of memory-leaking applications, this is very wasteful and results in lots of unnecessary “churn” in worker processes.

All that is technically okay, but appears to be sub-optimal. Without knowing anything about the traffic itself, it’s hard to know what parts (if any) of that to criticize. The real secret to our troubles lies buried within that tantalizing “unable to create worker thread” error.

What’s Wrong?

After much gnashing of teeth, we ultimately discovered this seemingly-unrelated bug within Red Hat’s Bugzilla: Bug 432903 – /etc/security/limits.conf should reduce the risk of forkbombing.

To make a long story short (too late, I know), it turns out Red Hat instituted a global soft limit to the number of processes any user (including root) can spawn. This default is 1024. It’s been that way for a very long time. We just haven’t had the misfortune of running up against it. It’s likely that previous incarnations of this service (where the above settings came from) had already removed or modified this limit.

In Linux this limit counts threads just like processes. So:

  • 1 “master” httpd process
  • 30 child processes on start-up
  • Each child has 1 “overhead” thread
  • Each child has 100 “processing” threads
  • Total of 1 + 30 + 30*1 + 30*100 threads on start-up == 3061 count needed!

This is the source of all our troubles – we’re trying to spawn far more threads than the OS will allow us to do.

It also unveils an interesting observation: since there was no possible way Apache was ever spawning more than 1024 workers, we can now guarantee that our worker MPM settings above are most definitely poor for our environment. If we really did need more than 1024, we’d be hitting these errors during normal usage, rather than primarily on service restart. StartServers was in effect highlighting a weakness of the configuration that we would inevitably encounter if we ever did have that much traffic.

The Fix

Sadly, you cannot overcome this with a simple /etc/security/limits.conf entry for the “apache” user. Apache creates these threads as the root user, before it drops privileges, so they count against root’s limit (even though they later drop privileges to a normal user). You could certainly set a higher limit just for root but that may not be desirable, as many things follow a pattern similar to Apache (start as root, drop privs).

The best trick we’ve found is to tweak the Apache start-up workflow in such a way as to run “ulimit -u 4096” or similar, right before starting Apache. Thankfully, there’s an easy way to do this in RHEL6 without having to modify the actual init script. The file /etc/sysconfig/httpd is sourced by the default init script… you can simply add this line in there, and it’ll be correctly applied next time Apache is started. We already manage this file via Puppet anyway, so this was a trivial 1-line fix for us.

With that fix in place, we were able to go back and refine our worker MPM settings. Here’s the new config:

    ThreadLimit          150  # up from 100
    ThreadsPerChild      150  # up from 100
    StartServers           8  # down from 30
    ServerLimit           16  # down from 50
    MaxClients          2400  # down from 4000
    MinSpareThreads       75  # same
    MaxSpareThreads      500  # up from 75
    MaxRequestsPerChild 5000  # up from 500

You can see a number of improvements here:

  • ServerLimit * ThreadsPerChild = MaxClients. Not strictly necessary, but easy to keep a handle on things. I found it best to think about MaxClients as if it were a derived value. It isn’t, but if you set it like it is, it functions as a nice sanity-check on the other settings.
  • StartServers * ThreadsPerChild = a sane number of starting workers, compared to other settings.
  • Min and MaxSpareThreads set much more sanely, rather than set to equal (and very low) values. Apache doesn’t have to kill or spawn processes as frequently.
  • MaxRequestsPerChild is much higher (though still somewhat low per-thread), further reducing process churn.

Even though MaxClients and ServerLimit are now much lower, we’re actually able to handle much more traffic now due to the ulimit tweak.

In the End

You could consider this to be a case of premature optimization coming back to bite us. In this particular case these settings were actually migrated from older servers which did not exhibit this particular problem. We weren’t optimizing so much as attempting to maintain consistency with earlier incarnations of this cluster, because we knew that would work. We were wrong.

The trouble with this problem was not that it’s a particularly tricky thing. It really isn’t, ulimit settings are something most Linux sysadmins grow accustomed to working with at some point. The real problem here was the infrequency and unpredictability of running into the issue, coupled with settings that were believed to be appropriate. It just required someone to take some time out to dig into why this crazy little nonsense was happening every now and then.

If you have any questions, please feel free to ask in the comments below!

Percona Repo Conflicts With Itself

Sheeri

5

On a RHEL 6.3 server, we are running MySQL 5.1 (not MariaDB, not the Percona patched version, plain old regular MySQL):

$ rpm -qa | egrep -i "percona|maria|mysql" | sort
mysql-5.1.61-4.el6.x86_64
mysql-libs-5.1.61-4.el6.x86_64
mysql-server-5.1.61-4.el6.x86_64
percona-toolkit-2.1.8-1.noarch
percona-xtrabackup-2.0.5-499.rhel6.x86_64
perl-DBD-MySQL-4.013-3.el6.x86_64
$

We went to do a system update today, and had a problem with dependencies, because the Percona repository wants to replace mysql-libs with Percona’s shared libraries. How rude!

# yum update mysql-libs
Loaded plugins: downloadonly, rhnplugin, security
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package Percona-Server-shared-51.x86_64 0:5.1.67-rel14.3.506.rhel6 will be obsoleting
---> Package Percona-Server-shared-55.x86_64 0:5.5.29-rel29.4.401.rhel6 will be obsoleting
---> Package Percona-Server-shared-compat.x86_64 0:5.5.29-rel29.4.401.rhel6 will be obsoleting
---> Package mysql-libs.x86_64 0:5.1.61-4.el6 will be obsoleted
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================
Package Arch Version Repository Size
==============================================================================================================
Installing:
Percona-Server-shared-51 x86_64 5.1.67-rel14.3.506.rhel6 percona 2.8 M
replacing mysql-libs.x86_64 5.1.61-4.el6
Percona-Server-shared-55 x86_64 5.5.29-rel29.4.401.rhel6 percona 787 k
replacing mysql-libs.x86_64 5.1.61-4.el6
Percona-Server-shared-compat x86_64 5.5.29-rel29.4.401.rhel6 percona 3.4 M
replacing mysql-libs.x86_64 5.1.61-4.el6

Transaction Summary
==============================================================================================================
Install 3 Package(s)

Total size: 7.0 M
Is this ok [y/N]: N

The reason I say this is rude is because the Percona repository does not have mysql-libs. It’s not for the Percona repository to say that it’s obsolete. If we were running the Percona packages, that’s fine, but we’re not. We are running the MySQL packages, and we want to keep running them. Just because we run the Percona toolkit and Xtrabackup doesn’t mean we want the shared MySQL libraries from Percona as well.

The bizarre part is that the Percona repository recommends installing BOTH Percona-Server-shared-51 AND Percona-Server-shared-55, and those packages conflict with each other. So if I hit “yes” I’d end up getting:

Transaction Check Error:
file /usr/lib64/libmysqlclient.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient.so.16.0.0 conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient_r.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient_r.so.16.0.0 conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64

Error Summary
-------------

So in order to upgrade this machine, we had to remove the Percona repository, update the packages so that mysql-libs was properly updated by the right repository, put the Percona repository back, and run update again to see if xtrabackup or percona-toolkit has updates to be run.

And all is happy again:
$ rpm -qa | egrep -i "percona|maria|mysql" | sort
mysql-5.1.67-1.el6_3.x86_64
mysql-libs-5.1.67-1.el6_3.x86_64
mysql-server-5.1.67-1.el6_3.x86_64
percona-toolkit-2.1.8-1.noarch
percona-xtrabackup-2.0.5-499.rhel6.x86_64
perl-DBD-MySQL-4.014-1.el6.rfx.x86_64
$

Liveblog: How to Use Puppet Like an Adult

Sheeri

My Mozilla coworkers Ben Kero and Dan Maher gave a standing-room only presentation at Linux Conf AU about “How to Use Puppet Like an Adult”. It was fantastic!

Data != logic
Business DATA does not belong in modules, but business LOGIC is OK.

What are the data sources, then?
Hiera – lightweight pluggable, hierarchical databases. External to the modules, you can use many backends, including MySQL. New feature, standard in puppet 3.0. If you like YAML (and you should), you’ll like this.

$var = lookup('something') # unscoped (complicated)
$var = lookup('namespace::something') # scoped (yay!)

Another data source is puppetdb. This is a bigger topic, but the important thing is that it can be used for high performance store configs.

Where to find pre-built modules for puppet?
Github
Puppet Forge

Or you can write your own module
….but don’t waste time building your own, say, Apache module…someone else has a better one out there.

Is that module right for me?
What to check:
OS/distribution
Complexity – Can you read the module and understand what it does? If not, this might not be the module for you.
Popularity – the more people using/forking it, the more support is probably around. Also age of last commit.
What’s the documentation like?

Recommended pre-built modules – these work, and they’re good. Analyze how they work and learn from them:
puppetlabs/firewall
puppetlabs/cloud_provisioner

When rolling your own modules – if this is going to be a one-off, do whatever you want. If you want to make it open source, know that someone else will use it, and make it more generic.

Use parameterized classes. This allowed you to separate your business data from your business logic. You can avoid having passwords, ssh keys, etc in there, and then you CAN open source it.

Make sure it’s documented.

Module template
puppet module generate author-mod_name – gets you all the files you need with the necessary templares (e.g. README has the sections you need).

module template slide

Note: Everybody should be doing spec testing, not just puppet…..

Parameterized classes
Similar to definitions – they are passed in data. It’s how to separate data from logic. If you don’t get anything else, get this:

parameterized classes slide

These help you write your manifest one time for different nodes. If you have 10 web servers with different node names, write one manifest, and use logic and parameterized classes to instantiate that manifest 10 times. Don’t write 10 manifests.

USE A STYLE GUIDE
“Who here has written Perl code? Who here has written Perl code that someone else can read? USE A STYLE GUIDE”

Parser Validation
just run:
$ puppet parser validate manifest.pp

Parser validation example

Put this into your commit hook, so that parser errors don’t get committed.

Linting
A way of making sure code meets the style guide. External tool, but stable. Very customizable, you can use your own style guide, and you can have it ignore certain things (e.g. don’t care about quoting everything, so don’t error on that). You can put this into commit hooks too.

Linting slide

puppet-concat
Dynamically build files out of lots of parts. How you can build good config files for daemons that don’t support .d directories. Assume you have puppet-concat installed already, it’s widely used, because pre-built modules use it too.

puppet-concat slide

stdlib
Put out by puppetlabs, not actually part of the standard library, but contains lots of useful functions. This is also widely used. Can check if the variable is boolean, integer, strings, can collide hashes together, can check functions, etc.

Sanity preservation
Set default values for your variables – make sure they’re sane – you can pull variables out of facter.
Verify content – play it safe, don’t blithely act on user data. You can throw an error (e.g. if you have a string instead of an integer)
Mutually exclusive declarations – ensure when you start navigating down one logical path, it can’t go down the other path. This comes down to if/then programming, makes more layers to your manifest, but you can make accurate statements about what you want the module to do and predict what it WON’T do. Being able to predict what puppet will and won’t do is important.

Useful Log Output
Functions for each log level
e.g. notice(); warn(); err();
Make these informative and human-readable. What broke and why, can other people understand what’s going on with this?

Puppet As a Source of Truth
Build an ecosystem around puppet. The more you use puppet, the more it describes your infrastructure. How do you do this, though?
You can use the puppet data library (PDL) – a collection of services with an API so you can query puppet from other services – e.g. inventory system, dashboard, etc. You can also use it from within puppet.

You can build an inventory service to know all the facter information about all the machines. You can use the run report service for dashboards like puppetdashboard.

You can download a .dat file and visualize it with graphviz to see how your logic paths look. This .dat file comes within puppet (you do “gem install puppet” and then puppet with some options and you can get it).

The take-home:
take-home points

Linux in the Flesh: Adventures Embedding Linux in Hardware

Sheeri

This is not quite a liveblog, but a set of notes about the points I found most interesting in this morning’s Linux Conf AU keynote, given by Dr. Andrew “bunnie” Huang, best known for Hacking the Xbox and developing the chumby.

The talk was a fascinating look at how complex developing embedded Linux devices is. Let’s start with the takeaways:

Slide with takeaway points

One of the points bunnie made was that customizing embedded devices is really a frontier at this point (though he did not use the word “frontier”). There are not a lot of folks doing it. The “Sustainability” bullet point emphasizes this – bunnie talked about how people create custom environments for the device and want updates pushed to it, but that’s difficult because the environment may mean special tweaks to the updates.

The cost point was interesting because we often see a feature and wonder why they cannot put a feature that seems low-cost to us. Here is why:

slide with why low-cost features aren't that easy to add

So if the Cost of Goods Sold (COGS) is $30, you want to add some margin so you can earn a living, say $15. The retail markup will be $45, so the total cost to the consumer is now $90. Then there’s a bit of rounding up to the magic price of $99.

Now if you decide to add a $5 feature to it, the COGS is now $35. Your margin becomes $17.50, and the retail markup becomes $52.50. Now the total cost is $105, and gets increased to the magic price point of $149. So there is a lot of argument about *which* $5 features to add with that price point. (Also note that $5 does not seem like much, but when your total COGS is $30, you’re adding 16.67% of the cost)

The tech stack is surprisingly full of old-school standards, I was pretty excited to see MySQL in it:

tech stack including MySQL

One of my coworkers left the keynote looking at the challenge and wanting to take it on. I heard all the complexity bunnie told us about and realized, that’s not a stack I want to spend the time to learn, for what I would do with it. It is not where I want to spend my time…here is a slide that shows where the time was spent on the chumby device design:

time spent on chumby device design

The actual hardware design, encircled by a box on the slide, is 11%. And the 19% in the product and software design is iterative, some of the 11% of time in the hardware design overlapped with product and software design, because they are intimately tied together. In fact, they’re so intimately tied, that the hardware is actually “dominated by software concerns” – this is not something I was aware of, but of course once I think about it, it makes sense:
shape of hardware is tied to software

bunnie got these times by going back to e-mails, so this is more like the phases of the project. The bulk of the time is the marketing and business development plus the mass-production ramp-up. bunnie talked about the tooling, which was also quite interesting. After showing us all the steps and explaining that a tool run takes 6 weeks and can cost around $20k (I assume USD, that was not specified), this slide came up:

startup vs. Apple tooling costs

What a startup will do is try one design, spending the time and money, and if it’s not right they’ll have to spend the time and money again, in a serial way. Apple can put down $100k and have 5 different tooling runs happen simultaneously, and they pick the best one, so at the end of 6 weeks, Apple has a great design, whereas a startup only has their first design.

And with that being said, Apple re-tooled the iPhone 3G at least 3 times. So that’s why Apple stuff looks so much nicer than what a startup can come up with – that and they have brilliant designers.

The software design is not easy, either:

Why software design is so tricky

The code has to be extremely optimal, binaries have to be stripped, so the UI is responsive, stable, and your gadget succeeds.

The last takeaway on that first slide I showed was “Ship or Die”. Most folks know about this concept, but this slide made it pretty clear:

Ship or Die slide

As bunnie explains it, most gadget sales (with some exceptions) happen between Black Friday [the day after US Thanksgiving] and Christmas [the Dec 25th one]. If you are aiming to release in mid-November, and your deadline slips a few weeks, you are now missing your huge window of sales, and you will have to wait until next year to really sell your product.

There was much more content than I am able to put here, and I am glad I got the chance to see this keynote!

Liveblog: Think, Create and Critique Design

Sheeri

At Linux Conf AU, “Think, Create and Critique Design” by Andy Fitzsimon…

HTML slides
design checklist

Elements and principles of design.

Design is like cooking
Ingredients create flavors that influence a meal
in that way:
Elements create principles influencing a composition.

Some definitions….
Elements are:

  • Line: A continuous path between 2 points. Can also be a process, or a story plot.

  • Shape: When a line joins to cover an area, it evolves into a shape. “Bottleneck” and “pyramid scheme” are ways we use geometry as a metaphor.

  • Space: area between and inside things. Positive space/negative space.

  • Size: physical scale, bigness or smallness.

  • Color: perceived mixture. Color can be additive/subtractive or a mood. Or a metaphor, colorful personality.

  • Value: Static measure. lightness, darkness, volume.

  • Texture: structure and feel, rough/smooth, soft, etc.

Principles are the methods applied. They influence composition, but they’re not composition. They can be made with elements and also other things.

  • Proportion – divided measure of a relative whole

  • Pattern – using the same element(s) multiple times. Can be a template.

  • Graduation – Incremental changes to one element over another

  • Balance/Harmony/Unity – one or more elements creating a cohesion.

  • Contrast

  • Emphasis – a significant use of one or more elements in a single place to distinguish.

  • Form – the ‘whole’ that a sum of parts becomes – Gestalt – German for ‘form’ but in a mind-blowing way.

    If you’re creating something, you look at the elements and see if they form principles. (e.g. cooking, ingredients)

    If you’re appraising, you start with the whole and try to figure out what the elements are. (e.g. “this is great curry, let’s see if I can taste what’s in it).

    Can apply these checklists (elements + principles defined above) to anything – songs, poems, CSS, a movie, whatever.

    Visual Design and Typography tools:
    baseline grid – so easy you can check it with a ruler. A baseline grid always follows a vertical rhythm.

    Varied scale – Robert Bringhurst (god of typography) – 1 unit, 1 metric, scale up by a proportion. How you can create and measure things too.

    Symbolics are a great tool – they follow typographic tools – graphical glyphs treated as type.

    Art nouveau is ornate, decorated, and hard to create. But if you try to create ornate styles, “there’s plenty of places to hide” – you can mess up some details, people won’t necessarily notice. It’s a good aesthetic but hard to communicate well because it’s so busy.

    Style tiles and brand guides – if you have a set of rules, it’s easier to follow. It also creates consistency, and it lends itself to a balanced piece of work.

    Interactive Design tools:
    Fantastic field, UI is hot right now. Be wary of abstracted tools.

    Patterns are intuitive, isolated and repeatable. Patterns don’t dictate a complete success, they need to be interpreted in context.

    Wireframe – the problem with wireframes is that anyone can do one and if it’s not done well, when people execute it, they follow the wireframe. If you start with a bad template that doesn’t take things into consideration, you’ll end up with a bad implementation. Who makes the wireframe has to know what they’re doing.

    Workflow – “a procedure so hard to remember, you write it down”. Really a series of steps in order to produce a result.

    Persona – “a compromise for never meeting real stakeholders – written by gamblers and liars” – So often a persona doc is just punting. Does the person comes from a set of data (the average person has one breast and one testicle) or are they a real person?

    So be wary of abstracted tools but also be wary of abstracted results!

    Analytics – “metrics that justify slavery” – you check what metrics you check, and those tell you what to do. Pick them carefully!

    Instrumentation – “you do it to yourself” – You implement analytics yourself. You’re creating your own enslavement rules.

    Surveys – “the bored, attention-starved periphery of your audience” – don’t forget that it’s a self-selecting audience.

    Reviews – http://xkcd.com/937/ – you have to read the text to figure out why the good is good and the bad is bad

    User Testing – “more like zombie testing – Why won’t they smile?”

    Hyper-realism looks like something real
    skeu reminds you of something real, doesn’t have to be high fidelity (e.g. clock/watch to evoke time)
    Experience design:
    – deliberate design – a dog with tiger stripes – bad tiger, cute dog. Have deliberate differences, be something different.
    – think, make, become. Empathy is a currency now. Take ownership – win empathy, forgiveness and support.

    Now he puts up Maslow’s heirarchy of needs – Then there’s Aaron Walter, who has designing for emotion and design personas

    love, meaning, pleasure, convenience, predictabiilty, purpose – these are the goals in design. If you hit half of these this is great.

    Easy to observe (easy/logical/predictable) vs. hard to tell (lovable/loyal/trusted)

    Now let’s talk about process – the process creates a product, but the design itself is a process. Design thinking is huge.

    Design for Hackers – Andy says “it is great if you can stomach the Apple stuff and the chapter on Web 2.0.”

    Pragmatic Thinking and Learning: Refactor your Wetware is also great.

IN Subqueries in MySQL 5.6 Are Optimized Away

Sheeri

13

In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Here’s a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 optimizer statistics):

SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)

Before there were any subquery optimizations, say if you are using MySQL 5.1, you would have to rewrite this query as a JOIN, to avoid the dreaded DEPENDENT SUBQUERY that shows up in the EXPLAIN:

mysql> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: NULL
key: idx_title
key_len: 767
ref: NULL
rows: 1025
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: film_actor
type: index_subquery
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: func
rows: 1
Extra: Using index

MariaDB 5.5 boasts subquery optimization, and rightfully so. It looks like MariaDB materializes the subquery:

MariaDB [sakila]> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1043
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table:
type: eq_ref
possible_keys: distinct_key
key: distinct_key
key_len: 2
ref: func
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: film_actor
type: index
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: NULL
rows: 4889
Extra: Using index

So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery. There are 3 rows here, a new <subquery2> table is used to help optimize the results.

In MySQL 5.6, the subquery is actually optimized away. The optimizer actually changes it. You can see this in the EXPLAIN plan by looking at the select_type. In both the MySQL 5.1 and MariaDB 5.5 examples, the select_type is PRIMARY, which is used for the outer query in a subquery (or the first SELECT in a UNION, but that does not apply here). In MySQL 5.6, the select_type is SIMPLE for both rows. Note that MySQL 5.6 also does not have to add a third table as MariaDB does:

mysql> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: Using index; FirstMatch(film)
2 rows in set (0.00 sec)

In the presentation, the Oracle team says that for DBT3 Query #18, “execution time reduces from days to seconds”. With optimizations like this, I believe it!

New EXPLAIN Features in MySQL 5.6 – FORMAT=JSON actually adds more information!

Sheeri

2

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT,UPDATE,DELETE,REPLACE), I was pretty excited. Then I heard that MySQL 5.6 also was implementing a JSON format for EXPLAIN and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format, it also gives extra information that’s actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown are being used. None of these are shown in regular EXPLAIN (which seems odd, why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is actually a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk about EXPLAIN I do, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "index",
"key": "rental_date",
"used_key_parts": [
"rental_date",
"inventory_id",
"customer_id"
],
"key_length": "10",
"rows": 16005,
"filtered": 100,
"using_index": true,
"attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
}
}
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

Workbench showing attached_condition and implicit CAST

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED – “filtered” is the percentage of “rows” that are estimated to be returned. A higher number here is better, if it is low it means that you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN – note that it shows the rental_date index is used, and the key_len is 5, which infers that only the first field fo the index, rental_date is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "range",
"possible_keys": [
"rental_date"
],
"key": "rental_date",
"used_key_parts": [
"rental_date"
],
"key_length": "5",
"rows": 181,
"filtered": 100,
"using_index": true,
"attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
}
}
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Workbench showing used_key_parts and BETWEEN

Index condition pushdown is itself a new feature in MySQL 5.6, and I will talk about it in another blog post.

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Sheeri

13

For the past few days, I have been upgrading a few servers. We are going from Percona’s patched MySQL 5.1 to MariaDB 5.5 (the subquery optimization is excellent, and we have lots of subqueries). Our process involves upgrading a slave first, and letting it replicate for a while, and if everything’s good, update more slaves, then the master.

This has served us well in the past. And now that we are checking data integrity between masters and slaves with pt-table-checksum, the process involves checking before we start that there is data integrity. This is easy, as we checksum twice daily and have a Nagios alert if there are any discrepancies. After the upgrade, we checksum again, to be sure no data has been changed/corrupted in the process of doing a mysqldump export and import.*

Much to my surprise, after importing the data on one of our dev servers, I found that there were a lot of discrepancies. So I picked a chunk to do some comparisons on, and found something interesting:

On Server version: 5.1.65-rel14.0-log Percona Server (GPL), 14.0, Revision 475:
mysql> select float_field from db.tbl where id=218964;
+-------------+
| float_field |
+-------------+
| 9.58084e-05 |
+-------------+
1 row in set (0.04 sec)

On Server version: 5.5.28a-MariaDB-log MariaDB Server
MariaDB [(none)]> select float_field from db.tbl where id=218964;
+--------------+
| float_field |
+--------------+
| 0.0000958084 |
+--------------+
1 row in set (0.24 sec)

Which of course causes a different checksum. I tried SELECTing the values, casting and converting them, but I could not get them to change in the database. MySQL 5.1 insists on storing in scientific notation, and MariaDB 5.5 (and MySQL 5.5, we tested it out) insists on storing without scientific notation.

Frankly, I’m surprised this has not come up before (I did lots of querying Google for MySQL 5.5 and scientific notation), since it radically changes how numbers look when they are stored and retrieved. I guess code does the right thing…except for pt-table-checksum, and I cannot really blame it.

In the end, I used the –ignore-columns option to pt-table-checksum, with the result of:

SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');

In this way, I can get an accurate checksum to see if anything has changed, before I mark that the upgrade is complete on this slave server.

* This is just on the first slave. After the first slave is upgraded, we use xtrabackup to copy the data to another server to upgrade it.

RPKI + ROA

XioNoX

4

A few weeks ago, Mozilla started to use RPKI+ROA, here is a quick introduction.

RPKI (Resource Public Key Infrastructure) is a first step to secure BGP (the routing protocol used on the Internet), it allows to certify (and verify) that a prefix is advertised by the good AS (in other words that an IP points to its legitimate owner) in order to prevent prefix hijacking (usually misconfiguration but can also be used in attacks) like the PakistanTelecom/Youtube story.

Based on that information (route valid/invalid/unknown) the router can act as configured, which is usually to lower the invalid route’s priority so the genuine route is used (so it doesn’t break anything if it’s miss-configured/expired).

Like DNSSEC, RPKI is made of 2 parts, the signing and the verification. The former is possible for us through ARIN (our RIR) the later is possible with Juniper since JunOS v12.2.

How it works (in our case but it seams to be similar with RIPE and probably others) is quite straightforward:

  1. Generate a public/private key pair,
  2.  Communicate the public key to ARIN,
  3. Write a ROA (Route Origination Authorization)
    • It’s basically a statement saying “I allow AS#x to advertise routes y and z”
  4. Sign it with our private key,
  5. Send the signed ROA request to ARIN,
  6. If everything is correct, ARIN will publish this ROA to its repository, publicly available to any validator

Steps 3 to 5 have to be done for each ASN, and before the ROAs expire.
Steps 2 to 6 can be done directly through ARIN’s website (step 4 is done on the client side using JavaScript).

By doing that, any entities validating the routes they receive will not (or less) be impacted if another AS advertises a prefix the same size or smaller than ours.

Probably latter this year, we will deploy the validator, which is made of 2 devices, the router and the validator itself. To not overload the router, the validator (called RTR server) is in charge of downloading the ROAs located in the RIRs’ repositories and telling the router (RTR client) if an AS has the right to advertises the route. Then, as said earlier the router decides what to do. That validator is usually a regular server installed in the same POP as the router.

To continue with the comparison with DNSSEC, right now we are signing our DNS zones, but our own name-servers aren’t validating DNSSEC, while Comcast’s do.

You can easily check the status of a prefix using whois, for instance:

whois -h whois.bgpmon.net " --roa 53371 63.245.217.105/24"
whois -h whois.bgpmon.net 63.245.209.0

The returned code will be one of the following:

0 - ROA correct
1 - No ROA found
2 - ROA incorrect - will tell you what the error is

It’s so easy that nobody with a RIR account has a valid excuse to not start using it!