Percona Repo Conflicts With Itself

Sheeri

5

On a RHEL 6.3 server, we are running MySQL 5.1 (not MariaDB, not the Percona patched version, plain old regular MySQL):

$ rpm -qa | egrep -i "percona|maria|mysql" | sort
mysql-5.1.61-4.el6.x86_64
mysql-libs-5.1.61-4.el6.x86_64
mysql-server-5.1.61-4.el6.x86_64
percona-toolkit-2.1.8-1.noarch
percona-xtrabackup-2.0.5-499.rhel6.x86_64
perl-DBD-MySQL-4.013-3.el6.x86_64
$

We went to do a system update today, and had a problem with dependencies, because the Percona repository wants to replace mysql-libs with Percona’s shared libraries. How rude!

# yum update mysql-libs
Loaded plugins: downloadonly, rhnplugin, security
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package Percona-Server-shared-51.x86_64 0:5.1.67-rel14.3.506.rhel6 will be obsoleting
---> Package Percona-Server-shared-55.x86_64 0:5.5.29-rel29.4.401.rhel6 will be obsoleting
---> Package Percona-Server-shared-compat.x86_64 0:5.5.29-rel29.4.401.rhel6 will be obsoleting
---> Package mysql-libs.x86_64 0:5.1.61-4.el6 will be obsoleted
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================
Package Arch Version Repository Size
==============================================================================================================
Installing:
Percona-Server-shared-51 x86_64 5.1.67-rel14.3.506.rhel6 percona 2.8 M
replacing mysql-libs.x86_64 5.1.61-4.el6
Percona-Server-shared-55 x86_64 5.5.29-rel29.4.401.rhel6 percona 787 k
replacing mysql-libs.x86_64 5.1.61-4.el6
Percona-Server-shared-compat x86_64 5.5.29-rel29.4.401.rhel6 percona 3.4 M
replacing mysql-libs.x86_64 5.1.61-4.el6

Transaction Summary
==============================================================================================================
Install 3 Package(s)

Total size: 7.0 M
Is this ok [y/N]: N

The reason I say this is rude is because the Percona repository does not have mysql-libs. It’s not for the Percona repository to say that it’s obsolete. If we were running the Percona packages, that’s fine, but we’re not. We are running the MySQL packages, and we want to keep running them. Just because we run the Percona toolkit and Xtrabackup doesn’t mean we want the shared MySQL libraries from Percona as well.

The bizarre part is that the Percona repository recommends installing BOTH Percona-Server-shared-51 AND Percona-Server-shared-55, and those packages conflict with each other. So if I hit “yes” I’d end up getting:

Transaction Check Error:
file /usr/lib64/libmysqlclient.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient.so.16.0.0 conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient_r.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient_r.so.16.0.0 conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64

Error Summary
-------------

So in order to upgrade this machine, we had to remove the Percona repository, update the packages so that mysql-libs was properly updated by the right repository, put the Percona repository back, and run update again to see if xtrabackup or percona-toolkit has updates to be run.

And all is happy again:
$ rpm -qa | egrep -i "percona|maria|mysql" | sort
mysql-5.1.67-1.el6_3.x86_64
mysql-libs-5.1.67-1.el6_3.x86_64
mysql-server-5.1.67-1.el6_3.x86_64
percona-toolkit-2.1.8-1.noarch
percona-xtrabackup-2.0.5-499.rhel6.x86_64
perl-DBD-MySQL-4.014-1.el6.rfx.x86_64
$

Liveblog: How to Use Puppet Like an Adult

Sheeri

My Mozilla coworkers Ben Kero and Dan Maher gave a standing-room only presentation at Linux Conf AU about “How to Use Puppet Like an Adult”. It was fantastic!

Data != logic
Business DATA does not belong in modules, but business LOGIC is OK.

What are the data sources, then?
Hiera – lightweight pluggable, hierarchical databases. External to the modules, you can use many backends, including MySQL. New feature, standard in puppet 3.0. If you like YAML (and you should), you’ll like this.

$var = lookup('something') # unscoped (complicated)
$var = lookup('namespace::something') # scoped (yay!)

Another data source is puppetdb. This is a bigger topic, but the important thing is that it can be used for high performance store configs.

Where to find pre-built modules for puppet?
Github
Puppet Forge

Or you can write your own module
….but don’t waste time building your own, say, Apache module…someone else has a better one out there.

Is that module right for me?
What to check:
OS/distribution
Complexity – Can you read the module and understand what it does? If not, this might not be the module for you.
Popularity – the more people using/forking it, the more support is probably around. Also age of last commit.
What’s the documentation like?

Recommended pre-built modules – these work, and they’re good. Analyze how they work and learn from them:
puppetlabs/firewall
puppetlabs/cloud_provisioner

When rolling your own modules – if this is going to be a one-off, do whatever you want. If you want to make it open source, know that someone else will use it, and make it more generic.

Use parameterized classes. This allowed you to separate your business data from your business logic. You can avoid having passwords, ssh keys, etc in there, and then you CAN open source it.

Make sure it’s documented.

Module template
puppet module generate author-mod_name – gets you all the files you need with the necessary templares (e.g. README has the sections you need).

module template slide

Note: Everybody should be doing spec testing, not just puppet…..

Parameterized classes
Similar to definitions – they are passed in data. It’s how to separate data from logic. If you don’t get anything else, get this:

parameterized classes slide

These help you write your manifest one time for different nodes. If you have 10 web servers with different node names, write one manifest, and use logic and parameterized classes to instantiate that manifest 10 times. Don’t write 10 manifests.

USE A STYLE GUIDE
“Who here has written Perl code? Who here has written Perl code that someone else can read? USE A STYLE GUIDE”

Parser Validation
just run:
$ puppet parser validate manifest.pp

Parser validation example

Put this into your commit hook, so that parser errors don’t get committed.

Linting
A way of making sure code meets the style guide. External tool, but stable. Very customizable, you can use your own style guide, and you can have it ignore certain things (e.g. don’t care about quoting everything, so don’t error on that). You can put this into commit hooks too.

Linting slide

puppet-concat
Dynamically build files out of lots of parts. How you can build good config files for daemons that don’t support .d directories. Assume you have puppet-concat installed already, it’s widely used, because pre-built modules use it too.

puppet-concat slide

stdlib
Put out by puppetlabs, not actually part of the standard library, but contains lots of useful functions. This is also widely used. Can check if the variable is boolean, integer, strings, can collide hashes together, can check functions, etc.

Sanity preservation
Set default values for your variables – make sure they’re sane – you can pull variables out of facter.
Verify content – play it safe, don’t blithely act on user data. You can throw an error (e.g. if you have a string instead of an integer)
Mutually exclusive declarations – ensure when you start navigating down one logical path, it can’t go down the other path. This comes down to if/then programming, makes more layers to your manifest, but you can make accurate statements about what you want the module to do and predict what it WON’T do. Being able to predict what puppet will and won’t do is important.

Useful Log Output
Functions for each log level
e.g. notice(); warn(); err();
Make these informative and human-readable. What broke and why, can other people understand what’s going on with this?

Puppet As a Source of Truth
Build an ecosystem around puppet. The more you use puppet, the more it describes your infrastructure. How do you do this, though?
You can use the puppet data library (PDL) – a collection of services with an API so you can query puppet from other services – e.g. inventory system, dashboard, etc. You can also use it from within puppet.

You can build an inventory service to know all the facter information about all the machines. You can use the run report service for dashboards like puppetdashboard.

You can download a .dat file and visualize it with graphviz to see how your logic paths look. This .dat file comes within puppet (you do “gem install puppet” and then puppet with some options and you can get it).

The take-home:
take-home points

Linux in the Flesh: Adventures Embedding Linux in Hardware

Sheeri

This is not quite a liveblog, but a set of notes about the points I found most interesting in this morning’s Linux Conf AU keynote, given by Dr. Andrew “bunnie” Huang, best known for Hacking the Xbox and developing the chumby.

The talk was a fascinating look at how complex developing embedded Linux devices is. Let’s start with the takeaways:

Slide with takeaway points

One of the points bunnie made was that customizing embedded devices is really a frontier at this point (though he did not use the word “frontier”). There are not a lot of folks doing it. The “Sustainability” bullet point emphasizes this – bunnie talked about how people create custom environments for the device and want updates pushed to it, but that’s difficult because the environment may mean special tweaks to the updates.

The cost point was interesting because we often see a feature and wonder why they cannot put a feature that seems low-cost to us. Here is why:

slide with why low-cost features aren't that easy to add

So if the Cost of Goods Sold (COGS) is $30, you want to add some margin so you can earn a living, say $15. The retail markup will be $45, so the total cost to the consumer is now $90. Then there’s a bit of rounding up to the magic price of $99.

Now if you decide to add a $5 feature to it, the COGS is now $35. Your margin becomes $17.50, and the retail markup becomes $52.50. Now the total cost is $105, and gets increased to the magic price point of $149. So there is a lot of argument about *which* $5 features to add with that price point. (Also note that $5 does not seem like much, but when your total COGS is $30, you’re adding 16.67% of the cost)

The tech stack is surprisingly full of old-school standards, I was pretty excited to see MySQL in it:

tech stack including MySQL

One of my coworkers left the keynote looking at the challenge and wanting to take it on. I heard all the complexity bunnie told us about and realized, that’s not a stack I want to spend the time to learn, for what I would do with it. It is not where I want to spend my time…here is a slide that shows where the time was spent on the chumby device design:

time spent on chumby device design

The actual hardware design, encircled by a box on the slide, is 11%. And the 19% in the product and software design is iterative, some of the 11% of time in the hardware design overlapped with product and software design, because they are intimately tied together. In fact, they’re so intimately tied, that the hardware is actually “dominated by software concerns” – this is not something I was aware of, but of course once I think about it, it makes sense:
shape of hardware is tied to software

bunnie got these times by going back to e-mails, so this is more like the phases of the project. The bulk of the time is the marketing and business development plus the mass-production ramp-up. bunnie talked about the tooling, which was also quite interesting. After showing us all the steps and explaining that a tool run takes 6 weeks and can cost around $20k (I assume USD, that was not specified), this slide came up:

startup vs. Apple tooling costs

What a startup will do is try one design, spending the time and money, and if it’s not right they’ll have to spend the time and money again, in a serial way. Apple can put down $100k and have 5 different tooling runs happen simultaneously, and they pick the best one, so at the end of 6 weeks, Apple has a great design, whereas a startup only has their first design.

And with that being said, Apple re-tooled the iPhone 3G at least 3 times. So that’s why Apple stuff looks so much nicer than what a startup can come up with – that and they have brilliant designers.

The software design is not easy, either:

Why software design is so tricky

The code has to be extremely optimal, binaries have to be stripped, so the UI is responsive, stable, and your gadget succeeds.

The last takeaway on that first slide I showed was “Ship or Die”. Most folks know about this concept, but this slide made it pretty clear:

Ship or Die slide

As bunnie explains it, most gadget sales (with some exceptions) happen between Black Friday [the day after US Thanksgiving] and Christmas [the Dec 25th one]. If you are aiming to release in mid-November, and your deadline slips a few weeks, you are now missing your huge window of sales, and you will have to wait until next year to really sell your product.

There was much more content than I am able to put here, and I am glad I got the chance to see this keynote!

Liveblog: Think, Create and Critique Design

Sheeri

At Linux Conf AU, “Think, Create and Critique Design” by Andy Fitzsimon…

HTML slides
design checklist

Elements and principles of design.

Design is like cooking
Ingredients create flavors that influence a meal
in that way:
Elements create principles influencing a composition.

Some definitions….
Elements are:

  • Line: A continuous path between 2 points. Can also be a process, or a story plot.

  • Shape: When a line joins to cover an area, it evolves into a shape. “Bottleneck” and “pyramid scheme” are ways we use geometry as a metaphor.

  • Space: area between and inside things. Positive space/negative space.

  • Size: physical scale, bigness or smallness.

  • Color: perceived mixture. Color can be additive/subtractive or a mood. Or a metaphor, colorful personality.

  • Value: Static measure. lightness, darkness, volume.

  • Texture: structure and feel, rough/smooth, soft, etc.

Principles are the methods applied. They influence composition, but they’re not composition. They can be made with elements and also other things.

  • Proportion – divided measure of a relative whole

  • Pattern – using the same element(s) multiple times. Can be a template.

  • Graduation – Incremental changes to one element over another

  • Balance/Harmony/Unity – one or more elements creating a cohesion.

  • Contrast

  • Emphasis – a significant use of one or more elements in a single place to distinguish.

  • Form – the ‘whole’ that a sum of parts becomes – Gestalt – German for ‘form’ but in a mind-blowing way.

    If you’re creating something, you look at the elements and see if they form principles. (e.g. cooking, ingredients)

    If you’re appraising, you start with the whole and try to figure out what the elements are. (e.g. “this is great curry, let’s see if I can taste what’s in it).

    Can apply these checklists (elements + principles defined above) to anything – songs, poems, CSS, a movie, whatever.

    Visual Design and Typography tools:
    baseline grid – so easy you can check it with a ruler. A baseline grid always follows a vertical rhythm.

    Varied scale – Robert Bringhurst (god of typography) – 1 unit, 1 metric, scale up by a proportion. How you can create and measure things too.

    Symbolics are a great tool – they follow typographic tools – graphical glyphs treated as type.

    Art nouveau is ornate, decorated, and hard to create. But if you try to create ornate styles, “there’s plenty of places to hide” – you can mess up some details, people won’t necessarily notice. It’s a good aesthetic but hard to communicate well because it’s so busy.

    Style tiles and brand guides – if you have a set of rules, it’s easier to follow. It also creates consistency, and it lends itself to a balanced piece of work.

    Interactive Design tools:
    Fantastic field, UI is hot right now. Be wary of abstracted tools.

    Patterns are intuitive, isolated and repeatable. Patterns don’t dictate a complete success, they need to be interpreted in context.

    Wireframe – the problem with wireframes is that anyone can do one and if it’s not done well, when people execute it, they follow the wireframe. If you start with a bad template that doesn’t take things into consideration, you’ll end up with a bad implementation. Who makes the wireframe has to know what they’re doing.

    Workflow – “a procedure so hard to remember, you write it down”. Really a series of steps in order to produce a result.

    Persona – “a compromise for never meeting real stakeholders – written by gamblers and liars” – So often a persona doc is just punting. Does the person comes from a set of data (the average person has one breast and one testicle) or are they a real person?

    So be wary of abstracted tools but also be wary of abstracted results!

    Analytics – “metrics that justify slavery” – you check what metrics you check, and those tell you what to do. Pick them carefully!

    Instrumentation – “you do it to yourself” – You implement analytics yourself. You’re creating your own enslavement rules.

    Surveys – “the bored, attention-starved periphery of your audience” – don’t forget that it’s a self-selecting audience.

    Reviews – http://xkcd.com/937/ – you have to read the text to figure out why the good is good and the bad is bad

    User Testing – “more like zombie testing – Why won’t they smile?”

    Hyper-realism looks like something real
    skeu reminds you of something real, doesn’t have to be high fidelity (e.g. clock/watch to evoke time)
    Experience design:
    – deliberate design – a dog with tiger stripes – bad tiger, cute dog. Have deliberate differences, be something different.
    – think, make, become. Empathy is a currency now. Take ownership – win empathy, forgiveness and support.

    Now he puts up Maslow’s heirarchy of needs – Then there’s Aaron Walter, who has designing for emotion and design personas

    love, meaning, pleasure, convenience, predictabiilty, purpose – these are the goals in design. If you hit half of these this is great.

    Easy to observe (easy/logical/predictable) vs. hard to tell (lovable/loyal/trusted)

    Now let’s talk about process – the process creates a product, but the design itself is a process. Design thinking is huge.

    Design for Hackers – Andy says “it is great if you can stomach the Apple stuff and the chapter on Web 2.0.”

    Pragmatic Thinking and Learning: Refactor your Wetware is also great.

IN Subqueries in MySQL 5.6 Are Optimized Away

Sheeri

13

In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Here’s a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 optimizer statistics):

SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)

Before there were any subquery optimizations, say if you are using MySQL 5.1, you would have to rewrite this query as a JOIN, to avoid the dreaded DEPENDENT SUBQUERY that shows up in the EXPLAIN:

mysql> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: NULL
key: idx_title
key_len: 767
ref: NULL
rows: 1025
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: film_actor
type: index_subquery
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: func
rows: 1
Extra: Using index

MariaDB 5.5 boasts subquery optimization, and rightfully so. It looks like MariaDB materializes the subquery:

MariaDB [sakila]> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1043
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table:
type: eq_ref
possible_keys: distinct_key
key: distinct_key
key_len: 2
ref: func
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: film_actor
type: index
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: NULL
rows: 4889
Extra: Using index

So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery. There are 3 rows here, a new <subquery2> table is used to help optimize the results.

In MySQL 5.6, the subquery is actually optimized away. The optimizer actually changes it. You can see this in the EXPLAIN plan by looking at the select_type. In both the MySQL 5.1 and MariaDB 5.5 examples, the select_type is PRIMARY, which is used for the outer query in a subquery (or the first SELECT in a UNION, but that does not apply here). In MySQL 5.6, the select_type is SIMPLE for both rows. Note that MySQL 5.6 also does not have to add a third table as MariaDB does:

mysql> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: Using index; FirstMatch(film)
2 rows in set (0.00 sec)

In the presentation, the Oracle team says that for DBT3 Query #18, “execution time reduces from days to seconds”. With optimizations like this, I believe it!

New EXPLAIN Features in MySQL 5.6 – FORMAT=JSON actually adds more information!

Sheeri

2

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT,UPDATE,DELETE,REPLACE), I was pretty excited. Then I heard that MySQL 5.6 also was implementing a JSON format for EXPLAIN and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format, it also gives extra information that’s actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown are being used. None of these are shown in regular EXPLAIN (which seems odd, why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is actually a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk about EXPLAIN I do, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "index",
"key": "rental_date",
"used_key_parts": [
"rental_date",
"inventory_id",
"customer_id"
],
"key_length": "10",
"rows": 16005,
"filtered": 100,
"using_index": true,
"attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
}
}
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

Workbench showing attached_condition and implicit CAST

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED – “filtered” is the percentage of “rows” that are estimated to be returned. A higher number here is better, if it is low it means that you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN – note that it shows the rental_date index is used, and the key_len is 5, which infers that only the first field fo the index, rental_date is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "range",
"possible_keys": [
"rental_date"
],
"key": "rental_date",
"used_key_parts": [
"rental_date"
],
"key_length": "5",
"rows": 181,
"filtered": 100,
"using_index": true,
"attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
}
}
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Workbench showing used_key_parts and BETWEEN

Index condition pushdown is itself a new feature in MySQL 5.6, and I will talk about it in another blog post.

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Sheeri

13

For the past few days, I have been upgrading a few servers. We are going from Percona’s patched MySQL 5.1 to MariaDB 5.5 (the subquery optimization is excellent, and we have lots of subqueries). Our process involves upgrading a slave first, and letting it replicate for a while, and if everything’s good, update more slaves, then the master.

This has served us well in the past. And now that we are checking data integrity between masters and slaves with pt-table-checksum, the process involves checking before we start that there is data integrity. This is easy, as we checksum twice daily and have a Nagios alert if there are any discrepancies. After the upgrade, we checksum again, to be sure no data has been changed/corrupted in the process of doing a mysqldump export and import.*

Much to my surprise, after importing the data on one of our dev servers, I found that there were a lot of discrepancies. So I picked a chunk to do some comparisons on, and found something interesting:

On Server version: 5.1.65-rel14.0-log Percona Server (GPL), 14.0, Revision 475:
mysql> select float_field from db.tbl where id=218964;
+-------------+
| float_field |
+-------------+
| 9.58084e-05 |
+-------------+
1 row in set (0.04 sec)

On Server version: 5.5.28a-MariaDB-log MariaDB Server
MariaDB [(none)]> select float_field from db.tbl where id=218964;
+--------------+
| float_field |
+--------------+
| 0.0000958084 |
+--------------+
1 row in set (0.24 sec)

Which of course causes a different checksum. I tried SELECTing the values, casting and converting them, but I could not get them to change in the database. MySQL 5.1 insists on storing in scientific notation, and MariaDB 5.5 (and MySQL 5.5, we tested it out) insists on storing without scientific notation.

Frankly, I’m surprised this has not come up before (I did lots of querying Google for MySQL 5.5 and scientific notation), since it radically changes how numbers look when they are stored and retrieved. I guess code does the right thing…except for pt-table-checksum, and I cannot really blame it.

In the end, I used the –ignore-columns option to pt-table-checksum, with the result of:

SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');

In this way, I can get an accurate checksum to see if anything has changed, before I mark that the upgrade is complete on this slave server.

* This is just on the first slave. After the first slave is upgraded, we use xtrabackup to copy the data to another server to upgrade it.

RPKI + ROA

XioNoX

4

A few weeks ago, Mozilla started to use RPKI+ROA, here is a quick introduction.

RPKI (Resource Public Key Infrastructure) is a first step to secure BGP (the routing protocol used on the Internet), it allows to certify (and verify) that a prefix is advertised by the good AS (in other words that an IP points to its legitimate owner) in order to prevent prefix hijacking (usually misconfiguration but can also be used in attacks) like the PakistanTelecom/Youtube story.

Based on that information (route valid/invalid/unknown) the router can act as configured, which is usually to lower the invalid route’s priority so the genuine route is used (so it doesn’t break anything if it’s miss-configured/expired).

Like DNSSEC, RPKI is made of 2 parts, the signing and the verification. The former is possible for us through ARIN (our RIR) the later is possible with Juniper since JunOS v12.2.

How it works (in our case but it seams to be similar with RIPE and probably others) is quite straightforward:

  1. Generate a public/private key pair,
  2.  Communicate the public key to ARIN,
  3. Write a ROA (Route Origination Authorization)
    • It’s basically a statement saying “I allow AS#x to advertise routes y and z”
  4. Sign it with our private key,
  5. Send the signed ROA request to ARIN,
  6. If everything is correct, ARIN will publish this ROA to its repository, publicly available to any validator

Steps 3 to 5 have to be done for each ASN, and before the ROAs expire.
Steps 2 to 6 can be done directly through ARIN’s website (step 4 is done on the client side using JavaScript).

By doing that, any entities validating the routes they receive will not (or less) be impacted if another AS advertises a prefix the same size or smaller than ours.

Probably latter this year, we will deploy the validator, which is made of 2 devices, the router and the validator itself. To not overload the router, the validator (called RTR server) is in charge of downloading the ROAs located in the RIRs’ repositories and telling the router (RTR client) if an AS has the right to advertises the route. Then, as said earlier the router decides what to do. That validator is usually a regular server installed in the same POP as the router.

To continue with the comparison with DNSSEC, right now we are signing our DNS zones, but our own name-servers aren’t validating DNSSEC, while Comcast’s do.

You can easily check the status of a prefix using whois, for instance:

whois -h whois.bgpmon.net " --roa 53371 63.245.217.105/24"
whois -h whois.bgpmon.net 63.245.209.0

The returned code will be one of the following:

0 - ROA correct
1 - No ROA found
2 - ROA incorrect - will tell you what the error is

It’s so easy that nobody with a RIR account has a valid excuse to not start using it!

Urgent Security Maintenance – Subversion (svn.mozilla.org) – Sunday 13 Jan 2013

Shyam Mani

We’ll be taking down svn.mozilla.org for urgent security maintenance
*today* starting at 1615 PST (0015 UTC).

During this window svn.mozilla.org will be *completely* offline, no
check-outs or check-ins will be permitted.

*Time*: January 13th 2013, 1615 (January 14th 2013, 0015 UTC)

*Scheduled downtime*: 12 hours

*Impact*: All subversion related services including svn.mozilla.org and
viewvc.svn.mozilla.org will be *offline*.

We apologize for any inconvenience caused. If there is any *critical*
work this downtime would affect, please feel free to let me know and
we’ll work something out.

Open Database Camp at Northeast LinuxFest

Sheeri

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is at Harvard University in Cambridge, MA (“our fair city”), and will take place Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up with Eventbrite. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly to Northeast LinuxFest and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – whether it’s MySQL, Postgres, NoSQL, been around for years or something you’re thinking about. You can see previous session ideas at the OpenSQLCamp website.