MySQL Workbench with Unix Socket only servers.

Brandon Johnson

Here at Mozilla we like to keep our database systems secure and no more open than they need to be. As on many database systems, privileged users are restricted to socket-only connections. This can be a problem if you want to use database admin tools, such as MySQL Workbench, or remote performance monitoring software.

One cool trick is forwarding that socket to a port temporarily while you do your work. Below I’ll show you how to forward a socket to a port for use with the MySQL Workbench.

First, establish an SSH connection to your database server using any regular SSH client, such as Terminal on a Mac, ssh on Linux, or PuTTY on Windows.

Once connected, assuming you’re using a socket of “/var/lib/mysql/mysql.sock” (the default), we’ll use a Linux tool called socat to forward that socket to an unused port. socat is packaged for a wide variety of distributions and is available through both yum and apt-get on the major ones. For more info on socat, see its man page.

The command for this is:

socat TCP-LISTEN:<port>,reuseaddr,fork,su=nobody UNIX-CLIENT:<socket_path>

An example, using the default socket path and port 3308 would be:

socat TCP-LISTEN:3308,reuseaddr,fork,su=nobody UNIX-CLIENT:/var/lib/mysql/mysql.sock

Tip: If you’re consistently connecting from the same IP address, consider adding the “range” option (for example, range=<your_ip>/32) to the TCP-LISTEN parameters so that socat only accepts connections from that address.

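What does this command do? It listens on the given TCP port and relays all traffic between connections to that port and the given Unix socket. The reuseaddr option lets socat rebind the port right after a restart, fork handles each incoming connection in its own child process so more than one client can connect, and su=nobody drops privileges to the nobody user. It’s basically the socket equivalent of port forwarding.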
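To make the idea concrete, here is a rough Python sketch of the same kind of relay. It is only meant to illustrate what socat is doing, not to replace it: the function names (`pipe`, `handle`, `serve`, `start_forwarder`) are my own, it binds to 127.0.0.1 by default, it uses a thread per connection instead of forking, and it does not drop privileges the way su=nobody does.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, socket_path):
    """Relay one accepted TCP connection to the Unix socket and back."""
    upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        upstream.connect(socket_path)
    except OSError:
        client.close()
        upstream.close()
        return
    # One thread copies client -> socket; this thread copies socket -> client.
    worker = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    worker.start()
    pipe(upstream, client)
    worker.join()
    client.close()
    upstream.close()


def serve(listener, socket_path):
    """Accept connections until the listener is closed."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return
        threading.Thread(
            target=handle, args=(client, socket_path), daemon=True
        ).start()


def start_forwarder(port, socket_path, host="127.0.0.1"):
    """Listen on host:port and forward every connection to socket_path."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # like reuseaddr
    listener.bind((host, port))
    listener.listen()
    threading.Thread(target=serve, args=(listener, socket_path), daemon=True).start()
    return listener
```

Calling `start_forwarder(3308, "/var/lib/mysql/mysql.sock")` would roughly mirror the socat example above, with the difference that it only listens on the loopback interface unless you pass another `host`.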

Once you’ve created the socket forwarding, you can connect the MySQL Workbench to it by simply setting up a connection to the server using TCP/IP and your port, like the screenshot below (note that it will require your normal socket user’s password).

[Screenshot: MySQL Workbench connection settings, taken 2013-06-18]

Once you’re configured like the above screenshot, voila! You can now connect to your socket only server.
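If Workbench refuses to connect, it can help to check that the forwarded port is reachable at all before digging into credentials. The snippet below is a small sketch of such a check; the function name `port_is_open` and the host name in the usage note are placeholders of mine, not part of any tool mentioned here.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the example above you would call something like `port_is_open("db.example.com", 3308)`, substituting your own database host. A `False` result usually means socat is not running, or a firewall (or the “range” option) is blocking you.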

Brief Outage for Phoenix Data Center Chassis

Sheeri

One of the chassis in the PHX1 datacenter experienced issues that took many services offline, including those on the generic web cluster, and degraded others for approximately half an hour. Fixing the issue took approximately 15 minutes. Services should be back to normal.

For reference, the following web services were either degraded or unavailable:

generic cluster (contains many web apps)

bouncer
elasticsearch
etherpad
graphite
hangprocessor
input
input-celery
openshift
plugins and plugins memcached
puppetmaster
rabbit
socorro memcache

If you have any questions or concerns, please address them to helpdesk@mozilla.com.

Bugzilla Feeling Slow?

Sheeri

We have been experiencing intermittent Bugzilla slowness since Wednesday, June 12th 2013 at 5 pm UTC (10 am US/Pacific time). We have been working throughout the weekend to pinpoint the cause of this irregular, but noticeable, issue. The problem is limited to performance; there have been no reports and no evidence of data or functionality loss. We will release additional information as we have it.

Update 18 Jun 2013 18:40 UTC: The Phoenix chassis outage was completely unrelated to this Bugzilla slowness. Bugzilla is in a different data center; it did not cause the chassis problem, and the only effect the chassis problem had on it was to pull resources away from diagnosing and fixing the Bugzilla issue.

A Different Spin On the max_allowed_packet Problem

Sheeri

Back in November, I filed MySQL bug 67448, talking about a different type of max_allowed_packet problem.

See, an application had put data into the database, but could not retrieve it without getting a max_allowed_packet error. With the help of some really smart community folks (named Jesper Hansen, Brandon Johnson and Shane Bester), we determined that MySQL actually has 2 different max_allowed_packet settings: client and server.

When you change the max_allowed_packet variable, you are changing the server variable if it is in [mysqld] and the client variable if it is in [client], [mysql] or the section for whichever client you use. As far as we can tell, there’s no way to actually view what the client variable is, as looking at both the session and global max_allowed_packet variables shows you the server variable.

If max_allowed_packet is not set by the client, it defaults to 16M. The proposed solution is to allow it to be increased for non-interactive clients, and the bug has been verified as a “feature request”, though it has not been implemented yet.
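Since the client value can’t be inspected from inside a session, one way to see both settings side by side is to read them out of the option file. The sketch below does that with Python’s configparser; the sample file contents and the names `to_bytes` and `packet_limits` are made up for illustration, and it only handles the simple `name = value` form with K/M/G suffixes (it ignores !include directives and the dash spelling max-allowed-packet).

```python
import configparser

# Hypothetical option file, showing that server and client sections
# carry independent max_allowed_packet values.
SAMPLE_MY_CNF = """
[mysqld]
max_allowed_packet = 64M

[client]
max_allowed_packet = 16M

[mysqldump]
quick
"""

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a size such as '16M' or '1048576' to a number of bytes."""
    value = value.strip().upper()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def packet_limits(cnf_text):
    """Map each section that sets max_allowed_packet to its value in bytes."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # options such as 'quick' have no value
        strict=False,
        interpolation=None,
    )
    parser.read_string(cnf_text)
    return {
        section: to_bytes(parser[section]["max_allowed_packet"])
        for section in parser.sections()
        if parser.has_option(section, "max_allowed_packet")
    }
```

With the sample above, `packet_limits(SAMPLE_MY_CNF)` reports separate values for `mysqld` and `client`, which is exactly the mismatch that bit us: the server would accept a row the client could not read back.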

ulimits and upgrading from Oracle MySQL 5.0 to Percona patched MySQL 5.1

Sheeri

After upgrading to Percona’s patched MySQL 5.1*, end users were having connectivity problems, and reporting errors such as:

OperationalError: (2003, "Can't connect to MySQL server on 'db-amo-ro' (110)")

TimeoutError: Request timed out after 5.000000 seconds

OperationalError: (1135, "Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug")

We had these same problems a while back, before increasing ulimit settings in /etc/sysconfig/mysqld. Oracle’s MySQL startup script specifically sources this file:

[ -e /etc/sysconfig/$prog ] && . /etc/sysconfig/$prog

However, we saw these errors again when we upgraded to Percona’s MySQL 5.1. At first we thought that it was because Oracle’s startup script is /etc/init.d/mysqld and Percona’s is named /etc/init.d/mysql (so we would put ulimits in /etc/sysconfig/mysql). However, when we looked, we saw that Percona’s startup script does NOT source anything in /etc/sysconfig.

So then we put the following in /etc/security/limits.d/99-nproc-mysql.conf:
root soft nproc 32768
root hard nproc 65535

We restarted MySQL and all was good. Even though we are long past having this problem, I thought it was important enough to blog about.
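A quick way to confirm that a limit change actually reached the running server is to look at /proc/<pid>/limits for the mysqld process on Linux. Here is a small sketch that pulls the “Max processes” line out of that file; the sample text is abbreviated and made up to match the values above, and the function names are my own.

```python
import re

# Abbreviated, hypothetical excerpt in the layout of /proc/<pid>/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 4096                 files
Max processes             32768                65535                processes
"""


def parse_limit(limits_text, name="Max processes"):
    """Return (soft, hard) for the named limit; None means 'unlimited'."""
    pattern = re.compile(
        r"^" + re.escape(name) + r"\s+(\S+)\s+(\S+)", re.MULTILINE
    )
    match = pattern.search(limits_text)
    if match is None:
        raise ValueError("limit not found: " + name)
    return tuple(None if v == "unlimited" else int(v) for v in match.groups())


def read_process_limits(pid):
    """Read the raw limits table for a running process (Linux only)."""
    with open("/proc/{}/limits".format(pid)) as handle:
        return handle.read()
```

On a database host you would feed it the real file, for example `parse_limit(read_process_limits(mysqld_pid))`, where `mysqld_pid` is whatever PID your server is running under.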

* We finished upgrading all of our servers to MySQL 5.1 at the end of 2012. We ran into this interesting snag that I wanted to blog about, even though we’re in the middle of upgrading to MySQL 5.5 right now (and by the end of the year, we will upgrade to MySQL 5.6 – the performance schema stuff is definitely something we want to utilize).

Upgrading support.mozilla.org databases

Sheeri

A while ago (November 2012 to be exact), we upgraded the support.mozilla.org databases from Percona 5.1 to MariaDB 5.5 (the next step, happening soon, is upgrading them to Oracle’s MySQL 5.6). One of the engineers and I had a conversation where he mentioned that “one of our worst performing views on SUMO is doing waaaayyy better with the upgraded databases”, that it “seems more stable” and that “I stopped receiving ‘MySQL went away or disconnected emails’ which came in once in a while.”

It’s always nice to see upgrades actually making a difference. In our case we saw a lot less CPU wait, though that might also be partially due to tuning the memory settings on the machines and adding in another read slave to handle queries. As a result, network traffic throughput went from less than 1 Mb/sec to about 18 Mb/sec, because the machines were just handling more queries per second, period.

(I had this e-mail as a draft for a while and decided to clean it up and publish it now!)

MySQL User Group Video – Determinism and Databases

Sheeri

The May Boston MySQL User Group featured John Hugg of VoltDB talking about determinism and databases. I have uploaded the hour-long video to http://www.youtube.com/watch?v=mTDLyRauJtw. Seasoned MySQLers will nod their heads because the problems described are familiar, and those who are not exactly sure what “determinism and databases” means will learn a lot.

Enjoy!

(As always, videos are free on YouTube with no login or attempt to solicit your e-mail address or any other information)

RFO: DNSSEC Resolution failures (mozilla.org) 20130515@1800 PDT [872818]

mrz

On May 15 at 1800 PDT, Nagios alerted on the start of sporadic DNS resolution failures. This post summarizes the events, the impact, and the specific steps Mozilla IT is taking to avoid future disruptions of this nature.

This post is intended to be technical in nature.  DNSSEC is fairly technical and DNSSEC failures tend to be similarly technical. As we’ve done before, we hope to share the failures we encounter in production so you don’t have to experience the same.

SUMMARY

An SOA mismatch between SVN and the nameservers was caused by the DNSSEC signer refusing to sign with an expired ZSK. This was misdiagnosed as a KSK issue, leading to a DNS outage for DNSSEC-verifying resolvers.

DETAILS

In the afternoon of May 15, the nameservers refused to load SOA update 2013051500 for the mozilla.org DNSSEC-signed zone.

Investigation found that the DNSSEC signer was refusing to sign the zone, providing only the error “fatal: cannot find SOA RRSIGs”. In hindsight, this undocumented error indicates that the zone’s ZSK has expired.

Mozilla’s domain registrar publishes DS records for the mozilla.org KSK. When the expired key was found at 16:44, it was misunderstood to be a KSK, rather than a ZSK. A new KSK was generated and its DS record added to Mozilla’s domain registrar.

The new KSK did not resolve the signing errors. Mozilla’s domain registrar was found to rate-limit DS record changes, preventing the new KSK from being reverted. DNS lookups began showing invalid DS records from Mozilla’s domain registrar, but this was later found to be internal DNS only.

After examining the keys (both current and expired) more closely, the expired key was found to be a ZSK, rather than a KSK. Renewing the ZSK fixed the DNSSEC signer. The mozilla.org SOA 2013051500 was signed by both KSKs and the new ZSK, and then published.

Comcast users began reporting DNS resolution issues for mozilla.org, complicating access to various Mozilla properties. DNSSEC validation tools showed unexpected issues with the signed mozilla.org zone.

The DS records were confirmed to be correct externally, so the mozilla.org zone was re-signed without the old KSK, leaving only the new KSK and new ZSK. This resolved the validation issues for reasons unknown, and Comcast users reported DNS working correctly again.

Bugs have been filed to document the KSK/ZSK renewal process, to monitor the expiration times of those keys, and to monitor that the zones validate.
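As an illustration of what the expiration check could look like (this is a sketch, not the monitoring Mozilla IT actually deployed), the snippet below classifies a DNSSEC timestamp as expired, expiring soon, or fine. It assumes the timestamp is available in the YYYYMMDDHHMMSS form used in the presentation format of RRSIG records; the 14-day warning window and the function names are arbitrary choices made for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_dnssec_time(stamp):
    """Parse a YYYYMMDDHHMMSS timestamp as UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def expiry_status(stamp, now=None, warn_days=14):
    """Return 'expired', 'expiring soon' or 'ok' for an expiration stamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = parse_dnssec_time(stamp) - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=warn_days):
        return "expiring soon"
    return "ok"
```

A Nagios-style check could map those three results to CRITICAL, WARNING and OK, so that an expiring key raises an alert well before the signer starts refusing to sign.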

BUGS
  • 872818: mozilla.org SOA mismatch, DNSSEC signer refusing to sign
  • 872831: alarm when DNSSEC signing keys are expiring soon
  • 872884: document ZSK and KSK renewal/rollover process
  • 872832: regenerate mozilla.org DNSSEC ZSK (resolved)
  • 872885: regenerate mozilla.org DNSSEC KSK (resolved)
  • 872927: monitoring: add full validation of DNSSEC zones

TIMELINE (PST8PDT, UTC -0700)
  • 15:32 – SOA mismatch detected between nameservers 2013051402 and svn 2013051500.
  • 16:03 – Found DNSSEC signer refusing to sign mozilla.org 2013051500
  • 16:44 – Found expired key preventing signing of mozilla.org
  • 16:52 – Added new KSK to Mozilla’s domain registrar alongside existing KSK to renew expired key
  • 17:06 – Found that expired key was ZSK, not KSK as previously thought.
  • 17:27 – Signed mozilla.org with both KSKs and new ZSK
  • 17:45 – Mozilla’s domain registrar appeared to be publishing an incorrect hash for the new KSK (misleading: this affected internal lookups only)
  • 18:00 – Comcast users reporting sporadic DNS resolution failures
  • 18:20 – Validation issue found with signed zones
  • 18:25 – Signed mozilla.org with new KSK and new ZSK
  • 18:30 – Comcast users reporting DNS resolving successfully
  • 18:35 – Validation issue confirmed resolved

NOTES
  • ZSK and KSK are “zone signing key” and “key signing key” for mozilla.org. DNSSEC permits multiple KSKs and autoselects the latest ZSK. We sign with a single KSK, outside of 17:30-18:25 above.
  • There is no filesystem difference between ZSKs and KSKs. The distinction is the word “zone” or “key” in the comment in the first line of the keyfile.
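
Because the two key types look identical on disk, a small helper that tells them apart would have saved us the detour described above. The sketch below is one way to do it, under two assumptions that are mine rather than anything stated in this post: that the key files carry a BIND-style first-line comment (“This is a zone-signing key…” or “This is a key-signing key…”), and that the DNSKEY record in the file can be cross-checked by its flags field, where 256 marks a zone key and 257 marks a key with the SEP bit set, which is conventionally the KSK. The sample lines in the usage note are hypothetical.

```python
import re


def classify_by_comment(first_line):
    """Classify a key from its first-line comment (assumed BIND-style)."""
    text = first_line.lower()
    if "zone-signing key" in text:
        return "ZSK"
    if "key-signing key" in text:
        return "KSK"
    return None


def classify_by_flags(dnskey_line):
    """Classify a key from the flags field of its DNSKEY record."""
    match = re.search(r"\bDNSKEY\s+(\d+)\b", dnskey_line)
    if match is None:
        return None
    flags = int(match.group(1))
    if flags == 257:  # zone key with the SEP bit set
        return "KSK"
    if flags == 256:  # zone key without the SEP bit
        return "ZSK"
    return None
```

For example, `classify_by_comment("; This is a zone-signing key, keyid 12345, for example.org.")` gives `"ZSK"`, and `classify_by_flags("example.org. IN DNSKEY 257 3 7 AwEAAc...")` gives `"KSK"`. If the two methods ever disagree for the same file, that is worth a closer look before renewing anything.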