MySQL and the Leap Second, High CPU and the Fix

Sheeri


ETA: For those wanting to know the root cause, see the comments.

So, MySQL seemed to be faring just fine during and after the leap second. There were no big problems or catastrophic failures. However, we noticed that MySQL (and Java) servers* had a huge spike in CPU – which was odd, because nothing out of the ordinary was running. We were able to fix the problem quite easily. Here’s the graph, and the fix:

[Graph: CPU spike]

Note that the CPU spike starts at 17:00 Pacific, which is midnight UTC. We fixed it around 19:35 Pacific.
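A quick way to double-check the time arithmetic, assuming GNU date (for the -d option) and assuming "Pacific" means Pacific Daylight Time (UTC-7), which was in effect on June 30. The variable names are made up for this sketch:

```shell
# 17:00 Pacific Daylight Time (UTC-7) on June 30, expressed in UTC.
spike_start_utc="$(date -u -d '2012-06-30 17:00:00 -0700' '+%Y-%m-%d %H:%M:%S')"
echo "$spike_start_utc"    # 2012-07-01 00:00:00

# Approximate length of the spike, from 17:00 to about 19:35 Pacific.
start_s="$(date -u -d '2012-06-30 17:00:00 -0700' '+%s')"
end_s="$(date -u -d '2012-06-30 19:35:00 -0700' '+%s')"
spike_minutes=$(( (end_s - start_s) / 60 ))
echo "$spike_minutes"      # 155
```

So the spike began right at the leap second boundary and lasted roughly two and a half hours.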

The fix is quite simple – simply set the date. Alternatively, you can restart the machine, which also works. Restarting MySQL (or Java, or whatever) does NOT fix the problem. We put the following into puppet to run on all our machines:

$ cat files/bin/leap-second.sh
#!/bin/bash
# this is a quick-fix to the 6/30/12 leap second bug

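# run once only: the marker file in /tmp records that the fix was applied
if [ ! -f /tmp/leapsecond_2012_06_30 ]
then
    # stop ntpd, set the clock to the current time, then create the marker
    /etc/init.d/ntpd stop; date -s "`date`" && /bin/touch /tmp/leapsecond_2012_06_30
fi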

We deployed this with puppet, which ran the script on each machine's next puppet run and then restarted ntp (that is how we have it configured; your mileage may vary).
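For anyone adapting the script, here is a minimal sketch of the same run-once idea, split into small functions so it can be tried without touching the clock. The names clock_stamp, set_clock, apply_leap_fix and the MARKER variable are hypothetical and not part of our script. It assumes GNU date, uses a numeric timestamp so that the output of date can be parsed back under any locale, and leaves out the ntpd stop step that our script includes:

```shell
#!/bin/bash
# Sketch only: clock_stamp, set_clock, apply_leap_fix and MARKER are
# hypothetical names for illustration, not part of the original script.

MARKER="${MARKER:-/tmp/leapsecond_2012_06_30}"

# Print the current local time in a purely numeric format, which does not
# depend on the active locale's day and month names.
clock_stamp() {
    date '+%Y-%m-%d %H:%M:%S'
}

# Set the system clock to the current time (needs root).
set_clock() {
    date -s "$(clock_stamp)" >/dev/null
}

# Apply the fix once; the marker file records that it has been done.
apply_leap_fix() {
    if [ -f "$MARKER" ]; then
        echo "already applied"
        return 0
    fi
    # Only create the marker if setting the clock succeeded.
    if set_clock; then
        touch "$MARKER"
        echo "applied"
    else
        echo "failed" >&2
        return 1
    fi
}
```

Running apply_leap_fix as root applies the fix on the first call and does nothing on later calls while the marker file exists.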

* A bunch of our IT guys also have personal, at-home Minecraft servers, which were similarly affected.

58 responses

  1. Robert Coli wrote:

    Doesn’t the command need a “date -s” in it to work?

    Your current command :

    /etc/init.d/ntp stop; date `date +"%m%d%H%M%C%y.%S"`

    But this just PRINTS the date at that date, it does not SET the date to that date.

    /etc/init.d/ntp stop; date -s "`date`"

    Works just as well as the complex formatting string (you are losing the milliseconds with the race anyway, and ntp will correct) and actually calls the “settime” system call to set the date.

    Are you sure your command as transferred to the blog post works?

    =Rob

    1. Sheeri wrote:

      Rob – I thought the same thing about date -s. However, the folks that actually ran the command assure me this works. Yours is shorter, for sure.

      At any rate, reading this blog post will have people knowing what the fix is (set the date). There are many different ways to do that, so folks should use what works for them. At this point we’re just nitpicking :D

      But since that was the first thing *I* noticed too, I’ll change the script to your shorter way, using date -s, to avoid lots of other comments about it.

      1. Robert Coli wrote:

        Sheeri,

        Got it… actually, the syntax to “print a date other than the current date” is “-d”, and -s is not required for the “set” effect *if* the new timestamp is the only argument. You don’t need the -s argument to get a call to settime via date, as proved below:

        [root@hostname ~]# strace date `date +"%m%d%H%M%C%y.%S"` 2>&1 |grep setti
        clock_settime(CLOCK_REALTIME, {1341115668, 0}) = 0

        =Rob

        1. Robert Coli wrote:

          FWIW, testing indicates that stopping/starting ntp is not required; the entire fix is the call to clock_settime via “date”.

          =Rob

          1. k wrote:

            Correct :)

  2. wesmorgan1 wrote:

    You can avoid NTP glitchiness by configuring for leap seconds in advance.

    See http://support.ntp.org/bin/view/Support/ConfiguringNTP for details; you’ll also need to grab the leapseconds file from ftp://tycho.usno.navy.mil/pub/ntp/

  3. Ross wrote:

    Thanks guys, I was totally stumped for a while there as I had the same problem. No idea there was a leap second update, so thanks!

  4. Pingback from 闰秒导致系统CPU利用率暴涨 | 坚果云博客:

    [...] Thanks to a Mozilla blog post, and thanks to Google's fast and flexible real-time indexing, we found the following simpler fix while we were restarting our servers: $ cat files/bin/leap-second.sh # this is a quick-fix to the 6/30/12 leap second bug [...]

  5. Adminblogger wrote:

    The more elegant solution:

    date -s "`date -u`"

    1. Jason Drage wrote:

      Great – this worked for 3 of 4 virtualisation servers – the other was fixed with a reboot. Thanks!

    2. nathan wrote:

      Even simpler:
      date -s now
      (as mentioned on the serverfault question 403732 page).

  6. Sam Pullara wrote:

    Does anyone know the root cause? Why is the CPU spiking?

    1. bburton wrote:

      The underlying cause is a kernel bug, as discussed at http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-today and https://access.redhat.com/knowledge/articles/15145 (see also https://bugzilla.redhat.com/show_bug.cgi?id=479765)

      The kernel fix is in https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

      As of yet, I haven’t seen a good analysis of the specifics of why MySQL and Java went into a spin, but as http://dpaste.com/765885/ mentions, it sounds like things were getting hung up due to syscalls that went into loops when the leap second hit.

      We’ll update the post if we see any detailed analysis of either the MySQL or the Java/JVM CPU spike posted in the next few days.

      1. John Stultz wrote:

        I believe the analysis from serverfault doesn’t correctly describe the issue from this post.

        My analysis is here:
        https://lkml.org/lkml/2012/7/1/203

        Proposed fix thread starts here:
        https://lkml.org/lkml/2012/7/1/176

        1. nathan wrote:

          A second thread covering the proposed fixes:

          https://lkml.org/lkml/2012/7/5/440

    2. Jayfar wrote:

      Race condition I think.

    3. Tero Marttila wrote:

      This affects a wide variety of services; at least mysql, puppetmaster, firefox (!). See LKML:

      https://lkml.org/lkml/2012/7/1/19

  7. Justin Dolske wrote:

    Tell us more about these Minecraft servers. :-)

  8. Pingback from High cpu load on CloudStack management servers after leap second 30/6/2012:

    [...] as well, though I didn’t experience problems with it.  Thanks to the guys @Mozilla for blogging about this problem and suggesting a [...]

  9. Pingback from MySQL and the Leap Second, High CPU and the Fix | Mozilla IT | ちゅどん道中記:

    [...] and the Leap Second, High CPU and the Fix | Mozilla IT Posted July 1, 2012 Check MySQL and the Leap Second, High CPU and the Fix | Mozilla ITMozilla IT Mozilla IT & Operations MySQL and the Leap Second, High CPU and the Fix Jun 30 2012 3 [...]

  10. Pingback from Plötzliche hohe MySQL CPU Auslastung ohne aktive Queries » Debian Root:

    [...] to be able to, and there I found what I was looking for. It appears to be a kernel bug related to the "leap second". Mozilla has published a blog post with the solution. Restarting the server apparently fixes the problem too, but this way you can spare yourself that [...]

  11. Pingback from Blogme » Blog Archive » MySQL and the Leap Second, High CPU and the Fix | Mozilla IT:

    [...] is the original post: MySQL and the Leap Second, High CPU and the Fix | Mozilla IT Posted By: admin Last Edit: 01 lip 2012 @ 03:07 AM Email • Permalink Tags: [...]

  12. Pingback from Me Blog − MySQL and the Leap Second, High CPU and the Fix | Mozilla IT:

    [...] The rest is here: MySQL and the Leap Second, High CPU and the Fix | Mozilla IT [...]

  13. Pingback from Leap Second Problem auf Debian und anderen › NETWAYS Blog:

    [...] MySQL and the Leap Second [...]

  14. Pingback from (Linux) Leap second message Issue 2012 | 나만의 세상~!:

    [...] Link : http://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/ Link : http://planet.mysql.com/entry/?id=33709 Link : [...]

  15. Pingback from High MySQL CPU Load Today? Quick Fix « Coffee on the Keyboard:

    [...] And yet MySQLd and Java apparently didn’t like it, at least some version and OS combinations. Fortunately, there’s an easy fix. Sheeri from Mozilla’s IT group pointed me to their blog post. [...]

  16. Pingback from Перевод мировых атомных часов на одну секунду привёл к массовому зависанию серверных приложений | AllUNIX.ru — Всероссийский портал о UN on ::

    [...] applications running in the Java VM, such as Hadoop and Cassandra), the MySQL DBMS started eating up all CPU resources, [...]

  17. Pingback from Quick Links related to linux leap second issue (especially for Java users) | DonaldOJDK Blog:

    [...] Mozilla seems to have been early to learn and report that the core issue is related to highres timers on linux, and in particular applications that make use of ntpd – the Network Time Protocol Daemon.  There has been some code activity related to that in the kernel lately, but it’s not clear to me (as a self-admitted linux-n00b) if that code is live, or somehow related – please check in with your linux provider to see what they’re recommending.  It’s not certain that’s the (or only) issue, but it seems to be the way the internetz are leaning at the moment. [...]

  18. Pingback from Elementary! MySQL did it, in the CPU, with the the leap second! | Martin Falatic’s Techno Blog:

    [...] found the following suggestion (found here and here) worked well enough for me on its own (fingers crossed [...]

  19. Alex Kavanagh wrote:

    Thanks so much for this – solved my problem too!

  20. Pingback from Перевод мировых атомных часов на одну секунду привёл к массовому зависанию серверных приложений : Записки начинающего линуксоида:

    [...] applications running in the Java VM, such as Hadoop and Cassandra), the MySQL DBMS started eating up all CPU resources, [...]

  21. Pingback from Verlängertes Wochenende kann Linux einfrieren | virtualfiles.net:

    [...] extra second inserted during the night can lead to sustained CPU load on Linux machines. This behaviour is documented, for example, in the blog of the Mozilla Foundation, which from the moment of the leap second onward [...]

  22. Sven T wrote:

    Thanks for the script!

  23. Pingback from うるう病にかかっていた件 at reisa Sekai:

    [...] MySQL and the Leap Second, High CPU and the Fix (Mozilla IT) [...]

  24. Pingback from Anonymous:

    [...] [...]

  25. Pingback from Happy Leap Second! « Just Urbanism:

    [...] was the user process spinning at 100% CPU even after being restarted. A quick fix seems to be setting the system clock which apparently resets the bad state in the kernel (we [...]

  26. Pingback from Перевод мировых атомных часов на одну секунду привёл к массовому зависанию серверных приложений | conon print Перевод мировых атомных часов на:

    [...] applications running in the Java VM, such as Hadoop and Cassandra), the MySQL DBMS started eating up all CPU resources, [...]

  27. Pingback from Linux, BSD un citas lietas - Atvērtajās ziņās | Globālas problēmas atompulksteņa sinhronizācijas dēļ:

    [...] hangs (mainly of Java VM based ones, such as Hadoop and Cassandra), MySQL began consuming all of the server's processor resources, OpenVPN-based VPN tunnels disconnected, and hangs hit Linux-based [...]

  28. Pingback from Schaltsekunde: Linux kann einfrieren | virtualfiles.net:

    [...] extra second inserted during the night can lead to sustained CPU load on Linux machines. This behaviour is documented, for example, in the blog of the Mozilla Foundation, which from the moment of the leap second onward [...]

  29. Pingback from Leap second hits Qantas air bookings, while Reddit and Mozilla stutter | AKTUAL TECHNO NEWS:

    [...] The resolution in most cases was to reboot a servers, alternative people upon a thread deliberating a complaint said. Mozilla after pushed out a fix. [...]

  30. Pingback from Leap second hits Qantas air bookings, while Reddit and Mozilla stutter | Old News:

    [...] The solution in many cases was to reboot the servers, other people on the thread discussing the problem said. Mozilla later pushed out a fix. [...]

  31. Pingback from Leap second hits Qantas air bookings, while Reddit and Mozilla stutter:

    [...] The resolution in most cases was to reboot a servers, alternative people upon a thread deliberating a complaint said. Mozilla after pushed out a fix. [...]

  32. Pingback from Leap second hits Qantas air bookings, while Reddit and Mozilla stutter | Technology News:

    [...] The solution in many cases was to reboot the servers, other people on the thread discussing the problem said. Mozilla later pushed out a fix. [...]

  33. Pingback from Leap second: Linux can freeze:

    [...] night can cause permanent high CPU loads on Linux computers. Among other examples, the behaviour is documented in the blog of the Mozilla Foundation, where strange peak loads on one Mozilla server were observed [...]

  34. Trackback from What problems did adding a leap second on June 30, 2012 cause?...:

    What problems did adding a leap second on June 30, 2012 cause?…

    A live-lock in the Linux kernel[1] caused 100% CPU usage in user-space processes. The root cause was diagnosed[2] to be early firing of timers due to the clock_was_set() state-clearing function not being called. A workaround[3] was developed which had t…

  35. Pingback from “Leap second” | ~/blog:

    [...] many organizations, including Mozilla, were caught out. Google seems to have handled the problem intelligently by adding a few [...]

  36. Pingback from What does the leap-second bug look like? » Aaron at the Internet Archive:

    [...] simply setting the system clock fixes it.  Posted by Aaron Binns at [...]

  37. Ben wrote:

    For future reference, you could do all that in Puppet instead of distributing and then executing a shell script.

    exec { 'leapsecond_2012_06_30':
      creates => '/tmp/leapsecond_2012_06_30',
      command => '/bin/touch /tmp/leapsecond_2012_06_30',
      onlyif  => '/etc/init.d/ntpd stop; date -s "`date`"',
    }

  38. Pingback from Leap second bug found to be Linux issue, not Cassandra. | DataStax:

    [...] challenge was that when troubleshooting the problem, some applications such as Cassandra, MySQL, Hadoop, Elastic Search, and others showed extremely high CPU consumption, which led people to [...]

  39. Pingback from Leap second og CentOS 6 | KP sin huskeblogg:

    [...] this also shows in that the server had 3.7 million context switches versus the normal 8-10 thousand. Mozilla has also posted some information on how they solved the problem without a reboot. This entry was [...]

  40. Sapd wrote:

    And I searched and wondered why my MySQL had high CPU usage…

  41. Pingback from The 2012 Leap Second on Linux | Open Query blog:

    [...] original post MySQL and the Leap Second, High CPU and the Fix describes how MySQL servers would suddenly exhibit high CPU usage during a period of low load. From [...]

  42. Pingback from Dodatkowa sekunda spowodowała problemy na Linuksie | OSWorld.pl:

    [...] MySQL and the Leap Second, High CPU and the Fix [...]

  43. Cédric wrote:

    Hi, thanks for this nice tip!
    For French users, setting LANG=en_EN.UTF-8 before setting the date can help.

  44. Pingback from Electrical Storms, Leap-Seconds, and the Ops Weekend from Hell - FullContact:

    [...] after the leap second. JVMs started and immediately stopped. We were puzzled. We came across the Mozilla IT Blog post about high-CPU usage of MySQL after the leap second which provided the seemingly useless [...]

  45. Pingback from Leap Second, Public Cloud and a Lesson in Enterprise Resiliency | Secure Lagos:

    [...] are a lot of interesting notes, bug reports, and second-hand-stories from this too… MySQL had an issue, as did Java. [...]

  46. Pingback from The 2012 Leap Second on Linux | ForLinux Blog:

    [...] There are other articles to read up on here [...]

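The clock_settime line in the strace output under comment 1 carries the new time as seconds since the Unix epoch. As a small worked example, assuming GNU date (for the -d option), that value can be decoded like this. The variable names are illustrative only:

```shell
# tv_sec copied from the strace output:
#   clock_settime(CLOCK_REALTIME, {1341115668, 0}) = 0
epoch=1341115668

# Decode it as UTC; "-d @N" is GNU date syntax for seconds since the epoch.
set_time_utc="$(date -u -d "@${epoch}" '+%Y-%m-%d %H:%M:%S')"
echo "$set_time_utc"    # 2012-07-01 04:07:48

# Seconds elapsed since midnight UTC on July 1, just after the leap second.
midnight="$(date -u -d '2012-07-01 00:00:00' '+%s')"
echo $(( epoch - midnight ))    # 14868
```

The second field in the strace output, 0, is the nanosecond part: the sub-second portion of the time is dropped, which matches the remark in the same thread about losing the milliseconds.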