Caching for Monitoring: Timing is Everything

Sheeri

I found Baron’s reasoning on why the Percona Nagios plugins do not use caching interesting. On the surface, the logic is sound – you do not want to cache when you want real-time monitoring.

I have not yet had time to look at the Percona plugins for Nagios, though I do want to, because back at PalominoDB I helped write a Nagios plugin for MySQL that allows you to do arbitrary calculations. By “arbitrary calculations” I mean you can have a calculation like “Threads_connected/max_connections*100” and set a threshold of “>80”. You can mix and match MySQL status variables and system variables, and use any Perl functions as well, including basic arithmetic.
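To make the idea of an arbitrary calculation concrete, here is a minimal sketch in Python. It is not the plugin's code (the plugin itself is written in Perl); the function names `evaluate` and `breaches` and the sample values are made up for illustration, and a real check would read the values from SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES.

```python
import operator

# Hypothetical sample values; a real check would fetch these from
# SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES.
SAMPLE = {"Threads_connected": 170, "max_connections": 200}

COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def evaluate(expression, values):
    """Evaluate an arithmetic expression over named server values.

    eval() is used with builtins removed, which is only reasonable for
    expressions written by the person configuring the check.
    """
    return eval(expression, {"__builtins__": {}}, dict(values))


def breaches(result, threshold):
    """Return True if result violates a threshold string such as '>80'."""
    # Two-character operators are tested first so '>=' is not read as '>'.
    for symbol in (">=", "<=", ">", "<"):
        if threshold.startswith(symbol):
            limit = float(threshold[len(symbol):])
            return COMPARATORS[symbol](result, limit)
    raise ValueError("unsupported threshold: %r" % threshold)


result = evaluate("Threads_connected/max_connections*100", SAMPLE)
state = "CRITICAL" if breaches(result, ">80") else "OK"
print("%s: connection usage is %.1f%%" % (state, result))
```

With the sample values above, 170 of 200 connections are in use, so the calculation lands above the “>80” threshold and the check reports a problem.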

We put a caching function in there, complete with examples. Why would we do that if caching is bad, as Baron says? Well, long-term caching is bad, but it is certainly acceptable to have caching with a threshold lower than the check interval. With the PalominoDB Nagios plugin for MySQL, you can have 10 different calculations, and set the cache threshold on a per-check basis – the examples use 60 seconds.

If I have 10 calculations, I can set it so the checks do not re-connect to MySQL if there is a cache file that’s less than 60 seconds old. With checks that run every 5 minutes by default, it makes complete sense for the first check to cache the output of SHOW STATUS/SHOW VARIABLES/SHOW PROCESSLIST and for the other 9 checks to use the cache file, but only if it is less than a minute old.
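The micro-caching described above can be sketched roughly like this. Again, this is an illustration in Python rather than the plugin's actual Perl code; `fetch_cached`, the JSON cache format, and the fake fetch function are all assumptions made for the example.

```python
import json
import os
import tempfile
import time


def fetch_cached(cache_path, max_age_seconds, fetch):
    """Return cached data if the cache file is fresh, else call fetch().

    fetch is any zero-argument callable that queries the server and
    returns JSON-serializable data (hypothetical in this sketch).
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            with open(cache_path) as handle:
                return json.load(handle)
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt cache: fall through and refetch.
        pass

    data = fetch()
    # Write to a temporary file and rename it into place, so a check
    # running at the same moment never reads a half-written cache.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle)
    os.replace(tmp_path, cache_path)
    return data


# Demo: ten checks in a row, but the "server" is only queried once.
queries = []


def fake_show_status():
    queries.append(time.time())
    return {"Threads_connected": 170, "max_connections": 200}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "mysql_status.json")
    for _ in range(10):
        status = fetch_cached(path, 60, fake_show_status)

print("checks run: 10, server queries:", len(queries))
```

The important part is that the maximum cache age (60 seconds here) stays well below the check interval (5 minutes), so each round of checks works from fresh data while only the first check in the round touches the database.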

It is quite likely that the Percona Nagios plugins for MySQL do not lend themselves to this type of caching. Because the PalominoDB Nagios plugin can run many different calculations against the same server output, it makes sense to have this type of micro-caching. Otherwise, each additional monitoring check adds more strain on the database.

Edited to add: I just took a look at the Percona monitoring plugins for Nagios, and they check very different things. They are not as customizable as the PalominoDB Nagios plugin: the Percona monitoring plugins are a set of 12 different checks, as opposed to the 1 check that the PalominoDB plugin has. Still, I could see value in caching the output of SHOW ENGINE INNODB STATUS (or whatever it uses) for the pmp-check-mysql-innodb plugin, so that you can run it three times (once with the idle_blocker_duration option, once with the waiter_count option, and once with the max_duration option) and it only runs the underlying command once.

Basically it comes down to this: Baron is correct when he says “Running SHOW STATUS infrequently doesn’t add load to the server,” and I am correct when I say “Running SHOW STATUS frequently adds a lot of load to the server.” The Percona plugins and the PalominoDB plugin are completely different, and there seems to be very little overlap.