{"id":525,"date":"2015-01-14T18:23:48","date_gmt":"2015-01-14T18:23:48","guid":{"rendered":"http:\/\/blog.mozilla.org\/services\/?p=525"},"modified":"2015-01-15T04:30:39","modified_gmt":"2015-01-15T04:30:39","slug":"heka-log-parsing-performance","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/services\/2015\/01\/14\/heka-log-parsing-performance\/","title":{"rendered":"Heka Log Parsing Perfomance"},"content":{"rendered":"<p>I&#8217;ve been experimenting with Heka throughput recently, and have discovered a few log parsing tips and tricks that I thought were worth sharing.<\/p>\n<p>I ran tests on my Lenovo Thinkpad W530 laptop, which has four 2.8GHz cores, showing up as 8 cores to my Linux OS thanks to hyperthreading. From our ops team I got a couple of Nginx log files, each with over 6 million entries. I made three copies of each log file, and put each copy in a separate directory, emulating the common use case of having to parse output from multiple hosts and\/or domains. I configured Heka with a single <a title=\"LogstreamerInput\" href=\"http:\/\/hekad.readthedocs.org\/en\/v0.8.2\/config\/inputs\/index.html#logstreamer-input\">LogstreamerInput<\/a> set up to concurrently load all six files, using our <a title=\"Nginx access log decoder\" href=\"http:\/\/hekad.readthedocs.org\/en\/v0.8.2\/config\/decoders\/index.html#nginx-access-log-decoder\">Nginx access log decoder<\/a>, which ultimately uses Lua&#8217;s <a title=\"LPEG\" href=\"http:\/\/www.inf.puc-rio.br\/~roberto\/lpeg\/lpeg.html\">LPEG<\/a> engine to handle the actual parsing. Not a scientifically rigorous test, but I&#8217;m just trying to get some ballpark throughput numbers.<\/p>\n<p>For my first run I set Go&#8217;s maxprocs value to 1, which makes only one core available for Heka to use. With that setting, I was processing in the neighborhood of 31K events per second (eps); so far so good. Then I cranked maxprocs up to 4. 
I ran the tests again and my throughput jumped all the way up to about 33K eps. Wait&#8230; what!?! I know that sometimes lock contention and other issues can cause <a title=\"GOMAXPROCS performance impact\" href=\"http:\/\/golang.org\/doc\/faq#Why_GOMAXPROCS\">poor performance across multiple cores<\/a>, but we&#8217;re processing six distinct streams here; this should parallelize fairly well. I was seeing almost no improvement. What the heck was going on?<\/p>\n<p>Nothing to do but roll up my sleeves and dig in. I busted out the <a title=\"Profiling Go Programs\" href=\"http:\/\/blog.golang.org\/profiling-go-programs\">Go profiling tools<\/a>, but nothing obvious jumped out at me. So I took a look at the source code, and was temporarily stunned by the force of my palm slamming into my forehead at high velocity. While the LogstreamerInput was correctly spinning up a separate goroutine for each unique log stream, it was only spinning up a <em>single<\/em> decoder to be shared by all of them. That meant that no matter how many cores were available to Heka as a whole, all of the parsing was happening in a single goroutine, which would only be running on a single core.<\/p>\n<p>Luckily this was an easy fix. In about 30 minutes, I had a working Heka that would spin up a separate decoder instance for each log stream; these changes have since landed in the <a title=\"Heka v0.8.3\" href=\"https:\/\/github.com\/mozilla-services\/heka\/releases\/tag\/v0.8.3\">v0.8.3 release<\/a>. Running this with maxprocs of 1 was slightly slower than before, about 30K eps. This wasn&#8217;t surprising, since now the single core was context switching through six different decoder goroutines whereas before there was just one. But cranking maxprocs up to 4 and trying again gave me around 85K eps. Now we&#8217;re talking! Not linear improvement, but still a much better number. 
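For reference, a setup along these lines looks roughly like the following in hekad's TOML configuration. This is an illustrative sketch, not my exact test config: the section names, `log_directory` path, and `file_match` pattern are made up for the example, though the option names follow the hekad v0.8 docs.

```toml
[hekad]
# Equivalent to setting GOMAXPROCS; controls how many cores Heka may use.
maxprocs = 4

[nginx_access_input]
type = "LogstreamerInput"
# One subdirectory per host; each matching file becomes its own log stream,
# and (as of v0.8.3) each stream gets its own decoder instance.
log_directory = "/var/log/hosts"
file_match = '(?P<Host>[^/]+)/access\.log'
differentiator = ["nginx.", "Host"]
decoder = "nginx_access_decoder"

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

    [nginx_access_decoder.config]
    # Must match the log_format directive used by the Nginx servers.
    log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'
```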
I got my best results with maxprocs set to 7, where I was consistently processing approximately 118K log lines per second.<\/p>\n<p>I was actually able to do even better than that. In prior experimentation with log parsing performance, we&#8217;ve realized that parsing timestamps is a very processor-intensive part of the job. We can lighten this load by configuring our Nginx servers to log timestamps using `$msec` instead of `$time_local` or `$time_iso8601` in the `log_format` setting. This means our log files contain Unix timestamps (seconds-since-epoch format) rather than more complicated date strings. The difference is significant. When I tested versions of the log files using the `$msec` timestamp format, my single-core throughput jumped to around 36K eps. Four cores gave me 105K eps, and using 7 cores had me consistently parsing about 143K lines per second. Not too shabby!<\/p>\n<p>It&#8217;s worth noting that there is still a lot of room for improvement. For instance, due to details of the runtime layers, pushing text from Go through C and into Lua (and then back again) involves more memory copies than it probably should. The Lua sandbox that Heka uses is implemented as a <a title=\"Lua sandbox\" href=\"https:\/\/github.com\/mozilla-services\/lua_sandbox\">separate project<\/a>, in pure C. It would be possible for us to get much higher parsing throughput by writing a thinner wrapper around the sandbox using a language that doesn&#8217;t require copies to talk to C, such as C++, Rust, or C itself. This would remove us from the rest of the Heka pipeline, of course, but since a considerable amount of the data processing that we do in Heka is also happening in a sandbox, we&#8217;d still be able to do a lot of crunching; we&#8217;d just have to set up a bespoke pipeline with Lua code instead of using Heka&#8217;s routing infrastructure.<\/p>\n<p>Even so, the full Heka stack is fast enough for most use cases, and is ready to be used today. 
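As an aside, if you want to try the `$msec` change described above, the Nginx side of it is a small tweak to the `log_format` directive. The format name and the rest of the fields here are illustrative, not a prescribed format; whatever you choose just has to match the `log_format` your Heka decoder is configured with:

```nginx
# $msec logs a Unix epoch timestamp, which is much cheaper to parse
# than the date strings produced by $time_local or $time_iso8601.
log_format timed_msec '$remote_addr - $remote_user [$msec] '
                      '"$request" $status $body_bytes_sent';

access_log /var/log/nginx/access.log timed_msec;
```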
I hope you&#8217;ll try it out, and come find us in the #heka channel on irc.mozilla.org, or on the Heka <a title=\"Heka mailing list\" href=\"https:\/\/mail.mozilla.org\/listinfo\/heka\">mailing list<\/a> to ask questions and\/or tell us about your experience.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been experimenting with Heka throughput recently, and have discovered a few log parsing tips and tricks that I thought were worth sharing. I ran tests on my Lenovo Thinkpad W530 laptop, which has four 2.8GHz cores, showing up as &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/services\/2015\/01\/14\/heka-log-parsing-performance\/\">Continue reading<\/a><\/p>\n","protected":false},"author":598,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30947,30944],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/posts\/525"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/users\/598"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/comments?post=525"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/posts\/525\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/media?parent=525"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/categories?post=525"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/services\/wp-json\/wp\/v2\/tags?post=525"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}