Introduction: Using A Riak Cluster for the Mozilla Test Pilot Project

As part of integrating Test Pilot into the Firefox 4.0 beta, we needed a production-worthy back-end for storing experiment results and performing analysis on them. As discussed in the previous blog post, Riak and Cassandra and HBase, oh my!, we decided on Riak as that back-end.

Much of the preliminary work, and a good deal of the initial implementation, involved benchmarking studies to verify the fitness of the solution and to give us a solid understanding of when and how we would need to scale. Mozilla worked with Basho (the stewards of the Riak project) to perform this benchmarking, and this blog post details the results.


The Test Pilot program involves the storage and processing of usability experiments from users who have opted in to the program. Each experiment collects non-personal data, and the user has a chance to review the collected data at the end of each study before submitting it to us for analysis. Typically, an experiment runs for a week from the time the user's Test Pilot client receives the experiment specification and the user decides to opt in to it. In production terms, this means 100 GB to 2 TB of self-reported usability data will be generated in total, depending on the experiment specification. Further, the majority of the data will be submitted in a 48-hour window starting 7 days after the launch of the experiment.

Object sizes can vary wildly depending on the experiment and on the user. Originally, Jono wrote a quick script to parse through previous Test Pilot data to determine a median size. Now that we actually have data in Riak, we can re-run this analysis with a MapReduce query: Experiment Size Summary JSON. It currently gives us the following results on our latest study:

[{"invalidDataPoints":20, "validDataPoints":318620, "numLargeObjects":0, "dataSizeStats":{"count":318620, "min":1440, "max":2986210, "percentile25":8819, "median":25627.5, "percentile75":54351, "percentile99":314983, "mean":44542.48518297723, "sum":14650937989.0, "variance":4785728479.24337, "sdev":69178.9598016866}, "numEventStats":{"count":318620, "min":0, "max":47432, "percentile25":137, "median":468, "percentile75":1022, "percentile99":5554, "mean":806.4797784194408, "sum":256960587, "variance":1482951.5760175542, "sdev":1217.7649921136483}, "errors":{}}]

So our median payload size is about 25 KB, and the largest item was just under 3 MB. That isn't too much of a worry, though, because the third-quartile size is only about double the median at 54 KB, and the 99th percentile is 314 KB.
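For those curious what that query looks like, the real map and reduce phases are in the linked Experiment Size Summary JSON source; the snippet below is only a simplified sketch of the map phase to show the shape of the job. The "events" field name is a stand-in for the real experiment schema, not the actual field we use.

// Map phase sketch: emit the byte size and event count of each stored
// experiment payload. "events" is a placeholder field name; the linked
// Experiment Size Summary query is the authoritative version.
function mapExperimentSize(value, keyData, arg) {
  var raw = value.values[0].data;        // the stored payload as a JSON string
  var record = {invalid: 0, size: raw.length, events: 0};
  try {
    var doc = JSON.parse(raw);
    record.events = (doc.events || []).length;
  } catch (e) {
    record.invalid = 1;                  // count unparseable payloads
  }
  return [record];
}

A companion reduce phase folds those records into the count/min/max/percentile summary shown above; that aggregation is omitted here for brevity.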

We were interested to see how Riak would handle the load, especially given our plan to use pre-commit hooks to ensure the data conformed to the expected format and to reject exceedingly large objects (in excess of 5 MB).
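To make that concrete, here is a minimal sketch of the kind of JavaScript pre-commit hook we had in mind, not our production code, assuming Riak's JavaScript hook convention of returning the object to accept a write or an object with a "fail" key to reject it.

// Sketch of a validating pre-commit hook: reject payloads over 5MB or that
// are not well-formed JSON. Illustrative only; not the hook we deployed.
function validateExperimentUpload(object) {
  var MAX_BYTES = 5 * 1024 * 1024;       // the 5MB cap mentioned above
  var raw = object.values[0].data;

  if (raw.length > MAX_BYTES) {
    return {"fail": "payload exceeds 5MB limit"};
  }
  try {
    JSON.parse(raw);                     // must at least parse as JSON
  } catch (e) {
    return {"fail": "payload is not valid JSON"};
  }
  return object;                         // accept the write unchanged
}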

Most of these benchmarks were run before we went into production; we are now running experiments whose results bear out the benchmarks. In production, Riak has proved very stable and capable of handling our write throughput, which is a pleasing validation of our assessment and of our choice to use it as the Test Pilot back-end.

Physical Systems

The production Riak cluster we are running uses four sixteen-core Intel systems, with 24GB of RAM and dual SATA disk drives.

Software

We are running Ubuntu 10.04 on these boxes. Tests were first run against Riak 0.10 and later Riak 0.11, both using the Bitcask storage engine. For benchmarking, we employed Basho Bench, an open source tool built by the Basho team to create load tests and generate both graphical and tabular outputs. One nice point about Basho Bench is that it is pretty easily adaptable to any general load testing scenario and is not tied tightly to Riak.
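Basho Bench tests are described by a small Erlang-terms config file. The sketch below shows roughly how one of our fixed-object-size write tests would be specified; the concurrency, key space, and IP values are illustrative placeholders rather than our actual settings.

%% Illustrative Basho Bench config for a one-hour, 10K fixed-object-size write test.
{mode,            max}.                          % push as many ops/sec as the cluster allows
{duration,        60}.                           % minutes
{concurrent,      10}.                           % number of worker processes
{driver,          basho_bench_driver_riakc_pb}.  % Riak protocol buffers client driver
{key_generator,   {partitioned_sequential_int, 10000000}}.
{value_generator, {fixed_bin, 10240}}.           % fixed 10K objects (Study One)
%% Study Two swapped in an exponential size distribution, e.g.:
%% {value_generator, {exponential_bin, 1024, 10240}}.
{riakc_pb_ips,    [{10,0,0,1}, {10,0,0,2}, {10,0,0,3}, {10,0,0,4}]}.
{operations,      [{put, 1}]}.                   % write-only workload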

Configuring Riak

When initially configuring Riak, for the most part we tried to stick with the defaults unless documentation or advice from Basho indicated otherwise. One of these defaults was the setting of 1024 vnodes. That means each of the four physical nodes initially hosts 1024/4 = 256 vnodes, leaving plenty of room to expand the number of servers in the cluster.

The only significant configuration tweaking we did during these tests was varying the number of SpiderMonkey VMs we ran for the pre-commit hook benchmarks (see Study Three).
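For reference, both of those settings live in Riak's app.config. Here is a trimmed sketch showing only the relevant entries; the SpiderMonkey pool parameters use the names from the current Riak documentation (the 0.11-era knob may have been named differently), and the values shown are examples of what we varied rather than recommendations.

%% Trimmed app.config sketch: ring size and JavaScript VM pools.
[
 {riak_core, [
   {ring_creation_size, 1024}                   %% number of vnodes in the ring
 ]},
 {riak_kv, [
   {storage_backend, riak_kv_bitcask_backend},  %% Bitcask, as noted above
   {map_js_vm_count,    8},                     %% SpiderMonkey VM pool sizes;
   {reduce_js_vm_count, 6},                     %% these are the kinds of values
   {hook_js_vm_count,   8}                      %% we varied in Study Three
 ]}
].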

The Studies: Three Facets of the Production Environment

Our goal in running these studies was, simply put, no surprises. That meant we needed to run studies that profiled:
1. Latency
2. Stability, especially for long running tests
3. Performance when we introduced variable object sizes
4. Performance when we introduced pre-commit hooks to evaluate incoming data

Since each vnode blocks until a write is complete, and the production payload we expect for a typical Test Pilot experiment has a 10 KB median object size, we also needed to make sure that introducing a distribution of object sizes that included very large objects (up to 5 MB) did not create unacceptable performance bottlenecks.

As the data from the experiments indicates, Riak demonstrated the throughput, stability, and 99th percentile latency we needed to move to production.

We found predictable decreases in throughput as object sizes increased. We also saw a performance increase when we upgraded from Riak 0.10 to Riak 0.11, which was released during our performance testing.

Study One: Baseline for 1 Kbyte and 10 Kbyte objects

In this study, composed of several test runs, we configured Basho Bench to write fixed-size objects (1K and 10K, with later runs at 25K and 50K) to Riak over progressively longer periods.

Test Run 1.a – Riak version 0.10 – 1K objects – 1 hour
Median Throughput: 4,481.3 ops/sec
Median Latency: 3.25 ms
Latency at 99th percentile: 5.36 ms
Latency at 99.9th percentile: 6.80 ms
Test Run 1.b – Riak version 0.10 – 10K objects – 1 hour
Median Throughput: 4,468.7 ops/sec
Median Latency: 3.25 ms
Latency at 99th percentile: 5.51 ms
Latency at 99.9th percentile: 7.02 ms
Test Run 1.c – Riak version 0.10 – 10K objects – 8 hours
Median Throughput: 3,149.9 ops/sec
Median Latency: 4.62 ms
Latency at 99th percentile: 7.29 ms
Latency at 99.9th percentile: 13.37 ms
Test Run 1.d – Riak version 0.10 – 25K objects – 1 hour
Median Throughput: 1,603.7 ops/sec
Median Latency: 9.39 ms
Latency at 99th percentile: 14.88 ms
Latency at 99.9th percentile: 21.88 ms
Test Run 1.e – Riak version 0.11 – 1K objects – 1 hour
Median Throughput: 5,256.2 ops/sec
Median Latency: 2.76 ms
Latency at 99th percentile: 4.42 ms
Latency at 99.9th percentile: 5.69 ms
Test Run 1.f – Riak version 0.11 – 10K object size – 8 hours
Median Throughput: 4,071.4 ops/sec
Median Latency: 3.23 ms
Latency at 99th percentile: 58.45 ms
Latency at 99.9th percentile: 69.62 ms
Test Run 1.g – Riak version 0.11 – 25K object size – 1 hour
Median Throughput: 2,583.0 ops/sec
Median Latency: 5.55 ms
Latency at 99th percentile: 10.65 ms
Latency at 99.9th percentile: 70.95 ms
Test Run 1.h – Riak version 0.11 – 50K object size – 1 hour
Median Throughput: 1,392.1 ops/sec
Median Latency: 9.80 ms
Latency at 99th percentile: 28.21 ms
Latency at 99.9th percentile: 28.58 ms

Study One Findings:

Study One demonstrated the requisite performance (throughput and latency), stability, and predictability for tests with fixed object sizes.

Median latency and 99.9th percentile insert latency for all object sizes met or exceeded our requirements.

Insert performance was consistent across all test periods, and Riak proved stable under load.

The upgrade from Riak v. 0.10 to v. 0.11 brought with it significant performance improvements (on the order of 20-25%, as demonstrated when comparing Test Runs 1.a to 1.e and 1.d to 1.g). The 0.12 release claims to have additional performance improvements.

A quick note on the results in Test Run 1.c, where we saw a decline in throughput starting at approximately the 25,000-second mark. When using Bitcask, keys (along with some metadata) are stored in memory. With the current version of Bitcask, when an application reaches the point where the key space exceeds the total physical memory of the clustered machines, you will either need to add capacity by adding another physical machine (which causes Riak to redistribute the keys across the increased total available memory) or accept the decreased performance caused by swapping.

The Basho team is working on a new version of Bitcask that deals directly with this issue. Again, this condition is not one the Test Pilot project needs to worry about because we are several months away from having billions of keys, but it is good to be aware of the current limitation and behavior.
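As a back-of-the-envelope illustration of where that ceiling sits for a cluster like ours: the per-key overhead figure below is a deliberately round assumption for illustration, not a measured Bitcask number, and the sketch ignores replication.

// Rough sketch: how many keys fit in RAM before the Bitcask keydir starts
// pushing the cluster into swap. All figures here are assumptions.
var nodes       = 4;
var ramPerNode  = 24 * 1024 * 1024 * 1024;  // 24GB per node
var bytesPerKey = 100;                      // assumed keydir entry + key size (placeholder)

var maxKeys = (nodes * ramPerNode) / bytesPerKey;
// maxKeys is roughly 1.03e9 with these assumptions, i.e. on the order of a
// billion keys cluster-wide; the real threshold depends on actual key lengths
// and Bitcask's true per-key overhead.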

Study Two: Benchmarks using a distribution of object sizes (10K, 25K, and 50K median)

Once we established a baseline of behavior with fixed object sizes (Study One), we introduced variable object sizes to better model our production load. We configured tests to use an exponential distribution of object sizes, with median object sizes set at 10K, 25K, and 50K.

Test Run 2.a – Riak v. 0.11 – 10K median object size – 1 hour
Median Throughput: 3,879.6 ops/sec
Median Latency: 3.42 ms
Latency at 99th percentile: 7.71 ms
Latency at 99.9th percentile: 27.06 ms
Test Run 2.b – Riak v. 0.11 – 10K median object size – 4 hours
Median Throughput: 3,991.6 ops/sec
Median Latency: 3.43 ms
Latency at 99th percentile: 7.63 ms
Latency at 99.9th percentile: 11.95 ms
Test Run 2.c – Riak v. 0.11 – 25K median object size – 1 hour
Median Throughput: 2,307.7 ops/sec
Median Latency: 4.87 ms
Latency at 99th percentile: 20.60 ms
Latency at 99.9th percentile: 165.10 ms
Test Run 2.d – Riak v. 0.11 – 50K median object size – 1 hour
Median Throughput: 1,366.4 ops/sec
Median Latency: 7.56 ms
Latency at 99th percentile: 50.36 ms
Latency at 99.9th percentile: 137.39 ms

Study Two Findings:

We can conclude from the results of Study Two that Riak will successfully handle loads resembling our production traffic.

As object sizes increased, operations per second predictably decreased. System stability was uniformly excellent. In fact, the longer-running 10K object test (Test 2.b, 4 hours) showed a significantly lower 99.9th percentile insert latency than the one-hour Test 2.a (11.95 ms vs. 27.06 ms). It is difficult to attribute a reason to this, though as a matter of good practice, longer-running tests do tend to provide more accurate measurements.

In terms of raw throughput, Riak performed well on a small 4-node cluster.

Study Three: Benchmarks using pre-commit hooks

When we tested the MapReduce and pre-commit functionality, we found that larger objects exhausted the memory allotted to the JavaScript VM. This was fixed via a patch that is included in the latest Riak release. In addition to making the stack space configurable, the patch improved the performance of JavaScript MapReduce jobs and their timeout handling.
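For reference, the JavaScript VM memory settings that patch exposed live in the riak_kv section of app.config; the parameter names below follow the Riak documentation, and the values shown are illustrative rather than what we tuned.

%% riak_kv app.config sketch: JavaScript VM memory knobs (illustrative values).
{riak_kv, [
  {js_max_vm_mem,   8},    %% heap available to each SpiderMonkey VM, in MB
  {js_thread_stack, 16}    %% stack size for each VM thread, in MB
]}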

This study included variations in object size (10K, 25K, and 50K) and in the number of SpiderMonkey VMs (8, 16, and 32). We also ran a test using a pre-commit hook written in Erlang (see Test Run 3.f) as a point of comparison. We prefer the JavaScript pre-commit hooks because JavaScript is a language with which we have more familiarity.

Test Run 3.a – Riak v. 0.11 – Pre-commit Hook – 10K object size – 8 VMs – 1 hour
Median Throughput: 467 ops/sec
Median Latency: 2.71 ms
Latency at 99th percentile: 81.5 ms
Latency at 99.9th percentile: 116.1 ms
Test Run 3.b – Riak v. 0.11 – Pre-commit Hook – 10K object size – 16 VMs – 1 hour
Median Throughput: 497.1 ops/sec
Median Latency: 26.77 ms
Latency at 99th percentile: 68.68 ms
Latency at 99.9th percentile: 94.25 ms
Test Run 3.c – Riak v. 0.11 – Pre-commit Hook – 10K object size – 32 VMs – 1 hour
Median Throughput: 512 ops/sec
Median Latency: 26.23 ms
Latency at 99th percentile: 60.74 ms
Latency at 99.9th percentile: 81.63 ms
Test Run 3.d – Riak v. 0.11 – Pre-commit Hook – 50K object size – 16 VMs – 1 hour
Median Throughput: 127.3 ops/sec
Median Latency: 10.32 ms
Latency at 99th percentile: 29.29 ms
Latency at 99.9th percentile: 39.56 ms
Test Run 3.e – Riak v. 0.11 – Pre-commit Hook – 50K object size – 16 VMs – 3 hours
Median Throughput: 120.0 ops/sec
Median Latency: 10.25 ms
Latency at 99th percentile: 37.08 ms
Latency at 99.9th percentile: 49.11 ms
Test Run 3.f – Riak v. 0.11 – Pre-commit Hook – 10K object size – Erlang – 10 minutes
Median Throughput: 2,052.9 ops/sec
Median Latency: 6.99 ms
Latency at 99th percentile: 12.22 ms
Latency at 99.9th percentile: 15.61 ms

Study Three Findings: JavaScript and Erlang pre-commit hooks

This study demonstrated that JavaScript pre-commit hooks for large objects did not offer the throughput we required to use them in production. With JavaScript pre-commit hooks, we saw a 10x decrease in throughput on the cluster and a 5x increase in insert latencies at the 99th and 99.9th percentiles compared to the median-object-size tests in Study Two. Increasing the number of SpiderMonkey VMs per instance did not have an appreciable impact on throughput and, as Tests 3.a, 3.b, and 3.c show, it added latency.

Erlang pre-commit hooks offered a significant advantage (3x), though they do not fit into our operational requirements.

While this test demonstrated that pre-commit hooks were not well suited for our solution, this did not impact our production plans. Instead, we will use post-commit hooks triggering a MapReduce job to remove unwanted payloads.

The Basho team has also scheduled work to enhance the performance of the JavaScript pre-commit hooks; this work is slated for the 0.13 and subsequent releases.

It is worth noting that while total throughput was not acceptable for the Test Pilot production needs, the Riak cluster continued to demonstrate predictable behavior and stability even under the significant insert load and with additional memory allocated to SpiderMonkey VMs.

Conclusions

These benchmark studies show that Riak meets or exceeds all of the write performance requirements that led Mozilla to select it for the Test Pilot project. From an operational perspective, it was nice to see predictable performance and stability under load. Many of the performance graphs paint an unexciting picture: flat lines from left to right. The ones that were not flat lines typically had a very obvious explanation. This is precisely the sort of non-excitement that operations and development teams love to see.

Basho made some great performance improvements from 0.10 to 0.11, and based on what they’ve been talking about for 0.12, I look forward to putting it through the benchmark tests.

The Basho Bench tool was well-suited for this task, and was flexible enough to let us model the scenarios we needed to feel confident in our deployment.

The Basho team assisted us in the setup of our cluster and in the design of these tests. Their support has been excellent, and they are highly attentive to our feature and bug requests. We are currently spending a lot of time and attention on our MapReduce queries, and we are working closely with them to make further enhancements and improvements there. I'm hoping that I might be able to steal a few hours from some JS platform people during our upcoming work week to take a stab at hacking JägerMonkey into Riak. >:)

4 Responses to “Benchmarking Riak for the Mozilla Test Pilot Project”

  1. […] has a 4 machine, 64 core Riak cluster in […]

  2. on 27 Aug 2010 at 1:06 am Dmitry

    Quite interesting results…

    I’d appreciate it if you could reveal your storage architecture: what is the hardware and OS-level configuration?

    Secondly, I am really curious about where you hit a limit. For instance, looking at your 2.b test: it claims 3,992 TPS, which corresponds to about 62.4 per CPU core. An average latency of 3.43 ms tells me that your CPUs are under-utilized (you have not included metrics of hardware demand during the tests), but I can estimate CPU utilization was about 20-25%. You should be able to push your system up to 240-260 TPS per CPU core unless the cluster is bound by other resources. The logical question is: what was the bounding factor during your test, and what were the disk and NIC utilization?

  3. […] still suffers from the same performance characteristics around disk access as MongoDB – once you have to page to disk, operations become slow and […]

  4. […] Since Dynamo is only available inside of Amazon, how are we supposed to work with it ourselves? Riak is a clone of Dynamo that meets our need for a shopping cart. It’s a key/value database; it’s fault tolerant, and it’s fast. […]
