Geeking Out on the Snippet

Testing was conducted in close partnership with the snippet team, in particular Jean Collings, the Online Marketing Manager who wrangles the day-to-day life of the snippet. Jean reviewed the content of this post and provided key insights.

I want to give a more detailed accounting of how we tested and deployed the snippet during Mozilla’s Year-End Fundraising campaign. (For background, read my previous post about early testing, or if you don’t even know what a ‘snippet’ is then start with this “What is the Snippet?” post.)

I met with the snippet team several times before we began testing. We agreed that the goal would be to leverage the snippet for year-end fundraising in a way that would educate, engage and delight our users. In other words, the snippet is a chance to put Mozilla’s non-profit roots in front of millions of people who use Firefox every day. Ideally, some of those folks would feel compelled to give during our campaign.

In the end, the snippet clocked 961,000,000 impressions for users of the EN (English) version of the Firefox browser. Non-EN versions of the browser served another 72,000,000 impressions – primarily in DE (German), FR (French), and PT-BR (Brazilian Portuguese).

The campaign lasted just over five weeks, with about 33,000,000 daily impressions on average. Impressions tended to dip a little on weekends compared to weekdays, and also fell on major holidays (e.g., Christmas Eve and Christmas Day).

In total, we raised more than $845,000 through the snippet from more than 90,000 individual donors during the campaign. That is an increase of 1,121% compared to 2012. That’s a huge leap, and it was the result of intensive donor funnel testing and optimization in close partnership with my colleague Jean Collings, who manages the snippet page.

The snippet donor funnel looks like this:

snippet view   >   click snippet   >   land on a donation form   >   complete transaction
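To make those stages concrete, here is a minimal sketch of the funnel math in Python. All of the counts are invented for illustration; only the funnel stages come from the diagram above:

```python
# A minimal sketch of the funnel math. All counts are made up for
# illustration; only the funnel stages come from the diagram above.
impressions = 33_000_000   # hypothetical daily snippet views
clicks = 99_000            # hypothetical snippet clicks
landings = 95_000          # hypothetical donation-form landings
donations = 1_400          # hypothetical completed transactions

ctr = clicks / impressions               # snippet view -> click
form_conversion = donations / landings   # landing -> completed gift
end_to_end = donations / impressions     # full funnel

print(f"CTR:             {ctr:.3%}")
print(f"Form conversion: {form_conversion:.2%}")
print(f"End-to-end:      {end_to_end:.5%}")
```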

Testing. Why Bother?

Testing mattered because of the snippet’s incredibly high traffic volume. During the hour of heaviest snippet traffic on December 30th, we received over $590 per minute in donations through the snippet (that’s nearly $10 a second!). In that hour alone we raised over $35,000 USD. At that volume, even tiny changes in the user funnel could mean thousands of dollars gained or lost. That’s why we tested more than 40 different variations of the snippet, starting on November 22nd. I knew the last week in December would be the highest-traffic period, and I wanted to run enough tests ahead of time to identify the highest-performing snippet to launch during that final critical week.

Now, the quirks.

In my previous post I outlined the basics of what we tested (icon, text, etc.) and touched on some unique challenges that make testing the snippet less than ideal. I can’t talk in detail about how we optimized the snippet funnel without also explaining how the testing itself was made less rigorous by a number of quirks. The snippet was not built for this kind of rapid, iterative testing and optimization, and that had significant implications for our testing process and campaign results.

Quirk 1: Traffic allocation

The year-end campaign snippet was one of several snippets slotted for November and December. The EOY campaign was allotted 40% of US-EN snippet traffic from November 22 to December 24, with other snippets shown to the remaining 60%. From December 25–31, the EOY fundraising snippet was shown to 100% of that traffic.

Implications: This was the most snippet traffic ever allocated for a Mozilla fundraising campaign. Due to the snippet’s back-end construction, we were able to test a maximum of four variants at once. With four variants, each arm received half the traffic it would in a two-way test, so reaching statistical significance took roughly twice as long.
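For the curious, here is roughly what “reaching statistical significance” means for a CTR comparison, sketched as a standard two-proportion z-test. Our actual analysis lived in a shared spreadsheet; the counts below are invented:

```python
from math import sqrt
from statistics import NormalDist

def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided two-proportion z-test on click-through rates."""
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    p_pool = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Splitting the same traffic four ways instead of two halves each arm's
# sample size, so the standard error shrinks more slowly; that is why
# four-variant tests took roughly twice as long to call.
z, p = ctr_z_test(clicks_a=3_000, imps_a=1_000_000,
                  clicks_b=3_300, imps_b=1_000_000)
print(f"z = {z:.2f}, p = {p:.4f}")   # a 0.30% vs 0.33% CTR comparison
```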

Quirk 2: Turning a snippet “off” or “on” actually took up to 48 hours

Once we had results, swapping snippets in or out based on performance was not as simple as flipping a switch. After being turned “on,” a new snippet took time to “populate” in Firefox browsers – up to 24 hours to reach its full traffic allocation. The same lag applied to turning a losing variant “off”: snippets could still receive some impressions 24 to 48 hours after the switch was flipped.

In more technical terms, every computer running the Firefox browser pings the snippet server once every 24–48 hours, so some browsers pick up (or drop) a given snippet faster than others.
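Here is a toy model of that rollout lag, assuming each browser re-fetches at a random time uniformly within 48 hours of a switch (a simplification of the real periodic ping, but it shows why impressions tail off rather than stop):

```python
import random

# Toy model: each Firefox instance re-fetches its snippet at a random
# time uniformly within 48 hours of the switch. This simplifies the
# real 24-48 hour ping cycle but illustrates the slow tail-off.
random.seed(1)
hours_to_next_fetch = [random.uniform(0, 48) for _ in range(100_000)]

for t in (12, 24, 36, 48):
    stale = sum(1 for h in hours_to_next_fetch if h > t)
    share = stale / len(hours_to_next_fetch)
    print(f"{t:2d}h after the 'off' switch: {share:.0%} still on the old snippet")
```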

You can see below that even 48 hours after being disabled, impressions were still registering for four poorer-performing snippets turned off simultaneously on 11/24:

On 11/26 – 48 hours after being turned off due to poor performance – all four of these snippets still received over 1.9 million impressions.

Implications: The biggest implication of this quirk was that underperforming snippets still received millions of impressions. If the better-performing snippets had been given those impressions instead, we would have raised thousands of dollars more for Mozilla.

Quirk 3: Impressions and CTR data lag

In order to do analysis, we had to collect data from three different sources and combine them manually. Impressions came from the internal Firefox metrics team, click-through rates (CTRs) from Google Analytics, and donation data from a donations platform called Blue State Digital. Basically, Jean and I pasted our respective data into a shared spreadsheet. This practice was made more complex because impressions and CTR data were not available in real time, but donations data were. We also had to pay close attention to timestamps, aligning all data by blocks of time in PST, since two data sources reported in PST and the third in GMT. Test results were delayed 24–48 hours while we waited for impressions and CTR data, which became available at midnight for the previous 24-hour period.
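For illustration, here is roughly what that manual stitching would look like if scripted in pandas rather than a spreadsheet. The file names and column names are hypothetical:

```python
import pandas as pd

# Hypothetical exports: impressions and clicks are stamped in PST,
# donations (Blue State Digital) in GMT.
imps = pd.read_csv("impressions.csv", parse_dates=["timestamp"])
clicks = pd.read_csv("ga_clicks.csv", parse_dates=["timestamp"])
gifts = pd.read_csv("bsd_donations.csv", parse_dates=["timestamp"])

# Shift the GMT donation timestamps onto PST so all three sources line up.
gifts["timestamp"] = (gifts["timestamp"]
                      .dt.tz_localize("GMT")
                      .dt.tz_convert("US/Pacific")
                      .dt.tz_localize(None))

# Roll everything up into one hourly table and compute CTR.
hourly = (imps.set_index("timestamp").resample("1h")["impressions"].sum()
              .to_frame()
              .join(clicks.set_index("timestamp").resample("1h")["clicks"].sum())
              .join(gifts.set_index("timestamp").resample("1h")["amount"].sum()))
hourly["ctr"] = hourly["clicks"] / hourly["impressions"]
```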

Implications: Combined with Quirk 2, all this made the testing cycle tricky to keep straight. It sometimes took 48–72 hours to receive, collect, and analyze data from one set of snippet tests, and poor-performing snippets stayed live until the data told us to take them down. A quicker test cycle would have let us put higher-performing snippets in place faster and increased donations.

Quirk 4: Fatigue

Firefox users got understandably blasé after seeing the same snippet for several days. Click-through rates for a single snippet variant would predictably drop over the course of about a week, which meant donations would also decrease. Snippet CTR fell about 0.05–0.10 percentage points per 24-hour period.
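Here is a toy projection of what that decay implies for how often a snippet needs refreshing. The starting CTR and refresh threshold are assumptions; only the decay range comes from our observations:

```python
# Toy projection of CTR fatigue. Work in hundredths of a percentage
# point to avoid float drift. The starting CTR and refresh threshold
# are assumed; only the decay range comes from our observations.
start = 40        # 0.40% CTR at launch (assumed)
threshold = 20    # refresh once CTR nears 0.20% (assumed)

for decay in (5, 10):   # 0.05-0.10 percentage points lost per day
    ctr, days = start, 0
    while ctr > threshold:
        ctr -= decay
        days += 1
    print(f"-{decay / 100:.2f} pts/day: refresh after ~{days} days")
```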

Implications: The testing regimen helped us stay ahead of CTR fatigue. We changed icons and text with each new set of snippets to test, and that also boosted CTRs. We ended up settling into a testing rhythm that put fresh snippets into constant rotation. For the most part, we kept a truly low CTR at bay. In order to compare snippet results “apples to apples” we primarily focused on data from the first full 24 hours a new test was live.

Here is what this fatigue looks like for most of the snippets we tested – a slow decline:

That doesn’t seem like a very steep decline, until you consider that the volume is enormous (about 1–2 million impressions per hour, depending on the time of day). Even a drop of just 0.10 percentage points in CTR meant losing thousands of dollars in donations.
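A back-of-the-envelope check of that claim, with an assumed landing-page conversion rate and average gift (only the impression volume and the 0.10-point drop come from our data):

```python
# Only the impression volume and the 0.10-point CTR drop come from the
# post; the conversion rate and average gift are assumptions.
impressions_per_hour = 1_500_000   # midpoint of the 1-2M/hour range
ctr_drop = 0.0010                  # 0.10 percentage points
form_conversion = 0.015            # assumed landing-page conversion
avg_gift = 25.0                    # assumed average donation (USD)

lost_clicks_per_hour = impressions_per_hour * ctr_drop
lost_dollars_per_day = lost_clicks_per_hour * form_conversion * avg_gift * 24
print(f"~${lost_dollars_per_day:,.0f} lost per day")   # ~ $13,500 per day
```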

Here is a comparison of the daily average CTR during the year-end campaign (green) with the average snippet CTR (orange). The red line is the overall average CTR for the entire year-end campaign:

Between 12/9 and 12/19, CTR dipped due to a number of factors. First, we were testing a particularly low-performing block of snippet variations in that time (using a silver coin icon instead of red, among other things), and their performance was not staying ahead of CTR fatigue. Second, I was traveling for the holidays, so analysis took longer than at any other time in our testing cycle (I was a one-woman show when it came to analyzing all the data). Third, we realized that in order to stay ahead of CTR fatigue we actually needed more fresh icon designs, but Mozilla staff were all on holiday by this time. We ended up recycling earlier “heart” and “lock” icons later in the month, which helped boost CTR somewhat.

Quirk 5: Simultaneous landing page testing

The donation form on the landing page is an important part of the donor funnel. Though we focused most of our energy on optimizing the snippet, we also worked on improving the conversion rate of the landing page. Here are two examples of landing page tests:

1. Test One: We changed the amount array
   a. Default version (default amount array)
   b. Test version (new amount array)

2. Test Two: We simplified the form by eliminating three page elements that were not required
   a. Default version
   b. Test version (eliminated page title, street2, and PayPal logo)

Findings for Test One were inconclusive: changing the amount array didn’t significantly improve conversion or total revenue through the donation form. (Occasionally you may get a lower conversion but a higher average gift.) Test Two yielded a higher conversion rate, so we implemented the new form for all subsequent snippets. (Landing page optimization could be a whole post in and of itself.)
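That conversion-versus-average-gift trade-off is why we looked at revenue per landing, not conversion alone. A small sketch with invented numbers:

```python
# Conversion rate alone can mislead: a lower conversion with a higher
# average gift can still win, so revenue per landing-page view is the
# fairer yardstick. All numbers here are invented.
def arm_summary(name, landings, gifts, revenue):
    conversion = gifts / landings
    avg_gift = revenue / gifts
    rev_per_landing = revenue / landings
    print(f"{name}: conv {conversion:.2%}, avg gift ${avg_gift:.2f}, "
          f"rev/landing ${rev_per_landing:.3f}")

arm_summary("control", landings=20_000, gifts=300, revenue=7_500.0)
arm_summary("test",    landings=20_000, gifts=270, revenue=8_100.0)
# The test arm converts worse (1.35% vs 1.50%) yet raises more per
# visitor ($0.405 vs $0.375), exactly the trade-off noted above.
```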

Implications: In order to maintain the integrity of A/B landing page tests, we had to keep the first part of the donor funnel constant. We sent donor traffic through two identical snippets that led to two separate landing pages – one control, and one test. That kept results apples-to-apples, since different snippets could influence donor behavior on the landing page. This meant we had to further divide our allotted snippet variant traffic across landing page tests. We couldn’t always test snippets at the same time we were testing landing pages.

We Learned a Lot

Though we faced some challenges and had to navigate the snippet’s quirks, we did the best testing possible, improvising to keep the tests meaningful even when they were not perfect. Of course I would have preferred to base revenue-critical decisions on far more rigorous data, but it was the best we could do given our resources and the quirks above – and it still increased revenue dramatically compared to 2012. Could we improve? Definitely, given all the knowledge we now have.

Before we launched this testing regimen on November 22, 2013, the snippet had never been the subject of such an intense series of ongoing, iterative tests. We learned a lot as we went. For example, we now have definitive data that an animated icon results in higher CTRs:

We found that making an entire sentence a hotlink, rather than a two-word text link, seems to improve CTR (though we should do more testing of this):

Many of our findings confirmed long-held “hunches,” but testing provided certainty. We also now have a strong set of metrics for benchmarking future snippet performance – especially for fundraising campaigns.

This is an example of a later snippet that was among the higher performers (though you can’t see it in this static image, the lock was animated):

[Image: winning snippet]

The snippet below yielded over $109,000 in the final 72 hours of the campaign (December 29 – 31). Its peak CTR during the time it was live was 0.39% (its low at the stroke of midnight PST December 31st was 0.20%):

[Image: blue box snippet]

You can see almost all of the variations we tested, including the EN, FR, DE, and PT-BR locales, here.

Jean and I considered what would make this testing process better. Here is what our wish list would look like:

  1. Access real-time impressions and click data in addition to donor data.
  2. Match timestamps across data sources automagically.
  3. The ability to turn snippets “off” and “on” in real-time (or at least cycle in or out faster).
  4. A CTR and conversion dashboard that pulls in data from the disparate sources (to replace manual copy/paste). Jean says several Firefox teams are using Tableau, so we could possibly test using that platform.
  5. Conduct regular testing over the course of the year, instead of beginning in late November. This is something Wikipedia does. (They raised $18.7 million USD in December 2013 during their own year-end campaign.)
  6. The ability to do multivariate testing in addition to simpler A/B testing, which would mean fewer, faster tests. (What’s the difference? See the sketch after this list.)
  7. The ability to turn off the fundraising snippet for supporters who donate so they don’t continue to see it for the duration of the campaign.
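On item 6, here is a minimal sketch of what a multivariate (factorial) design would buy us; the icons and copy below are illustrative, not our actual variants:

```python
# A factorial design tests icon and text together instead of one
# element per A/B round. Icons and copy here are illustrative.
from itertools import product

icons = ["red coin", "heart", "lock"]
texts = [
    "Mozilla is a non-profit. Donate today.",
    "Firefox is made by a non-profit. Chip in.",
]

cells = list(product(icons, texts))   # 3 icons x 2 texts = 6 cells
for icon, text in cells:
    print(f"[{icon:8}] {text}")
# One factorial test covers all six combinations at once and can surface
# icon/text interactions that sequential A/B rounds would miss.
```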

Some of these ideas would definitely be easier to implement than others, but none of them are simple. The snippet is a wily and complex little piece of web real estate.

Even if nothing on the wishlist above comes to pass, there are still some important ways we could make testing stronger, though still imperfect, using our existing configurations. If we implemented these three things, we would improve engagement through the snippet during year-end fundraising and raise more revenue:

Test throughout the year. Wikipedia shows test versions of its fundraising banner to a small percentage of users on a regular basis throughout the year. Doing the same would allow us to finish testing by November 22nd and use the precious snippet traffic available during those critical five weeks to show only our highest-performing snippets.

(Related) Build a large collection of tested and proven year-end snippets before the campaign launches. A library of variations at the ready during the year-end peak weeks would allow us to stay ahead of CTR fatigue – likely by cycling through our highest-performing snippets every 4–7 days during the final month of the year, when donation volume is at its maximum. It also means having more icons on hand, so we don’t have to recycle.

Increase staff resources for analysis. Even one additional person assisting with analytics during the critical testing and execution phases would improve rigor and fill gaps (such as when I’m traveling), and an analytics pro on the small snippet team would definitely improve returns.

The snippet wrangling was fast-paced and exciting for this fundraising geek, but it also showed me the potential of the snippet for engaging Firefox fans in Mozilla’s mission. It was also a real joy working closely with Jean and the rest of the snippet team.

Bring on 2014.