Machine learning made easy

A/B testing with Bayesian bandits in Google Analytics

A/B testing is a way to optimize a web page. Half of visitors see one version, the other half another, so you can tell which version is more conducive to your goal - for example selling something. Since June 2013 A/B testing can be conveniently done with Google Analytics. Here’s how.

This article is not quite about machine learning. If you’re not interested in testing, scroll down to the Bayesian bandits section.

Google Content Experiments

We remember Google Website Optimizer from a few years ago. It wasn’t exactly user friendly or slick, but it felt solid and did the job. Unfortunately, at some point Google pulled the plug, leaving Genetify as the sole free (and open source) tool for multivariate testing. Multivariate means testing a few elements on a page simultaneously.

At that time they launched Content Experiments in Google Analytics, but it was a giant step backward. Content Experiments were very primitive and only allowed rudimentary A/B split testing. That is still the case if you only use the “point and click” interface in Analytics, but there’s been an interesting development: in June 2013, Google announced the Content Experiments API.

Essentially, you can now have a full-fledged multivariate testing tool akin to Website Optimizer. The catch is that you need to implement the content variations yourself in JavaScript. It’s pretty straightforward and well described in the documentation, which calls it a browser-only implementation (if you want to use a server-side mechanism, there’s nothing to stop you).

Why test?

These days, everybody knows about the importance of getting traffic: if you have a web page, the point is that people visit it. Hence search engine optimization, AdWords and other forms of advertising.

But when those visitors come - is your site built in such a way that they stay and do what you would like them to do?

If you run a shop, you want your visitors to place an order. That’s very easy to measure and can be done, for example, with ecommerce tracking in Google Analytics. In the case of a blog, you’d want them to read, return and read some more. This is not as clear-cut a goal, but you can still measure average time on page, page views, returning visits and so on.

The setup

First you set up the experiment using the normal interface in GA, only skipping the part where you put variation URLs: just enter some subpages there. Also, don’t validate in step five, because you’ll be using your own custom code. Just launch the experiment without validation.

Now you put some JavaScript in your page source:

  1. load the content experiments script from Google, providing it the experiment ID from the GA visual interface:

    <script src="//"></script>

  2. get the variation number it selects: var variation = cxApi.chooseVariation();
  3. put your normal Analytics snippet after this
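
If Google's script fails to load (an ad blocker, a network error), calling cxApi directly would throw and could break the rest of the page. As a minimal sketch, the line from step 2 could be wrapped in a small guard that falls back to the original. Note that pickVariation is a hypothetical helper written for this article, not part of the Content Experiments API; only chooseVariation() comes from Google's script, and treating negative or non-integer results as "show the original" is an assumption of this sketch.

```javascript
// Hypothetical helper (not part of Google's API): returns the variation
// index chosen by the Content Experiments script, or 0 (the original)
// if the script did not load or returned something unexpected.
function pickVariation(api) {
    if (!api || typeof api.chooseVariation !== 'function') {
        return 0;
    }
    var chosen = api.chooseVariation();
    // Assumption: anything that is not a non-negative integer means
    // "no usable variation", so we show the original.
    return (Number.isInteger(chosen) && chosen >= 0) ? chosen : 0;
}

// cxApi only exists if Google's script loaded, so check before using it.
var variation = pickVariation(typeof cxApi !== 'undefined' ? cxApi : null);
```

With this in place, the rest of the page can rely on variation always being a valid index.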

Now the only thing left is actually showing the selected variation.

Showing the selected variation

Suppose that you want to vary the contents of a DIV. You might do this by including both the original and the alternative version in the HTML, but only displaying the original by default:

<div id="div_original">Please consider purchasing our product.</div>
<div id="div_variation_1" style="display: none;">What are you waiting for? Buy NOW!!!</div>

Then if GA decides it wants to show the variation to a given user, you switch the divs: the original becomes hidden and the variation shows. Here’s an example JavaScript snippet to put at the end of HTML BODY:

<script type="text/javascript">
    // original is 0
    if ( variation == 1 ) {
        document.getElementById( 'div_original' ).style.display = 'none';
        document.getElementById( 'div_variation_1' ).style.display = 'block';
    }
</script>

Not exactly rocket science, is it?
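
With more than one alternative, a chain of if statements gets tedious. Here is one possible way to generalize the switch, as a sketch only: displayPlan and showVariation are hypothetical helper names, and the list of element ids is whatever you used in your own HTML, with the original at index 0.

```javascript
// Hypothetical helper: given the ids of all versions (index 0 is the
// original) and the chosen index, compute the CSS display value per id.
function displayPlan(ids, variation) {
    // Fall back to the original if the index is missing or out of range.
    var valid = Number.isInteger(variation) && variation >= 0 && variation < ids.length;
    var chosen = valid ? variation : 0;
    var plan = {};
    ids.forEach(function (id, index) {
        plan[id] = (index === chosen) ? 'block' : 'none';
    });
    return plan;
}

// Apply the plan to the page. Missing elements are skipped, and when
// there is no DOM at all (e.g. in a test runner) the plan is just returned.
function showVariation(ids, variation) {
    var plan = displayPlan(ids, variation);
    if (typeof document === 'undefined') {
        return plan;
    }
    Object.keys(plan).forEach(function (id) {
        var el = document.getElementById(id);
        if (el) {
            el.style.display = plan[id];
        }
    });
    return plan;
}
```

For the two DIVs above, the call would be showVariation(['div_original', 'div_variation_1'], variation), placed at the end of the BODY like the snippet it replaces.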

Bayesian bandits

When a user first visits the page, Analytics decides which variation to show. The selected variation is stored in a cookie, so when the user opens another page or comes back tomorrow, they are shown the same version.

How does Google decide which variation to select? Either randomly or by an algorithm called a multi-armed bandit. The name comes from one-armed bandits, the slot machines found in casinos. The difference is that a multi-armed bandit has many arms and you need to decide which one to pull. Different arms are assumed to produce success with different probabilities, so it’s a matter of finding the best one.

If you don’t get the idea now, you will get it in a moment. Here’s a link to the page with an interactive visualization. Go there and click Start Simulation.


As you can see, the algorithm is remarkably good at finding the right arm, provided that the probabilities don’t change during the experiment. In web testing they are rather unlikely to change, so bandits are a good solution for selecting variations to show. This is especially true when you have many variations and expect them to differ substantially in effect: a bandit will find the best ones quickly and concentrate on them.
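
To make the idea concrete, here is a toy simulation of one common Bayesian bandit strategy, Thompson sampling with Beta posteriors. It is a sketch for illustration and not necessarily what Google runs. The conversion rates, the seed and all function names are made up for this example. Each arm keeps a count of wins and losses; on every round we draw a plausible success rate for each arm from Beta(wins + 1, losses + 1) and pull the arm with the highest draw.

```javascript
// Small seeded pseudo-random generator (mulberry32) so runs are repeatable.
function makeRng(seed) {
    var state = seed >>> 0;
    return function () {
        state = (state + 0x6D2B79F5) >>> 0;
        var t = state;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Gamma(shape, 1) for a positive integer shape: a sum of exponentials.
function sampleGammaInt(shape, rng) {
    var sum = 0;
    for (var i = 0; i < shape; i++) {
        sum -= Math.log(1 - rng()); // 1 - rng() is in (0, 1], so the log is finite
    }
    return sum;
}

// Beta(a, b) for positive integers a and b, built from two Gamma draws.
function sampleBeta(a, b, rng) {
    var x = sampleGammaInt(a, rng);
    var y = sampleGammaInt(b, rng);
    return x / (x + y);
}

// Simulate the bandit; returns how many times each arm was pulled.
function runBandit(trueRates, rounds, rng) {
    var wins = trueRates.map(function () { return 0; });
    var losses = trueRates.map(function () { return 0; });
    var pulls = trueRates.map(function () { return 0; });
    for (var r = 0; r < rounds; r++) {
        var best = 0;
        var bestDraw = -1;
        for (var arm = 0; arm < trueRates.length; arm++) {
            var draw = sampleBeta(wins[arm] + 1, losses[arm] + 1, rng);
            if (draw > bestDraw) {
                bestDraw = draw;
                best = arm;
            }
        }
        pulls[best]++;
        // The true rates are hidden from the algorithm; they only decide the outcome.
        if (rng() < trueRates[best]) {
            wins[best]++;
        } else {
            losses[best]++;
        }
    }
    return pulls;
}

// Made-up rates for three variations; the last one is the best.
var pulls = runBandit([0.1, 0.3, 0.7], 2000, makeRng(42));
console.log(pulls);
```

The printed counts should be heavily skewed toward the last arm: early on all arms get tried, but as evidence accumulates the weaker ones are sampled less and less often.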

If you have two or three similarly performing variations, you might want to split the visitors evenly: it’s easier to compare the resulting numbers this way.

Note that Google uses bandits to decide which ads to show, as the underlying problem is very similar. Serving ads is probably their most important business, so they’re presumably rather good at it.

Metrics to optimize

How do you measure success, then? In a nutshell, however you want. You can either use one of the built-in metrics (including page views, bounce rate and time on page) or a custom goal defined in Analytics. This means that you can optimize some basic metrics with the click of a button or, with some effort, pretty much whatever you want.