Multivariate Testing: Promises and Pitfalls for High-Traffic Websites


Multivariate testing offers high-traffic websites the ability to find the right combination of features and creative ideas to maximize conversion rates. However, it is not sufficient to simply throw a bunch of ideas into a pot and start testing. This article answers the question, What is a multivariate test?, explains the advantages and pitfalls of multivariate testing, and offers some new ideas for the future.

If you run a relatively high-traffic site, consider this question: Will I profit from running multivariate tests?

Before we dive into the question, let’s define our terms. I’ll then talk about the dangers of running a multivariate test (MVT) and when you should consider using one.

What Is Multivariate Testing?

Multivariate testing is a technique for testing a hypothesis in which multiple variables are modified.

Multivariate testing is distinct from A/B testing in that it changes more than one element at the same time and measures how each combination performs. Instead of measuring A against B, you are measuring A, B, C, D and E all at once.

Whereas A/B testing is typically used to measure the effect of more substantial changes, multivariate testing is often used to measure the incremental effect of numerous changes at once.

This process can be further subdivided in a number of ways, which we’ll discuss in the next section.

Multivariate, Multi-variant or Multi-variable

For this article, we are focusing on a specific way of testing in which elements are changed on a webpage. Before we dive into our discussion of multivariate testing, we should identify what we are talking about and what we are not talking about.

One of the most frequently tested items is the landing page headline. Getting the headline right can significantly increase conversion rates for your landing pages. When testing a headline, we often come up with several variants of the wording and test them individually to see which generates the best results.

A multi-variant test with multiple variants of one variable or element.

This is a multi-variant test. It changes one thing–one variable–but provides a number of different variants of that element.

Now suppose we thought we could improve one of our landing pages by changing the “hero image” as well as the headline. We would test our original version against a new page that changed both the image and the headline.

An example of a multi-variable test. Here we are testing the control against a variation with two changes, or two variables.

This is a multi-variable test. The image is one variable and the headline is a second variable. Technically, this is an A/B test with two variables changing. If Variation (B) generated more leads, we wouldn’t know whether the image or the headline was the bigger contributor to the increase in conversions.

To thoroughly test all combinations, we would want to produce multiple variations, each with a different combination of variants.

Two variables with two variants each yield four page variations in this multivariate testing example.

In the image above, we have four variations of the page, based on two variables (image and headline), each having two variants. Two image variants times two headline variants equals four variations.

Confused yet?

A multivariate test, then, is a test that tries multiple variants of multiple variables found on a page or website, in combination.

To expand on our example, we might want to find the right hero image and headline on our landing page. Here’s the control:

The Control in our multivariate test is the current page design.

We will propose two additional variants of the hero image, for a total of three variants including the control, and two additional variants of the headline, again three including the control.

Here are the three images:

We want to vary the hero image on our page. This variable in our test has three variants.

Here are three headlines, including the existing one.

  1. Debt Relief that Works
  2. Free Yourself from the Burden of Debt
  3. Get Relief from Debt

A true multivariate test will test all combinations. Given two variables with three variants each, we would expect nine possible combinations: three images x three headlines.
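To make the nine combinations concrete, here is a minimal Python sketch that enumerates them. The headlines come from the list above; the image names are hypothetical placeholders, since the article only shows the images.

```python
from itertools import product

# Image names are hypothetical placeholders; the headlines are from the article.
images = ["image_control", "image_b", "image_c"]
headlines = [
    "Debt Relief that Works",
    "Free Yourself from the Burden of Debt",
    "Get Relief from Debt",
]

# A full-factorial test pairs every image with every headline.
variations = list(product(images, headlines))

for number, (image, headline) in enumerate(variations, start=1):
    print(f"Variation {number}: {image} + {headline!r}")
```

Each tuple in `variations` is one page that visitors could be shown, so `len(variations)` is the number of pages that must share your traffic.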

Here’s another example that will help you understand how variables, variants and variations relate. An ecommerce company believes that visitors are not completing their checkout process for any of three reasons:

  1. The return policy is not visible
  2. They are required to register for an account
  3. They don’t have security trust symbols on the pages

While these all seem like reasonable things to place in a shopping cart, sometimes they can work against you. Providing this kind of information may make a page cluttered and increase abandonment of the checkout process.

The only way to know is to test.

How many variables do we have here? We have three: return policy, registration and security symbols.

How many variants do we have? We have two of each variable, one variant in which the item is shown and one variant in which it is not shown.

This is 2 x 2 x 2, or eight combinations. If we had three different security trust symbols to choose from, we would have four variants, three choices and none. That is 2 x 2 x 4, or sixteen combinations.
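The arithmetic above is just the product of the variant counts. A small sketch, using hypothetical variable names for the checkout example:

```python
from math import prod

def count_variations(variants_per_variable):
    """Number of page variations in a full-factorial test."""
    return prod(variants_per_variable.values())

# Each checkout element is either shown or hidden: two variants apiece.
checkout_test = {"return_policy": 2, "registration": 2, "trust_symbols": 2}

# Three candidate trust symbols plus "none" makes four variants.
checkout_test_three_symbols = {**checkout_test, "trust_symbols": 4}

print(count_variations(checkout_test))                # 2 x 2 x 2
print(count_variations(checkout_test_three_symbols))  # 2 x 2 x 4
```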

We’ll continue to use this example as we explore multivariate testing.

Why Multivariate Testing Isn’t Valuable In Most Scenarios

A multivariate test seeks to test every possible combination of variants for a website given one or more variables. 

If we ran an MVT for our ecommerce checkout example above, it would look something like this:

Variations multiply with multivariate tests requiring more traffic and conversions.

There are many reasons that multivariate testing is often the wrong choice for a given business, but today, I’m going to focus on five. These are the five reasons multivariate tests (MVTs) are not worth doing compared to A/B/n tests:

  1. A lack of time or traffic
  2. Crazy (and crappy) combinations
  3. Burning up precious resources
  4. Missing out on the learning process
  5. Failing to use MVT as a part of a system

Let’s take a closer look at each reason.

1. Multivariate Tests Take a Long Time or a Whole Lot of Traffic

Traffic to each variation is a small percentage of the overall traffic. This means that it takes longer to run an MVT. Lower traffic means it takes longer to reach statistical significance, and we can’t believe the data until we reach this magical place.

Statistical significance is the point at which we are confident that the results reported in a test will be seen in the future: that winning variations will deliver more conversions and losing variations will deliver fewer conversions over time. Read 2 Questions That Will Make You A Statistically Significant Marketer or hear the audio.

Furthermore, how quickly you reach statistical significance depends on the number of successful transactions you process, not just the number of visits.

For example, MXToolbox offers free tools for IT people who are managing email servers, DNS servers and more. They also offer paid plans with more advanced features. MXToolbox gets millions of visitors every month, and many of them purchase the paid plans. Even with millions of visits, they don’t have enough transactions to justify multivariate testing.

It’s not just about traffic.

This is why MVTs can be done only on sites with a great deal of traffic and transactions. If not, the tests take a long time to run.
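For a rough feel for the numbers, the sketch below estimates how long a test might run. It assumes the textbook normal-approximation formula for comparing two conversion rates, an even traffic split, and no correction for testing many variations at once. The function names and the example figures (a 2% baseline rate, a 10% relative lift, 5,000 visitors a day) are illustrative assumptions, not data from any real site.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variation to detect the lift.

    Textbook normal-approximation formula for a two-sided comparison
    of two proportions; treat the result as a ballpark figure only.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

def days_to_finish(daily_visitors, variations, baseline_rate, relative_lift):
    """Days needed when traffic is split evenly across all variations."""
    needed = visitors_per_variation(baseline_rate, relative_lift) * variations
    return ceil(needed / daily_visitors)

# Hypothetical site: 5,000 visitors a day, 2% baseline conversion rate.
for variations in (2, 6, 16):
    print(variations, "variations:", days_to_finish(5000, variations, 0.02, 0.10), "days")
```

Two things stand out under these assumptions. The run time grows in proportion to the number of variations, and a lower baseline conversion rate pushes the required visitors up sharply, which is why a site can have plenty of visits and still too few transactions.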

2. Variations Multiply Like Rabbits

As we saw, just three variables with two variants each resulted in eight variations, and adding two more security trust symbols to the mix brought this to sixteen combinations. With an even split, traffic to each variation would be reduced to just 6.25% of the total.

Multivariate testing tools like VWO and Optimizely offer options to test a sample of combinations (called Partial, or Fractional, Factorial testing) instead of testing them all, which is called Full Factorial testing. We won’t dive into the mathematics of Full Factorial and Partial Factorial tests. It gets a little messy. It’s sufficient to know that partial factorial (fractional factorial) testing may introduce inaccuracies that foil your tests.
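Without going into the mathematics, a tiny sketch can show the difference. It builds the full factorial for the three on/off checkout variables and then one textbook half-fraction of it. This is an illustration of the general idea only; it is not a description of how VWO, Optimizely or any other tool picks its sample.

```python
from itertools import product

variables = ["return_policy", "registration", "trust_symbols"]

# Full factorial: every on/off combination of the three variables (2^3 = 8 runs).
full_factorial = list(product([0, 1], repeat=len(variables)))

# One textbook half-fraction: keep only the runs with an even number of
# elements switched on. Each variable is still "on" in half of the runs.
half_fraction = [run for run in full_factorial if sum(run) % 2 == 0]

for run in half_fraction:
    print(dict(zip(variables, run)))
```

The half-fraction needs only four variations instead of eight, but there is a price. In this design, the effect of each variable cannot be separated from the interaction between the other two, which is one source of the inaccuracies mentioned above.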

What’s important is that more variations mean larger errors… because statistics.

Every time you add another variation to an AB test, you increase the margin of error for the test slightly. As a rule, Conversion Sciences allows no more than six variations for any AB test because the margin of error becomes a problem.

In an AB test with two variations, we may be able to reach statistical significance in two weeks, and bank a 10% increase in conversions. However, in a test with six variations, we may have to run for four weeks before we can believe that the 10% lift is real. The margin of error is larger with six variations, so the test requires more time to reach statistical significance.

Now think about a multivariate test with dozens of variations. Larger and larger margins of error mean the need for even more traffic and some special calculations to ensure we can believe our results aren’t just random.
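One common way to formalize this worry is the multiple-comparisons problem: the more challengers you compare against the control, the more likely it is that at least one looks like a winner by chance. The sketch below assumes each comparison is independent and uses a 5% significance threshold. Real tests share a control, so treat this as an approximation. The Bonferroni correction shown is one standard, conservative example of the "special calculations"; it is not necessarily what any particular testing tool applies.

```python
def chance_of_false_winner(challengers, alpha=0.05):
    """Probability that at least one challenger with NO real effect
    looks significant, assuming independent comparisons."""
    return 1 - (1 - alpha) ** challengers

def bonferroni_alpha(challengers, alpha=0.05):
    """Stricter per-comparison threshold that keeps the overall
    false-positive rate at or below alpha."""
    return alpha / challengers

for challengers in (1, 5, 15):
    risk = chance_of_false_winner(challengers)
    print(f"{challengers} challengers: {risk:.0%} chance of a false winner, "
          f"corrected threshold {bonferroni_alpha(challengers):.4f}")
```

Under these assumptions, one challenger carries a 5% risk of a false winner, five challengers about 23%, and fifteen challengers about 54%. Tightening the threshold restores confidence, but a tighter threshold needs more traffic to reach.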

Ultimately, most of these variations aren’t worth testing.

All eight variations in our example make sense together. As you add variations, however, you can end up with some crazy combinations.

Picture this:

It’s pouring down rain. You are camping with your son.

While huddled in your tent, you fire up your phone’s browser to find a place to stay. While flipping through your search results on Google, your son proclaims over your shoulder, “That one has a buffet! Let’s go there, Dad!”

Ugh.

The last time he ate at an all-you-can-eat buffet, he was stuck in the restroom for an hour. Not a pretty picture.

Then again, neither is staying out in the wretched weather. So you click to check out the site.

Something is off.

The website’s headline says, “All you can eat buffet.” But nothing else seems to match. The main picture is two smiling people at the front desk, ready to check you in.

As you scroll to the bottom, the button reads “Book Your Massage Today”.

Is this some kind of joke?

As strange as this scenario sounds, one problem with MVTs is that you will get combinations like this example that simply don’t make sense.

This leaves you with two possibilities:

  1. Start losing your customers to variations you should not even test (not recommended).
  2. Spend some of your time making sure each variation makes sense together.

The second option will take more time and restrict your creativity. But even worse, now you need more traffic in order for your test to reach statistical significance.

With an A/B/n test, you pick and choose which variations to include and which to exclude.
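To see how quickly nonsense combinations pile up, here is a small sketch based on the hotel story. Only the buffet headline, the front desk image and the massage button come from the story above; every other variant, and the idea of tagging each one with a theme, is a hypothetical device for illustration.

```python
from itertools import product

# Each variant is tagged with the theme it belongs to (hypothetical tagging).
headlines = [("All you can eat buffet", "dining"),
             ("Relax in our spa", "spa"),
             ("Stay the night", "rooms")]
images = [("front_desk.jpg", "rooms"),
          ("buffet.jpg", "dining"),
          ("massage.jpg", "spa")]
buttons = [("Book Your Massage Today", "spa"),
           ("Reserve a Table", "dining"),
           ("Book a Room", "rooms")]

all_combinations = list(product(headlines, images, buttons))

# Keep only the pages whose headline, image and button share one theme.
coherent = [combo for combo in all_combinations
            if len({theme for _, theme in combo}) == 1]

print(len(all_combinations), "possible pages,", len(coherent), "that make sense")
```

In this made-up example, only three of the twenty-seven possible pages tell a consistent story. Those three are the ones you would hand-pick for an A/B/n test.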

Some may argue it can be time-consuming to create each A/B/n variation while a multivariate test is an easy way to test all variations at once.

Think of a multivariate test as a system that automatically creates all possible combinations to help you find the best outcome. So on the surface, it sounds appealing.

But as you dig into what’s really going on, you may think twice before using an MVT.

3. Losing Variations are Expensive

Optimization testing can be fun. The chance of a breakthrough discovery that could make you thousands of dollars is quite appealing. Unfortunately, variations that underperform the Control reduce the number of completed transactions, and fewer transactions mean less revenue.

Every test, AB or multivariate, has a built-in cost.

Ideally, we would let losing variations run their course. Statistically, there is a chance they will turn around and be a big winner when we reach statistical significance. At Conversion Sciences, we monitor tests to see if any variations turn south. If a losing variation is costing us too many conversions, we’ll stop it before it reaches statistical significance. This is how we control the cost of testing.

This has two advantages.

  1. We can control the “cost” of an AB test.
  2. We can direct more traffic to the other variations, meaning the test will take less time to reach significance.

When tests run faster, we can test more frequently.

On the other hand, multivariate tests run through all variations, or a large sample of variations. Losers run to statistical significance and this can be very expensive.

Lars Lofgren, former Director of Growth at KISSmetrics, mentioned that if a test drops below a 10% lift, you should kill it. Here’s why:

What would you rather have?

  • A confirmed 5% winner that took 6 months to reach
  • A 20% winner after cycling through 6-12 tests in that same 6 month period

Forget that 5% win, give me the 20%!

So the longer we let a test run, the higher our opportunity costs start to stack up. If we wait too long, we’re forgoing serious wins that we could have found by launching other tests.

If a test drops below a 10% lift, it’s now too small to matter. Kill it. Shut it down and move on to your next test.

Keeping track of all the MVT variations isn’t easy, and it is time-consuming. But time spent on sub-par tests is not the only resource you lose.

4. It’s Harder to Learn from Multivariate Tests

Optimization works best when you learn why your customers behave the way that they do. Perhaps with an MVT you may find the best performing combination, but what have you learned?

When you run your tests all at one time, you miss out on understanding your audience.

Let’s take the example from the beginning of this article. Suppose our multivariate test reported that this was the winning combination:

If this combination wins, can we know why?

What can we deduce from this? Which element was most important to our visitors? The return policy? Removing the registration? Adding trust symbols?

And why does it matter?

For starters, it makes it easier to come up with good test hypotheses later on. If we knew that adding trust symbols was the biggest influence, we might decide to add even more trust symbols to the page. Unfortunately, we don’t know.

When you learn something from an experiment, you can apply that concept to other elements of your website. If we knew that the return policy was a major factor, we might try adding the return policy on all pages. We might even test adding the return policy to our promotional emails.

Testing is not just about finding more revenue. It is about understanding your visitors. This is a problem for multivariate tests.

5. Seeing What Sticks Is Not An Effective Testing System

Multivariate tests are seductive. They can tempt you into testing lots of things, just because you can. This isn’t really testing. It’s fishing. Throwing a bunch of ideas into a multivariate test means you’re testing a lot of unnecessary hypotheses.

Testing follows the Scientific Method:

  1. Research the problem.
  2. Develop hypotheses.
  3. Select the most likely hypotheses.
  4. Design experiments to test your hypotheses.
  5. Run the experiment in a controlled environment.
  6. Evaluate your results.
  7. Develop new hypotheses based on your learnings.

The danger of a multivariate test is that you skip steps 3, 4 and 7, that you:

  1. Research the problem.
  2. Develop hypotheses.
  3. Throw them into the MVT blender.
  4. See what happens.

Andrew Anderson said it well,

The question is never what can you do, but what SHOULD you do.

Just because I can test a massive amount of permutations does not mean that I am being efficient or getting the return on my efforts that I should. We can’t just ignore the context of the output to make you feel better about your results.

You will get a result no matter what you do, the trick is constantly getting better results for fewer resources.

When used with the scientific method, an A/B/n test can give you the direction you need to continually optimize your website.

Machine Learning and Multivariate Testing

Multivariate testing is now getting a hand from artificial intelligence. For decades, a kind of program called a neural network has allowed computers to learn as they collect data, in some cases making more accurate decisions than humans. Until recently, these neural networks were practical only for very specific kinds of problems.

Now, Sentient Ascend, a product of software company Sentient Technologies, has brought a kind of neural network into the world of multivariate testing. It’s called an evolutionary neural network or a genetic neural network. This approach uses machine learning to sort through possible variations, selecting what to test so that we don’t have to test all combinations.

These evolutionary algorithms follow branches of patterns through the fabric of possible variations, learning which are most likely to lead to the highest converting combination. Poor performing branches are pruned in favor of more likely winners. Over time, the highest performer emerges and can be captured as the new control.

These algorithms also introduce mutations. Variants that were pruned away earlier are reintroduced into the combinations to see if they might be successful in better-performing combinations.
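The pruning and mutation described above can be illustrated with a plain genetic algorithm. To be clear, this is a toy sketch and not Sentient Ascend's actual method: it contains no neural network, the variable names are hypothetical, and the "visitors" are simulated with made-up conversion rates rather than real data.

```python
import random

VARIABLES = {
    "return_policy": ["hidden", "shown"],
    "registration": ["required", "optional"],
    "trust_symbol": ["none", "symbol_a", "symbol_b", "symbol_c"],
}

def simulated_conversion_rate(combo):
    """Stand-in for live traffic. These rates are invented for the demo."""
    rate = 0.020
    if combo["return_policy"] == "shown":
        rate += 0.004
    if combo["registration"] == "optional":
        rate += 0.006
    if combo["trust_symbol"] == "symbol_b":
        rate += 0.003
    return rate

def measure(combo, visitors, rng):
    """Observed conversion rate from one simulated batch of visitors."""
    rate = simulated_conversion_rate(combo)
    conversions = sum(rng.random() < rate for _ in range(visitors))
    return conversions / visitors

def random_combo(rng):
    return {name: rng.choice(options) for name, options in VARIABLES.items()}

def crossover(parent_a, parent_b, rng):
    """Child takes each variant from one of its two parents."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in VARIABLES}

def mutate(combo, rng, rate=0.2):
    """Occasionally swap in a random variant, even one pruned earlier."""
    return {name: rng.choice(VARIABLES[name]) if rng.random() < rate else value
            for name, value in combo.items()}

def evolve(generations=10, population_size=6, visitors=2000, seed=0):
    rng = random.Random(seed)
    population = [random_combo(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: measure(c, visitors, rng), reverse=True)
        survivors = ranked[: population_size // 2]  # prune the weaker half
        children = []
        while len(survivors) + len(children) < population_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    return max(population, key=lambda c: measure(c, visitors, rng))

print(evolve())
```

Only six combinations receive traffic in any one generation, instead of all sixteen at once. Because each measurement is noisy, the combination that emerges is a candidate for a new control, not a guaranteed best.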

This organic approach promises results faster and with less traffic.

Evolutionary neural networks allow testing tools to learn what combinations will work without testing all multivariate combinations.

With machine learning, websites that had too little traffic for pure multivariate testing can seriously consider it as an option.

Final Thoughts: Is There Ever a Case for Doing MVTs?

There are instances when testing many variables at once is difficult to avoid, or is simply the better thing to focus on.

Chris Goward of WiderFunnel gives four advantages to doing MVTs over A/B/n tests:

  1. Easily isolate many small page elements and measure their individual effects on conversion rate
  2. Measure interaction effects between independent elements to find compound effects
  3. Follow a more conservative path of incremental conversion rate improvement
  4. Facilitate interesting statistical analysis of interaction effects

He later admits, “At WiderFunnel, we run one Multivariate Test for every 8-10 A/B/n Test Rounds.”
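The first two advantages in Goward's list, individual effects and interaction effects, can be read off the results of a full-factorial test. Here is a minimal sketch for a two-by-two image and headline test. The conversion rates are made up for illustration, and the calculation gives point estimates only, with no check of statistical significance.

```python
# Made-up conversion rates for a 2 x 2 full-factorial test (NOT real data).
# Keys are (image, headline); "A" is the control variant of each variable.
rates = {
    ("A", "A"): 0.020,
    ("A", "B"): 0.024,
    ("B", "A"): 0.022,
    ("B", "B"): 0.030,
}

def main_effect(rates, position, variant="B", control="A"):
    """Average rate with the new variant minus average rate with the control."""
    with_variant = [r for combo, r in rates.items() if combo[position] == variant]
    with_control = [r for combo, r in rates.items() if combo[position] == control]
    return sum(with_variant) / len(with_variant) - sum(with_control) / len(with_control)

def interaction(rates):
    """How much the headline's lift changes when the new image is also present."""
    lift_with_new_image = rates[("B", "B")] - rates[("B", "A")]
    lift_with_old_image = rates[("A", "B")] - rates[("A", "A")]
    return lift_with_new_image - lift_with_old_image

print(f"Image effect:    {main_effect(rates, 0):+.3f}")
print(f"Headline effect: {main_effect(rates, 1):+.3f}")
print(f"Interaction:     {interaction(rates):+.3f}")
```

A positive interaction in this sketch means the new headline helps more when it appears with the new image than it does on its own. That is the kind of compound effect a sequence of separate A/B tests can miss.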

Both methods are valuable learning tools.

What is Your Experience?

This is a bit of a heated subject among optimization experts. I’d be curious to hear your ideas and experience on what matters the most.

Please leave a comment.
