## A/B Testing Statistics: An Intuitive Guide For Non-Mathematicians

A/B testing statistics made simple. A guide that will clear up some of the more confusing concepts while providing you with a solid framework to AB test effectively.

Here’s the deal. You simply cannot A/B test effectively without a sound understanding of A/B testing statistics.

And while there has been a lot of exceptional content written on AB testing statistics, I’ve found that most of these articles are either overly simplistic or they get very complex without anchoring each concept to a bigger picture.

Today, I’m going to explain the statistics of AB testing within a linear, easy-to-follow narrative. It will cover everything you need to use AB testing software effectively.

You might have been told that plugging a few numbers into a statistical significance calculator is enough to validate a test. Or perhaps you see the green “test is significant” checkmark popup on your testing dashboard and immediately begin preparing the success reports for your boss.

In other words, you might know just enough about split testing statistics to dupe yourself into making major errors, and that’s exactly what I’m hoping to save you from today.

Here’s my best attempt at making statistics intuitive.

## Why Statistics Are So Important To A/B Testing

The first question that has to be asked is “Why are statistics important to AB testing?”

The answer to that question is that AB testing is inherently a statistics-based process. The two are inseparable from each other.

An AB test is an example of statistical hypothesis testing, a process whereby a hypothesis is made about the relationship between two data sets and those data sets are then compared against each other to determine if there is a statistically significant relationship or not.

To put this in more practical terms, a prediction is made that Page Variation #B will perform better than Page Variation #A, and then data sets from both pages are observed and compared to determine if Page Variation #B is a statistically significant improvement over Page Variation #A.

This process is an example of statistical hypothesis testing.

But that’s not the whole story. The point of AB testing has absolutely nothing to do with how variations #A or #B perform. We don’t care about that.

What we care about is how our page will ultimately perform with our entire audience.

And from this birdseye view, the answer to our original question is that statistical analysis is our best tool for predicting outcomes we don’t know using information we do know.

For example, we have no way of knowing with 100% accuracy how the next 100,000 people who visit our website will behave. That is information we cannot know today, and if we were to wait until those 100,000 people visited our site, it would be too late to optimize their experience.

What we can do is observe the next 1,000 people who visit our site and then use statistical analysis to predict how the following 99,000 will behave.

If we set things up properly, we can make that prediction with incredible accuracy, which allows us to optimize how we interact with those 99,000 visitors. This is why AB testing can be so valuable to businesses.

In short, statistical analysis allows us to use information we know to predict outcomes we don’t know with a reasonable level of accuracy.

## The Complexities Of Sampling, Simplified

That seems fairly straightforward, so where does it get complicated?

The complexities arrive in all the ways a given “sample” can inaccurately represent the overall “population”, and all the things we have to do to ensure that our sample can accurately represent the population.

Let’s define some terminology real quick.

The “population” is the group we want information about. It’s the next 100,000 visitors in my previous example. When we’re testing a webpage, the true population is every future individual who will visit that page.

The “sample” is a small portion of the larger population. It’s the first 1,000 visitors we observe in my previous example.

In a perfect world, the sample would be 100% representative of the overall population.

For example:

Let’s say 10,000 out of those 100,000 visitors are going to ultimately convert into sales. Our true conversion rate would then be 10%.

In a tester’s perfect world, the mean (average) conversion rate of any sample(s) we select from the population would always be identical to the population’s true conversion rate. In other words, if you selected a sample of 10 visitors, 1 of them (10%) would buy, and if you selected a sample of 100 visitors, then 10 would buy.

But that’s not how things work in real life.

In real life, you might have only 2 out of the first 100 buy or you might have 20… or even zero. You could have a single purchase from Monday through Friday and then 30 on Saturday.

This variability across samples is captured by a statistic called the “variance”, which measures how far sample results tend to spread out around the true mean (average).
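To make that variability concrete, here is a minimal simulation sketch. It assumes the 10% true conversion rate from the example above and uses a made-up helper, `sample_conversions`, to draw ten samples of 100 simulated visitors each. The seed is arbitrary and is only there to make the run repeatable.

```python
import random

def sample_conversions(true_rate, sample_size, rng):
    """Count how many of `sample_size` simulated visitors convert."""
    return sum(1 for _ in range(sample_size) if rng.random() < true_rate)

rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_RATE = 0.10         # the 10% population rate from the example above

# Draw ten independent samples of 100 visitors each.
counts = [sample_conversions(TRUE_RATE, 100, rng) for _ in range(10)]
rates = [count / 100 for count in counts]
print(rates)  # ten sample rates scattered around 0.10
```

Even though every simulated visitor has exactly the same 10% chance of buying, the ten sample rates will generally not all equal 10%.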

The Freakonomics podcast makes an excellent point about what “random” really is. If you have one person flip a coin 100 times, you would have a random list of heads or tails with a high variance.

If we write these results down, we would expect to see several examples of long streaks, five or seven or even ten heads in a row. When we think of randomness, we imagine that these streaks would be rare. Statistically, they are quite possible in such a dataset with high variance.

The higher the variance, the more variable the mean will be across samples. Variance is, in some ways, the reason statistical analysis isn’t a simple process. It’s the reason I need to write an article like this in the first place.

So it would not be impossible to take a sample of ten results that contain one of these streaks. This would certainly not be representative of the entire 100 flips of the coin, however.
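If you want to see those streaks for yourself, the sketch below simulates 100 coin flips and reports the longest run of identical results. The function name `longest_streak` and the seed are my own choices for illustration.

```python
import random

def longest_streak(flips):
    """Return the length of the longest run of identical consecutive values."""
    best = current = 0
    previous = None
    for flip in flips:
        current = current + 1 if flip == previous else 1
        best = max(best, current)
        previous = flip
    return best

rng = random.Random(7)  # arbitrary seed for a repeatable run
flips = [rng.choice("HT") for _ in range(100)]
print("".join(flips))
print("longest streak:", longest_streak(flips))
```

Try a few different seeds. A ten-flip window taken from the middle of one of these streaks would look nothing like the full 100 flips.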

Fortunately, we have a phenomenon that helps us account for variance called “regression toward the mean”.

Regression toward the mean is “the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement.”

A closely related principle, the law of large numbers, ensures that as we continue increasing the sample size and the length of observation, the mean of our observations will get closer and closer to the true mean of the population.

In other words, if we test a big enough sample for a sufficient length of time, we will get accurate “enough” results.
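As a rough illustration of how sample size buys accuracy, the sketch below computes the standard error of a sample conversion rate at several sample sizes. It assumes a 10% true rate and the usual binomial approximation. Both are my assumptions for the example, not figures from any particular testing tool.

```python
import math

def standard_error(rate, sample_size):
    """Standard error of a sample conversion rate (binomial approximation)."""
    return math.sqrt(rate * (1 - rate) / sample_size)

for n in (100, 1_000, 10_000, 100_000):
    se = standard_error(0.10, n)
    print(f"n={n:>7,}: typical sampling error is about ±{se:.2%}")
```

Notice that the error shrinks with the square root of the sample size, so you need 100 times the visitors to get 10 times the precision.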

So what do I mean by accurate “enough”?

## Understanding Confidence Intervals & Margin of Error

In order to compare two pages against each other in an AB test, we have to first collect data on each page individually.

Typically, whatever AB testing tool you are using will automatically handle this for you, but there are some important details that can affect how you interpret results, and this is the foundation of statistical hypothesis testing, so I want to go ahead and cover this part of the process.

Let’s say you test your original page with 3,662 visitors and get 378 conversions. What is the conversion rate?

You are probably tempted to say 10.3%, but that’s inaccurate. 10.3% is simply the mean of our sample. There’s a lot more to the story.

To understand the full story, we need to understand two key terms:

1. Confidence Interval
2. Margin of Error

You may have seen something like this before in your split testing dashboard.

The original page above has a conversion rate of 10.3% plus or minus 1.0%. The 10.3% conversion rate value is the mean. The ± 1.0% is the margin of error, and this gives us a confidence interval spanning from 9.3% to 11.3%.

10.3% ± 1.0 % at 95% confidence is our actual conversion rate for this page.

What we are saying here is that we are 95% confident that the true mean of this page is between 9.3% and 11.3%. From another angle, we are saying that if we were to repeat this test 20 times and calculate an interval from each sample, we would expect about 19 of those intervals to contain the true conversion rate.

The confidence interval is a range, calculated from our sample, that is likely to contain the true conversion rate. We manually select our desired confidence level at the beginning of our test, and the size of the sample we need depends in part on that confidence level.

The range of our confidence interval is then calculated using the mean and the margin of error.

The easiest way to demonstrate this is with a visual.

The confidence level is decided upon ahead of time, while the interval itself comes from direct observation. Keep in mind that it is still an estimate, not a guarantee. In the above example, we are saying that about 19 out of every 20 intervals built this way will contain the true mean, and we are treating 9.3% to 11.3% as one of those 19.

The upper bound of the confidence interval is found by adding the margin of error to the mean. The lower bound is found by subtracting the margin of error from the mean.

The margin of error is a function of the standard deviation, which is a function of the variance. Really all you need to know is that all of these terms are measures of variability across samples.
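For the curious, here is a minimal sketch of how the numbers in the example above can be reproduced. It assumes the common normal-approximation (Wald) formula with z = 1.96 for 95% confidence. Your testing tool may use a different method, so treat this as an illustration rather than a description of any specific product.

```python
import math

def confidence_interval(conversions, visitors, z=1.96):
    """Normal-approximation (Wald) interval for a conversion rate.

    z=1.96 corresponds to a 95% confidence level.
    """
    rate = conversions / visitors
    std_err = math.sqrt(rate * (1 - rate) / visitors)  # standard error of the rate
    margin = z * std_err                               # margin of error
    return rate, margin, rate - margin, rate + margin

rate, margin, low, high = confidence_interval(378, 3662)
print(f"{rate:.1%} ± {margin:.1%}  ->  {low:.1%} to {high:.1%}")
# prints: 10.3% ± 1.0%  ->  9.3% to 11.3%
```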

Confidence levels are often confused with significance levels (which we’ll discuss in the next section) because the two are complements of each other: a 95% confidence level corresponds to a 5% significance level.

You can set the confidence level to be whatever you like. If you want 99% certainty, you can achieve it, BUT it will require a significantly larger sample size. As the chart below demonstrates, diminishing returns make 99% impractical for most marketers, and 95% or even 90% is often used instead for a cost-efficient level of accuracy.
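To put rough numbers on that trade-off, the sketch below estimates how many visitors are needed to pin a conversion rate down to ± 1.0% at different confidence levels. The 10% expected rate, the 1% target margin, and the normal-approximation formula are all assumptions I chose for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(expected_rate, margin_of_error, confidence):
    """Visitors needed so the interval half-width is at most `margin_of_error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z ** 2 * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    return math.ceil(n)

for confidence in (0.90, 0.95, 0.99):
    n = required_sample_size(0.10, 0.01, confidence)
    print(f"{confidence:.0%} confidence: about {n:,} visitors")
```

Under these assumptions, moving from 95% to 99% costs far more extra traffic than moving from 90% to 95%.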

In high-stakes scenarios (life-saving medicine, for example), testers will often use 99% confidence intervals, but for the purposes of the typical CRO specialist, 95% is almost always sufficient.

Advanced testing tools will use this process to measure the sample conversion rate for both the original page AND Variation B, so it’s not something you are really going to ever have to calculate on your own, but this is how our process starts, and as we’ll see in a bit, it can impact how we compare the performance of our pages.

Once we have our conversion rates for both the pages we are testing against each other, we use statistical hypothesis testing to compare these pages and determine whether the difference is statistically significant.

### Important Note About Confidence Intervals

It’s important to understand the confidence levels your AB testing tools are using and to keep an eye on the confidence intervals of your pages’ conversion rates.

If the confidence intervals of your original page and Variation B overlap, you need to keep testing even if your testing tool is saying that one is a statistically significant winner.
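Here is a small sketch of the overlap check described above. The original page uses the 378 out of 3,662 figures from earlier. The Variation B numbers are hypothetical, and the interval formula is the same normal approximation assumed before.

```python
import math

def interval(conversions, visitors, z=1.96):
    """95% normal-approximation interval for a conversion rate."""
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin

def intervals_overlap(a, b):
    """True if two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

original = interval(378, 3662)    # figures from the example above
variation = interval(420, 3662)   # hypothetical Variation B result

print("original: ", original)
print("variation:", variation)
print("overlap:  ", intervals_overlap(original, variation))
```

In this made-up case Variation B looks better on paper, but the two ranges still overlap, so by the rule above you would keep testing.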

## Significance, Errors, & How To Achieve The Former While Avoiding The Latter

Remember, our goal here isn’t to identify the true conversion rate of our population. That’s impossible.

When running an AB test, we are making a hypothesis that Variation B will convert at a higher rate for our overall population than Variation A will. Instead of displaying both pages to all 100,000 visitors, we display them to a sample instead and observe what happens.

• If Variation A (the original) had a better conversion rate with our sample of visitors, then no further actions need to be taken as Variation A is already our permanent page.
• If Variation B had a better conversion rate, then we need to determine whether the improvement was statistically large “enough” for us to conclude that the change would be reflected in the larger population and thus warrant us changing our page to Variation B.

So why can’t we take the results at face value?

The answer is variability across samples. Thanks to the variance, there are a number of things that can happen when we run our AB test.

1. Test says Variation B is better & Variation B is actually better
2. Test says Variation B is better & Variation B is not actually better (type I error)
3. Test says Variation B is not better & Variation B is actually better (type II error)
4. Test says Variation B is not better & Variation B is not actually better

As you can see, there are two different types of errors that can occur. In examining how we avoid these errors, we will simultaneously be examining how we run a successful AB test.

Before we continue, I need to quickly explain a concept called the null hypothesis.

The null hypothesis is a baseline assumption that there is no relationship between two data sets. When a statistical hypothesis test is run, the results either disprove the null hypothesis or they fail to disprove the null hypothesis.

This concept is similar to “innocent until proven guilty”: A defendant’s innocence is legally supposed to be the underlying assumption unless proven otherwise.

For the purposes of our AB test, it means that we automatically assume Variation B is NOT a meaningful improvement over Variation A. That is our null hypothesis. Either we disprove it by showing that Variation B’s conversion rate is a statistically significant improvement over Variation A, or we fail to disprove it.

And speaking of statistical significance…

## Type I Errors & Statistical Significance

A type I error occurs when we incorrectly reject the null hypothesis.

To put this in AB testing terms, a type I error would occur if we concluded that Variation B was “better” than Variation A when it actually was not.

Remember that by “better”, we aren’t talking about the sample. The point of testing our samples is to predict how a new page variation will perform with the overall population. Variation B may have a higher conversion rate than Variation A within our sample, but we don’t truly care about the sample results. We care about whether or not those results allow us to predict overall population behavior with a reasonable level of accuracy.

So let’s say that Variation B performs better in our sample. How do we know whether or not that improvement will translate to the overall population? How do we avoid making a type I error?

Statistical significance.

Statistical significance is attained when the p-value is less than the significance level. And that is way too many new words in one sentence, so let’s break down these terms real quick and then we’ll summarize the entire concept in plain English.

The p-value is the probability of obtaining at least as extreme results given that the null hypothesis is true.

In other words, the p-value tells us how surprising our result would be if there were no real difference between the pages. Imagine running an A/A test, where you displayed your page to 1,000 people and then displayed the exact same page to another 1,000 people.

You wouldn’t expect the sample conversion rates to be identical. We know there will be variability across samples. But you also wouldn’t expect one to be drastically higher or lower than the other. There is a range of variability that you would expect to see across samples, and the p-value measures how likely it is that chance alone would produce a gap at least as large as the one we observed.
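As a sketch of how a p-value can be calculated for two pages, the function below runs a pooled two-proportion z-test. This is one common textbook approach and not necessarily what your testing tool does. The function name and the visitor counts are made up for the example.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions) in both groups
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(two_proportion_p_value(100, 1000, 104, 1000))  # tiny gap, large p-value
print(two_proportion_p_value(100, 1000, 140, 1000))  # big gap, small p-value
```

A gap of 10.0% versus 10.4% is well within what chance produces, while 10.0% versus 14.0% is not.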

The significance level is the probability of rejecting the null hypothesis given that it is true.

Essentially, the significance level is a value we set based on the level of accuracy we deem acceptable. The industry standard significance level is 5%, which means we accept a 5% chance of declaring a winner when there is no real difference. It corresponds to the 95% confidence level discussed above.

So, to answer our original question:

We achieve statistical significance in our test when the increase in Variation B’s conversion rate falls so far outside the expected range of sample variability that chance alone would produce it less than 5% of the time.

Or from another way of looking at it, if Variation B were truly no better than Variation A and we ran this same test 20 times, we would expect to see a difference this large in no more than about 1 of them.
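You can check the “about 1 in 20” idea with a simulation. The sketch below runs 1,000 A/A tests, where both arms share the same 10% true rate, and counts how often a pooled z-test flags a “winner” at the 5% level. The sample sizes, the seed, and the helper names are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(true_rate, visitors_per_arm, runs, alpha, rng):
    """Fraction of A/A tests (identical pages) flagged as significant."""
    false_positives = 0
    for _ in range(runs):
        conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        if p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm) < alpha:
            false_positives += 1
    return false_positives / runs

rng = random.Random(2024)  # arbitrary seed for a repeatable run
false_positive_rate = simulate_aa_tests(0.10, 1000, 1000, 0.05, rng)
print(f"A/A tests flagged as significant: {false_positive_rate:.1%}")
```

The share of false winners should land in the neighborhood of 5%, which is exactly the type I error rate we signed up for by choosing a 5% significance level.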

## Type II Errors & Statistical Power

A type II error occurs when the null hypothesis is false, but we incorrectly fail to reject it.

To put this in AB testing terms, a type II error would occur if we concluded that Variation B was not “better” than Variation A when it actually was better.

Just as type I errors are related to statistical significance, type II errors are related to statistical power, which is the probability that a test correctly rejects the null hypothesis when it is false.

For our purposes as split testers, the main takeaway is that larger sample sizes over longer testing periods equal more accurate tests. Or as Ton Wesseling of Testing.Agency says here:

You want to test as long as possible – at least 1 purchase cycle – the more data, the higher the Statistical Power of your test! More traffic means you have a higher chance of recognizing your winner on the significance level your testing on!

Because…small changes can make a big impact, but big impacts don’t happen too often – most of the times, your variation is slightly better – so you need much data to be able to notice a significant winner.
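To put rough numbers on Ton’s point, the sketch below approximates the power of a test to detect a lift from 10% to 11% at different traffic levels. The rates, the traffic figures, and the normal-approximation formula are my assumptions for illustration, not values from the quote.

```python
import math
from statistics import NormalDist

def approximate_power(rate_a, rate_b, visitors_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    std_err = math.sqrt(
        rate_a * (1 - rate_a) / visitors_per_arm
        + rate_b * (1 - rate_b) / visitors_per_arm
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(rate_b - rate_a) / std_err
    # Probability the observed z-score lands beyond the critical value
    # in the direction of the true effect (the opposite tail is negligible).
    return NormalDist().cdf(effect - z_crit)

for n in (1_000, 5_000, 20_000):
    power = approximate_power(0.10, 0.11, n)
    print(f"{n:>6,} visitors per page: power is about {power:.0%}")
```

With only 1,000 visitors per page, a real one-point improvement would usually go undetected. That is a type II error waiting to happen.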

Statistical significance is typically the primary concern for AB testers, but it’s important to understand that tests will oscillate between being significant and not significant over the course of a test. This is why it’s important to have a sufficiently large sample size and to test over a set time period that accounts for the full spectrum of population variability.

For example, if you are testing a business that has noticeable changes in visitor behavior on the 1st and 15th of the month, you need to run your test for at least a full calendar month. This is your best defense against one of the most common mistakes in AB testing… getting seduced by the novelty effect.

Peter Borden explains the novelty effect in this post:

Sometimes there’s a “novelty effect” at work. Any change you make to your website will cause your existing user base to pay more attention. Changing that big call-to-action button on your site from green to orange will make returning visitors more likely to see it, if only because they had tuned it out previously. Any change helps to disrupt the banner blindness they’ve developed and should move the needle, if only temporarily.

More likely is that your results were false positives in the first place. This usually happens because someone runs a one-tailed test that ends up being overpowered. The testing tool eventually flags the results as passing their minimum significance level. A big green button appears: “Ding ding! We have a winner!” And the marketer turns the test off, never realizing that the promised uplift was a mirage.

By testing a large sample size that runs long enough to account for time-based variability, you can avoid falling victim to the novelty effect.

### Important Note About Statistical Significance

It’s important to note that whether we are talking about the sample size or the length of time a test is run, the parameters for the test MUST be decided on in advance.

Statistical significance cannot be used as a stopping point or, as Evan Miller details, your results will be meaningless.

As Peter alludes to above, many AB testing tools will notify you when a test’s results become statistically significant. Ignore this. Your results will often oscillate between being statistically significant and not being statistically significant.

The only point at which you should evaluate significance is the endpoint that you predetermined for your test.
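Here is a simulation sketch of why stopping at the first “significant” reading is so dangerous. It runs A/A tests (no real difference) and compares two habits: checking after every 100 visitors and stopping at the first significant result, versus checking only at the planned end. The traffic numbers, the seed, and the pooled z-test are assumptions for illustration. The planned total must be a multiple of the checking interval so that the last peek coincides with the endpoint.

```python
import math
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

def run_aa_test(true_rate, total_per_arm, check_every, alpha, rng):
    """Run one A/A test; report significance at any peek and at the planned end."""
    conv_a = conv_b = 0
    significant_at_some_peek = False
    p = 1.0
    for visitor in range(1, total_per_arm + 1):
        conv_a += rng.random() < true_rate
        conv_b += rng.random() < true_rate
        if visitor % check_every == 0:
            p = p_value(conv_a, visitor, conv_b, visitor)
            if p < alpha:
                significant_at_some_peek = True
    # The last peek falls on the final visitor, so `p` is the endpoint p-value.
    return significant_at_some_peek, p < alpha

rng = random.Random(11)  # arbitrary seed for a repeatable run
runs = 400
results = [run_aa_test(0.10, 2000, 100, 0.05, rng) for _ in range(runs)]
peeking_rate = sum(peek for peek, _ in results) / runs
planned_rate = sum(end for _, end in results) / runs
print(f"false winners when stopping at first significance: {peeking_rate:.1%}")
print(f"false winners when evaluating only at the end:     {planned_rate:.1%}")
```

Every one of these tests compares a page against itself, so any “winner” is a false positive. Peeking should produce noticeably more of them than waiting for the predetermined endpoint.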

## Terminology Cheat Sheet

We’ve covered quite a bit today.

For those of you who have just been smiling and nodding whenever statistics are brought up, I hope this guide has cleared up some of the more confusing concepts while providing you with a solid framework from which to pursue deeper understanding.

If you’re anything like me, reading through it once won’t be enough, so I’ve gone ahead and put together a terminology cheat sheet that you can grab. It lists concise definitions for all the statistics terms and concepts we covered in this article.


## A/B Test Intro: What Is An A/B Test?

What is an A/B Test? How does split testing work? Who should run AB tests? Discover the Conversion Scientists’ secrets to AB testing.

AB testing, also referred to as “split” or “ABn” testing, is the process of testing multiple variations of a web page in order to identify higher-performing variations and improve the page’s conversion rate.

As the web has become increasingly competitive and traffic has become increasingly expensive, the rate at which online businesses are able to convert incoming visitors to customers has become more and more important.

In fact, it has led to an entirely new industry, called Conversion Rate Optimization (CRO), and the centerpiece of this new CRO industry is AB testing.

More than any other thing a business can do, AB testing reveals what will increase online revenue and by how much. This is why we recommend it.

## What Is An A/B Test?

An AB test is an experiment in which a web page (Page A) is compared against a new variation of that page (Page B) by alternately displaying both versions to a live audience.

The number of visitors who convert on each page is recorded as a percentage of conversions per visitor, referred to as the “conversion rate”. The conversion rates for each page variation are then compared against each other to determine which page performs better.


Using the above image as an example, since Page B has a higher conversion rate, it would be selected as the winning test and replace the original as the permanent page displayed to visitors.

(There are several very important statistical requirements Page B would have to meet in order to truly be declared the winner, but we’re keeping it simple for the purposes of this article)

## How Does Split Testing Work?

Split testing is a conceptually simple process, and thanks to an abundance of high-powered software tools, it is now very easy for marketers to run AB tests on a regular basis.

### 1. Select A Page To Improve

The process begins by identifying the page that you want to improve. Online landing pages are commonly tested, but you can test any page of a website. AB testing can even be applied to email, display ads and any number of other things.

### 2. Hypothesize A Better Variation

Once you have selected your target page, it’s time to create a new variation that can be compared against the original. Your new page will be based on your best hypothesis about what will convert with your target audience, so the better you understand that audience, the better results you will get from AB testing.

### 3. Display Both Pages To A Live Audience

The next step is to display both pages to a live audience. In order to keep everything else equal, you’ll want to use split testing software to alternately display Page A (original) and Page B (variation) via the same URL.
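If you are curious how software can split traffic behind a single URL, here is one possible sketch. It hashes a visitor ID so that each visitor is consistently bucketed into Page A or Page B. This is a common general technique, not a description of how any particular split testing tool works, and the function name, experiment label, and visitor IDs are all hypothetical.

```python
import hashlib

def assign_variation(visitor_id, experiment="homepage-test"):
    """Deterministically bucket a visitor into Page A or Page B (50/50)."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variation(f"visitor-{i}")] += 1
print(counts)  # roughly an even split between the two pages
```

Because the bucket depends only on the visitor ID and the experiment name, a returning visitor keeps seeing the same page, which keeps the two samples cleanly separated.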

### 4. Collect Conversion Data

Collect data on both pages. Monitor how many visitors are viewing each page, where they are clicking, and how often they are taking the desired action (usually converting into leads or sales). Tests must be run long enough to achieve statistically significant results.

### 5. Select The Winning Page

Once one page has proven to have a statistically higher conversion rate, implement it as the permanent page for that URL. The A/B test is now complete, and a new one can be started by returning to Step #2 and hypothesizing a new page variation.

## Who Should Run AB Tests?

Now that you understand what an AB test is, the next question is whether or not YOU should invest in running AB tests on your webpages.

There are three primary factors that determine whether AB testing is right for your website:

1. Number of transactions (purchases, leads or subscribers) per month.
2. The speed with which you want to test.
3. The average value of each sale, lead or subscriber to the business.

We’ve created a very helpful calculator called the Conversion Upside Calculator to help you understand what each small increase in your conversion rate will deliver in additional annual income.

Based on how much you stand to earn from improvements, you can decide whether it makes sense to purchase a suite of AB testing tools and experiment on your own or hire a dedicated CRO agency to maximize your results.
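For a back-of-the-envelope version of that math, here is a minimal sketch. To be clear, this is not the Conversion Upside Calculator’s actual formula. It is a simple assumption of mine that revenue scales directly with conversion rate, and the traffic, rate, and value figures are invented.

```python
def annual_upside(monthly_visitors, conversion_rate, value_per_conversion, relative_lift):
    """Extra yearly revenue if the conversion rate improves by `relative_lift`."""
    current = monthly_visitors * conversion_rate * value_per_conversion * 12
    improved = current * (1 + relative_lift)
    return improved - current

# Hypothetical site: 50,000 visitors a month, 2% conversion rate,
# 80 in revenue per conversion, and a 10% relative lift from testing.
extra = annual_upside(50_000, 0.02, 80.0, 0.10)
print(f"additional annual revenue: about {extra:,.0f}")
```

Plug in your own traffic and transaction values to see whether the potential gain justifies the cost of tools or an agency.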

## Correlation, Causation, and Their Impact on AB Testing

Correlation and causation are two very different things. Often correlation is at work while the causation is not. By understanding how to identify them, we can master correlation, causation and the decisions they drive.

In 2008, Hurricane Ike stormed his way through the Gulf of Mexico, striking the coasts of Texas and Louisiana. This powerful Category 3 hurricane took 112 lives, making Ike the seventh most deadly hurricane in recent history.

Ike stands alone in one other way: It is the only storm with a masculine name in the list of ten most deadly storms since 1950. For all of his bravado, Ike killed fewer people than Sandy, Agnes, the double-team of Connie and Diane, Camille, Audrey and Katrina. Here are the top ten most deadly hurricanes according to a video published by the Washington Post.

Here is the data for the top ten hurricanes since 1950:

• #10 Carol: 1954, 65 deaths
• #9 Betsy: 1965, 75 deaths
• #8 Hazel: 1954, 95 deaths
• #7 Ike: 2008, 112 deaths
• #6 Sandy: 2012, 117 deaths
• #5 Agnes: 1972, 122 deaths
• #4 Connie and Diane: 1955, 184 deaths
• #3 Camille: 1969, 265 deaths
• #2 Audrey: 1957, 416 deaths
• #1 Katrina: 2005, 1,833 deaths

There is a clear correlation in this data, and in data collected on 47 other hurricanes. Female-named hurricanes kill 45 people on average, while the guys average only 23.

Heav’n has no Rage, like Love to Hatred turn’d,

Nor Hell a Fury, like a Woman scorn’d. — William Congreve

Now, if we assume causation is at work as well, an answer to our problem presents itself quite clearly: We should stop giving hurricanes feminine names because it makes them meaner. Clearly, hurricanes are affected by the names we give them, and we can influence the weather with our naming conventions.

You may find this conclusion laughable, but what if I told you that secondary research proved the causation, that we can reduce deaths by as much as two thirds simply by changing Hurricane Eloise to Hurricane Charley? It appears that hurricanes are sexist, that they don’t like being named after girls, and get angry when we do so.

Our minds don’t really like coincidence, so we try to find patterns where maybe there isn’t one. Or we see a pattern, and we try to explain why it’s happening because once we explain it, it feels like we have a modicum of control. Not having control is scary.

As it turns out, The Washington Post published an article about the relationship between the gender of hurricanes’ names and the number of deaths the hurricane causes. The article’s title is “Female-named hurricanes kill more than male hurricanes because people don’t respect them, study finds.” The opening sentence clears up confusion you might get from the title: “People don’t take hurricanes as seriously if they have a feminine name and the consequences are deadly, finds a new groundbreaking study.”

## The Difference Between Correlation and Causation

Another way to phrase the Washington Post’s conclusion is, The number of hurricane-related deaths depends on the gender of the hurricane’s name. This statement demonstrates a cause/effect relationship where one thing – the number of deaths – cannot change unless something else – the hurricane’s name – behaves a certain way (in this case, it becomes more or less feminine).

If we focus on decreasing hurricane-related deaths, we can make changes to the naming convention that try to take people’s implicit sexism out of the picture. We could:

• Make all the names either male or female instead of alternating
• Choose names that are gender non-specific
• Change the names to numbers
• Use date of first discovery as identification
• Use random letter combinations
• Use plant or animal names

## What is Correlation?

In order to calculate a correlation, we must compare two sets of data. We want to know if these two datasets correlate or change together. The graph below is an example of two datasets that correlate visually.

Graph from Google Analytics showing two datasets that appear to correlate.

In this graph of website traffic, our eyes tell us that the Blue and Orange data change at the same time and with the same magnitude from day to day. Incidentally, causation is at play here as well. The Desktop + Tablet Sessions data is part of All Sessions so the latter depends on the former.

How closely do these two lines correlate? We can find out with some help from a tool called a scatter plot. These are easy to generate in Excel. In a scatter plot, one dataset is plotted along the horizontal axis and the other is graphed along the vertical axis. In a typical graph, the vertical value, called y, depends on the horizontal value, usually called x. In a scatter plot, the two are not necessarily dependent on each other. If two datasets are identical, then the scatter plot is a straight line. The following image shows the scatter plot of two datasets that correlate well.

The scatter plot of two datasets with high correlation.

In contrast, here is the scatter plot of two datasets that don’t correlate.

The scatter plot of two datasets with a low correlation.

The equations you see on these graphs include an R² value that Excel calculates for us when we add a Trendline to the graph. The closer this value is to 1, the higher the statistical correlation. You can see that the first graph has an R² of 0.946, which is close to 1, while the second is 0.058. We will calculate a correlation coefficient and use a scatter plot graph to visually inspect for correlations.
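If you would rather not rely on Excel, the same numbers can be computed directly. The sketch below calculates the Pearson correlation coefficient r and squares it to get R², which matches the R² of a straight-line trendline. The daily session counts are made up for illustration and are not the data behind the graphs above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

all_sessions = [120, 135, 150, 110, 160, 175, 140]   # made-up daily counts
desktop_sessions = [80, 92, 101, 70, 115, 118, 95]   # made-up daily counts

r = pearson_r(all_sessions, desktop_sessions)
print(f"r = {r:.3f}, R² = {r ** 2:.3f}")
```

Two datasets that rise and fall together, like these, produce an R² close to 1.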

For data that shows a strong correlation, we can then look for evidence proving or disproving causation.

## Errors in Correlation, Causation

A number of other effects can masquerade as causation:

1. Coincidence: Sometimes random occurrences appear to have a causal relationship.
2. Deductive Error: There is a causal relationship, but it’s not what you think.
3. Codependence: An external influence, a third variable, on which the two correlated things depend.

Errors of codependence result from an external stimulus that affects both datasets. Here are some examples.

Math scores are higher when children have larger shoe sizes.

Can we assume larger feet cause increased capacity for math?

Possible third variable: Age; children’s feet get bigger when they get older.

Enclosed dog parks have a higher incidence of dogs biting other dogs/people.

Can we assume enclosed dog parks cause aggression in dogs?

Possible third variable: Attentiveness of owners; pet owners might pay less attention to their dogs’ behavior when there is a fence around the dog park.

Satisfaction rates with airlines steadily increase over time.

Can we assume that airlines steadily improve their customer service?

Possible third variable: Customer expectations; customers may have decreasing expectations of customer service over time.

The burden of proof is on us to prove causation and to eliminate these alternative explanations.
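The shoe size example is easy to simulate. In the sketch below, age drives both shoe size and math score, and the two never influence each other. All of the numbers, including the noise levels and the seed, are invented for illustration. We then compare the correlation across all children with the correlation among nine-year-olds only.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(3)  # arbitrary seed for a repeatable run
ages, shoe_sizes, math_scores = [], [], []
for age in range(6, 13):      # ages 6 through 12
    for _ in range(200):      # 200 simulated children per age
        ages.append(age)
        shoe_sizes.append(age + rng.gauss(0, 1))         # driven by age only
        math_scores.append(10 * age + rng.gauss(0, 10))  # driven by age only

overall = pearson_r(shoe_sizes, math_scores)

# Hold the third variable fixed: look only at nine-year-olds.
nine = [(s, m) for a, s, m in zip(ages, shoe_sizes, math_scores) if a == 9]
within_age = pearson_r([s for s, _ in nine], [m for _, m in nine])

print(f"all children:        r = {overall:.2f}")
print(f"nine-year-olds only: r = {within_age:.2f}")
```

Across all children the correlation is strong, but once age is held constant it mostly disappears. That is the signature of a third variable at work.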

## How to Prove Causation When All You Have is Correlation

As we have said, when two things correlate, it is easy to conclude that one causes the other. This can lead to errors in judgement. We need to determine if one thing depends on the other. If we can’t prove this with some confidence, it is safest to assume that causation doesn’t exist.

### 1. Evaluate the Statistics

Most of our myths, stereotypes and superstitions can be traced to small sample sizes. Our brains are wired to find patterns in data, and if given just a little data, our brains will find patterns that don’t exist.

The dataset of hurricanes used in the Washington Post article contains 47 datapoints. That’s a very small sample to be making distinctions about. It’s easier to statistically eliminate causation as an explanation than it is to prove causation.

For example, people avoid swimming in shark-infested waters because they believe it is likely to cause death by shark. Yet they don’t avoid walking under coconut trees because, “What are the odds” that a coconut will kill you. As it turns out, there are 15 times more fatalities each year from falling coconuts than from shark attacks.

If you’re dealing with fewer than 150 data points (the coconut line), then you probably don’t need to even worry about whether one thing caused the other. In this case, you may not be able to prove correlation, let alone causation.

### 2. Find Another Dataset

In the case of hurricanes, we have two datasets: The number of deaths and whether or not the hurricane was named after a boy or a girl.

The relationship between a hurricane’s name and hurricane deaths.

The correlation is pretty obvious. This is binary: either the storm has a man’s name or a woman’s name. However, this becomes a bit clouded when you consider names like Sandy and Carol, which are names for both men and women. We need a dataset that measures our second metric with more granularity if we’re going to calculate a correlation.

Fortunately, we have the web. I was able to find another dataset that rated names by masculinity. Using the ratings found on the site behindthename.com, we graphed femininity vs. death toll. Because of the outlier, Katrina, we used a logarithmic scale.

There is little statistical correlation between masculinity and death toll. Causation is in question.

I created a trend line for this data and asked Excel to provide a coefficient of determination, or an R-squared value. As you remember, the closer this number is to 1, the more strongly the two datasets correlate. At 0.0454, there’s not a lot of correlation here.
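If you would like to see what a spreadsheet is doing when it reports R-squared for a straight trend line, here is a minimal sketch. The numbers below are made up for illustration (they are not the hurricane dataset), the function name `r_squared` is our own, and we take the logarithm of the death toll only to mimic the logarithmic scale mentioned above.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a simple linear fit, R-squared equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)

# Hypothetical data, NOT the real hurricane numbers.
femininity_rating = [1, 3, 4, 6, 8, 9, 10]
death_toll = [20, 5, 200, 3, 50, 15, 1800]

# Work on log10 of the death toll so one huge outlier does not dominate.
log_deaths = [math.log10(d) for d in death_toll]

print(f"R-squared: {r_squared(femininity_rating, log_deaths):.3f}")
```

A value near 1 means the points hug the trend line; a value near 0 means the line explains almost none of the variation.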

Researchers at the University of Illinois and Arizona State University did the same thing as a part of their study, according to the Washington Post story. They found the opposite result. “The difference in death rates between genders was even more pronounced when comparing strongly masculine names versus strongly feminine ones.” They were clearly using a different measure of “masculinity” to reach their conclusion.

What else could we do to test causation?

### 3. Create Another Dataset Using AB Testing

Sometimes, we need to create a dataset that verifies causation. The researchers in our Washington Post study did this. They set up experiments “presenting a series of questions to between 100 and 346 people.” They found that the people in their experiments predicted that male-named hurricanes would be more intense, and that they would prepare less for female-named hurricanes.

In short, we are all sexist. And it’s killing us.

Running an experiment is a great way to generate more data about a correlation in order to establish causation. When we run an AB test, we are looking for a causation, but will often settle for correlation. We want to know if one of the changes we make to a website causes an increase in sales or leads.

We can deduce causation by limiting the number of things we change to one per treatment.

#### AB Testing Example: Correlation or Causation

One of the things we like to test is the importance of findability on a website. We want to discern how important it is to help visitors find things on a site. For a single product site, findability is usually not important. If we add search features to the site, conversions or sales don’t rise.

For a catalog ecommerce site with hundreds or thousands of products, findability may be a huge deal. Or not.

We use a report found in Google Analytics that compares the conversion rate of people who search against all visitors.

This report shows that “users” who use the site search function on a site buy more often and make bigger purchases when they buy.

This data includes hundreds of data points over several months, so it is statistically sound. Is it OK, then, to assume that if we get more visitors to search, we’ll see an increase in purchases and revenue? Can we say that searching causes visitors to buy more, or is it that buyers use the site search feature more often?

In this case, we needed to collect more information. If search causes an increase in revenue, then if we make site search more prominent, we should see an increase in transactions and sales. We designed two AB tests to find out.

In one case, we simplified the search function of the site and made the site search field larger.

This AB Test helped identify causation by increasing searches and conversions.

Being the skeptical scientists that we are, we defined another AB test to help establish causation. We had a popover appear when a visitor was idle for more than a few seconds. The popover offered the search function.

This AB test increased the number of searchers and increased revenue per visit.

At this point, we had good evidence that site search caused more visitors to buy and to purchase more.

#### Another AB Testing Example

The point of AB testing is to make changes and be able to say with confidence that what you did caused conversion rates to change. The conversion rate may have plummeted or skyrocketed or something in between, but it changed because of something you did.

One of our clients had a sticky header sitewide with three calls-to-action: Schedule a Visit, Request Info, and Apply Now. Each of these three CTAs brought the visitor to the exact same landing page.

We hypothesized that multiple choices were overwhelming visitors, and they were paralyzed by the number of options. We wanted to see if fewer options would lead to more form fills. To test this hypothesis, we only changed one thing for our AB test: we removed “Apply Now”.

After this change we saw a 36.82% increase in form fills. The conversion rate went from 4.9% to 6.71%.
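For readers who want to check this kind of figure themselves, here is a small sketch of the relative lift calculation. The function name `relative_lift` is our own. Using the rounded rates quoted above gives roughly 37%, slightly off from the reported 36.82%; we assume the small gap comes from rounding in the published conversion rates.

```python
def relative_lift(control_rate, variation_rate):
    """Relative change of the variation's rate versus the control's rate."""
    return (variation_rate - control_rate) / control_rate

# Rounded conversion rates from the sticky-header test.
control = 0.049     # 4.9%
variation = 0.0671  # 6.71%

lift = relative_lift(control, variation)
print(f"Relative lift: {lift:.2%}")
```

Note that this is a relative lift. The absolute change is under two percentage points, which is why it pays to state which one you are reporting.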

Phrased differently: The number of form fills depends on the number of CTAs.

We get the terms Dependent Variable and Independent Variable from this kind of cause/effect relationship.

The number of CTAs is the independent variable because we – the people running the test – very intentionally changed it.

The number of form fills is the dependent variable because it depended on the number of CTAs. Changes to the dependent variable happen indirectly. A researcher can’t reach in and just change it.

Make sense?

This is called a causal relationship because one variable causes another to change.

### 4. Prove the “Why?”

If you have a set of data that seems to prove causation, you are left with the need to answer the question, “Why?”

Why do female-named hurricanes kill more people? The hypothesis we put forward at the beginning of this article was that girly names make hurricanes angry and more violent. There is plenty of evidence from the world of physics that easily debunks this theory. We chose it because it was absurd, and we hoped an absurdity would get you to read this far. (SUCCESS!)

The researchers written about by the Washington Post came up with a more reasonable explanation: that the residents in the path of such storms are sexist, and prepare less for feminine-sounding hurricanes. However, even this reasonable explanation needed further testing.

The problem with answering the question, “Why?” in a reasonable way is that our brains will decide that it is the answer just because it could be the answer. Walking at night causes the deaths of more pedestrians than walking in daylight. If I told you it was because more pedestrians drink at night and thus blunder into traffic, you might stop all analysis at that point. However, the real reason may be that cars have more trouble seeing pedestrians at night than in the daytime.

Don’t believe the first story you hear or you’ll believe that hurricanes hold a grudge. Proving the “Why” eliminates errors of deduction.

#### Does Watching Video Cause More Conversions?

We did an AB test for the site Automatic.com in which we replaced an animation with a video that explains the benefits of their adapter that connects your smartphone to your car. In this test, the treatment with the video generated significantly more revenue than the control.

Replacing the animation (left) with a video (right) on one site increased revenue per visit.

Our test results demonstrate a correlation between video on the home page and an increase in revenue per visitor. It is a natural step to assume that the video caused more visitors to buy. Based on this, we might decide to test different kinds of video, different lengths, different scripts, etc.

As we now know, correlation is not causation. What additional data could we find to verify causation before we invest in additional video tests?

We were able to find an additional dataset. The video player provided by Wistia tracked the number of people who saw the video on the page vs. the number of people who watched the video. What we learned was that only 9% of visitors actually clicked play to start watching the video.

Even though conversions rose, there were few plays of the video.

So, the video content was only impacting a small number of visitors. Even if every one of these watchers bought, it wouldn’t account for the increase in revenue. Here, the 9% play rate is the number of unique plays divided by the number of unique page loads.

A more likely scenario is that the animation had a negative impact on conversion vs. the static video title card image. Alternatively, the load time of the animation may have allowed visitors to scroll past before seeing it.

Nonetheless, had we continued with our deduction error, we might have invested heavily in video production to find more revenue when changing the title card for this video is all we needed.

## Back to Hurricanes

The article argues: The number of hurricane-related deaths depends on the gender of the hurricane’s name.

Do you see any holes in this conclusion?

These researchers absolutely have data that say in no uncertain terms that hurricanes with female names have killed more people, but have they looked closely enough to claim that the name causes death? Let’s think about what circumstances would introduce a third variable each time a hurricane makes landfall.

• Month of year (is it the beginning or end of hurricane season?)
• Position in lunar phase (was there a full moon?)
• Location of landfall

If we only consider location of landfall, there are several other third variables to consider:

• Amount of training for emergency personnel
• Quality of evacuation procedures
• Average level of education for locals
• Average socio-economic status of locals
• Proximity to safe refuge
• Weather patterns during non-hurricane seasons

I would argue that researchers have a lot more work to do if they really want to prove that femininity of a hurricane’s name causes a bigger death toll. They would need to make sure that the only variable they are changing is the name, not any of these third variables.

Unfortunately for environmental scientists and meteorologists, it’s really difficult to isolate variables for natural disasters because it’s not an experiment you can run in a lab. You will never be able to create a hurricane and repeatedly unleash it on a town in order to see how many people run. It’s not feasible (nor ethical).

Fortunately for you, it’s a lot easier when you’re AB testing your website.

## How an A/A Test Gives You Confidence

Nothing gives you confidence and swagger like AB testing. And nothing will end your swagger faster than bad data. In order to do testing right, there are some things you need to know about AB testing statistics. Otherwise, you’ll spend a lot of time trying to get answers, but instead of getting answers, you’ll end up either confusing yourself more or thinking you have an answer, when really you have nothing. An A/A test ensures that the data you’re getting can be used to make decisions with confidence.

What’s worse than working with no data? Working with bad data.

We’re going to introduce you to a test that, if successful, will teach you nothing about your visitors. Instead, it will give you something that is more valuable than raw data. It will give you confidence.

## What is an A/A Test

The first thing you should test before your headlines, your subheads, your colors, your calls to action, your video scripts, your designs, etc. is your testing software itself. This is done very easily by testing one page against itself. One would think this is pointless because surely, the same page against the same page is going to have the same results, right?

Not necessarily.

After three days of testing, this A/A test showed that the variation identical to the Original was delivering 35.7% less revenue. This is a swagger killer.

This A/A Test didn’t instill confidence after three days.

This can be caused by any of these issues:

1. The AB testing tool you’re using is broken.
2. The data being reported by your website is wrong or duplicated.
3. The AA test needs to run longer.

Our first clue to the puzzle is the small size of the sample. While there were 345 or more visits to each page, there were only 22 and 34 transactions. This is too small by a large factor. In AB testing statistics, transactions are more important than traffic in building statistical confidence. Having fewer than 200 transactions per treatment often delivers meaningless results.

Clearly, this test needs to run longer.

Your first instinct may be to hurry through the A/A testing so you can get to the fun stuff – the AB testing. But that’s going to be a mistake, and the above shows why.

An A/A test serves to calibrate your tools

Had the difference between these two identical pages continued over time, we would call off any plans for AB testing altogether until we figured out if the tool implementation or website were the source of the problem. We would also have to retest anything done prior to discovering this AA test anomaly.

In this case, running the A/A test for a longer stretch of time increased our sample size and the results evened out, as they should in an A/A test. A difference of 3.5% is acceptable for an AA test. We also learned that a minimum sample size approaching 200 transactions per treatment was necessary before we could start evaluating results.

This is a great lesson in how statistical significance and sample size can build or devastate our confidence.
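To get a feel for why identical pages can look so different early on, here is a rough simulation sketch. Everything in it is an assumption for illustration: a made-up true conversion rate of 7%, 300 simulated A/A tests per sample size, and a 20% gap as the line for "worryingly different". It is not a model of the test shown above.

```python
import random

def simulate_aa_gap(visitors_per_arm, true_rate, rng):
    """Run one simulated A/A test; return the relative gap in conversions."""
    a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
    b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
    if a == 0:
        return None  # relative gap is undefined with zero conversions
    return abs(b - a) / a

def share_of_large_gaps(visitors_per_arm, true_rate=0.07,
                        threshold=0.20, runs=300, seed=1):
    """Fraction of simulated A/A tests whose arms differ by more than threshold."""
    rng = random.Random(seed)
    gaps = [simulate_aa_gap(visitors_per_arm, true_rate, rng) for _ in range(runs)]
    gaps = [g for g in gaps if g is not None]
    if not gaps:
        return 0.0
    return sum(g > threshold for g in gaps) / len(gaps)

small = share_of_large_gaps(visitors_per_arm=400)    # roughly 28 conversions per arm
large = share_of_large_gaps(visitors_per_arm=4000)   # roughly 280 conversions per arm

print(f"Share of A/A tests with a >20% gap at 400 visitors per arm:  {small:.0%}")
print(f"Share of A/A tests with a >20% gap at 4000 visitors per arm: {large:.0%}")
```

Both arms share the same true rate in every run, so any gap is pure chance. Under these assumptions, large gaps should be common at the small sample size and rare at the large one.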

## An A/A Test Tells You Your Minimum Sample Size

The reason the A/A test panned out evenly in the end was it took that much time for a good amount of traffic to finally come through the website and see both “variations” in the test. And it’s not just about a lot of traffic, but a good sample size.

• Your shoppers on a Monday morning are statistically completely different people from your shoppers on a Saturday night.
• Your shoppers during a holiday season are statistically different from your shoppers during a non-holiday season.
• Your shoppers at work are different from your shoppers at home.

It’s amazing the differences you may find if you dig into your results, down to specifics like devices and browsers. Of course, if you only have a small sample size, you may not be able to trust the results.

This is because a small overall sample size means that you may have segments of your data allocated unevenly. Here is a sample of data from the same A/A test. At this point, fewer than 300 sessions per variation have been tested. You can see that, for visitors using the Safari browser (Mac visitors), there is an uneven allocation: 85 visitors for the variation and 65 for the control. Remember that both are identical. Furthermore, there is an even bigger divide between Internet Explorer visitors, 27 to 16.

This unevenness is just the law of averages. It is not unreasonable to imagine this kind of unevenness. But, we expect it to go away with larger sample sizes.

You might have different conversion rates with different browsers.

Statistically, an uneven allocation leads to different results, even when all variations are equal. If the allocation of visits is so off, imagine that the allocation of visitors that are ready to convert is also allocated unevenly. This would lead to a variation in conversion rate.

And we see that in the figure above. For visitors coming with the Internet Explorer browser, none of sixteen visitors converted. Yet two converting visitors were sent to the calibration variation for a conversion rate of 7.41%.

In the case of Safari, the same number of converting visitors were allocated to the Control and the calibration variation, but only 65 visits overall were sent to the Control. Compare this to the 85 visitors sent to the Calibration Variation. It appears that the Control has a much higher conversion rate.

But it can’t because both pages are identical.

Over time, we expect most of these inconsistencies to even out. Until then they often add up to uneven results.
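The segment numbers above can be checked with a few lines of code. This sketch recomputes the Internet Explorer conversion rates and then asks how unusual the 85/65 and 27/16 splits would be if the tool assigned each visitor by a fair coin flip. The fair, independent 50/50 assignment is our assumption, and the function names are our own.

```python
from math import comb

def conversion_rate(conversions, visitors):
    """Share of visitors who converted."""
    return conversions / visitors

def chance_of_split_this_uneven(larger, smaller):
    """Two-sided probability that a fair 50/50 allocation of
    (larger + smaller) visitors comes out at least this lopsided."""
    n = larger + smaller
    upper_tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# Internet Explorer segment: 0 of 16 converted vs. 2 of 27.
print(f"IE Control:     {conversion_rate(0, 16):.2%}")
print(f"IE Calibration: {conversion_rate(2, 27):.2%}")

# How surprising are the allocations themselves under a fair coin?
print(f"Safari 85 vs 65: p = {chance_of_split_this_uneven(85, 65):.2f}")
print(f"IE 27 vs 16:     p = {chance_of_split_this_uneven(27, 16):.2f}")
```

Under the fair-coin assumption, neither split is rare enough to suggest a broken tool by itself, which fits the point that this kind of unevenness is expected at small sample sizes.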

These forces are at work when you’re testing different pages in an AB test. Do you see why your testing tool can tell you to keep the wrong version if your sample size is too small?

## Calculating Test Duration

You have to test until you’ve received a large enough sample size from different segments of your audience to determine if one variation of your web page performs better on the audience type you want to learn about. The A/A test can demonstrate the time it takes to reach statistical significance.

The duration of an AB test is a function of two factors.

1. The time it takes to reach an acceptable sample size.
2. The difference between the performance of the variations.

If a variation is beating the control by 50%, the test doesn’t have to run as long. The large margin of “victory”, also called “chance to beat” or “confidence”, is larger than the margin of error, even at smaller sample sizes.
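Here is one way to picture the margin-of-error idea in code. This is a sketch using the common normal approximation with a 95% level (z = 1.96), which is our assumption and not necessarily what your testing tool does. The visitor and conversion counts are hypothetical.

```python
import math

def standard_error(conversions, visitors):
    """Standard error of a conversion rate (normal approximation)."""
    p = conversions / visitors
    return math.sqrt(p * (1 - p) / visitors)

def difference_exceeds_margin(conv_a, n_a, conv_b, n_b, z=1.96):
    """True if the gap between two rates is bigger than its margin of error."""
    diff = abs(conv_b / n_b - conv_a / n_a)
    combined_se = math.sqrt(standard_error(conv_a, n_a) ** 2 +
                            standard_error(conv_b, n_b) ** 2)
    return diff > z * combined_se

# Hypothetical test with 1,000 visitors per variation and a 5% control rate.
big_win = difference_exceeds_margin(50, 1000, 75, 1000)    # 50% relative lift
small_win = difference_exceeds_margin(50, 1000, 55, 1000)  # 10% relative lift

print(f"50% lift clears the margin of error: {big_win}")
print(f"10% lift clears the margin of error: {small_win}")
```

With the same traffic, the large lift stands out from the noise while the small one does not, so the small one would need a longer test.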

So, an A/A test should demonstrate a worst case scenario, in which a variation has little chance to beat the control because it is identical. In fact, the A/A test may never reach statistical significance.

In our example above, the test has not reached statistical significance, and there is very little chance that it ever will. However, we see the Calibration Variation and Control draw together after fifteen days.

These identical pages took fifteen days to come together in this A/A Test.

This tells us that we should run our tests a minimum of 15 days to ensure we have a good sample set. Regardless of the chance to beat margin, a test should never run for less than a week, and two weeks is preferable.
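As a back-of-the-envelope planning aid, the rules of thumb in this article can be combined into a quick estimate. This sketch assumes traffic is split evenly across variations and that transactions arrive at a steady daily rate. The 200-transaction and 14-day defaults come from the guidance above; the function name is our own, and this is not a formal power calculation.

```python
import math

def estimated_test_days(daily_transactions, variations=2,
                        min_transactions_per_variation=200, min_days=14):
    """Rough number of days to run a test under the rules of thumb above."""
    # Total transactions needed, divided by how many arrive per day.
    days_for_sample = math.ceil(
        min_transactions_per_variation * variations / daily_transactions
    )
    # Never run shorter than the minimum, even with plenty of transactions.
    return max(days_for_sample, min_days)

print(estimated_test_days(daily_transactions=10))                 # low-volume site
print(estimated_test_days(daily_transactions=100))                # high-volume site
print(estimated_test_days(daily_transactions=10, variations=3))   # extra variation
```

Adding a variation spreads the same transactions across more treatments, so the estimate grows accordingly.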

## Setting up an A/A Test

The good thing about an A/A test is that there is no creative or development work to be done. When setting up an AB test, you program the AB testing software to change, hide or remove some part of the page. This is not necessary for an A/A test, by definition.

For an A/A test, the challenge is to choose the right page on which to run the test. Your A/A test page should have two characteristics:

1. Relatively high traffic. The more traffic you get to a page, the faster you’ll see alignment between the variations.
2. Visitors can buy or signup from the page. We want to calibrate our AB testing tool all the way through to the end goal.

For these reasons, we often set up A/A tests on the home page of a website.

You will also want to integrate your AB testing tool with your analytics package. It is possible for your AB testing tool to be set up wrong, yet both variations behave similarly. By pumping A/A test data into your analytics package, you can compare conversions and revenue reported by the testing tool to that reported by analytics. They should correlate.
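A simple way to make the tool-versus-analytics comparison routine is to compute the relative gap between the two reports. The sketch below is illustrative only: the 5% tolerance is a placeholder we picked, not a standard, and the numbers are hypothetical.

```python
def relative_discrepancy(tool_value, analytics_value):
    """Relative gap between the testing tool's number and the analytics number."""
    return abs(tool_value - analytics_value) / analytics_value

def reports_agree(tool_value, analytics_value, tolerance=0.05):
    """True if the two reports are within the chosen tolerance (assumed 5%)."""
    return relative_discrepancy(tool_value, analytics_value) <= tolerance

# Hypothetical transaction counts for one variation.
print(reports_agree(tool_value=102, analytics_value=100))  # small gap
print(reports_agree(tool_value=130, analytics_value=100))  # gap worth investigating
```

Pick a tolerance that reflects how closely your two systems usually track each other, and investigate when a variation falls outside it.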

### Can I Run an A/A Test at the Same Time as an AB Test?

Statistically, you can run an A/A test on a site which is running an AB test. If the tool is working well, then your visitors wouldn’t be significantly affected by the A/A test. You will be introducing additional error to your AB test, and should expect it to take longer to reach statistical significance.

And if the A/A test does not “even out” over time, you’ll have to throw out your AB test results.

You may also have to run your AB test past statistical significance while you wait for the A/A test to run its course. You don’t want to change anything at all during the A/A test.

## The Cost of Running an A/A Test

There is a cost of running an A/A test: Opportunity cost. The time and traffic you put toward an A/A test could be used for an AB test variation. You could be learning something valuable about your visitors.

The only times you should consider running an A/A test are:

1. You’ve just installed a new testing tool or changed the setup of your testing tool.
2. You find a difference between the data reported by your testing tool and that reported by analytics.

Running an A/A test should be a relatively rare occurrence.

There are two kinds of A/A test:

1. A “Pure” two variation test
2. An AB test with a “Calibration Variation”

Here are some of the advantages and disadvantages of these kinds of A/A tests.

### The Pure Two-Variation A/A Test

With this approach, you select a high-traffic page and set up a test in your AB testing tool. It will have the Control variation and a second variation with no changes.

Advantages: This test will complete in the shortest timeframe because all traffic is dedicated to the test.

### The Calibration Variation A/A Test

This approach involves adding what we call a “Calibration Variation” to the design of an AB test. This test will have a Control variation, one or more “B” variations that are being tested, and another variation with no changes from the Control. When the test is complete you will have learned something from the “B” variations and will also have “calibrated” the tool with an A/A test variation.

Advantages: You can do an A/A test without stopping your AB testing program.

Disadvantages: This approach is statistically tricky. The more variations you add to a test, the larger the margin of error you would expect. It will also drain traffic from the AB test variations, requiring the test to run longer to statistical significance.
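One way to see why extra variations are statistically tricky is to look at the chance of at least one false "winner" as comparisons pile up. This sketch assumes each comparison against the Control is independent and uses a 5% false-positive rate per comparison; both are simplifying assumptions of ours, and real variations that share a control are not fully independent.

```python
def chance_of_false_winner(comparisons, alpha=0.05):
    """Probability that at least one of several comparisons is a false positive,
    assuming independent comparisons with the same per-comparison alpha."""
    return 1 - (1 - alpha) ** comparisons

for k in (1, 2, 3, 5):
    print(f"{k} variation(s) vs. Control: {chance_of_false_winner(k):.1%}")
```

Each added variation, including a Calibration Variation, nudges this number up, which is part of why multi-variation tests need more data before you trust a result.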

AA Test Calibration Variation in an AB Test (Optimizely)

Unfortunately, in the test above, our AB test variation, “Under ‘Package’ CTAs”, isn’t outperforming the A/A test Calibration Variation.

## You Can Learn Something More From an A/A Test

One of the more powerful capabilities of AB testing tools is the ability to track a variety of visitor actions across the website. The major AB testing tools can track a number of actions that can tell you something about your visitors.

1. Which steps of your registration or purchase process caused them to abandon your site
2. How many visitors started to fill out a form
3. Which images visitors clicked on
4. Which navigation items were most frequently clicked

Go ahead and set up some of these minor actions, usually called ‘custom goals’, and then examine the behavior when the test has run its course.

## In Conclusion

Hopefully, if nothing else, you were amused a little throughout this article while learning a bit more about how to ensure a successful AB test. Yes, it requires patience, which I will be the first to admit I don’t have very much of. But it doesn’t mean you have to wait a year before you switch over to your winning variation.

You can always take your winner a month or two in and use it for PPC and continue testing and tweaking on your organic traffic. That way you get the best of both worlds: the assurance that you’re using your best possible option on your paid traffic and the time to do more tests on your free traffic.

And that, my friends, is AB testing success in a nutshell. Now go find some stuff to test and tools to test with!

## The Hero's Journey to an Amazing AB Testing Program

We are often guilty of writing about the AB testing process as if it were something you can jump into and start doing. We believe an AB testing program can keep you from making expensive design mistakes and find hidden revenue that your competitors are currently getting. It’s not an overnight switch, however. It takes some planning and resources.
It’s a journey not unlike that taken by many heroes throughout history and mythology. We invite you to join the ranks of heroic journeymen.
The journey looks something like this: You befriend a helpful stranger who gives you something magical. Soon, you are called to adventure by events beyond your control. When you act, you enter a strange new world and must understand it. You are set on a journey to right some wrong. Allies and helpers work with you to move past trials and tests. Your magical talisman helps you understand what to do. With patience you gather your reward and return home, the master of two worlds.
Consider this blog post a harbinger of the adventure that awaits you. Here are the things you’ll encounter on your journey.

## Executive Champion: Magical Helper

Every hero story involves some kind of “supernatural help”. In the story of the Minotaur, Ariadne gave Theseus the golden string to find his way out of the Minotaur’s labyrinth. In Star Wars, Obi-Wan Kenobi gave Luke Skywalker a lightsaber, showing him how to dodge blaster shots and face Darth Vader.
Each barrier in your path to an amazing AB testing program will have obstacles, and each obstacle will require a little magical help. This is the role of your executive champion. Your executive can impart to you special gifts, such as the Magical Elixir of Budget, the Red Tape Cleaver: Blessed Blade of Freedom, as well as the One Ring of Power to rule them all…but let’s not get carried away.
In my experience – and I’d like to hear yours – AB testing is not something that can be done “under the radar” until you have some successes. Use this post to guide you, prepare a presentation, and convince someone with pull to support your efforts.

## Traffic: The Call to Adventure

It is when we finally have a steady, reliable stream of visitors that we are called to our hero’s journey. Traffic is like the taxes imposed by the Sheriff of Nottingham. The hero just can’t stand by and watch injustice.
Likewise, you must feel uncomfortable about having thousands of visitors coming to your site – most of them paid for – and then seeing 99% of them turn and leave. This is injustice at its most heartbreaking and a clear conversion optimization problem. Many companies will just up the AdWords budget to grow revenue. Heroes fight for the common visitor.
AB testing is a statistical approach to gathering data and making decisions. There is a minimum number of transactions you will want each month in order for your AB tests to reach statistical significance. In general, you can test an idea a month with 300 monthly transactions.
To see if you have the traffic and conversions, use our simple Conversion Upside Calculator. It will tell you how quickly you would expect a positive ROI on your AB testing program.

## Analytics: Understanding the Unknown

Upon accepting the call to adventure, the hero will find herself in a strange new world. Here the rules she is so familiar with will no longer apply. She will see things differently. In the tale of The Frog Prince, a spoiled princess agrees to befriend a talking frog. In exchange the frog will retrieve her favorite golden ball from a deep pool. Upon making the deal, her world changes.
You, too, have lost your golden ball.
Most websites have Google Analytics, Adobe Analytics, Clicky, Mixpanel, or some other analytics package in place. I recommend that you not look at this as a scary forest full of strange tables, crooked graphs and unfathomable options. Instead, look at this as a constantly running focus group. It’s a collection of answers to your most pressing questions.
You should get to know your web analytics tools, but don’t get wrapped around the axle thinking you need to have fancy dashboards and weekly updates. That’s the work of admins.
Instead, sit down with a specific question you want to answer, and figure out how to drill deep into it.
“Where are visitors entering our website?”
“How long are they staying?”
“Which pages seem to cause the most trouble?”
“Are buyers of Product A acting differently from buyers of Product B?”
“How important is site search to our visitors?”
This gives you amazing cosmic powers to make decisions that otherwise would have been coin tosses.

## Hypothesis List: The Yellow Brick Road

One of our favorite hero stories is that of Dorothy and her journey through the Kingdom of Oz. This is a favorite because it has all of the elements of the hero’s journey. Our hero’s journey needs a path to follow. Just as Dorothy was told to follow the yellow brick road to Oz, our hypotheses are the yellow bricks in our path to AB testing success.
As you become more familiar with analytics, you will have many ideas sliding out of your head. Ideas are like the slippery fish in an overflowing barrel. You probably already have a lot of questions about how things are working on your site. You’ve probably collected dozens of ideas from well-meaning coworkers.
It can be overwhelming.
The magical helper for unbounded ideas is called the Hypothesis List. It is like Don Quixote’s Rocinante. It is your powerful steed on which you can rely to carry you through your journey to testing. By building out this Hypothesis List, you will eliminate ideas that aren’t testable, refine ideas that are, and rank them based on expected ROI.
If AB testing tells you which of your ideas are good ones, the Hypothesis List tells you which are most likely to be good ones.

### Ideas are not Hypotheses

A Hypothesis is an “educated guess”. To be a hypothesis, an idea must be somewhat educated: informed by data, supported by experience, or born from observation. Any idea that begins with “Wouldn’t it be cool if…” is probably not a hypothesis.
When you take an idea, and try to write it into the format of a hypothesis, you quickly realize the difference. Here’s the format of a hypothesis:

If we [make a change], we expect [a desirable result] as measured by [desired outcome].

The change is a modification to copy, layout, navigation, etc. that tests a hypothesis. It is insufficient to say “Get more clicks on Add to Cart”. You must state a specific change, such as, “Increase the size of the Add to Cart button”.
The result is a desired outcome. For most tests, the desired outcome is a bottom-line benefit.

• “increase transactions”
• “decrease abandonment”
• “increase phone calls”
• “increase visits to form page”

Soft results such as “increase engagement” are popular, but rarely correlate to more leads, sales or subscribers.
The outcome is usually the metric by which you will gauge the success of the test.

• Revenue per Visitor
• Form Abandonment Rate

### The ROI Prioritized Hypothesis List Spreadsheet

• List and categorize your ideas
• Rate and rank to find “Low Hanging Fruit”
• Place into buckets to identify key issues.

Many of your ideas will spawn several detailed hypotheses. Many ideas will simply die from lack of specificity.

### Too Many Ideas

It is not unusual to have more ideas than you can manage. Nonetheless, it makes sense to capture them all. A simple Excel spreadsheet does the trick for collecting, sorting and ranking.

### Too Few Ideas

It may be hard to believe, but you will run out of good hypotheses faster than you know. Plus, there are many hypotheses that will never be obvious to you and your company because as the old saying goes, “You can’t read the label from inside the bottle.”
This is where focus groups, online user testing sites, surveys, and feedback forms play an important role. Too many marketers treat input from these qualitative sources as gospel truth. This is a mistake. You’re working toward an AB testing process that will let you test this input.

### Ranking by Expected ROI

We recommend ranking your hypotheses so that the “low hanging fruit” bubbles up to the top. Our ROI Prioritized Hypothesis List ranks them based on four criteria, each rated on a scale of one to five:

1. Level of Effort: How difficult is this to test and implement?
2. Traffic Affected: How much traffic will this hypothesis affect, and how important is that traffic?
3. Proof: How much evidence did we see in analytics and other tools that this hypothesis really is a problem?
4. Impact: Based on my experience and knowledge, how big of an impact do I really think this hypothesis can drive?

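Once you’ve plugged in a value for each of these criteria for each hypothesis, add the scores for criteria 2, 3 and 4 (Traffic Affected, Proof and Impact) and subtract the score for criterion 1 (Level of Effort). That’s the weight of each hypothesis. The higher the weight, the lower the fruit hangs.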
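If you would rather script the ranking than maintain it in a spreadsheet, here is a minimal sketch of the same arithmetic. The hypothesis names and scores are made up for illustration, and the field names (`effort`, `traffic`, `proof`, `impact`) are simply our shorthand for the four criteria.

```javascript
// Hypothetical hypotheses; every score uses the one-to-five scale.
const hypotheses = [
  { name: "New footer links", effort: 2, traffic: 1, proof: 1, impact: 1 },
  { name: "Rewrite checkout flow", effort: 5, traffic: 5, proof: 3, impact: 5 },
  { name: "Bigger Add to Cart button", effort: 1, traffic: 4, proof: 3, impact: 3 },
];

// Weight = Traffic Affected + Proof + Impact - Level of Effort.
function weight(h) {
  return h.traffic + h.proof + h.impact - h.effort;
}

// Return a sorted copy with the highest weight (lowest-hanging fruit) first.
function rankByWeight(list) {
  return [...list].sort((a, b) => weight(b) - weight(a));
}

const ranked = rankByWeight(hypotheses);
for (const h of ranked) {
  console.log(`${weight(h)}  ${h.name}`);
}
```

With these sample scores, the button change floats to the top because it touches a lot of traffic and costs almost nothing to build.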

## The Scarecrow: JavaScript Developer

Every hero has helpers, allies and maybe even a sidekick. You are no exception. Dorothy had the Scarecrow, the first ally she met on the banana-colored road to Oz. The Scarecrow had an amazing brain, but didn’t really know it.
At the end of your journey, you are going to have complete control of your website. You won’t need to put requests into IT. You will have the power to change your website for every visitor because the change happens in the visitor’s browser. Each visitor gets the same website from the server. It’s the JavaScript that instantly transforms what they see into a test treatment, something different and measurable.
Your JavaScript developer will be your Scarecrow. This person must be comfortable with JavaScript, HTML, CSS, the browser DOM and cross-browser issues. This person will enable you to make changes that put your hypotheses to the test.

## The Tin Man: Designer

Dorothy was also befriended by the man who didn’t know he had a heart on the way to Oz.
You’ll want a designer who isn’t interested in redesigning your pages. All you need is a designer who can change portions of a page. It may be a design change as simple as a new image, or as complex as a new page layout.
Avoid designers who like to add their egos to your design.

## The Lion: Copywriter

I made the copywriter the lion in this journey because writing and defending bold headlines and copy takes courage. Like Dorothy’s friend, the cowardly lion, most copywriters have been taught to write business speak to online visitors. They have PTSD from having their copy bled on by executives. This won’t work in testing.
Your copywriter needs to be able to write for personas. He must be brave enough to create “corner copy”, or headlines that test the extremes of emotion, logic, spontaneity and deliberateness.
One of our biggest winning headlines was “Are you ready to stop lying? We can help.” It took bravery to write and defend this headline that delivered a 43% increase in leads for an addiction treatment center.

## Tests and Trials: QA Methodology

Every hero is tested. Villains and the universe put obstacles in place to test the hero’s resolve. You, too, will be tested when you realize that you don’t have just one website. You have ten or twenty. Or thirty.
Your website renders differently in each browser. Safari looks different from Chrome. Internet Explorer seems to be dancing to its own HTML tune.
Your website renders differently on smaller screens. Smartphones, phablets, tablets and 4K monitors squeeze and stretch elements until your HTML and CSS snap.
Your website renders differently based on your connection. Fast Wi-Fi is often not available to your mobile visitors. Your development team is probably testing on fast Wi-Fi.
The permutations of these issues mean that you can’t design for one site, even if you have responsive design.
Add to this JavaScript that moves things around on each of these different websites, and you have the potential to bring some of your visitors to an unusable website.
At Conversion Sciences, we go to the extreme of purchasing devices and computers that let us test the most common browsers, screen sizes and operating systems.

Conversion Sciences’ QA station

There are a number of sites that will provide simulators of a variety of devices, browsers and OSes. These have names like BrowserStack, Sauce Labs, and Litmus.
How do you know which of these you should be QAing on? Your analytics database, of course. Look it up.

## Magical Help: AB Testing Tools

As we said above, your executive champion can bestow on you supernatural aids to help you in your journey. This is not the only source of magical helpers. Aladdin found the magic lamp in a Cave of Wonders.
Your magic lamp and “genie” are your AB testing tool. These marvelous tools make our agency possible. AB Testing tools have magical powers.

• They inject JavaScript into our visitors’ browsers, allowing us to change our website without changing the backend.
• They split traffic for us, letting us isolate individual segments of visitors to test.
• They track revenue, leads and subscribers for us, so we know if our changes really generate more business for us.
• They provide the statistical analysis that tells us when we can declare a winner in our tests.
• They provide lovely graphs and reports.
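To give a feel for the kind of statistical analysis involved, here is a minimal sketch of a textbook two-proportion z-test. This is an assumption on our part for illustration only: each testing tool uses its own statistical engine, and the function name and sample numbers below are hypothetical.

```javascript
// Textbook two-proportion z-test using a pooled standard error.
// convA / convB are conversion counts; visitorsA / visitorsB are sample sizes.
function twoProportionZ(convA, visitorsA, convB, visitorsB) {
  const rateA = convA / visitorsA;
  const rateB = convB / visitorsB;
  const pooled = (convA + convB) / (visitorsA + visitorsB);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB)
  );
  return (rateB - rateA) / standardError;
}

// Hypothetical test: 200 of 10,000 convert on A, 260 of 10,000 on B.
const z = twoProportionZ(200, 10000, 260, 10000);

// |z| above 1.96 corresponds to the conventional two-sided 95% threshold.
const significant = Math.abs(z) > 1.96;
console.log(z.toFixed(2), significant);
```

A number crossing the threshold is not permission to stop a test early. Sample size and test duration still matter.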

The market leaders currently are Optimizely, Adobe Target, and Visual Website Optimizer (VWO). We have also used Convert.com, Maxymiser, Monetate and Marketizator to test websites.
We call these tools the “supreme court” of analytics. They control many of the variables that pollute data, and give us confidence that our changes will deliver more revenue, leads and subscribers to our clients.

## The Belly of the Whale: Patience

The story of “Jonah and the Whale” appears in the Bible and Quran. In short, God asks Jonah to go to the city of Nineveh. Jonah hems and haws. So God arranges for Jonah to be swallowed by a big fish. After Jonah spends three days apologizing, the whale spits him out, and he begins his God-ordained journey.
It turns out that the belly of the whale is a theme in many hero myths. Like them, you will find yourself waiting and wondering as your tests slowly gather data. Some will go more slowly than others. Pressures from executives will mount. You must persevere.
[pullquote]Do not rush to make decisions, even if it looks like your test is going to deliver you a winner. Let the process run its course.[/pullquote]

## The Reward: Revenue, Subscribers and Leads

In the end, you will have winning hypotheses and losing hypotheses. Because you won’t launch the losers and will push live the winners, you’ll begin to collect your boon, your reward, your Benjamins.
Be sure to show off a bit. Toot your own horn. Heroes come home to fanfare and celebration. Let your organization know what your AB testing program is doing for them and revel in the glow of success.

## Master of Two Worlds: AB Testing Program

Your journey has taken you from magical helpers to reward. Along the way you entered a new world, learned its rules, overcame tests and trials, and used magic to win the day.
You are now Master of Two Worlds: Master of the old world of pray marketing, and Master of the new world of data-driven marketing. This will be a career builder and a business boon.
This is everything you need to build your AB testing program. Conversion Sciences offers a ready-made turnkey conversion optimization program. Ask for a free strategy consultation and carve months off of your hero’s journey.

## AB Testing JavaScript: Great Power, Great Problems

The AB Testing JavaScript that powers tests is powerful, but can lead to many unintended consequences.
Conversion Sciences offers a pretty amazing ability to our clients: A completely “turnkey” testing service. By “turnkey” we mean that our clients don’t need to do anything to their website in order for us to analyze the site, design creative, develop code, QA, test, review and route traffic to winners.
Why is this? Don’t we need new pages designed? Interactions with IT? Release schedules? Sprints?
The reason we have this “phenomenal cosmic power” is that our AB testing tools take control of each visitor’s browser. The changes are made on the visitors’ devices, so the website doesn’t have to be modified.
Well, not until we’ve verified that a change brings in big money.
While this makes us feel all high and mighty, it comes with a big drawback.

## The Magic and Mania of JavaScript

All of the major browsers have a scripting engine built into them. It allows programs to be run inside the browser. The programming language used is called JavaScript. It’s JavaScript that makes websites interactive. Website developers use JavaScript to make text accordion when you click a heading. It is used to rotate images in a carousel.
Unfortunately, developers use JavaScript to do silly things, like parallax animations.

This unnecessary “parallax” motion may be reducing the conversion rates for this site.

And then there’s this.

Don’t use JavaScript animations just because you can.

Yes, JavaScript gives us the power to make our websites harder to use.

## JavaScript is Used in AB Testing Software

Our developers use JavaScript to modify a website when you visit it. The AB testing software “injects” our JavaScript into the browser when the page is loaded.
First the web page is loaded as is from the webserver. Some visitors will see only this default version of the website.
For some visitors, our JavaScript is then injected and executed. The AB Testing software determines who sees the default web page and who will see a variation of the page.
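How a tool decides who sees what is vendor-specific, but the general idea can be sketched with a deterministic hash of a visitor ID. Everything below is an assumption for illustration: the FNV-1a hash, the `assignVariation` name, and the ID format are our choices, not any particular tool's implementation.

```javascript
// FNV-1a 32-bit hash over the string's UTF-16 code units (illustrative choice).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Map a visitor to "control" or "variation". The same visitor and experiment
// always land in the same group, so a returning visitor sees the same page.
function assignVariation(visitorId, experimentId, variationShare = 0.5) {
  const bucket = fnv1a(`${experimentId}:${visitorId}`) % 10000; // 0..9999
  return bucket < variationShare * 10000 ? "variation" : "control";
}

// Hypothetical visitors split across a hypothetical experiment.
const counts = { control: 0, variation: 0 };
for (let i = 0; i < 1000; i++) {
  counts[assignVariation(`visitor-${i}`, "headline-test")] += 1;
}
console.log(counts);
```

Hashing the experiment ID together with the visitor ID means a visitor who lands in the variation for one test is not automatically in the variation for every other test.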

## Phenomenal Cosmic AB Testing Power

We can change almost anything about a page using AB Testing JavaScript.
We change the headline to something different.
We change the order of a rotating carousel, or slider.
We hide elements…

AB Testing flicker can be caused by simply removing elements.

and insert them as well.
We insert video.
We completely change the look of a website.

We completely change the look and feel of this site. The test showed the white site performed the same as the brown site.

## AB Testing JavaScript is a Step Toward Personalization

If we wanted, we could deliver a different website to every visitor who arrives. If you’re thinking “personalization” at this point, then you have an idea of where our industry is heading. [pullquote]AB testing produces the data that makes personalization truly work.[/pullquote] Here are my instagraphic notes from a presentation by Chris Gibbins at Conversion Conference 2016 on using AB testing and personalization.

Instagraphic shows why and how to use AB Testing for personalization.

## AB Testing Flicker, Flash, Blink and Ghosting

Unfortunately, JavaScript can introduce an error into our AB tests. Because we always load the website “as it is” on the server first, there is a possibility that the visitor will see the original for a fraction of a second before our changes get executed.
This has been called a “flash”, “flicker”, and a “blink”. It can have a significant effect on test results.

With AB Testing JavaScript, Flash is not cool.

The problem with this AB testing JavaScript flicker is that it won’t be there if a change is made permanent on the site. Our tests may say the new change generated more revenue, and we’ll change the website to make that change manifest. But there will be no flicker. This means there is another variable in our test.
Was it the new headline we tested or was it the flicker that made more people buy?

## The Human Eye is Drawn to Motion

Our eyes and brains have evolved to make motion important. When something moves, our eyes and brains work hard to figure out if the thing moving is edible, if it will eat us, or if we can mate with it. Technology has moved faster than evolution. Even though we’re looking at a website, where there is no immediate source of food, fear or fornication, our lizard brains continue to favor motion.
Imagine this scenario. We are testing the text on a button. That means that for some of our visitors, the button will change. Others will see the default button text. Then we’ll see who buys the most.

The changing element on this page draws your eye, and can make the page impossible to focus on.

If there is a flicker, flash or blink when the text changes, the button will be immediately more visible to those visitors who see it. More of them will see the call to action. This may make more of them consider buying than those who simply scrolled past. If this new treatment generates more revenue, we are left with the question, “Was it the text or was it the motion that created the lift in sales?”
We won’t know until we push the new text onto the server and do a post-rollout analysis to see the real lift. At this point, we may find that the text didn’t increase purchases. It’s a real bitch.
How many buttons have been changed because of flicker?

## AB Testing Software Tries to Eliminate Blinking

The AB testing software that we use works to eliminate this blinking and flashing. The issue is so important that Convert Insights has patented their method for eliminating “blink”.
AB testing software like Optimizely, Visual Website Optimizer, Adobe Target, and Marketizator loads asynchronously, meaning that it loads our changes as the page loads. This makes it easier for changes to be made before the visitor sees the page.

### How Convert Insights Eliminates “Blink”

“The first thing this snippet does is hide the body element for max. 1.2 seconds (this can be set), this prevents blinking of elements already visual on the site that load under 1.2 seconds (we have yet to get a client that loads faster than this). During the 1.2 seconds, the SmartInsert(R) technology search and replaces DOM-elements every couple of milliseconds and loops through the entire available DOM-elements in the browser of the client. After all elements are replaced the body hidden attribute is set to display the site again either at 1.2 seconds or when the signal is given that all elements have been replaced (or DOM-ready).
Everybody can see how this technology works by loading our Chrome Extension.”
— Dennis van der Heijden, Convert.com

## Eliminating Flash Flicker and Blink in AB Tests

In addition to this, our developers can do things that reduce and eliminate flicker and blink. Every test you do has different issues, and so a variety of tactics can be used to address them.

Don’t use a Tag Manager like Google Tag Manager to serve your AB testing JavaScript software tags. Add them to the page manually. Tag managers can counteract the asynchronous loading of the tool and delay when changes can be made.

### Make Changes with CSS

If the change can be made with the cascading style sheets (CSS), we favor making changes with CSS over using JavaScript. CSS can make changes to an element – like an image or font – that can be applied before an element displays.
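As a rough sketch of this tactic, the snippet below injects a variation's styles as a `<style>` element so the browser can apply them as each element renders. The function name, the `data-ab-variation` marker attribute, and the `#add-to-cart` selector are all hypothetical.

```javascript
// Inject a variation's CSS as a <style> element in <head>.
// `doc` defaults to the browser's document; passing it in keeps this testable.
function applyVariationCss(cssText, doc = document) {
  const style = doc.createElement("style");
  style.setAttribute("data-ab-variation", "true"); // hypothetical marker attribute
  style.textContent = cssText;
  doc.head.appendChild(style);
  return style;
}

// Hypothetical variation: enlarge the Add to Cart button.
const variationCss = "#add-to-cart { font-size: 1.5em; padding: 16px 32px; }";

// Only run in a browser, where `document` exists.
if (typeof document !== "undefined") {
  applyVariationCss(variationCss);
}
```

Because the rule is in place before the button is drawn, the visitor should never see the original size.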

### Modal Dialogs and Overlays

Modal dialogs usually don’t display until the visitor takes an action. They can be edited before they are shown without flashing.

### Use a Timer for DOM Changes

All of the images, headings, copy, widgets, and forms are stored in the browser in a big database called the DOM (Document Object Model). When JavaScript makes changes to the DOM, the content on the page changes. The DOM is slow-loading, as you can imagine.
Our developers will set a timer in JavaScript to check for when a changing element is loaded. By watching for our element to be loaded, we can execute a change to it before the DOM – and the page – is loaded.
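A minimal sketch of that timer approach follows. The function and option names (`whenElementReady`, `intervalMs`, `timeoutMs`) and the selector are hypothetical, and the 10 ms interval is just a plausible starting point, not a recommendation from any tool.

```javascript
// Poll for an element and apply a change as soon as it exists.
function whenElementReady(selector, applyChange, options = {}) {
  const doc = options.doc || document;
  const intervalMs = options.intervalMs || 10;
  const timeoutMs = options.timeoutMs || 2000;

  // Returns true once the element was found and changed.
  const tryApply = () => {
    const el = doc.querySelector(selector);
    if (el) applyChange(el);
    return Boolean(el);
  };

  if (tryApply()) return; // already parsed, no timer needed

  const startedAt = Date.now();
  const timer = setInterval(() => {
    const timedOut = Date.now() - startedAt > timeoutMs;
    // Stop polling once the change is applied or we give up.
    if (tryApply() || timedOut) clearInterval(timer);
  }, intervalMs);
}

// Browser-only usage with a hypothetical selector and button text.
if (typeof document !== "undefined") {
  whenElementReady("#add-to-cart", (button) => {
    button.textContent = "Buy Now";
  });
}
```

The timeout matters: without it, a selector that never matches would keep the timer running for the life of the page. Browsers also offer `MutationObserver`, which reacts to DOM insertions without polling and may be worth considering as an alternative.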

### Force the AB Testing Software to Load Our Changes Immediately

The AB testing software provides a synchronous loading mode. Optimizely and VWO use different approaches for this.

### Rethink the Test

Sometimes, we have to go back to the drawing board and design a test that doesn’t cause flash-inducing changes. We will refactor a test to eliminate items that cause a flash or flicker.

### Delay Display of the Content

We can delay the display of the content until the DOM is loaded and all elements have been changed. This causes a different kind of issue, however. The visitor sees a blank page or blank section for longer if they get the page with the change.
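Here is a minimal sketch of the hide-then-reveal idea, with a fail-safe so the page never stays blank if something goes wrong. The names are hypothetical, and the 1.2 second default simply borrows the figure from the Convert quote above. It is not that product's code.

```javascript
// Hide the page, run the changes, then reveal it. If the changes never
// signal completion, the fail-safe timer reveals the page anyway.
function hideUntilChanged(applyChanges, options = {}) {
  const doc = options.doc || document;
  const maxHideMs = options.maxHideMs || 1200;
  const root = doc.documentElement;

  let timer = null;
  let revealed = false;
  const reveal = () => {
    if (revealed) return;
    revealed = true;
    clearTimeout(timer);
    root.style.visibility = ""; // restore the stylesheet's own value
  };

  root.style.visibility = "hidden";
  timer = setTimeout(reveal, maxHideMs);

  // applyChanges receives `reveal` and should call it when all edits are done.
  applyChanges(reveal);
}

// Browser-only usage with a hypothetical headline change.
if (typeof document !== "undefined") {
  hideUntilChanged((done) => {
    document.addEventListener("DOMContentLoaded", () => {
      const headline = document.querySelector("h1");
      if (headline) headline.textContent = "A Bolder Headline";
      done();
    });
  });
}
```

For this to help, the script has to run in the `<head>`, before the browser paints the body. The trade-off described above still applies: the shorter the maximum hide time, the more likely some flicker sneaks through on slow connections.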

### Insert Placeholders Using Fast CSS

When inserting elements, we’ll use CSS to insert a placeholder, then use JavaScript to modify the element after the page has loaded. This reduces redrawing of the page when elements are inserted.

We created a blank box in CSS to minimize AB Testing Flash on this mobile website.

## Optimizing for Mobile

Mobile pages load more slowly. This is because visitors don’t always have a high-quality connection when visiting from the festival, or from inside the walls of a bank while standing in line. For mobile devices, flash can be an even bigger issue.
Fortunately, the tactics we use on the desktop also work on mobile websites. But don’t forget to QA your test treatments on 3G and 4G connections. You may find flicker and blink on mobile devices that didn’t appear on your desktop.

## Great JavaScript Developers are a Must

We spend a considerable amount of our time making sure our JavaScript and CSS changes look the way the “native” implementation would. It’s one of the things that makes testing hard. Our team has the experience to ensure your tests aren’t falling victim to flicker, flash, blink or ghosting.
If you’d like our team to take over the effort of developing your AB tests, contact us for a free consultation and an overview of our process.

## 4 Types of Useful AB Testing Tools You May Not Realize You Have

Talking about AB testing tools to the readers of this blog may be like preaching to the choir. But if you are new to this blog, or new to conversion optimization in general, you may be wondering which AB testing tools you can start using without making a huge investment. Fortunately, there are some AB testing tools out there that are either free or won’t cost you any additional money because you already have them – you just don’t know it yet.
In this post, we’re going to look at AB testing tools that you may have had all along and how to use them to optimize different aspects of your marketing strategy for conversions.

## Website AB Testing Tools

Since most people will want to do AB testing on their website, we’ll start with the tools you can use here. Did you know that if you have fewer than 50,000 unique visitors per month, you can use tools like Optimizely for simple AB testing for free? It’s a really simple tool to use. You just sign up for your free account and start up a website project.

Create a project with Optimizely for free when you have fewer than 50,000 visitors a month

Once you’ve entered your URL, you will be taken to a screen where you can immediately start creating a variation to test on your website.

Create a variation for your AB test

Once you’ve created your variation, you click the Start Experiment button and get the code you need to add to your site.

You will set up a goal so you know which variation leads to the most conversions.

Create a goal for your AB testing experiment

And then sit back and wait for visitors to come to your website to determine which variation gets the most conversions!
If you’re stuck for ideas on what to test on your landing page, you can try the common elements – headlines, subheads, images, calls to action, etc. – as well as some creative options listed in our landing page testing ideas post.
If you have more than 50,000 visitors each month, or would prefer to not add another tool to your toolkit, you can also look into Google Analytics Content Experiments. This allows you to conduct testing with your Google Analytics.
To start, you go to Behavior > Experiments for your website and click the Create Experiment button. Then you define the experiment you want to perform, starting with the goal of your test. You will use your Google Analytics goals to ultimately determine which variation of your AB test is the winner.

Create an experiment in Google Analytics

The key difference between Google Content Experiments and your average AB testing tool is that you have to create an additional page on your website that has the variation, whereas most AB testing tools (like Optimizely) will let you “edit” your page in their editor. So depending on what you want to change, it may be an easy or difficult process to create that second page.

Setting up variants in Google Analytics may require more steps than using a traditional AB testing tool like Optimizely.

Next, you will receive the code you need to insert on your website to start your experiment.

Finally, you will confirm that the code has been installed and you will start your experiment. Once your experiment is completed, Google Analytics will declare a definitive winner.

### Alternatives to AB Testing Your Own Website

An alternative to doing AB testing on your own website is to monitor the tests of others. There’s a free way to do it and a paid way. First, you can try to find your competitor’s website history in the Internet Archive. The downside of the free way is that you’ll have a lot of clicking to do.

The Internet Archive’s Wayback Machine lets you see how a website appeared on a particular date.

The other option, which I mentioned in my landing page testing ideas post, is Rival IQ. It allows you to see your competitor’s website history in a much easier-to-digest format.

Rival IQ is a paid tool for viewing website histories.

There’s a good chance that if you look through your competitors’ designs over the last couple of years, you’ll see subtle changes to headlines, images, colors, etc. that will relate to some AB testing. So instead of testing on your own, you can learn from their tests and pave your own unique way from there.

## Email Marketing AB Testing

If you are running email marketing campaigns, chances are you are using a popular email marketing software platform that likely has an AB testing component built in. MailChimp, for example, allows you to select an AB testing campaign and then allows you to test four aspects of your email campaign: the subject line (highly recommended), the from name, the content, and the send time.

AB testing options in MailChimp

You can choose a certain percentage of your recipients to test with and you can choose click rate, open rate, revenue, or other goals to judge the results of your testing. For example, if you chose to test subject lines, you would simply enter two subject lines for your recipients instead of one.

AB testing email subject lines in MailChimp

Or, if you were going to test two different types of newsletter content (such as a text-only email versus an HTML newsletter), you would get two email templates to send to your recipients.

Most email marketing software offers AB testing. At the bare minimum, you can at least test your subject lines. Some go further with the from name testing, email content testing, send time testing, and other forms of AB testing.
But considering that your subject line is the make or break point of whether someone opens your emails, it’s safe to say that so long as you have the option of testing that, you are good!

### Alternatives to AB Testing Your Own Email

There is a simple and free way to monitor your competition’s email and potentially see what headlines are working for them – just sign up for their emails. And be sure to at least open them. If you just ignore them, some will automate you out of their main line of emails. And that might mean you’ll miss out on some good subject lines!
Bonus tip: if your competition is using email marketing software like Infusionsoft, ActiveCampaign, or others that allow automations, you should open the emails and click on the links on occasion. You may get to see one of their automation funnels in action too!

## Blog Content AB Testing

Similar to email AB testing, blog content performance can rely heavily on one specific element: the title. If you choose a great blog title, people will click through and read your post from your blog’s homepage, search engines, social networks, and other sources. If you choose a bad blog title, then you may not get any clicks or readers at all.
That’s why AB testing your blog post titles can be a crucial key to the success of your content marketing strategy. If you have WordPress, Nelio A/B Testing is a tool you can use to do just that.
While it’s not free, it starts at \$29 a month for websites with 5,000 views per month. And it will allow you to test crucial elements of your blog, beyond just the headlines of your blog posts.

You can use Nelio A/B Testing to test WordPress blog content

For serious publishers, WordPress website owners, and WooCommerce site owners, this can be a powerful AB testing tool that can help you test a variety of things that other testing tools simply can’t.

Going back specifically to blog headlines, if you don’t want to test your own, there are ways of finding out the best headlines for a specific topic. The free way would be to use BuzzSumo – even without an account, you can usually get the top five to ten headlines about a specific topic based on social sharing.

Find the top headlines for a topic using BuzzSumo

If you don’t mind paying, a similarly priced tool that offers even more information that you can try or compare to BuzzSumo is Impactana. Both start at \$99 per month, but Impactana goes a step further by allowing you to see headlines that are not popular based on social shares alone, but also based on views, backlinks, comments, and other metrics (based on the type of content).

Impactana uses more metrics than BuzzSumo to show you the top headlines for topics

This can give you a strong idea of what headlines and content generate the most social buzz, search authority, traffic, and audience engagement.

## Social Media Ad Campaign AB Testing

While social media advertising is not free, AB testing for some social media ad platforms is because it’s built right in. Take Facebook, for example. You can create an Ad Campaign, an Ad Set that is targeted to a specific audience through specific placements, and multiple Ads under that set that help you test variations so you can determine which one drives the most conversions.

Next, you will define your Ad Set by choosing your target audience, ad placements (the desktop news feed, the mobile news feed, Instagram, etc.), and setting your budget.

Before you continue, you can save the name of your Ad Set.

Once you’re finished with your first ad creative, you will place your order. Once you do, that ad will go into review and you will get the option to create a similar ad.

This will allow you to create another Ad under the same Ad Campaign and Ad Set. You will get the option to modify the Ad Set if needed.

Otherwise, you can click Continue to create your next Ad variation. This will bring up the same Ad you created before so you can create your variation by changing one specific element, such as the image, originating page, the headline, the text, the call to action, the news feed description, or the display link.

The downside, as you can see above, is that you can’t name the individual ad variations. Therefore, unless you’ve changed the images between them, they all look the same in the Ad Manager view. Hence, to know which variation in terms of originating page, the headline, the text, the call to action, the news feed description, or the display link is working, you will have to click through to the winning variation and view the post to learn from it.

It’s easy to toggle off an ad if it’s not working out

The upside, however, is you can easily toggle off the losing variation of your ad based on its performance.
But overall, this is a great way to use AB testing in your Facebook Ad Campaigns. And it’s the simplest way as it doesn’t require you to use Power Editor, although if you are more comfortable in Power Editor, it can be done there as well.

Start by giving your campaign a name.

When you click to create another ad, you will be able to create an entirely new ad from scratch to test different URLs, headlines, descriptions, and images.

Once you have finished creating your campaign, you will get a clear view in your ads dashboard of how each of your ad variations is performing. This will allow you to learn quickly what works and what doesn’t, as well as allow you to toggle the losing variations off.

There are two free alternatives when researching paid advertisements. The first is Moat. Moat allows you to look at other companies’ display banner ads. While this isn’t specific to social media ads, it can help you learn about the images and ad copy that big brands use to drive paid traffic to their websites.

Use Moat to discover what your competitors are doing with their ad testing

If you notice particular imagery, copy, calls to action, button colors, or other elements have been used over and over again, you can assume that said elements have been doing well, considering you can almost guarantee big brands are testing the elements that they are paying for.

Between these two resources, you should learn a lot about how to create a successful ad campaign on social media and beyond. And they’re both better options than sitting around and refreshing your Facebook or LinkedIn news feed, hoping to see some ads from your competitors.

## In Conclusion

As you can see, between free and premium tools, there are various ways to A/B test many aspects of your online marketing beyond just your landing pages. Be sure to look at the different aspects of your online marketing strategy and think about the ways you should be testing it to improve your results today!

Kristi Hines is a freelance writer, blogger, and social media enthusiast. You can follow her latest tweets about business and marketing @kikolani or posts on Facebook @kristihinespage to stay informed.

## The AB Testing Process that Empowers Marketers

At Conversion Sciences, conversion optimization is so important that we think every site should benefit from it. We take every chance to teach businesses about it. The AB testing process is an important part of conversion optimization and is within reach of almost any business that prizes data-driven marketing.
Conversion rate optimization (CRO) is a systematic process for increasing the rate at which website visitors “convert” into leads and customers.
When visitors arrive at your site, you want them to take a certain action. You might want them to subscribe to your mailing list, purchase a product, call your company, make a donation, fill out your contact form, or any number of things. CRO seeks to maximize the percentage of visitors who perform this desired action.
And as traffic becomes more and more expensive to acquire, CRO continues to become a bigger and bigger deal for online businesses.

## When to Use Conversion Optimization

Conversion optimization has become an important part of digital marketing because available tools are becoming easy to use and inexpensive. Businesses generate more sales from the same traffic.
Businesses will turn to conversion optimization when:

• Their organic search traffic isn’t growing fast enough.
• They aren’t getting enough revenue from their email list.
• They want to compete with bigger companies on the Web.

Conversion optimization gives the business more control over its own destiny, increasing revenue and delighting more customers. [pullquote]AB testing is a powerful tool in the conversion optimization game.[/pullquote]

## Understanding Conversion Optimization

At a high level, a website’s basic revenue model looks like this:
Traffic x Conversion Rate = Revenue
Let’s say you are getting 100,000 visitors each month and converting 3% of them into customers. In order to double your revenue, you can either (A) double traffic by getting 100,000 extra visitors each month, or (B) increase your conversion rate from 3% to 6%.
As you can imagine, it is usually much cheaper to fix a few things on your site and increase the conversion rate than to increase traffic by 100,000 people. And this is based on a simple three variable formula.
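Here is a rough sketch of that arithmetic in Python. Strictly speaking, traffic times conversion rate gives you customers, so the sketch assumes a flat value per customer to get to revenue. The `VALUE_PER_CUSTOMER` figure is hypothetical and only there for illustration.

```python
def monthly_revenue(visitors, conversion_rate, value_per_customer):
    """Traffic x conversion rate = customers; customers x value = revenue."""
    return visitors * conversion_rate * value_per_customer

VALUE_PER_CUSTOMER = 50  # hypothetical dollar figure, not from the article

baseline = monthly_revenue(100_000, 0.03, VALUE_PER_CUSTOMER)
double_traffic = monthly_revenue(200_000, 0.03, VALUE_PER_CUSTOMER)     # option (A)
double_conversion = monthly_revenue(100_000, 0.06, VALUE_PER_CUSTOMER)  # option (B)

print(f"Baseline:            ${baseline:,.0f}")
print(f"Double the traffic:  ${double_traffic:,.0f}")
print(f"Double the rate:     ${double_conversion:,.0f}")
```

Both options land on the same revenue. The difference is what each one costs you to achieve.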
In reality, many websites and online enterprises consist of numerous steps in a complex chain of conversion funnels.

Conversion funnels can be complex, and it’s easy to lose someone along the way without CRO and AB testing. Image credit: Digital Marketer

Low conversion rates at any point in this lengthy funnel can gut revenue totals, and consequently, optimizing the conversion rate even slightly throughout the multiple stages of this funnel can result in a massive increase in overall revenue.
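To see why small gains at each stage add up, here is a minimal sketch that treats the funnel as a chain of stage-by-stage conversion rates multiplied together. The four stages and their rates are made up for illustration, as is the assumption that every stage improves by the same relative 10%.

```python
from math import prod

def funnel_conversion(stage_rates):
    """Overall conversion is the product of each stage's conversion rate."""
    return prod(stage_rates)

# Hypothetical funnel: ad click -> landing page -> cart -> purchase
baseline_rates = [0.10, 0.40, 0.50, 0.30]

# Assume each stage improves by a relative 10%
improved_rates = [rate * 1.10 for rate in baseline_rates]

baseline = funnel_conversion(baseline_rates)
improved = funnel_conversion(improved_rates)
relative_lift = improved / baseline - 1

print(f"Baseline: {baseline:.4f}  Improved: {improved:.4f}  Lift: {relative_lift:.1%}")
```

Four modest 10% improvements compound into a lift of roughly 46% for the funnel as a whole.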

## The AB Testing Process

The best data-driven marketers take a systematic approach to optimize a website’s overall conversion rate. And while that approach is fairly complex, the core process includes the following:

1. Data gathering & analysis
2. Hypothesizing & Prioritizing
3. Design & Run AB Tests
4. Interpretation & implementation

To summarize, you begin by gathering intelligence on your target audience. Next, you predict a series of website changes that will improve the overall conversion rate and then test those changes with a live audience. You run tests to confirm or refute your predictions. Finally, you implement changes that improve the conversion rate and discard those that don’t.

### 1. Data gathering & Analysis

The CRO process begins with gathering and analyzing both quantitative and qualitative data in order to achieve a well-rounded understanding of the website’s target market and how they are engaging with the site.
Data software, surveys, and usability tests are often used to collect and analyze this data.

### 2. Hypothesizing & Prioritizing

Once data has been collected, it’s time to hypothesize a series of site changes that will potentially increase conversions. Each idea for increasing conversion rate is called a hypothesis, or “educated guess”. These predictions are usually based off the data collected, “best practices”, and the personal experience of the data-driven marketer. The hypotheses you focus on will be based on your core testing strategy.
Changes are then made and compared against the original site in front of a live audience using AB testing.
Each hypothesis should have the form:

If we [describe change], we expect more visitors to [describe desired outcome] as measured by [metric].

For example:

If we add “Free shipping on all orders” to our product pages, we expect more visitors to purchase as measured by revenue per visit.
If we include the phone number in our headline, we expect more visitors to call as measured by web-based phone call rate.

Taking the time to write out each of your hypotheses ensures:

1. That you are testing something very specific.
2. That you are testing something that results in a desired outcome.
3. That you can measure the results.

Conversion Sciences enters our hypotheses into a spreadsheet and rates each on a scale of 1 to 5 for four categories:

1. Based on my experience, how big of an impact do I expect this hypothesis to have? (1-5 with 5 being a big impact)
2. How much traffic sees the page on which this hypothesis applies? (1-5 with 5 being a lot)
3. How much evidence do I have that this hypothesis is really a problem? (1-5 with 5 being most)
4. How hard is the test to implement? (1-5 with 1 being best)

Add 1, 2 and 3, then subtract 4 to get your hypothesis weight. Do this for each test to get a ranking and sort the spreadsheet by weight. Those hypotheses with the highest weighting will jump to the top. These are your “low-hanging fruit”, the first things you should test.
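If you would rather script this than use a spreadsheet, the weighting can be sketched as follows. The `Hypothesis` class, the hypothesis names, and the ratings are all hypothetical. Only the formula (impact + traffic + evidence − difficulty) comes from the steps above.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    impact: int      # 1-5, 5 = big expected impact
    traffic: int     # 1-5, 5 = a lot of traffic sees the page
    evidence: int    # 1-5, 5 = most evidence this is a real problem
    difficulty: int  # 1-5, 1 = easiest to implement

    @property
    def weight(self) -> int:
        # Add ratings 1, 2 and 3, then subtract rating 4
        return self.impact + self.traffic + self.evidence - self.difficulty

# Made-up ratings, for illustration only
hypotheses = [
    Hypothesis("Redesign checkout flow", impact=5, traffic=3, evidence=2, difficulty=5),
    Hypothesis("Add free shipping message to product pages", impact=4, traffic=5, evidence=3, difficulty=2),
    Hypothesis("Put phone number in headline", impact=3, traffic=4, evidence=4, difficulty=2),
]

# Highest weight first: these are the low-hanging fruit
ranked = sorted(hypotheses, key=lambda h: h.weight, reverse=True)
for h in ranked:
    print(h.weight, h.name)
```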

### 3. Designing & Running AB Tests

This is where the new tools come into play. AB testing tools offer ways to change your website for some visitors, while leaving it the same for others. The tools allow you to do this without changing your website because the changes are made in the visitors’ browsers.
One visitor will get a page (A) as it is, then the next will get a version of the page (B) with your change. The tools manage this so that about the same number of visitors see each page. These AB testing tools then report on which version generated the most revenue and tell you how much more revenue you would expect to get.
If the original generates more revenue — we call it the Control — you can be assured your change would have hurt your site. If the modified version generates more revenue — we call it a Treatment — then you’ve found an improvement and can code it into the site.
AB testing tools have a learning curve. Most offer “WYSIWYG” interfaces for changing elements. Some tests will require that you have a resource familiar with JavaScript, HTML, and CSS.
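To make the traffic split concrete, here is one way a roughly even, repeatable split could be done. This is a sketch under our own assumptions, not a description of how any particular AB testing tool works: it hashes a visitor ID so the same visitor always sees the same version. The function name, experiment label, and visitor IDs are hypothetical.

```python
import hashlib

def assign_variation(visitor_id: str, experiment: str = "homepage-test") -> str:
    """Deterministically bucket a visitor into 'A' (Control) or 'B' (Treatment)."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Simulate 10,000 visitors and count how many land in each bucket
counts = {"A": 0, "B": 0}
for n in range(10_000):
    counts[assign_variation(f"visitor-{n}")] += 1

print(counts)  # the two counts should be close to each other
```

Hashing rather than strictly alternating means a returning visitor does not flip between versions, which would muddy the results.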

### 4. Interpretation & Implementation

After running a series of AB tests, results are analyzed and interpreted and additional tests may be run. The goal is to identify a slate of changes that yield statistically significant improvements in the site’s overall conversion rate.
Verified improvements are implemented as permanent changes to the website, and then new hypotheses may be made and tested until the target conversion rate is achieved.
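Your testing tool will normally do the significance math for you, but it helps to see what a basic check looks like. The sketch below uses a two-proportion z-test, which is one common approach and not necessarily the method your tool uses. The visitor and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, visitors_a, conv_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the assumption that A and B truly perform the same
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: Control converts 300 of 10,000; Treatment converts 360 of 10,000
z, p_value = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below 0.05 is a widely used convention for calling a result significant, but the threshold is your choice, and a result that clears it can still be a fluke.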

## What You Can Do With an AB Test

When evaluating page elements to test and improve, CRO specialists typically start with “best practices”. Best practices are techniques that tend to work for many websites. Testing page elements based on best practices will often improve the site’s conversion rate immediately, and they provide a good baseline from which the data-driven marketer can plan and implement more tailored tests.
It’s important to note here that “best practices” do not work for every site. In fact, here’s an entire blog post worth of case studies where doing the exact opposite resulted in massive wins for various businesses. Also, mobile optimization is so new that there really aren’t any best practices. This is why AB testing is so important.
That said, it’s good to have a basic understanding of best practices when attempting to optimize conversions on a website.

### 1. Develop an Effective Value Proposition

Your website must convey a value proposition that gives the visitor a reason to stay and explore. The value proposition is constructed out of copy and images.
The value proposition doesn’t have to be unique, but it must describe the reason you occupy space on the Web. It should include who your offering is targeted at and why they should care about it.
Your value proposition may also include pricing, delivery, return policy, and what makes you unique.
Each of your visitors will come in one of four modes: Competitive, Methodical, Humanist, or Spontaneous.

1. COMPETITIVE visitors are looking for information that will make them better, smarter or more cutting-edge. Use benefit statements and payoffs in your headings to draw them into your content.
2. METHODICALS like data and details. Include specifics and proof in your writing to connect with them.
3. HUMANISTS want information that supports their relationships. They will relate to your writing if you share the human element in your topic.
4. SPONTANEOUS visitors are the least patient. They need to know what’s in it for them and may not read your entire story. Provide short headings for them to scan so that they can get to the points that are important to them.

Your goal is to write copy directed at whichever of these groups visit your site.
In addition to understanding what motivates your audience to buy, it’s also important to understand what stands in the way of them making that decision.

Transactional buyers are competitive bargain hunters whose greatest fear is paying a dollar more than they need to. They aren’t looking for “cheap”. They are looking for the greatest possible value they can find for the lowest possible price.
In order to appeal to transactional buyers, your copy should be focused on features, price, and savings.
Relational buyers, on the other hand, are focused entirely on quality. Their greatest fear is buying the wrong thing, and they are more than happy to seek out expert help and pay a premium in order to assure they receive a quality product.
In order to appeal to relational buyers, your copy should be benefits focused with educational content, copious social proof, ratings, and reviews to demonstrate that selecting your product is a guaranteed win.
Things to test:

• The language on links and buttons.
• Wording of discounts and specials.
• Description of return policy and shipping policy.
• Adding bullets and highlights to copy.

Remember that nobody cares about your business or products until they’ve found what they are looking for. People only care about how your business can solve THEIR problems. Remember to keep the copy and messaging consistently focused on the customer on every platform and at every point of interaction. Personal stories and intros have their place and can be quite effective, but again, only when the context is customer benefit.
For further reading, check out these great value proposition examples and the case studies on their implementation.

Design is important to your conversion rate but not for the reasons most people think. Your site’s design should be focused on two things:

2. Helping the visitor choose to act or to take the next step in their journey

Visitors should have an idea of what your landing page is all about within five seconds of arriving. They should then be taken through a streamlined journey rather than needing to browse around and find their own way.
A simpler, more intuitive, and more straightforward site design is a great place to start.
Things to test:

• Making links and buttons with calls-to-action stand out.
• Moving important information, such as free shipping offers, “above the fold.”
• Swapping columns.
• Increasing the size of images.
• Increasing the font-size of important information like price and stock.

These are some places to start.

### 3. Focus on Entry Pages

A good conversion funnel isn’t just a webpage. It’s a combination of ads, search results, blog posts, email marketing, social media, webinars, and much more. Each of your campaigns will bring visitors to your site on different pages. Start on these pages and look for ways to help visitors choose the next step.

For ecommerce sites, product pages are often the entry page for search traffic.

For many sites, the most common entry pages will be:

• Category and Product Pages for ecommerce sites
• Post pages for blogs
• Signup pages for webinars, reports and other offers

The Behavior > Site Content > Landing Pages report in Google Analytics will tell you which pages are your most frequently visited entry pages.

Google Analytics offers a list of your entry pages, which they call “Landing Pages”

If you can get more visitors into your site from these entry pages, reducing your bounce rate, you will have more opportunities to win prospects and customers.

## Conclusion

AB testing is a tool that is within reach of more and more marketers. It is powerful because it:

• Helps you understand what your visitors are really looking for.
• Disciplines you to make smaller stepwise changes to your site.

To continue your journey into the world of CRO, check out Conversion Sciences Free CRO Training.
If you have any questions or if you noticed I left out some key info, don’t hesitate to let me know in the comments. And of course, don’t forget to share this post with any of your colleagues who could benefit from an introduction to AB testing.

Jacob McMillen combines professional copywriting with clean web design, giving small-to-midsize businesses the high-converting websites they need to make meaningful online profits.

## 4 Mobile AB Testing Ideas that Worked for Our Clients

Joel Harvey, Conversion Scientist™

This post is excerpted from Designing from the Mobile Web 2.0 by Joel Harvey. Joel discusses what he’s been learning through the tests that Conversion Sciences has done for the mobile web. You can begin to take these as new hypotheses in your business so that your mobile devices and your mobile traffic are converting at higher and higher levels.
I’m sure mobile’s on the top of your mind.
We have tested hundreds of mobile hypotheses over the last couple of years and we’ve learned a lot. There’s still a lot that we don’t know.
We’re going to share some of the key things that we’ve learned along the way with you and show you what you need to be focusing on to start driving wins in your mobile traffic.

## What Mobile AB Tests Can You Try Right Now?

What are some of the things you can AB test right now to start driving higher mobile conversions?
Here are the hypotheses we’ve tested and things that we’ve seen give us increases across dozens of mobile websites. Some of them are very site specific like the offer link and the copy. This is one thing that yields big results in any business and there’s no one rule of thumb that we can share.
However, there are some things you can do with layout and with execution that, with the right words and the right elements, work across many mobile sites.

1. Sticky Headers and Footers
2. Persistent Calls to Action
3. Footer Content
4. Optimizing for the Right Goals

### 1. Sticky Headers and Footers

For anybody who doesn’t know, a “sticky” header is a “bar” of content that persists at the top of the screen. It sticks even when the visitor scrolls.
By locking this on the screen as people scroll, we always keep it in their view. Just making the existing header sticky, we saw a 9% increase in phone calls for this site. We’ve seen this on ecommerce sites, increasing form fill completion and purchases.
For a follow-up test, we simply added a call to action. The text delivered a relatively large increase in phone calls on top of the increase from adding the sticky header.
There are a number of things you can put into a header.

• Calls to action
• Search boxes
• The company logo
• Shipping policy

Over time this has continued to evolve and conversions have increased.
In our experience, sticky footers, or bars that persist across the bottom of the screen, may work better for your audience. We recommend that you try them both.

### 2. Persistent Calls to Action

Persistent calls to action or parachutes are offers that remain on the screen as the visitor scrolls. These are usually found as the footer.
It’s very similar to a sticky header.

The site at left enjoyed a 20% increase in registrations from a sticky header. The site at right saw a 45% increase from top and bottom stickies.

Since we found that these parachutes work well as top “stickies” and bottom “stickies”, we wondered if we could do both. Our original thought was that it took up too much screen real estate, that it was too much to keep in front of the visitor. However, we found that in most cases you can have both and they’re very complementary to one another.
We recommend testing to find the right call to action first, and then testing a persistent, or “sticky” call to action.
So why do we call it a parachute?
We call it a parachute because we know that mobile visitors will scroll much farther down a page and much faster than desktop visitors. This is an interesting fact: Mobile visitors are more likely to see all of the content on your page than a desktop user, especially if you have a lot of content and it’s a long page. You can see this in your own scroll maps and in session recordings.
The problem is that they do it fast and they sometimes get lost. Having a parachute someplace for them to parachute out of this purgatory they’ve gotten themselves into is proving to be very helpful.

The Safari mobile browser doesn’t necessarily do a fantastic job of identifying phone numbers and turning them into click-to-call, or tel-links.
Anywhere where you have a phone number on your site, make sure you’ve explicitly written the tel-link around that phone number.
In the example above, we found this site didn’t have click-to-call. We tested adding the tel-link functionality explicitly, and we saw a 20% increase just in clicks to call. It makes sense. You’re on your mobile device. You don’t have to put it down, write the number down, and then type it in.
There are phones built into these mobile devices, aren’t there?
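If your pages are generated by a script or template, wrapping each phone number in an explicit tel-link can be sketched like this. The helper name, the example number, and the default country code of 1 are our own assumptions for illustration. The `tel:` link format itself is the standard one that mobile browsers recognize.

```python
import re

def tel_link(display_number: str, country_code: str = "1") -> str:
    """Wrap a phone number in an explicit click-to-call (tel:) link."""
    digits = re.sub(r"\D", "", display_number)  # keep digits only for the href
    return f'<a href="tel:+{country_code}{digits}">{display_number}</a>'

print(tel_link("(512) 555-0100"))
# <a href="tel:+15125550100">(512) 555-0100</a>
```

The visitor still sees the number formatted the way you wrote it, while the link carries the digits the phone needs to dial.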

### 3. Footer Content

This issue dovetails in with what we were just saying about how far a mobile visitor will scroll.
Mobile visitors are much more likely to see the footer of your site. They bounce on the bottom after fast scrolling. On the desktop the footer of the site is this graveyard of crap that we throw into the bottom. It’s the detritus of the site. It usually includes the privacy policy and some legalese, things that are not really going to compel anybody to take action.
Yet, the visitors that scroll to the bottom of your page may be very interested. What are you telling them? “Copyright 2015?” This is not really a deal closer.
We saw on our scroll map reports that about 50% of mobile visitors reach the bottom of these pages. Page footers are rarely inspiring.

Why not do something a little bit more compelling? We changed this to reiterate the value proposition. Why should someone call, and how can this company help?
We saw about an 8% increase in calls with this very simple change.

#### The Footer on Mobile is not a Graveyard

We’ll take 8% lifts all day and night.
Mobile will make up between 40% and 60% of almost anybody’s traffic at this point in time. We see some outliers, but more or less that’s the range. So let’s just say it’s 50%.
An 8% increase on mobile is a 4% increase on your entire business. There’s not many things that you can do to magically get a 4% increase for your entire business. This is an example of one of those things you can do.
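The back-of-the-envelope math looks like this. It is a simplification that assumes mobile’s share of conversions matches its share of traffic, which may not hold for your site.

```python
def overall_lift(segment_lift, segment_share):
    """Lift for the whole business when only one segment improves."""
    return segment_lift * segment_share

mobile_lift = 0.08  # the 8% increase in calls from the footer change

# Mobile share of traffic: the low end, the midpoint, and the high end of the range
for mobile_share in (0.40, 0.50, 0.60):
    lift = overall_lift(mobile_lift, mobile_share)
    print(f"Mobile at {mobile_share:.0%} of traffic -> {lift:.1%} overall lift")
```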
It’s the beauty of conversion optimization. It’s a huge lever to grow your business. Something as simple as making an offer in the footer to reiterate your value proposition can have a meaningful impact on your business.
Get people to take action once they get to the bottom of that page.

### 4. Optimize for the Right Goals

What mobile goals are you optimizing for? Are they the right ones?
Conversion for your mobile visitors should look different than conversion for your desktop visitors. One of the ways to really leverage the growth in mobile is to understand and accept that.
One of our clients, eSigns.com, allows visitors to design vinyl banners and other signs for their event, yard, etc. It’s a phenomenal tool and a great site. Unfortunately, it is a Flash-based tool, and it doesn’t work on iOS or Android devices. Even if it did work, visitors are less likely to design a banner or a sign on their small-screen mobile device.
Essentially we had this mobile dead end. Mobile traffic simply could not convert. eSigns had become resigned to ignoring their mobile traffic.
But we said, “What could we do with this traffic?” We chose to focus on getting email addresses instead.
We shifted our objective.

This mobile entrance overlay enticed mobile visitors to give us their email address.

Instead of saying, “We need to get more conversions and more immediate revenue out of mobile visitors,” we said, “Let’s just focus on getting your email address”. We could tell from upstream indicators that these people are really just kicking tires.
Remember, we’re addressing a consideration set. We decided to do something to remain in their consideration set. Ask them to give us their name and email address. In return, we offered to email them a link to this page so they can come back to it later, on their desktop, when they are ready to do the design.
The results are that 5.3% of the visitors gave us their name and email address. Not bad, especially for an unoptimized experience. Each person who completed this form received an email with a link to that page, and the link was specially coded so we could track it and track those visitors.
We found that 26% of the recipients of this email clicked back to the site. Now this didn’t really result in an immediate boost in revenue from those clicks, but it lit a fire under their email list growth rate. They gathered almost 1,400 new email addresses in the first month of this test.
This is a business like many of your businesses that ultimately live and die by their email list. Building your email list gives you control over your own destiny. The marginal cost of delivery is almost zero. Not just for every single recipient of your email, but for every email. It’s so cheap to create new customers and satisfy repeat customers.
To get an idea of just how valuable these new email addresses were, we calculated the value of every recipient of eSigns’ emails. Whenever we’re evaluating an email, we’re fond of a metric called revenue per recipient (RPR).

Collecting emails from the mobile dead end delivered an estimated \$200,000 in additional annual revenue.

For eSigns, we calculated the average annual revenue per recipient to be \$11. With an average of 1,400 new email addresses per month, we’re generating almost \$200,000 in additional annual income. Instead of ignoring mobile traffic, we took a longer-term approach by getting their email address, and using email to convert them.
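The estimate can be reproduced in a few lines. This sketch takes the simplest reading of the numbers above: every address collected over twelve months is counted at the full annual revenue per recipient. The variable names are ours.

```python
new_addresses_per_month = 1_400
annual_revenue_per_recipient = 11  # dollars (RPR over a year)

addresses_per_year = new_addresses_per_month * 12
added_annual_revenue = addresses_per_year * annual_revenue_per_recipient

print(f"{addresses_per_year:,} new addresses per year")   # 16,800
print(f"${added_annual_revenue:,} in added annual revenue")  # $184,800
```

That comes to \$184,800, which is where the “almost \$200,000” figure comes from.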
Take a hard look at what your mobile visitors will be willing to do to begin a conversation with you.

## Do This Now

These are our main premises. We hope you agree.

• That mobile is one of the fastest growing channels for your online business.
• That mobile doesn’t have to convert at levels lower than desktop.
• That mobile visitors want and need a unique mobile experience.
• That there are no best practices and that you should test.

If you buy into our beliefs, these are the things you should try right now on your mobile traffic.

• Start testing those persistent calls to action – those parachutes. Test placement, test color, test language, and test the page you’re sending them to.
• Take a hard look at the footer content on your mobile device right now and ask yourself, “If this is the last thing that someone sees on this page, am I compelling them to take action?” If the answer is no, the next question is, “What could we do here to compel them to take action?”
• Ask yourself, “Are we optimizing for the right goals on mobile? Do we really believe that doing X and Y, which is what we want people to do on our desktop site, is the best use of our optimization efforts on mobile?” Then test some alternative goals.

Once you’ve nailed your mobile experience, you can start having fun with your mobile site, like this add-to-cart animation.

You can have fun when you’ve delivered the mobile experience your visitors want.

## AB Testing Results are Half-Filled with Losers, and That’s a Good Thing

The law of unintended consequences states that every human endeavor will generate some result that was not, and could not have been, foreseen. The law applies to hypothesis testing as well.

In fact, Brian Cugelman introduced me to an entire spectrum of outcomes that is helpful when evaluating AB testing results. Brian was talking about unleashing chemicals in the brain, and I’m applying his model to AB testing results. See my complete notes on his ConversionXL Live presentation below.

## Understanding the AB Testing Results Map

In any test we conduct, we are trying everything we can to drive to a desired outcome. Unfortunately, we don’t always achieve the outcomes we want or intend. For any test, our results lie along two spectrums that define four general quadrants.

Map of possible outcomes from hypotheses.

On one axis we ask, “Was the outcome as we intended, or was there an unintended result?” On the other axis we ask, “Was it a negative or positive outcome?”

While most of our testing seeks to achieve the quadrant defined by positive, intended outcomes, each of these quadrants gives us an opportunity to move our conversion optimization program a step forward.

## I. Pop the Champagne, We’ve Got a New Control

With every test, we seek to “beat” the existing control, the page or experience that is currently performing the best of all treatments we’ve tried. When our intended outcome is a positive outcome, everyone is all smiles and dancing. It’s the most fun we have in this job.

In general, we want our test outcomes to fall into this quadrant (quadrant I), but not exclusively. There is much to be learned from the other three quadrants.

## II. Testing to Lose

Under what circumstances would we actually run an AB test intending to see a negative outcome? That is the question of Quadrant II. A great example of this is adding “Captcha” to a form.

CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”. We believe it should be called, “Get Our Prospects to Do Our Spam Management For Us”, or “GOPDOSMFU”. Businesses don’t like to get spam. It clogs their prospect inboxes, wastes the time of sales people and clouds their analytics.

However, we don’t believe that the answer is to make our potential customers take an IQ test before submitting a form.

These tools inevitably reduce form completion rates, and not just for spam bots.

CAPTCHAs reduce spam, but at what cost?

So, if a business wants to add Captcha to a form, we recommend understanding the hidden costs of doing so. We’ll design a test with and without the Captcha, fully expecting a negative outcome. The goal is to understand how big the negative impact is. Usually, it’s too big.

In other situations, a design feature that is brand oriented may be proposed. Often a design decision that enhances the company brand will have a negative impact on conversion and revenue. Nonetheless, we will test to see how big the negative impact is. If it’s reasonable, then the loss of revenue is seen as a marketing expense. In other words, we expect long-term revenue from a stronger brand message to offset the loss of short-term revenue.

These tests are like insurance policies. We do them to understand the cost of decisions that fall outside of our narrow focus on website results. The question is not, “Is the outcome negative?” The question is, “How negative is the outcome?”

## III. Losers Rule Statistically

Linus Pauling once said, “You can’t have good ideas without having lots of ideas.” What is implied in this statement is that most ideas are crap. Just because we call them test hypotheses doesn’t mean that they are any more valuable than rolls of the dice.

When we start a conversion optimization process, we generate a lot of ideas. Since we’re brilliant, experienced, and wear lab coats, we brag that only half of these ideas will be losers for any client. Fully half won’t increase the performance of the site, and many will make things worse.

Most of these fall into the quadrant of unintended negative outcomes. The control has won. Record what we learned and move on.

There is a lot to be learned from failed tests. Note that we call them “inconclusive” tests as this sounds better than “failed”.

If the losing treatment reduced conversion and revenue, then you learn something about what your visitors don’t like.

Just as with a successful test, you must ask the question, “Why?”.

Why didn’t they like our new background video? Was it offensive? Did it load too slowly? Did it distract them from our message?

## IV. That Wasn’t Expected, But We’ll Take the Credit

Automatic.com was seeking a very specific outcome when we designed a new home page for them: more sales of the adapter and app that connects a smartphone to the car’s electronic brain. The redesign did achieve that goal. However, there was another unintended result. There was an increase in the number of people buying multiple adapters rather than just one.

We weren’t testing to increase average order value in this case. It happened nonetheless. We might have missed it if we didn’t instinctively calculate average order value when looking at the data. Other unintended consequences may be harder to find.

This outcome usually spawns new hypotheses. What was it about our new home page design that made more buyers decide to get an adapter for all of their cars? Did we discover a new segment, the segment of visitors that have more than one car?

These questions all beg for more research and quite possibly more testing.

## When Outcomes are Mixed

There is rarely one answer for any test we perform. Because we have to create statistically valid sample sizes, we throw together some very different groups of visitors. For example, we regularly see a difference in conversion rates between visitors using the Safari browser and those using Firefox. On mobile, we see different results when we look only at visitors coming on an Android than when we look at those using Apple’s iOS.

Android users liked this test but iPhone users really did not.

In short, you need to spend some time looking at your test results to ensure that you don’t have offsetting outcomes.
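Here is a small sketch of what offsetting outcomes can look like once you break results out by segment. All of the visitor and conversion counts are invented for illustration. They are chosen so that the overall result looks flat even though each segment moved sharply.

```python
# Hypothetical results per segment: (visitors, conversions) for each arm
results = {
    "Android": {"control": (5_000, 150), "treatment": (5_000, 190)},
    "iOS":     {"control": (5_000, 200), "treatment": (5_000, 160)},
}

def relative_lift(segment):
    """Relative change in conversion rate of the treatment vs. the control."""
    control_visitors, control_conversions = segment["control"]
    treatment_visitors, treatment_conversions = segment["treatment"]
    control_rate = control_conversions / control_visitors
    treatment_rate = treatment_conversions / treatment_visitors
    return treatment_rate / control_rate - 1

for name, segment in results.items():
    print(f"{name}: {relative_lift(segment):+.1%}")

# Combine all segments into one overall result
total = {
    arm: tuple(sum(seg[arm][i] for seg in results.values()) for i in range(2))
    for arm in ("control", "treatment")
}
overall = relative_lift(total)
print(f"Overall: {overall:+.1%}")
```

Looking only at the overall number, you would call this test a wash and miss both the Android win and the iOS loss.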

## The Motivational Chemistry and the Science of Persuasion

Here are my notes from Brian Cugelman’s presentation that inspired this approach to AB testing results. He deals a lot with the science of persuasion.

My favorite conclusions are:

“You will get more mileage from ANTICIPATION than from actual rewards.”

“Flattery will get you everywhere.”

I hope this infographic generates some dopamine for you, and your newfound intelligence will produce serotonin during your next social engagement.

Motivational Chemistry infodoodle by Brian Cugelman at ConversionXL Live