The Proven AB Testing Framework Used By CRO Professionals
There is no shortage of AB testing tips, tricks, and references to statistical significance. Here is a proven AB testing framework that guides you to consistent, repeatable results.
How do conversion optimization professionals get consistent performance from their AB testing programs?
If you are looking for a proven framework you can use to approach any website and methodically derive revenue-boosting insights, then you will love today’s infographic.
This is the AB testing framework industry professionals use to increase revenue for multi-million dollar clients:
The Purpose of an AB Testing Framework
It’s easy to make mistakes when AB testing. Testing requires discipline, and discipline requires guiding processes that enforce some level of rigor.
This framework ensures that you, the marketer-experimenter, keep some key principles in mind as you explore your website for increased revenue, leads and subscriptions.
- Don’t base decisions on bad data.
- Create valid hypotheses.
- Design tests that will make a difference.
- Design tests that deliver good data.
- Interpret the test data accurately.
- Always ask, “Why?”
This is the framework CRO professionals use to stay on their best game.
1. Evaluate Existing Data
Here are the first two questions you need to ask when approaching a new site.
- What data is currently available?
- How reliable is this data?
In some cases, you will have a lot to work with in evaluating a new site. Your efforts will be primarily focused on going through existing data and pulling out actionable insights for your test hypotheses.
In other cases, you might not have much to work with, or the existing data may be inaccurate, so you’ll need to spend some time setting up new tools for targeted data collection.
The data audit identifies data that is available to the data scientist. It typically includes:
- Behavioral analytics package
- Existing customer data, such as sales
- Marketing studies completed
- UX studies completed
- Product reviews
- Live chat transcripts
- Customer surveys completed
All of these data sources are helpful in developing a rich list of hypotheses for testing.
Since our analytics database is the all-important central clearinghouse for our website, we want to be sure that it records everything we need, and records it accurately.
Often, we forget to track some very important things.
- Popover windows are invisible to most analytics packages without some special code.
- Links away from the site are not tracked. It’s important to know where your leaks are.
- Tabbed content lets the visitor get specifics about products and is often not tracked.
- Third-party websites, such as shopping carts, can break session tracking without special attention.
- Interactions with off-site content are often masked through the use of iframes.
These issues must be addressed in our audit.
It is important that as much data as possible is collected in our analytics database. We never know what questions we will have.
For post-test analysis (see below), we want to be sure our AB testing tool is writing information to the analytics database so we can recreate the test results there. This allows us to drill into the data and learn more about test subjects’ behaviors. This data is typically not available in our testing tools.
Finally, we want to be sure that the data we’re collecting is accurate. For example, if our site is an ecommerce site, we want to be sure the revenue reported by our testing tool and analytics database is right. We will calculate the correlation between the revenue reported by analytics and our company’s actual sales.
The same kind of correlation can be done for lead generation and phone calls.
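The revenue check above can be sketched in a few lines. This is a minimal illustration, assuming you can export matching weekly revenue figures from analytics and from accounting; the function name and the numbers are hypothetical. It computes the Pearson correlation (do the two series move together?) plus a simple capture ratio (how much of the real revenue does analytics see?), since two series can correlate perfectly while analytics still misses a constant share of sales.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly revenue, in thousands of dollars.
analytics_revenue = [10.0, 12.0, 9.0, 15.0, 14.0]  # reported by the analytics package
booked_sales = [11.0, 13.0, 10.0, 16.0, 15.0]      # reported by accounting

r = pearson_r(analytics_revenue, booked_sales)
# Correlation says whether the series move together; the capture ratio says
# what fraction of real revenue analytics is actually recording.
capture_ratio = sum(analytics_revenue) / sum(booked_sales)
print(f"r = {r:.3f}, capture ratio = {capture_ratio:.2f}")
```

The same function works for lead or phone-call counts from your CRM.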
We can also use multiple sources of data to validate our digital laboratory. Does the data in analytics match that reported by our testing tool? Is the number of ad clicks reported by our advertising company the same as the number seen in analytics?
Once we have confidence in our setup, we can start collecting more data.
2. Collect Additional Quantitative & Qualitative Data
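Once we understand the data already available to us, we’ll need to set up and calibrate tools that can acquire any additional data needed to run effective split tests. To validate our testing tool, we may choose to run an AA test, which splits traffic between two identical versions of a page. Because nothing differs between them, a large or persistent difference in results points to a problem with the setup rather than a real effect.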
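One check that can be run on AA test data is whether the tool actually split visitors in the configured proportions (often called a sample ratio check). The sketch below assumes a two-way split and uses the normal approximation to the binomial, which is equivalent to a one-degree-of-freedom chi-square test; the visitor counts are hypothetical. A very small p-value suggests the assignment or tracking is broken.

```python
import math


def sample_ratio_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """Two-sided p-value that the observed split matches the configured split.

    Normal approximation to the binomial (equivalent to a 1-df chi-square test).
    """
    n = visitors_a + visitors_b
    expected_a = n * expected_share_a
    sd = math.sqrt(n * expected_share_a * (1 - expected_share_a))
    z = (visitors_a - expected_a) / sd
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical AA test results.
healthy = sample_ratio_p_value(10000, 10050)  # close to 50/50
broken = sample_ratio_p_value(10000, 11000)   # far from 50/50
print(f"healthy split p = {healthy:.3f}, broken split p = {broken:.2e}")
```

Conversion rates from the two identical arms can be compared with the same two-proportion test used later for real experiments.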
There are two important types of data that give us insight into optimizing a site.
- Quantitative Data
- Qualitative Data
Quantitative data is generated from large sample sizes. Quantitative data tells us how large numbers of visitors and potential visitors behave. It’s generated from analytics databases (like Google Analytics), trials, and AB tests.
The primary goal of evaluating quantitative data is to find where the weak points are in our funnel. The data gives us objective specifics to research further.
There are a few different types of quantitative data we’ll want to collect and review:
- Backend analytics
- Transactional data
- User intelligence
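To find the weak points in a funnel, a simple first pass is to compute the continuation rate between each pair of adjacent steps. This is a minimal sketch, assuming you can pull the number of visitors reaching each step from your analytics package; the step names and counts are hypothetical.

```python
def step_rates(funnel):
    """Return (from_step, to_step, continuation_rate) for each adjacent pair."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(funnel, funnel[1:])
    ]


# Hypothetical visitor counts reaching each step of an ecommerce funnel.
funnel = [
    ("home", 10000),
    ("product", 6000),
    ("cart", 1200),
    ("checkout", 900),
    ("purchase", 600),
]

rates = step_rates(funnel)
for src, dst, rate in rates:
    print(f"{src:>9} -> {dst:<9} {rate:6.1%}")

# The step with the lowest continuation rate is a candidate for deeper research.
weakest = min(rates, key=lambda step: step[2])
print("Weakest step:", weakest[0], "->", weakest[1])
```

The weakest step tells you where to look; the qualitative data described next helps explain why visitors drop out there.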
Qualitative data is generated from individuals or small groups. It is collected through heuristic analysis, surveys, focus groups, phone or chat transcripts, and user reviews.
Qualitative data can uncover the feelings your users experience as they view a landing page and the motivations behind how they interact with your website.
Qualitative data is often self-reported data, and is thus suspect. Humans are good at making up rationalizations for how they behave in a situation. However, it is a great source of test hypotheses that can’t be discerned from quantitative behavioral data.
While quantitative data tells us what is happening in our funnel, qualitative data can tell us why visitors are behaving a certain way, giving us a better understanding of what we should test.
There are a number of tools we can use to obtain this information:
- Session recording
- Customer service transcripts
- Interviews with sales and customer service reps
- User testing, such as the five-second test
3. Review All Website Baselines
The goal of our data collection and review process is to acquire key intelligence on each of our website “baselines”.
- Sitewide Performance
- Funnel Performance
- Technical Errors
- Customer Segments
- Channel Performance
Sitewide Performance is your overall website user experience. It includes general navigation and performance across devices and browsers.
Funnel Performance deals specifically with the chain of conversion events that turns visitors into leads and then customers. It will include landing pages, opt-in forms, autoresponders, cart checkouts, etc.
Technical Errors are the broken parts on your website or elsewhere in the user experience. These don’t need to be optimized. They need to be fixed.
Customer Segments deals with how different key customer segments are experiencing your site. It’s important to understand the differences in how long-time users, new visitors, small ticket buyers, and big ticket purchasers are engaging with your site.
Channel Performance deals with how various traffic acquisition channels are converting on your site. It’s important to understand the differences between how a Facebook-driven view costing you $0.05 and an AdWords-driven view costing $3.48 convert when they reach your site.
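A simple way to compare channels is cost per conversion: the cost of a visit divided by that channel’s conversion rate. The sketch below uses the per-view costs from the paragraph above; the conversion rates are hypothetical and exist only to show the arithmetic.

```python
def cost_per_conversion(cost_per_visit, conversion_rate):
    """Acquisition cost of one conversion from a channel."""
    return cost_per_visit / conversion_rate


# Cost-per-visit figures from the text; conversion rates are hypothetical.
channels = {
    "Facebook": {"cost_per_visit": 0.05, "conversion_rate": 0.005},
    "AdWords": {"cost_per_visit": 3.48, "conversion_rate": 0.04},
}

cpa = {
    name: cost_per_conversion(c["cost_per_visit"], c["conversion_rate"])
    for name, c in channels.items()
}
for name, cost in cpa.items():
    print(f"{name}: ${cost:.2f} per conversion")
```

Under these assumed rates, the cheaper click is still the cheaper conversion, but a small change in either conversion rate can reverse that, which is why channel-level conversion data matters.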
4. Turn Data Into Optimization Hypotheses
Once you have a thorough, data-backed understanding of the target website, the next step is to design improvements that you hypothesize will outperform the current setup.
As you evaluate these changes for potential testing, run them through the following flowchart:
You’ll quickly build a list of potential changes to test, and then you’ll need to prioritize them based on your overall testing strategy.
5. Develop A Testing Strategy
AB testing is time-consuming and draws on limited resources. You can’t test everything, so where do you focus?
That will depend on your testing strategy.
Ultimately, you will need to develop a tailored strategy for the specific website you are working with and that business’s unique goals, but here are a few options to choose from.
Flow vs. Completions
One of the first questions you’ll have to ask is where to start. There are two broad strategies here:
- Increase the flow of visits to conversion points (shopping cart, registration form, etc.)
- Increase the completions, the number of visitors who finish your conversion process by buying or registering.
If you find people falling out of the top of your funnel, you may want to optimize there to get more visitors flowing into your cart or registration page. This is a flow strategy.
For a catalog ecommerce site, flow testing may occur on category or product pages. Tests in the shopping cart and checkout process will then move faster due to the higher traffic.
Gum Trampoline Strategy
Employ the gum trampoline approach when bounce rates are high, especially among new visitors. The bounce rate is the percentage of visitors who arrive on a site and leave after only a few seconds. Bouncers typically see only one page.
With this strategy, you focus testing on landing pages for specific channels.
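To decide which landing pages to target, it helps to break bounce rate down by channel and landing page. This sketch treats a bounce as a single-page session, which is an assumption; analytics packages define bounces differently. The session records and field names are hypothetical.

```python
from collections import defaultdict


def bounce_rates(sessions):
    """Bounce rate per (channel, landing page): single-page sessions / all sessions."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for s in sessions:
        key = (s["channel"], s["landing_page"])
        totals[key] += 1
        if s["pageviews"] == 1:
            bounces[key] += 1
    return {key: bounces[key] / totals[key] for key in totals}


# Hypothetical session export.
sessions = [
    {"channel": "facebook", "landing_page": "/promo", "pageviews": 1},
    {"channel": "facebook", "landing_page": "/promo", "pageviews": 1},
    {"channel": "facebook", "landing_page": "/promo", "pageviews": 3},
    {"channel": "email", "landing_page": "/promo", "pageviews": 2},
    {"channel": "email", "landing_page": "/promo", "pageviews": 1},
]

rates = bounce_rates(sessions)
# Highest bounce rate first: these channel/page pairs are the first test candidates.
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
for (channel, page), rate in ranked:
    print(f"{channel:<9} {page:<8} {rate:.0%}")
```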
This strategy is for sites that seem to be working against the visitor at every turn. We see this when visit lengths are low or people leave products in the cart at high rates.
For example, we might try to drive more visitors to the pricing page for an online product to see if that gets more of them to complete their purchase.
Big Rocks Strategy
This strategy is used for sites that have a long history of optimization and ample evidence that an important component is missing. Add fundamental components to the site in an effort to give visitors what they are looking for.
Examples of “big rocks” include ratings and reviews modules, faceted search features, recommendation engines, and live demos.
Nuclear Strategy
This strategy involves a full site redesign and might be viable if the business is either changing its backend platform or completely redoing the branding for the entire company or its core product.
The nuclear strategy is as destructive as it sounds and should be a last resort.
For additional strategies and a more in-depth look at this topic, check out 7 Conversion Optimization Strategies You Should Consider by Brian Massey.
6. Design Your AB Tests
Once our hypotheses are created and our goals are clearly defined, it’s time to design and run the AB tests.
Having the right tools will make this process infinitely easier. If you aren’t quite sure what the “right tool” is for your business, check out this article:
The Most Recommended AB Testing Tools By Leading CRO Experts
But even with the right tools, designing an AB test requires a decent amount of work on the user’s end. Tests need to be designed correctly if you want to derive any meaningful insights from the results.
One piece of this that most people are familiar with is statistical significance. Unfortunately, very few people actually understand statistical significance at the level needed to set up split tests. If you suspect that might be you, check out AB Testing Statistics: An Intuitive Guide For Non-Mathematicians.
But there’s a lot more to designing a test than just statistical significance. A well-designed AB test will include the following elements:
- Duration: How long should the test run?
- Goal: What are we trying to increase?
- Percentage of traffic: What percentage of our traffic will see the test?
- Targeting: Who will be entered into the test?
- Treatment Design: The creative for the test treatments.
- Test Code: Moves things around on the page for each treatment.
- Approval: Internal approval of the test and approach.
Tests should be set up to run for a predetermined length of time that incorporates the full cycle of visitor behavior. A runtime of one calendar month is a good rule of thumb.
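Alongside the calendar rule of thumb, you can estimate the minimum duration from the traffic you need. The sketch below uses the standard normal-approximation sample size formula for a two-sided, two-proportion test; it is not necessarily what any particular testing tool uses. The baseline rate, minimum detectable effect, and daily traffic are hypothetical, and rounding up to whole weeks is one way to make sure every day of the week is represented.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Visitors needed in each arm for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # smallest lift worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_variant, variants, daily_test_visitors):
    """Days to reach the sample size, rounded up to whole weeks."""
    days = math.ceil(n_per_variant * variants / daily_test_visitors)
    return math.ceil(days / 7) * 7


# Hypothetical: 3% baseline conversion, detect a 20% relative lift,
# 2,000 visitors per day entering the test.
n = sample_size_per_variant(baseline_rate=0.03, relative_mde=0.20)
days = duration_days(n, variants=2, daily_test_visitors=2000)
print(n, "visitors per variant;", days, "days")
```

If the computed duration is shorter than a full business cycle, the longer of the two is the safer choice.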
Test goals, targeting, and display percentages should all be accounted for.
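Testing tools handle the percentage of traffic and variant assignment for you, but the idea is worth seeing. This sketch shows one common approach, deterministic hash-based bucketing, under the assumption that each visitor has a stable ID (for example, from a cookie); the function, test ID, and visitor IDs are hypothetical. Targeting would be a filter applied before calling it.

```python
import hashlib


def assign(visitor_id, test_id, variants, traffic_share):
    """Deterministically bucket a visitor; returns a variant name or None."""
    digest = hashlib.sha256(f"{test_id}:{visitor_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    if point >= traffic_share:
        return None  # outside the test: sees the unmodified site
    # Spread the in-test slice [0, traffic_share) evenly across the variants.
    index = int(point / traffic_share * len(variants))
    return variants[min(index, len(variants) - 1)]


variants = ["control", "treatment"]
first = assign("visitor-42", "header-test", variants, traffic_share=0.5)
again = assign("visitor-42", "header-test", variants, traffic_share=0.5)

# Roughly half of a large visitor population should be excluded at 50% traffic.
buckets = [assign(f"v{i}", "header-test", variants, 0.5) for i in range(10000)]
excluded = buckets.count(None)
print("same visitor, same bucket:", first == again, "| excluded:", excluded)
```

Hashing the test ID together with the visitor ID keeps assignments stable across visits while letting different tests split traffic independently.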
Once the test is designed properly, it’s finally time to actually run it.
7. Run & Monitor Your AB Tests
Running an AB test isn’t as simple as clicking “Run” on your split testing software. There are two critical things that need to happen once the test begins displaying page variations to new visitors.
- Monitor initial data to make sure everything is running correctly
- Run quality assurance throughout the testing period
Once the test begins, it’s important to monitor conversion data throughout the funnel, watch for anomalies, and make sure nothing is set up incorrectly. You are running your tests on live traffic, after all, and any mistake that isn’t quickly caught could result in massive revenue loss for the website being tested.
As the tests run, we want to monitor a number of things:
- Statistical significance
- Progression throughout the test
- Tendency for inflated testing results
- Quality of new leads
- Conversion rate vs. revenue
Statistical significance is the first thing we have to look at. A statistically insignificant lift is not a lift. It’s nothing.
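For a conversion-rate goal, significance is commonly checked with a pooled two-proportion z-test, sketched below. Testing tools differ (some use sequential or Bayesian methods), so treat this as an illustration rather than what your tool computes; the visitor and conversion counts are hypothetical. The check is meant to be read at the predetermined end of the test, not used as a stopping rule while it runs.

```python
import math


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical results: control 3.0%, variant 3.6%.
z, p = two_proportion_z_test(10000, 300, 10000, 360)
print(f"z = {z:.2f}, p = {p:.4f}")
```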
But even if our results are significant, we still have to look at the progression of data throughout the testing process. Did the variant’s conversion rate stay consistently higher than the control? Or did it oscillate above and below the control?
If the data is still oscillating at the end of the test period, we might need to continue testing, even if our software is telling us the results are statistically significant.
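One rough way to quantify oscillation is to track the cumulative conversion rates day by day and count how often the leader changes. This is a hypothetical helper with made-up daily data, not a feature of any particular tool; a variant that stays ahead throughout shows zero flips, while a choppy one flips repeatedly.

```python
def leader_flips(daily):
    """Count how often the leader changes in cumulative conversion rate.

    `daily` is a list of (control_visitors, control_conversions,
    variant_visitors, variant_conversions) tuples, one per day.
    """
    control_vis = control_conv = variant_vis = variant_conv = 0
    signs = []
    for c_vis, c_conv, v_vis, v_conv in daily:
        control_vis += c_vis
        control_conv += c_conv
        variant_vis += v_vis
        variant_conv += v_conv
        diff = variant_conv / variant_vis - control_conv / control_vis
        if diff != 0:  # ignore exact ties
            signs.append(1 if diff > 0 else -1)
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical daily results.
steady = [(1000, 30, 1000, 36)] * 5
choppy = [(1000, 30, 1000, 40), (1000, 40, 1000, 20), (1000, 30, 1000, 50)]
print("steady flips:", leader_flips(steady), "| choppy flips:", leader_flips(choppy))
```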
It’s also important to understand that any lift experienced in testing will almost always be overstated. On average, if a change creates a 30% lift in testing, the actual lift is closer to 10%.
Finally, it’s helpful to run quality assurance throughout the test period, ensuring that split tests are displaying properly across various devices and browsers. Try to break the site again, like you did during the initial site audit, and make sure everything is working.
Once the tests have run through the predetermined ending point, it’s time to review the results.
8. Assess Test Results
Remember that an AB test is just a data collection activity. Now that we’ve collected some data, let’s put that information to work for us.
The first question that will be on our lips is, “Did any of our variations win?” We all love to win.
There are two possible outcomes when we examine the results of an AB test.
- The test was inconclusive. None of the alternatives beat the control in a statistically significant way. The null hypothesis was not rejected.
- One or more of the treatments beat the control in a statistically significant way.
In the case of an inconclusive test, we want to look at individual segments of traffic. How are specific segments of users engaging with the control versus the variant? Some of the most profitable insights can come from failed tests.
Segments to compare and contrast include:
- Return visitors vs. New visitors
- Chrome browsers vs. Safari browsers vs. Internet Explorer vs. …
- Organic traffic vs. paid traffic vs. referral traffic
- Email traffic vs. social media traffic
- Buyers of premium products vs. non-premium buyers
- Home page visitors vs. internal entrants
These segments will differ for each business, but they provide insights that spawn new hypotheses, or even suggest ways to personalize the experience.
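A first pass at segment analysis is to compute the variant’s relative lift within each segment. In the hypothetical data below, the overall result is nearly flat, yet new and returning visitors move in opposite directions. Segments have smaller samples, and comparing many of them invites false positives, so treat findings like these as new hypotheses rather than conclusions.

```python
def segment_lifts(results):
    """Relative lift of variant over control for each segment."""
    lifts = {}
    for segment, (c_vis, c_conv, v_vis, v_conv) in results.items():
        control_rate = c_conv / c_vis
        variant_rate = v_conv / v_vis
        lifts[segment] = (variant_rate - control_rate) / control_rate
    return lifts


# Hypothetical inconclusive test:
# (control visitors, control conversions, variant visitors, variant conversions)
results = {
    "new visitors": (5000, 100, 5000, 130),
    "return visitors": (5000, 200, 5000, 180),
}
lifts = segment_lifts(results)
for segment, lift in lifts.items():
    print(f"{segment}: {lift:+.0%}")
```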
In the case of a statistically significant increase in conversion rate, it’s very important to analyze the quality of the new conversions. It’s easy to increase conversions, but are these new converters buying as much as the ones who saw the control?
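Comparing conversion rate, average order value, and revenue per visitor side by side makes this question concrete. In the hypothetical numbers below, the variant lifts conversions by 20% but lowers average order value, so revenue per visitor barely moves. Revenue is not a yes/no outcome, so judging whether a revenue difference is significant needs a different test than the conversion-rate check.

```python
def summarize(visitors, orders, revenue):
    """Conversion rate, average order value, and revenue per visitor."""
    return {
        "conversion_rate": orders / visitors,
        "avg_order_value": revenue / orders,
        "revenue_per_visitor": revenue / visitors,
    }


# Hypothetical test results.
control = summarize(visitors=10000, orders=300, revenue=30000.0)
variant = summarize(visitors=10000, orders=360, revenue=30600.0)

changes = {metric: variant[metric] / control[metric] - 1 for metric in control}
for metric, change in changes.items():
    print(f"{metric}: {change:+.1%}")
```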
Ultimately, we want to answer the question, “Why?” Why did one variation win and what does it tell us about our visitors?
This is a collaborative process and speculative in nature. Asking why has two primary effects:
- It develops new hypotheses for testing
- It causes us to rearrange the hypothesis list based on new information
Our goal is to learn as we test, and asking “Why?” is the best way to cement our learnings.
9. Implement Results: Harvesting
This is the step in which we harvest our winning increases in conversion, and we want to get these changes rolled out onto the site as quickly as possible. The strategy for this is typically as follows:
- Document the changes to be made and give them to IT.
- IT will schedule the changes for a future sprint or release.
- Drive 100% of traffic to the winning variation using the AB testing tool. We call this a “routing test.”
- When the change is released to the site by IT, turn off the routing test.
It is not unusual for us to create a new routing test so that we can archive the results of the AB test for future reference.
As another consideration, beware of having too many routing tests running on your site. Conversion Sciences reports that some smaller businesses rely on routing tests to modify their site, and have dozens of routing tests running. This can cause a myriad of problems.
In one case, a client made a change to the site header and forgot to include the code that enabled the AB testing tool. All routing tests were immediately turned off because the testing tool wasn’t integrated.
Conversion rates plummeted until the code was added to the site. In one sense, this is a validation of the testing process. Conversion Sciences dubbed it a “Light Switch” test.
This is the framework CRO professionals use to consistently generate conversion lifts for their clients using AB testing.
I love it and do it myself. Thank you for the inspiration.
You are welcome.
Seems like it’s a fine checklist. Thanks for the guidance Jacob!
Of course! Thanks Bogdan
Nice one Jacob! Love how you laid it out for us.
You’re welcome, Cynthia.
I love it! It is one of the best articles I have read on AB testing, thanks for sharing practical steps. Jacob, conversion and AB testing is always a tough job to do without proper setup. Once again thank you for the article.
Thanks for letting us know, Ankit.
Thanks for the kind words Ankit!
When you say that a statistically insignificant result is nothing, I’d have to disagree. A statistically insignificant result in a properly powered test can mean the affirmation of the null hypothesis (to the extent warranted by the power and the selected minimum effect of interest when planning the test). So, it can in fact be highly informative, as in “stop this losing test before we lose any more money”. That is, in a sequential setting; in a fixed-sample setting there should really be no monitoring of statistical significance.
With this in mind, when you say you monitor the statistical significance of the tests, I hope you are not having that as a stopping rule of any kind? As in: http://blog.analytics-toolkit.com/2017/the-bane-of-ab-testing-reaching-statistical-significance/
Georgi, thanks for the helpful link. There are no absolutes in testing, so we are being a bit hyperbolic when we say it is “nothing.” There is so much additional information we can gain from a split test besides the one answer. We hope to entice testers to do additional analysis of tests and certainly do post-test analysis.