Adobe Target Sample Size Calculator
If you choose to use a manual A/B Test activity rather than Auto-Allocate, the Target Sample Size Calculator helps you determine the sample size needed for a successful test. Because a manual A/B test runs to a fixed horizon, the sample size must be decided before the test begins, and the calculator gives you a rough estimate of that number. Using the calculator for an Auto-Allocate activity is optional because Auto-Allocate declares a winner for you. Continue reading for more information about how to use the calculator.
Before setting up your A/B test, access the Adobe Target Sample Size Calculator.
It is important to determine an adequate sample size (number of visitors) before running any A/B test so that you can establish how long the activity should run before you evaluate the results. Simply monitoring the activity until statistical significance is achieved causes the confidence interval to be vastly underestimated, which makes the test unreliable. The intuition behind this result is that when a statistically significant result is detected, the test is stopped and a winner is declared; when the result is not statistically significant, the test is allowed to continue. This procedure strongly favors the positive outcome, which increases the false positive rate and so distorts the effective significance level of the test.
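The effect described above can be seen in a quick Monte Carlo sketch (an illustration, not part of Adobe Target): both arms of an A/A test share the same true conversion rate, so any declared winner is a false positive. The batch sizes, visitor counts, and two-proportion z-test below are illustrative assumptions.

```python
import math
import random

def simulate(peek=True, n_visitors=4_000, batch=200, rate=0.10, z_crit=1.96):
    """Run one A/A test; return True if a (false) winner is declared.

    With peek=True, a two-proportion z-test is checked after every batch
    and the test stops at the first significant result. With peek=False,
    the test is evaluated only once, at the fixed horizon.
    """
    a_conv = b_conv = a_n = b_n = 0
    z = 0.0
    for _ in range(n_visitors // batch):
        a_conv += sum(random.random() < rate for _ in range(batch))
        b_conv += sum(random.random() < rate for _ in range(batch))
        a_n += batch
        b_n += batch
        p_pool = (a_conv + b_conv) / (a_n + b_n)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / a_n + 1 / b_n))
        if se == 0:
            continue
        z = abs(a_conv / a_n - b_conv / b_n) / se
        if peek and z > z_crit:
            return True   # stopped early and declared a winner
    return z > z_crit     # fixed horizon: evaluate only at the end

random.seed(1)
trials = 500
peeking_fp = sum(simulate(peek=True) for _ in range(trials)) / trials
fixed_fp = sum(simulate(peek=False) for _ in range(trials)) / trials
print(f"false positive rate with peeking:     {peeking_fp:.3f}")
print(f"false positive rate at fixed horizon: {fixed_fp:.3f}")
```

With repeated peeking, the simulated false positive rate comes out well above the nominal 5% level of the fixed-horizon test.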
This procedure can result in many false positives, which leads to implementing offers that ultimately do not deliver the predicted lift. Poor lift itself is a dissatisfying outcome, but an even more serious consequence is that, over time, the inability to accurately predict lift erodes organizational trust in testing as a practice.
This article discusses the factors that must be balanced when a sample size is determined and introduces a calculator for estimating an adequate sample size. Calculating the sample size using the sample size calculator (link provided above) before any A/B test begins helps ensure that you always run high-quality A/B tests that comply with statistical standards.
There are five user-defined parameters that define an A/B test. These parameters are interlinked, so when four of them are established, the fifth can be calculated:
- Statistical significance
- Statistical power
- Minimum reliably detectable lift
- Baseline conversion rate
- Number of visitors
For an A/B test, the statistical significance, statistical power, minimum reliably detectable lift, and baseline conversion rate are set by the analyst, and the required number of visitors is then calculated from these numbers. This article discusses these elements and gives guidelines for determining them for a specific test.
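The relationship between the four chosen parameters and the required number of visitors can be sketched with the standard two-proportion sample size formula. This is an assumption for illustration: the actual Adobe Target calculator may use a different variant of the formula, and the `visitors_per_variant` function and its default values below are hypothetical.

```python
import math
from statistics import NormalDist

def visitors_per_variant(significance=0.95, power=0.80,
                         baseline_rate=0.05, min_lift=0.10):
    """Estimate visitors needed per variant (illustrative sketch).

    significance: desired statistical significance (two-sided)
    power: desired statistical power
    baseline_rate: baseline conversion rate
    min_lift: minimum reliably detectable lift, as a relative change
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = variance * (z_alpha + z_beta) ** 2 / (p2 - p1) ** 2
    return math.ceil(n)

# e.g. 95% significance, 80% power, 5% baseline, 10% relative lift:
print(visitors_per_variant())
```

Note how the required sample size grows quickly as the minimum reliably detectable lift shrinks: halving the lift you want to detect roughly quadruples the number of visitors needed.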
The figure below illustrates the four possible outcomes of an A/B test:
It is desirable to get no false positives or false negatives. However, obtaining zero false positives can never be guaranteed by a statistical test. It is always possible that observed trends are not representative of the underlying conversion rates. For example, in a test to see whether heads or tails is more likely on a coin flip, even with a fair coin you could get ten heads in ten tosses just by chance. The statistical significance and power help us quantify the false positive and false negative rates and allow us to keep them at reasonable levels for a given test.