This article documents the detailed statistical calculations used in manual A/Bn tests in Adobe Target. Definitions are provided for Conversion Rate, Confidence Interval of Conversion Rate, Lift, Confidence Interval for Lift, and Confidence.
The information in this article replaces the Adobe Target Calculations for A/B Testing PDF file that was previously available for download on this site.
The following sections explain the calculations behind the values shown in the report illustrations.
The following illustration shows Conversion Rate, Confidence Interval of Conversion Rate, and the number of Conversions in a Target report. For example, the first line shows that for Experience A, the Conversion Rate is 25.81% with a Confidence Interval of ±7.7%, and 32 conversions were recorded. Because 124 visitors saw the experience, the conversion rate is 32/124 = 25.81%.
The conversion rate or mean, $\mu_\nu$, for each experience $\nu$ in an experiment is defined as the ratio of the sum of the metric to the number of units assigned to that experience, $N_\nu$:

$$\mu_\nu = \frac{1}{N_\nu}\sum_{i=1}^{N_\nu} Y_{i\nu}$$
$Y_{i\nu}$ is the value of the metric for each unit $i$ that has been assigned to a given experience $\nu$.
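As a minimal Python sketch of this definition, assume the per-unit metric values $Y_{i\nu}$ for one experience are already collected in a list. The function name `conversion_rate` is illustrative, not part of any Target API. The example reproduces the 32/124 figure for Experience A.

```python
from typing import Sequence


def conversion_rate(values: Sequence[float]) -> float:
    """Mean of the per-unit metric values Y_i for one experience.

    Each entry is the metric value for one counted unit (for example, a
    visitor or a visit); for a binary conversion metric it is 1 or 0.
    """
    if not values:
        raise ValueError("an experience needs at least one unit")
    return sum(values) / len(values)


# Experience A from the report: 32 of 124 visitors converted.
units = [1] * 32 + [0] * 92
print(f"{conversion_rate(units):.2%}")  # 25.81%
```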
The sum over units i depends on the choice of counting methodology.
sessionId). When the sessionId changes, or the visitor reaches the conversion step, a new visit is counted.
The confidence interval of the conversion rate is, intuitively, the range of possible conversion rates that are consistent with the underlying data.
When running experiments, the conversion rate observed for a given experience is an estimate of its “true” conversion rate. To quantify the uncertainty in this estimate, Target uses a confidence interval. Target always reports a 95% confidence interval, which means that if the experiment were repeated many times, 95% of the intervals calculated this way would contain the true conversion rate of the experience.
A 95% confidence interval of conversion rate $\mu_\nu$ is defined as the range of values:

$$\left[\,\mu_\nu - 1.96\,\sigma_{\mu_\nu},\ \mu_\nu + 1.96\,\sigma_{\mu_\nu}\,\right]$$
Where the standard error of the mean is defined as:

$$\sigma_{\mu_\nu} = \frac{\sigma_\nu}{\sqrt{N_\nu}}$$
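Where the sample standard deviation $\sigma_\nu$ is computed from the unbiased (N - 1) estimate of the variance:

$$\sigma_\nu = \sqrt{\frac{1}{N_\nu - 1}\sum_{i=1}^{N_\nu}\left(Y_{i\nu} - \mu_\nu\right)^2}$$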
When the campaign is a conversion rate campaign (i.e., the conversion metric is binary), the standard error reduces to:

$$\sigma_{\mu_\nu} = \sqrt{\frac{\mu_\nu\,(1 - \mu_\nu)}{N_\nu - 1}}$$
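The following Python sketch computes the standard error and the 95% interval. It assumes the 1.96 multiplier of a normal approximation and the N - 1 variance estimate described above. The helper names are hypothetical. Applied to the 32 conversions out of 124 visitors, it reproduces the ±7.7% half-width shown for Experience A.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.96  # assumed two-sided 95% normal multiplier


def standard_error(values: Sequence[float]) -> float:
    """Standard error of the mean, using the N - 1 sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two units to estimate a variance")
    mean = sum(values) / n
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    return math.sqrt(variance / n)


def binary_standard_error(conversions: int, n: int) -> float:
    """Closed form for a 0/1 metric; equal to standard_error() on the same data."""
    p = conversions / n
    return math.sqrt(p * (1 - p) / (n - 1))


def confidence_interval(mean: float, se: float) -> Tuple[float, float]:
    """95% interval: mean minus/plus 1.96 standard errors."""
    return mean - Z_95 * se, mean + Z_95 * se


# Experience A from the report: 32 conversions out of 124 visitors.
units = [1] * 32 + [0] * 92
se = standard_error(units)
low, high = confidence_interval(sum(units) / len(units), se)
print(f"half-width: {Z_95 * se:.1%}")  # half-width: 7.7%
```

Dividing by N instead of N - 1 would also round to 7.7% for this sample, so the example alone cannot distinguish the two estimators.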
The following illustration shows Lift and Confidence Interval of Lift in a Target report. The number is the average of the lower and upper lift bounds, and the arrow indicates whether the lift is positive or negative. The arrow displays in grey until the confidence passes 95%. After confidence passes the threshold, the arrow is green or red, depending on whether the lift is positive or negative.
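A hypothetical sketch of the arrow rule, assuming confidence is expressed as a fraction and that “passes 95%” means strictly greater than 0.95. The function and its return strings are illustrative only and do not reflect how Target renders reports.

```python
def lift_arrow(lift: float, confidence: float, threshold: float = 0.95) -> str:
    """Return '<color> <direction>' for the lift arrow, per the rule described."""
    direction = "up" if lift > 0 else "down"
    if confidence <= threshold:
        color = "grey"  # confidence has not yet passed the threshold
    else:
        color = "green" if lift > 0 else "red"
    return f"{color} {direction}"


print(lift_arrow(0.12, 0.97))  # green up
```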
The lift between an experience $\nu$ and the control experience $\nu_0$ is the relative “delta” in conversion rates, defined as:

$$\mathrm{Lift}(\nu) = \frac{\mu_\nu - \mu_{\nu_0}}{\mu_{\nu_0}}$$
Where the individual conversion rates are as defined above. More simply,
Lift(Experience N) = (Performance_Experience_N - Performance_Control)/ Performance_Control
If the conversion rate of the control experience ν0 is 0, lift is undefined (the formula would divide by zero), so no lift is reported.
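A minimal Python sketch of the lift formula. Returning `None` for a zero control rate is an illustrative way to represent the undefined case, and the function name is hypothetical.

```python
from typing import Optional


def lift(rate: float, control_rate: float) -> Optional[float]:
    """Relative lift of an experience over control; None when control is 0."""
    if control_rate == 0:
        return None  # undefined: the formula would divide by zero
    return (rate - control_rate) / control_rate


print(lift(0.30, 0.25))  # about 0.2, that is, a 20% lift
```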
The boxplot graph in the Average Lift and Confidence Interval column represents the average value and the 95% confidence interval of the lift. The boxplot is grey when the confidence interval of a given non-control experience overlaps the confidence interval of the control experience at all. The boxplot is green or red when the given experience’s confidence interval lies entirely above or entirely below the control experience’s confidence interval.
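The coloring rule can be sketched as an interval comparison. This assumes each confidence interval is given as a (low, high) pair and that touching endpoints count as overlap. The function is hypothetical and only mirrors the rule as stated.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (low, high)


def boxplot_color(experience_ci: Interval, control_ci: Interval) -> str:
    """Grey on any overlap; green/red when entirely above/below control."""
    exp_low, exp_high = experience_ci
    ctl_low, ctl_high = control_ci
    if exp_low > ctl_high:
        return "green"  # whole interval above the control interval
    if exp_high < ctl_low:
        return "red"    # whole interval below the control interval
    return "grey"       # the intervals overlap


print(boxplot_color((0.30, 0.40), (0.20, 0.28)))  # green
```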
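The standard error of the lift between an experience $\nu$ and the control experience $\nu_0$ is defined as:

$$\sigma_{\mathrm{Lift}(\nu)} = \frac{1}{\mu_{\nu_0}}\sqrt{\sigma_{\mu_\nu}^2 + \left(\frac{\mu_\nu}{\mu_{\nu_0}}\right)^2\sigma_{\mu_{\nu_0}}^2}$$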
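Then the 95% Confidence Interval of the lift is:

$$\left[\,\mathrm{Lift}(\nu) - 1.96\,\sigma_{\mathrm{Lift}(\nu)},\ \mathrm{Lift}(\nu) + 1.96\,\sigma_{\mathrm{Lift}(\nu)}\,\right]$$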
This calculation uses the delta method and is described in more detail in this document.
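A Python sketch of the delta-method standard error and the resulting 95% interval. It assumes the experience and control samples are independent, so no covariance term appears, and it uses the 1.96 normal multiplier. The inputs are the two conversion rates and their standard errors. The function names and the example numbers are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.96  # assumed two-sided 95% normal multiplier


def lift_standard_error(rate: float, se_rate: float,
                        control_rate: float, se_control: float) -> float:
    """Delta-method standard error of (rate - control_rate) / control_rate.

    Assumes the two rates come from independent samples (no covariance term).
    """
    ratio = rate / control_rate
    return math.sqrt(se_rate ** 2 + ratio ** 2 * se_control ** 2) / control_rate


def lift_confidence_interval(rate: float, se_rate: float,
                             control_rate: float,
                             se_control: float) -> Tuple[float, float]:
    """95% interval for the lift: lift minus/plus 1.96 delta-method errors."""
    lift = (rate - control_rate) / control_rate
    half_width = Z_95 * lift_standard_error(rate, se_rate, control_rate, se_control)
    return lift - half_width, lift + half_width


# Hypothetical inputs: 30% vs. 25% conversion, standard errors 0.03 and 0.025.
print(lift_confidence_interval(0.30, 0.03, 0.25, 0.025))  # roughly (-0.13, 0.53)
```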
The last column shows the confidence in a Target report. The confidence of an experience, shown as a percentage, is one minus the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis (that the experience and the control have equal conversion rates) is true. In terms of p-values, the confidence displayed is 1 - p-value. Intuitively, higher confidence means that the observed difference would be less likely to occur if the control and non-control experiences had equal conversion rates.
In Target, a two-tailed Welch’s t-test is performed between the test experience and the control experience to test whether their means are the same. Because we usually do not know whether the sample sizes and variances of the two groups will be equal before running the experiment, and Target also allows you to send unequal percentages of traffic to each experience, we do not assume that the variance of each experience is equal. Thus, Welch’s t-test is chosen instead of Student’s t-test.
To perform Welch’s t-test, we first calculate the t-statistic and the degrees of freedom, then run a two-tailed t-test to obtain the p-value. Finally, we calculate the confidence from the p-value.
The t-statistic is defined as the difference between the means of two independent samples, ν and ν0, divided by the standard error of that difference:

$$t = \frac{\mu_\nu - \mu_{\nu_0}}{\sigma_{\mu_\nu - \mu_{\nu_0}}}$$
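Where $\mu_\nu$ and $\mu_{\nu_0}$ are the means of ν and ν0 respectively, and the standard error of the difference between them is given by:

$$\sigma_{\mu_\nu - \mu_{\nu_0}} = \sqrt{\frac{\sigma_\nu^2}{N_\nu} + \frac{\sigma_{\nu_0}^2}{N_{\nu_0}}}$$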
Where $\sigma_\nu^2$ and $\sigma_{\nu_0}^2$ are the variances of the two experiences ν and ν0 respectively, and $N_\nu$ and $N_{\nu_0}$ are the sample sizes for ν and ν0 respectively.
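For Welch’s t-test, the degrees of freedom are calculated as follows:

$$\mathrm{df} = \frac{\left(\dfrac{\sigma_\nu^2}{N_\nu} + \dfrac{\sigma_{\nu_0}^2}{N_{\nu_0}}\right)^2}{\dfrac{\left(\sigma_\nu^2/N_\nu\right)^2}{\mathrm{df}_\nu} + \dfrac{\left(\sigma_{\nu_0}^2/N_{\nu_0}\right)^2}{\mathrm{df}_{\nu_0}}}$$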
And the degrees of freedom for ν and ν0 are defined as:

$$\mathrm{df}_\nu = N_\nu - 1, \qquad \mathrm{df}_{\nu_0} = N_{\nu_0} - 1$$
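A Python sketch of the t-statistic and the Welch-Satterthwaite degrees of freedom computed from summary statistics. It assumes the sample variances already use the N - 1 denominator. The function and parameter names are hypothetical, as are the counts in the usage line. For a binary metric, the helper `binary_variance` uses N/(N - 1) · p(1 - p), which is the N - 1 sample variance of 0/1 data.

```python
import math
from typing import Tuple


def welch_t_and_df(mean: float, var: float, n: int,
                   control_mean: float, control_var: float,
                   control_n: int) -> Tuple[float, float]:
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom.

    `var` and `control_var` are sample variances with an N - 1 denominator.
    """
    a = var / n                  # squared standard error of the experience mean
    b = control_var / control_n  # squared standard error of the control mean
    t = (mean - control_mean) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (control_n - 1))
    return t, df


def binary_variance(conversions: int, n: int) -> float:
    """N - 1 sample variance of a 0/1 metric: N / (N - 1) * p * (1 - p)."""
    p = conversions / n
    return n / (n - 1) * p * (1 - p)


# Hypothetical counts: 40/124 conversions vs. a control with 32/124.
t, df = welch_t_and_df(40 / 124, binary_variance(40, 124), 124,
                       32 / 124, binary_variance(32, 124), 124)
print(round(t, 3), round(df, 1))
```

With equal variances and equal sample sizes, the formula gives df = 2(N - 1), the same value Student’s t-test would use.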
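Then the p-value can be computed from the area in the tails of the t-distribution:

$$p = 2\left(1 - T_{\mathrm{df}}\left(|t|\right)\right)$$

where $T_{\mathrm{df}}$ is the cumulative distribution function of Student’s t-distribution with df degrees of freedom.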
Finally, the confidence reported in Target is defined as:

$$\text{Confidence} = 1 - p$$
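Turning t and df into the reported confidence requires the t-distribution CDF, which the Python standard library does not provide. The sketch below uses the identity P(|T| ≥ |t|) = I_x(df/2, 1/2), with x = df/(df + t²), where I_x is the regularized incomplete beta function. I_x is evaluated with the standard modified Lentz continued fraction, as in Numerical Recipes’ `betacf`.

This is a self-contained stand-in, not Target’s implementation, and all function names are hypothetical. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch’s test directly. The continued fraction converges more slowly as df grows very large, so a library routine is the better choice in that regime.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 1000, eps: float = 1e-14) -> float:
    """Modified Lentz evaluation of the incomplete beta continued fraction."""
    tiny = 1e-300  # guards against division by zero in Lentz's method
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered coefficient of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered coefficient of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _regularized_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fastest; else use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_tailed_p_value(t: float, df: float) -> float:
    """P(|T| >= |t|) for Student's t-distribution with df degrees of freedom."""
    return _regularized_beta(df / 2.0, 0.5, df / (df + t * t))


def confidence(t: float, df: float) -> float:
    """Confidence as defined above: 1 - p-value."""
    return 1.0 - two_tailed_p_value(t, df)


# Example: the 97.5% quantile of t with 30 df gives a confidence near 95%.
print(f"{confidence(2.042272, 30.0):.1%}")  # about 95%
```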
The downloaded CSV report includes only raw data and does not include calculated metrics, such as revenue per visitor, lift, or confidence used for A/B tests.
To compute these statistical quantities, download the Target Complete Confidence Calculator Excel file and enter the activity’s values.