Statistical calculations in A/Bn tests

Last update: May 12, 2026

Topics:
Reports

This article documents the detailed statistical calculations used in manual A/Bn tests in Adobe Target. Definitions are provided for Conversion Rate, Confidence Interval of Conversion Rate, Lift, Confidence Interval for Lift, Confidence, and Bayesian decision metrics.

An A/B Test (Manual) activity supports two statistical methodologies, selected per activity in Goals & Settings:

- Welch’s t-test: a frequentist methodology that reports a Confidence percentage and confidence interval, based on a fixed-sample-size hypothesis test. Used for activities with a Revenue or Engagement primary goal.
- Bayesian: reports results as probabilities, such as Chance to Beat Control and credible intervals, computed from the full posterior distribution of each experience’s goal metric. This setting is only available for activities whose primary goal metric is Conversion.

Welch’s t-test

Mean performance

This section explains the calculations behind the values shown in the following illustration.

Target report showing the Conversion Rate, Average Lift and Confidence Interval, and Confidence of an A/B Test activity.

Conversion Rate and Revenue Per Visitor (RPV) Campaigns

The following illustration shows Conversion Rate, Confidence Interval of Conversion Rate, and the number of Conversions in a Target report. For example, the first line shows that for Experience A: the Conversion Rate is 25.81% with a Confidence Interval of ±7.7% and 32 conversions were recorded. Given that 124 Visitors saw the experience, this equates to 32/124 = 25.81%.

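[Illustration: Conversion Rate, Confidence Interval of Conversion Rate, and Conversions columns in a Target report]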

The conversion rate or mean, μ_ν, for each experience ν in an experiment is defined as the ratio of the sum of the metric to the number of units assigned to that experience, N_ν:

$$\mu_\nu = \frac{1}{N_\nu} \sum_{i=1}^{N_\nu} Y_{i\nu}$$

Here,

Y_iν is the value of the metric for each unit i, that has been assigned to a given experience ν.
The sum over units i depends on the choice of counting methodology.
- If Visitors is used as the counting methodology, each unit is a unique visitor defined as a unique participant in the activity for the life of the activity.
- If Visits is used as the counting methodology, each unit is a unique visit defined as a unique participant in an experience during a Target session (with a unique sessionId). When the sessionId changes, or the visitor reaches the conversion step, a new visit is counted.
- If Activity Impressions is used as the counting methodology, each unit is a unique impression defined as each time a visitor loads any page of the activity.
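
As a minimal sketch of this definition, the mean for one experience can be computed from a list of per-unit values. The function name `mean_performance` and the revenue values are illustrative assumptions, not part of Target; the conversion data reuses Experience A from the example above (32 conversions among 124 visitors).

```python
def mean_performance(values):
    """Mean of the per-unit metric values Y_i for a single experience."""
    if not values:
        raise ValueError("an experience needs at least one unit")
    return sum(values) / len(values)


# Conversion goal: Y_i is 1 for a unit that converted and 0 otherwise.
conversions = [1] * 32 + [0] * 92
rate = mean_performance(conversions)
print(f"{rate:.2%}")  # prints 25.81%

# Revenue goal: Y_i is the revenue attributed to each unit (hypothetical values).
revenue = [0.0, 0.0, 49.99, 0.0, 120.00]
rpv = mean_performance(revenue)
print(f"{rpv:.2f}")
```

Which units appear in the list depends on the counting methodology: one entry per visitor, per visit, or per activity impression.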

Confidence Interval of Mean/Conversion Rate

The confidence interval of the conversion rate is intuitively defined as the range of possible conversion rates that is consistent with the underlying data.

When running experiments, the conversion rate for a given experience is an estimate of the “true” conversion rate. To quantify the uncertainty in this estimate, Target uses a confidence interval. Target always reports a 95% confidence interval, which means that over many repeated experiments, 95% of the confidence intervals calculated this way include the true conversion rate of the experience.

A “Confidence” number is also reported next to the currently leading or winning experience. This figure is not reported until the leading experience’s Confidence reaches at least 60%. If two experiences are present in the activity, this number represents the confidence level that the experience is performing better than the other experience. If more than two experiences are present in the activity, this number represents the confidence level that the experience is performing better than the defined “Control” experience. If the “Control” experience is winning, no “Confidence” figure is reported.

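A 95% confidence interval of conversion rate μ_ν is defined as the range of values:

$$\mu_\nu \pm 1.96\,\mathrm{SE}(\mu_\nu)$$

Where the standard error for the mean is defined as

$$\mathrm{SE}(\mu_\nu) = \frac{\sigma_\nu}{\sqrt{N_\nu}}$$

Where an unbiased estimate of the sample standard deviation is used:

$$\sigma_\nu^2 = \frac{1}{N_\nu - 1} \sum_{i=1}^{N_\nu} \left(Y_{i\nu} - \mu_\nu\right)^2$$

When the campaign is a conversion rate campaign (that is, the conversion metric is binary), the standard error reduces to:

$$\mathrm{SE}(\mu_\nu) = \sqrt{\frac{\mu_\nu \left(1 - \mu_\nu\right)}{N_\nu - 1}}$$

For large N_ν, this is approximately $\sqrt{\mu_\nu (1 - \mu_\nu) / N_\nu}$.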
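
The interval reported for Experience A in the earlier example (25.81% ± 7.7%) can be reproduced with a short sketch. It assumes the usual normal critical value of 1.96 for a 95% interval and the unbiased (N - 1) variance estimate; the function name `mean_and_half_width` is illustrative only.

```python
import math

Z_95 = 1.96  # assumption: normal critical value for a 95% interval


def mean_and_half_width(values, z=Z_95):
    """Return the mean and the half-width of its confidence interval."""
    n = len(values)
    mean = sum(values) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean, z * standard_error


# Experience A: 32 conversions among 124 visitors.
values = [1] * 32 + [0] * 92
rate, half_width = mean_and_half_width(values)
print(f"{rate:.2%} ± {half_width:.1%}")  # prints 25.81% ± 7.7%
```

The same function works for revenue-per-visitor data, because it only relies on the per-unit values and not on the metric being binary.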

Lift

The following illustration shows Lift and Confidence Interval of Lift in a Target Report. The number represents the average of the range of the lift bounds, and the arrow reflects if the lift is positive or negative. The arrow displays in grey until the confidence passes 95%. After confidence passes the threshold, the arrow is green or red based on a positive or negative lift.

[Illustration: Lift and Confidence Interval of Lift in a Target report]

The lift between an experience ν and the control experience ν₀ is the relative “delta” in conversion rates, defined as

$$\mathrm{Lift}(\nu, \nu_0) = \frac{\mu_\nu - \mu_{\nu_0}}{\mu_{\nu_0}}$$

Where the individual conversion rates are as defined above. More simply,

Lift(Experience N) = (Performance_Experience_N - Performance_Control)/ Performance_Control

If the conversion rate of the control experience ν₀ is 0, the lift is undefined because the denominator is zero, so there is no lift.
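
A minimal sketch of the lift calculation follows. The function name `lift` and the sample rates are hypothetical, and returning `None` for a zero control rate is one way to represent "no lift", not necessarily how Target handles it internally.

```python
def lift(performance_experience, performance_control):
    """Relative difference between an experience and the control."""
    if performance_control == 0:
        return None  # no lift can be computed against a zero control rate
    return (performance_experience - performance_control) / performance_control


# Hypothetical rates: control converts at 20%, the experience at 25%.
result = lift(0.25, 0.20)
print(f"{result:.1%}")  # prints 25.0%
print(lift(0.25, 0.0))  # prints None
```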

Confidence Interval of Lift

The boxplot graph in the Average Lift and Confidence Interval column represents the average value and 95% Confidence Interval of Lift. The boxplot is grey when the confidence interval of a given non-control experience overlaps the confidence interval of the control experience. The boxplot is green or red when the given experience’s confidence interval lies entirely above or below the confidence interval of the control experience, respectively.

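The standard error of the lift between an experience ν and the control experience ν₀ is defined as:

$$\mathrm{SE}_{\mathrm{Lift}}(\nu, \nu_0) = \left|\frac{\mu_\nu}{\mu_{\nu_0}}\right| \sqrt{\frac{\sigma_\nu^2}{N_\nu\,\mu_\nu^2} + \frac{\sigma_{\nu_0}^2}{N_{\nu_0}\,\mu_{\nu_0}^2}}$$

Where σ²_ν and σ²_ν₀ are the variances, and N_ν and N_ν₀ the sample sizes, of the two experiences.

Then the 95% Confidence Interval of the lift is:

$$\mathrm{Lift}(\nu, \nu_0) \pm 1.96\,\mathrm{SE}_{\mathrm{Lift}}(\nu, \nu_0)$$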

This calculation uses the “Delta” method, and is described in more detail in this document
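
The sketch below illustrates a delta-method interval for lift. It uses the standard first-order delta-method approximation for the ratio of two independent means, which is an assumption about the exact expression Target applies; the function name and the input numbers are hypothetical.

```python
import math


def lift_interval_95(mean_x, se_x, mean_c, se_c, z=1.96):
    """Lift with a delta-method 95% interval.

    mean_x, se_x: mean and standard error of the test experience
    mean_c, se_c: mean and standard error of the control (mean_c != 0)
    """
    lift = (mean_x - mean_c) / mean_c
    # First-order variance of mean_x / mean_c for independent samples.
    variance = se_x ** 2 / mean_c ** 2 + mean_x ** 2 * se_c ** 2 / mean_c ** 4
    se_lift = math.sqrt(variance)
    return lift, se_lift, (lift - z * se_lift, lift + z * se_lift)


def binary_se(conversions, n):
    """Standard error of a conversion rate (large-sample form)."""
    p = conversions / n
    return math.sqrt(p * (1 - p) / n)


# Hypothetical data: control 100/1000, test experience 120/1000.
observed_lift, se_lift, (lower, upper) = lift_interval_95(
    0.12, binary_se(120, 1000), 0.10, binary_se(100, 1000)
)
print(f"lift={observed_lift:.1%}, interval=({lower:.1%}, {upper:.1%})")
```

With these inputs the interval spans zero, which corresponds to the case where the lift is not yet distinguishable from no change.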

Confidence

The last column shows the confidence in a Target report. The confidence of an experience is the probability (expressed as a percentage) of obtaining a result less extreme than the one that is observed, given that the null hypothesis is true. In terms of p-values, the confidence displayed is 1 - p-value. Intuitively, higher confidence means that it is less likely that the control and non-control experience have equal conversion rates.

In Target, a two-tailed Welch’s t-test is performed between the test experience and the control experience to test if the means of test and control experiences are the same. Because we usually do not know if sample sizes and variances of two groups are the same before running the experiment, and Target also allows you to have unequal percentages of traffic sent to each experience, we do not assume that the variance for each experience is equal. Thus, Welch’s t-test is chosen instead of Student’s t-test.

To perform Welch’s t-test, we first calculate the t-statistic and the degrees of freedom, then run a two-tailed t-test to generate the p-value. Finally, we calculate the confidence from the p-value.

The t-statistic is defined to be the difference of the means of any two independent random variables, ν and ν₀, divided by the standard error of the difference:

$$t = \frac{\mu_\nu - \mu_{\nu_0}}{\mathrm{SE}(\mu_\nu - \mu_{\nu_0})}$$

Where μ_ν and μ_ν₀ are the means of ν and ν₀ respectively, and the standard error of the difference between μ_ν and μ_ν₀ is given by:

$$\mathrm{SE}(\mu_\nu - \mu_{\nu_0}) = \sqrt{\frac{\sigma_\nu^2}{N_\nu} + \frac{\sigma_{\nu_0}^2}{N_{\nu_0}}}$$

Where σ²_v and σ²_v₀ are the variances of two experiences ν and ν₀ respectively, and N_v and N_v₀ are sample sizes for ν and ν₀ respectively.

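For Welch’s t-test, the degrees of freedom are calculated as follows:

$$\mathrm{df} = \frac{\left(\dfrac{\sigma_\nu^2}{N_\nu} + \dfrac{\sigma_{\nu_0}^2}{N_{\nu_0}}\right)^2}{\dfrac{\sigma_\nu^4}{N_\nu^2\,\mathrm{df}_\nu} + \dfrac{\sigma_{\nu_0}^4}{N_{\nu_0}^2\,\mathrm{df}_{\nu_0}}}$$

And the degrees of freedom for ν and ν₀ are defined as:

$$\mathrm{df}_\nu = N_\nu - 1, \qquad \mathrm{df}_{\nu_0} = N_{\nu_0} - 1$$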

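Then the p-value can be computed from the area in the two tails of the t-distribution, where T_df denotes a random variable that follows a t-distribution with df degrees of freedom:

$$p = 2\,P\left(T_{\mathrm{df}} > |t|\right)$$

Finally, the confidence reported in Target is defined as:

$$\mathrm{Confidence} = 1 - p$$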
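
The steps above can be sketched end to end in Python using only the standard library. This is an illustrative sketch, not Target's implementation: the function names are hypothetical, the tail area is obtained by numerically integrating the t-distribution density with Simpson's rule, and the degrees of freedom use the standard Welch-Satterthwaite expression.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """p = 1 - 2 * (area between 0 and |t|), via Simpson's rule (even steps)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def welch_confidence(mean_x, var_x, n_x, mean_c, var_c, n_c):
    """Return (t, df, confidence) for a test experience versus control.

    Assumes at least one of the two variances is non-zero.
    """
    a, b = var_x / n_x, var_c / n_c
    t = (mean_x - mean_c) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_c - 1))
    return t, df, 1.0 - two_tailed_p(t, df)


def binary_variance(conversions, n):
    """Unbiased sample variance of a 0/1 conversion metric."""
    p = conversions / n
    return p * (1 - p) * n / (n - 1)


# Hypothetical data: control 100/1000, test experience 130/1000.
t, df, confidence = welch_confidence(
    0.13, binary_variance(130, 1000), 1000,
    0.10, binary_variance(100, 1000), 1000,
)
print(f"t={t:.3f}, df={df:.1f}, confidence={confidence:.2%}")
```

Where SciPy is available, `scipy.stats.ttest_ind(..., equal_var=False)` performs the same Welch test on raw samples and can serve as a cross-check.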

Bayesian statistics

Instead of computing a p-value from an approximated distribution, a Bayesian activity’s report expresses results as probabilities, computed from the full posterior distribution of each experience’s goal metric. This makes it safe to monitor a Bayesian report continuously, since there is no statistical penalty for checking results before a fixed sample size is reached, and it can converge faster on smaller samples than Welch’s t-test.

The Bayesian methodology also lets marketers feed in a hypothesis based on their past experimentation and results for the control variant.

The Bayesian methodology is only available for activities whose primary goal metric is Conversion; activities with a Revenue or Engagement primary goal always use Welch’s t-test. For more information about selecting a methodology, see Goals and settings.

Average Lift and Credible interval

Average lift and the credible interval together measure performance improvement and its uncertainty in a Bayesian activity. Average lift is the mean percentage change between a treatment and the control, while the credible interval defines the range within which the true lift falls at a specified probability.

Chance to Beat Control

Chance to Beat Control is the probability that an experience’s goal metric outperforms the Control experience, for example, “92% chance B beats A”. This is the primary decision metric for a Bayesian activity: a challenger experience is a candidate to replace Control when its Chance to Beat Control meets the activity’s decision threshold.
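
This article does not spell out the model behind these Bayesian metrics, so the following is only a sketch of how such quantities are commonly computed for a conversion goal. It assumes a Beta-Binomial model with a uniform Beta(1, 1) prior and Monte Carlo sampling from each posterior; the prior, the function name, and the data are all assumptions rather than Target's documented method.

```python
import random


def bayesian_summary(conv_c, n_c, conv_x, n_x,
                     draws=20000, seed=0, prior=(1.0, 1.0)):
    """Return (chance_to_beat_control, average_lift, 95% credible interval).

    conv_c, n_c: conversions and units for the control
    conv_x, n_x: conversions and units for the challenger
    prior: Beta prior parameters (assumed uniform by default)
    """
    rng = random.Random(seed)
    a0, b0 = prior
    wins = 0
    lifts = []
    for _ in range(draws):
        # One draw from each experience's posterior conversion rate.
        rate_c = rng.betavariate(a0 + conv_c, b0 + n_c - conv_c)
        rate_x = rng.betavariate(a0 + conv_x, b0 + n_x - conv_x)
        wins += rate_x > rate_c
        lifts.append((rate_x - rate_c) / rate_c)
    lifts.sort()
    lower = lifts[int(0.025 * draws)]
    upper = lifts[int(0.975 * draws) - 1]
    return wins / draws, sum(lifts) / draws, (lower, upper)


# Hypothetical data: control 100/1000, challenger 130/1000.
chance, average_lift, (lower, upper) = bayesian_summary(100, 1000, 130, 1000)
print(f"Chance to Beat Control: {chance:.1%}")
print(f"Average lift: {average_lift:.1%} "
      f"(95% credible interval {lower:.1%} to {upper:.1%})")
```

A different prior, for example one informed by past results for the control, would be passed through the `prior` argument in this sketch.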

Performing Calculations offline

The downloaded CSV report includes only raw data and does not include calculated metrics, such as revenue per visitor, lift, or confidence used for A/B tests.

To compute these statistical quantities, download the Target Complete Confidence Calculator Excel file and enter the activity’s values.
