A summary of the sampling methodology used for some reports, sampling error rates, and a list of reports that return information based on sampled data.
Some Audience Manager reports display results based on a sample of the total available data. The sampled data ratio is 1:54. For reports that use sampled data, this means your results are based on 1 record out of every 54 records.
These reports use statistically sampled data because generating results from the full data set would require a tremendous amount of computing power. Sampling strikes a balance between reducing computational demands, maintaining system performance, and providing accurate results.
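To make the 1:54 ratio concrete, here is a minimal sketch of how a 1-in-54 sample could be drawn and how a count from the sample scales back to an estimate for the full data set. The hash-based selection rule, the function names `in_sample` and `estimate_total`, and the `device-…` IDs are assumptions made for illustration; the source does not describe how Audience Manager selects sampled records.

```python
import hashlib

SAMPLING_RATIO = 54  # from the text: 1 record out of every 54


def in_sample(record_id: str, ratio: int = SAMPLING_RATIO) -> bool:
    """Hypothetical selection rule: keep a record when its hash lands in
    bucket 0 of `ratio` equally likely buckets (about 1 in `ratio` records)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ratio == 0


def estimate_total(sampled_count: int, ratio: int = SAMPLING_RATIO) -> int:
    """Scale a count observed in the sample up to an estimate for all records."""
    return sampled_count * ratio


if __name__ == "__main__":
    # Hypothetical record IDs, used only to exercise the sketch.
    ids = [f"device-{i}" for i in range(54_000)]
    kept = sum(in_sample(record_id) for record_id in ids)
    print("records kept in sample:", kept)
    print("estimated total records:", estimate_total(kept))
```

Because the rule depends only on the record ID, the same record is always either in or out of the sample, so repeated runs stay consistent with each other.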
Errors can occur in reports that generate overlap data. An error is defined as the percentage of records that:
Our tests and models show that the error rate decreases as the number of records in your data set increases. Data sets with many records generate fewer errors than data sets with few records. The following table quantifies this: for a given number of records, 95% of your report results fall below the listed error rate.
| Number of Records | Error Rate |
|---|---|
| 500 - 1,000 | 95% are under a 42% error rate. |
| 1,000 - 1,500 | 95% are under a 34% error rate. |
| 10,000 - 50,000 | 95% are under a 14% error rate. |
| 50,000 | 95% are under a 6% error rate. |
| 100,000 | 95% are under a 4% error rate. |
| 500,000 (or more) | 95% are under a 2% error rate. |
Based on the Minhash sampling methodology, Audience Manager uses a novel method to compute trait and segment estimators on top of a One Permutation Hashing data sketch. This new method produces a lower variance than the standard estimator for Jaccard similarity. See the section below for the reports that use this methodology.
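For readers unfamiliar with the underlying data sketch, the following is a minimal illustration of One Permutation Hashing together with the standard Jaccard similarity estimator built on it. It is not the lower-variance estimator that Audience Manager uses, which the text does not describe. The function names, the use of SHA-256, the 32-bit hash width, and the 256-bin default are all assumptions chosen for the example.

```python
import hashlib

HASH_BITS = 32  # assumed hash width for this sketch


def oph_sketch(items, num_bins=256):
    """Build a One Permutation Hashing sketch: hash every item once, split the
    hash range into equal bins, and keep the smallest in-bin offset per bin.
    Bins that receive no items stay None."""
    if num_bins <= 0 or num_bins & (num_bins - 1):
        raise ValueError("num_bins must be a power of two")
    bin_width = (1 << HASH_BITS) // num_bins
    sketch = [None] * num_bins
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        hash_value = int.from_bytes(digest[:4], "big")
        bin_index, offset = divmod(hash_value, bin_width)
        if sketch[bin_index] is None or offset < sketch[bin_index]:
            sketch[bin_index] = offset
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Standard OPH estimator: matching bins divided by the number of bins
    that are non-empty in at least one of the two sketches."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same number of bins")
    matches = 0
    both_empty = 0
    for a, b in zip(sketch_a, sketch_b):
        if a is None and b is None:
            both_empty += 1
        elif a == b:
            matches += 1
    usable_bins = len(sketch_a) - both_empty
    return matches / usable_bins if usable_bins else 0.0


if __name__ == "__main__":
    # Two hypothetical audiences that share 1,500 of 4,500 distinct IDs,
    # so the true Jaccard similarity is 1/3.
    trait_a = range(0, 3000)
    trait_b = range(1500, 4500)
    print(estimate_jaccard(oph_sketch(trait_a), oph_sketch(trait_b)))
```

The sketch hashes each record only once, which is what distinguishes One Permutation Hashing from classic MinHash with many independent hash functions. Using more bins generally tightens the estimate at the cost of a larger sketch.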
The following table lists the Audience Manager reports that use statistically sampled data and the reports that use the Minhash sampling methodology:
| Statistical sampling | Minhash sampling methodology |
|---|---|
| Addressable Audience data (customer- and segment-level data). | Overlap reports (trait-to-trait, segment-to-trait, and segment-to-segment) |
| The Total Devices metric for a Profile Merge Rule. | Trait Recommendations |
| Data Explorer uses sampled data in the Search tab and any Saved Searches. | Audience Marketplace Recommendations |