How are decision trees used by Random Forest?
Decision trees can be a powerful statistical tool. However, they have some disadvantages. Most critically, an individual tree can “over-fit” the data, so it poorly predicts future data that wasn’t used to build it. This challenge relates to the bias-variance tradeoff in statistical learning. Random forests help overcome this over-fitting challenge. At the highest level, a random forest is a collection of decision trees, each built slightly differently on the same data set, that “vote” together to yield a better model than any individual tree. The trees are built by randomly selecting a subset of visit records with replacement (known as bagging), and by randomly selecting a subset of the attributes, so that the forest consists of slightly different decision trees. These small, controlled variations decorrelate the trees, and aggregating their votes improves the predictive accuracy of the algorithm.
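For illustration only, the following sketch uses scikit-learn to show how the two sources of randomness described above (bagging and random attribute subsets) are typically configured in a random forest. The data and parameter values are fabricated and do not reflect Target's proprietary implementation.

```python
# Illustration only: Target's implementation is proprietary.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # fabricated visitor attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # fabricated conversion label

forest = RandomForestClassifier(
    n_estimators=100,     # number of decision trees that "vote" together
    bootstrap=True,       # each tree trains on a random sample of records drawn with replacement (bagging)
    max_features="sqrt",  # each split considers only a random subset of the attributes
    random_state=0,
)
forest.fit(X, y)

# The forest's prediction aggregates the votes of its individual trees.
print(forest.predict_proba(X[:3])[:, 1])
```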
How do the Target personalization algorithms use Random Forest?
How models are built
The following diagram summarizes how models are built for Auto-Target and Automated Personalization activities (a code sketch of the same flow follows the list):
- Target collects data on visitors while randomly serving experiences or offers
- After Target hits a critical mass of data, Target performs feature engineering
- Target builds Random Forest models for each experience or offer
- Target checks if the model meets a threshold quality score
- Target pushes the model to production to personalize future traffic
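The sketch below is a minimal, hypothetical walkthrough of that flow, assuming scikit-learn, fabricated visit data, and a placeholder AUC threshold of 0.55. The experience names, the stand-in feature engineering, and the threshold are illustrative and are not Adobe's actual values.

```python
# Hypothetical end-to-end sketch of the flow above (fabricated data, placeholder threshold).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
experiences = ["Experience A", "Experience B", "Experience C"]

# Steps 1-2: the random-serving phase produces engineered features, the experience
# served, and whether the visit converted (all fabricated here).
n_visits, n_features = 6000, 10
X = rng.normal(size=(n_visits, n_features))
served = rng.choice(experiences, size=n_visits)
converted = (X[:, 0] + rng.normal(size=n_visits) > 1).astype(int)

production_models = {}
for exp in experiences:
    # Step 3: build one Random Forest per experience, using only the visits
    # that were randomly shown that experience.
    mask = served == exp
    X_train, X_test, y_train, y_test = train_test_split(
        X[mask], converted[mask], test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    # Step 4: quality gate on a held-out split; 0.55 is a placeholder threshold.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    if auc >= 0.55:
        production_models[exp] = model  # Step 5: "push to production"

print(sorted(production_models))
```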
Target uses data that it collects automatically, and custom data provided by you, to build its personalization models. These models predict the best experience or offer to show to each visitor. Generally, one model is built per experience (in an Auto-Target activity) or per offer (in an Automated Personalization activity). Target then displays the experience or offer that yields the highest predicted success metric (for example, conversion rate). These models must be trained on randomly served visits before they can be used for prediction. As a result, when an activity first starts, even visitors who are in the personalized group are randomly shown different experiences or offers until the personalization algorithms are ready.
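As a hedged illustration of the serving step described above, the sketch below scores a visitor against one stand-in model per experience and serves the experience with the highest predicted conversion rate. The function names, experiences, and data are hypothetical and are not part of Target's API.

```python
# Hypothetical serve-time sketch (names and data are illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Stand-in models: one Random Forest per experience, fitted on fabricated visits.
def fit_stand_in_model():
    X = rng.normal(size=(400, 6))
    y = (X[:, 0] + rng.normal(size=400) > 0).astype(int)
    return RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

models = {"Experience A": fit_stand_in_model(), "Experience B": fit_stand_in_model()}

def choose_experience(visitor_features):
    """Return the experience whose model predicts the highest conversion rate."""
    scores = {name: model.predict_proba(visitor_features.reshape(1, -1))[0, 1]
              for name, model in models.items()}
    return max(scores, key=scores.get)

print(choose_experience(rng.normal(size=6)))
```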
Each model must be validated to ensure that it is good at predicting the behavior of visitors before it is used in your activity. Models are validated based on their Area Under the Curve (AUC). Because of the need for validation, the exact time when a model starts serving personalized experiences depends on the details of the data. In practice, and for traffic planning purposes, it usually takes more than the minimum number of conversions before each model is valid.
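AUC, typically the area under the ROC curve, measures how well a model ranks converting visits above non-converting visits: 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The short sketch below computes AUC with scikit-learn on fabricated labels and scores; the 0.6 threshold is a placeholder, not Adobe's validation criterion.

```python
# Fabricated example: the labels, scores, and 0.6 threshold are placeholders.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                     # did the visit convert?
y_score = [0.20, 0.40, 0.35, 0.80, 0.10, 0.70]  # model's predicted conversion probability

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}; passes placeholder threshold: {auc >= 0.6}")
```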
When a model becomes valid for an experience or offer, the clock icon to the left of the experience or offer name changes to a green checkbox. When there are valid models for at least two experiences or offers, some visits start to become personalized.