Read more about Data Workbench’s End-of-life announcement.
The Decision Tree menu includes features to set the positive use case, filters, leaf distribution options, confusion matrix, and other advanced options.
|Go||Click to run the decision tree algorithm and display the visualization. This option is grayed out until inputs are defined.|
|Reset||Clears inputs and decision tree model and resets the process.|
|Save||Saves the Decision Tree. You can save the Decision Tree in different formats.|
|Options||See table below for Options menu.|
|Set Positive Case||Defines the current workspace selection as the model's Positive Case. Clears the case if no selection exists.|
|Set Population Filter||Defines the current workspace selection as the model's Population Filter; the training population is drawn from visitors who satisfy this condition. The default is "Everyone."|
|Show Complex Filter Description||Displays descriptions of the defined filters. Click to view the filtering scripts for the Positive Case and Population Filter.|
|Hide Nodes||Hides nodes with only a small percentage of the population. This menu command displays only when the decision tree is displayed.|
Click Options > Confusion Matrix to view the Accuracy, Recall, Precision and F-Score values. The closer to 100 percent, the better the score.
The Confusion Matrix reports four counts that measure the accuracy of the model: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).
Tip: These numbers are obtained by applying the resulting scoring model to the 20 percent of the data withheld for testing, for which the true answers are already known. If a score is greater than 50 percent, the case is predicted as positive (that is, it matches the defined filter). Then Accuracy = (TP + TN)/(TP + FP + TN + FN), Recall = TP/(TP + FN), and Precision = TP/(TP + FP). The F-Score is the harmonic mean of Precision and Recall.
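To illustrate how these metrics relate to the four counts, here is a minimal Python sketch (not part of Data Workbench; the counts below are hypothetical):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute Accuracy, Recall, Precision, and F-Score from the
    four confusion-matrix counts, using the formulas above."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)      # share of actual positives found
    precision = tp / (tp + fp)   # share of positive predictions that are correct
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, precision, f_score

# Hypothetical counts from scoring the 20 percent held-out test data
acc, rec, prec, f1 = confusion_metrics(tp=80, fp=20, tn=85, fn=15)
print(f"Accuracy={acc:.2%} Recall={rec:.2%} Precision={prec:.2%} F-Score={f1:.2%}")
# Accuracy=82.50% Recall=84.21% Precision=80.00% F-Score=82.05%
```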
|Display Legend||Allows you to toggle a legend key on and off in the Decision Tree. This menu command displays only when the decision tree is displayed.|
|Advanced||Click to open the Advanced menu for in-depth use of the Decision Tree. See the table below for menu options.|
|Training Set Size||Controls the size of the training set used to build the model. Larger sets take longer to train; smaller sets train faster.|
|Input Normalization||Allows the user to specify whether to use the Min-Max or the Z Score technique to normalize inputs into the model.|
|SMOTE Over-Sampling Factor||When the Positive Case occurs infrequently (less than 10 percent of the training sample), SMOTE is used to generate additional synthetic samples. This option lets the user indicate how many more samples to create using SMOTE.|
|Leaf Class Distribution Threshold||Sets the class-distribution threshold a node must reach to be treated as a leaf during the tree-building process. By default, all members of a node must belong to the same class for it to become a leaf (prior to the pruning stage).|
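The two normalization techniques named above rescale each input so that no single metric dominates the model. A minimal Python sketch of both (an illustration, not Data Workbench's implementation; the `visits` values are hypothetical):

```python
from statistics import mean, pstdev

def min_max(values):
    """Min-Max: scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z Score: center values at 0 with unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

visits = [2, 4, 6, 8]
print(min_max(visits))  # smallest value maps to 0.0, largest to 1.0
print(z_score(visits))  # mean of result is 0, standard deviation is 1
```

Min-Max preserves the shape of the distribution but is sensitive to outliers; Z Score is more robust when a few extreme values are present.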
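To illustrate what the leaf threshold means: a node becomes a leaf once its dominant class reaches the configured share of members (the default of 1.0 requires all members to be identical). A hypothetical Python sketch:

```python
from collections import Counter

def is_leaf(labels, threshold=1.0):
    """Return True if the most common class's share of the node's
    members meets the threshold (default: all members identical)."""
    dominant_share = Counter(labels).most_common(1)[0][1] / len(labels)
    return dominant_share >= threshold

print(is_leaf(["pos", "pos", "neg", "pos"]))                 # False: only 75% pos
print(is_leaf(["pos", "pos", "neg", "pos"], threshold=0.7))  # True: 75% >= 70%
```

Lowering the threshold produces shallower trees, since splitting stops before nodes are perfectly pure.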