Read more about Data Workbench’s End-of-life announcement.
Visitor clustering uses customer characteristics to dynamically categorize visitors into cluster sets based on selected data inputs. This identifies groups of visitors with similar interests and behaviors, which you can then use for customer analysis and targeting.
The clustering process requires you to identify the metrics and dimension elements to use as inputs. You can also choose a specific target population to which these inputs are applied when creating the specified clusters. When you run the clustering process, the system uses the metric and dimension inputs to determine appropriate initial centers for the specified number of clusters. These centers are then used as the starting point for the K-Means algorithm.
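As an illustration of the process described above, the following is a minimal Python sketch of K-Means seeded with initial centers. It is not Data Workbench's implementation: the farthest-point rule in the hypothetical `initial_centers` function is an assumption standing in for however the product actually derives initial centers, and the two-metric visitor vectors are made-up example data.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two input vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centers(points: Sequence[Vector], k: int) -> List[Vector]:
    """Pick k starting centers by farthest-point selection.

    ASSUMPTION: this rule is illustrative only; the product's actual
    initialization method is not described in this documentation.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Choose the point that is farthest from its nearest existing center.
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def assign(points: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Return the index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            for p in points]


def k_means(points: Sequence[Vector], k: int, max_iterations: int = 100) -> List[Vector]:
    """Standard K-Means (Lloyd's algorithm) starting from initial_centers."""
    centers = initial_centers(points, k)
    for _ in range(max_iterations):
        labels = assign(points, centers)
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Move the center to the mean of its members; keep it if the cluster is empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[i])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers


# Hypothetical visitors described by two metric inputs each.
visitors = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]]
centers = k_means(visitors, k=2)
print(assign(visitors, centers))
```

With this data the first three visitors fall into one cluster and the last three into the other.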
The Maximum Iterations option in the Options menu lets the analyst specify the maximum number of iterations the clustering algorithm performs. Capping the number of iterations can make the clustering process finish sooner, at the expense of exact convergence of the cluster centers.
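The trade-off between an iteration cap and convergence can be sketched as follows. This is a hedged illustration, not the product's code: `max_iterations` and `convergence_threshold` are hypothetical parameter names, and treating "converged" as "every center moved less than the threshold in the last pass" is an assumption about how convergence is measured.

```python
import math
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means_capped(points: Sequence[List[float]],
                   centers: Sequence[List[float]],
                   max_iterations: int,
                   convergence_threshold: float) -> Tuple[List[List[float]], int, bool]:
    """Run K-Means from the given starting centers.

    Returns (centers, iterations_run, converged). Stops early when the largest
    center shift in a pass falls below convergence_threshold (ASSUMED rule);
    otherwise stops after max_iterations passes over the data.
    """
    centers = [list(c) for c in centers]
    for iteration in range(1, max_iterations + 1):
        # One pass over the data: assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else old)
        shift = max(math.sqrt(squared_distance(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < convergence_threshold:
            return centers, iteration, True
    return centers, max_iterations, False  # cap reached before convergence


points = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
start = [[0.0], [1.0]]  # deliberately poor starting centers
print(k_means_capped(points, start, max_iterations=1, convergence_threshold=0.01))
print(k_means_capped(points, start, max_iterations=10, convergence_threshold=0.01))
```

With a cap of 1, the run stops after one pass with the second center at 7.2, short of its converged position; with a cap of 10, it converges after 3 passes with centers at 1.0 and 11.0.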
Once the clusters have been defined, the Cluster Dimension can be saved for use just like any other dimension. It can also be loaded into the Cluster Explorer to examine the separation of cluster centers.
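Conceptually, a cluster dimension maps each visitor to the element for its nearest cluster center, and the separation between centers can be summarized as pairwise distances. The sketch below assumes Euclidean distance for both; the function names, center values, and visitor IDs are hypothetical, and it does not claim to reproduce what the Cluster Explorer displays.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def nearest_cluster(visitor: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the cluster whose center is closest to the visitor (Euclidean)."""
    return min(range(len(centers)), key=lambda i: math.dist(visitor, centers[i]))


def center_separation(centers: Sequence[Sequence[float]]) -> Dict[Tuple[int, int], float]:
    """Euclidean distance between every pair of cluster centers."""
    return {(i, j): math.dist(centers[i], centers[j])
            for i, j in combinations(range(len(centers)), 2)}


# Hypothetical centers produced by a clustering run.
centers: List[List[float]] = [[1.0, 2.0], [4.0, 6.0], [10.0, 2.0]]

# Hypothetical visitors; the result plays the role of a cluster dimension.
visitors = {"v1": [1.2, 2.1], "v2": [9.5, 2.5], "v3": [4.1, 5.8]}
cluster_dimension = {vid: nearest_cluster(vec, centers) for vid, vec in visitors.items()}

print(cluster_dimension)
print(center_separation(centers))
```

Small pairwise distances relative to the spread within each cluster would suggest poorly separated clusters.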
In the Cluster Builder, select Options > Algorithm to choose the algorithm used to define clusters. Three algorithms are currently supported:
There are 2 ways to run the clustering process:
The algorithm has the following restrictions:
In the DPU.cfg file, ‘Query, Memory Limit’ is set to 500 MB by default. Increase this value when running multiple clustering jobs. For example, if you are running five clustering jobs in parallel, increase it to 1 GB. A running clustering job cannot be canceled without restarting the Server.
The number of iterations (the number of times the data is scanned) and the convergence threshold you configure significantly affect clustering performance. The following table provides general guidelines:
| Number of Clusters | Algorithm | Iterations | Convergence Threshold | Normalization |
|---|---|---|---|---|