Deduplication

Description

The Deduplication activity lets you remove duplicates from the results of the inbound activities.

Context of use

The Deduplication activity is generally placed after targeting activities or a file import, and before the activities that use the targeted data.

During deduplication, inbound transitions are processed separately: duplicates are only identified within the population of each transition. For example, if profile ‘A’ is present in both the result of query 1 and the result of query 2, it will not be deduplicated.

It is therefore recommended that a Deduplication activity have only one inbound transition. To achieve this, combine your different queries upstream using the activities that correspond to your targeting needs, such as a Union or an Intersection activity.
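To illustrate why a single inbound transition matters, here is a minimal Python sketch. It is not how Adobe Campaign implements the activity: profiles are assumed to be dicts, duplicates are assumed to be identified by the `email` field alone, the `deduplicate` helper is hypothetical, and the union is modeled as plain list concatenation (a real Union activity may apply its own reconciliation options).

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def deduplicate(records: List[Record], key: str) -> List[Record]:
    """Hypothetical helper (not an Adobe Campaign API): keep the first record per `key` value."""
    seen: Set[str] = set()
    kept: List[Record] = []
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            kept.append(record)
    return kept


query_1 = [{"id": "A", "email": "a@example.com"}, {"id": "B", "email": "b@example.com"}]
query_2 = [{"id": "A", "email": "a@example.com"}, {"id": "C", "email": "c@example.com"}]

# Two inbound transitions: each population is deduplicated on its own,
# so profile A survives once per transition.
per_transition = deduplicate(query_1, "email") + deduplicate(query_2, "email")

# One inbound transition fed by a union (modeled here as concatenation):
# duplicates across both queries are caught.
after_union = deduplicate(query_1 + query_2, "email")

print([r["id"] for r in per_transition])  # ['A', 'B', 'A', 'C']
print([r["id"] for r in after_union])     # ['A', 'B', 'C']
```

Deduplicating each transition separately leaves profile A in the output twice, while combining the queries first leaves it only once.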

Configuration

To configure a Deduplication activity, you must enter a label, the deduplication method and criteria, and the options relating to the result.

  1. Drag and drop a Deduplication activity into your workflow.

  2. Select the activity, then open it using the button from the quick actions that appear.

  3. Select the Resource type on which to perform the deduplication:

    • Database resource if the deduplication is carried out on data that already exists in the database. Select the Filtering dimension and the Targeting dimension, depending on the data that you want to deduplicate. By default, deduplication is carried out on the profiles.
    • Temporary resource if the deduplication is carried out on the workflow’s temporary data: select the Targeted set containing the data to deduplicate. This typically applies after importing a file or when the data in the database has been enriched (with a segment code, for example).
  4. Select the Number of unique records to keep. The default value for this field is 1. The value 0 allows you to keep all the duplicates.

    For example, if records A and B are considered duplicates of record Y, and record C is considered a duplicate of record Z:

    • If the value of the field is 1: only the Y and Z records are kept.
    • If the value of the field is 0: all the records are kept.
    • If the value of the field is 2: records C and Z are kept, along with two of the records A, B, and Y, chosen at random or according to the deduplication method selected in the next step.
  5. Define the Duplicate identification criteria by adding conditions to the list provided. Specify the fields and/or expressions whose identical values identify records as duplicates: email address, first name, last name, etc. The order of the conditions determines which ones are processed first.

  6. In the drop-down list, select the Deduplication method to use:

    • Choose for me: randomly selects which of the duplicate records to keep.

    • Following a list of values: lets you define a value priority for one or more fields. To define the values, select a field or create an expression, then add the value(s) into the appropriate table. To define a new field, click the Add button located above the list of values.

    • Non-empty value: keeps, as a priority, the records for which the value of the selected expression is not empty.

    • Using an expression: keeps the records for which the value of the entered expression is the smallest or the largest.

  7. If needed, manage the activity’s Transitions to access the advanced options for the outbound population.

  8. Confirm the configuration of your activity and save your workflow.
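The interplay between the duplicate identification criteria, the number of records to keep and the deduplication method can be pictured with a small in-memory model. This is a minimal sketch, not Adobe Campaign's implementation: records are assumed to be Python dicts, the criteria are assumed to be plain field names rather than expressions, "Following a list of values" is modeled on a single field, and the helpers `deduplicate`, `by_value_list`, `non_empty` and `by_expression` are hypothetical. The usage at the end reproduces the example from step 4, assuming Y and Z are preferred through a list of values on a hypothetical `source` field.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]
# A ranker maps a record to a sort key; within a group of duplicates,
# records with the smallest key are kept first. (Hypothetical model.)
Ranker = Callable[[Record], object]


def by_value_list(field: str, priorities: Sequence[object]) -> Ranker:
    """'Following a list of values': values listed earlier win; unlisted values come last."""
    def rank(record: Record) -> int:
        value = record.get(field)
        return priorities.index(value) if value in priorities else len(priorities)
    return rank


def non_empty(field: str) -> Ranker:
    """'Non-empty value': records whose field is filled in win."""
    return lambda record: 0 if record.get(field) not in (None, "") else 1


def by_expression(expr: Callable[[Record], float], largest: bool = False) -> Ranker:
    """'Using an expression': keep the smallest (or largest) numeric result."""
    return lambda record: -expr(record) if largest else expr(record)


def deduplicate(
    records: List[Record],
    criteria: Sequence[str],
    keep: int = 1,
    ranker: Optional[Ranker] = None,
    rng: Optional[random.Random] = None,
) -> List[Record]:
    """Keep `keep` records per group of duplicates; keep=0 keeps every record."""
    rng = rng or random.Random()
    groups: Dict[Tuple[object, ...], List[Record]] = {}
    for record in records:
        # Records with identical values for all criteria fields are duplicates.
        key = tuple(record.get(field) for field in criteria)
        groups.setdefault(key, []).append(record)

    kept: List[Record] = []
    for members in groups.values():
        if ranker is None:
            # 'Choose for me': random order within the group.
            members = members[:]
            rng.shuffle(members)
        else:
            # sorted() is stable, so ties keep their input order.
            members = sorted(members, key=ranker)
        kept.extend(members if keep == 0 else members[:keep])
    return kept


# Step 4 example: A and B duplicate Y, C duplicates Z.
records = [
    {"id": "Y", "email": "y@example.com", "source": "crm"},
    {"id": "A", "email": "y@example.com", "source": "import"},
    {"id": "B", "email": "y@example.com", "source": "import"},
    {"id": "Z", "email": "z@example.com", "source": "crm"},
    {"id": "C", "email": "z@example.com", "source": "import"},
]
prefer_crm = by_value_list("source", ["crm", "import"])
for keep in (1, 0, 2):
    print(keep, [r["id"] for r in deduplicate(records, ["email"], keep, prefer_crm)])
# 1 ['Y', 'Z']
# 0 ['Y', 'A', 'B', 'Z', 'C']
# 2 ['Y', 'A', 'Z', 'C']
```

Modeling each method as a sort key keeps the "how many to keep" logic independent of the "which ones to keep" logic; with the random method, the output for a value of 2 would vary between runs, matching the "chosen at random" case described in step 4.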