This use case describes how to use of the Merge functionality in the Deduplication activity.
For more information on this functionality, refer to this section.
The Deduplication activity is used for removing duplicate rows from a data set. In this use case, the data shown below is duplicated based on the Email field.
|Last modification date||First Name||Last Name||Mobile Phone||Phone|
With the Deduplication activity’s Merge fonctionality, you can configure a set of rules for the deduplication to define a group of fields to merge into a single resulting data record. For example, with a set of duplicate records, you can choose to keep the oldest phone number or most recent name.
To enable the merge functionality, you first need to configure the Deduplication activity. To do this, follow these steps:
Open the activity, then click the [Edit configuration] link.
Select the reconciliation field to use for the deduplication, then click Next. In this example, we want to deduplicate based on the email field.
Click the Advanced parameters link, then activate the Merge records and Use several record merging criteria options.
The Merge tab is added to the Deduplication configuration screen. We will use this tab to specify the data to merge when performing deduplication.
Here are the rules we want to use to merge the data into a single record:
To configure these rules, follow these steps:
Open the Merge tab, then click the Add button.
Specify the identifier and label of the group of fields to be merged.
Indicate the conditions for selecting the records to be taken into account.
Sort on the last modification date in order to select the most recent name.
Select the fields to merge. In this example, we want to keep the first name and last name fields.
The fields are added to the set of data to merge and a new element is added to the workflow schema.
Repeat these steps to configure the mobile phone and phone fields.
After configuring these rules, the following data is received at the end of the Deduplication activity.
|Modification date||First Name||Last Name||Mobile Phone||Phone|
The result is merged from the three records according to the rules configured earlier. After comparison, it is concluded that the most recent name and mobile phone are used, along with the original phone number.
|First Name||Last Name||Mobile Phone||Phone|
Note that the first name that has been merged is “Bobby”, because we configured a “Name” rule made of both the first name and last fields.
As a result, “Bob” (the most recent first name) could not be taken into account because its associated last name field was empty. The most recent combination of first and last names was merged in the final record.