Deleting duplicate records in Adobe Experience Platform Data Lake
Adobe Experience Platform Data Lake does not support deleting a single duplicate record from a dataset while keeping other records for the same identity. The platform instead supports deleting entire identities or rebuilding datasets with deduplicated data. To fix this, delete data at the identity level or rebuild and replace the dataset using AEP tools.
Description
Environment
- Product: Adobe Experience Platform (AEP)
- Constraints: Applies to both Delta-migrated and Non-Delta configurations; record-level deletion is not available.
Issue/Symptoms
- Deleting only a single duplicate record from a dataset is not possible.
- The supported deletion methods remove all records associated with an identity, not individual rows.
- Removing duplicates requires rebuilding the dataset.
Resolution
Note: Adobe Experience Platform Data Lake does not support deleting a single duplicate row while keeping other records for the same identity. All supported deletion options work at the identity level or require replacing the dataset with deduplicated data.
To address duplicate records in Adobe Experience Platform Data Lake, use one of the following supported approaches:
- Delete records by identity using the Data Lifecycle Record Delete feature:
  - Prepare a list of primary IDs, such as customer IDs, that require deletion.
  - Submit a delete request in the Data Lifecycle workspace and specify the target dataset.
  - This action deletes all records for the specified IDs from Data Lake, Profile, and Identity Service.
- Rebuild and replace the dataset with deduplicated data:
  - Create a new dataset that uses the same schema as the original dataset.
  - Use Query Service or Data Distiller to extract one clean record per identity based on your deduplication rules.
  - Validate that the new dataset contains only unique records.
  - After validation, delete the original dataset to remove its data from Data Lake, Profile, and Identity Service, and continue using the cleaned dataset.
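A deduplication rule of the kind mentioned in the rebuild approach could look like "keep the most recent record per identity". The sketch below illustrates only that rule on in-memory rows in Python; it does not call any Adobe Experience Platform API. The field names `customer_id` and `updated_at` and the helper `keep_latest_per_identity` are hypothetical, and in practice the equivalent logic would be expressed as a query in Query Service or Data Distiller.

```python
def keep_latest_per_identity(records, id_field="customer_id", ts_field="updated_at"):
    """Return one record per identity, keeping the record with the greatest timestamp.

    Assumes every record carries both fields and that timestamps are
    comparable (for example, ISO 8601 strings in a single consistent format).
    """
    latest = {}
    for rec in records:
        key = rec[id_field]
        current = latest.get(key)
        # Keep the first record seen for an identity, then replace it
        # only when a strictly newer record appears.
        if current is None or rec[ts_field] > current[ts_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical sample rows: identity "A1" appears twice.
rows = [
    {"customer_id": "A1", "updated_at": "2024-01-01T00:00:00Z", "email": "old@example.com"},
    {"customer_id": "A1", "updated_at": "2024-03-01T00:00:00Z", "email": "new@example.com"},
    {"customer_id": "B2", "updated_at": "2024-02-01T00:00:00Z", "email": "b@example.com"},
]

clean = keep_latest_per_identity(rows)
```

Here `clean` holds two rows, one per identity, and the surviving `A1` row is the newer one. "Latest wins" is only one possible rule; the right choice depends on your own deduplication requirements.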
After completing either approach, verify that duplicate records are removed by reviewing the updated datasets in Adobe Experience Platform.
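One way to frame the verification is to count records per identity and flag any identity that still appears more than once. The Python sketch below shows that check on an in-memory sample; the `customer_id` field and the `identities_with_duplicates` helper are hypothetical, and the same count-per-identity logic would normally run as a query against the dataset rather than in local code.

```python
from collections import Counter


def identities_with_duplicates(records, id_field="customer_id"):
    """Return a mapping of identity -> record count for identities seen more than once."""
    counts = Counter(rec[id_field] for rec in records)
    return {identity: n for identity, n in counts.items() if n > 1}


# Hypothetical sample: "A1" is duplicated, "B2" is not.
sample = [
    {"customer_id": "A1"},
    {"customer_id": "A1"},
    {"customer_id": "B2"},
]

dupes = identities_with_duplicates(sample)
```

An empty result indicates that no identity has more than one record. A non-empty result can also serve as the starting list of primary IDs when preparing an identity-level delete request.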