Datasets UI guide

This user guide provides instructions for performing common actions when working with datasets in the Adobe Experience Platform user interface.

Getting started

This user guide requires a working understanding of the following components of Adobe Experience Platform:

  • Datasets: The storage and management construct for data persistence in Experience Platform.
  • Experience Data Model (XDM) System: The standardized framework by which Experience Platform organizes customer experience data.
    • Basics of schema composition: Learn about the basic building blocks of XDM schemas, including key principles and best practices in schema composition.
    • Schema Editor: Learn how to build your own custom XDM schemas using the Schema Editor within the Platform user interface.
  • Real-Time Customer Profile: Provides a unified, real-time consumer profile based on aggregated data from multiple sources.
  • Adobe Experience Platform Data Governance: Ensure compliance with regulations, restrictions, and policies regarding the usage of customer data.

View datasets

In the Experience Platform UI, select Datasets in the left navigation to open the Datasets dashboard. The dashboard lists all available datasets for your organization. Details are displayed for each listed dataset, including its name, the schema the dataset adheres to, and the status of the most recent ingestion run.

An image that highlights the Datasets item within the left navigation bar.

By default, only the datasets that you have ingested data into are shown. To see system-generated datasets, enable the Show system datasets toggle. System-generated datasets are used only to process other components. For example, the system-generated profile export dataset is used to process the profile dashboard.

The toggle that lets you choose whether or not system datasets should be displayed is highlighted.

Select the name of a dataset to access its Dataset activity screen and see details of the dataset you selected. The activity tab includes a graph visualizing the rate of messages being consumed as well as a list of successful and failed batches.

Details of your selected dataset are highlighted.
Sample batches that belong to your selected dataset are highlighted.

Preview a dataset

From the Dataset activity screen, select Preview dataset near the top-right corner of your screen to preview up to 100 rows of data. If the dataset is empty, the preview link is deactivated and instead states that the preview is not available.

The Preview dataset button is highlighted.

In the preview window, the hierarchical view of the schema for the dataset is shown on the right.

A preview of the dataset is displayed. Information about the structure, as well as sample values, is shown.
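If you are preparing a CSV file and want a comparable look at it before uploading, the 100-row preview can be approximated locally. The following is a minimal sketch, not part of Experience Platform: the function name `preview_rows` and the constant `PREVIEW_LIMIT` are hypothetical, and the file is assumed to be UTF-8 encoded with a header row.

```python
import csv
from itertools import islice

# The UI preview shows up to 100 rows; this constant mirrors that limit.
PREVIEW_LIMIT = 100


def preview_rows(path, limit=PREVIEW_LIMIT):
    """Return at most `limit` data rows from a local CSV file as dictionaries.

    An empty file yields an empty list, which corresponds to the case where
    the UI reports that no preview is available.
    """
    # "utf-8-sig" also accepts files that begin with a byte order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(islice(csv.DictReader(handle), limit))
```

Because `islice` stops after the limit is reached, only the first rows of a large file are read.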

For more robust methods to access your data, Experience Platform provides downstream services such as Query Service and JupyterLab to explore and analyze data. See the documentation for each service for more information.

Create a dataset

To create a new dataset, start by selecting Create dataset in the Datasets dashboard.

The Create dataset button is highlighted.

In the next screen, you are presented with two options for creating a new dataset: using an existing schema or using a CSV file.

Create a dataset with an existing schema

In the Create dataset screen, select Create dataset from schema to create a new empty dataset.

The Create dataset from schema button is highlighted.

The Select schema step appears. Browse the schema listing and select the schema that the dataset will adhere to before selecting Next.

A list of schemas is shown. The schema that will be used to create the dataset is highlighted.

The Configure dataset step appears. Provide the dataset with a name and optional description, then select Finish to create the dataset.

Configuration details of the dataset are inserted. This includes details such as the dataset name and description.

Create a dataset with a CSV file

When a dataset is created using a CSV file, an ad hoc schema is created to provide the dataset with a structure that matches the provided CSV file. In the Create dataset screen, select Create dataset from CSV file.

The Create dataset from CSV file button is highlighted.

The Configure step appears. Provide the dataset with a name and optional description, then select Next.

Configuration details of the dataset are inserted. This includes details such as the dataset name and description.

The Add data step appears. Upload the CSV file by either dragging and dropping it onto the center of your screen or selecting Browse to explore your file directory. The file can be up to ten gigabytes in size. Once the CSV file is uploaded, select Save to create the dataset.

NOTE

CSV column names must start with alphanumeric characters, and can contain only letters, numbers, and underscores.
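The naming rule and the size limit can be checked locally before uploading. The sketch below is one possible pre-upload check, not an Adobe tool. The names `invalid_column_names` and `check_csv` are hypothetical, the rule is assumed to apply to ASCII letters and digits only, and "ten gigabytes" is assumed to mean 10 × 10^9 bytes.

```python
import csv
import re
from pathlib import Path

# Rule from the note above: the first character is alphanumeric, and the rest
# may contain only letters, numbers, and underscores.
# Assumption: only ASCII letters and digits are accepted.
COLUMN_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")

# Assumption: "ten gigabytes" is read as 10 * 10^9 bytes.
MAX_CSV_BYTES = 10 * 10**9


def invalid_column_names(header):
    """Return the header names that break the column naming rule."""
    return [name for name in header if not COLUMN_NAME.fullmatch(name)]


def check_csv(path):
    """Return a list of human-readable problems found in a local CSV file."""
    path = Path(path)
    problems = []
    if path.stat().st_size > MAX_CSV_BYTES:
        problems.append("file is larger than ten gigabytes")
    # Only the header row is read, so large files are not loaded into memory.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    for name in invalid_column_names(header):
        problems.append(f"invalid column name: {name!r}")
    return problems
```

For example, `invalid_column_names(["id", "_hidden", "total-amount"])` flags `_hidden` (leading underscore) and `total-amount` (hyphen), while `id` passes.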

The Add data screen is displayed. The location where you can upload the CSV file for the dataset is highlighted.

Enable a dataset for Real-Time Customer Profile

Every dataset has the ability to enrich customer profiles with its ingested data. To do so, the schema that the dataset adheres to must be compatible for use in Real-Time Customer Profile. A compatible schema satisfies the following requirements:

  • The schema has at least one attribute specified as an identity property.
  • The schema has an identity property defined as the primary identity.

For more information on enabling a schema for Profile, see the Schema Editor user guide.
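The two requirements above can be expressed as a simple check. This is an illustrative sketch only: `IdentityField` and `profile_compatibility_issues` are hypothetical names, and the data structure is a simplified stand-in rather than the representation Experience Platform uses for schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityField:
    """Hypothetical, simplified view of an identity property on a schema."""
    path: str
    is_primary: bool = False


def profile_compatibility_issues(identity_fields):
    """Return the unmet requirements; an empty list means both are satisfied."""
    issues = []
    # Requirement 1: at least one attribute is specified as an identity property.
    if not identity_fields:
        issues.append("no attribute is specified as an identity property")
    # Requirement 2: an identity property is defined as the primary identity.
    if not any(field.is_primary for field in identity_fields):
        issues.append("no identity property is defined as the primary identity")
    return issues
```

A schema with identity properties but no primary identity meets the first requirement and fails the second, so the function returns a single issue in that case.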

To enable a dataset for Profile, access its Dataset activity screen and select the Profile toggle within the Properties column. Once enabled, data that is ingested into the dataset will also be used to populate customer profiles.

NOTE

If a dataset already contains data and is then enabled for Profile, the existing data is not automatically consumed by Profile. After a dataset is enabled for Profile, it is recommended that you re-ingest any existing data to have it contribute to customer profiles.

The Profile toggle is highlighted within the dataset details page.

Manage and enforce data governance on a dataset

Data usage labels allow you to categorize datasets and fields according to usage policies that apply to that data. See the Data Governance overview to learn more about labels, or refer to the data usage labels user guide for instructions on how to apply labels to datasets.

Delete a dataset

You can delete a dataset by first accessing its Dataset activity screen. Then, select Delete dataset to delete it.

NOTE

Datasets created and utilized by Adobe applications and services (such as Adobe Analytics, Adobe Audience Manager, or Offer Decisioning) cannot be deleted.

The Delete dataset button is highlighted within the dataset details page.

A confirmation box appears. Select Delete to confirm the deletion of the dataset.

The confirmation modal for deletion is displayed, with the Delete button highlighted.

Delete a Profile-enabled dataset

If a dataset is enabled for Profile, deleting that dataset through the UI will delete it from the data lake, Identity Service, and the Profile store within Platform.

You can delete a dataset from the Profile store only (leaving the data in the data lake) using the Real-Time Customer Profile API. For more information, see the profile system jobs API endpoint guide.

Monitor data ingestion

In the Experience Platform UI, select Monitoring in the left navigation. The Monitoring dashboard lets you view the statuses of inbound data from either batch or streaming ingestion. To view the statuses of individual batches, select either Batch end-to-end or Streaming end-to-end. The dashboards list all batch or streaming ingestion runs, including those that are successful, failed, or still in progress. Each listing provides details of the batch, including the batch ID, the name of the target dataset, and the number of records ingested. If the target dataset is enabled for Profile, the number of ingested identity and profile records is also displayed.
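If you copy batch details out of the dashboard into your own records, a small script can summarize them by status. The sketch below is an assumption-laden illustration: the field names (`batch_id`, `dataset`, `status`, `records_ingested`), the status labels, and the sample values are all hypothetical and do not reflect an Experience Platform export format.

```python
from collections import Counter

# Hypothetical status labels for runs that are successful, failed,
# or still in progress.
SUCCESS, FAILED, IN_PROGRESS = "success", "failed", "in_progress"


def summarize_batches(batches):
    """Summarize batch records by status and collect failed batch IDs."""
    status_counts = Counter(batch["status"] for batch in batches)
    return {
        "by_status": dict(status_counts),
        "records_ingested": sum(batch["records_ingested"] for batch in batches),
        "failed_batch_ids": [
            batch["batch_id"] for batch in batches if batch["status"] == FAILED
        ],
    }


# Made-up sample records for illustration only.
sample_batches = [
    {"batch_id": "batch-001", "dataset": "Loyalty members", "status": SUCCESS, "records_ingested": 1200},
    {"batch_id": "batch-002", "dataset": "Loyalty members", "status": FAILED, "records_ingested": 0},
    {"batch_id": "batch-003", "dataset": "Web events", "status": IN_PROGRESS, "records_ingested": 300},
]

summary = summarize_batches(sample_batches)
```

The list of failed batch IDs gives you a starting point for opening each batch in the dashboard and reviewing its details.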

The Monitoring Batch end-to-end screen is shown. Both Monitoring and Batch end-to-end are highlighted.

You can select an individual batch ID to access the Batch overview dashboard and see details for the batch, including error logs should the batch fail to ingest.

Details of the selected batch are displayed. This includes the number of records ingested, the number of records failed, the batch status, the file size, the ingestion start and end times, the dataset and batch IDs, the organization ID, the dataset name, and the access information.

If you wish to delete the batch, select Delete batch near the top right of the dashboard. Doing so also removes its records from the dataset into which the batch was originally ingested.

The Delete batch button is highlighted on the dataset details page.

Next steps

This user guide provided instructions for performing common actions when working with datasets in the Experience Platform user interface. For steps on performing common Platform workflows involving datasets, refer to the dataset tutorials in the Experience Platform documentation.
