A dataflow is a scheduled task that retrieves and ingests data from a source to a dataset in Adobe Experience Platform. This tutorial provides steps on how to create a dataflow for a payments source using the Platform UI.
NOTE
In order to create a dataflow, you must already have an authenticated account with a payments source. A list of tutorials for creating different payments source accounts in the UI can be found in the sources overview.
For Experience Platform to ingest data, timezones for all table-based batch sources must be configured to UTC.
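As a minimal illustration of this UTC requirement (a sketch only, not Platform code), the snippet below converts a local timestamp to UTC before ingestion; the example value and the America/New_York source time zone are assumptions made only for the example.

```python
# Minimal sketch: converting a local timestamp to UTC before ingestion.
# The value and the source time zone ("America/New_York") are example
# assumptions; this is not Platform or Flow Service code.
from datetime import datetime
from zoneinfo import ZoneInfo

local_value = datetime(2023, 1, 15, 9, 30, tzinfo=ZoneInfo("America/New_York"))
utc_value = local_value.astimezone(ZoneInfo("UTC"))

print(utc_value.isoformat())  # 2023-01-15T14:30:00+00:00
```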
Getting started
This tutorial requires a working understanding of the following components of Platform:
Sources: Platform allows data to be ingested from various sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
Real-Time Customer Profile: Provides a unified, real-time consumer profile based on aggregated data from multiple sources.
Data Prep: Allows data engineers to map, transform, and validate data to and from Experience Data Model (XDM).
Add data
After creating your payments source account, the Add data step appears, providing an interface for you to explore your payments source account’s table hierarchy.
The left half of the interface is a browser, displaying a list of data tables contained in your account. The interface also includes a search option that allows you to quickly identify the source data you intend to use.
The right half of the interface is a preview panel, allowing you to preview up to 100 rows of data.
NOTE
The option to search source data is available to all table-based sources except Adobe Analytics, Amazon Kinesis, and Azure Event Hubs.
Once you find the source data, select the table, then select Next.
Provide dataflow details
The Dataflow detail page allows you to select whether you want to use an existing dataset or a new dataset. During this process, you can also configure settings for Profile dataset, Error diagnostics, Partial ingestion, and Alerts.
Use an existing dataset
To ingest data into an existing dataset, select Existing dataset. You can retrieve an existing dataset either by using the Advanced search option or by scrolling through the list of existing datasets in the dropdown menu. Once you have selected a dataset, provide a name and a description for your dataflow.
Use a new dataset
To ingest into a new dataset, select New dataset and then provide an output dataset name and an optional description. Next, select a schema to map to, either by using the Advanced search option or by scrolling through the list of existing schemas in the dropdown menu. Once you have selected a schema, provide a name and a description for your dataflow.
Enable Profile and error diagnostics
Next, select the Profile dataset toggle to enable your dataset for Profile. This allows you to create a holistic view of an entity’s attributes and behaviors. Data from all Profile-enabled datasets will be included in Profile and changes are applied when you save your dataflow.
Error diagnostics enables detailed error message generation for any erroneous records that occur in your dataflow, while Partial ingestion allows you to ingest data containing errors, up to a certain threshold that you manually define. See the partial batch ingestion overview for more information.
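As a minimal sketch of how a percentage-based error threshold works, the example below mirrors the behavior described above; the function name and the 5% default (mentioned in the video transcript later in this document) are illustrative assumptions, not Platform APIs.

```python
# Minimal sketch: evaluating a partial-ingestion error threshold.
# The function and its 5% default are illustrative, not Platform APIs.

def batch_within_threshold(total_records: int,
                           failed_records: int,
                           error_threshold_pct: float = 5.0) -> bool:
    """Return True if the batch is still accepted under partial ingestion,
    that is, the share of erroneous records stays at or below the threshold."""
    if total_records == 0:
        return True
    error_rate_pct = (failed_records / total_records) * 100
    return error_rate_pct <= error_threshold_pct

# 30 bad records out of 1,000 is a 3% error rate: the batch is ingested
# minus the failed records. 80 bad records is 8%, so the batch fails.
print(batch_within_threshold(1_000, 30))  # True
print(batch_within_threshold(1_000, 80))  # False
```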
Enable alerts
You can enable alerts to receive notifications on the status of your dataflow. Select an alert from the list to subscribe to it. For more information on alerts, see the guide on subscribing to sources alerts using the UI.
When you are finished providing details for your dataflow, select Next.
Map data fields to an XDM schema
The Mapping step appears, providing you with an interface to map the source fields from your source schema to their appropriate target XDM fields in the target schema.
Platform provides intelligent recommendations for auto-mapped fields based on the target schema or dataset that you selected. You can manually adjust mapping rules to suit your use cases. Based on your needs, you can choose to map fields directly, or use data prep functions to transform source data to derive computed or calculated values. For comprehensive steps on using the mapper interface and calculated fields, see the Data Prep UI guide.
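To make the distinction between direct mappings and calculated fields concrete, here is a minimal sketch of what such a mapping conceptually does to a single source record; the field names and the full-name concatenation are hypothetical, and this is not Data Prep mapping syntax.

```python
# Minimal sketch: a direct field mapping plus one calculated field.
# Field names are hypothetical; this is not Data Prep mapping syntax.

source_record = {
    "crm_id": "0031N00001ABCD",
    "first_name": "Ada",
    "last_name": "Lovelace",
    "loyalty_points": 1200,
}

def map_record(record: dict) -> dict:
    """Map source fields to target fields and derive one calculated value."""
    return {
        "personID": record["crm_id"],                # direct mapping
        "loyaltyPoints": record["loyalty_points"],   # direct mapping
        # calculated field: concatenate first and last name
        "fullName": f"{record['first_name']} {record['last_name']}",
    }

print(map_record(source_record))
```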
Once your source data is successfully mapped, select Next.
Schedule ingestion runs
The Scheduling step appears, allowing you to configure an ingestion schedule to automatically ingest the selected source data using the configured mappings. By default, scheduling is set to Once. To adjust your ingestion frequency, select Frequency and then select an option from the dropdown menu.
TIP
Interval and backfill are not visible during a one-time ingestion.
If you set your ingestion frequency to Minute, Hour, Day, or Week, then you must set an interval to establish a set time frame between every ingestion. For example, an ingestion frequency set to Day and an interval set to 15 means that your dataflow is scheduled to ingest data every 15 days.
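As a worked example of the frequency and interval arithmetic, the sketch below projects the first few run times for a daily frequency with an interval of 15; the start time is an arbitrary example and this is not Platform code.

```python
# Minimal sketch: projecting run times for frequency "Day" and interval 15.
# The start time is an arbitrary example; this is not Platform code.
from datetime import datetime, timedelta, timezone

FREQUENCY_TO_DELTA = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}

def project_runs(start_time: datetime, frequency: str, interval: int, count: int):
    """Yield the first `count` scheduled run times."""
    step = FREQUENCY_TO_DELTA[frequency] * interval
    for i in range(count):
        yield start_time + i * step

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for run in project_runs(start, "day", 15, 3):
    print(run.isoformat())
# 2024-01-01T00:00:00+00:00
# 2024-01-16T00:00:00+00:00
# 2024-01-31T00:00:00+00:00
```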
During this step, you can also enable backfill and define a column for the incremental ingestion of data. Backfill is used to ingest historical data, while the column you define for incremental ingestion allows new data to be differentiated from existing data.
See the table below for more information on scheduling configurations.
Scheduling configuration
Description
Frequency
Configure frequency to indicate how often the dataflow should run. You can set your frequency to:
Once: Set your frequency to once to create a one-time ingestion. Configurations for interval and backfill are unavailable when creating a one-time ingestion dataflow. By default, the scheduling frequency is set to once.
Minute: Set your frequency to minute to schedule your dataflow to ingest data on a per-minute basis.
Hour: Set your frequency to hour to schedule your dataflow to ingest data on a per-hour basis.
Day: Set your frequency to day to schedule your dataflow to ingest data on a per-day basis.
Week: Set your frequency to week to schedule your dataflow to ingest data on a per-week basis.
Interval
Once you select a frequency, you can then configure the interval setting to establish the time frame between every ingestion. For example, if you set your frequency to day and configure the interval to 15, then your dataflow will run every 15 days. You cannot set the interval to zero. The minimum accepted interval value for each frequency is as follows:
Once: n/a
Minute: 15
Hour: 1
Day: 1
Week: 1
Start Time
The timestamp for the projected run, presented in the UTC time zone.
Backfill
Backfill determines what data is initially ingested. If backfill is enabled, all current files in the specified path will be ingested during the first scheduled ingestion. If backfill is disabled, only the files that are loaded between the start time and the first ingestion run will be ingested. Files loaded prior to the start time will not be ingested.
Load incremental data by
An option providing a filtered set of source schema fields of type date or time. The field that you select for Load incremental data by must have its date-time values in the UTC time zone in order to correctly load incremental data. All table-based batch sources select incremental data by comparing the delta column's timestamp value against the UTC time window of the corresponding flow run, and then copying any new data found within that window from the source.
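The sketch below illustrates the delta-column comparison described in the table: only rows whose timestamp falls within the current flow run's UTC window are picked up as incremental data, with earlier rows assumed to have been handled by previous runs or by the initial backfill. The column name, row data, and window bounds are hypothetical; this is not Platform code.

```python
# Minimal sketch: selecting incremental data by comparing a delta column
# ("last_modified", hypothetical) against the current run's UTC window.
from datetime import datetime, timezone

rows = [
    {"id": 1, "last_modified": datetime(2024, 1, 10, 8, 0, tzinfo=timezone.utc)},
    {"id": 2, "last_modified": datetime(2024, 1, 16, 9, 30, tzinfo=timezone.utc)},
    {"id": 3, "last_modified": datetime(2024, 1, 20, 14, 0, tzinfo=timezone.utc)},
]

def incremental_rows(rows, window_start, window_end):
    """Return rows whose delta-column timestamp (UTC) falls inside the window."""
    return [r for r in rows if window_start <= r["last_modified"] < window_end]

window_start = datetime(2024, 1, 16, tzinfo=timezone.utc)
window_end = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(incremental_rows(rows, window_start, window_end))  # rows 2 and 3 only
```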
Review your dataflow
The Review step appears, allowing you to review your new dataflow before it is created. Details are grouped within the following categories:
Connection: Shows the source type, the relevant path of the chosen source file, and the number of columns within that source file.
Assign dataset & map fields: Shows which dataset the source data is being ingested into, including the schema that the dataset adheres to.
Scheduling: Shows the active period, frequency, and interval of the ingestion schedule.
Once you have reviewed your dataflow, select Finish and allow some time for the dataflow to be created.
Monitor your dataflow
Once your dataflow has been created, you can monitor the data that is being ingested through it to see information on ingestion rates, success, and errors. For more information on how to monitor dataflows, see the tutorial on monitoring accounts and dataflows in the UI.
Delete your dataflow
You can delete dataflows that are no longer necessary or were incorrectly created using the Delete function available in the Dataflows workspace. For more information on how to delete dataflows, see the tutorial on deleting dataflows in the UI.
Next steps
By following this tutorial, you have successfully created a dataflow to bring data from your payments source to Platform. Incoming data can now be used by downstream Platform services such as Real-Time Customer Profile and Data Science Workspace. See the following documents for more details:
The following video transcript demonstrates how to ingest data from a CRM source, such as Salesforce, into Experience Platform using a similar workflow.

Hi there. I'm going to give you a quick overview of how to ingest data from your CRM systems into Adobe Experience Platform. Data ingestion is a fundamental step in getting your data into Experience Platform so you can use it to build 360-degree, real-time customer profiles and use them to provide meaningful experiences. Adobe Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services. You can ingest data from a wide variety of sources, such as Adobe applications, cloud-based storage, databases, and many others. Experience Platform provides tools to ensure that the ingested data is XDM compliant and helps prepare the data for Real-Time Customer Profile and other services.

When you log in to Platform, you will see Sources in the left navigation. Selecting Sources takes you to the source catalog screen, where you can see all of the source connectors currently available in Platform. Note that there are connectors for Adobe applications, CRM solutions, cloud storage providers, and more. Let's explore how to ingest data from CRM systems into Experience Platform. Each source has its own configuration details, but the general configuration for CRM source connectors is similar. For this video, let's use the Salesforce CRM system.

Select the desired source. When setting up a source connector for the very first time, you are given an option to configure it. For an already configured source connector, you are given an option to add data. Since this is our first time creating a Salesforce account, let's create a new account and provide the source connection details. Complete the required fields for account authentication, and then initiate a source connection request. If the connection is successful, select Next to proceed to data selection. In this step, you can explore the list of accessible objects in Salesforce CRM. Let's search for the loyalty object and quickly preview the object data before we continue.

Let's proceed to the next step to assign a target dataset for the incoming data. You can choose an existing dataset or create a new dataset. Let's choose the new dataset option and provide a dataset name and description. To create a dataset, you need an associated schema. Using the schema finder, assign a schema to this dataset. Upon selecting a schema for the dataset, Experience Platform performs a mapping between the source fields and the target fields. This mapping is based on the title and type of each field, and the pre-mapping of standard fields is editable. You can quickly clear all mappings and add a custom mapping between a source field and a target field. To do so, choose an attribute from the source file and map it to a corresponding schema attribute. To select a source field, you can either use the dropdown option or type to find a field, and then map it to a target field. Just as we mapped the loyalty field, let's map the CRM ID field to our schema field. Similarly, you can complete the mapping for other fields. The Add calculated field option lets you run functions on source fields to prepare the data for ingestion. You can choose from a list of pre-defined functions that can be applied to your source fields. For example, we can combine the first name field and the last name field into a calculated field using the concatenation function before ingesting the data into a dataset field.
Upon selecting a function, you can see the function documentation on the right-hand side of the screen, and you can also preview the sample result of a calculated field. Let's save all the changes and leave the window. The calculated field is now displayed as a source field, so let's quickly map it to a schema target field. After reviewing the field mapping, you can also preview data to see how the ingested data will be stored in your dataset. If the mapping looks good, let's move to the next step.

Scheduling lets you choose the frequency at which data should flow from the source to a dataset. Let's select a frequency of 15 minutes for this video and set a start time for the dataflow. To allow historical data to be ingested, enable the Backfill option. Backfill is a Boolean value that determines what data is initially ingested. If backfill is enabled, all current files in the specified path will be ingested during the first scheduled ingestion. If backfill is disabled, only the files that are loaded between the start time and the first ingestion run will be ingested; files loaded before the start time will not be ingested. Set Load incremental data by to a field that helps distinguish new data from existing data. Let's move to the Dataflow detail step and provide a name for your dataflow.
In the Dataflow detail step, the partial ingestion toggle allows you to enable or disable the use of partial batch ingestion. The error threshold allows you to set the percentage of acceptable errors before the entire batch fails. By default, this value is set to 5%. Let's review the source configuration details and then save the changes.

We do not see any dataflow run statuses yet because we set a frequency of 15 minutes for our dataflow runs, so let's wait for the dataflow to run. After refreshing the page, you can see that our dataflow run has completed. Open the dataflow run to view more details about the activity. Our last dataflow run completed successfully without any failed records. If there were any failed records, then, since we enabled error diagnostics for our dataflow, we would be able to view the error code and error description for those records. Experience Platform also lets you preview or download the error diagnostics to determine what went wrong with the failed records. Let's go back to the Dataflow activity tab.

At this point, we have verified that data flowed successfully from the source to our dataset. Let's open our dataset to verify the dataflow activity. You can open the Luma customer loyalty dataset right from the dataflow window, or you can access it using the Datasets option in the left navigation. Under Dataset activity, you can see a quick summary of ingested batches and failed batches during a specific time window. Scroll down to view the ingested batch ID. Each batch represents an actual data ingestion from a source connector to a target dataset. Let's quickly preview the dataset to ensure that the data integration was successful and our calculated fields are populated. We now have the dataset populated with data from Salesforce CRM.

Finally, let's see how to enable this data for Real-Time Customer Profile. Real-Time Customer Profile provides a holistic view of each customer that combines data from multiple channels, including online, offline, CRM, and third-party data. To enable our dataset for Real-Time Customer Profile, ensure that the associated schema is enabled for Profile. Once the schema is enabled for Profile, it cannot be disabled or deleted, and fields cannot be removed from the schema after this point. These implications are essential to keep in mind when working with data in your production environment. It is recommended to verify and test the data ingestion process to capture and address any issues before enabling the dataset and schema for Profile. Now, let's enable Profile for our dataset and save all the changes. In the next successful batch run, the data ingested into our dataset will be used to create real-time customer profiles. Adobe Experience Platform allows data to be ingested from external sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.