Connect Campaign data using S3 as source on Adobe Experience Platform

Learn how to ingest data using a source connector in Experience Platform to update the profile data with exported campaign logs.

In this video, we’ll configure a data flow for importing Campaign log data into Adobe Experience Platform using a source. Experience Platform allows data to be ingested from multiple external sources while providing you with the ability to structure, label, and enhance data as it comes in. You can ingest data from a variety of places, such as Adobe applications, cloud-based storage, databases, and more. This video follows a workflow for creating an Amazon S3 source, ingesting Campaign log data using the source, mapping the data to a dataset, and then scheduling the data flow to run multiple times per day.
Steps for creating the necessary schema and dataset can be found in an additional video. Please ensure that you have the necessary schema and dataset in place before attempting to configure your data flow. To create a source, we will be using the Experience Platform UI.
To begin, we select Sources from the left navigation, which opens the Sources catalog. Campaign data is exported to a storage location that is then ingested into Platform using a source. The storage location can be Amazon S3, SFTP with password, SFTP with SSH key, or Azure Blob connections. The preferred methods for exporting data from Adobe Campaign are Amazon S3 and Azure Blob. In this video, we will use an Amazon S3 connection, so we can search for Amazon S3 or use the categories in the catalog to select the cloud storage. We then select Configure to begin configuring our Amazon S3 source. To connect our Amazon S3 account, select New account if it isn’t selected automatically, and then provide an account name, S3 access key, and S3 secret key.
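The account fields described above can be sketched as a simple data structure. This is purely illustrative: the function and field names below are hypothetical and do not represent the Flow Service API payload, and the key and bucket values are placeholders.

```python
# Hypothetical sketch of the account details gathered in the S3 source setup UI.
# Field names are illustrative only, not an Adobe API schema.
def build_s3_account(name, access_key, secret_key,
                     bucket=None, folder=None, description=""):
    """Collect the Amazon S3 account fields entered during source setup."""
    account = {
        "name": name,                   # identifies this account on the Accounts tab
        "description": description,     # optional; useful with multiple accounts
        "s3AccessKey": access_key,
        "s3SecretKey": secret_key,
    }
    if bucket:
        account["bucketName"] = bucket  # limits browsing to one bucket
    if folder:
        account["folderPath"] = folder  # limits browsing to one folder in the bucket
    return account

# Placeholder credentials and names for illustration.
acct = build_s3_account(
    "campaign-logs", "AKIA-EXAMPLE", "SECRET-EXAMPLE",
    bucket="acme-campaign-exports", folder="broadlogs/",
)
```

Specifying the bucket and folder here is what narrows the directory browser in the next step.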
The account name is used to identify this specific account on the Accounts tab and helps to distinguish it from other S3 accounts, as Platform can create multiple sources of the same type for different purposes. We can also optionally provide a description of the account, which is useful if your organization is creating multiple accounts. We can also provide the S3 bucket name. Since our S3 account contains multiple buckets, we will specify the bucket for our Campaign log data, as well as a folder path to the specific folder within the bucket containing our Campaign data. Providing a bucket and folder limits the searching that is required during the select data step, simplifying the work we have to do.

Once the account information has been entered, we can select Connect To Source to continue. After connecting, we need to select the data for our source. The left side of the UI contains a directory browser, displaying the files and directories in our S3 bucket. Since we provided a specific folder within our S3 bucket, only the files within that folder are displayed. The right side of the screen enables us to preview up to a hundred rows of data from a compatible file. The data format dropdown provides options for delimited, JSON, and XDM Parquet. Since we are looking at CSV files, we will select delimited with a comma delimiter.
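The preview behavior can be mimicked with a few lines of standard-library Python: parse a delimited file and keep at most a hundred rows. The sample column names below are invented for illustration and are not an actual Campaign export layout.

```python
import csv
import io

def preview_delimited(text, delimiter=",", max_rows=100):
    """Return the header and up to max_rows of a delimited file,
    roughly like the source preview pane."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows

# Hypothetical sample of exported Campaign log data.
sample = (
    "recipientId,eventDate,status\n"
    "101,2023-05-01,sent\n"
    "102,2023-05-01,bounced\n"
)
header, rows = preview_delimited(sample)
# header -> ['recipientId', 'eventDate', 'status']; rows holds the two data rows
```

Choosing the wrong delimiter at this step would produce a single mangled column in the preview, which is why confirming the sample data matters before continuing.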
The sample data automatically populates, showing the columns in our CSV file, and the individual entries in each row. We can preview the data and after confirming everything looks good, select next to continue to mapping.
To begin mapping our Campaign data in S3 for ingestion into Experience Platform, we must first select the target dataset. Our target dataset is an existing dataset, so we select the dataset button to open the select dataset dialog.
We can select our dataset from a list by scrolling through all of our organization’s datasets or by using the search bar. Once we find the dataset we want to use, we choose it by selecting the radio button next to the dataset name and then confirming our choice. Remember, the steps for creating the necessary schema and dataset are included in another video. Please be sure to complete that video before proceeding. After confirming the dataset, a pre-mapping of standard fields takes place, where our source fields in the CSV are mapped to target fields in the dataset. We can then review all the mappings to ensure they are correct. We are also required to resolve all issues before continuing.
All of the mapping fields look correct, except we need a mapping for the ID field. We can select a target field by using the field selector next to the target field area. On the Map source field dialog, we choose the ID field from our schema and then use Select to return to the mapper.
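Conceptually, the mapper builds a source-to-target field dictionary and applies it to each ingested row. The sketch below illustrates that idea only; the field paths are hypothetical placeholders, not an actual XDM schema.

```python
def apply_mapping(row, mapping):
    """Map source CSV columns to target dataset fields.
    Columns without a mapping are dropped."""
    return {target: row[source] for source, target in mapping.items() if source in row}

# Hypothetical source-to-target mapping, including the manually added ID field.
mapping = {
    "recipientId": "_acme.person.id",        # the ID mapping we added by hand
    "eventDate": "timestamp",
    "status": "_acme.delivery.status",
}
source_row = {"recipientId": "101", "eventDate": "2023-05-01", "status": "sent"}
mapped = apply_mapping(source_row, mapping)
```

This is why every required target field needs a mapping before the flow can proceed: an unmapped required field would simply be missing from the ingested record.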
Once all the mappings have been updated and confirmed, the Next button illuminates, allowing us to proceed with scheduling our data flow.
To schedule our data flow, we must set the frequency and add a start time for the flow of data to begin. Data flows can be scheduled according to organizational needs, including only once, which is especially useful when trying to backfill data. It is recommended to perform the export up to six times a day, depending on the load already present on your instance. For this example, we will schedule for every four hours, or six times per day, and deselect backfill because we are only interested in our new data going forward. After scheduling our data flow, we can then proceed to provide a data flow name and turn on Error Diagnostics, as well as partial ingestion. Enabling partial ingestion provides the ability to ingest data containing errors up to a certain threshold that you can set. Enabling Error Diagnostics will provide details on any incorrect data, which is batched separately.
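The fixed-interval schedule above is easy to verify with a little arithmetic: a four-hour interval yields six runs per day. A minimal sketch, with an arbitrary example start time:

```python
from datetime import datetime, timedelta

def run_times(start, interval_hours, count):
    """Compute the first `count` scheduled runs of a fixed-interval data flow."""
    return [start + timedelta(hours=interval_hours * i) for i in range(count)]

# Every four hours gives six runs per day, matching the example schedule.
runs = run_times(datetime(2023, 5, 1, 0, 0), interval_hours=4, count=6)
# runs cover 00:00, 04:00, 08:00, 12:00, 16:00, 20:00 on the start date
```

Because backfill is deselected in this example, only files landing in the bucket after the start time are picked up on these runs.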
The final step in creating the source connection is to review the details of our connection, dataset, and schema, as well as the scheduling. Once we confirm that the information is correct, we select Finish to complete our source setup, and the first ingestion of data will occur according to our schedule.
After our data flow has run as scheduled, we can visit the Data Flows tab to view the details and select the name of our data flow to view the data flow activity details.
From the activity page, we can update our data flow or edit the schedule as well as disable our data flow or even delete it if necessary. Clicking on the name of the dataset takes us to the dataset details. From the dataset activity page, we can select Preview Dataset to see a sample of records that were imported.
After watching this video, you’ve learned how to connect your Campaign log data stored in Amazon S3 to Experience Platform using a source. This data flow then regularly checks for updated files and ingests the data into a Platform dataset. Thanks for watching.