Batch data ingestion overview

This video gives an overview of batch data ingestion in Adobe Experience Platform and shows how to ingest batch data using the API. For more information, please visit the Data Ingestion documentation.

Transcript
Hello and welcome. My name is David Shin. I’m a senior data engineer here at Adobe. In this video, I will explain batch ingestion in Adobe Experience Platform and the prerequisites that need to be met before you can start ingesting data into the platform. I will describe the high-level steps involved in ingesting data and demonstrate how to use the Platform web UI and the batch ingestion APIs to upload data. To upload data into Adobe Experience Platform, you create a batch in the target dataset and upload one or more files. The files are associated with the batch and uploaded as a single unit. It is not possible to delete individual files from a batch. Batches are units of data consisting of one or more files, and operations such as ingest and delete are performed at the batch level.

The prerequisites that need to be met before data can be ingested into Adobe Experience Platform include the following. First, a dataset must be created. When the dataset is created, either using the UI or the API, it is registered in the Catalog Service, which acts as a metadata store, providing information such as where data resides within Adobe Experience Platform. When data is loaded, the Catalog Service tracks the metadata for batch files, such as the number of successfully ingested records as well as any failed records and associated error messages. Adobe Experience Platform only supports the CSV, Parquet, and JSON file formats. The content of the files needs to conform to the dataset’s XDM schema. The files can contain a subset of the schema and, in certain cases, data types that differ from the target XDM schema. The batch ingestion APIs are hosted on top of the Adobe IMS service. Any request to the platform requires an access token authenticated by the IMS service. You don’t need to provide an IMS token when you’re using the UI to ingest data; the UI generates the token for you after a successful login.

As of today, there are three main ways to upload batch data into Adobe Experience Platform: you can use the batch ingestion APIs, set up a data connector, or use the Platform web UI. Regardless of which option you choose, internally all of these methods go through the same steps to upload data. Here are the steps. First, all requests to the platform require an access token authenticated by the IMS service. The IMS service is the gateway that manages authentication and safeguards against unauthorized access. Second, a dataset is created. A dataset can be created using an existing XDM schema or from a CSV file. All dataset information is available in the Catalog Service. Third, the batch is created. The loading status indicates that files are being uploaded into the staging environment but are not yet ready to be promoted to the platform Data Lake. The batch information is registered in the Catalog Service. Fourth, one or more files are uploaded. Depending on the file size and the error threshold that can be tolerated, different batch ingestion APIs are used. For example, if the files are too large, API gateway limits such as extended time-outs, request body size, and other constraints can be exceeded. In that case, the large file upload batch ingestion API needs to be used. The details of the files, including the uploaded batch size, are registered in the Catalog Service. Fifth, the batch is marked as complete and the batch status is changed to loaded, ready to be promoted to the platform Data Lake.
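For reference, every Platform API call described in these steps carries the same set of authentication headers. Below is a minimal Python sketch of how those headers might be assembled; the credential values are placeholders you would obtain from an Adobe Developer Console project, and the variable names are illustrative, not from the video.

```python
# Placeholder credentials (assumed, obtained from an Adobe Developer Console project)
ACCESS_TOKEN = "<IMS_ACCESS_TOKEN>"  # access token issued by the IMS service
API_KEY = "<API_KEY>"                # client ID of the integration
IMS_ORG = "<IMS_ORG_ID>"             # organization ID
SANDBOX = "prod"                     # sandbox name

# Common headers sent with every Platform API request
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "x-api-key": API_KEY,
    "x-gw-ims-org-id": IMS_ORG,
    "x-sandbox-name": SANDBOX,
}
```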
Data validations are carried out to check whether the files can be read and parsed, and whether they conform to the target schema. If a file doesn’t match the target schema, the platform makes a best effort to convert the data to the target type. For instance, JSON and CSV do not support date or date-time types. Therefore, these values are expressed in the files as strings or numbers, and the platform converts them at ingestion time to the target XDM type. If validation fails, the batch status changes to failed. Lastly, the files are promoted to the platform Data Lake and the batch status changes to success, which means the files are available for downstream applications, such as the Profile Service, to consume automatically as needed.

Now that we understand how the batch ingestion process works in Adobe Experience Platform, I would like to give you a demo of how to upload data using the Platform web UI and APIs. Let’s go to the main page of the Platform UI and click Datasets.
In the search bar, type Loyalty to search for the Loyalty Balance dataset and then click the dataset.
I have already created the dataset with a schema that is composed of the XDM Individual Profile class and a Loyalty Balance mixin. I have prepared a simple data file, and here is what it looks like.
The file contains 100 records.
I will drag and drop the file here.
As you can see, the batch is initially in the loading status and then it changes to the processing status. Only one file can be uploaded per batch using the Platform web UI. Let’s give it one or two minutes and wait for the batch status to become success. We see that the status of the batch is now success. Let’s click the link for the batch. We can see that one hundred records have been ingested, and other information about the batch, such as the total ingested batch size, is available in the overview. Now, let’s ingest data using the platform APIs. I will use Postman to submit requests to the platform using the batch data ingestion APIs. I’m using this Postman request to download a crypto library for RS256 and assign it to a Postman environment variable.
This API request uses the crypto library to get the access token from the IMS service. The access token is assigned to a Postman environment variable, which is used in the subsequent API requests. In this request, I’m creating a batch in the Loyalty Balance dataset. The dataset ID needs to be specified in the request body. The dataset ID can be found in the Platform web UI.
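Outside of Postman, the same batch-creation call can be sketched in Python. This reuses the placeholder HEADERS from the earlier sketch and assumes the Batch Ingestion API route POST /data/foundation/import/batches as documented at the time of this video; verify the exact path and payload against the current Batch Ingestion API reference.

```python
import requests

DATASET_ID = "<DATASET_ID>"  # copied from the dataset's page in the Platform web UI

# Create a batch in the target dataset; input is JSON with one record per row
resp = requests.post(
    "https://platform.adobe.io/data/foundation/import/batches",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "datasetId": DATASET_ID,
        "inputFormat": {"format": "json", "isMultiLineJson": False},
    },
)
resp.raise_for_status()
BATCH_ID = resp.json()["id"]  # batch ID used in the follow-up requests
print("Created batch:", BATCH_ID)
```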
The multi-line JSON option is configured as false, since the file contains one JSON record per row. Since the request is successful, I get a batch ID in the response. I’m using this API request to upload the profile data. The batch ID and dataset ID need to be changed.
The file name in the URL needs to be unique for each API request. Otherwise, the last file that is uploaded will overwrite any previous files with the same name.
The file will be uploaded from my local disk.
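A comparable small-file upload in Python might look like the sketch below, assuming the documented PUT /import/batches/{batchId}/datasets/{datasetId}/files/{fileName} route; the file name and local path are placeholders, and HEADERS, BATCH_ID, and DATASET_ID come from the earlier sketches.

```python
import requests

FILE_NAME = "loyalty-profiles-001.json"     # must be unique within the batch
LOCAL_PATH = "./loyalty-profiles-001.json"  # placeholder path on the local disk

# Upload the file into the staging area of the batch
with open(LOCAL_PATH, "rb") as f:
    resp = requests.put(
        f"https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}"
        f"/datasets/{DATASET_ID}/files/{FILE_NAME}",
        headers={**HEADERS, "Content-Type": "application/octet-stream"},
        data=f.read(),
    )
resp.raise_for_status()  # 200 OK means the file landed in staging
```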
This API request is used to signal batch completion. Let’s use the batch ID and click the send button.
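In Python, signalling completion might look like the following, assuming the ?action=COMPLETE query parameter documented for the Batch Ingestion API and the BATCH_ID from the earlier sketch.

```python
import requests

# Mark the batch as complete so it can be promoted to the Data Lake
resp = requests.post(
    f"https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}"
    "?action=COMPLETE",
    headers=HEADERS,
)
resp.raise_for_status()
```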
This API request is used to check the status of the batch.
The status of the batch is staging.
Scrolling down the response page we can see that one file has been uploaded.
Clicking the send button again, the batch status is now success.
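A simple status-polling loop in Python could replace the repeated clicks on the send button. This assumes the Catalog Service batches endpoint and that the response is keyed by the batch ID; treat the exact path, field names, and status values as assumptions to verify against the Catalog Service API reference.

```python
import time
import requests

# Poll the Catalog Service until the batch is promoted or fails
while True:
    resp = requests.get(
        f"https://platform.adobe.io/data/foundation/catalog/batches/{BATCH_ID}",
        headers=HEADERS,
    )
    resp.raise_for_status()
    status = resp.json()[BATCH_ID]["status"]  # response keyed by batch ID (assumed)
    print("Batch status:", status)
    if status in ("success", "failed", "failure"):
        break
    time.sleep(30)  # staging and processing can take a minute or two
```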
Now that you have completed this video, you should be able to explain batch ingestion in Adobe Experience Platform and discuss the prerequisites. You should also be able to describe the high-level steps to upload a group of files as a single unit in the platform, and demonstrate how to ingest data using the Platform web UI and the batch ingestion APIs.