Ingest batch data

In this lesson, you will ingest batch data into Experience Platform using a variety of methods.

Batch data ingestion allows you to ingest a large amount of data into Adobe Experience Platform at once. You can ingest batch data in a one time upload within Platform’s interface or using the API. You can also configure regularly scheduled batch uploads from third-party services such as cloud storage services using Source connectors.

Data Engineers will need to ingest batch data outside of this tutorial.

Before you begin the exercises, watch this short video to learn more about data ingestion:

Permissions required

In the Configure Permissions lesson, you set up all the access controls you need to complete this lesson, specifically:

  • Permission items Data Management > View Datasets, Manage Datasets, and Data Monitoring
  • Permission items Data Ingestion > View Sources and Manage Sources
  • Permission item Profile Management > View Profiles
  • Permission item Sandboxes > Luma Tutorial
  • User-role access to the Luma Tutorial Platform product profile
  • Developer-role access to the Luma Tutorial Platform product profile (for API)

Additionally, you need access to an (S)FTP server or a cloud storage solution for the Sources exercise.

Ingest data in batches with Platform UI

Data can be uploaded directly into a dataset on the Datasets screen in JSON and Parquet formats.

Download and prep the data

First, get the sample data and customize it for your tenant:

  1. Download luma-data.zip to your Luma Tutorial Assets folder.

  2. Unzip the file to create a folder called luma-data, which contains the four data files we will use in this lesson.

  3. Open luma-loyalty.json in a text editor and replace all instances of _techmarketingdemos with your own underscore-tenant id, as seen in your own schemas (a small script to automate this is sketched after these steps):
    Underscore tenant id

  4. Save the updated file
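
If you would rather script the find-and-replace in step 3, a minimal sketch in Python is shown below. The file name comes from the tutorial; the replacement value _yourtenantid is a placeholder for your own underscore-tenant id.

    # Replace the sample tenant id in luma-loyalty.json with your own.
    # "_yourtenantid" is a placeholder; use the underscore-tenant id from your schemas.
    from pathlib import Path

    path = Path("luma-loyalty.json")
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace("_techmarketingdemos", "_yourtenantid"), encoding="utf-8")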

Ingest the data

  1. In the Platform UI, click Datasets in the left navigation

  2. Click on your Luma Loyalty Dataset

  3. Scroll down until you see the Add Data section in the right column

  4. Upload the luma-loyalty.json file.

  5. Once the file uploads, a row for the batch will appear.

  6. If you reload the page after a few minutes, you should see that the batch has successfully uploaded with 1000 records and 1000 profile fragments.

    Ingestion

NOTE

There are a few options, Error diagnostics and Partial ingestion, that you will see on various screens in this lesson. These options aren’t covered in this tutorial, but here is some quick info:

  • Enabling error diagnostics generates data about the ingestion of your data, which you can then review using the Data Access API. Learn more about it in the documentation.
  • Partial ingestion allows you to ingest data containing errors, up to a certain threshold which you can specify. Learn more about it in the documentation.

Validate the data

There are a few ways to confirm the data was successfully ingested.

Validate in the Platform UI

To confirm the data was ingested into the dataset:

  1. On the same page where you ingested the data, click the Preview dataset button in the top-right

  2. Click the Preview button, and you should be able to see some of the ingested data.

    Preview the successful dataset

To confirm the data landed in Profile:

  1. Click Profiles in the left navigation
  2. Click the icon next to the Select identity namespace field to open the modal
  3. Select your Luma Loyalty Id namespace
  4. Then enter one of the loyaltyId values from your dataset, e.g. 5625458
  5. Click the Show profile button
    Confirm a profile from the dataset

Validate with data ingestion events

If you subscribed to data ingestion events in the previous lesson, check your unique webhook.site URL. You should see three requests show up in the following order, with some time in between them, with the following eventCode values:

  1. ing_load_success—the batch was ingested
  2. ig_load_success—the batch was ingested into identity graph
  3. ps_load_success—the batch was ingested into profile service

Data ingestion webhook

See the documentation for more details on the notifications.

Ingest data in batches with Platform API

Now we will upload data using the API.

NOTE

Data architects, feel free to just upload the CRM data via the UI method you just used.

Download and prep the data

  1. You should have already downloaded and unzipped luma-data.zip into your Luma Tutorial Assets folder.
  2. Open luma-crm.json in a text editor and replace all instances of _techmarketingdemos with your own underscore-tenant id, as seen in your schemas
  3. Save the updated file

Get the dataset id

First, we need to get the id of the dataset into which we want to ingest data (a scripted equivalent of this call is sketched after these steps):

  1. Open Postman
  2. If you haven’t made a call in the last 24 hours, your authorization tokens have probably expired. Open the call Adobe I/O Access Token Generation > Local Signing (Non-production use-only) > IMS: JWT Generate + Auth via User Token and click Send to request new JWT and Access Tokens, just like you did in the Postman lesson.
  3. Open your environment variables and make sure the value of CONTAINER_ID is still tenant
  4. Open the call Catalog Service API > Datasets > Retrieve a list of datasets and click Send
  5. You should get a 200 OK response
  6. Copy the id of the Luma CRM Dataset from the Response body
    Get the dataset id
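
If you prefer to make the same call outside Postman, here is a minimal sketch using Python and the requests library. It assumes the standard Platform API headers (access token, API key, IMS org id, and sandbox name) that the Postman environment supplies for you, and that the Catalog response is an object keyed by dataset id, as in the response body you just inspected.

    # Minimal sketch: list datasets via the Catalog Service API (equivalent to the Postman call).
    # All placeholder values in braces must be replaced with your own credentials.
    import requests

    headers = {
        "Authorization": "Bearer {ACCESS_TOKEN}",
        "x-api-key": "{API_KEY}",
        "x-gw-ims-org-id": "{IMS_ORG_ID}",
        "x-sandbox-name": "{SANDBOX_NAME}",  # e.g. your Luma Tutorial sandbox
    }

    response = requests.get(
        "https://platform.adobe.io/data/foundation/catalog/dataSets",
        headers=headers,
    )
    response.raise_for_status()

    # Print each dataset id with its name so you can spot the Luma CRM Dataset.
    for dataset_id, dataset in response.json().items():
        print(dataset_id, dataset.get("name"))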

Create the batch

Now we can create a batch in the dataset (a scripted equivalent is sketched after these steps):

  1. Download Data Ingestion API.postman_collection.json to your Luma Tutorial Assets folder

  2. Import the collection into Postman

  3. Select the request Data Ingestion API > Batch Ingestion > Create a new batch in Catalog Service.

  4. Paste the following as the Body of the request, replacing the datasetId value with your own:

    {
        "datasetId":"REPLACE_WITH_YOUR_OWN_DATASETID",
        "inputFormat": {
            "format": "json"
        }
    }
    
  5. Click the Send button

  6. You should get a 201 Created response containing the id of your new batch!

  7. Copy the id of the new batch
    Batch Created
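
For reference, a scripted equivalent of this request is sketched below in Python with the requests library. The endpoint and body match what you just sent from Postman; the headers are the standard Platform API credentials, shown here as placeholders.

    # Minimal sketch: create a new batch via the Batch Ingestion API.
    # Replace the placeholder headers and dataset id with your own values.
    import requests

    headers = {
        "Authorization": "Bearer {ACCESS_TOKEN}",
        "x-api-key": "{API_KEY}",
        "x-gw-ims-org-id": "{IMS_ORG_ID}",
        "x-sandbox-name": "{SANDBOX_NAME}",
        "Content-Type": "application/json",
    }

    body = {
        "datasetId": "REPLACE_WITH_YOUR_OWN_DATASETID",
        "inputFormat": {"format": "json"},
    }

    response = requests.post(
        "https://platform.adobe.io/data/foundation/import/batches",
        headers=headers,
        json=body,
    )
    response.raise_for_status()

    batch_id = response.json()["id"]  # keep this for the upload and complete steps
    print(batch_id)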

Ingest the data

Now we can upload the data into the batch (a scripted equivalent is sketched after these steps):

  1. Select the request Data Ingestion API > Batch Ingestion > Upload a file to a dataset in a batch.

  2. In the Params tab, enter your dataset id and batch id into their respective fields

  3. In the Params tab, enter luma-crm.json as the filePath

  4. In the Body tab, select the binary option

  5. Select the downloaded luma-crm.json from your local Luma Tutorial Assets folder

  6. Click Send and you should get a 200 OK response with ‘1’ in the response body

    Data uploaded
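
A scripted equivalent of the upload is sketched below. The URL pattern (batches/{batchId}/datasets/{datasetId}/files/{filePath}) is inferred from the Params you just filled in, so double-check it against the Postman request; the file is sent as a binary body with an octet-stream content type.

    # Minimal sketch: upload luma-crm.json into the open batch as a binary body.
    # Replace the placeholder headers, batch id, and dataset id with your own values.
    import requests

    headers = {
        "Authorization": "Bearer {ACCESS_TOKEN}",
        "x-api-key": "{API_KEY}",
        "x-gw-ims-org-id": "{IMS_ORG_ID}",
        "x-sandbox-name": "{SANDBOX_NAME}",
        "Content-Type": "application/octet-stream",
    }

    url = (
        "https://platform.adobe.io/data/foundation/import/batches/"
        "{BATCH_ID}/datasets/{DATASET_ID}/files/luma-crm.json"
    )

    with open("luma-crm.json", "rb") as payload:
        response = requests.put(url, headers=headers, data=payload)

    response.raise_for_status()
    print(response.status_code)  # expect 200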

At this point, if you look at your batch in the Platform UI, you will see that it is in a “Loading” status:
Batch loading

Because the Batch API is often used to upload multiple files, you need to tell Platform when a batch is complete, which we will do in the next step.

Complete the batch

To complete the batch (a scripted equivalent is sketched after these steps):

  1. Select the request Data Ingestion API > Batch Ingestion > Finish uploading a file to a dataset in a batch.

  2. In the Params tab, enter COMPLETE as the action

  3. In the Params tab, enter your batch id. Do not worry about dataset id or filePath, if they are present.

  4. Make sure the URL of the POST is https://platform.adobe.io/data/foundation/import/batches/:batchId?action=COMPLETE and that there aren’t any unnecessary references to the datasetId or filePath

  5. Click Send and you should get a 200 OK response with ‘1’ in the response body

    Batch complete
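
The scripted equivalent is a POST with no body to the URL shown in step 4, with the same placeholder Platform API headers as before:

    # Minimal sketch: signal that the batch is complete so Platform starts processing it.
    # Replace the placeholder headers and batch id with your own values.
    import requests

    headers = {
        "Authorization": "Bearer {ACCESS_TOKEN}",
        "x-api-key": "{API_KEY}",
        "x-gw-ims-org-id": "{IMS_ORG_ID}",
        "x-sandbox-name": "{SANDBOX_NAME}",
    }

    response = requests.post(
        "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}",
        headers=headers,
        params={"action": "COMPLETE"},
    )
    response.raise_for_status()
    print(response.status_code)  # expect 200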

Validate the data

Validate in the Platform UI

Validate the data has landed in the Platform UI just like you did for the Loyalty dataset.

First, confirm the batch shows that 1000 records have been ingested:

Batch success

Next, confirm the batch using Preview dataset:

Batch preview

Finally, confirm one of your profiles has been created by looking up one of the profiles by the Luma CRM Id namespace, e.g. 112ca06ed53d3db37e4cea49cc45b71e

Profile ingested

There is one interesting thing that just happened that I want to point out. Open that Danny Wright profile. Note that the profile has both a Lumacrmid and a Lumaloyaltyid. Remember the Luma Loyalty Schema contained two identity fields, Luma Loyalty Id and CRM Id. Now that we’ve uploaded both datasets, they’ve merged into a single profile. The Loyalty data had Daniel as the first name and “New York City” as the home address, while the CRM data had Danny as the first name and Portland as the home address for the customer with the same Loyalty Id. We will come back to why the first name displays Danny in the lesson on merge policies.

Congratulations, you’ve just merged profiles!

Profile merged

Validate with data ingestion events

If you subscribed to data ingestion events in the previous lesson, check your unique webhook.site URL. You should see three requests come in, just like with the loyalty data:

Data ingestion webhook

See the documentation for more details on the notifications.

Ingest data with Workflows

Let’s look at another way of uploading data. The workflows feature allows you to ingest CSV data which is not already modeled in XDM.

Download and prep the data

  1. You should have already downloaded and unzipped luma-data.zip into your Luma Tutorial Assets folder.
  2. Confirm that you have luma-products.csv

Create a workflow

Now let’s set up the workflow:

  1. Click Workflows in the left navigation
  2. Select Map CSV to XDM schema and click the Launch button
    Launch the workflow
  3. Select your Luma Product Catalog Dataset and click the Next button
    Select your dataset
  4. Add the luma-products.csv file you just downloaded and click the Next button
    Select your dataset
  5. Now you are in the mapper interface, in which you can map fields from the source data (the column names in the luma-products.csv file) to XDM fields in the target schema. In our example, the column names are close enough to the schema field names that the mapper is able to auto-detect the right mapping! If the mapper were unable to auto-detect the right field, you would click the icon to the right of the target field to select the correct XDM field. Also, if you didn’t want to ingest one of the columns from the CSV, you could delete the row from the mapper. Feel free to play around and change column headings in luma-products.csv to get familiar with how the mapper works.
  6. Click the Finish button
    Select your dataset

Validate the data

When the batch has uploaded, verify the upload by previewing the dataset.

Since the Luma Product SKU is a non-people namespace, we won’t see any profiles for our product SKUs.

You should see the three hits to your webhook.

Ingest data with Sources

Okay, you did things the hard way. Now let’s move into the promised land of automated batch ingestion! When I say, “SET IT!” you say, “FORGET IT!” “SET IT!” “FORGET IT!” “SET IT!” “FORGET IT!” Just kidding, you would never do such a thing! Ok, back to work. You’re almost done.

Click on Sources in the left navigation to open the Sources catalog. Here you will see a variety of out-of-the-box integrations with industry-leading data and storage providers.

Source catalog

Okay, let’s ingest data using a source connector.

This exercise will be choose-your-own-adventure style. I am going to show the workflow using the FTP source connector. You can either use a different Cloud Storage source connector that you use at your company, or just upload the json file using the dataset UI like we did with the loyalty data.

Many of the Sources have a similar configuration workflow, in which you:

  1. Enter your authentication details
  2. Select the data you want to ingest
  3. Select the Platform dataset into which you want to ingest it
  4. Map the fields to your XDM schema
  5. Choose the frequency with which you want to re-ingest data from that location
NOTE

The Offline Purchase data we will be using in this exercise contains datetime data. Datetime data should be in either ISO 8601 formatted strings (“2018-07-10T15:05:59.000-08:00”) or Unix Time formatted in milliseconds (1531263959000) and is converted at ingestion time to the target XDM type. For more on data conversion and other constraints, see the Batch Ingestion API documentation.
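
If you generate your own test data, the sketch below shows one way to produce values in either accepted format using only the Python standard library:

    # Minimal sketch: produce datetime values in the two accepted formats.
    from datetime import datetime, timezone

    now = datetime.now(timezone.utc)

    iso_8601 = now.isoformat(timespec="milliseconds")  # ISO 8601 formatted string
    unix_millis = int(now.timestamp() * 1000)          # Unix time in milliseconds

    print(iso_8601, unix_millis)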

Download, prep, and upload the data to your preferred cloud storage vendor

  1. You should have already downloaded and unzipped luma-data.zip into your Luma Tutorial Assets folder.
  2. Open luma-offline-purchases.json in a text editor and replace all instances of _techmarketingdemos with your own underscore-tenant id, as seen in your schemas
  3. Choose your preferred cloud storage provider, making sure it is available in the Sources catalog
  4. Upload luma-offline-purchases.json to a location in your preferred cloud storage provider

Ingest the data to your preferred cloud storage location

  1. In the Platform UI, filter the Sources catalog to Cloud storage

  2. Note there are convenient links to documentation under the ...

  3. In the box of your preferred Cloud storage vendor, click the Configure button
    Click configure

  4. Authentication is the first step. Enter the name for your account, e.g. Luma's FTP Account and your authentication details. This step should be fairly similar for all cloud storage sources, although the fields may vary slightly. Once you’ve entered the authentication details for an account, you can reuse them for other source connections that might be sending different data on different schedules from other files in the same account

  5. Click the Connect to source button

  6. When Platform has successfully connected to the Source, click the Next button
    Authenticate to the source

  7. On the Select data step, the UI will use your credentials to open the folder on your cloud storage solution

  8. Select the file(s) you would like to ingest, e.g. luma-offline-purchases.json

  9. As the Data format, select XDM JSON

  10. You can then preview the json structure and sample data in your file

  11. Click the Next button
    Select your data file(s)

  12. On the Mapping step, select your Luma Offline Purchase Event Dataset and click the Next button. Note in the message that, since the data we are ingesting is a JSON file, there is no mapping step where we map source fields to target fields. JSON data must already be in XDM. If you were ingesting a CSV, you would see the full mapping UI on this step:
    Select your dataset

  13. On the Scheduling step, you choose the frequency with which you want to re-ingest data from the Source. Take a moment to look at the options. We are just going to do a one-time ingestion, so leave the Frequency on Once and click the Next button:
    Schedule your data flow

  14. On the Dataflow detail step, you can choose a name for your dataflow, enter an optional description, and turn on error diagnostics and partial ingestion. Leave the settings as they are and click the Next button:
    Edit details of your data flow

  15. On the Review step, you can review all of your settings together and either edit them or click the Finish button

  16. After saving you will land on a screen like this:
    Complete

Validate the data

When the batch has uploaded, verify the upload by previewing the dataset.

You should see the three hits to your webhook.

Look up the profile with value 5625458 in the loyaltyId namespace again to see if there are any purchase events in their profile. You should see one purchase. You can dig into the details of the purchase by clicking View JSON:

Purchase event in profile

ETL Tools

Adobe partners with multiple ETL vendors to support data ingestion into Experience Platform. Because of the variety of third-party vendors, ETL is not covered in this tutorial, although you are welcome to review some of these resources:

Additional Resources

Now let’s stream data using the Web SDK
