The Adobe Experience Platform Data Ingestion API allows you to ingest data into Platform as batch files. Data being ingested can be profile data from a flat file exported from a CRM system (such as a Parquet file), or data that conforms to a known schema in the Experience Data Model (XDM) registry.
The Data Ingestion API reference provides additional information on these API calls.
The following diagram outlines the batch ingestion process:
The Data Ingestion API allows you to ingest data as batches (a unit of data that consists of one or more files to be ingested as a single unit) into Experience Platform in three basic steps:
1. Create a new batch.
2. Upload files to a specified dataset that matches the XDM schema of the data.
3. Signal the end of the batch.
To upload a file larger than 512 MB, the file must be divided into smaller chunks. Instructions for uploading a large file can be found in the large file upload section below.
This guide provides example API calls to demonstrate how to format your requests. These include paths, required headers, and properly formatted request payloads. Sample JSON returned in API responses is also provided. For information on the conventions used in documentation for sample API calls, see the section on how to read example API calls in the Experience Platform troubleshooting guide.
In order to make calls to Platform APIs, you must first complete the authentication tutorial. The tutorial provides the values for each of the required headers in all Experience Platform API calls, as shown below:
Authorization: Bearer {ACCESS_TOKEN}
x-api-key: {API_KEY}
x-gw-ims-org-id: {IMS_ORG}
All resources in Experience Platform are isolated to specific virtual sandboxes. All requests to Platform APIs require a header that specifies the name of the sandbox the operation will take place in:
x-sandbox-name: {SANDBOX_NAME}
For more information on sandboxes in Platform, see the sandbox overview documentation.
All requests that contain a payload (POST, PUT, PATCH) require an additional header:
Content-Type: application/json
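As a convenience for the sketches later in this guide, these headers can be collected once for reuse. The following is a minimal Python sketch, assuming the placeholder values above have been obtained from the authentication tutorial; the `BASE_URL` and `BASE_HEADERS` names are illustrative and not part of the API.

```python
# Minimal sketch: collect the required headers once for reuse.
# Replace the placeholder values with those from the authentication tutorial.
BASE_URL = "https://platform.adobe.io/data/foundation/import"  # batch ingestion base path

BASE_HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{IMS_ORG}",
    "x-sandbox-name": "{SANDBOX_NAME}",
}
```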
Before data can be added to a dataset, it must be linked to a batch, which will later be uploaded into a specified dataset.
POST /batches
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches" \
-H "Content-Type: application/json" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key : {API_KEY}"
-d '{
"datasetId": "{DATASET_ID}"
}'
Property | Description |
---|---|
datasetId | The ID of the dataset to upload the files into. |
Response
{
"id": "{BATCH_ID}",
"imsOrg": "{IMS_ORG}",
"updated": 0,
"status": "loading",
"created": 0,
"relatedObjects": [
{
"type": "dataSet",
"id": "{DATASET_ID}"
}
],
"version": "1.0.0",
"tags": {},
"createdUser": "{USER_ID}",
"updatedUser": "{USER_ID}"
}
Property | Description |
---|---|
id | The ID of the batch that was just created (used in subsequent requests). |
relatedObjects.id | The ID of the dataset to upload the files into. |
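For illustration, the same call can be made in Python with the requests library, reusing the BASE_URL and BASE_HEADERS sketch from above; the create_batch helper name is hypothetical, and the returned "id" field is the {BATCH_ID} used throughout the rest of this guide.

```python
import requests

def create_batch(dataset_id: str) -> str:
    """Create a new batch for the given dataset and return its batch ID."""
    response = requests.post(
        f"{BASE_URL}/batches",
        headers={**BASE_HEADERS, "Content-Type": "application/json"},
        json={"datasetId": dataset_id},
    )
    response.raise_for_status()
    return response.json()["id"]  # the {BATCH_ID} used in subsequent requests

batch_id = create_batch("{DATASET_ID}")
```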
After successfully creating a new batch for uploading, files can then be uploaded to a specific dataset.
You can upload files using the Small File Upload API. However, if your files are too large and the gateway limit is exceeded (for example, through extended timeouts, request body size errors, or other constraints), you can switch over to the Large File Upload API. This API uploads the file in chunks and stitches the data together using the Large File Upload Complete API call.
The examples below use the Apache Parquet file format. An example that uses the JSON file format can be found in the batch ingestion developer guide.
Once a batch is created, data can be uploaded to a preexisting dataset. The file being uploaded must match its referenced XDM schema.
PUT /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
Property | Description |
---|---|
{BATCH_ID} | The ID of the batch. |
{DATASET_ID} | The ID of the dataset to upload the files into. |
{FILE_NAME} | The name of the file as it will be seen in the dataset. |
Request
curl -X PUT "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.parquet" \
-H "content-type: application/octet-stream" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key : {API_KEY}" \
--data-binary "@{FILE_PATH_AND_NAME}.parquet"
Property | Description |
---|---|
{FILE_PATH_AND_NAME} | The path and filename of the file to be uploaded into the dataset. |
Response
#Status 200 OK, with empty response body
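As a sketch of the same small file upload in Python (again reusing BASE_URL, BASE_HEADERS, and the batch_id from the earlier sketches, with a hypothetical helper name and file path), the file contents are streamed as the raw request body with an application/octet-stream content type:

```python
import requests

def upload_small_file(batch_id: str, dataset_id: str, file_name: str, file_path: str) -> None:
    """Upload a file below the gateway size limit into the given batch and dataset."""
    with open(file_path, "rb") as f:
        response = requests.put(
            f"{BASE_URL}/batches/{batch_id}/datasets/{dataset_id}/files/{file_name}",
            headers={**BASE_HEADERS, "Content-Type": "application/octet-stream"},
            data=f,  # file contents sent as the raw request body
        )
    response.raise_for_status()

upload_small_file(batch_id, "{DATASET_ID}", "example.parquet", "/path/to/example.parquet")
```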
To upload a large file, the file must be split into smaller chunks that are uploaded one at a time.
POST /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}?action=initialize
Property | Description |
---|---|
{BATCH_ID} | The ID of the batch. |
{DATASET_ID} | The ID of the dataset ingesting the files. |
{FILE_NAME} | The name of the file as it will be seen in the dataset. |
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/part1=a/part2=b/{FILE_NAME}.parquet?action=initialize" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key: {API_KEY}"
Response
#Status 201 CREATED, with empty response body
After the file has been created, all subsequent chunks can be uploaded by making repeated PATCH requests, one for each section of the file.
PATCH /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
Property | Description |
---|---|
{BATCH_ID} | The ID of the batch. |
{DATASET_ID} | The ID of the dataset to upload the files into. |
{FILE_NAME} | The name of the file as it will be seen in the dataset. |
Request
curl -X PATCH "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/part1=a/part2=b/{FILE_NAME}.parquet" \
-H "content-type: application/octet-stream" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key: {API_KEY}" \
-H "Content-Range: bytes {CONTENT_RANGE}" \
--data-binary "@{FILE_PATH_AND_NAME}.parquet"
Property | Description |
---|---|
{FILE_PATH_AND_NAME} | The path and filename of the file to be uploaded into the dataset. |
Response
#Status 200 OK, with empty response
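Putting the initialize and PATCH calls together, the sketch below uploads a large file in fixed-size chunks, computing a standard HTTP Content-Range value (bytes start-end/total) for each PATCH request. It reuses BASE_URL and BASE_HEADERS from the earlier sketches; the 256 MB chunk size and the helper name are illustrative assumptions rather than values mandated by the API.

```python
import os
import requests

CHUNK_SIZE = 256 * 1024 * 1024  # 256 MB per chunk; illustrative value only

def upload_large_file(batch_id: str, dataset_id: str, file_name: str, file_path: str) -> None:
    """Initialize a large file in the batch, then upload it one chunk at a time."""
    url = f"{BASE_URL}/batches/{batch_id}/datasets/{dataset_id}/files/{file_name}"

    # Step 1: initialize the file before sending any chunks.
    init = requests.post(url, params={"action": "initialize"}, headers=BASE_HEADERS)
    init.raise_for_status()

    # Step 2: PATCH each chunk with a Content-Range header describing its byte span.
    total = os.path.getsize(file_path)
    with open(file_path, "rb") as f:
        offset = 0
        while offset < total:
            chunk = f.read(CHUNK_SIZE)
            end = offset + len(chunk) - 1
            headers = {
                **BASE_HEADERS,
                "Content-Type": "application/octet-stream",
                "Content-Range": f"bytes {offset}-{end}/{total}",
            }
            resp = requests.patch(url, headers=headers, data=chunk)
            resp.raise_for_status()
            offset = end + 1
```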
After all files have been uploaded to the batch, the batch can be signaled for completion. By doing this, the Catalog DataSetFile entries are created for the completed files and associated with the batch generated above. The Catalog batch is then marked as successful, which triggers downstream flows to ingest the available data.
POST /batches/{BATCH_ID}?action=COMPLETE
Property | Description |
---|---|
{BATCH_ID} | The ID of the batch to be marked as complete. |
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key : {API_KEY}"
Response
#Status 200 OK, with empty response
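For completeness, here is a Python sketch of the completion signal, reusing BASE_URL and BASE_HEADERS from the earlier sketches; complete_batch is a hypothetical helper name.

```python
import requests

def complete_batch(batch_id: str) -> None:
    """Signal that all files for the batch have been uploaded and promotion can begin."""
    response = requests.post(
        f"{BASE_URL}/batches/{batch_id}",
        params={"action": "COMPLETE"},
        headers=BASE_HEADERS,
    )
    response.raise_for_status()
```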
While waiting for the files to be uploaded to the batch, the batch's status can be checked to see its progress.
API format
GET /batch/{BATCH_ID}
Property | Description |
---|---|
{BATCH_ID} | The ID of the batch that is being checked. |
Request
curl -X GET "https://platform.adobe.io/data/foundation/catalog/batch/{BATCH_ID}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "x-api-key: {API_KEY}"
Response
{
"{BATCH_ID}": {
"imsOrg": "{IMS_ORG}",
"created": 1494349962314,
"createdClient": "MCDPCatalogService",
"createdUser": "{USER_ID}",
"updatedUser": "{USER_ID}",
"updated": 1494349963467,
"externalId": "{EXTERNAL_ID}",
"status": "success",
"errors": [
{
"code": "err-1494349963436"
}
],
"version": "1.0.3",
"availableDates": {
"startDate": 1337,
"endDate": 4000
},
"relatedObjects": [
{
"type": "batch",
"id": "foo_batch"
},
{
"type": "connection",
"id": "foo_connection"
},
{
"type": "connector",
"id": "foo_connector"
},
{
"type": "dataSet",
"id": "foo_dataSet"
},
{
"type": "dataSetView",
"id": "foo_dataSetView"
},
{
"type": "dataSetFile",
"id": "foo_dataSetFile"
},
{
"type": "expressionBlock",
"id": "foo_expressionBlock"
},
{
"type": "service",
"id": "foo_service"
},
{
"type": "serviceDefinition",
"id": "foo_serviceDefinition"
}
],
"metrics": {
"foo": 1337
},
"tags": {
"foo_bar": [
"stuff"
],
"bar_foo": [
"woo",
"baz"
],
"foo/bar/foo-bar": [
"weehaw",
"wee:haw"
]
},
"inputFormat": {
"format": "parquet",
"delimiter": ".",
"quote": "`",
"escape": "\\",
"nullMarker": "",
"header": "true",
"charset": "UTF-8"
}
}
}
Property | Description |
---|---|
{USER_ID} | The ID of the user who created or updated the batch. |
The "status"
field is what shows the current status of the batch requested. The batches can have one of the following states:
Status | Description |
---|---|
Abandoned | The batch has not completed in the expected timeframe. |
Aborted | An abort operation has explicitly been called (via Batch Ingest API) for the specified batch. Once the batch is in a “Loaded” state, it cannot be aborted. |
Active | The batch has been successfully promoted and is available for downstream consumption. This status can be used interchangeably with “Success”. |
Deleted | Data for the batch has been completely removed. |
Failed | A terminal state that results from either bad configuration and/or bad data. Data for a failed batch will not show up. This status can be used interchangeably with “Failure”. |
Inactive | The batch was successfully promoted, but has been reverted or has expired. The batch is no longer available for downstream consumption. |
Loaded | Data for the batch is complete and the batch is ready for promotion. |
Loading | Data for this batch is being uploaded and the batch is currently not ready to be promoted. |
Retrying | The data for this batch is being processed. However, due to a system or transient error, the batch failed - as a result, this batch is being retried. |
Staged | The staging phase of the promotion process for a batch is complete and the ingestion job has been run. |
Staging | Data for the batch is being processed. |
Stalled | The data for the batch is being processed. However, the batch promotion has stalled after a number of retries. |
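Because promotion is asynchronous, a common pattern is to poll the Catalog endpoint shown above until the batch reaches a terminal state. The sketch below does exactly that, reusing BASE_HEADERS from the earlier sketches; the polling interval and the particular set of terminal states checked are illustrative assumptions.

```python
import time
import requests

CATALOG_URL = "https://platform.adobe.io/data/foundation/catalog"

def wait_for_batch(batch_id: str, interval_seconds: int = 30) -> str:
    """Poll the batch status until it reaches a terminal state and return that state."""
    terminal = {"success", "active", "failed", "failure",
                "aborted", "abandoned", "inactive", "deleted"}
    while True:
        response = requests.get(f"{CATALOG_URL}/batch/{batch_id}", headers=BASE_HEADERS)
        response.raise_for_status()
        # The response object is keyed by the batch ID, as in the sample response above.
        status = response.json()[batch_id]["status"].lower()
        if status in terminal:
            return status
        time.sleep(interval_seconds)

final_status = wait_for_batch(batch_id)
```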