Import batch data to AEP
AEP can ingest batch files containing profile data, either as flat files (such as Parquet) or as data that conforms to a known schema in the Experience Data Model (XDM) registry. The following formats are accepted: JSON, Parquet, and CSV.
This article will cover the following:
- Batch ingestion prerequisites
- Batch ingestion best practices and limits
- How to create a batch
- How to complete a batch
- How to check the status of a batch
The Postman collection is referenced throughout the article by call number. More details on installing and using the Postman collection are available on the GitHub README page, along with sample datasets of loyalty and profile data.
For all calls in this tutorial, use the Postman call folders 4: Batch Import and either 4a: Batch import for PROFILE data or 4b: Batch import for EVENT data.
Batch ingestion prerequisites
- Define a schema and create a dataset.
- Data must be formatted in JSON, Parquet, or CSV.
- Authenticate to the platform.
- Gather the values for required headers from the authentication tutorial linked above.
Batch ingestion best practices and limits
- Maximum batch size: 100 GB
- Maximum number of files per batch: 1500
- If a file is larger than 512 MB, it must be divided into smaller chunks. More details can be found in the developer guide.
- Maximum number of properties or fields per row: 10,000
- Maximum number of batches per minute, per user: 138
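The 512 MB limit above means large files must be split before upload. A minimal sketch of one way to do that, assuming newline-delimited JSON so the file can be split on row boundaries without breaking records; the helper name is illustrative and not part of any AEP SDK:

```python
# Illustrative chunk limit; the platform's per-file limit is 512 MB.
MAX_BYTES = 512 * 1024 * 1024

def split_ndjson(path, max_bytes=MAX_BYTES):
    """Split a newline-delimited JSON file into parts no larger than max_bytes.

    Each part is a bytes object containing whole lines, so every JSON
    record stays intact. Returns the list of parts.
    """
    parts, current, size = [], [], 0
    with open(path, "rb") as f:
        for line in f:
            # Start a new part when adding this line would exceed the limit.
            if current and size + len(line) > max_bytes:
                parts.append(b"".join(current))
                current, size = [], 0
            current.append(line)
            size += len(line)
    if current:
        parts.append(b"".join(current))
    return parts
```

Each resulting part can then be uploaded as a separate file within the same batch.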
Create a batch
In this tutorial we will use JSON as the format. More format examples can be found in the developer guide.
Create a batch using JSON as the input format. Be sure to include a dataset ID, and confirm that your data conforms to the XDM schema linked to the dataset:
curl -X POST "https://platform.adobe.io/data/foundation/import/batches" \
-H "Accept: application/json" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key: {API_KEY}" \
-d '{
"datasetId": "{DATASET_ID}",
"inputFormat": {
"format": "json"
}
}'
Response:
{
"id": "{BATCH_ID}",
"imsOrg": "{IMS_ORG}",
"updated": 0,
"status": "loading",
"created": 0,
"relatedObjects": [
{
"type": "dataSet",
"id": "{DATASET_ID}"
}
],
"version": "1.0.0",
"tags": {},
"createdUser": "{USER_ID}",
"updatedUser": "{USER_ID}"
}
Upload files
Files can now be uploaded to the newly created batch (using the batch_id from the response above).
curl -X PUT "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.json" \
-H "content-type: application/octet-stream" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key : {API_KEY}" \
--data-binary "@{FILE_PATH_AND_NAME}.json"
Response:
200 OK
Complete a batch
Once all the files have been uploaded, this call will signal that the batch is ready for promotion:
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-api-key : {API_KEY}"
Response:
200 OK
Check the status of a batch
The batch status can be checked in the UI or via the API (see call below). To check in the UI, navigate to the DataSet to see the status.
The various batch ingestion statuses can be found here.
curl GET "https://platform.adobe.io/data/foundation/catalog/batch/{BATCH_ID}" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "x-gw-ims-org-id: {IMS_ORG}" \
-H "x-sandbox-name: {SANDBOX_NAME}" \
-H "x-api-key: {API_KEY}"
Response:
{
"{BATCH_ID}": {
"imsOrg": "{IMS_ORG}",
"created": 1494349962314,
"createdClient": "MCDPCatalogService",
"createdUser": "{USER_ID}",
"updatedUser": "{USER_ID}",
"updated": 1494349963467,
"externalId": "{EXTERNAL_ID}",
"status": "success",
"errors": [
{
"code": "err-1494349963436"
}
],
"version": "1.0.3",
"availableDates": {
"startDate": 1337,
"endDate": 4000
},
"relatedObjects": [
{
"type": "batch",
"id": "foo_batch"
},
{
"type": "connection",
"id": "foo_connection"
},
{
"type": "connector",
"id": "foo_connector"
},
{
"type": "dataSet",
"id": "foo_dataSet"
},
{
"type": "dataSetView",
"id": "foo_dataSetView"
},
{
"type": "dataSetFile",
"id": "foo_dataSetFile"
},
{
"type": "expressionBlock",
"id": "foo_expressionBlock"
},
{
"type": "service",
"id": "foo_service"
},
{
"type": "serviceDefinition",
"id": "foo_serviceDefinition"
}
],
"metrics": {
"foo": 1337
},
"tags": {
"foo_bar": [
"stuff"
],
"bar_foo": [
"woo",
"baz"
],
"foo/bar/foo-bar": [
"weehaw",
"wee:haw"
]
},
"inputFormat": {
"format": "parquet",
"delimiter": ".",
"quote": "`",
"escape": "\\",
"nullMarker": "",
"header": "true",
"charset": "UTF-8"
}
}
}