Streaming ingestion validation
Topics: Data Ingestion
Created for: Developer
Streaming ingestion lets you upload data to Adobe Experience Platform in real time through streaming endpoints. The streaming ingestion APIs support two modes of validation: synchronous and asynchronous.
Getting started
This guide requires a working understanding of the following components of Adobe Experience Platform:
- Experience Data Model (XDM) System: The standardized framework by which Experience Platform organizes customer experience data.
- Streaming Ingestion: One of the methods by which data can be sent to Experience Platform.
Reading sample API calls
This tutorial provides example API calls to demonstrate how to format your requests. These include paths, required headers, and properly formatted request payloads. Sample JSON returned in API responses is also provided. For information on the conventions used in documentation for sample API calls, see the section on how to read example API calls in the Experience Platform troubleshooting guide.
Gather values for required headers
In order to make calls to Platform APIs, you must first complete the authentication tutorial. Completing the authentication tutorial provides the values for each of the required headers in all Experience Platform API calls, as shown below:
- Authorization: Bearer {ACCESS_TOKEN}
- x-api-key: {API_KEY}
- x-gw-ims-org-id: {ORG_ID}
All resources in Experience Platform, including those belonging to the Schema Registry, are isolated to specific virtual sandboxes. All requests to Platform APIs require a header that specifies the name of the sandbox the operation will take place in:
- x-sandbox-name: {SANDBOX_NAME}
All requests that contain a payload (POST, PUT, PATCH) require an additional header:
- Content-Type: application/json
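As a minimal sketch, the headers listed above can be assembled in one place before each call. The function name `build_platform_headers` and the sample values are hypothetical and chosen only for illustration; the header names are the ones listed in this section.

```python
def build_platform_headers(access_token, api_key, org_id, sandbox_name, has_payload=False):
    """Assemble the headers this section lists for an Experience Platform API call."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox_name,
    }
    # Requests that carry a payload (POST, PUT, PATCH) also declare the body type.
    if has_payload:
        headers["Content-Type"] = "application/json"
    return headers


# Placeholder values; real ones come from the authentication tutorial.
example_headers = build_platform_headers(
    "my-token", "my-key", "my-org@AdobeOrg", "my-sandbox", has_payload=True
)
```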
Validation coverage
Streaming Validation Service covers validation in the following areas:
- Range
- Presence
- Enum
- Pattern
- Type
- Format
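To make these six areas concrete, the following toy checker applies one rule of each kind to a single field. It is only an illustration, not the Streaming Validation Service's implementation: the `check_field` function and the rule keys (`required`, `type`, `enum`, `pattern`, `minimum`, `maximum`, `format`) are assumptions that borrow JSON Schema-style names, and only the `date-time` format is handled.

```python
import re
from datetime import datetime


def check_field(name, value, rule):
    """Return a list of violation messages for one field against a hypothetical rule dict."""
    errors = []
    # Presence: a required field must be supplied.
    if value is None:
        if rule.get("required"):
            errors.append(f"{name}: required field is missing")
        return errors
    # Type: the value must be an instance of the expected Python type.
    expected = rule.get("type")
    if expected is not None and not isinstance(value, expected):
        errors.append(f"{name}: expected type {expected.__name__}, found {type(value).__name__}")
        return errors  # the remaining checks assume the right type
    # Enum: the value must be one of the allowed values.
    if "enum" in rule and value not in rule["enum"]:
        errors.append(f"{name}: {value!r} is not an allowed value")
    # Pattern: the whole string must match the regular expression.
    if "pattern" in rule and isinstance(value, str) and not re.fullmatch(rule["pattern"], value):
        errors.append(f"{name}: {value!r} does not match pattern {rule['pattern']}")
    # Range: numeric bounds are inclusive.
    if "minimum" in rule and value < rule["minimum"]:
        errors.append(f"{name}: {value} is below minimum {rule['minimum']}")
    if "maximum" in rule and value > rule["maximum"]:
        errors.append(f"{name}: {value} is above maximum {rule['maximum']}")
    # Format: only ISO 8601 date-time strings are illustrated here.
    if rule.get("format") == "date-time" and isinstance(value, str):
        try:
            datetime.fromisoformat(value)
        except ValueError:
            errors.append(f"{name}: {value!r} is not a valid date-time")
    return errors


type_errors = check_field("_id", 12345, {"type": str})
range_errors = check_field("age", 150, {"type": int, "minimum": 0, "maximum": 120})
```

Each check returns a human-readable message rather than raising, so several violations on one field can be reported together.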
Synchronous validation
Synchronous validation provides immediate feedback about why an ingestion failed. However, records that fail validation are dropped and never sent downstream, so synchronous validation should only be used during development. With synchronous validation, the caller is informed of the result of the XDM validation and, if it failed, the reason for the failure.
By default, synchronous validation is not turned on. To enable it, pass the optional query parameter syncValidation=true when making API calls. In addition, synchronous validation is currently only available if your stream endpoint is on the VA7 data center.
The syncValidation query parameter is only available for the single message endpoint and cannot be used for the batch endpoint.
If a message fails synchronous validation, it is not written to the output queue, which gives users immediate feedback.
API format
POST /collection/{CONNECTION_ID}?syncValidation=true
- {CONNECTION_ID}: The id value of the streaming connection previously created.
Request
Submit the following request to ingest data to your data inlet with synchronous validation:
curl -X POST "https://dcs.adobedc.net/collection/{CONNECTION_ID}?syncValidation=true" \
  -H "Content-Type: application/json" \
  -d '{JSON_PAYLOAD}'
- {JSON_PAYLOAD}: The JSON body of the data to ingest.
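The curl command above can also be expressed with Python's standard library. This is a sketch under stated assumptions: the helper name `build_ingest_request`, the connection ID, and the payload are placeholders, and the request is only constructed here, not sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://dcs.adobedc.net"  # host taken from the curl example above


def build_ingest_request(connection_id, payload, sync_validation=False):
    """Build (but do not send) a POST request to the single message endpoint."""
    url = f"{BASE_URL}/collection/{quote(connection_id, safe='')}"
    # syncValidation is optional; omitting it leaves validation asynchronous.
    if sync_validation:
        url += "?" + urlencode({"syncValidation": "true"})
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


# Placeholder connection ID and payload, for illustration only.
request = build_ingest_request("my-connection-id", {"example": "payload"}, sync_validation=True)
# To send it: urllib.request.urlopen(request), with handling for urllib.error.HTTPError.
```

Keeping the flag as a function argument makes it straightforward to turn synchronous validation on in development and off in production.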
Response
With synchronous validation enabled, the response to a message that fails validation includes the validation errors that were encountered in its payload:
{
    "type": "http://ns.adobe.com/adobecloud/problem/data-collection-service/inlet",
    "status": 400,
    "title": "Invalid XDM Message Format",
    "report": {
        "message": "inletId: [6aca7aa2d87ebd6b2780ca5724d94324a14475f140a2b69373dd5c714430dfd4] imsOrgId: [7BF122A65C5B3FE40A494026@AdobeOrg] Message is invalid",
        "cause": {
            "_streamingValidation": [
                {
                    "schemaLocation": "#",
                    "pointerToViolation": "#",
                    "causingExceptions": [
                        {
                            "schemaLocation": "#",
                            "pointerToViolation": "#",
                            "causingExceptions": [],
                            "keyword": "additionalProperties",
                            "message": "extraneous key [workEmail] is not permitted"
                        },
                        {
                            "schemaLocation": "#",
                            "pointerToViolation": "#",
                            "causingExceptions": [],
                            "keyword": "additionalProperties",
                            "message": "extraneous key [person] is not permitted"
                        },
                        {
                            "schemaLocation": "#/properties/_id",
                            "pointerToViolation": "#/_id",
                            "causingExceptions": [],
                            "keyword": "type",
                            "message": "expected type: String, found: Long"
                        }
                    ],
                    "message": "3 schema violations found"
                }
            ]
        }
    }
}
The above response lists how many schema violations were found and what the violations were. For example, this response states that the keys workEmail and person were not defined in the schema and are therefore not allowed. It also flags the value for _id as incorrect, since the schema expected a string but a long was supplied instead. Note that once five errors are encountered, the validation service stops processing that message. Other messages continue to be parsed, however.
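When handling such a response in code, it can help to flatten the nested causingExceptions entries into a simple list. The sketch below assumes the response shape shown above; the function names and the trimmed `sample_response` are illustrative only.

```python
def collect_violations(node):
    """Recursively gather leaf violations, i.e. entries with no nested causingExceptions."""
    children = node.get("causingExceptions", [])
    if not children:
        return [(node.get("pointerToViolation"), node.get("keyword"), node.get("message"))]
    leaves = []
    for child in children:
        leaves.extend(collect_violations(child))
    return leaves


def summarize_validation_errors(response_body):
    """Return (pointer, keyword, message) tuples from a synchronous validation response."""
    roots = response_body.get("report", {}).get("cause", {}).get("_streamingValidation", [])
    results = []
    for root in roots:
        results.extend(collect_violations(root))
    return results


# Trimmed version of the response above, keeping two of the three violations.
sample_response = {
    "status": 400,
    "report": {"cause": {"_streamingValidation": [{
        "pointerToViolation": "#",
        "causingExceptions": [
            {"pointerToViolation": "#", "causingExceptions": [],
             "keyword": "additionalProperties",
             "message": "extraneous key [workEmail] is not permitted"},
            {"pointerToViolation": "#/_id", "causingExceptions": [],
             "keyword": "type",
             "message": "expected type: String, found: Long"},
        ],
        "message": "2 schema violations found",
    }]}},
}

violations = summarize_validation_errors(sample_response)
```

Each tuple pairs the location of the violation with the schema keyword that was violated, which is usually enough to decide how to correct the payload.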
Asynchronous validation
Asynchronous validation does not provide immediate feedback. Instead, data that fails validation is sent to a failed batch in Data Lake to prevent data loss. This failed data can later be retrieved for further analysis and replay. This method should be used in production. Unless otherwise requested, streaming ingestion operates in asynchronous validation mode.
API format
POST /collection/{CONNECTION_ID}
- {CONNECTION_ID}: The id value of the streaming connection previously created.
Request
Submit the following request to ingest data to your data inlet with asynchronous validation:
curl -X POST https://dcs.adobedc.net/collection/{CONNECTION_ID} \
-H "Content-Type: application/json" \
-d '{JSON_PAYLOAD}'
- {JSON_PAYLOAD}: The JSON body of the data to ingest.
Response
With asynchronous validation enabled, a successful response returns the following:
{
    "inletId": "f6ca9706d61de3b78be69e2673ad68ab9fb2cece0c1e1afc071718a0033e6877",
    "xactionId": "1555445493896:8600:8",
    "receivedTimeMs": 1555445493932,
    "syncValidation": {
        "skipped": true
    }
}
Note that the response states that synchronous validation was skipped, because it was not explicitly requested.
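A client can inspect this response to confirm which validation mode was applied. The following sketch assumes the response shape shown above; the helper names are hypothetical, and treating receivedTimeMs as milliseconds since the Unix epoch is an assumption based on the field name and value.

```python
from datetime import datetime, timedelta, timezone


def sync_validation_skipped(response_body):
    """Return True when the response reports that synchronous validation was skipped."""
    return bool(response_body.get("syncValidation", {}).get("skipped", False))


def received_at(response_body):
    """Convert receivedTimeMs to a UTC datetime (assumes Unix epoch milliseconds)."""
    ms = response_body["receivedTimeMs"]
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


accepted = {
    "inletId": "f6ca9706d61de3b78be69e2673ad68ab9fb2cece0c1e1afc071718a0033e6877",
    "xactionId": "1555445493896:8600:8",
    "receivedTimeMs": 1555445493932,
    "syncValidation": {"skipped": True},
}

skipped = sync_validation_skipped(accepted)
timestamp = received_at(accepted)
```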
Appendix
This section describes the status codes returned in responses when ingesting data.