Data preparation and ingestion blueprint

The Data Preparation and Ingestion Blueprint covers all the methods for preparing data and ingesting it into Adobe Experience Platform.

Data preparation includes mapping source data to an Experience Data Model (XDM) schema. It also includes transforming data: formatting dates; splitting, concatenating, and converting fields; and joining, merging, and re-keying records. Data preparation helps unify customer data for aggregated or filtered analysis, such as reporting, and prepares data for customer profile assembly, data science, and activation.
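The transformations named above (date formatting, field splitting, concatenation, and conversion) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source column names and the nested target field names are hypothetical and are not taken from any particular XDM schema.

```python
from datetime import datetime


def prepare_record(source: dict) -> dict:
    """Map a flat source row onto a nested, XDM-style record.

    All field names here are hypothetical and chosen for illustration only.
    """
    # Field splitting: "Ada Lovelace" -> first name and last name.
    first_name, _, last_name = source["full_name"].partition(" ")
    # Date formatting: US-style MM/DD/YYYY -> ISO 8601.
    signup = datetime.strptime(source["signup_date"], "%m/%d/%Y").date().isoformat()
    return {
        # Concatenation: build a key namespaced by the source system.
        "_id": f"{source['system']}-{source['customer_id']}",
        "person": {"name": {"firstName": first_name, "lastName": last_name}},
        # Conversion: trim whitespace and normalize case.
        "personalEmail": {"address": source["email"].strip().lower()},
        "signupDate": signup,
    }


row = {
    "system": "crm",
    "customer_id": 1042,
    "full_name": "Ada Lovelace",
    "email": " Ada@Example.COM ",
    "signup_date": "12/10/2023",
}
record = prepare_record(row)
print(record)
```

In practice the same steps would typically run in an ETL tool or in the mapping step during ingestion rather than in standalone code.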

Architecture

Figure: Reference architecture for the Data Preparation and Ingestion Blueprint

Data ingestion guardrails

The following diagram illustrates the average performance guardrails and latency for data ingestion into Adobe Experience Platform.

Figure: Experience Platform data flow

Data ingestion methods

Streaming Sources

| Method | Common Use Cases | Protocols | Considerations |
| --- | --- | --- | --- |
| Adobe Web/Mobile SDK | Data collection from websites and mobile apps. Preferred method for client-side collection. | Push, HTTP, JSON | Implement multiple Adobe applications using a single SDK. |
| HTTP API Connector | Collection from streaming sources, transactions, and relevant customer events and signals. | Push, REST API, JSON | Data is streamed directly to the hub, so there is no real-time Edge segmentation or event forwarding. |
| Edge Network API | Collection from streaming sources, transactions, and relevant customer events and signals through the globally distributed Edge Network. | Push, REST API, JSON | Data is streamed through the Edge Network. Supports real-time segmentation on the Edge. |
| Adobe Applications | Prior implementation of Adobe Analytics, Marketo, Campaign, Target, or AAM. | Push, Source Connectors and API | The recommended approach is to migrate to the Web/Mobile SDK rather than use traditional application SDKs. |
| Streaming Source Connectors | Ingestion of an enterprise event stream, typically used for sharing enterprise data with multiple downstream applications. | Push, REST API, JSON | Must be streamed in XDM format. |
| Streaming Sources SDK | Similar to the HTTP API Connector; allows a self-service configuration card for an external data stream. | Push, HTTP API, JSON | Edge Network |
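To make the "Push, REST API, JSON" and "must be streamed in XDM format" entries more concrete, the sketch below builds a JSON message for a streaming source. It rests on assumptions: the envelope layout (`header`, `body`, `schemaRef`, `xdmMeta`, `xdmEntity`) reflects one common shape for streaming ingestion and should be checked against the current API reference, the `build_streaming_message` helper is hypothetical, and every ID is a placeholder. The sketch only constructs the payload and does not send a request.

```python
import json


def build_streaming_message(schema_id, org_id, dataset_id, xdm_entity):
    """Wrap an XDM-formatted record in a streaming-ingestion envelope.

    The envelope field names are an assumption; verify them against the
    current streaming ingestion API reference before relying on them.
    """
    schema_ref = {
        "id": schema_id,
        "contentType": "application/vnd.adobe.xed-full+json;version=1",
    }
    return {
        "header": {
            "schemaRef": schema_ref,
            "imsOrgId": org_id,
            "datasetId": dataset_id,
        },
        "body": {
            "xdmMeta": {"schemaRef": schema_ref},
            "xdmEntity": xdm_entity,
        },
    }


# Placeholder identifiers and a hypothetical event record.
event = {
    "_id": "evt-0001",
    "timestamp": "2024-03-01T09:00:00Z",
    "eventType": "commerce.purchases",
}
message = build_streaming_message(
    schema_id="SCHEMA_ID_PLACEHOLDER",
    org_id="ORG_ID_PLACEHOLDER",
    dataset_id="DATASET_ID_PLACEHOLDER",
    xdm_entity=event,
)
payload = json.dumps(message)
print(payload)
```

A client would then push `payload` over HTTPS to the streaming connection's endpoint with a JSON content type.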
Batch Sources

| Method | Common Use Cases | Protocols | Considerations |
| --- | --- | --- | --- |
| Batch Ingestion API | Ingestion from an enterprise-managed queue. Cleansing and transformation of data prior to ingestion. | Push, JSON or Parquet | Must manage batches and files for ingestion. |
| Batch Source Connectors | Common approach for ingesting files from cloud storage locations. Connectors to common CRM and marketing applications. Ideal for ingesting large amounts of historical data. | Pull, CSV, JSON, Parquet | Not an always-on, immediate ingestion method. Recurring checks ingest delta files at a minimum interval of 15 minutes. |
| Data Landing Zone | Adobe-provisioned file storage location to which files are pushed for ingestion. | Push, CSV, JSON, Parquet | Files have a 7-day TTL. |
| Batch Sources SDK | Allows a self-service configuration card for an external data source. Ideal for partner connectors or for a tailored workflow experience when setting up an enterprise connector. | Pull, REST API, CSV or JSON files | 15-minute minimum frequency. Examples: MailChimp, OneTrust, Zendesk. |
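The scheduling and retention considerations for batch sources (a 15-minute minimum frequency and a 7-day TTL for Data Landing Zone files) can be expressed as simple pre-flight checks. This is a minimal sketch: the function names are hypothetical, and it assumes the TTL is counted from the upload time, which the source does not state.

```python
from datetime import datetime, timedelta, timezone

MIN_FREQUENCY = timedelta(minutes=15)  # minimum recurring frequency for batch sources
LANDING_ZONE_TTL = timedelta(days=7)   # TTL for files in the Data Landing Zone


def validate_frequency(interval: timedelta) -> timedelta:
    """Reject a dataflow schedule that runs more often than the minimum."""
    if interval < MIN_FREQUENCY:
        raise ValueError(f"frequency {interval} is below the {MIN_FREQUENCY} minimum")
    return interval


def landing_zone_expiry(uploaded_at: datetime) -> datetime:
    """Return when a file would expire, assuming the TTL starts at upload."""
    return uploaded_at + LANDING_ZONE_TTL


def is_expired(uploaded_at: datetime, now: datetime) -> bool:
    """Report whether a file uploaded at `uploaded_at` has passed its TTL."""
    return now >= landing_zone_expiry(uploaded_at)


uploaded = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(validate_frequency(timedelta(minutes=30)))
print(landing_zone_expiry(uploaded).isoformat())
print(is_expired(uploaded, datetime(2024, 3, 9, tzinfo=timezone.utc)))
```

Checks like these are useful in an upstream scheduler so that files are ingested well before they age out of the landing zone.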
Methods of Ingestion
Description
Web/Mobile SDK

Latency:

  • Real time - same page collection to Edge Network
  • Streaming ingestion to Profile < 15 minutes at the 95th percentile
  • Streaming ingestion to data lake (micro batch ~15 minutes)

Documentation:

Streaming Sources

Latency:

  • Real time - same page collection to Edge Network
  • Streaming ingestion to Profile ~1 minute
  • Streaming ingestion to data lake (micro batch ~15 minutes)
Streaming API

  • Edge Network Server API (preferred): supports Edge Services, including Edge Segmentation.
  • Data Collection Core Service API: does not support Edge Services; routes directly to the hub.
Latency:

  • Real time - same page collection to Edge Network
  • Streaming ingestion to Profile ~1 minute
  • Streaming ingestion to data lake (micro batch ~15 minutes)
  • 7 GB/hour

Documentation

ETL Tooling

Use ETL tools to modify and transform enterprise data before ingestion into Experience Platform.

Latency:

  • Timing depends on the external ETL tool's scheduling; after that, the standard ingestion guardrails apply based on the ingestion method used.
Batch Sources
Scheduled fetch from sources.
Throughput: ~200 GB/hour

Documentation
Video Tutorials
Batch API

Latency:

  • Batch ingestion to Profile: ~45 minutes, depending on size and traffic loads
  • Batch ingestion to data lake: depends on size and traffic loads

Documentation

Adobe Application Connectors

Automatically ingest data sourced from Adobe Experience Cloud applications.
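The throughput figures quoted above (7 GB/hour for the Streaming API and ~200 GB/hour for Batch Sources) allow a rough estimate of how long a load might take with each method. The sketch below treats those figures as constant sustained rates, which is a simplifying assumption; the 500 GB volume and the `estimated_hours` helper are hypothetical.

```python
# Approximate guardrails quoted in this document, in GB per hour.
GUARDRAILS_GB_PER_HOUR = {
    "streaming_api": 7,
    "batch_sources": 200,
}


def estimated_hours(volume_gb: float, method: str) -> float:
    """Estimate load time, assuming the guardrail is a constant sustained rate."""
    return volume_gb / GUARDRAILS_GB_PER_HOUR[method]


backfill_gb = 500  # hypothetical historical backfill
for method in GUARDRAILS_GB_PER_HOUR:
    print(f"{method}: {estimated_hours(backfill_gb, method):.1f} hours")
```

Under these assumptions, a 500 GB backfill takes about 2.5 hours at the batch figure and about 71 hours at the streaming figure, which is consistent with batch source connectors being described as ideal for large amounts of historical data.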

Data preparation methods

Methods of Data Preparation
Description
External ETL Tool (SnapLogic, MuleSoft, Informatica, etc.)
Perform complex transformations in ETL tooling and use standard Experience Platform Flow Service APIs or source connectors to ingest the resultant data.
Query Service - Data Prep
Join, split, merge, transform, query, and filter data into a new dataset using Create Table as Select (CTAS).
Documentation
XDM Mapper & Data Prep functions (Streaming and Batch)
Map source attributes in CSV or JSON format to XDM attributes during Experience Platform ingestion.
Compute functions on data as it is ingested; for example, data formatting, splitting, and concatenation.
Documentation
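The Create Table as Select (CTAS) pattern mentioned for Query Service can be tried locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine, so it shows the general SQL pattern rather than Query Service syntax; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical source datasets.
conn.executescript("""
CREATE TABLE crm_customers (customer_id INTEGER, email TEXT, country TEXT);
CREATE TABLE web_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO crm_customers VALUES
    (1, 'Ada@Example.com', 'GB'),
    (2, 'alan@example.com', 'GB'),
    (3, 'grace@example.com', 'US');
INSERT INTO web_orders VALUES (10, 1, 40.0), (11, 1, 60.0), (12, 3, 25.0);
""")

# CTAS: join, filter, transform, and aggregate into a new dataset in one statement.
conn.execute("""
CREATE TABLE customer_order_summary AS
SELECT c.customer_id,
       lower(c.email)    AS email,
       COUNT(o.order_id) AS order_count,
       SUM(o.amount)     AS total_amount
FROM crm_customers AS c
JOIN web_orders AS o ON o.customer_id = c.customer_id
WHERE c.country IN ('GB', 'US')
GROUP BY c.customer_id, c.email
""")

rows = conn.execute(
    "SELECT customer_id, email, order_count, total_amount "
    "FROM customer_order_summary ORDER BY customer_id"
).fetchall()
print(rows)
```

The new table holds one row per customer with at least one order, so the result is ready for ingestion or analysis without further joins.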