Data Preparation and Ingestion Blueprint

Last update: 2023-10-31
Created for: Developer, User

The Data Preparation and Ingestion Blueprint encompasses all the methods by which data can be prepared and ingested into Adobe Experience Platform.

Data preparation includes mapping source data to the Experience Data Model (XDM) schema. It also includes performing transformations on data, such as date formatting; field splitting, concatenation, and conversion; and joining, merging, and re-keying of records. Data preparation helps unify customer data for aggregated or filtered analysis, such as reporting, and prepares data for customer profile assembly, data science, and activation.

Architecture

Reference architecture for the Data Preparation and Ingestion Blueprint

Data ingestion guardrails

The diagram below illustrates average performance guardrails and latency for data ingestion into Adobe Experience Platform.

Experience Platform Data Flow

Data ingestion methods

Streaming Sources

Adobe Web/Mobile SDK

  Common use cases:
  • Data collection from websites and mobile apps.
  • Preferred method for client-side collection.

  Protocols: Push; HTTP; JSON

  Considerations:
  • Implement multiple Adobe applications leveraging a single SDK.

HTTP API Connector

  Common use cases:
  • Collection from streaming sources, transactions, and relevant customer events and signals.

  Protocols: Push; REST API; JSON

  Considerations:
  • Data is streamed directly to the hub, so real-time Edge segmentation and event forwarding are not available.

Edge Network API

  Common use cases:
  • Collection from streaming sources, transactions, and relevant customer events and signals via the globally distributed Edge Network.

  Protocols: Push; REST API; JSON

  Considerations:
  • Data is streamed through the Edge Network, with support for real-time segmentation on the Edge.

Adobe Applications

  Common use cases:
  • Prior implementation of Adobe Analytics, Marketo, Campaign, Target, or Audience Manager (AAM).

  Protocols: Push; source connectors and API

  Considerations:
  • The recommended approach is migrating to the Web/Mobile SDK rather than the traditional application SDKs.

Streaming Source Connectors

  Common use cases:
  • Ingestion of an enterprise event stream, typically used for sharing enterprise data with multiple downstream applications.

  Protocols: Push; REST API; JSON

  Considerations:
  • Data must be streamed in XDM format.

Streaming Sources SDK

  Common use cases:
  • Similar to the HTTP API Connector; allows self-service configuration of a source card for an external data stream.

  Protocols: Push; HTTP API; JSON

  Considerations:
  • Edge Network
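Streaming source connectors require data in XDM format. As a rough illustration of what an XDM-shaped streaming payload looks like, the sketch below assembles a minimal event in Python. The schema URL, field names, and identity namespace here are placeholders for illustration; the real structure is dictated by the XDM schema attached to your target dataset.

```python
import json
from datetime import datetime, timezone

def build_xdm_event(email: str, event_type: str) -> dict:
    """Assemble a minimal XDM-shaped streaming payload.

    Field names are illustrative; the actual structure is defined by
    the XDM schema attached to the destination dataset.
    """
    return {
        "header": {
            "schemaRef": {
                # Placeholder schema ID; substitute your tenant's schema URL.
                "id": "https://ns.adobe.com/<TENANT>/schemas/<SCHEMA_ID>",
                "contentType": "application/vnd.adobe.xed-full+json;version=1",
            }
        },
        "body": {
            "xdmEntity": {
                "eventType": event_type,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "identityMap": {
                    "Email": [{"id": email, "primary": True}],
                },
            }
        },
    }

payload = build_xdm_event("jane@example.com", "web.webpagedetails.pageViews")
print(json.dumps(payload, indent=2))
```

In practice this JSON document would be pushed to a streaming endpoint over HTTPS; the point of the sketch is the header/body split and the identity map that streaming ingestion expects.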

Batch Sources

Batch Ingestion API

  Common use cases:
  • Ingestion from an enterprise-managed queue; cleansing and transformation of data prior to ingestion.

  Protocols: Push; JSON or Parquet

  Considerations:
  • You must manage batches and files for ingestion.

Batch Source Connectors

  Common use cases:
  • Common approach for ingestion of files from cloud storage locations.
  • Connectors to common CRM and marketing applications.
  • Ideal for ingesting large amounts of historical data.

  Protocols: Pull; CSV, JSON, Parquet

  Considerations:
  • Not always-on; ingestion is not immediate.
  • Recurring checks for delta files, at a minimum frequency of every 15 minutes.

Data Landing Zone

  Common use cases:
  • Adobe-provisioned file storage location to which files are pushed for ingestion.

  Protocols: Push; CSV, JSON, Parquet

  Considerations:
  • Files are given a 7-day TTL.

Batch Sources SDK

  Common use cases:
  • Allows self-service configuration of a source card for an external data source.
  • Ideal for partner connectors or for a tailored workflow experience when setting up an enterprise connector.

  Protocols: Pull; REST API; CSV or JSON files

  Considerations:
  • 15-minute minimum frequency.
  • Examples: Mailchimp, OneTrust, Zendesk
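With the Batch Ingestion API, you manage the batches and files yourself. A common preparation step is splitting a record set into line-delimited JSON part files before upload. The sketch below is a minimal local illustration of that step, assuming JSON input; file naming and chunk size are arbitrary choices, not platform requirements.

```python
import json
from pathlib import Path
from tempfile import mkdtemp

def write_batch_files(records, out_dir, max_records_per_file=2):
    """Split records into line-delimited JSON part files, one JSON
    object per line, ready to upload as parts of a single batch."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(0, len(records), max_records_per_file):
        chunk = records[i:i + max_records_per_file]
        path = out / f"part-{i // max_records_per_file:05d}.json"
        path.write_text("\n".join(json.dumps(r) for r in chunk) + "\n")
        paths.append(path)
    return paths

records = [{"id": n, "email": f"user{n}@example.com"} for n in range(5)]
files = write_batch_files(records, mkdtemp())
print([p.name for p in files])  # ['part-00000.json', 'part-00001.json', 'part-00002.json']
```

The same chunking approach applies when pushing files to the Data Landing Zone, where each file must land within its 7-day TTL window.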

 

Methods of ingestion

Web/Mobile SDK

  Latency:
  • Real time: same-page collection to the Edge Network
  • Streaming ingestion to Profile: < 15 minutes at the 95th percentile
  • Streaming ingestion to the data lake: micro-batch, ~15 minutes

Streaming Sources

  Latency:
  • Real time: same-page collection to the Edge Network
  • Streaming ingestion to Profile: ~1 minute
  • Streaming ingestion to the data lake: micro-batch, ~15 minutes

Streaming API

  Edge Network Server API (preferred): supports Edge services, including Edge segmentation and event forwarding.
  Data Collection Core Service API: does not support Edge services; routes directly to the hub.

  Latency:
  • Real time: same-page collection to the Edge Network
  • Streaming ingestion to Profile: ~1 minute
  • Streaming ingestion to the data lake: micro-batch, ~15 minutes
  • Throughput: 7 GB/hour

ETL Tooling

  Use ETL tools to modify and transform enterprise data before ingestion into Experience Platform.

  Latency:
  • Dependent on external ETL tool scheduling; standard ingestion guardrails then apply based on the method used for ingestion.

Batch Sources

  Scheduled fetch from sources.

  Latency:
  • ~200 GB/hour

Batch API

  Latency:
  • Batch ingestion to Profile: ~45 minutes, dependent on size and traffic loads
  • Batch ingestion to the data lake: dependent on size and traffic loads

Adobe Application Connectors

  Automatically ingest data sourced from Adobe Experience Cloud applications.
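The latency figures above can be captured as a small lookup table, which is handy when sanity-checking a pipeline design against an SLA. The values below are the approximate figures quoted in this blueprint, not contractual guarantees, and the method keys are illustrative names.

```python
# Approximate Profile ingestion latencies quoted in this blueprint (not SLAs).
PROFILE_LATENCY = {
    "web_mobile_sdk": "< 15 minutes (95th percentile)",
    "streaming_sources": "~1 minute",
    "streaming_api": "~1 minute",
    "batch_api": "~45 minutes",
}

# Streaming methods land in the data lake via micro-batch, roughly every 15 minutes.
DATA_LAKE_LATENCY_STREAMING = "micro-batch, ~15 minutes"

def profile_latency(method: str) -> str:
    """Look up the approximate Profile ingestion latency for a method."""
    return PROFILE_LATENCY[method]

print(profile_latency("streaming_api"))  # ~1 minute
```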

Data preparation methods

External ETL Tool (SnapLogic, MuleSoft, Informatica, etc.)

  Perform complex transformations in ETL tooling, then use the standard Experience Platform Flow Service APIs or source connectors to ingest the resultant data.

Query Service - Data Prep

  Join, split, merge, transform, query, and filter data into a new dataset using Create Table As Select (CTAS).
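The CTAS pattern materializes the result of a SELECT as a new table, which is how Query Service derives new datasets. As a minimal local illustration of the pattern (using sqlite3, not Query Service itself; table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_events (user_id TEXT, page TEXT, dt TEXT);
    INSERT INTO web_events VALUES
        ('u1', '/home',     '2023-10-01'),
        ('u1', '/checkout', '2023-10-02'),
        ('u2', '/home',     '2023-10-02');

    -- CTAS: materialize a filtered/aggregated result as a new table,
    -- analogous to deriving a new dataset in Query Service.
    CREATE TABLE checkout_users AS
        SELECT user_id, COUNT(*) AS events
        FROM web_events
        WHERE page = '/checkout'
        GROUP BY user_id;
""")
rows = conn.execute("SELECT * FROM checkout_users").fetchall()
print(rows)  # [('u1', 1)]
```

The derived table can then feed reporting, profile assembly, or activation, just as a CTAS-derived dataset would downstream in Experience Platform.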
XDM Mapper & Data Prep Functions (Streaming and Batch)

  Map source attributes in CSV or JSON format to XDM attributes during Experience Platform ingestion.
  Compute functions on data as it is ingested; for example, data formatting, splitting, and concatenation.
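To make the mapper-style transformations concrete, the sketch below applies the kinds of functions described above (field splitting, concatenation, date formatting) to a flat source record in plain Python. The target field names are illustrative stand-ins for XDM attributes; real mappings are defined against the schema of the destination dataset, not in application code.

```python
from datetime import datetime

def map_to_xdm(source: dict) -> dict:
    """Apply mapper-style transformations to a flat source record.

    Target field names are illustrative; real mappings target the
    XDM schema attached to the destination dataset.
    """
    # Field split: one source attribute into two target attributes.
    first, last = source["full_name"].split(" ", 1)
    return {
        "person": {
            "name": {"firstName": first, "lastName": last},
        },
        "personalEmail": {
            # Concatenation: two source attributes into one target attribute.
            "address": f'{source["email_user"]}@{source["email_domain"]}',
        },
        # Date formatting: source string reformatted to ISO 8601.
        "birthDate": datetime.strptime(source["dob"], "%m/%d/%Y").date().isoformat(),
    }

record = {
    "full_name": "Jane Doe",
    "email_user": "jane",
    "email_domain": "example.com",
    "dob": "01/31/1990",
}
print(map_to_xdm(record))
```

Running this yields a nested, XDM-shaped record with `birthDate` as `1990-01-31`, mirroring what the mapper produces during ingestion.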
