Data Ingestion overview

Adobe Experience Platform brings data from multiple sources together in order to help marketers better understand the behavior of their customers. Adobe Experience Platform Data Ingestion represents the multiple methods by which Experience Platform ingests data from these sources, as well as how that data is persisted within the Data Lake for use by downstream Experience Platform services.

This document introduces the three main ways in which data is ingested into Experience Platform, with links to their respective overview documentation for more detailed information.

Batch ingestion

Batch ingestion allows you to ingest data into Experience Platform as batch files. Batches are units of data that consist of one or more files to be ingested as a single unit. Once ingested, batches provide metadata that describes the number of records successfully ingested, as well as any failed records and associated error messages.

Manually uploaded datafiles such as flat CSV files (mapped to XDM schemas) and Parquet dataframes must be ingested using this method.

See the batch ingestion overview for more information.

TIP
Use single-line JSON instead of multi-line JSON as input for batch ingestion. Single-line JSON allows for better performance as the system can divide one input file into multiple chunks and process them in parallel, whereas multi-line JSON cannot be split. This can significantly reduce data processing costs and improve batch processing latency.

Streaming ingestion

Streaming ingestion allows you to send data from client- and server-side devices to Experience Platform in real time. Experience Platform supports the use of data inlets to stream incoming experience data, which is persisted in streaming-enabled datasets within the Data Lake. Data inlets can be configured to automatically authenticate the data they collect, ensuring that the data is coming from a trusted source.

See the streaming ingestion overview for more information.

Sources

Experience Platform allows you to set up source connections to various data providers. These connections enable you to authenticate to your external data sources, set times for ingestion runs, and manage ingestion throughput.

Source connections can be configured to gather data from other Adobe applications (such as Adobe Analytics and Adobe Audience Manager), third-party cloud storage sources (such as Azure Blob, Amazon S3, FTP servers, and SFTP servers), and third-party CRM systems (such as Microsoft Dynamics and Salesforce).

See the Sources overview for more information.

ML-Assisted schema creation ml-assisted-schema-creation

To quickly integrate new data sources, you can now use machine learning algorithms to generate a schema from sample data. This automation simplifies the creation of accurate schemas, reduces errors, and speeds up the process from data collection to analysis and insights.

See the ML-assisted schema creation guide for more information on this workflow.

Next steps and additional resources

This document provided a brief introduction to the different aspects of Data Ingestion in Experience Platform. Please continue to read the overview documentation for each ingestion method to familiarize yourself with their different capabilities, use cases, and best practices. You can also supplement your learning by watching the ingestion overview video below. For information on how Experience Platform tracks the metadata for ingested records, see the Catalog Service overview.

WARNING
The term “Unified Profile” thats used in the following video is out-of-date. The terms “Profile” or “Real-Time Customer Profile” are the correct terms used in the Experience Platform documentation. Please refer to the documentation for the latest functionality.
recommendation-more-help
2ee14710-6ba4-4feb-9f79-0aad73102a9a