Access dataset batch metadata

To ensure that system columns (metadata columns) are included in the query results, run the SQL command set drop_system_columns=false in the Query Editor. This setting configures the behavior of your current SQL session and must be repeated each time you start a new session.
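A minimal sketch of this step is shown below; it assumes you are entering the command directly in the Query Editor and that your client accepts a trailing semicolon.

set drop_system_columns=false;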

Next, to view the system fields of the dataset, execute a SELECT all statement against the dataset, for example select * from movie_data. The results include two new columns on the right-hand side: _acp_system_metadata and _ACP_BATCHID. These metadata columns help identify the logical and physical partitions of the ingested data.
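Written out as a full statement, the preview query looks like the sketch below; movie_data is the example dataset used throughout this section, so substitute your own dataset name.

SELECT * FROM movie_data;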

The DBVisualizer UI with the movie_data table and its metadata columns displayed and highlighted.

When data is ingested into Experience Platform, it is assigned a logical partition based on the incoming data. This logical partition is represented by _acp_system_metadata.sourceBatchId. This ID helps to group and identify the data batches logically before they are processed and stored.

After the data is processed and ingested into the data lake, it is assigned a physical partition represented by _ACP_BATCHID. This ID reflects the actual storage partition in the data lake where the ingested data resides.
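To see how the two identifiers relate row by row, you can select both columns together. The query below is a sketch rather than a required step; it assumes your session supports dot notation for the nested sourceBatchId field referenced above, and the LIMIT value is an arbitrary choice to keep the preview small.

SELECT _acp_system_metadata.sourceBatchId, _ACP_BATCHID FROM movie_data LIMIT 10;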

Use SQL to understand logical and physical partitions

To help understand how the data is grouped and distributed after ingestion, use the following query to count the number of distinct physical partitions (_ACP_BATCHID) for each logical partition (_acp_system_metadata.sourceBatchId).

SELECT _acp_system_metadata, COUNT(DISTINCT _ACP_BATCHID) FROM movie_data
GROUP BY _acp_system_metadata

The results of this query are shown in the image below.

The results of a query to show the number of distinct physical partitions for each logical partition.

These results demonstrate that the number of input batches does not necessarily match the number of output batches, as the system determines the most efficient way to batch and store the data in the data lake.
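To explore the distribution from the other direction, you can count how many rows landed in each physical partition. The query below is a sketch under the same assumptions as above (the movie_data dataset and its system columns); the row_count alias is illustrative.

SELECT _ACP_BATCHID, COUNT(*) AS row_count FROM movie_data
GROUP BY _ACP_BATCHID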

For the purpose of this example, it is assumed that you have ingested the drug_checkout_data JSON file into Experience Platform and created a dataset called drug_orders.

The drug_checkout_data file is a deeply nested set of 35,000 records. Use the SQL statement SELECT * FROM drug_orders; to preview the first set of records in the JSON-based drug_orders dataset.
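If you prefer to keep the preview small, the statement can be limited as in the sketch below; the LIMIT value is an arbitrary choice and not part of the original instruction.

SELECT * FROM drug_orders LIMIT 10;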

The image below shows a preview of the file and its records.

A preview of the first set of records in the JSON-based drug_orders dataset.