[Beta]{class="badge informative"}
Use the Databricks Delta Sharing source connector in the UI
Read this guide to learn how to use the Databricks Delta Sharing source connector in the Adobe Experience Platform user interface.
Get started
This tutorial requires a working understanding of the following Experience Platform components:
- Sources: Use Sources to create connections and dataflows for supported external data sources.
- Experience Data Model (XDM) schemas: Shared tables are represented in Experience Platform through relational schemas.
- Datasets: Shared data is represented as virtual datasets in Experience Platform. The source data is not physically ingested or copied into the Experience Platform data lake.
- Query Service / Data Distiller: Use Query Service or Data Distiller to query and work with virtual datasets.
Navigate the sources catalog
In the Experience Platform UI, select Sources from the left navigation to access the Sources workspace. Select the appropriate category in the Categories panel. Alternatively, use the search bar to navigate to the specific source that you want to use.
To use Delta Sharing, select the Delta Sharing for Databricks source card under the Data sharing category, and then select Add data.
Use an existing account
To use an existing account, select Existing account and select the Delta Sharing account that you want to use from the accounts interface.
Create a new account
To create a new account, select New account and provide a name and an optional description for your account. Provide values for the following authentication credentials:
- Endpoint
- Bearer token
- Share credentials version
- Expiration time
When finished, select Connect to source and allow a few moments for the connection to be established.
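The four credentials above share their names with the fields of a Delta Sharing profile file (`endpoint`, `bearerToken`, `shareCredentialsVersion`, `expirationTime`) as defined by the open Delta Sharing protocol. The following minimal sketch assumes that your data provider gave you such a file. It shows one way to read the values and check that the token has not expired before you enter them in the form. The sample endpoint, token, and helper names are hypothetical placeholders, not part of Experience Platform.

```python
import json
from datetime import datetime, timezone

# Hypothetical profile contents; every value below is a placeholder.
SAMPLE_PROFILE = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://example.invalid/api/2.0/delta-sharing/metastores/placeholder",
  "bearerToken": "placeholder-token",
  "expirationTime": "2030-01-01T00:00:00Z"
}
"""


def read_profile(text):
    """Map profile-file fields to the labels used in the new-account form."""
    profile = json.loads(text)
    return {
        "Endpoint": profile["endpoint"],
        "Bearer token": profile["bearerToken"],
        "Share credentials version": profile["shareCredentialsVersion"],
        # expirationTime is optional in a profile file, so use .get().
        "Expiration time": profile.get("expirationTime"),
    }


def is_expired(expiration_time, now=None):
    """Return True if the ISO 8601 UTC timestamp is not after `now`."""
    if expiration_time is None:
        return False  # no expiration recorded in the profile
    now = now or datetime.now(timezone.utc)
    # Assumes a trailing "Z" (UTC); convert it to an explicit offset.
    expires = datetime.fromisoformat(expiration_time.replace("Z", "+00:00"))
    return expires <= now


fields = read_profile(SAMPLE_PROFILE)
for label, value in fields.items():
    print(f"{label}: {value}")
print("Expired:", is_expired(fields["Expiration time"]))
```

Checking the expiration time up front can save a failed connection attempt, because an expired bearer token is rejected by the sharing server.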
Select your data
Next, select the table for which you want to create a virtual dataset in Experience Platform and platform-based applications. Use the table directory to navigate to the desired data and use the preview interface to view the contents and structure of the selected data. When finished, select Next to select columns for your schema.
Select your schema
After selecting a table from your Delta Sharing source, Experience Platform automatically infers the relational schema. At this stage, you are required to provide a schema name before proceeding. Optionally, you may also specify a primary key and a version descriptor to further define your schema.
Primary key: Set a primary key if your table has one. Consider the following factors when selecting a primary key:
- Select a key that is unique per row for the logical entity you care about (e.g., one row per order, per customer, per transaction).
- Select a key that is stable over time (doesn’t change once written).
- Avoid a high‑cardinality, non‑business surrogate key that is meaningless for governance (e.g., a random “row_id” that the upstream system regenerates).
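The uniqueness factor above can be checked on a sample of your data before you set the primary key. The following minimal sketch assumes that you have exported a few rows from the shared table as Python dictionaries. The rows, column names, and the `duplicate_keys` helper are hypothetical and only illustrate the check.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical sample rows from a shared orders table.
orders = [
    {"order_id": "A-100", "customer_id": "C-1"},
    {"order_id": "A-101", "customer_id": "C-1"},
    {"order_id": "A-102", "customer_id": "C-2"},
]

# An empty list means the column is unique in this sample.
print(duplicate_keys(orders, "order_id"))
# A non-empty list means the column is not unique per row.
print(duplicate_keys(orders, "customer_id"))
```

A clean result on a sample does not prove uniqueness for the whole table, so treat it as a quick screen rather than a guarantee.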
Version descriptor: The version descriptor marks a column that tells you which row is the “latest” record for a given key. Set it when your table keeps multiple versions of the same entity and you want a well‑defined way to choose the current or latest one. Good candidates for a version descriptor include:
- A timestamp such as last_updated_at or modified_ts.
- An increasing numeric version such as version_num or sequence_number.
You can leave the version descriptor empty in any of the following scenarios:
- The table is purely transactional or event‑level (each row is a one‑time event and doesn’t represent a mutable “entity” with versions).
- There’s no reliable “latest” indicator column.
- You haven’t validated what the timestamp/version column really means.
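To make the relationship between the primary key and the version descriptor concrete, the following minimal sketch keeps only the row with the highest version descriptor value for each key. It illustrates the concept only and is not a description of how Experience Platform resolves versions internally. The sample rows, column names, and the `latest_per_key` helper are hypothetical.

```python
def latest_per_key(rows, primary_key, version_descriptor):
    """Keep, for each primary key value, the row with the highest version."""
    latest = {}
    for row in rows:
        key = row[primary_key]
        current = latest.get(key)
        if current is None or row[version_descriptor] > current[version_descriptor]:
            latest[key] = row
    return latest


# Hypothetical rows: customer C-1 has two versions.
# ISO 8601 timestamps in one fixed format sort correctly as plain strings.
customers = [
    {"customer_id": "C-1", "tier": "silver", "last_updated_at": "2024-01-05T10:00:00Z"},
    {"customer_id": "C-1", "tier": "gold", "last_updated_at": "2024-03-01T08:30:00Z"},
    {"customer_id": "C-2", "tier": "bronze", "last_updated_at": "2024-02-11T12:00:00Z"},
]

current = latest_per_key(customers, "customer_id", "last_updated_at")
for customer_id, row in current.items():
    print(customer_id, row["tier"], row["last_updated_at"])
```

If the chosen column does not increase reliably with each change, this kind of selection picks the wrong row, which is why an unvalidated timestamp or version column is better left unset.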
Provide dataset and dataflow details
A dataset is a management construct for a collection of data, typically a table, that contains a schema with columns or fields. In Data Sharing, the selected data is represented in Experience Platform as a virtual dataset. The data remains in the source system and is not ingested or persisted into the data lake.
Once your virtual dataset is configured, provide details for your dataflow, including a name, an optional description, and alert configurations.
Experience Platform can produce event-based alerts that users can subscribe to. Select any of the following options to have a running dataflow trigger the corresponding alert. For more information, read the alerts overview.
- Sources Dataflow Run Start: Select this alert to receive a notification when your dataflow run begins.
- Sources Dataflow Run Success: Select this alert to receive a notification if your dataflow run ends without any errors.
- Sources Dataflow Run Failure: Select this alert to receive a notification if your dataflow run ends with any errors.
Review your dataflow
The Review step appears, allowing you to review the details of your dataflow before it is created. Details are grouped within the following categories:
- Connection: Shows the account name, source platform, and the source name.
- Assign dataset and map fields: Shows the target dataset and the schema that the dataset adheres to.
After confirming the details are correct, select Finish.
Monitor your dataflow
Once your dataflow has been created, you can monitor its status and activity to view information such as run status, success, and errors. For more information, see the tutorial on monitoring accounts and dataflows in the UI.