Ingest data using a database source connector

This video walks through how to perform a batch ingestion of data from a database source into Adobe Experience Platform’s Real-Time Customer Profile and data lake in a seamless and scalable manner. For more detailed product documentation, see the databases section of the Source Connectors overview or the Google BigQuery source connector documentation.

Transcript

Hi there, I’m going to give you a quick overview of how to ingest data from your database systems into Adobe Experience Platform. Data ingestion is a fundamental step in getting your data into Experience Platform, where you can use it to build real-time, 360-degree customer profiles and provide meaningful experiences. Adobe Experience Platform allows data to be ingested from various sources while giving you the ability to structure, label, and enhance incoming data using Platform services. In this video, let me show you how to configure a database source connector and ingest data into a dataset in Experience Platform. We will also cover how to enable data for Real-Time Customer Profile. When you log in to Platform, you will see Sources in the left navigation. Clicking Sources takes you to the source catalog screen, where you can see all of the source connectors currently available in Platform. Note that there are source connectors for Adobe applications, CRM solutions, databases, and more. Let’s explore how to ingest data from databases into Experience Platform. Each database has its own specific configuration details, but the general configuration for database source connectors is similar. For this video, let’s use the Google BigQuery database as a source. When setting up a source connector for the very first time, you are given the option to configure it. For an already configured source connector, the UI instead gives you the option to add data to the existing configuration. Click the configure option. Since this is our first time creating a Google BigQuery account, let’s click to create a new account and provide the source connection details.

Complete the required fields for account authentication and then initiate a source connection request. If the connection is successful, click Next to proceed to data selection. In this step, you can explore all the data stored in your database.
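The same account setup can also be scripted against the Flow Service API. Below is a minimal sketch of creating a base connection (POST /connections), assuming you already have an access token, API key, and organization ID from the Adobe Developer Console; every braced value, including the connection spec ID, is a placeholder rather than a working credential.

```python
import requests

BASE = "https://platform.adobe.io/data/foundation/flowservice"

# Gateway headers required by Experience Platform APIs.
# The braced values are placeholders, not real credentials.
HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{ORG_ID}",
    "x-sandbox-name": "{SANDBOX_NAME}",
    "Content-Type": "application/json",
}

# A base connection stores the BigQuery account credentials. The
# connectionSpec id is a placeholder; list the available specs with
# GET {BASE}/connectionSpecs and use the Google BigQuery entry.
payload = {
    "name": "Luma BigQuery account",
    "description": "Credentials for the Luma loyalty database",
    "auth": {
        "specName": "Basic Authentication",
        "params": {
            "project": "{GCP_PROJECT_ID}",
            "clientId": "{CLIENT_ID}",
            "clientSecret": "{CLIENT_SECRET}",
            "refreshToken": "{REFRESH_TOKEN}",
        },
    },
    "connectionSpec": {"id": "{BIGQUERY_CONNECTION_SPEC_ID}", "version": "1.0"},
}

resp = requests.post(f"{BASE}/connections", headers=HEADERS, json=payload)
resp.raise_for_status()
base_connection_id = resp.json()["id"]
print(base_connection_id)  # referenced when creating the source connection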

Let’s select and preview the customer data, then proceed to the next step to assign a target dataset for the incoming data. You can choose an existing dataset or create a new one. Let’s choose the new dataset option and provide a dataset name and description. To create a dataset, you need an associated schema, so use the schema finder to assign a schema to this dataset. Upon selecting a schema, Experience Platform performs a mapping between the source fields and the target fields, based on the title and type of each field. You can see that Platform pre-mapped most of the fields in our source data, and this pre-mapping of standard fields is editable. As you can see, the points source field is currently not mapped to a target field. Let’s browse through the schema and select a target field for points.
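For reference, the field mapping the UI builds can also be expressed as a Data Prep mapping set. The sketch below continues from the previous snippet and reuses its HEADERS; the field names (email, points, loyalty.points) and the payload shape are illustrative, so check the Data Prep API reference for the exact contract.

```python
# Continues from the previous snippet (reuses requests and HEADERS).
# Field names and payload shape are illustrative only.
mapping_set = {
    "version": 0,
    "xdmSchema": "{TARGET_SCHEMA_ID}",
    "xdmVersion": "1.0",
    "mappings": [
        {
            # A standard field that Platform pre-mapped by title and type.
            "sourceType": "ATTRIBUTE",
            "source": "email",
            "destination": "personalEmail.address",
        },
        {
            # The unmapped "points" column, mapped by hand to a target field.
            "sourceType": "ATTRIBUTE",
            "source": "points",
            "destination": "loyalty.points",
        },
    ],
}

resp = requests.post(
    "https://platform.adobe.io/data/foundation/conversion/mappingSets",
    headers=HEADERS,
    json=mapping_set,
)
resp.raise_for_status()
mapping_set_id = resp.json()["id"]  # attached to the dataflow later
```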

You can easily remove a mapping if it is not required. The Add calculated field option lets you run functions on source fields to prepare the data for ingestion. You can choose from a list of predefined functions that can be applied to your source fields. For example, we can combine the first name field and the last name field into a calculated field using the concatenation function before ingesting the data into a dataset field. Let’s search for the concatenation function in the list. Upon selecting a function, you can see its documentation on the right-hand side of the screen. Let’s use the syntax to combine the first name and last name fields, separated by a space. You can also preview a sample result of the calculated field. Let’s save our changes and leave the window. You can see the calculated field displayed as a new source field. Now let’s quickly map the calculated field to a schema target field. Instead of browsing through the schema, you can optionally type to search for a target field. After reviewing the field mapping, you can also preview data to see how the ingested data will be stored in your dataset. If the mapping looks good, let’s move to the next step. Scheduling lets you choose a frequency at which data should flow from the source to a dataset. Let’s select a frequency of 15 minutes for this video and set a start time for the dataflow. Backfill is a boolean value that determines what data is initially ingested. If backfill is enabled, all current files in the specified path are ingested during the first scheduled ingestion. If backfill is disabled, only the files that are loaded between the first run of ingestion and the start time are ingested. Under Load incremental data by, select a field that will help distinguish between new and existing data; in our case, let’s use the last updated date field. Note that the chosen date field should be in the UTC date format. Let’s move to the dataflow step and provide a name for your dataflow. In the dataflow detail step, the partial ingestion toggle allows you to enable or disable the use of partial batch ingestion. The error threshold allows you to set the percentage of acceptable errors before the entire batch fails; by default, this value is set to 5%. Let’s review the source configuration details and save our changes. We do not see any dataflow run statuses yet because we set a frequency of 15 minutes for our dataflow runs, so let’s wait for the dataflow to run. Let’s refresh the page, and you can now see that our dataflow run status is completed. Open the dataflow run to view more details about the activity. Do you wonder why the most recent dataflow run was successful even though it had failed records? That’s because we enabled partial ingestion when we set up the dataflow and chose an error threshold of 5%. Since we enabled error diagnostics for our dataflow, you can also see the error code and description in the dataflow run overview window. Experience Platform lets users preview error diagnostics to determine what went wrong with the failed records. Error diagnostics provide details about the error code, column name, and error description. In our case, you can see that some of the date fields are not properly formatted. You can download the error diagnostics and share them with the responsible team to fix the failed records. Let’s go back to the dataflow activity tab. At this point, we have verified that the dataflow completed successfully from the source to our dataset.
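Those scheduling choices map onto the dataflow object itself. Here is a minimal sketch of creating the dataflow through the Flow Service API (POST /flows), continuing from the snippets above. The flow spec and connection IDs are placeholders, and the partial-ingestion and error-threshold options set in the UI are omitted because their exact request keys vary by source; treat the Flow Service API reference as authoritative.

```python
import time

# Continues from the snippets above. Braced ids are placeholders;
# partial-ingestion / error-threshold keys are intentionally omitted.
flow = {
    "name": "Luma loyalty dataflow",
    "flowSpec": {"id": "{DATABASE_FLOW_SPEC_ID}", "version": "1.0"},
    "sourceConnectionIds": ["{SOURCE_CONNECTION_ID}"],
    "targetConnectionIds": ["{TARGET_CONNECTION_ID}"],
    "transformations": [
        {
            # Attach the mapping set created earlier, which includes
            # the concatenated-name calculated field.
            "name": "Mapping",
            "params": {"mappingId": mapping_set_id, "mappingVersion": 0},
        }
    ],
    "scheduleParams": {
        "startTime": str(int(time.time())),  # epoch seconds; start now
        "frequency": "minute",
        "interval": 15,       # the 15-minute cadence chosen in the video
        "backfill": "true",   # ingest all existing rows on the first run
    },
}

resp = requests.post(f"{BASE}/flows", headers=HEADERS, json=flow)
resp.raise_for_status()
print(resp.json()["id"])  # dataflow id; poll {BASE}/runs to watch runs
```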
Let’s open our dataset to verify the dataflow and its activities. We can open the Luma customer loyalty dataset right from the dataflow window, or we can access it using the Datasets option in the left navigation. Under the dataset activity, you can see a quick summary of ingested and failed batches during a specific time window. Scroll down to view the ingested batch ID. Each batch represents actual data ingested from a source connector into a target dataset. The batch ID displays metadata about successfully ingested records, failed records, and status. Note that the values for new profile fragments and existing profile fragments are missing for our batch ID. Let’s quickly preview the dataset to ensure that data ingestion was successful and our calculated fields are populated. We now have the dataset populated with data from a database source system. Finally, let’s see how to enable this data for Real-Time Customer Profile. With Real-Time Customer Profile, you can see a holistic view of each customer that combines data from multiple channels, including online, offline, CRM, and third-party data. To enable our dataset for Real-Time Customer Profile, ensure that the associated schema is enabled for Profile. Once a schema is enabled for Profile, it cannot be disabled or deleted, and fields cannot be removed from the schema after this point. These implications are essential to keep in mind when working with data in your production environment. We recommend that you verify and test the data ingestion process to capture and address any issues before enabling the dataset and schema for Profile. Now let’s enable Profile for our dataset and save our changes. Before our next batch ingestion, in the background, let me ensure that the failed records are now correctly formatted. On the next successful batch run, the data ingested into our dataset will be used to create real-time customer profiles. Let’s refresh the screen, and you can see that our latest batch ingestion completed successfully with no failed records. This time, since we had the Profile flag enabled for our dataset, you can see that 1000 new profile fragments were created and 0 existing profile fragments were updated. Adobe Experience Platform allows data to be ingested from external sources while providing you with the ability to structure, label, and enhance incoming data using Platform services. You can ingest data from a variety of sources such as Adobe applications, cloud-based storage, databases, and many others.
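The same Profile enablement can also be done through the Schema Registry and Catalog APIs. The sketch below continues from the earlier snippets; the JSON Patch union tag for the schema and the unifiedProfile dataset tag follow the published API examples, but the IDs are placeholders and some header details (such as the Schema Registry Accept header) are elided, so verify against the API reference first. Remember this is one-way: once a schema is union-tagged for Profile, it cannot be untagged.

```python
# Continues from the earlier snippets. Braced ids are placeholders.
# WARNING: enabling a schema for Profile is irreversible.

# 1. Add the union tag to the schema (JSON Patch).
schema_patch = [{"op": "add", "path": "/meta:immutableTags", "value": ["union"]}]
resp = requests.patch(
    "https://platform.adobe.io/data/foundation/schemaregistry/tenant/schemas/{SCHEMA_ID}",
    headers=HEADERS,
    json=schema_patch,
)
resp.raise_for_status()

# 2. Tag the dataset for Real-Time Customer Profile.
dataset_patch = {"tags": {"unifiedProfile": ["enabled:true"]}}
resp = requests.patch(
    "https://platform.adobe.io/data/foundation/catalog/dataSets/{DATASET_ID}",
    headers=HEADERS,
    json=dataset_patch,
)
resp.raise_for_status()
```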
