Stream data using Source Connectors

Learn how to stream data in real time from a cloud storage source to Platform and use the data in real time for customer engagement.

Hi there. In this video, let me show you how to stream data in real time from a cloud storage source to Platform and use this data in real time for customer engagement. Data ingestion is a fundamental step in getting your data into Experience Platform, so you can use it to build 360-degree, real-time customer profiles and use them to provide meaningful experiences. Adobe Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services. When you log in to Platform, you will see Sources in the left navigation. Clicking Sources takes you to the source catalog screen, where you can see all of the source connectors currently available in Platform. Note that there are source connectors for Adobe applications, CRM solutions, cloud storage providers, and more. Let's look at the cloud storage source category. Currently, to stream real-time data to Platform from external cloud storage, you can use either Amazon Kinesis or Azure Event Hubs. For this video, let me choose Amazon Kinesis and show you how to set up the source connector. When setting up a source connector for the very first time, you will be given the option to configure it. For a source connector that is already configured, you will be given the option to add data. Let's use the Configure option. Since this is our first time creating a Kinesis account, let's click on creating a new account and provide the source connection details. Complete the required fields for account authentication and then initiate a source connection request. If the connection is successful, click Next to proceed to data selection. In this step, we select the Kinesis data stream from which we want to stream data into Platform. Let's select the Luma customer events data stream. Let's proceed to the next step to assign a target dataset for the incoming streaming data. You can choose an existing dataset or create a new one.
Let's choose the new dataset option and provide a dataset name and description. To create a dataset, you need to have an associated schema. Using the schema finder, assign a schema to this dataset. We will preview the schema structure later in this video. For now, let's move to the dataflow details step and provide a dataflow name and description. Let's review the source configuration details and then save your changes. Upon successfully saving your configuration, you will be redirected to the dataflow screen. At this point, we have successfully configured the source connector for streaming data from a cloud storage solution. From the left navigation, let's click on Schemas, browse through the schema list, and open the schema that we chose when configuring the source connector. The selected schema consists of fields that collect information about a user's profile, such as membership ID, loyalty number, contact details, et cetera. Let's navigate back to the schema UI, download a sample file for our schema, and open it using a text editor. The sample file provides a reference for structuring your data when ingesting it into datasets that employ the schema. In the next step, let's see how to send data to the Amazon Kinesis data stream. Based on the schema template, I have created JSON sample data for a customer with some dummy values. Before we use this data, it is important to make sure that the data's format and structure comply with an existing Experience Data Model (XDM) schema. Now, let's copy the XDM entity sample and embed it directly in the request body. Note that both the header element and the body element contain a reference to the schema of the dataset where the data will be ingested. We now have sample data that is ready to be sent to the Amazon Kinesis data stream. Let's switch screens to the Amazon Kinesis homepage and open the data stream that's already set up. A producer is an application that writes data to Amazon Kinesis Data Streams.
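For reference, the header/body payload described above can be sketched in Python. This is a minimal sketch, assuming the envelope shape used by Platform streaming ingestion (a `header` with `schemaRef`, `imsOrgId`, and `datasetId`, and a `body` with `xdmMeta.schemaRef` and `xdmEntity`). The schema ID, content type, org ID, dataset ID, and the `_tenant` field names are hypothetical placeholders, not values from this video, so compare the result against your own schema's sample file and Adobe's streaming ingestion documentation before use.

```python
import json

# Hypothetical placeholders: substitute your own schema, org, and dataset values.
SCHEMA_ID = "https://ns.adobe.com/{TENANT_ID}/schemas/{SCHEMA_ID}"
CONTENT_TYPE = "application/vnd.adobe.xed-full+json;version={SCHEMA_VERSION}"
ORG_ID = "{ORG_ID}"
DATASET_ID = "{DATASET_ID}"


def build_envelope(xdm_entity: dict) -> dict:
    """Wrap an XDM entity in a header/body envelope that references the schema twice."""
    schema_ref = {"id": SCHEMA_ID, "contentType": CONTENT_TYPE}
    return {
        "header": {
            "schemaRef": dict(schema_ref),
            "imsOrgId": ORG_ID,
            "datasetId": DATASET_ID,
        },
        "body": {
            "xdmMeta": {"schemaRef": dict(schema_ref)},
            # The record copied from the schema's sample file goes here.
            "xdmEntity": xdm_entity,
        },
    }


if __name__ == "__main__":
    # Dummy profile values; "_tenant" and these field names are hypothetical.
    entity = {
        "_tenant": {"membershipId": "M-0001", "loyaltyNumber": "L-12345"},
        "personalEmail": {"address": "jane.doe@example.com"},
    }
    print(json.dumps(build_envelope(entity), indent=2))
```

Keeping the schema reference in one place and copying it into both `header` and `body.xdmMeta` avoids the two references drifting apart when the schema changes.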
We can build producers for Kinesis Data Streams using the AWS SDK for Java and the Kinesis Producer Library. There are several ways in which you can put records into a data stream. In this video, we will be using the AWS CLI to write data to a data stream. If you would like to explore the other options, please refer to the Amazon Kinesis documentation. Open a terminal window and run a command to obtain the list of data streams in your account. Let's use the Luma customer events data stream. Make a note of the stream name, as we will need it in the next step. There are two different operations in the Kinesis Data Streams API that add data to a stream: PutRecords and PutRecord. The PutRecords operation sends multiple records to your stream per HTTP request, and the singular PutRecord operation sends records to your stream one at a time. You should prefer PutRecords for most applications because it achieves higher throughput per data producer. Since we only have one record, let's use the PutRecord option. Let's quickly look up the syntax for the put-record command. To write data to a data stream, you need the stream name, a partition key, and a data blob. Scrolling down, we can see that the data blob must be in a specific format, so let's convert the sample data to Base64 before we write it to the data stream. Let's use an online tool to convert our JSON-formatted data into a Base64-encoded string. Copy the data to your clipboard. Now, let's run the put-record command to write data to Amazon Kinesis. Let's pass the data stream name and paste the data from the clipboard in single quotes. A successful write to the data stream returns a sequence number and a shard ID. Let's switch to the Kinesis monitoring dashboard and verify that the put record was successful. The record count graph shows a successful data write.
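The put-record steps above, plus the PutRecords batching alternative, can be sketched in Python using only the standard library instead of an online Base64 tool. This is a minimal sketch under stated assumptions: it only builds the `aws kinesis put-record` argument list and does not call AWS, the stream name and partition key in the usage lines are hypothetical, and the batching helper enforces only the 500-records-per-request PutRecords limit, not the per-request size limit. AWS CLI version 2 expects blob parameters such as `--data` to be Base64-encoded by default (its `--cli-binary-format raw-in-base64-out` option accepts raw text instead), so check which behavior your CLI uses.

```python
import base64
import json
import shlex

MAX_RECORDS_PER_PUT_RECORDS = 500  # Kinesis PutRecords per-request record limit


def encode_blob(payload: dict) -> str:
    """Serialize the JSON payload compactly and Base64-encode it for --data."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def put_record_argv(stream_name: str, partition_key: str, payload: dict) -> list:
    """Build the argument list for `aws kinesis put-record` (nothing is executed)."""
    return [
        "aws", "kinesis", "put-record",
        "--stream-name", stream_name,
        "--partition-key", partition_key,
        "--data", encode_blob(payload),
    ]


def put_records_batches(payloads: list, partition_key: str) -> list:
    """Group payloads into PutRecords-sized lists of Data/PartitionKey entries."""
    entries = [{"Data": encode_blob(p), "PartitionKey": partition_key} for p in payloads]
    return [
        entries[i:i + MAX_RECORDS_PER_PUT_RECORDS]
        for i in range(0, len(entries), MAX_RECORDS_PER_PUT_RECORDS)
    ]


def parse_put_record_response(stdout: str) -> tuple:
    """Extract the shard ID and sequence number from the CLI's JSON output."""
    response = json.loads(stdout)
    return response["ShardId"], response["SequenceNumber"]


if __name__ == "__main__":
    # Hypothetical stream name, partition key, and payload.
    argv = put_record_argv("luma-customer-events", "customer-1", {"hello": "world"})
    print(shlex.join(argv))
```

Running the script prints a shell-quoted command you can review before pasting it into a terminal. Using the same partition key for every record, as the batching helper does, sends all records to the same shard; a per-customer key would spread them out.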
It's time to verify whether the data written to the Amazon Kinesis data stream is ingested into Platform using the source connector configuration that we set up at the beginning of this video. Let's open the Platform UI, navigate to Sources, and select Dataflows. Open the dataset associated with the dataflow. Under the dataset activity, you can see a quick summary of ingested batches and failed batches during a specific time window. Scroll down to view the ingested batch ID. Note that we have a successful batch that ingested one record into our dataset. Open the batch ID to get an overview. If record ingestion fails for any reason, you can obtain the error message and error code from the batch overview page. Let's quickly preview the dataset to ensure that data ingestion was successful and our fields are populated. Close the preview window. With Real-Time Customer Profile, you can see a holistic view of each customer that combines data from multiple channels, including online, offline, CRM, and third-party data. To use this data for real-time customer engagement, let's enable the dataset for Real-Time Customer Profile. I hope I was able to show you how to stream data in real time from a cloud storage source to Platform and use this data in real time for customer engagement.
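The video enables the dataset for Real-Time Customer Profile through the UI. As a hedged alternative, the sketch below only builds (and does not send) a Catalog Service `PATCH` request that tags a dataset with `unifiedProfile` and `unifiedIdentity`, based on our understanding of Adobe's Catalog API. The endpoint, header names, and tag values are assumptions to verify against Adobe's documentation, and every argument value in the usage lines is a hypothetical placeholder. If the dataset already carries tags, check whether they need to be included in the request body.

```python
import json

# Assumed Catalog Service endpoint for a single dataset; verify against Adobe's docs.
CATALOG_DATASET_URL = "https://platform.adobe.io/data/foundation/catalog/dataSets/{dataset_id}"


def profile_enable_request(dataset_id: str, access_token: str, api_key: str,
                           org_id: str, sandbox_name: str) -> tuple:
    """Build (method, url, headers, body) to tag a dataset for Profile and Identity Service."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox_name,
    }
    body = {
        "tags": {
            "unifiedProfile": ["enabled:true"],
            "unifiedIdentity": ["enabled:true"],
        }
    }
    url = CATALOG_DATASET_URL.format(dataset_id=dataset_id)
    return "PATCH", url, headers, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical placeholder values; nothing is sent over the network.
    method, url, headers, body = profile_enable_request(
        "{DATASET_ID}", "{ACCESS_TOKEN}", "{API_KEY}", "{ORG_ID}", "{SANDBOX_NAME}")
    print(method, url)
    print(body)
```

Returning the request parts instead of sending them keeps the sketch testable offline; pass them to whichever HTTP client you already use.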

Additional Resources