Monitor data flows in Adobe Experience Platform
Last update: February 14, 2025
Topics:
- Data Ingestion
CREATED FOR:
- Beginner
- Developer
Learn how to monitor and track batch and streaming data ingested into Adobe Experience Platform using the user interface and APIs.
Transcript
These are the areas I’ll cover in this video: monitoring data ingested into Experience Platform from both the user interface and the APIs. Experience Platform provides data monitoring capabilities for streaming and batch ingestion. To better understand how data flows from different systems into Experience Platform, I’ll assume a marketer’s role and configure a data flow from a source using a source connector. Adobe Experience Platform lets you bring in data from various sources to help marketers better understand the behaviors of their customers. One of the main questions to answer in a data platform is this: how is data flowing in and out?
I’m logged in to Experience Platform. In the left panel, I’ll select Sources under Connections. This displays a list of source connectors that can be set up using an intuitive workflow. I’ll be ingesting data from an Amazon S3 connector, so I’ll search for S3 in the search box to locate it easily. Next, I’ll select the Add Data command. This displays pre-existing S3 buckets that have already been configured for use in this instance of Platform. I’m going to select an S3 account that has data available to ingest, and then I’ll select Next in the upper right corner. The Select Data step of the workflow shows all the directories and files available. I’m going to select the Loyalty folder and then the luma-loyalty CSV file. Now in the Preview panel, I’ll select the data format, which is delimited, given the CSV file we’re using. Once I do this, a sample data preview appears below. As I scroll through the preview data, I notice some missing pieces of information in some of the rows. I’ll select Next in the upper right corner.
On the Dataset Details step, I’ll choose an existing dataset to store the incoming data. I also want Error Diagnostics and Partial Ingestion enabled. Error Diagnostics provides detailed error messages for ingested batches, and the failed records can be downloaded using the API. Partial Ingestion lets you set the percentage of failed records that is acceptable before the entire batch fails. When I enable this feature, I can also set an error threshold, which I’ll increase to 10%. I’ll also enter a data flow name and description. Toward the bottom, you can also subscribe to receive alerts for different events, as you see here. Now I’m going to select Next in the upper right corner. On the Mapping step, I’ll review the mapping of the source fields to the schema used by the dataset that will receive these records. Everything looks correct, so I’ll select Next in the upper right corner. On the Scheduling step, I’ll choose a frequency of 15-minute intervals and keep the other settings the same. On the Review step, I’ll select Finish in the upper right corner. A data flow gets created at this point, and I’ll give it a few minutes for the run to complete.
Now that there’s been a successful run of the new S3 data flow I created, I’ll show you its activity details. This view shows each time the data flow has executed; a data flow run is an instance of the flow. I’ll select the last item to view its properties. This opens the data flow run overview. It provides details such as the number of rows ingested, records failed, and other high-level information associated with the data flow. If there are any errors associated with the run, they show up at the bottom under Data Flow Run Errors.
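To make the partial ingestion error threshold from the Dataset Details step concrete, here is a tiny illustrative sketch in Python. It is not an Experience Platform API; it only models the documented behavior, assuming the batch is rejected once the failed-record percentage exceeds the configured threshold.

```python
# Illustrative only: models the partial ingestion error threshold,
# not an Experience Platform API call.

def batch_fails(total_records: int, failed_records: int, threshold_pct: float) -> bool:
    """Return True if the whole batch would be rejected (assumption: fail when the threshold is exceeded)."""
    failed_pct = (failed_records / total_records) * 100
    return failed_pct > threshold_pct

# With the 10% threshold set in the video, a 1000-record batch tolerates
# up to 100 failed rows before the entire batch fails.
print(batch_fails(1000, 4, 10.0))    # False: only the 4 bad rows are dropped
print(batch_fails(1000, 150, 10.0))  # True: the entire batch fails
```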
The Data Flow Run Errors section shows the error code, the number of records that failed, and a description of the error. Next, I’ll select Preview Error Diagnostics to see more details. As a point of interest, you can preview up to 100 errors here, if there are that many. You can also download the error diagnostics and file manifest; we’ll cover how to make these API calls soon.
Next, we’ll look at a streaming data flow to see the activity details available for this type of ingestion. I’ll start by selecting Sources in the left panel under Connections. Then, I’ll select Data Flows from the top navigation. I’m going to use the filter feature to quickly locate the streaming data flow we’ll be reviewing. When I do that, a list of many types of data flows is displayed, and I’ll select Adobe Data Collection from the list. This filters to the data flows coming from Adobe tags, formerly known as Adobe Launch. I’ll go to the second page and select the first data flow at the top. I see the top-line data flow status metrics for ingested and failed records. In the run table below, each run contains additional details, most of which were available for the batch source we viewed previously. There’s an additional capability and status for streaming data flows called Records with Warnings. Non-critical mapper transformation errors are reported as warnings to allow rows to be partially ingested. The second row contains a run that includes records with warnings. Notice that those rows are still ingested: there were 63 records received and 63 records with warnings. Only 46 of these were ingested, because 17 records failed. I’m going to click on that run to view the details. In the Data Flow Run Errors section, I’ll first see the records failed information, including the error code, record count, and description. We’ll be reviewing the API shortly to investigate this further. For now, though, I’ll click on Records with Warnings. Again, I see the error code, which is a non-critical mapper issue, and a description of that error. I’ll also click on Preview Error Diagnostics on the right. I’ll select the first row, then scroll to the right to see the fields expected and the data passed in for that row. I notice there’s no data mapped to the IdentityMap.ecid.0.Primary field. I’ll verify that’s the issue by clicking on another row or two. Yes, that’s the issue. Something like this is likely due to an update in one of the field groups used in the schema this data is validated against. It’s not a critical error, so the data is still ingested rather than failed. I’ll close the Error Diagnostics preview window.
As stated earlier, if you want a deeper understanding of what happens to data flowing from source data flows into a destination dataset, using the APIs is the way to go. Before you make API calls, you must create a project in the Adobe Developer Console and add the Adobe Experience Platform APIs product; there is a separate video on this topic. This generates the environment variables required for running API requests. For my demonstration, I’ll be using the Postman application. While it’s not required for executing API calls, Adobe does provide many Postman files to help you get started quickly. Here’s a GitHub page where you can download JSON files for the APIs. I’ll use the Authentication, Flow Service, and Observability Insights collections. The authentication APIs are found in the IMS directory. The remainder of this video assumes a working knowledge of Postman.
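If you want to reproduce these Postman calls in your own scripts, every Experience Platform API request in this video carries the same gateway headers, populated from your Developer Console project. The following is a minimal Python sketch, assuming you have already generated an access token (for example, with the IMS authentication calls mentioned above); the environment variable names are placeholders I chose, not anything shipped with the collections.

```python
import os

# Placeholder environment variable names; populate them from your own
# Adobe Developer Console project and authentication call.
ACCESS_TOKEN = os.environ["AEP_ACCESS_TOKEN"]    # token from the IMS/authentication calls
API_KEY = os.environ["AEP_API_KEY"]              # client ID of the Developer Console project
ORG_ID = os.environ["AEP_ORG_ID"]                # IMS Org ID
SANDBOX = os.environ.get("AEP_SANDBOX", "prod")  # sandbox to query

# Standard headers expected by Experience Platform gateway APIs;
# the later sketches in this transcript reuse this HEADERS dictionary.
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "x-api-key": API_KEY,
    "x-gw-ims-org-id": ORG_ID,
    "x-sandbox-name": SANDBOX,
}
```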
Refer to the Postman documentation for assistance if you’re not familiar with the application. I’ve already downloaded and imported the full AEP API collection into Postman; the calls are available in the left panel. I’ll be using the authentication calls to generate a JWT access token, which is required for authenticating all of the other calls we’ll be looking at.
Now we’ll look at the Flow Service API calls. The Flow Service API is used to collect and centralize customer data from various sources. First, we’ll make a Flow Service API call to retrieve the details of all data flows in my organization and then search for the Luma Loyalty S3 data flow that I created earlier. When I run the List Flows API, I see the data flow I created earlier, the one with the Luma Loyalty CSV. From here I can obtain details about the data flow. I want to copy the ID of my data flow to the clipboard. I’m going to paste the ID from the previous API call into the value field under Path Variables, and then I’ll hit Send. This returns the data flow runs based on the scheduled frequency. I’ll search for the last executed flow run ID and make another call to get more details about this flow run. I’ll select Retrieve a Flow Run under Runs, paste the flow run ID into the value field under Path Variables, and hit Send. This captures details about the data that was ingested and shows various metrics about the data. These metrics are also sent to Observability to support use cases like understanding how many records have been ingested over the last seven days for all Amazon S3 workflows.
Let’s look at the basic structure of a flow run in the response. High-level information related to the flow run is returned at the top, such as the run ID, sandbox name, and IMS Org, which is essentially your AEP instance name. As you scroll down, you’ll see the metrics section covering the duration, size, records, and file summaries. The activities section is used to see how data moves internally from source to target, with each activity capturing the metrics corresponding to it. At a high level, data movement is a two-step process. The first step is copy: data is copied from the source, which is the S3 bucket I used to create the data flow, to a staging location. The second step is promotion: data is moved from Adobe Experience Platform’s staging location to the master location after validation, transformation, and so on. In the copy activity’s summary, notice that 1000 records are copied from the cloud storage source to the staging location. I’ll check whether all the records were successfully ingested in the promotion activity. Under the records summary for the promotion activity, notice that of the 1000 records, 996 were ingested successfully into the dataset and 4 records were not. We saw these errors earlier in the user interface. In the Status Summary section, you can view any errors that occurred.
To understand what happened to the failed records, you can run an error diagnostic using the failed records endpoint. In the Extensions section under Status Summary, there’s a link next to Failed Records. When I click this, it opens the GET request in a new tab in Postman. I’ll hit Send and the JSON error URL is returned. I’ll copy and paste this URL into the GET line and hit Send again. In the user interface, you’re limited to viewing a hundred errors; using the API, however, you can retrieve details for all failed records. I can see exactly which records weren’t ingested, and I can take steps to fix these errors and ingest a new file.
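The same sequence of Flow Service calls can be scripted outside Postman. Below is a minimal sketch, reusing the HEADERS dictionary from the earlier snippet and the requests library. The routes (GET /flows, GET /runs filtered by flow ID, and GET /runs/{RUN_ID}) reflect my reading of the Flow Service API reference, and the response parsing is intentionally loose, so verify both against the current documentation and the payloads you actually receive.

```python
import requests

FLOW_SERVICE = "https://platform.adobe.io/data/foundation/flowservice"

# 1. List the data flows and locate the Luma Loyalty flow created in the UI.
flows = requests.get(f"{FLOW_SERVICE}/flows", headers=HEADERS).json()
flow_id = next(item["id"] for item in flows.get("items", [])
               if "Luma Loyalty" in item.get("name", ""))

# 2. List the runs of that flow (one run per scheduled 15-minute interval).
runs = requests.get(
    f"{FLOW_SERVICE}/runs",
    headers=HEADERS,
    params={"property": f"flowId=={flow_id}"},
).json()
run_id = runs["items"][0]["id"]  # pick a run; sort by start time if you need the latest

# 3. Retrieve that run and inspect its metrics, copy/promotion activities,
#    and the failed-records link shown in the video.
run = requests.get(f"{FLOW_SERVICE}/runs/{run_id}", headers=HEADERS).json()
print(run)
```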
Flow run metrics are also sent to the Observability Insights API, which enables more powerful aggregate queries for answering questions about the flows into a dataset. This RESTful API exposes key metrics and helps organizations monitor the health of platform services, track historical trends, and view performance indicators for various platform functions. I’m going to open the GET Retrieve Metrics Data API. First, I’ll look at the key fields to set up for this call: the metrics, the ID, and the date range. In my example, I’m querying three metrics for a specific dataset ID over the range of September 1 through September 18. The complete list of available metrics is included in the online documentation for the Observability Insights API. Now I’ll run the API and look at the response body. Here, I can run the statistics request for the dataset and see the activity. I chose a dataset that didn’t have any data flows this month yet, but this is just an example of what’s possible. A scripted version of this metrics query is sketched at the end of this transcript.
Experience Platform also provides a user interface that helps you keep track of ingested data. I’m logged in to Experience Platform, and I’ll navigate to Monitoring under Data Management. I’ll select Batch end-to-end and then remove the filter. This shows all the data ingested into Experience Platform within the given time frame. You should now feel comfortable creating a data flow and retrieving diagnostic data related to that flow, both in the user interface and using the Adobe Experience Platform APIs. Good luck!
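Finally, here is the sketch of the Observability Insights metrics query referenced above, again reusing the HEADERS dictionary. The endpoint path, the dateRange format, and the metric names are assumptions based on the Observability Insights API documentation, so replace them with values from the documented metrics list before running it.

```python
import requests

OBSERVABILITY = "https://platform.adobe.io/data/infrastructure/observability/insights"

# Query parameters mirroring the fields set in Postman in the video:
# one or more metrics, the target dataset ID, and a date range.
params = {
    "metric": [                                             # assumed metric names; check the docs
        "timeseries.ingestion.dataset.recordsuccess.count",
        "timeseries.ingestion.dataset.recordfailed.count",
        "timeseries.ingestion.dataset.size",
    ],
    "id": "{DATASET_ID}",                                    # dataset ID to query
    "dateRange": "2024-09-01T00:00:00Z/2024-09-18T00:00:00Z",  # example Sept 1-18 window; format assumed
}

response = requests.get(f"{OBSERVABILITY}/metrics", headers=HEADERS, params=params)
print(response.json())  # per-metric counts for the requested window
```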