Monitor dataflows for sources in the UI

IMPORTANT

Streaming sources, such as the HTTP API source, are not currently supported by the monitoring dashboard. For now, you can use the dashboard to monitor batch sources only.

In Adobe Experience Platform, data is ingested from a wide variety of sources, analyzed within Experience Platform, and activated to a wide variety of destinations. Platform makes the process of tracking this potentially non-linear flow of data easier by providing transparency with dataflows.

The monitoring dashboard provides you with a visual representation of the journey of a dataflow. You can use an aggregated monitoring view to drill down from the source level to a dataflow and then to a dataflow run, and view the metrics that contribute to a dataflow’s success or failure. You can also use the monitoring dashboard’s cross-service monitoring capability to monitor a dataflow’s journey from a source, to Identity Service, and to Profile.

This tutorial provides steps to monitor your dataflow, using both the aggregated monitoring view and cross-service monitoring.

Getting started

This tutorial requires a working understanding of the following components of Adobe Experience Platform:

  • Dataflows: Dataflows are a representation of data jobs that move data across Platform. Dataflows are configured across different services, helping move data from source connectors to target datasets, to Identity and Profile, and to Destinations.
    • Dataflow runs: Dataflow runs are the recurring scheduled jobs based on the frequency configuration of selected dataflows.
  • Sources: Experience Platform allows data to be ingested from various sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
  • Identity Service: Gain a better view of individual customers and their behavior by bridging identities across devices and systems.
  • Real-time Customer Profile: Provides a unified, real-time consumer profile based on aggregated data from multiple sources.
  • Sandboxes: Experience Platform provides virtual sandboxes which partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications.

Aggregated monitoring view

In the Platform UI, select Monitoring from the left navigation to access the Monitoring dashboard. The Monitoring dashboard contains metrics and information on all source dataflows, including insights into the health of data traffic from a source to Identity Service, and to Profile.

At the center of the dashboard is the Source ingestion panel, which contains metrics and graphs that display data on records ingested and records failed.

monitoring-dashboard

By default, the data displayed contains ingestion rates from the last 24 hours. Select Last 24 hours to adjust the time frame of records displayed.

change-date

A calendar pop-up window appears, providing you with options for alternative ingestion time frames. Select Last 30 days and then select Apply.

adjust-time-frame

The graphs are enabled by default and you can disable them to expand the list of sources below. Select the Metrics and graphs toggle to disable the graphs.

metrics-and-graphs

Source ingestion | Description
Records ingested | The total number of records ingested.
Records failed | The total number of records that were not ingested due to errors in the data.
Total failed dataflows | The total number of dataflows with a failed status.
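
One way to reason about these three metrics is as simple aggregates over the dataflow runs that fall inside the selected time frame. The Python sketch below illustrates that idea only. The `DataflowRun` record, its field names, the status strings, and the `summarize_ingestion` function are all hypothetical and are not part of any Platform API. It also assumes that a dataflow counts as failed when at least one of its runs in the time frame has a failed status.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataflowRun:
    """Hypothetical record of one dataflow run (not a Platform API type)."""
    dataflow_id: str
    started_at: datetime
    records_ingested: int
    records_failed: int
    status: str  # assumed values: "success", "failed", "processing"


def summarize_ingestion(runs, now, window=timedelta(hours=24)):
    """Aggregate run metrics over the time frame that ends at `now`."""
    recent = [r for r in runs if now - window <= r.started_at <= now]
    # Assumption: count each dataflow once, even if several of its runs failed.
    failed_dataflows = {r.dataflow_id for r in recent if r.status == "failed"}
    return {
        "records_ingested": sum(r.records_ingested for r in recent),
        "records_failed": sum(r.records_failed for r in recent),
        "total_failed_dataflows": len(failed_dataflows),
    }
```

Passing `window=timedelta(days=30)` mirrors switching the dashboard from Last 24 hours to Last 30 days: the same runs are re-aggregated over a wider time frame.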

The source ingestion list displays all sources that contain at least one existing account. The list also includes information on each source’s ingestion rate, number of failed records, and total number of failed dataflows based on the time frame that you applied.

source-ingestion

To sort through the list of sources, select My sources and then select your category of choice from the dropdown menu. For example, to focus on cloud storage sources, select Cloud storage.

sort-by-category

To view all existing dataflows across all sources, select Dataflows.

view-all-dataflows

Alternatively, you can enter a source into the search bar to isolate a single source. Once you have identified your source, select the filter icon beside it to see a list of its active dataflows.

search

A list of dataflows appears. To narrow down the list and focus on dataflows with errors, select Show failures only.

show-failures-only

Locate the dataflow that you want to monitor and then select the filter icon beside it to see more information on its run status.

dataflow

The dataflow run page displays information on your dataflow’s run start date, size of data, status, and processing time. Select the filter icon beside the dataflow run start time to see its dataflow run details.

dataflow-run-start

The Dataflow run details page displays information on the dataflow’s metadata, partial ingestion status, and error summary. The error summary contains the specific top-level error that shows at which step the ingestion process encountered an error.

Scroll down to see more specific information on the error that occurred.

dataflow-run-details

The Dataflow run errors panel displays the specific error and error code that resulted in the dataflow’s ingestion failure. In this scenario, a mapper transformation error occurred, resulting in the failure of 24 records.

Select Files for more information.

dataflow-run-errors

The Files panel contains information on the file’s name and path.

For a more granular representation of the error, select Preview error diagnostics.

files

The Error diagnostics preview window appears, displaying a preview of up to 100 errors in the dataflow. You can select Download to retrieve a curl command, which then allows you to download the error diagnostics.

When you are finished, select Close.

error-diagnostics

You can use the breadcrumb system at the top header to navigate your way back to the Monitoring dashboard. Select Run start: 2/14/2021, 9:47 PM to return to the previous page, and then select Dataflow: Loyalty Data Ingestion Demo - Failed to return to the dataflows page.

breadcrumbs

Cross-service monitoring

The upper part of the dashboard contains a representation of the ingestion flow from the source-level, to Identity Service, and to Profile. Each cell includes a dot marker that indicates the presence of errors that occurred at that stage of ingestion. A green dot means an error-free ingestion, while a red dot means that an error occurred in that particular stage of ingestion.
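
The dot markers follow a simple rule: a stage with no errors is green, and a stage with at least one error is red. The Python sketch below expresses that rule. The stage names, the `stage_markers` function, and the choice to treat a stage with no reported count as error-free are assumptions made for illustration, not the dashboard's actual implementation.

```python
# Hypothetical stage names for the three cells of the ingestion flow.
STAGES = ("source", "identities", "profiles")


def stage_markers(errors_by_stage):
    """Map each ingestion stage to "green" (no errors) or "red" (errors)."""
    return {
        # Assumption: a stage missing from the input has zero errors.
        stage: "red" if errors_by_stage.get(stage, 0) > 0 else "green"
        for stage in STAGES
    }
```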

cross-service-monitoring

From the dataflows page, locate a successful dataflow and select the filter icon beside it to see its dataflow run information.

dataflow-success

The Source ingestion page contains information that confirms the successful ingestion of your dataflow. From here, you can start monitoring your dataflow’s journey from the source-level, to Identity Service, and then to Profile.

Select Identities to see ingestion in the Identities stage.

sources

Identity metrics

The Identity processing page contains information on records ingested to Identity Service, including number of identities added, graphs created, and graphs updated.

Select the filter icon beside the dataflow run start time to see more information on your Identity dataflow run.

identities

Identity metrics | Description
Records received | The number of records received from Data Lake.
Records failed | The number of records that were not ingested into Platform due to errors in the data.
Records skipped | The number of records that were ingested, but not into Identity Service because there was only one identifier in the record row.
Records ingested | The number of records ingested into Identity Service.
Total records | The total count of all records, including records failed, records skipped, identities added, and duplicated records.
Identities added | The number of net new identifiers added to Identity Service.
Graphs created | The number of net new identity graphs created in Identity Service.
Graphs updated | The number of existing identity graphs updated with new edges.
Failed dataflow runs | The number of dataflow runs that failed.
Processing time | The time elapsed from the start of ingestion to completion.
Status | Defines the overall status of a dataflow. The possible status values are:
  • Success: Indicates that a dataflow is active and is ingesting data according to the schedule it was provided…
  • Failed: Indicates that the activation process of a dataflow has been disrupted due to errors.
  • Processing: Indicates that the dataflow is not yet active. This status is often encountered immediately after a new dataflow is created.
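
The Records skipped metric covers rows that were ingested but carried only one identifier, so they were not ingested into Identity Service. The Python sketch below shows how such a split between ingested and skipped rows might be counted. The row layout, the namespace names, and both function names are hypothetical. Treating rows with fewer than two non-empty identifiers as skipped is an assumption based on the description above, not documented Identity Service behavior.

```python
def classify_for_identity(row):
    """Return "ingested" or "skipped" for one record row.

    `row` maps hypothetical identity namespace names to values;
    empty or missing values do not count as identifiers.
    """
    identifiers = [value for value in row.values() if value]
    # Assumption: fewer than two identifiers means the row is skipped.
    return "ingested" if len(identifiers) >= 2 else "skipped"


def count_identity_records(rows):
    """Tally how many rows would be ingested versus skipped."""
    counts = {"ingested": 0, "skipped": 0}
    for row in rows:
        counts[classify_for_identity(row)] += 1
    return counts
```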

The Dataflow run details page displays more information on your Identity dataflow run, including its IMS Org ID and dataflow run ID. This page also displays the corresponding error code and error message provided by Identity Service, should any errors occur in the ingestion process.

Select Run start: 2/14/2021, 9:47 PM to return to the previous page.

identities-dataflow-run

From the Identity processing page, select Profiles to see the status of record ingestion in the Profiles stage.

select-profiles

Profile metrics

The Profile processing page contains information on records ingested to Profile, including number of profile fragments created, profile fragments updated, and the total number of profile fragments.

Select the filter icon beside the dataflow run start time to see more information on your Profile dataflow run.

profiles

Profile metrics | Description
Records received | The number of records received from Data Lake.
Records failed | The number of records that were ingested, but not into Profile due to errors.
Profile fragments added | The number of net new Profile fragments added.
Profile fragments updated | The number of existing Profile fragments updated.
Total Profile fragments | The total number of records written into Profile, including all existing Profile fragments updated and new Profile fragments created.
Failed dataflow runs | The number of dataflow runs that failed.
Processing time | The time elapsed from the start of ingestion to completion.
Status | Defines the overall status of a dataflow. The possible status values are:
  • Success: Indicates that a dataflow is active and is ingesting data according to the schedule it was provided…
  • Failed: Indicates that the activation process of a dataflow has been disrupted due to errors.
  • Processing: Indicates that the dataflow is not yet active. This status is often encountered immediately after a new dataflow is created.
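
As described above, Total Profile fragments combines the fragments that were newly added with the existing fragments that were updated. The short Python sketch below makes that arithmetic explicit. The function name and the validation of negative counts are illustrative assumptions.

```python
def total_profile_fragments(added, updated):
    """Sum net new and updated Profile fragments for one dataflow run."""
    if added < 0 or updated < 0:
        raise ValueError("fragment counts must be non-negative")
    return added + updated
```

For example, a run that adds 120 new fragments and updates 30 existing ones would report 150 total Profile fragments under this reading.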

The Dataflow run details page displays more information on your Profile dataflow run, including its IMS Org ID and dataflow run ID. This page also displays the corresponding error code and error message provided by Profile, should any errors occur in the ingestion process.

profiles-dataflow-run

Next steps

By following this tutorial, you have successfully monitored the ingestion dataflow from the source-level, to Identity Service, and to Profile, using the Monitoring dashboard. You have also successfully identified errors that contributed to the failure of dataflows during the ingestion process. See the following documents for more details:
