Prepare data

Learn how to clean, prepare, and combine data from multiple datasets to create a new dataset using CTAS (Create Table AS) and Spark SQL functions for reporting and dashboarding. For more information, please visit the Query Service documentation.

Hi there. Adobe Experience Platform ingests data from a wide variety of sources. A major challenge for marketers is making sense of this data, to gain insights about their customers. Adobe Experience Platform’s Query Service facilitates that by allowing you to use standard SQL to query data in platform, using an interface and API. In this video let me show you how data engineers can query existing data sets and derive data, that can be used for reporting and dash boarding purposes. From your Experience Platform homepage navigate to “Data Sets” under “Data Management”. For this video, we are using a fictional retail brand called Luma. Luma loyalty data set contains information about customers, loyalty details, geographic details, et cetera. Luma web data contains web traffic information about customers’ interaction on the Luma site, including products viewed by a customer, visited pages, products purchased, et cetera. Open the Luma web data set and the last successful batch run ingested around 87 key records. These records are sample data created for demo purposes and capture user interaction with the Luma site. Let’s scroll down to the bottom of the source file to ensure that the total number of records ingested into Experience Platform, and the number of records within the source files, match. Let’s select the web interactions for a single user and check out some of the operations performed by a user, and what information is captured. Let’s take a closer look at the order placed by our customer. In a single order, the customer purchased two items stored in a product list item array. Each array object contains information about the product, such as SKU, total price, and description. Now let’s switch back to the platform UI and navigate to queries. Let’s click on “Create a New Query” option. Let’s take a use case where a marketer needs to create a report about orders placed by a loyalty customer, and it should include customer details and product details. So clearly, we need to get information from two different data sets, Luma loyalty data and Luma web data, to create this report. As a data engineer, you can write a query to combine the data sets, by performing a “Join SQL” operation and then filtering the result to display necessary details. Remember when one customer ordered multiple products in a single purchase, and product details were stored as a product list item array. When your query result contains an array object column, and you want to separate elements of an array into multiple roles, you can use the “Explode Spark” SQL function. Adobe Experience Platform Query Service, provides several built-in Spark SQL functions to extend SQL functionality. This document contains the Spark SQL functions that Query Service currently supports. Also note that we have a “CTAS” statement in our query, which means a query result gets stored as a data set in Experience Platform. Let’s run the query and open successful query execution and you can obtain the data set ID for the new data set under the results tab. Let’s navigate to the query window and check the logs. Click on the data set created, and it should open the order summary data set, created by the “CTAS” command, and the last batch ingestion, inserted a few records. Let’s preview the data set to make sure that the data set contains data required by the marketer. This data set contains order transaction details of loyalty members and products purchased by them. The marketer can now use this data set to create a report or create a dashboard using this data. Using Query Service in Adobe Experience Platform, data engineers can quickly explore, and validate data ingested to data sets, derive insights from the data, and prepare it for various purposes. -