Explore data
Learn how to validate ingested data, preview data, and explore statistical and analytical properties of data using SQL functions. For more information, please visit the Query Service documentation.
Transcript
Hi there. Adobe Experience Platform retrieves data from a wide variety of sources. Immediate challenge for marketers is making sense of this data to gain insights about their customers Adobe Experience Platform Query Service facilitates that by allowing you to use standard SQL to query data in platform using in user interface. In this video, let me show you how data engineers can quickly explore and validate data industry to datasets. From your Experience Platform homepage, navigate your datasets under data management. For this video, you’re using a fictional retail brand called Luma. Luma Loyalty dataset contains information about customers, loyalty details, geographic details, et cetera. Luma web data contains web traffic information about customers interaction on the Luma site, including products viewed by a customer, visited pages, products purchased, et cetera. Open the Luma Loyalty dataset and the last successful bachelor had ingested over 5,000 customer data. Now let’s verify the industry records by clicking on the preview dataset option. Please take a closer look at the records and their structure. As a data engineer, you can verify what’s stored in a dataset using the “Previous dataset” option. However, the preview option displays only a few records, or 200 records, but that’s not the entire dataset. Some records might have incorrect values are missing data. So to make sure that the data ingested into a dataset contains the expected number of records to explore the data type and format for each column, we can use Experience Platform query service UI, or any Postgres SQL compliant tool of your choice. From the left navigation bar, click on “credits” under “data management.” Let’s click on the “Create Query” , and you can see a query editor window open up. Let’s make sure that the number of records ingested into the dataset is the same as the source file. In the resell stamp, you can see that we have 5,000 records in the dataset, which matches the record entered into the dataset. So we made sure that no data is missing. Now let’s the dataset to make sure there are no duplicate customer loyalty records in the dataset. To do that, let’s write the query to output the unique records in the Luma Loyalty dataset. After 5,000 records, we can now confirm that there are a few duplicate records in the dataset. Let’s run a query to output duplicate records of loyalty numbers in the dataset. Using the result generated, you can then remove the duplicate records from the dataset, or write a query to output the dataset that doesn’t contain any duplicate records. To perform additional validation, Adobe Experience Platform Query Service provides several built-in Spark SQL functions to extend SQL functionality. Let’s see how we can use spark SQL functions to write a query, to check that there are no loyalty members who have a birthdate before 1900. To do this, I will be using the “Year” functionality under “date, time functions.” The date function returns the year when fed with a date. Let’s fish through the "Query Editor, and execute our query. After the query executes, you can notice that there are no records returned for customers who have a birth year less than 1900. This means we have a valid data ingested into the dataset and is ready to be used. Similarly, you can check out the platform query service document base to explore other Spark SQL functions that you could use to validate data. You can also connect to query service using different EXROP applications and tools that support cost rescuer to write and run your queries. Using 20 servers in Adobe Experience Platform that our engineers can quickly explore and validate data industry to datasets. -
recommendation-more-help
9051d869-e959-46c8-8c52-f0759cee3763