Derive data with Data Distiller

Learn how data engineers can use query service to transform data and output new datasets. Run these queries on a schedule to to power automated dashboards and segmentation. For more information, please visit Generate output datasets from query results.

Transcript

Let’s look at how data engineers can use query service with the data distiller add on to enrich and automate data sets. Deriving data with data distiller allows you to create complex programmatic rules to enrich data sets with new, calculated fields and other derived outputs. These enrich data sets can then be used to enrich customer profiles, segment customers, analyze data, and execute other advanced use cases. Keep in mind that transformation requires the data distiller add on, as it involves writing data back to a data set. This is a key distinction from ad hoc queries, which only read data.

Let’s walk through how you, as a data engineer, can use query service to identify the top 10% best selling products over the last 30 days. Creating the derived dataset allows you to reuse the results across tools and workflows. It can also be surfaced in external reporting dashboards such as Microsoft Power BI, enabling users to track different metrics without rerunning complex queries. This is also especially useful when the query is scheduled to refresh regularly, and showing the external tools always reflect the latest insights. First, query the purchase events demo dataset and calculate total revenue for each product over the past 30 days. Then they divide the ranked results into deciles ten equal groups and filter for the top decile or top 10% of products. When you run the query, the results populate the top 10% products table, but do not appear in the results tab. You can validate the results using a simple Select query in the query editor.

Since deciles adjust dynamically as new data arrives, they provide adaptive insights, making them useful for tracking high performing products, customers, or other business metrics. Once the ad hoc query has been validated, the next step is to transform it into a scheduled query. Save your query as a template. Only save templates can be scheduled. Name your query, for example. Top 10% products.

Then navigate to the templates tab. Select it. And click Add Schedule. In the scheduling interface, define the frequency. For example, daily or weekly.

Set the start and end dates and choose the output data set. You can either create a new dataset or append it to an existing one using the UI. If creating a new dataset, an ad hoc schema will automatically be created with the necessary fields. This will be the target for your scheduled query results. This process turns a one time query into a reliable, automated workflow, and showing your dataset is continuously refreshed with the latest insights.

You can view the schedule from the query editor, the template row, or from the Scheduled Queries tab. Remember, you can also use this approach to enrich profile data segment profiles into audiences and for other use cases requiring transformed data. Thanks for watching.

recommendation-more-help
9051d869-e959-46c8-8c52-f0759cee3763