Use cases for hypercubes

Use hypercubes to efficiently generate distinct counts for various user interactions without fully recalculating data each time. The following are some practical scenarios for their use:

  • Analyze unique visitors who view specific products during a defined time period.
  • Identify users who interact with multiple products in a given period to enhance cross-sell analysis.
  • Distinguish users engaging with one product but not another over time to uncover preference patterns.
  • Combine online and offline interaction data to get a comprehensive view of user behavior over a given period.
  • Track user movement across different activities within an event to optimize layout and services.

Benefits of using hypercubes

In these scenarios, you can pre-calculate distinct counts for specific categories. However, distinct counts are not additive: a user who appears in two categories or two time periods is counted twice if you sum the pre-calculated totals. To analyze data across multiple dimensions and time periods, you must therefore either recalculate everything from raw data or use a Query Service hypercube. Hypercubes store a mergeable sketch for each combination of dimensions, which allows flexible filtering and multi-dimensional analysis without reprocessing. The sketch functions estimate results quickly, which improves processing efficiency, scalability, and adaptability for complex analytical tasks.

Data size efficiency for query processing

Query Service can compress millions or billions of data points (for example, user IDs) into a compact form called a sketch. Because the sketch is far smaller than the raw data, queries that operate on it are faster and scale more easily. The sketch stays small no matter how large the original data grows, which makes analyzing big data much more manageable and efficient.
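To make the idea of a fixed-size sketch concrete, the following is a minimal, textbook-style HyperLogLog written in Python. It is a conceptual illustration only, not the Query Service implementation: the precision `P`, the SHA-256 based hash, and the names `build_sketch` and `estimate` are assumptions chosen for this example.

```python
import hashlib
import math

P = 12          # precision (illustrative assumption): 2**12 = 4096 registers
M = 1 << P      # the sketch always holds exactly M small integers


def _hash64(value: str) -> int:
    """Return a stable 64-bit hash of the value."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def build_sketch(values):
    """Fold any number of values into a fixed-size list of M registers."""
    registers = [0] * M
    for value in values:
        h = _hash64(value)
        idx = h >> (64 - P)                        # first P bits select a register
        rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def estimate(registers):
    """Standard HyperLogLog estimate with the small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # linear counting for small cardinalities
    return raw


# 1,000 distinct IDs versus 300,000 events that contain 100,000 distinct IDs.
small = build_sketch(f"user-{i}" for i in range(1_000))
large = build_sketch(f"user-{i % 100_000}" for i in range(300_000))

print(len(small), len(large))                      # same sketch size for both inputs
print(round(estimate(small)), round(estimate(large)))
```

Both sketches contain the same number of registers even though the second input is 300 times larger, and duplicate IDs do not change the result. The estimates are approximate; with this register count the typical relative error is on the order of one to two percent.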

The diagram below illustrates how Commerce, Product Info, and Web dimension ExperienceEvents are processed into sketches, which are then used to approximate unique counts.


Merge sketches to make data analysis faster and easier

To avoid recalculation and improve processing speed, you can merge sketches from different categories or groups. Query Service also simplifies the design by organizing your data into a hypercube, where each row summarizes one partition (a unique combination of dimension values) and stores its sketch in a dedicated column. Each row of the hypercube contains the dimension combination but no raw data. When you execute a query, specify the dimension columns you want to use for building additive metrics, then merge the sketches of the matching rows.
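The row layout described above can be sketched in a few lines of Python. In this illustration, exact Python sets stand in for sketches so that the output is deterministic; a real hypercube stores a probabilistic sketch in that column instead. The event tuples and the helper names `build_hypercube` and `distinct_users` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: (day, product, channel, user_id)
events = [
    ("2024-01-01", "shoes", "web",   "u1"),
    ("2024-01-01", "shoes", "store", "u2"),
    ("2024-01-01", "hats",  "web",   "u1"),
    ("2024-01-02", "shoes", "web",   "u1"),
    ("2024-01-02", "shoes", "web",   "u3"),
    ("2024-01-02", "hats",  "store", "u2"),
]

DIMENSIONS = ("day", "product", "channel")


def build_hypercube(events):
    """One row per dimension combination; the value is that row's 'sketch'."""
    cube = defaultdict(set)
    for day, product, channel, user in events:
        cube[(day, product, channel)].add(user)   # a set stands in for a sketch
    return dict(cube)


def distinct_users(cube, **filters):
    """Merge the sketches of all rows that match the filters, then count."""
    merged = set()
    for key, sketch in cube.items():
        row = dict(zip(DIMENSIONS, key))
        if all(row[name] == wanted for name, wanted in filters.items()):
            merged |= sketch                       # merging is a union, not a sum
    return len(merged)


cube = build_hypercube(events)

print(distinct_users(cube, product="shoes"))   # 3
print(distinct_users(cube, channel="web"))     # 2

# Summing per-row counts instead of merging double-counts user u1.
naive_sum = sum(len(s) for key, s in cube.items() if key[1] == "shoes")
print(naive_sum)                               # 4
```

The hypercube rows hold no raw events, yet any filter over the dimension columns can be answered by merging the stored sketches. The naive sum shows why per-row counts alone are not enough: the same user can appear in several rows.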

The diagram shows how sketches from different ExperienceEvents are merged to create approximate unique counts across various dimensions.

Cost-effectiveness

Customer data is often large-scale, but incremental processing eliminates the need to reprocess historical data: you build sketches only for new data and merge them with the sketches you already store. Because sketches are much smaller than the raw data, they allow for faster, real-time results while saving on compute resources and costs. This data transformation makes interactive queries more feasible and efficient.
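The reason historical data does not need to be reprocessed can be illustrated with HyperLogLog registers in Python. This is a conceptual sketch under stated assumptions, not Query Service code: the precision `P`, the hash, and the names `build_registers` and `merge` are chosen for this example. It relies on the general property that two HyperLogLog sketches built with the same configuration merge by taking the register-wise maximum.

```python
import hashlib

P = 10          # precision (illustrative assumption): 2**10 = 1024 registers
M = 1 << P


def build_registers(values):
    """Build HyperLogLog registers for a batch of values."""
    registers = [0] * M
    for value in values:
        h = int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")
        idx = h >> (64 - P)                       # first P bits select a register
        rest = h & ((1 << (64 - P)) - 1)          # remaining bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(a, b):
    """Merge two sketches of the same size by taking the register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


history = [f"user-{i}" for i in range(5_000)]          # already processed
today = [f"user-{i}" for i in range(4_000, 6_000)]     # new data, overlaps history

stored = build_registers(history)                      # computed once and kept
incremental = merge(stored, build_registers(today))    # only today's data is scanned
full_rebuild = build_registers(history + today)        # the expensive alternative

print(incremental == full_rebuild)   # True
```

Merging the stored sketch with a sketch of the new batch produces exactly the same registers as rebuilding from all raw data, so the overlap between old and new users is handled without rescanning history.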

Functions overview

This section outlines how each function optimizes data processing and enhances analytical capabilities through the efficient use of sketches and hypercubes. It details each function's purpose, example syntax, parameters, and expected output.

Create unique count estimates with HLL sketches

hll_build_agg is an aggregate function that creates an HLL (HyperLogLog) sketch: a compact, probabilistic data structure for estimating the number of unique values within a column or expression in a grouped dataset.