CrossRows

IMPORTANT

Read more about Data Workbench’s End-of-life announcement.

Like other transformations, the CrossRows transformation is applied to the rows of data (log entries) in your log sources.

For each row of data, the transformation takes the value of the specified input field, performs a set of processing steps, and records the result in the output field that you specify. However, when the CrossRows transformation works on one row of data (this row is called the output row), it takes into account that row plus one or more other rows of data (these rows are called input rows) that are associated with the same tracking ID. Therefore, for a given tracking ID, the value of the output field for each output row is based on the values of the input field for one or more input rows.

The transformation provides multiple conditions and constraints that enable you to limit the input rows for the transformation. You can express these limits in terms of the data workbench server’s conditions (see Conditions), a range of input rows relative to the output row, or a range of times relative to the time of the output row. For those input rows that satisfy the transformation’s conditions and constraints, you can apply an operation (such as SUM) that determines the value of the output field.

NOTE

To work, the CrossRows transformation requires that the data is ordered in time and grouped by the tracking ID in your source data. Therefore, CrossRows works only when defined in the Transformation.cfg file or in a Transformation Dataset Include file.
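The ordering that this note describes can be pictured with a minimal Python sketch. It is an illustration only, not how the data workbench server prepares its data; the field names `tracking_id` and `time` and the helper `group_and_order` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_and_order(rows, id_field="tracking_id", time_field="time"):
    """Group rows by tracking ID and order each group by time.

    This mimics the layout CrossRows relies on: all rows for one
    tracking ID sit together, in time order.
    """
    ordered = sorted(rows, key=itemgetter(id_field, time_field))
    return {
        tracking_id: list(group)
        for tracking_id, group in groupby(ordered, key=itemgetter(id_field))
    }


# Hypothetical, unordered log entries.
log_entries = [
    {"tracking_id": "B", "time": 5},
    {"tracking_id": "A", "time": 9},
    {"tracking_id": "A", "time": 2},
]
grouped = group_and_order(log_entries)
```

Each value in `grouped` is the time-ordered sequence of rows for one tracking ID, which is the unit a cross-row calculation would operate on.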

As you review the descriptions of the parameters in the following table, remember the following:

  • The output row is the row of data that the transformation is working on at a given point in time.
  • Input rows are the rows of data (before, after, or including the output row) whose values of the input field serve as inputs to the transformation. Input rows are subject to the Input Condition, Key, Row Begin, Row End, Time Begin, and Time End parameters.
Parameter Description Default
Name Descriptive name of the transformation. You can enter any name here.
Comments Optional. Notes about the transformation.
Condition Limits the output of the transformation to certain log entries. If the condition is not met for a particular log entry, the field named in the Output parameter is left unchanged. The input from that log entry may still be used to affect other log entries.
Input The name of the field from the input row to use as input.
Input Condition Accepts input for the transformation from only certain input rows. If the Input Condition is not met for a particular input row, the input field from that row is ignored and does not affect other output rows. However, the output field from that row is still modified per the specified Condition.
Key

Optional. The name of the field to use as the key.

If a key is specified, the input rows for a given output row are limited to the contiguous block of rows having the same Key value as the output row. This restriction is in addition to all other limitations placed on the input rows by other parameters of the CrossRows transformation.

For example, if you are working with web data and you make the field x-session-key (which has a unique value for each session) the key, then the input rows for the transformation are limited to those rows having the same x-session-key value as the output row. Therefore, you are considering only those input rows representing page views that occur during the same session as the output row.

Operation

An operation that, for each output row, is applied to all of the input rows satisfying all of the conditions defined by the Input Condition, Key, Row Begin, Row End, Time Begin, and Time End parameters to produce an output:

  • ALL takes all of the values of the input field from the input rows and outputs them as a vector.
  • SUM interprets the values of the input field from the input rows as numbers and sums them.
  • FIRST ROW outputs the value of the input field from the first input row.
  • LAST ROW outputs the value of the input field from the last input row.

Output The name of the output field.
Row Begin/Row End

Optional. Specifies a range of input rows relative to the output row. For example, a Row Begin value of "0" excludes all rows before the output row. A Row Begin value of "1" excludes the output row as well. Common ranges include:

  • Begin 0: This row and all subsequent ones.
  • Begin 1: All subsequent rows.
  • End 0: This row and all previous ones.
  • End -1: All previous rows.
  • Begin -1, End -1: The previous row.
  • Begin 1, End 1: The next row.

Default: All rows
Time Begin/Time End

Optional. Specifies a range of times relative to the time of the output row. For example, a Time End of 30 minutes includes all rows that take place within 30 minutes after the output row. A Time Begin of -30 minutes includes all rows that take place within 30 minutes before the output row.

Available time units are days, weeks, hours, minutes, ms (milliseconds), ticks (100 nanoseconds), and ns (nanoseconds).

Default: All times
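To make the interaction of these parameters concrete, here is a minimal Python sketch that simulates the behavior described in the table for the rows of a single tracking ID. It is not the server's implementation. The function `cross_rows`, the numeric `time` field (seconds), and the results for an empty set of input rows (`None`, an empty list, or 0) are assumptions made for illustration. The sketch also assumes that row offsets count every row for the tracking ID, not only the rows that pass the Input Condition.

```python
def cross_rows(rows, input_field, output_field, operation,
               condition=None, input_condition=None, key=None,
               row_begin=None, row_end=None,
               time_begin=None, time_end=None, time_field="time"):
    """Simulate CrossRows over the time-ordered rows of ONE tracking ID."""
    results = [dict(row) for row in rows]
    last = len(rows) - 1
    for i, target in enumerate(rows):
        # Condition: if not met, the output field is left unchanged.
        if condition is not None and not condition(target):
            continue
        # Key: restrict to the contiguous block sharing the output row's key.
        lo, hi = 0, last
        if key is not None:
            lo = hi = i
            while lo > 0 and rows[lo - 1].get(key) == target.get(key):
                lo -= 1
            while hi < last and rows[hi + 1].get(key) == target.get(key):
                hi += 1
        # Row Begin / Row End: offsets relative to the output row.
        if row_begin is not None:
            lo = max(lo, i + row_begin)
        if row_end is not None:
            hi = min(hi, i + row_end)
        values = []
        for j in range(lo, hi + 1):
            # Time Begin / Time End: offsets relative to the output row's time.
            delta = rows[j][time_field] - target[time_field]
            if time_begin is not None and delta < time_begin:
                continue
            if time_end is not None and delta > time_end:
                continue
            # Input Condition: rows that fail it contribute no input.
            if input_condition is not None and not input_condition(rows[j]):
                continue
            values.append(rows[j].get(input_field))
        if operation == "ALL":
            results[i][output_field] = values
        elif operation == "SUM":
            results[i][output_field] = sum(float(v) for v in values)
        elif operation == "FIRST ROW":
            results[i][output_field] = values[0] if values else None
        elif operation == "LAST ROW":
            results[i][output_field] = values[-1] if values else None
        else:
            raise ValueError(f"unknown operation: {operation}")
    return results


# Hypothetical rows for one tracking ID; "n" and "session" are made-up fields.
rows = [
    {"time": 0, "n": 1, "session": "a"},
    {"time": 10, "n": 2, "session": "a"},
    {"time": 20, "n": 3, "session": "b"},
    {"time": 100, "n": 4, "session": "b"},
]

# End 0: this row and all previous ones, giving a running total.
running = cross_rows(rows, "n", "total", "SUM", row_end=0)
# Begin -1, End -1: the previous row only.
previous = cross_rows(rows, "n", "prev", "FIRST ROW", row_begin=-1, row_end=-1)
# Key: sum within the contiguous block of rows sharing the same session.
per_session = cross_rows(rows, "n", "session_total", "SUM", key="session")
# Time window: rows within 15 seconds before or after the output row.
nearby = cross_rows(rows, "n", "near", "ALL", time_begin=-15, time_end=15)
```

With these rows, `running` holds the totals 1, 3, 6, and 10, and `previous` holds no value for the first row followed by 1, 2, and 3.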

The CrossRows transformation in this example is applied to rows of web data to find for each page view the time of the next page view. Because we know that CrossRows is applied only during the transformation phase of the dataset construction process, the rows of data are ordered by visitor (each visitor has a unique tracking ID) and time.

The input field, x-timestamp, is considered for only those input rows in which the x-is-page-view field is populated (indicating the row of data represents a page view). The x-session-key field (which has a unique value for each session) is specified for the Key parameter. Therefore, the input rows (log entries) for the transformation are limited to the contiguous block of rows having the same value of x-session-key as the output row. In other words, to be considered for the transformation, an input row must represent a page view that occurs during the same session as the page view in the output row. The FIRST ROW operation takes the value of the input field (x-timestamp) from the first input row satisfying the Input Condition and having the same x-session-key value as the output row.
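The logic of this example can be sketched in Python as follows. The sketch assumes that only rows after the output row are considered (a Row Begin of 1), which the text implies by "the next page view" but does not state. The output field name `x-next-page-view-time` and the helper `next_page_view_times` are hypothetical, and no Condition is applied, so every row receives an output value.

```python
def next_page_view_times(rows, output_field="x-next-page-view-time"):
    """For each row, find x-timestamp of the next page view in the same session.

    `rows` holds the time-ordered log entries of one tracking ID.
    """
    results = []
    for i, target in enumerate(rows):
        result = dict(target)
        result[output_field] = None  # assumption: empty when no input row qualifies
        for row in rows[i + 1:]:  # assumed Row Begin of 1: later rows only
            # Key: stop at the end of the contiguous x-session-key block.
            if row["x-session-key"] != target["x-session-key"]:
                break
            # Input Condition: x-is-page-view must be populated.
            if row.get("x-is-page-view"):
                # FIRST ROW: take the first qualifying input row.
                result[output_field] = row["x-timestamp"]
                break
        results.append(result)
    return results


# Hypothetical log entries for one visitor, ordered by time.
entries = [
    {"x-timestamp": 100, "x-session-key": "s1", "x-is-page-view": "1"},
    {"x-timestamp": 105, "x-session-key": "s1", "x-is-page-view": ""},
    {"x-timestamp": 130, "x-session-key": "s1", "x-is-page-view": "1"},
    {"x-timestamp": 900, "x-session-key": "s2", "x-is-page-view": "1"},
]
with_next = next_page_view_times(entries)
```

Here the first two entries receive 130 (the next page view in session s1), while the last page view of each session receives no value because no later page view exists in that session.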

CrossRows executes in an amount of time proportional to the size of its inputs plus the size of its outputs. This means that for operations SUM, FIRST ROW, and LAST ROW, it is no less efficient than other transformations. For ALL, the situation is more complex because it is possible to configure CrossRows to output an amount of data for each row of data (log entry) that is proportional to the total number of rows (log entries) for a given tracking ID.
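A rough way to see the difference is to count how many values are written for one tracking ID. The Python sketch below does only that counting; it assumes every row passes all conditions, and the helper `total_values_written` is hypothetical.

```python
def total_values_written(n_rows, operation, row_begin=None, row_end=None):
    """Count the values written across all output rows of one tracking ID."""
    total = 0
    for i in range(n_rows):
        # Range of input rows for output row i, clipped to the available rows.
        lo = 0 if row_begin is None else max(0, i + row_begin)
        hi = n_rows - 1 if row_end is None else min(n_rows - 1, i + row_end)
        n_inputs = max(0, hi - lo + 1)
        # ALL writes one value per input row; the other operations write one.
        total += n_inputs if operation == "ALL" else 1
    return total


unbounded_all = total_values_written(1000, "ALL")
bounded_all = total_values_written(1000, "ALL", row_begin=-1, row_end=1)
summed = total_values_written(1000, "SUM")
```

For 1,000 rows, SUM writes 1,000 values, while an unbounded ALL writes 1,000,000, which grows with the square of the row count. Bounding ALL to the previous, current, and next rows brings the count down to 2,998 in this sketch.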
