Read more about Data Workbench’s End-of-life announcement.
Information about the transformations that you can use to incorporate lookup data into the dataset.
Note that not all types can be used during both phases of the dataset construction process.
The Categorize transformation uses a two-column lookup table composed of pattern-string/value pairs. During this transformation, the data workbench server reads each event data record in turn and compares the contents of a designated field in the record to each of the pattern strings listed in the first column of the lookup table. If the designated field matches one of the pattern strings, the data workbench server writes the value (found in the second column) that is associated with that pattern string to a designated output field in the record.
The strings in the first column of the lookup table optionally can start with the ^ character and/or end in the $ character to force matching at the beginning and/or end. This transformation does not accept regular expressions for defining match conditions in the first column. If the input value is a vector of strings, each string is run through the transformation and the result(s) are appended to an output string vector.
A Categorize transformation is generally easier and faster than using a Regular Expression transformation to accomplish the same thing.
The substring test used in Categorize is case-sensitive unless otherwise specified using the Case Sensitive parameter.
Parameter | Description | Default |
---|---|---|
Name | Descriptive name of the transformation. You can enter any name here. | |
Case Sensitive | True or false. Specifies whether the substring test is case-sensitive. | true |
Comments | Optional. Notes about the transformation. | |
Condition | The conditions under which this transformation is applied. | |
Default | The default value to use if the condition test passes and no entry in the categorization file matches the input, or the input field is not defined in the given log entry. | |
Delimiter | String that is used to separate the columns in the lookup file. Must be a single character in length. If you hold down the Ctrl key and right-click within the Delimiter parameter, an Insert menu appears. This menu contains a list of special characters that often are used as delimiters. |
|
Multiple Values | True or false. If true, when multiple rows in the file match the input, each match results in a value being appended to the output vector of strings. If false, only the first matching row in the file is used in the output. In the latter case, if the input is a vector, the output is also a vector of equivalent length. If the input is a simple string, the output is also a simple string. | false |
File | Path and file name of the categorization file. Relative paths are with respect to the installation directory for the data workbench server. This file is typically located in the Lookups directory within the data workbench server installation directory. | |
Input | The categorization file matches its substrings against the value in this field to identify the matching row in the file. | |
Output | The name of the field associated with the result. |
Considerations for Categorize
Changes to lookup files in Categorize transformations defined in the Transformation.cfg file or in a Transformation Dataset Include file require retransformation of the dataset. Lookup files for Categorize transformations defined in the Log Processing.cfg file or a Log Processing Dataset Include file are not subject to this limitation. For information about reprocessing your data, see Reprocessing and Retransformation.
Categorize transformations defined in the Log Processing.cfg file or a Log Processing Dataset Include file reload their lookup files whenever the lookup files change. Changes are not applied retroactively, but they apply to all log data read after the change takes place.
This example illustrates the use of the Categorize transformation to integrate lookup data with event data collected from website traffic. Suppose that a particular website has business sections, and there is a requirement to be able to look at and make comparisons based on traffic flow and value generated by the different sections. You can create a lookup file that lists the substrings used to identify these different sections.
The lookup file Lookups\custommap.txt contains the following table:
/products/ | Products |
---|---|
^/sports/ | Sports |
^/news/ | News |
… | … |
This categorization file maps anything containing the string “/products/” to the value “Products,” anything starting with “/sports/” to the value “Sports,” and anything starting with “/news/” to the value “News.” The following categorization transformation uses the value in the cs-uri-stem field as the string within which we are looking for a matching substring. The result of the transformation is placed into the x-custommap field.
Assuming that the Multiple Values parameter is set to false, the example would produce the following values for x-custommap given the listed values for cs-uri-stem.
cs-uri-stem | x-custommap |
---|---|
/sports/news/today.php | Sports |
/sports/products/buy.php | Products |
/news/headlines.php | News |
/news/products/subscribe.php | Products |
The output is based on the order of the substrings in the lookup file. For example, the cs-uri-stem /sports/products/buy.php returns “Products.” Although the URI stem starts with “/sports/,” the string “/products/” is listed before “/sports/” in the lookup file. If the Multiple Values parameter were set to true, there would be an additional value for x-custommap, as the last example would match two rows in the lookup table: Products and News.
The FlatFileLookup transformation uses a lookup table composed of any number of columns and rows (although, recall that it resides in memory). During this type of transformation, the data workbench server reads each event data record in turn and compares the contents of a designated field in the record to each of the values in a designated column of the lookup table. If there is a match, the data workbench server writes one or more values from the matching row in the lookup table to one or more designated output fields in the event data record.
The lookup table used during this transformation is populated from a flat file whose location you specify when you define the transformation.
Parameter | Description | Default |
---|---|---|
Name | Descriptive name of the transformation. You can enter any name here. | |
Comments | Optional. Notes about the transformation. | |
Condition | The conditions under which this transformation is applied. | |
Default | The default value to use if the condition is met and if no entry in the lookup file matches the input. | |
Delimiter | String that is used to separate the columns in the lookup file. Must be a single character in length. If you hold down the Ctrl key and right-click within the Delimiter parameter, an Insert menu appears. This menu contains a list of special characters that often are used as delimiters. |
|
File | Path and file name of the lookup file. Relative paths are with respect to the installation directory for the data workbench server. This file is typically located in the Lookups directory within the data workbench Server installation directory. | |
Header Row | True or false. Indicates that the first row in the table is a header row to be ignored in processing. | |
Input | Column Name is the name of the column used for matching the input to the row(s) in the file. If Header Row is true, this can be the name of a column in the lookup file. Otherwise, this must be the zero-based column number to match against. Field Name is the name of the field used to locate the row in the lookup file. | |
Multiple Values | True or false. Determines whether a single value (a matching row) or multiple values should be returned (one for each matching row).
Note: If Multiple Values is set to false, you must ensure that there are not multiple matches. When multiple matches occur, there is no guarantee which match will be returned. |
|
Outputs | A vector of column objects (results) in which each object is defined by column and field names. Column Name is the column from which the output value is obtained. If Header Row is true, this can be the name of a column in the lookup file. Otherwise, this must be the zero-based column number to match against. Field Name is the name of the field used to capture the output. Note that this can be a vector of results, one for each row identified in the case where the Multiple Values parameter is true. |
Considerations for FlatFileLookup
Matching the input field to the lookup file is always case-sensitive.
Changes to lookup files in FlatFileLookup transformations defined in the Transformation.cfg file or Transformation Dataset Include files require retransformation of the dataset. Lookup files for FlatFileLookup transformations defined in the Log Processing.cfg file or Log Processing Dataset Include files are not subject to this limitation. For information about reprocessing your data, see Reprocessing and Retransformation.
FlatFileLookup transformations in the Log Processing.cfg file or Log Processing Dataset Include files reload their lookup files whenever the lookup files change. Changes are not applied retroactively, but they apply to all log data read after the change takes place.
This example illustrates the use of the FlatFileLookup transformation to integrate lookup data with event data collected from website traffic. Suppose that you want to isolate website partners that are routing traffic to the website and transform their partner IDs into more user-friendly names. You then can use the user-friendly names to create extended dimensions and visualizations that map more clearly to the business relationship than the site-to-site relationship used for routing traffic.
The example transformation searches the cs(referrer-query) field for the PartnerID name-value pair, and, if located, the lookup file Lookups\partners.txt is used to compare the PartnerID value against the values in the Partner column of the table. If a row is located, the output field x-partner-name is given the name from the PrintName column of the identified row.
If the lookup table contained the following information:
ID | Partner | Started | PrintName |
---|---|---|---|
1 | P154 | Aug 21, 1999 | Yahoo |
2 | P232 | July 10, 2000 | Microsoft |
3 | P945 | Jan 12, 2001 | Amazon |
The following examples would transform as follows:
The ODBCLookup transformation operates like a FlatFileLookup transformation. The only difference is that the lookup table used during this transformation is populated from an ODBC database and not a flat file.
ODBCLookup transformations can be executed only during the transformation phase of the dataset construction process. When possible, Adobe recommends that you use the FlatFileLookup transformation instead of the ODBCLookup transformation. FlatFileLookup transformations are inherently more reliable because they do not depend on the availability of an outside system. Additionally, there is less risk that the lookup table is modified if it resides in a flat file that you control locally.
Parameter | Description | Default |
---|---|---|
Name | Descriptive name of the transformation. You can enter any name here. | |
Comments | Optional. Notes about the transformation. | |
Condition | The conditions under which this transformation is applied. | |
Data Source Name | A DSN, as provided by an administrator of the data workbench server machine on which the dataset is processed, that refers to the database from which data is to be loaded. | |
Database Password | The password to be used when connecting to the database. If a password has been configured for the DSN in the Data Source Administrator, this may be left blank. Any password supplied here overrides the password configured for the DSN in the Data Source Administrator. | |
Database User ID | The user ID to be used when connecting to the database. If a user ID has been configured for the DSN in the Data Source Administrator, this may be left blank. Any user ID supplied here overrides the user ID configured for the DSN in the Data Source Administrator. | |
Default | The default value to use if the condition is met and no entry in the lookup file matches the input. | |
Input Column | Column Name is the column name or SQL expression for the data that is matched against the input. Field Name is the name of the field containing the data to be looked up. | |
Multiple Values | True or false. Determines whether a single value (a matching row) or multiple values should be returned (one for each matching row).
Note: If Multiple Values is set to false, you must ensure that there are not multiple matches. When multiple matches occur, there is no guarantee which match will be returned. |
|
Output Columns | A vector of column objects (results) where each object is defined by column and field names. Column Name is the name of or SQL expression for the column from which the output value is obtained. Field Name is the name of the field used to capture the output. |
|
Table Identifier | An SQL expression that names the table or view from which data is to be loaded. A typical table identifier is of the form SCHEMA.TABLE. |
The Data Source Name, Database User ID, Database Password, and Table Identifier parameters are the same as the parameters of the same names that are described for ODBC data sources. See ODBC Data Sources.
Unlike ODBC data sources, ODBCLookup transformations do not require an increasing ID column. See ODBC Data Sources. That is because the contents of the lookup table must not change in any way while the dataset is active. Changes in a lookup table or view can not be detected until retransformation occurs. For information about reprocessing your data, see Reprocessing and Retransformation.
Suppose that you want to convert outdated DNS records to the updated records. Both sets of records are stored in an SQL database. To perform this task, you would reference a lookup table that is generated from the database and replace the outdated DNS records.
Our example transformation searches the log entries for the s-dns field, and, if located, the lookup table VISUAL.LOOKUP is used to compare the s-dns entry against the entries in the OLDDNS column of the table. If a row is located in the table, the output field s-dns is given the updated DNS record entry from the NEWDNS column of the identified row.