The data workbench Transform.cfg file contains the parameters that define the log sources, data transformations, and exporters.
The transformations that you define manipulate raw data collected by Sensors (.vsl files) or contained in text files, XML files, or ODBC-compliant databases. They write the results either into existing fields, overwriting the current data, or into newly defined fields.
To configure transformation functionality, you edit the data workbench Transform.cfg file within the Dataset folder for the profile for which you want to export event data. Typically, this profile is dedicated to transformation functionality (that is, you perform no data processing other than what is defined in the data workbench Transform.cfg file). Note that any processing instructions specified in the Log Processing Dataset Include files for any inherited profiles are applied in addition to those specified in the data workbench Transform.cfg file.
For information about dataset include files, see Dataset Include Files.
If the data that you want to export is processed by a data workbench server cluster, each of the processing servers (DPUs) in the cluster processes the data, but only the first DPU (processing server #0 in the profile.cfg file) will write the output data to its local file system.
To edit the data workbench Transform.cfg file
Parameter | Description |
---|---|
End Time | Optional. Filter data to include log entries with timestamps up to, but not including, this time. Adobe recommends using one of the following formats for the time:
For example, specifying July 29 2013 00:00:00 EDT as the End Time includes data through July 28, 2013, at 11:59:59 PM EDT. You must specify a time zone; the time zone does not default to GMT when omitted. For a list of time zone abbreviations supported by the data workbench server, see Time Zone Codes. The Use Start/End Times parameter for Sensor and log file sources is related to this parameter. |
Exporters | The subfields of an exporter specify how the output data is processed and/or formatted. You can define multiple exporters for a set of log sources. Each exporter type creates output independently. Three types of exporters exist:
For more information about exporter types, see Defining Exporters. |
Hash Threshold | Optional. A sampling factor for random sub-sampling of rows. If set to a number n, only one out of every n tracking IDs is selected for export, reducing the total number of rows exported by a factor of n. To export all rows, set Hash Threshold to 1. |
Log Entry Condition | Optional. Defines the rules by which log entries are considered for export. For more information about the Log Entry Condition, see Log Processing Configuration File. |
Log Sources | The sources of data. Log sources can be .vsl files, log files, XML files, or data from ODBC-compliant databases. For information about log sources, see Log Processing Configuration File. Transform expects all source data to be in chronological order within lexicographically ordered input files. If this requirement is not satisfied, As Of calculations are incorrect, and additional input data may be processed after the output files are closed. |
Offline Mode | Optional. True or false. If true, Transform assumes that all of the input files are present when it starts processing the data. When all of the input data has been read, Transform closes all of the output files without waiting for additional data to be received. The default value is false.
Note: If Offline Mode is set to true, Transform expects all source data to be present before processing starts. A warning message is generated in the VisualServer.log file if additional data is received after the output files are closed. |
Reprocess | Optional. Any character or combination of characters can be entered here. Changing this parameter and saving the file to the Transform machine initiates data reprocessing. For information about reprocessing your data, see Reprocessing and Retransformation. |
Stages | Optional. The names of the processing stages that can be used in Log Processing Dataset Include files that are executed in addition to the data workbench Transform.cfg file. Processing stages provide a way to order the transformations that are defined in Log Processing Dataset Include files. This parameter is useful if you have defined transformations in multiple Log Processing Dataset Include files and want specific transformations to be performed at specific points during the export process. The order in which you list the stages here determines the order in which the transformations in the Log Processing Dataset Include files are executed during data export. Preprocessing and Postprocessing are built-in stages; Preprocessing is always the first stage, and Postprocessing is always the last stage. By default, there is one named stage called Default. To add a new processing stage
To delete an existing processing stage
Note: When you specify a Stage in a Log Processing Dataset Include file, the name of the stage must exactly match the name that you enter here. For more information about dataset include files, see Dataset Include Files. |
Start Time | Optional. Filter data to include log entries with timestamps at or after this time. Adobe recommends using one of the following formats for the time:
For example, specifying July 29 2013 00:00:00 EDT as the Start Time includes data starting from July 29, 2013, at 12:00:00 AM EDT. You must specify a time zone; the time zone does not default to GMT when omitted. For a list of time zone abbreviations supported by the data workbench server, see Time Zone Codes.
Note: The Use Start/End Times parameter for Sensor and log file sources is related to this parameter. |
Transformations | Optional. Defines the transformations that are to be applied to the data. For information about the available transformation types, see Data Transformations.
Note: The following transformation types do not work when defined in the data workbench Transform.cfg file:
|
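The row-selection rules described for Start Time, End Time, and Hash Threshold can be illustrated with a short Python sketch. This is not how the data workbench server implements them: the function names, the fixed UTC-4 offset standing in for EDT, and the MD5-modulo sampling rule are all assumptions made for illustration. The sketch only shows that Start Time is inclusive, End Time is exclusive, and a threshold of n keeps roughly one in n tracking IDs.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset standing in for EDT (assumption for this sketch only).
EDT = timezone(timedelta(hours=-4), "EDT")


def in_time_window(ts, start=None, end=None):
    """Return True if ts is at or after start and strictly before end.

    Either bound may be None, mirroring the optional parameters.
    """
    if start is not None and ts < start:
        return False
    if end is not None and ts >= end:
        return False
    return True


def sampled(tracking_id, n):
    """Keep roughly one in n tracking IDs; n = 1 keeps every ID.

    The MD5-modulo rule is hypothetical, not the server's actual hash.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    digest = hashlib.md5(tracking_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n == 0


end_time = datetime(2013, 7, 29, 0, 0, 0, tzinfo=EDT)
last_second = datetime(2013, 7, 28, 23, 59, 59, tzinfo=EDT)
print(in_time_window(last_second, end=end_time))  # True
print(in_time_window(end_time, end=end_time))     # False
```

Because the sampling decision depends only on the tracking ID, every row for a selected ID is kept together, which matches the description of sampling by tracking ID rather than by individual row.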
If additional data is received after the output files are closed (see Log Sources and Offline Mode in the preceding table), Transform creates new output files with the additional data. The names of the new output files are generated from the original output file’s name with the addition of a parenthesized version number just before the extension. For example, if the original output file is 20070701-ABC.vsl, subsequent versions of this file will be named 20070701-ABC(1).vsl, 20070701-ABC(2).vsl, and so on. Note that using the versioned files as input to the data workbench server may result in processing errors.
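The naming pattern for versioned output files can be reproduced with a small Python helper. The function name `versioned_name` is hypothetical; the sketch only mirrors the pattern in the example above (a parenthesized version number placed just before the extension) and is useful, for instance, when scanning an output directory for versioned files.

```python
import os


def versioned_name(original, version):
    """Insert "(version)" just before the file extension.

    Version 0 denotes the original output file and is returned unchanged.
    """
    if version < 0:
        raise ValueError("version must be non-negative")
    if version == 0:
        return original
    stem, ext = os.path.splitext(original)
    return f"{stem}({version}){ext}"


print(versioned_name("20070701-ABC.vsl", 1))  # 20070701-ABC(1).vsl
```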
Adobe recommends avoiding the creation of versioned output files by making sure that all source data is in chronological order within lexicographically ordered input files and, if Offline Mode is set to true, that all source data is present before processing starts. For more information, see the Log Sources and Offline Mode entries in the preceding table.
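One way to check the ordering requirement before running Transform is sketched below in Python. It reflects one reading of the requirement: when input files are read in lexicographic order of their names, timestamps should never go backward. The function name, the in-memory dictionary of timestamps, and the use of Python's default string ordering as "lexicographic" are assumptions; a real check would parse timestamps from the actual input files.

```python
def first_order_violation(files):
    """Find the first log entry that is earlier than its predecessor.

    files maps each input file name to its timestamps in file order.
    Files are visited in sorted (lexicographic) name order. Returns
    (file_name, index) for the first out-of-order entry, or None.
    """
    previous = None
    for name in sorted(files):
        for index, ts in enumerate(files[name]):
            if previous is not None and ts < previous:
                return (name, index)
            previous = ts
    return None


ordered = {"20070702.log": [3, 4], "20070701.log": [1, 2]}
overlapping = {"a.log": [1, 5], "b.log": [4]}
print(first_order_violation(ordered))      # None
print(first_order_violation(overlapping))  # ('b.log', 0)
```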
Add transformations by right-clicking Transformations and clicking Add new > Transformation type. Complete the transformation fields.
See Data Transformations for descriptions and examples of the transformations that you can use with transformation functionality.
Right-click (modified) at the top of the window, then click Save.
To make your local changes take effect, in the Profile Manager, right-click the check mark for data workbench Transform.cfg in the User column, then click Save to > profile name, where profile name is the name of the profile for which you are exporting data. Reprocessing of the data begins after the profile is synchronized.
For information about reprocessing your data for export, see Reprocessing and Retransformation.