Editing the Log Processing Configuration File

Last update: 2022-10-04
  • Created for: User, Admin
IMPORTANT

Read more about Data Workbench’s End-of-life announcement.

Steps to edit the Log Processing.cfg file for a dataset profile.

  1. While working in your dataset profile, open the Profile Manager and click Dataset to show its contents.

    For information about opening and working with the Profile Manager, see the Data Workbench User Guide.

    NOTE

    A Log Processing subdirectory may exist within the Dataset directory. This subdirectory contains the Log Processing Dataset Include files that have been created for one or more inherited profiles. See Dataset Include Files.

  2. Right-click the check mark next to Log Processing.cfg and click Make Local. A check mark for this file appears in the User column.

  3. Right-click the newly created check mark and click Open > in Workstation. The Log Processing.cfg window appears.

    You can also open the Log Processing.cfg file from a Transformation Dependency Map. For information about transformation dependency maps, see Dataset Configuration Tools.

  4. Edit the parameters in the configuration file using the following table as a guide.

    When editing the Log Processing.cfg file within a data workbench window, you can use shortcut keys for basic editing features, including cut (Ctrl+x), copy (Ctrl+c), paste (Ctrl+v), undo (Ctrl+z), redo (Ctrl+Shift+z), select section (click+drag), and select all (Ctrl+a). You can also use the shortcuts to copy and paste text from one configuration file (.cfg) to another.

    NOTE

    A Log Processing Dataset Include file for an inherited profile contains a subset of the parameters described in the following table as well as some additional parameters. See Dataset Include Files.

    Parameter Description
    Log Sources

    The sources of data. See Log Sources.

    End Time

    Optional. Filter data to include log entries with timestamps up to but not including this time. Adobe recommends using one of the following formats for the time:

    • January 1 2013 HH:MM:SS EDT
    • Jan 1 2013 HH:MM:SS GMT

    For example, specifying July 29 2013 00:00:00 EDT as the End Time includes data through July 28, 2013, at 11:59:59 PM EDT. See Data Filters.

    You must specify a time zone. The time zone does not default to GMT if not specified. For a list of time zone abbreviations supported by the data workbench server, see Time Zone Codes.

    Note: The Use Start/End Times parameter for Sensor, log file, and XML sources is related to this parameter. See the sections of Log Sources that discuss these source types.
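
    The End Time rule can be sketched in Python. This is only an illustration of the recommended time format and the exclusive upper bound, not a description of how the data workbench server parses times. The `ZONE_OFFSETS` table and the `parse_cfg_time` and `before_end_time` helpers are hypothetical names, and the table covers just three abbreviations (see Time Zone Codes for the supported list).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of time zone abbreviations, for illustration only.
ZONE_OFFSETS = {"GMT": 0, "EST": -5, "EDT": -4}


def parse_cfg_time(text):
    """Parse 'July 29 2013 00:00:00 EDT' or 'Jul 29 2013 00:00:00 GMT'."""
    parts = text.split()
    if len(parts) != 5 or parts[4] not in ZONE_OFFSETS:
        # Mirrors the rule that a time zone must be given explicitly.
        raise ValueError("expected 'Month D YYYY HH:MM:SS ZONE': " + text)
    stamp = " ".join(parts[:4])
    # Try the full month name first, then the abbreviated one.
    for fmt in ("%B %d %Y %H:%M:%S", "%b %d %Y %H:%M:%S"):
        try:
            naive = datetime.strptime(stamp, fmt)
        except ValueError:
            continue
        tz = timezone(timedelta(hours=ZONE_OFFSETS[parts[4]]))
        return naive.replace(tzinfo=tz)
    raise ValueError("unrecognized time: " + text)


def before_end_time(entry_time, end_time):
    """End Time is exclusive: keep entries strictly before it."""
    return entry_time < end_time
```

    With an End Time of July 29 2013 00:00:00 EDT, an entry stamped July 28 2013 23:59:59 EDT passes the check, while an entry stamped exactly at the End Time does not.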

    Fields

    Optional. Adobe recommends defining Fields in one or more Log Processing Dataset Include files. See Log Processing Dataset Include Files.

    Group Maximum Key Bytes

    Maximum amount of event data that the Server can process for a single tracking ID. Data exceeding this limit is filtered from the dataset construction process. This value must be set to 2e6 when key splitting is active and 1e6 when key splitting is not active. See Key Splitting.

    Note: Do not change this value without consulting Adobe.

    Hash Threshold

    Optional. A sampling factor for random sub-sampling of rows. If set to a number n, then only one out of each n tracking IDs enters the dataset, reducing the total number of rows in the dataset by a factor of n.

    To create a dataset that requires 100 percent accuracy (that is, to include all rows), you would set Hash Threshold to 1.

    Values:

    • Hash Threshold = 1 (100 percent of the data; includes all rows.)
    • Hash Threshold = 2 (1/2 of the data; includes one out of two rows.)
    • Hash Threshold = 3 (1/3 of the data; includes one out of three rows. Query Completion rounds this to 34%.)
    • Hash Threshold = 4 (1/4 of the data; includes one out of four rows.)

    Note: Hash Threshold = 8 provides 1/8 of the data, which is 12.5%. However, the Query Completion value rounds to 13% for this setting. Similarly, Hash Threshold = 6 results in a Query Completion value of 17%, and Hash Threshold = 13 results in 8%.

    If Hash Threshold is specified in both the Log Processing.cfg and Transformation.cfg files, it is not applied in sequence; the maximum value set in either configuration file applies. See Data Filters.
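
    The arithmetic behind Hash Threshold can be sketched as follows. This is a hedged illustration, not the server's implementation: the function names are hypothetical, SHA-256 stands in for whatever hash the server actually uses, and rounding up to a whole percent is inferred from the quoted values (3 gives 34%, 8 gives 13%) rather than documented.

```python
import hashlib


def keeps_tracking_id(tracking_id, hash_threshold):
    """Keep roughly one out of every `hash_threshold` tracking IDs.

    Illustrative only: SHA-256 is a stand-in for the server's hash.
    """
    digest = hashlib.sha256(tracking_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % hash_threshold == 0


def query_completion_percent(hash_threshold):
    """Percent of data kept, rounded up to a whole number (assumed rule)."""
    return -(-100 // hash_threshold)  # integer ceiling of 100 / n


def effective_hash_threshold(log_processing_value, transformation_value):
    """The larger value applies; the two are not applied in sequence."""
    return max(log_processing_value, transformation_value)
```

    For example, if Log Processing.cfg sets 2 and Transformation.cfg sets 4, `effective_hash_threshold(2, 4)` returns 4, so one in four tracking IDs is kept rather than one in eight.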

    Log Entry Condition

    Optional. Defines the rules by which log entries are considered for inclusion in the dataset. See Log Entry Condition.

    Reprocess

    Optional. Any character or combination of characters can be entered here. Changing this parameter and saving the file to the data workbench server machine initiates data reprocessing.

    See Reprocessing and Retransformation.

    Split Key Bucket Space

    Parameter involved in key splitting. Its value should be 6e6 when key splitting is active. See Key Splitting.

    Note: Do not change this value without consulting Adobe.

    Split Key Bytes

    Parameter involved in key splitting. Its value should be 1e6 when key splitting is active and 0 when key splitting is not active. See Key Splitting.

    Note: Do not change this value without consulting Adobe.

    Split Key Space Ratio

    Parameter involved in key splitting. Its value should be 10 when key splitting is active. See Key Splitting.

    Note: Do not change this value without consulting Adobe.
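
    The key splitting values quoted for Group Maximum Key Bytes, Split Key Bucket Space, Split Key Bytes, and Split Key Space Ratio can be collected into a small read-only consistency check. This is a sketch under stated assumptions: the `RECOMMENDED` table and `mismatched_parameters` helper are hypothetical, the settings are passed in as a plain dictionary rather than read from a .cfg file, and values the table does not state for the inactive case are left out.

```python
# Values quoted in the parameter table, keyed by whether key splitting is active.
# Entries the table does not state for the inactive case are omitted.
RECOMMENDED = {
    True: {
        "Group Maximum Key Bytes": 2e6,
        "Split Key Bucket Space": 6e6,
        "Split Key Bytes": 1e6,
        "Split Key Space Ratio": 10,
    },
    False: {
        "Group Maximum Key Bytes": 1e6,
        "Split Key Bytes": 0,
    },
}


def mismatched_parameters(settings, key_splitting_active):
    """Return the names whose value differs from the documented one."""
    expected = RECOMMENDED[key_splitting_active]
    return sorted(
        name
        for name, value in expected.items()
        if name in settings and settings[name] != value
    )
```

    The check only reports differences; it does not change any value, in keeping with the guidance to consult Adobe before editing these parameters.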

    Stages

    Optional. The names of the processing stages that can be used in Log Processing Dataset Include files. Processing stages provide a way to order the transformations that are defined in Log Processing Dataset Include files. This parameter is very helpful if you have defined one or more transformations within multiple Log Processing Dataset Include files and you want specific transformations to be performed at specific points during log processing.

    The order in which you list the stages here determines the order in which the transformations in the Log Processing Dataset Include files are executed during log processing. Preprocessing and Postprocessing are built-in stages. Preprocessing is always the first stage, and Postprocessing is always the last stage. By default, there is one named stage called Default.

    To add a new processing stage

    • In the Log Processing.cfg window, right-click Stages and click Add New > Stage.
    • Enter a name for the new stage.

    To delete an existing processing stage

    • Right-click the number corresponding to the stage that you want to delete and click Remove <#stage_number>.

    Note: When you specify a Stage in a Log Processing Dataset Include file, the name of the stage must match exactly the name that you enter here. See Dataset Include Files.
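
    The ordering rule for stages can be sketched as follows. The helper names are hypothetical, transformations are modeled as simple (name, stage) pairs, and two behaviors are assumptions made for the sketch rather than documented facts: transformations within the same stage keep their original relative order, and a stage name that does not match exactly raises an error.

```python
def stage_order(named_stages):
    """Full stage order: the built-in stages bracket the named ones."""
    return ["Preprocessing", *named_stages, "Postprocessing"]


def order_transformations(transformations, named_stages):
    """Sort (name, stage) pairs by the position of their stage.

    Stage names must match exactly; an unknown stage raises ValueError
    (an assumption made for this sketch).
    """
    order = stage_order(named_stages)
    position = {stage: index for index, stage in enumerate(order)}
    for name, stage in transformations:
        if stage not in position:
            raise ValueError(f"{name}: unknown stage {stage!r}")
    # sorted() is stable, so ties keep their original relative order.
    return sorted(transformations, key=lambda item: position[item[1]])
```

    With named stages `["Default", "Cleanup"]` (where Cleanup is a made-up stage name), a transformation assigned to Preprocessing runs first, followed by those in Default, then Cleanup, then Postprocessing.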

    Start Time

    Optional. Filter data to include log entries with timestamps at or after this time. Adobe recommends using one of the following formats for the time:

    • January 1 2013 HH:MM:SS EDT
    • Jan 1 2013 HH:MM:SS GMT

    For example, specifying "July 29 2013 00:00:00 EDT" as the Start Time includes data starting from July 29, 2013, at 12:00:00 AM EDT. See Data Filters.

    You must specify a time zone. The time zone does not default to GMT if not specified. For a list of time zone abbreviations supported by the data workbench server, see Time Zone Codes.

    Note: The Use Start/End Times parameter for Sensor, log file, and XML sources is related to this parameter. See the sections of Log Sources that discuss these source types.
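
    Taken together, Start Time and End Time describe a window that includes its start and excludes its end. A minimal sketch of that filter, assuming timestamps are already time zone aware; `in_time_window` is a hypothetical helper, EDT is modeled as a fixed UTC-4 offset, and the July 30 end value is chosen only for the example:

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4))  # fixed offset, enough for this example


def in_time_window(entry_time, start_time=None, end_time=None):
    """Start Time is inclusive, End Time is exclusive; both are optional."""
    if start_time is not None and entry_time < start_time:
        return False
    if end_time is not None and entry_time >= end_time:
        return False
    return True


start = datetime(2013, 7, 29, 0, 0, 0, tzinfo=EDT)
end = datetime(2013, 7, 30, 0, 0, 0, tzinfo=EDT)  # example value only
```

    An entry stamped exactly at `start` is kept, an entry one second earlier is dropped, and an entry stamped exactly at `end` is dropped.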

    Time Zone

    Optional. Time zone of the data workbench server that is used for time conversions (such as the conversion represented by the x-local-timestring field) during log processing.

    Note: You must specify the Time Zone if you want to access the converted time field during the log processing phase of dataset construction. Otherwise, the data workbench server records an error in the event logs.

    See Time Zones.

    Transformations

    Optional. Adobe recommends defining transformations for log processing in one or more Log Processing Dataset Include files. See Log Processing Dataset Include Files.

  5. Right-click (modified) at the top of the window and click Save.

  6. In the Profile Manager, right-click the check mark for Log Processing.cfg in the User column, then click Save to > <dataset profile name> to make the local changes take effect. Reprocessing of the data begins after synchronization of the dataset profile.

    NOTE

    Do not save the modified configuration file to any of the internal profiles provided by Adobe, as your changes are overwritten when you install updates to these profiles.

    For more information about reprocessing your data, see Reprocessing and Retransformation.
