Log Sources

Last update: 2022-10-04
Created for: User, Admin
IMPORTANT

Read more about Data Workbench’s End-of-life announcement.

Log sources are files that contain the data to be used to build a dataset.

The data available in the log sources is called event data because each data record represents a transaction record or a single instance of an event. The data workbench server can process log sources that are derived from data collected by Sensors or extracted from other data sources.

  • **Data Collected by Sensors:** Data collected by Sensors from HTTP and application servers is transmitted to data workbench servers, which convert the data into highly compressed log (.vsl) files. See Sensor Files.

  • **Data Extracted by Insight Server:** The data workbench server reads event data contained in flat files, XML files, or ODBC-compliant databases, and uses its decoders to extract the desired elements of the data. Such event data does not have to be memory-resident, but the records that contain the data must include a tracking ID. See Log Files, XML Log Sources, and ODBC Data Sources.

To add a log source

  1. Open Log Processing.cfg in data workbench.

  2. Right-click Log Sources, then click Add New.

  3. Select one of the following:

    • Sensor
    • Log File
    • XML Log Source
    • ODBC Data Source
  4. The specific parameters used to define a dataset vary based on the type of log source to be used in the dataset’s configuration process. Specify the parameters as indicated in the section corresponding to the appropriate log source.

  5. After you have defined your log source (and made changes to other parameters) in the Log Processing.cfg file, save the file locally and save it to your dataset profile on the data workbench server.

    NOTE

    A data workbench server File Server Unit can receive and store Sensor files, log files, and XML files and serve them to the data workbench server’s Data Processing Units that construct the dataset. See Configuring an Insight Server File Server Unit.

    You can open the configuration of any log source from a Transformation Dependency Map. For information about Transformation Dependency Map, see Dataset Configuration Tools.

Sensor Files

Requirements

Event data collected by Sensors from HTTP and application servers is transmitted to data workbench servers, which convert the data into highly compressed log (.vsl) files. The .vsl file format is managed by the data workbench server, and each file has a name of the format:

YYYYMMDD-SENSORID.VSL

where YYYYMMDD is the date of the file, and SENSORID is the name (assigned by your organization) that indicates which Sensor collected and transmitted the data to the data workbench server.
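For example, applying this format to a Sensor that your organization has named VSensor01 (the name used in the Log Source ID example below), data collected on 4 October 2022 would be stored in a file named:

```
20221004-VSensor01.vsl
```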

Parameters

For Sensor files, the following parameters are available:

**Log Paths**

The directories where the .vsl files are stored. The default location is the Logs directory. A relative path refers to the installation directory of the data workbench server.

You can use wildcard characters to specify which .vsl files to process:

  • * matches any number of characters
  • ? matches a single character

For example, the log path Logs\*.vsl matches any file in the Logs directory ending in .vsl. The log path Logs\*-SENSOR?.vsl matches files in the Logs directory with any date (YYYYMMDD) and a single character after SENSOR, as in SENSOR1.

If you want to search all subdirectories of the specified path, you must set the Recursive parameter to true.

Note: If the files are to be read from a data workbench server's File Server Unit, then you must enter the appropriate URI(s) in the Log Paths parameter. For example, the URI /Logs/*-*.vsl matches any .vsl file in the Logs directory. See Configuring an Insight Server File Server Unit.

**Log Server**

Information (Address, Name, Port, and so on) necessary to connect to a file server. If there is an entry in the Log Server parameter, the Log Paths are interpreted as URIs. Otherwise, they are interpreted as local paths. See Configuring an Insight Server File Server Unit.

**Log Source ID**

This parameter's value can be any string. If a value is specified, this parameter enables you to differentiate log entries from different log sources for source identification or targeted processing. The x-log-source-id field is populated with a value identifying the log source for each log entry. For example, if you want to identify log entries from a Sensor named VSensor01, you could type "from VSensor01", and that string would be passed to the x-log-source-id field for every log entry from that source.

For information about the x-log-source-id field, see Event Data Record Fields.

**Recursive**

True or false. If set to true, all subdirectories of each path specified in Log Paths are searched for files matching the specified file name or wildcard pattern. The default value is false.

**Use Start/End Times**

True or false. If set to true and Start Time or End Time is specified, then all files for this log source must have file names starting with dates in ISO format (YYYYMMDD). It is assumed that each file contains data for one GMT day (for example, the time range starting at 0000 GMT on one day and ending at 0000 GMT the following day). If the log source files contain data that does not correspond to a GMT day, then this parameter must be set to false to avoid incorrect results.

Note: By default, .vsl files containing data collected by Sensor automatically meet the naming and time range requirements described above. If you set this parameter to true, the data workbench server always processes data from files whose names include ISO dates that fall between the specified Start Time and End Time. If you set this parameter to false, the data workbench server reads all of the .vsl files during log processing to determine which files contain data within the Start Time and End Time range.

For information about the Start Time and End Time parameters, see Data Filters.

NOTE

Do not use the configuration parameters for Sensor data sources to determine which log entries within a log file should be included in a dataset. Instead, set up the data source to point to all of the log files within a directory. Then use the Start Time and End Time parameters of Log Processing.cfg to determine which log entries should be used in constructing the dataset. See Data Filters.
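Putting these parameters together, the following is a minimal sketch of how the two Log Paths wildcard examples above might appear inside a Sensor log source entry in Log Processing.cfg. The typed-value syntax shown (vector, string, bool) follows general Data Workbench configuration conventions and is an assumption; verify the exact structure against your own Log Processing.cfg.

```
Log Paths = vector: 2 items
  0 = string: Logs\*.vsl
  1 = string: Logs\*-SENSOR?.vsl
Log Source ID = string: from VSensor01
Recursive = bool: false
Use Start/End Times = bool: false
```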

Log Files

Requirements

The file containing the event data must meet the following requirements:

  • Each event data record in the file must be represented by one line.

  • The fields within a record must be separated, whether empty or not, by an ASCII delimiter. The data workbench server does not require you to use a specific delimiter. You may use any character that is not a line-ending character and does not appear anywhere within the event data itself.

  • Each record in the file must contain:

    • A tracking ID
    • A time stamp
  • To specify start and end times for data processing, each file name must be of the form:

    • YYYYMMDD-SOURCE.log

    where YYYYMMDD is the Greenwich Mean Time (GMT) day of all of the data in the file, and SOURCE is a variable identifying the source of the data contained in the file.

    NOTE

    Please contact Adobe Consulting Services for a review of the log files that you plan to incorporate into the dataset.

Parameters

For log file log sources, the parameters in the following table are available.

NOTE

The processing of log file log sources requires additional parameters that are defined in a Log Processing Dataset Include file, which contains a subset of the parameters included in a Log Processing.cfg file as well as special parameters for defining decoders for extracting data from the log file. For information about defining decoders for log file log sources, see Text File Decoder Groups.

**Name**

The identifier for the log file source.

**Log Paths**

The directories where the log files are stored. The default location is the Logs directory. A relative path refers to the installation directory of the data workbench server.

You can use wildcard characters to specify which log files to process:

  • * matches any number of characters.
  • ? matches a single character.

For example, the log path Logs\*.log matches any file in the Logs directory ending in .log.

If you want to search all subdirectories of the specified path, then you must set the Recursive parameter to true.

If the files are to be read from a data workbench server's File Server Unit, then you must enter the appropriate URI(s) in the Log Paths parameter. For example, the URI /Logs/*.log matches any .log file in the Logs directory. See Configuring an Insight Server File Server Unit.

**Log Server**

Information (Address, Name, Port, and so on) necessary to connect to a file server. If there is an entry in the Log Server parameter, the Log Paths are interpreted as URIs. Otherwise, they are interpreted as local paths. See Configuring an Insight Server File Server Unit.

**Compressed**

True or false. This value should be set to true if the log files to be read by the data workbench server are compressed gzip files.

**Decoder Group**

The name of the text file decoder group to be applied to the log file log source. This name must match exactly the name of the corresponding text file decoder group specified in the Log Processing Dataset Include file. See Text File Decoder Groups.

**Log Source ID**

This parameter's value can be any string. If a value is specified, this parameter enables you to differentiate log entries from different log sources for source identification or targeted processing. The x-log-source-id field is populated with a value identifying the log source for each log entry. For example, if you want to identify log entries from a log file source named LogFile01, you could type "from LogFile01", and that string would be passed to the x-log-source-id field for every log entry from that source.

For information about the x-log-source-id field, see Event Data Record Fields.

**Mask Pattern**

A regular expression with a single capturing subpattern that extracts a consistent name used to identify the source of a series of log files. Only the file name is considered. The path and extension are not considered for the regular expression matching. If you do not specify a mask pattern, then a mask is generated automatically.

For the files Logs\010105server1.log and Logs\010105server2.log, the mask pattern would be [0-9]{6}(.*). This pattern extracts the string "server1" or "server2" from the file names above.

See Regular Expressions.

**Recursive**

True or false. If this parameter is set to true, all subdirectories of each path specified in Log Paths are searched for files matching the specified file name or wildcard pattern. The default value is false.

**Reject File**

The path and file name of the file containing the log entries that do not meet the conditions of the decoder.

**Use Start/End Times**

True or false. If this parameter is set to true and Start Time or End Time is specified, then all files for this log source must have file names starting with dates in ISO format (YYYYMMDD). It is assumed that each file contains data for one GMT day (for example, the time range starting at 0000 GMT on one day and ending at 0000 GMT the following day). If the log source file names do not begin with ISO dates, or if the files contain data that does not correspond to a GMT day, then this parameter must be set to false to avoid incorrect results.

Note: If the naming and time range requirements described above are satisfied for the log files and you set this parameter to true, the specified text file decoder group limits the files read to those whose names have ISO dates that fall between the specified Start Time and End Time. If you set this parameter to false, the data workbench server reads all of the log files during log processing to determine which files contain data within the Start Time and End Time range.

For information about the Start Time and End Time parameters, see Data Filters.

In this example, the dataset is constructed from two types of log sources.

Log Source 0 specifies log files generated from event data captured by Sensor. This data source points to a directory called Logs and to all of the files in that directory with a .vsl file name extension.

Log Source 1 points to all of the files in the Logs directory with a .txt file name extension. The decoder group for this log source is called “Text Logs.”
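The example configuration itself is not reproduced here; the following is a minimal sketch of how two such log sources might be declared in Log Processing.cfg. The entry type names (VSLSource, LogFile) and the exact nesting are assumptions based on Data Workbench's typed-value configuration syntax, so confirm them against an actual Log Processing.cfg before copying.

```
Log Sources = vector: 2 items
  0 = VSLSource:
    Log Paths = vector: 1 items
      0 = string: Logs\*.vsl
    Log Source ID = string:
    Recursive = bool: false
    Use Start/End Times = bool: false
  1 = LogFile:
    Compressed = bool: false
    Decoder Group = string: Text Logs
    Log Paths = vector: 1 items
      0 = string: Logs\*.txt
    Log Source ID = string:
    Mask Pattern = string:
    Recursive = bool: false
    Reject File = string:
    Use Start/End Times = bool: false
```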

You should not delete or move log files after the data sources for a dataset have been defined. Only newly created log files should be added to the directory for the data sources.

XML Log Sources

Requirements

The file containing the event data must meet the following requirements:

  • Event data must be included in a properly formatted XML file with appropriate parent-child relationships.

  • A unique decoder group must exist for each XML file format. For information about constructing a decoder group, see XML Decoder Groups.

  • Each visitor record in the file must contain:

    • A tracking ID
    • A time stamp
  • To specify start and end times for data processing, each file name must be of the form

YYYYMMDD-SOURCE.log

where YYYYMMDD is the Greenwich Mean Time (GMT) day of all of the data in the file, and SOURCE is a variable identifying the source of the data contained in the file.

For an example of an XML file that meets these requirements, see XML Decoder Groups.

NOTE

Please contact Adobe Consulting Services for a review of the XML log files that you plan to incorporate into the dataset.

Parameters

For XML log sources, the parameters in the following table are available.

NOTE

The processing of XML log sources requires additional parameters that are defined in a Log Processing Dataset Include file, which contains a subset of the parameters included in a Log Processing.cfg file as well as special parameters for defining decoders for extracting data from the XML file. For information about defining decoders for XML log sources, see XML Decoder Groups.

**Name**

The identifier for the XML log source.

**Log Paths**

The directories where the XML log sources are stored. The default location is the Logs directory. A relative path refers to the installation directory of the data workbench server.

You can use wildcard characters to specify which XML log sources to process:

  • * matches any number of characters
  • ? matches a single character

For example, the log path Logs\*.xml matches any file in the Logs directory ending in .xml.

If you want to search all subdirectories of the specified path, you must set the Recursive field to true.

Note: If the files are to be read from a data workbench server's File Server Unit, you must enter the appropriate URI(s) in the Log Paths field. For example, the URI /Logs/*.xml matches any .xml file in the Logs directory. See Configuring an Insight Server File Server Unit.

**Log Server**

Information (Address, Name, Port, and so on) necessary to connect to a file server. If there is an entry in the Log Server field, the Log Paths are interpreted as URIs. Otherwise, they are interpreted as local paths. See Configuring an Insight Server File Server Unit.

**Compressed**

True or false. This value should be set to true if the XML log sources to be read by the data workbench server are compressed gzip files.

**Decoder Group**

The name of the XML decoder group to be applied to the XML log source. This name must match exactly the name of the corresponding XML decoder group specified in the Log Processing Dataset Include file. See XML Decoder Groups.

**Log Source ID**

This field's value can be any string. If a value is specified, this field enables you to differentiate log entries from different log sources for source identification or targeted processing. The x-log-source-id field is populated with a value identifying the log source for each log entry. For example, if you want to identify log entries from an XML log source named XMLFile01, you could type "from XMLFile01", and that string would be passed to the x-log-source-id field for every log entry from that source.

For information about the x-log-source-id field, see Event Data Record Fields.

**Mask Pattern**

A regular expression with a single capturing subpattern that extracts a consistent name used to identify the source of a series of log files. Only the file name is considered. The path and extension are not considered for the regular expression matching. If you do not specify a mask pattern, then a mask is generated automatically.

For the files Logs\010105server1.xml and Logs\010105server2.xml, the mask pattern would be [0-9]{6}(.*). This pattern extracts the string "server1" or "server2" from the file names above.

See Regular Expressions.

**Recursive**

True or false. If this parameter is set to true, all subdirectories of each path specified in Log Paths are searched for files matching the specified file name or wildcard pattern. The default value is false.

**Reject File**

The path and file name of the file containing the log entries that do not meet the conditions of the decoder.

**Use Start/End Times**

True or false. If this parameter is set to true and Start Time or End Time is specified, then all files for this log source must have file names starting with dates in ISO format (YYYYMMDD). It is assumed that each file contains data for one GMT day (for example, the time range starting at 0000 GMT on one day and ending at 0000 GMT the following day). If the log source file names do not begin with ISO dates, or if the files contain data that does not correspond to a GMT day, then this parameter must be set to false to avoid incorrect results.

Note: If the naming and time range requirements described above are satisfied for the XML files and you set this parameter to true, the specified XML decoder group limits the files read to those whose names have ISO dates that fall between the specified Start Time and End Time. If you set this parameter to false, the data workbench server reads all of the XML files during log processing to determine which files contain data within the Start Time and End Time range.

For information about the Start Time and End Time parameters, see Data Filters.
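As with the other source types, an XML log source appears as one entry in the Log Sources vector of Log Processing.cfg. The sketch below is an illustration based only on the parameters listed above; the entry type name (XMLLog) and the decoder group name (Visitor Data) are assumptions, so check them against your Log Processing Dataset Include file and Log Processing.cfg.

```
2 = XMLLog:
  Compressed = bool: false
  Decoder Group = string: Visitor Data
  Log Paths = vector: 1 items
    0 = string: Logs\*.xml
  Log Source ID = string: from XMLFile01
  Recursive = bool: false
  Use Start/End Times = bool: false
```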

NOTE

You should not delete or move XML log sources after the data sources for a dataset have been defined. Only newly created XML files should be added to the directory for the data sources.

Avro Data Feed

The Avro data feed provides a more efficient way to integrate data into Data Workbench:

  • Avro provides a single-source format for traffic and commerce data.

  • The Avro feed delivers compressed data in multiple source chunks per day. It provisions only populated fields and provides monitoring and notification features, access to historical data, and auto-recovery.

  • The schema, a self-describing layout of the Avro log file, is included at the beginning of each file.

  • New fields are added with supporting information so that data can be ingested into Data Workbench without any changes to the decoder. These include:

    • Evars: 1-250 (previously 1-75)
    • Custom Events: 1-1000 (versus 1-100)
    • Access to solution variables for mobile, social, and video data
NOTE

In addition, the Avro feed provides immediate access to any new fields in the feed without requiring a shutdown, so the fields can be updated with no service-hour requirements.

The Avro data feed is set up in separate files:

  • An Avro Log file: This is the Avro log format generated from the decoder to format traffic and commerce data.
  • An Avro Decoder file: This file lets you map values into the new Avro format. You can set up the decoder using the Avro Decoder Wizard.

Avro Decoder Wizard

This wizard sets up the Avro decoder log file.

To open, right-click in a workspace and select Admin > Wizards > Avro Decoder Wizard.

Step 1: Select an Avro Log File.

In this step, you select a source file for the Avro schema. The schema can be pulled from either a log file (.log) or an existing decoder file (.avro).

**Avro Log File:** Click to open a log (.log) file, view the schema at the top of the log file, and generate a decoder file.
**Avro Decoder File:** Click to open and edit the schema of an existing decoder (.avro) file.

Step 2: Select Input Fields.

Select the input fields to be used in the dataset to pass through log processing. All fields in the file are displayed, allowing you to select fields for the feed.

NOTE

An x-product (Generates row) field is provided if an array is encountered in the data. This field generates new rows for the nested data in an array as input fields. For example, if you have a Hit row with many Product values in an array, a row is generated in the input file for each product.

**Select Defaults:** Select fields to identify as a standard set of default fields.
**Select All:** Select all fields in the file.
**Deselect All:** Clear all fields in the file.

Step 3: Select fields that get copied to generate rows.

Because new rows can be created from nested values in an array, every new row created must have a tracking ID and a timestamp. This step allows you to select the fields to be copied to rows from the parent record, such as a tracking ID and timestamp. You can also select other values you want added to each row.

**Select Defaults:** Select a standard set of default fields that require new column values added to each row, such as a tracking ID and timestamp. For example, a hit_source field is a default value required to be added to each new row (it is defined as a default value in the list). You can add other column values to each row as needed.
**Select All:** Select all fields in the file.
**Deselect All:** Clear all fields in the file.

Use the Search box to find values in the list.

Step 4: Specify the decoder name.

Assign a name to the group of fields and save it as a decoder file. The name should match the Decoder Group name specified in your log source.
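For example (a hypothetical pairing using an illustrative name), if you enter Avro Hit Data as the decoder name in the wizard, the log source that consumes the feed would need a matching Decoder Group value in its configuration:

```
Decoder Group = string: Avro Hit Data
```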

Step 5: Save the decoder file.

The File menu opens so that you can name the decoder file and save it as a .cfg file in the Logs folder.
