This section describes the files found in a data feed delivery.
The manifest file contains the following details about each file that is part of the uploaded data set: the file name, the MD5 digest, the file size, and, for data files, the record count.
The manifest file follows the same format as a Java JAR manifest file.
The manifest file is always delivered last as a separate .txt file, so that its existence indicates that the complete data set for that request period has already been delivered. Manifest files are named according to the following:
[rsid]_[YYYY-mm-dd].txt
A typical manifest file contains data similar to the following:
Datafeed-Manifest-Version: 1.0
Lookup-Files: 1
Data-Files: 1
Total-Records: 611

Lookup-File: rsid_date-lookup_data.tar.gz
MD5-Digest: af6de42d8b945d4ec1cf28360085308
File-Size: 63750

Data-File: 01-rsid_date.tsv.gz
MD5-Digest: 9c70bf783cb3d0095a4836904b72c991
File-Size: 122534
Record-Count: 611
Every manifest file contains a header, indicating the total number of lookup files, data files, and total number of records in all data files. This header is followed by multiple sections containing information for each file included in the data feed delivery.
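The header-plus-sections layout described above lends itself to a small parser. The following is a minimal sketch, not an official tool: the function names `parse_manifest` and `md5_matches` are hypothetical, and the code assumes that each `Lookup-File` or `Data-File` line opens a new per-file section, as in the sample manifest.

```python
import hashlib
from pathlib import Path


def parse_manifest(text):
    """Split manifest text into a header dict and a list of per-file dicts."""
    header, files = {}, []
    current = header
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank separator lines
        key, value = (part.strip() for part in line.split(":", 1))
        # Assumption: a Lookup-File or Data-File line starts a new section.
        if key in ("Lookup-File", "Data-File"):
            current = {"Type": key, "Name": value}
            files.append(current)
        else:
            current[key] = value
    return header, files


def md5_matches(path, expected_hex):
    """Compare a file's MD5 digest with the value listed in the manifest."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in 1 MiB chunks so large data files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Values copied from the sample manifest shown earlier in this section.
sample = """Datafeed-Manifest-Version: 1.0
Lookup-Files: 1
Data-Files: 1
Total-Records: 611
Lookup-File: rsid_date-lookup_data.tar.gz
MD5-Digest: af6de42d8b945d4ec1cf28360085308
File-Size: 63750
Data-File: 01-rsid_date.tsv.gz
MD5-Digest: 9c70bf783cb3d0095a4836904b72c991
File-Size: 122534
Record-Count: 611
"""

header, files = parse_manifest(sample)
for entry in files:
    print(entry["Type"], entry["Name"], entry["File-Size"])
```

After parsing, each entry's `MD5-Digest` can be passed to `md5_matches` together with the path of the downloaded file to confirm that the transfer was not corrupted.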
Some feeds are configured to receive a .fin file instead of a .txt manifest. The .fin file indicates that the upload is complete, but it contains no metadata about the upload.
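Because either marker signals a finished upload, a consumer can wait for one of them before processing a delivery. The sketch below is illustrative only: `delivery_marker` is a hypothetical helper, and it assumes that the .fin file uses the same base name as the .txt manifest, which this section does not state.

```python
from pathlib import Path


def delivery_marker(directory, rsid, day):
    """Return the path of the completion marker, or None if none has arrived."""
    for suffix in (".txt", ".fin"):
        # Assumption: the .fin file is named like the manifest, [rsid]_[YYYY-mm-dd].
        candidate = Path(directory) / f"{rsid}_{day}{suffix}"
        if candidate.exists():
            return candidate
    return None
```

A `None` result means the delivery for that period should still be treated as incomplete.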
Some data feed columns output a number rather than the actual value it represents. Lookup files map each number in such a column to its actual value. For example, a value of “497” in the browser hit data column indicates that the hit came from “Microsoft Internet Explorer 8” if you look in browser.tsv.
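The lookup step can be sketched in a few lines. This example assumes that a lookup file such as browser.tsv holds two tab-separated columns, the numeric ID followed by its label; `load_lookup` is a hypothetical helper, and an in-memory string stands in for the real file.

```python
import csv
import io


def load_lookup(handle):
    """Read a two-column, tab-separated lookup file into a dict of ID -> label."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Stand-in for browser.tsv, using the example value from the text above.
browser_tsv = io.StringIO("497\tMicrosoft Internet Explorer 8\n")
browsers = load_lookup(browser_tsv)

# Resolve the number found in the browser column of a hit.
print(browsers.get("497", "unknown"))
```

Keeping the IDs as strings avoids a conversion step, since values read from the hit data file are strings as well.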
Note that column_headers.tsv and event_list.tsv are specific to the data feed and report suite. Other files, such as browser.tsv, are generic.
The lookup files are delivered together in a single compressed archive named according to the following:
[rsid]_[YYYY-mm-dd]-lookup_data.[compression_suffix]
Hit data is provided in a hit_data.tsv file. The amount of data in this file is determined by the delivery format (hourly or daily, and single or multiple files). This file contains only hit data. The column headers are delivered separately with the lookup files. Each row in this file contains a single server call.
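Because the hit data file carries no header row, the column names must be attached while reading it. The following is a rough sketch under stated assumptions: `iter_hits` is a hypothetical helper, column_headers.tsv is assumed to be a single tab-separated line of column names, and escaped tabs or newlines inside values are not handled.

```python
def iter_hits(hit_path, headers_path, encoding="iso-8859-1"):
    """Yield each row of the hit data file as a dict keyed by column name."""
    # Assumption: column_headers.tsv is one tab-separated line of names.
    with open(headers_path, encoding=encoding, newline="") as handle:
        columns = handle.readline().rstrip("\r\n").split("\t")
    with open(hit_path, encoding=encoding, newline="") as handle:
        for line in handle:
            # Naive split: escaped tabs inside values are not handled here.
            values = line.rstrip("\r\n").split("\t")
            yield dict(zip(columns, values))
```

The generator reads one row at a time, which matters for hit data files that can reach gigabytes in size. The default encoding matches the ISO-8859-1 encoding that this section gives for all delivered files.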
Files delivered by Adobe vary based on the type of data feed that you have configured. All files are encoded using ISO-8859-1.
[rsid] refers to the report suite ID the data feed is from.
[index] is used only in multiple file feeds, and refers to the correct order of paginated files.
[YYYY-mm-dd] refers to the starting day the data feed is for.
[HHMMSS] is used only in hourly feeds, and refers to the starting hour the data feed is for.
[compression_suffix] refers to the type of compression used. Typically data feeds are compressed into tar.gz or zip files.
In a daily, single-file feed, after data is collected for a day, you receive a single compressed data file and a manifest file. The data file is named:
[rsid]_[YYYY-mm-dd].[compression_suffix]
When extracted, the data file contains a single hit_data.tsv file with all data for that day, as well as lookup files for any required columns.
In a daily, multiple-file feed, after data is collected for a day, you receive one or more compressed data files and a manifest file. Each data file is named:
[index]-[rsid]_[YYYY-mm-dd].[compression_suffix]
When extracted, each data file contains a single hit_data.tsv file that contains approximately 2 GB of uncompressed data, as well as lookup files for any required columns.
In an hourly, single-file feed, after data is collected for an hour, you receive a single compressed data file and a manifest file. The data file is named:
[rsid]_[YYYYmmdd]-[HHMMSS].[compression_suffix]
When extracted, the data file contains a single hit_data.tsv file with all data for that hour, as well as lookup files for any required columns.
In an hourly, multiple-file feed, after data is collected for an hour, you receive one or more compressed data files and a manifest file. Each data file is named:
[index]-[rsid]_[YYYYmmdd]-[HHMMSS].[compression_suffix]
When extracted, each data file contains a single hit_data.tsv file that contains approximately 2 GB of uncompressed data, as well as lookup files for any required columns.
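The four naming patterns above can be recognized with one regular expression. This is a sketch rather than a definitive parser: `FEED_FILE` and `parse_feed_name` are hypothetical names, the report suite ID in the example is invented, and the pattern would misread a single-file feed whose report suite ID itself begins with digits followed by a hyphen.

```python
import re

FEED_FILE = re.compile(
    r"^(?:(?P<index>\d+)-)?"              # [index], multiple file feeds only
    r"(?P<rsid>.+?)_"                     # [rsid]
    r"(?P<date>\d{4}-\d{2}-\d{2}|\d{8})"  # [YYYY-mm-dd] or [YYYYmmdd]
    r"(?:-(?P<time>\d{6}))?"              # [HHMMSS], hourly feeds only
    r"\.(?P<suffix>[A-Za-z0-9.]+)$"       # [compression_suffix]
)


def parse_feed_name(name):
    """Return the name's components as a dict, or None if it is not a data file."""
    match = FEED_FILE.match(name)
    return match.groupdict() if match else None


# Hypothetical hourly, multiple-file delivery.
print(parse_feed_name("01-examplersid_20240115-130000.tar.gz"))
```

Components that a feed type does not use, such as `index` in a single-file feed, come back as `None`. Lookup archives return `None` as a whole because of their `-lookup_data` suffix.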
The hit data file size varies greatly depending on the number of variables actively used and the amount of traffic sent to the report suite. However, on average, a row of data is approximately 500 B (compressed) or 2 KB (uncompressed). Multiplying this by the number of server calls provides a rough estimate of how large a data feed file is. Once your organization starts receiving data feed files, you can find a more accurate number by dividing the total file size of hit_data.tsv by its number of rows.
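The estimate is simple arithmetic: expected size is the number of server calls times the average row size, and the observed average row size is the file size divided by the row count. The sketch below uses hypothetical function names and traffic figures, and assumes 1 KB = 1,000 bytes.

```python
def estimate_feed_bytes(server_calls, bytes_per_row=2_000):
    """Rough feed size: rows times an average row size (2 KB uncompressed)."""
    return server_calls * bytes_per_row


def measured_bytes_per_row(total_bytes, row_count):
    """Average row size observed in a delivered hit_data.tsv."""
    return total_bytes / row_count


calls = 1_000_000  # hypothetical number of server calls in one day
print(estimate_feed_bytes(calls))       # uncompressed estimate, in bytes
print(estimate_feed_bytes(calls, 500))  # compressed estimate, in bytes
```

Once real deliveries are available, the result of `measured_bytes_per_row` can replace the default `bytes_per_row` for a closer estimate.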