Classification set file formats

Classification sets support multiple file formats for bulk uploading classification data. Each format has specific requirements for successful data uploads.

Once your file is properly formatted according to these specifications, you can upload it through the Classification sets interface or API.

Classification sets support the following file formats:

  • JSON: JavaScript Object Notation files with structured data
  • CSV: Comma-separated values files
  • TSV/TAB: Tab-separated values files

General file requirements

All file formats must adhere to the following requirements; a pre-upload validation sketch follows the list:

  • File encoding: Use UTF-8 without byte-order marks. Latin1 encoding is also supported.
  • Character limits: Individual keys and classification values are each limited to a maximum of 255 bytes.
  • Key requirements: Key values cannot be empty or contain only whitespace. If duplicate keys are present, the last occurrence is used.
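
The following is a minimal pre-upload check written in Python; the function name check_record and the dictionary shape of its input are illustrative only and are not part of the Classification sets API.

code language-python
# Minimal pre-upload check for the general file requirements (illustrative only).
MAX_BYTES = 255  # limit for keys and for individual classification values
def check_record(key, values):
    """Return a list of problems found for one classification record."""
    problems = []
    if not key or not key.strip():
        problems.append("key is empty or whitespace-only")
    elif len(key.encode("utf-8")) > MAX_BYTES:
        problems.append(f"key exceeds {MAX_BYTES} bytes")
    for column, value in values.items():
        if value is not None and len(str(value).encode("utf-8")) > MAX_BYTES:
            problems.append(f"value for '{column}' exceeds {MAX_BYTES} bytes")
    return problems
print(check_record("product123", {"Product Name": "Basketball Shoes", "Brand": "Brand A"}))  # -> []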

JSON format details

The JSON file format follows conventions for JSON Lines (JSONL). The file must contain one JSON object per line, where each object represents a single classification record.

NOTE
Despite following conventions for JSON Lines, use the .json file extension for all uploads. Using the .jsonl extension can result in errors.

JSON structure

Each JSON object must contain:

  • key (required): The unique identifier for the classification record

  • data (required for updates): An object containing classification column names and their values

  • action (optional): The action to perform. Supported values include:

    • update (default)
    • delete-field
    • delete-key
  • enc (optional): Data encoding specification. Supported values include:

    • utf8 or UTF8 (default)
    • latin1 or LATIN1

All JSON field names (key, data, action, enc) are case-sensitive and must be lowercase.
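
Because each record is a single JSON object on its own line, the file can be produced with any JSON library. The following Python sketch is illustrative only; the record contents and the classifications.json file name are assumptions.

code language-python
import json
# Illustrative records; field names "key", "data", and "action" must be lowercase.
records = [
    {"key": "product123", "data": {"Product Name": "Basketball Shoes", "Brand": "Brand A"}},
    {"key": "product999", "action": "delete-key"},
]
# One JSON object per line (JSON Lines conventions), saved with a .json extension.
with open("classifications.json", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")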

JSON examples

Basic update record:

code language-json
{"key": "product123", "data": {"Product Name": "Basketball Shoes", "Brand": "Brand A", "Category": "Sports"}}

Update with encoding specified:

code language-json
{"key": "product456", "enc": "utf8", "data": {"Product Name": "Running Shoes", "Brand": "Brand B"}}

Delete specific fields:

code language-json
{"key": "product789", "action": "delete-field", "data": {"Brand": null, "Category": null}}

Delete entire key:

code language-json
{"key": "product999", "action": "delete-key"}

JSON validation rules

  • The key field is required and cannot be null or empty.
  • For update actions, the data field is required and cannot be empty.
  • For delete-field actions, the data field must contain the fields to delete.
  • For delete-key actions, the data field must not be present.
  • Supported encoding values are case-insensitive and include standard charset names.
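
A parsed record can be checked against these rules before upload. The Python sketch below is illustrative and simply mirrors the rules above; validate_json_record is not an official validator.

code language-python
# Illustrative check of one parsed record against the validation rules above.
def validate_json_record(record):
    """Return an error message, or None if the record looks valid."""
    key = record.get("key")
    data = record.get("data")
    action = record.get("action", "update")
    if not key or not str(key).strip():
        return "key is a required field"
    if action in ("update", "delete-field") and not data:
        return f"data is a required field when using action={action}"
    if action == "delete-key" and data is not None:
        return "data must not be present when using action=delete-key"
    # Only the documented values are checked here; adjust if other charset aliases are allowed.
    if str(record.get("enc", "utf8")).lower() not in ("utf8", "latin1"):
        return "unsupported encoding"
    return None
print(validate_json_record({"key": "product999", "action": "delete-key"}))  # None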

CSV format details

CSV (Comma-Separated Values) files use commas to separate classification data fields.

CSV structure

  • Header row: The first row must contain column headers, and the first column must be the key column. Subsequent columns should match the column names in your classification set schema
  • Data rows: Each subsequent row contains classification data
  • Delimiters: Fields are separated by commas
  • Quoting: Fields containing commas, quotes, or newlines should be enclosed in double quotes
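
Python's standard csv module applies these quoting rules automatically. The sketch below is illustrative only; the header names, rows, and file name are assumptions and must match your own classification set schema.

code language-python
import csv
# Illustrative rows; the first header must be "Key" and the remaining headers
# must match the column names in your classification set schema.
header = ["Key", "Product Name", "Brand", "Category", "Price"]
rows = [
    ["product123", "Basketball Shoes", "Brand A", "Sports", "89.99"],
    ["product456", 'Shoes, "Trail" edition', "Brand B", "Sports", "79.99"],  # comma and quotes
]
with open("classifications.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)  # quotes fields that contain commas, quotes, or newlines
    writer.writerow(header)
    writer.writerows(rows)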

CSV examples

Basic classification data:

code language-csv
Key,Product Name,Brand,Category,Price
product123,"Basketball Shoes",Brand A,Sports,89.99
product456,"Running Shoes",Brand B,Sports,79.99
product789,"Winter Jacket",Brand C,Clothing,149.99

Delete entire key:

code language-csv
Key,Product Name,Brand,Category,Price
product999,~deletekey~,,,

Delete specific fields (mixed with updates):

code language-csv
Key,Product Name,Brand,Category,Price
product123,"Updated Product Name",Brand A,Sports,89.99
product456,,~empty~,~empty~,79.99

CSV formatting rules

  • Fields containing commas must be enclosed in double quotes
  • Fields containing double quotes must escape quotes by doubling them ("")
  • Empty fields represent null values for that classification
  • Leading and trailing spaces around fields are automatically trimmed
  • Special characters (tabs, newlines) within quoted fields are preserved

Delete operations:

  • Use ~deletekey~ in any field to delete the entire key and all its classification data
  • Use ~empty~ in specific fields to delete only those classification values (leaves other fields intact)
  • When using ~empty~, you can mix deletions with updates in the same file

TSV/TAB format details

TSV (Tab-Separated Values) and TAB files use tab characters to separate classification data fields.

TSV/TAB structure

  • Header row: The first row must contain column headers, and the first column must be the key column. Subsequent columns should match the column names in your classification set schema
  • Data rows: Each subsequent row contains classification data
  • Delimiters: Fields are separated by tab characters (\t)
  • Quoting: Generally no quoting is needed, but some implementations support quoted fields
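
The same approach works for TSV output by switching the delimiter to a tab character. The Python sketch below is illustrative only; the header names, rows, and file name are assumptions.

code language-python
import csv
header = ["Key", "Product Name", "Brand", "Category", "Price"]
rows = [
    ["product123", "Basketball Shoes", "Brand A", "Sports", "89.99"],
    ["product456", "", "~empty~", "~empty~", "79.99"],  # clears Brand and Category only
]
# Tab-separated output; avoid tabs or newlines inside field values.
with open("classifications.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(header)
    writer.writerows(rows)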

TSV/TAB examples

Basic classification data:

code language-tsv
Key    Product Name    Brand    Category    Price
product123    Basketball Shoes    Brand A    Sports    89.99
product456    Running Shoes    Brand B    Sports    79.99
product789    Winter Jacket    Brand C    Clothing    149.99

Delete entire key:

code language-tsv
Key    Product Name    Brand    Category    Price
product999    ~deletekey~

Delete specific fields (mixed with updates):

code language-tsv
Key    Product Name    Brand    Category    Price
product123    Updated Product Name    Brand A    Sports    89.99
product456        ~empty~    ~empty~    79.99

TSV/TAB formatting rules

  • Fields are separated by single tab characters
  • Empty fields (consecutive tabs) represent null values
  • No special quoting is typically required
  • Leading and trailing spaces are preserved
  • Newline characters within fields should be avoided

Delete operations:

  • Use ~deletekey~ in any field to delete the entire key and all its classification data
  • Use ~empty~ in specific fields to delete only those classification values (leaves other fields intact)
  • When using ~empty~, you can mix deletions with updates in the same file

Error handling

Common upload issues and solutions:

General file format errors

  • Invalid file format: Verify that your file extension matches the content format (.json, .csv, .tsv, or .tab).
  • “Unknown header”: Column names must match your classification set schema (applies to all formats).

CSV/TSV specific errors

  • “First column is required to be the key”: Ensure your CSV/TSV file has a proper header row with the key column first.
  • “A minimum of two header items are required”: CSV/TSV files must have at least a “Key” column and one classification column.
  • “The first header column must be called ‘Key’”: The first column header must be exactly “Key” (capital K, case-sensitive).
  • “Blank headers are not allowed”: All CSV/TSV column headers must have names.
  • “The number of columns did not match the headers”: Each CSV/TSV data row must have the same number of fields as the header row.
  • “Malformed document”: Check for correct CSV quoting, proper tab separation in TSV files, and other structural problems.

JSON specific errors

  • “Key is a required field”: All JSON records must have a non-empty "key" field (lowercase, case-sensitive).
  • “Data is a required field when using action=update”: JSON update actions must include a "data" field.
  • “Data is a required field when using action=delete-field”: JSON delete-field actions must specify which fields to delete in the "data" field.
  • “Data must not be present when using action=delete-key”: JSON delete-key actions cannot include a "data" field.
  • “Unsupported encoding”: Use only supported encoding values in the "enc" field (utf8, UTF8, latin1, LATIN1).
  • Invalid JSON syntax: Ensure the file follows JSON Lines conventions (one valid JSON object per line), and check for missing quotes, commas, or brackets.

Size limit errors

  • “Key exceeds maximum size”: Individual keys cannot exceed 255 bytes.
  • “Column value exceeds maximum size”: Individual classification values cannot exceed 255 bytes.

Best practices

  • File size: 50MB is the maximum file size for browser and API uploads.
  • Batch processing: For large datasets, consider splitting into smaller files, as sketched below.
  • Data validation: Test with a small sample file before uploading large datasets.
  • Backup: Keep copies of your source data files.
  • Incremental updates: Use JSON format for precise control over individual record updates and deletions.
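
As a rough illustration of batch processing for JSON uploads, the Python sketch below splits a large JSON Lines source file by record count; the batch size, function name, and output naming are assumptions, and the count should be chosen so each output file stays well under the 50MB limit.

code language-python
# Illustrative split of a large JSON Lines source into smaller upload files.
BATCH_SIZE = 100_000  # records per output file; choose a value that keeps each file well under 50MB
def split_jsonl(source_path, prefix="classifications_part"):
    batch, index = [], 0
    with open(source_path, "r", encoding="utf-8") as source:
        for line in source:
            batch.append(line)
            if len(batch) >= BATCH_SIZE:
                with open(f"{prefix}{index}.json", "w", encoding="utf-8") as out:
                    out.writelines(batch)
                batch, index = [], index + 1
    if batch:  # write any remaining records
        with open(f"{prefix}{index}.json", "w", encoding="utf-8") as out:
            out.writelines(batch)
# Example usage: split_jsonl("classifications.json")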