Overview

There are four ways to add data to a benchmark:
  1. Large file upload — start an upload session, upload Parquet files directly to storage, then finalize and let the platform index the dataset asynchronously. This is the recommended path for large Parquet datasets.
  2. Git — clone, add files, push. Best for metadata-heavy workflows, smaller repositories, and keeping benchmark source files alongside data.
  3. CSV import — upload a CSV file or paste a URL from the benchmark hub UI.
  4. Browser editor — create rows and columns manually in the Data tab.
For large dataset payloads, prefer the upload flow over a single large Git push: upload sessions are resumable and do not depend on one long-lived Git request. You can also push data programmatically with the Python SDK using ze.Dataset(...).push(), though upload sessions remain the preferred path for very large Parquet files. Git is still a good fit for smaller benchmark repos and for non-data files such as scorer implementations.
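For the SDK path, a minimal sketch is below. Only ze.Dataset(...).push() comes from this page; the constructor arguments, ze.init(), and the example rows are assumptions and may differ from the actual SDK signature.

import zeroeval as ze

# Sketch only: the exact constructor signature may differ.
ze.init()  # assumption: picks up your ZeroEval API key from the environment

dataset = ze.Dataset(
    name="my-benchmark",  # hypothetical benchmark slug
    data=[
        {"id": "q1", "question": "What is 2 + 2?", "answer": "4"},
        {"id": "q2", "question": "What is the capital of France?", "answer": "Paris"},
    ],
)
dataset.push()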

1. Create an upload session

Create a session for the files you plan to upload. The session records:
  • file paths
  • expected sizes
  • checksums
  • progress
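This page does not spell out the HTTP API, so the following is a hedged sketch: the base URL, route, and request fields (path, size, sha256) are assumptions meant to illustrate the shape of the call, not a documented schema.

import hashlib
import os
import requests

API = "https://api.zeroeval.com"  # assumption: the actual base URL may differ
API_KEY = os.environ["ZEROEVAL_API_KEY"]

def sha256(path):
    """Stream a file and return its hex SHA-256 checksum."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

files = ["data/default.parquet", "data/hard.parquet"]
resp = requests.post(
    f"{API}/v1/benchmarks/my-benchmark/upload-sessions",  # hypothetical route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"files": [{"path": p, "size": os.path.getsize(p), "sha256": sha256(p)}
                    for p in files]},
)
resp.raise_for_status()
session = resp.json()  # assumed to contain a session id and per-file part URLs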

2. Upload file parts directly to storage

The backend returns presigned multipart upload URLs. Upload each file in parts, retry failed parts if needed, and complete each file when all parts are present.
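Continuing the sketch above, with the same caveat: the response field names (parts, upload_url, part_number, complete_url) are assumptions about a presigned-multipart response shape.

import requests
from requests.adapters import HTTPAdapter, Retry

# Retry transient failures on individual part uploads.
http = requests.Session()
http.mount("https://", HTTPAdapter(max_retries=Retry(total=5, backoff_factor=1)))

PART_SIZE = 64 * 1024 * 1024  # 64 MiB per part (assumed part size)

for file_info in session["files"]:
    etags = []
    with open(file_info["path"], "rb") as f:
        for part in file_info["parts"]:  # one presigned URL per part
            r = http.put(part["upload_url"], data=f.read(PART_SIZE))
            r.raise_for_status()
            etags.append({"part_number": part["part_number"],
                          "etag": r.headers["ETag"]})
    # Mark this file complete once every part has been accepted.
    requests.post(file_info["complete_url"], json={"parts": etags},
                  headers={"Authorization": f"Bearer {API_KEY}"}).raise_for_status()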

3. Finalize the session

Finalizing the session tells the platform to verify the uploaded files and start dataset indexing.
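In the same sketch, finalizing is a single call; the route is again an assumption.

# Hypothetical finalize endpoint: triggers verification and indexing.
requests.post(
    f"{API}/v1/upload-sessions/{session['id']}/finalize",
    headers={"Authorization": f"Bearer {API_KEY}"},
).raise_for_status()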

4. Wait for indexing

The session and dataset move through these states:
  • received
  • uploading
  • indexing
  • ready
  • failed
The dataset becomes usable when the status reaches ready.
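A simple polling loop for these states might look like the sketch below (the status route and field name are assumed, continuing the earlier sketch).

import time

while True:
    status = requests.get(
        f"{API}/v1/upload-sessions/{session['id']}",  # hypothetical status route
        headers={"Authorization": f"Bearer {API_KEY}"},
    ).json()["status"]
    if status in ("ready", "failed"):
        break
    time.sleep(10)

if status == "failed":
    raise RuntimeError("Dataset indexing failed; check the benchmark page for details.")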

Git

Every benchmark has a git repository. Clone it, add your data files, and push.

1. Clone the repository

git clone https://git.llm-stats.com/<org>/<benchmark-slug>.git
cd <benchmark-slug>
Replace <org> with your organization slug and <benchmark-slug> with the benchmark’s slug. You can find the clone URL on the benchmark’s Files tab. Authentication uses your ZeroEval API key as the git password:
git clone https://<anything>:<your-api-key>@git.llm-stats.com/<org>/<benchmark-slug>.git

2. Add data files

Place data files in a data/ directory. Each file becomes a subset.
data/
├── default.parquet    → subset "default"
├── easy.parquet       → subset "easy"
└── hard.parquet       → subset "hard"
Only .parquet files inside data/ are treated as dataset subsets. CSV and JSONL files are stored but not auto-detected as subsets.
If you only have one file, name it data/default.parquet. The platform will use it as the default subset automatically.

3. Add non-data files (optional)

You can include any other files in the repository. Common additions:
data/
├── default.parquet
└── hard.parquet
scorers/
├── exact_match.py
└── faithfulness.py
README.md
  • scorers/*.py files are auto-detected as scorer implementations and shown in the benchmark config (a minimal example is sketched after this list).
  • README.md is displayed on the benchmark overview page.
  • Any other files (scripts, configs, notebooks) are stored and browsable in the Files tab.
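The scorer interface is not documented on this page, so the following scorers/exact_match.py is purely illustrative; the function name, arguments, and return convention are assumptions.

# scorers/exact_match.py -- illustrative sketch, not the platform's required interface.
def exact_match(row: dict, output: str) -> float:
    """Score 1.0 when the model output matches the reference answer exactly."""
    expected = str(row.get("answer", "")).strip().lower()
    return float(output.strip().lower() == expected)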

4. Commit and push

git add .
git commit -m "Add benchmark data"
git push
git push acknowledges the repo update immediately and indexes the dataset asynchronously in the background. After pushing, you can check ingest status on the benchmark page or via the CLI as the ingest moves through received, indexing, ready, or failed. Each successful ingest creates a new dataset version. The platform automatically:
  • indexes all files
  • detects subsets from data/*.parquet
  • detects scorers from scorers/*.py
  • updates the schema and column types from the Parquet Arrow metadata
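Once the ingest reaches ready, a quick sanity check is to pull the dataset with the SDK. Pulling without a subset argument is assumed to return the default subset (see Subset detection below).

import zeroeval as ze

# Assumption: omitting subset falls back to the benchmark's default subset.
ds = ze.Dataset.pull("my-benchmark")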

Converting CSV to Parquet

If your data is in CSV format, convert it to Parquet before pushing:
import pandas as pd

df = pd.read_csv("questions.csv")
df.to_parquet("data/default.parquet", index=False)
Or with PyArrow directly:
import pyarrow.csv as pcsv
import pyarrow.parquet as pq

table = pcsv.read_csv("questions.csv")
pq.write_table(table, "data/default.parquet")

Multiple subsets

Use multiple Parquet files to create named subsets:
import pandas as pd

df = pd.read_csv("questions.csv")

easy = df[df["difficulty"] == "easy"]
hard = df[df["difficulty"] == "hard"]

easy.to_parquet("data/easy.parquet", index=False)
hard.to_parquet("data/hard.parquet", index=False)
Then in the SDK:
easy_ds = ze.Dataset.pull("my-benchmark", subset="easy")
hard_ds = ze.Dataset.pull("my-benchmark", subset="hard")

CSV import (UI)

From the benchmark’s Data tab, click Import CSV to upload a file or fetch from a URL. The importer:
  • parses CSV headers and infers column types
  • lets you enable/disable individual columns
  • previews the first 10 rows before importing
  • appends rows to the current subset
This is useful for quick one-off imports. For repeatable workflows, use Git.

Browser editor (UI)

From the benchmark’s Data tab, click Create in UI to open a spreadsheet editor where you can:
  • add and remove columns
  • add rows one at a time
  • edit cell values inline
  • commit changes as a new version
This is useful for small benchmarks or manual test case curation.

Subset detection

The platform auto-detects subsets from the data/ directory:
File path                   Subset name
data/default.parquet        default
data/easy.parquet           easy
data/hard.parquet           hard
data/gpqa_diamond.parquet   gpqa_diamond
The default subset is resolved in this order:
  1. The benchmark’s configured default subset
  2. A file named default.parquet
  3. If there is only one subset, that subset
  4. The first subset alphabetically
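The same order, expressed as an illustrative Python function (not platform code):

def resolve_default_subset(configured_default, subsets):
    """subsets: names detected from data/*.parquet, e.g. ["easy", "hard"]."""
    if configured_default and configured_default in subsets:
        return configured_default    # 1. benchmark's configured default
    if "default" in subsets:
        return "default"             # 2. a file named default.parquet
    if len(subsets) == 1:
        return subsets[0]            # 3. only one subset
    return sorted(subsets)[0]        # 4. first subset alphabetically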

Schema and columns

Column names and types are read from Parquet Arrow metadata. Supported column types:
  • string (text)
  • int64 / int32 (integer)
  • float64 / float32 (float)
  • bool (boolean)
The primary key column is auto-detected:
  1. The benchmark’s configured primary key
  2. A column named id
  3. The first column ending in _id
  4. The first column
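And the primary key detection order, as the same kind of illustrative sketch:

def detect_primary_key(configured_key, columns):
    """columns: column names in their Parquet schema order."""
    if configured_key and configured_key in columns:
        return configured_key              # 1. configured primary key
    if "id" in columns:
        return "id"                        # 2. a column named id
    for col in columns:
        if col.endswith("_id"):
            return col                     # 3. first column ending in _id
    return columns[0]                      # 4. fall back to the first column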
Include a stable row_id or id column whenever possible. It makes resume, row-level comparison, and version diffing more reliable.

Repository layout

A well-structured benchmark repository looks like this:
my-benchmark/
├── data/
│   ├── default.parquet     # main evaluation subset
│   ├── easy.parquet        # optional difficulty split
│   └── hard.parquet
├── scorers/
│   └── exact_match.py      # scorer implementation
├── README.md               # benchmark description
└── requirements.txt        # optional dependencies