
Create from Python data

Use ze.Dataset(name, data=[...]) when your rows already exist in memory.
import zeroeval as ze

ze.init()

dataset = ze.Dataset(
    "capital-cities",
    data=[
        {"row_id": "fr", "question": "Capital of France?", "answer": "Paris"},
        {"row_id": "de", "question": "Capital of Germany?", "answer": "Berlin"},
    ],
    description="Simple geography dataset",
)
Add a stable row_id field whenever possible. It makes resume behavior and row-level tracking much more reliable.
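A quick pre-flight check can catch missing or duplicated IDs before the dataset is created. The sketch below works on plain dictionaries and does not call ZeroEval at all; `find_row_id_problems` is a hypothetical helper name, and treating duplicate IDs as a problem is an assumption here, not documented library behavior.

```python
from collections import Counter


def find_row_id_problems(rows, key="row_id"):
    """Return (indexes of rows with no ID, sorted list of IDs used more than once)."""
    # Rows where the key is absent or empty
    missing = [i for i, row in enumerate(rows) if not row.get(key)]
    # Count only the rows that actually carry an ID
    counts = Counter(row[key] for row in rows if row.get(key))
    duplicates = sorted(rid for rid, n in counts.items() if n > 1)
    return missing, duplicates


rows = [
    {"row_id": "fr", "question": "Capital of France?", "answer": "Paris"},
    {"row_id": "de", "question": "Capital of Germany?", "answer": "Berlin"},
]

missing, duplicates = find_row_id_problems(rows)
if missing or duplicates:
    raise ValueError(f"rows without IDs: {missing}; duplicated IDs: {duplicates}")
```

If the check passes, the same `rows` list can be passed as `data=` to `ze.Dataset(...)` as shown above.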

Create from CSV

You can initialize directly from a CSV path. The dataset name defaults to the CSV filename stem.
import zeroeval as ze

ze.init()

dataset = ze.Dataset("/path/to/questions.csv")
print(dataset.name)  # e.g. "questions"
print(len(dataset))
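To predict the default name before loading a file, the filename stem can be computed with the standard library. This is a minimal sketch that assumes ZeroEval derives the stem the same way `pathlib` does (last path component with its final extension removed); `default_dataset_name` is a hypothetical helper, not part of the SDK.

```python
from pathlib import PurePosixPath


def default_dataset_name(csv_path):
    """Return the filename stem: no directory, no final extension."""
    return PurePosixPath(csv_path).stem


name = default_dataset_name("/path/to/questions.csv")
print(name)
```

Note that `pathlib` strips only the last suffix, so a file named `questions.v2.csv` would yield `questions.v2` under this assumption.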

Modify rows before running evals

The Dataset object supports in-memory row operations.
# Add rows
dataset.add_rows(
    [{"row_id": "es", "question": "Capital of Spain?", "answer": "Madrid"}]
)

# Update one row by index
dataset.update_row(
    0,
    {"question": "Capital city of France?", "answer": "Paris"},
)

# Delete a row by index
dataset.delete_row(1)
Index-based updates change row content, and index-based deletes can shift the positions of the rows that follow. Keep this in mind if your downstream logic assumes a fixed order.
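Because index-based operations are sensitive to position, one option is to resolve the index from a stable `row_id` right before each update or delete. The sketch below uses a plain list of dictionaries as a stand-in for the dataset; `index_of` is a hypothetical helper, and the assumption that positions shift like a Python list after a delete is not confirmed by this page.

```python
def index_of(rows, row_id, key="row_id"):
    """Return the current position of the row whose ID matches row_id."""
    for i, row in enumerate(rows):
        if row.get(key) == row_id:
            return i
    raise KeyError(f"no row with {key}={row_id!r}")


# Plain-list stand-in for the dataset rows
rows = [{"row_id": "fr"}, {"row_id": "de"}, {"row_id": "es"}]

before = index_of(rows, "es")  # position before the delete
del rows[1]                    # remove the row at index 1
after = index_of(rows, "es")   # position after the delete
print(before, after)
```

With a real `Dataset`, the same lookup could presumably be built from `len(dataset)` and `dataset[i]`, both of which appear elsewhere on this page.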

Push to ZeroEval

Pushing persists the dataset to the backend and refreshes the local rows with backend metadata (including row IDs, when available).
dataset.push()
print(dataset.version_id, dataset.version_number)
Use create_new_version=True if you explicitly want a new version on push:
dataset.push(create_new_version=True)

Validate shape quickly

Useful helpers before running tasks:
print(dataset.columns)  # sorted list of column names
print(len(dataset))     # number of rows
print(dataset[0])       # DotDict for first row
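Beyond printing the shape, a small check can confirm that every row carries the columns a task expects. This is a sketch over plain dictionaries rather than a ZeroEval feature; `missing_columns` and the `required` list are hypothetical names chosen for illustration.

```python
def missing_columns(rows, required):
    """Map row index to the sorted list of required columns that row lacks."""
    problems = {}
    for i, row in enumerate(rows):
        absent = sorted(set(required) - set(row))
        if absent:
            problems[i] = absent
    return problems


rows = [
    {"row_id": "fr", "question": "Capital of France?", "answer": "Paris"},
    {"row_id": "de", "question": "Capital of Germany?"},  # no "answer" column
]

problems = missing_columns(rows, required=["question", "answer"])
print(problems)
```

An empty result means every row has all required columns; anything else points at the rows to fix before running tasks.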