Minimal run

import zeroeval as ze

ze.init()
dataset = ze.Dataset.pull("capital-cities")

@ze.task(outputs=["prediction"])
def predict(row):
    # Placeholder task: echoes the row's ground-truth answer as the prediction.
    return {"prediction": row.answer}

run = dataset.eval(predict, workers=8)
print(run.run_id)
print(run.health)
Dataset.eval(...) and Dataset.run(...) are aliases.
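For example, the minimal run above can be written with the alias instead; since the two methods are aliases, both accept the same arguments:
run = dataset.run(predict, workers=8)  # same behavior as dataset.eval(predict, workers=8)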

Task requirements

@ze.task functions must:
  • Return a dict
  • Include every declared output as a key in that dict
@ze.task(outputs=["prediction"])
def predict(row):
    value = call_model(row.question)  # call_model: your own model-invocation helper
    return {"prediction": value}
If required outputs are missing, the SDK raises a validation error.
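For illustration, here is a sketch of a task that would trip that check: it declares two outputs but returns only one (the "confidence" output name is invented for this example):
@ze.task(outputs=["prediction", "confidence"])
def incomplete(row):
    # Missing the declared "confidence" key, so the SDK rejects this result.
    return {"prediction": call_model(row.question)}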

Execution controls

Use ExecutionConfig to control runtime behavior:
run = dataset.eval(
    predict,
    execution=ze.ExecutionConfig(
        workers=12,
        timeout_s=30,
        retry=ze.RetryPolicy(max_attempts=3),
    ),
)

Key knobs

  • workers: size of the worker thread pool
  • max_in_flight: maximum number of row futures queued concurrently
  • timeout_s: per-row timeout, in seconds
  • retry: retry policy for transient failures
  • failure.on_row_error: "continue" (keep evaluating remaining rows) or "stop"
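These knobs can be combined in a single ExecutionConfig. The sketch below assumes failure.on_row_error is set through a ze.FailureConfig object passed as failure=...; that spelling is inferred from the knob name, not a confirmed API:
run = dataset.eval(
    predict,
    execution=ze.ExecutionConfig(
        workers=12,
        max_in_flight=48,                      # cap on concurrently queued row futures
        timeout_s=30,                          # per-row timeout in seconds
        retry=ze.RetryPolicy(max_attempts=3),  # retry transient failures
        failure=ze.FailureConfig(on_row_error="continue"),  # hypothetical spelling: keep going when a row fails
    ),
)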

Checkpointing

Enable incremental persistence for long runs:
run = dataset.eval(
    predict,
    checkpoint=ze.CheckpointConfig(
        enabled=True,
        flush_every_rows=50,
        flush_every_seconds=10.0,
    ),
)
Checkpointing limits the work lost when a run is interrupted and allows interrupted runs to be resumed.
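Checkpointing composes with the execution controls above. A sketch of a long run that flushes progress incrementally (the exact interplay between the two flush thresholds is not specified here):
run = dataset.eval(
    predict,
    execution=ze.ExecutionConfig(workers=16, timeout_s=60),
    checkpoint=ze.CheckpointConfig(
        enabled=True,
        flush_every_rows=100,       # persist completed rows in batches of 100
        flush_every_seconds=30.0,   # also flush on a 30-second timer
    ),
)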

Attach run metadata

Use parameters to persist contextual metadata with the run:
run = dataset.eval(
    predict,
    parameters={
        "model": "gpt-4o-mini",
        "experiment": "capital-baseline",
        "dataset_version": dataset.version_number,
    },
)
Keep runs self-describing by always storing model name, dataset version, and subset in run parameters.
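One way to enforce that convention is a small helper that always builds the same baseline parameters; the helper name and the subset argument below are ours, not part of the SDK:
def run_parameters(model: str, dataset, subset: str, **extra) -> dict:
    # Baseline metadata every run should carry, plus any experiment-specific extras.
    return {
        "model": model,
        "dataset_version": dataset.version_number,
        "subset": subset,
        **extra,
    }

run = dataset.eval(
    predict,
    parameters=run_parameters("gpt-4o-mini", dataset, subset="all", experiment="capital-baseline"),
)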