## Minimal run

```python
import zeroeval as ze

ze.init()

dataset = ze.Dataset.pull("capital-cities")

@ze.task(outputs=["prediction"])
def predict(row):
    return {"prediction": row.answer}

run = dataset.eval(predict, workers=8)
print(run.run_id)
print(run.health)
```
`Dataset.eval(...)` and `Dataset.run(...)` are aliases.
## Task requirements

`@ze.task` functions must:

- Return a `dict`
- Include all declared `outputs`
```python
@ze.task(outputs=["prediction"])
def predict(row):
    value = call_model(row.question)  # call_model is your own inference function
    return {"prediction": value}
```
If any declared output is missing from the returned dict, the SDK raises a validation error.
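The shape of that check can be sketched in plain Python. This is an illustration of the kind of validation the SDK performs, not its actual implementation; `validate_outputs` is a hypothetical helper, not part of the zeroeval API.

```python
def validate_outputs(result, declared):
    """Sketch of output validation: result must be a dict containing
    every declared output key. Hypothetical helper, not the SDK's code."""
    if not isinstance(result, dict):
        raise TypeError(f"task must return a dict, got {type(result).__name__}")
    missing = [key for key in declared if key not in result]
    if missing:
        raise ValueError(f"task result is missing declared outputs: {missing}")
    return result
```

A result of `{"prediction": "Paris"}` passes for `outputs=["prediction"]`, while an empty dict or a bare string fails.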
## Execution controls

Use `ExecutionConfig` to control runtime behavior:

```python
run = dataset.eval(
    predict,
    execution=ze.ExecutionConfig(
        workers=12,
        timeout_s=30,
        retry=ze.RetryPolicy(max_attempts=3),
    ),
)
```
Key knobs:

- `workers`: thread pool size
- `max_in_flight`: maximum number of concurrently queued futures
- `timeout_s`: per-row future timeout
- `retry`: retry policy for transient failures
- `failure.on_row_error`: `"continue"` or `"stop"`
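The interplay of `workers` and `max_in_flight` can be sketched with a plain thread pool: the pool size caps parallelism, while a semaphore caps how many futures are in flight at once. This is a conceptual sketch under those assumptions, not the SDK's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_rows(rows, task, workers=4, max_in_flight=8):
    """Run `task` over rows with bounded parallelism (workers) and a
    bounded number of in-flight futures (max_in_flight). Illustrative only."""
    gate = threading.Semaphore(max_in_flight)  # caps queued + running futures
    results = {}

    def wrapped(i, row):
        try:
            return i, task(row)
        finally:
            gate.release()  # free a slot once this row finishes

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i, row in enumerate(rows):
            gate.acquire()  # block submission when max_in_flight is reached
            futures.append(pool.submit(wrapped, i, row))
        for fut in as_completed(futures):
            i, out = fut.result()
            results[i] = out
    return [results[i] for i in range(len(rows))]
```

Keeping `max_in_flight` small bounds memory use when the dataset is much larger than the worker pool.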
## Checkpointing

Enable incremental persistence for long runs:

```python
run = dataset.eval(
    predict,
    checkpoint=ze.CheckpointConfig(
        enabled=True,
        flush_every_rows=50,
        flush_every_seconds=10.0,
    ),
)
```
Checkpointing limits the work lost on interruption and allows runs to resume from persisted progress.
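The two flush triggers, `flush_every_rows` and `flush_every_seconds`, typically race each other: a flush fires when either threshold is reached. A minimal sketch of that policy (not the SDK's implementation; `FlushPolicy` is a hypothetical class):

```python
import time

class FlushPolicy:
    """Illustrative flush trigger: fire every N rows or every T seconds,
    whichever comes first. Hypothetical, not part of the zeroeval SDK."""

    def __init__(self, every_rows=50, every_seconds=10.0, clock=time.monotonic):
        self.every_rows = every_rows
        self.every_seconds = every_seconds
        self.clock = clock
        self.rows_since_flush = 0
        self.last_flush = clock()

    def record_row(self):
        self.rows_since_flush += 1

    def should_flush(self):
        return (self.rows_since_flush >= self.every_rows
                or self.clock() - self.last_flush >= self.every_seconds)

    def mark_flushed(self):
        self.rows_since_flush = 0
        self.last_flush = self.clock()
```

With `flush_every_rows=50` and `flush_every_seconds=10.0`, slow tasks flush on the timer while fast tasks flush on the row count.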
## Run parameters

Use `parameters` to persist contextual metadata with the run:

```python
run = dataset.eval(
    predict,
    parameters={
        "model": "gpt-4o-mini",
        "experiment": "capital-baseline",
        "dataset_version": dataset.version_number,
    },
)
```
Keep runs self-describing: always store the model name, dataset version, and subset in the run parameters.
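A small helper can make that habit consistent across experiments. `build_run_parameters` is a hypothetical convenience function, not part of the zeroeval SDK:

```python
def build_run_parameters(model, dataset_version, subset=None, **extra):
    """Assemble a self-describing parameters dict for a run.
    Hypothetical helper, not part of the zeroeval SDK."""
    params = {"model": model, "dataset_version": dataset_version}
    if subset is not None:
        params["subset"] = subset
    params.update(extra)  # any extra keyword args become parameters too
    return params
```

The resulting dict can be passed directly as `parameters=...` in `dataset.eval(...)`.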