Pull a dataset by name
import zeroeval as ze
ze.init()
dataset = ze.Dataset.pull("capital-cities")
print(dataset.name)
print(len(dataset))
Optionally load a specific version number:
dataset_v2 = ze.Dataset.pull("capital-cities", version_number=2)
If the dataset defines named subsets, you can pull one directly:
diamond = ze.Dataset.pull("gpqa", subset="diamond")
print(len(diamond))
Iterate over rows
Rows are yielded as DotDict objects, so both key and dot access are possible.
for row in dataset:
    print(row.question, row.answer)  # dot access
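To make the "key and dot access" behavior concrete, here is a minimal sketch of how a dict with attribute-style reads can be built. `SketchDotDict` is a hypothetical stand-in written for illustration; it is not the SDK's DotDict, whose implementation may differ.

```python
class SketchDotDict(dict):
    """Illustrative stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


row = SketchDotDict({"question": "Capital of France?", "answer": "Paris"})
print(row["question"])  # key access
print(row.answer)       # dot access
```

Raising `AttributeError` for unknown keys keeps `hasattr` and `getattr(row, name, default)` behaving as they do for ordinary objects.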
Index and slice
dataset[idx] returns a single row (DotDict)
dataset[start:end] returns a new Dataset with copied rows
first = dataset[0]
top_100 = dataset[:100]
print(type(first)) # DotDict
print(type(top_100)) # Dataset
Sliced datasets preserve backend metadata (dataset id/version/subset) when
available, so they can still be evaluated and pushed in normal workflows.
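The indexing and slicing rules above can be sketched with a small stand-in class. `MiniDataset` and its attributes are hypothetical and exist only to illustrate the described behavior (an integer index returns one row; a slice returns a new dataset with copied rows and the same metadata). It is not the SDK's Dataset class.

```python
import copy


class MiniDataset:
    """Hypothetical stand-in for illustration; not the zeroeval Dataset class."""

    def __init__(self, name, data, version_number=None, subset=None):
        self.name = name
        self.data = list(data)
        self.version_number = version_number
        self.subset = subset

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice returns a new dataset with copied rows and the same metadata.
            return MiniDataset(
                self.name,
                copy.deepcopy(self.data[key]),
                version_number=self.version_number,
                subset=self.subset,
            )
        # An integer index returns a single row.
        return self.data[key]


full = MiniDataset("qa-demo", [{"q": str(i)} for i in range(5)], version_number=3)
head = full[:2]
head[0]["q"] = "changed"  # mutating the copied row leaves the original untouched
print(len(head), head.version_number, full[0]["q"])
```

Copying rows on slice means edits to a subset never leak back into the dataset it was cut from, while the carried-over metadata keeps the subset tied to its source version.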
Access columns and normalized data
print(dataset.columns) # union of all row keys (excluding internal row_id)
print(dataset.data) # row payloads without wrapper metadata
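As an illustration of what "union of all row keys (excluding internal row_id)" means when rows have different shapes, the following sketch computes that union over plain dicts. `union_of_columns` is a hypothetical helper, and the first-seen ordering is an assumption; the SDK may order columns differently.

```python
def union_of_columns(rows, internal_keys=("row_id",)):
    """Return every key seen in any row, in first-seen order, skipping internal keys."""
    columns = []
    seen = set(internal_keys)  # pre-seeding hides internal keys from the result
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                columns.append(key)
    return columns


rows = [
    {"row_id": "q1", "question": "6 * 7", "answer": "42"},
    {"row_id": "q2", "question": "2 + 2", "difficulty": "easy"},
]
print(union_of_columns(rows))
```

A column appears in the result even if only one row defines it, so code that reads a column across all rows should be prepared for missing values.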
Minimal versioning example
dataset = ze.Dataset(
    "qa-demo",
    data=[{"row_id": "q1", "question": "6 * 7", "answer": "42"}],
)
dataset.push()
latest = ze.Dataset.pull("qa-demo")
pinned = ze.Dataset.pull("qa-demo", version_number=dataset.version_number)
Common loading errors
Dataset.pull(...) requires a valid ZeroEval initialization. Call ze.init(api_key="sk_ze_...") before pulling.
Confirm the dataset name and project context (API key/org mapping). Pull uses the project resolved from your API credentials.
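Because both errors above trace back to credentials, a quick local check before initializing can save a round trip. The sketch below is a heuristic only: `check_api_key` is a hypothetical helper, the `sk_ze_` prefix test is inferred from the placeholder key shown above, and the `ZEROEVAL_API_KEY` environment variable name is an assumption. It does not contact the API or confirm which project the key resolves to.

```python
import os


def check_api_key(key):
    """Return a list of readable problems with the key; an empty list means it looks plausible."""
    problems = []
    if not key:
        problems.append("no API key provided")
    elif not key.startswith("sk_ze_"):
        problems.append("API key does not start with 'sk_ze_'")
    return problems


# ZEROEVAL_API_KEY is an assumed variable name; adjust it to your setup.
problems = check_api_key(os.environ.get("ZEROEVAL_API_KEY", ""))
for problem in problems:
    print("warning:", problem)
```

A key that passes this check can still belong to a different organization or project than the dataset, so the dataset name and project context still need to be confirmed.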