
Pull a dataset by name

import zeroeval as ze

ze.init()

dataset = ze.Dataset.pull("capital-cities")
print(dataset.name)
print(len(dataset))
Optionally load a specific version number:
dataset_v2 = ze.Dataset.pull("capital-cities", version_number=2)
If the dataset defines named subsets, you can pull one directly:
diamond = ze.Dataset.pull("gpqa", subset="diamond")
print(len(diamond))

Iterate over rows

Iterating over a dataset yields each row as a DotDict, so both key access and dot access work.
for row in dataset:
    print(row.question, row.answer)        # dot access
    print(row["question"], row["answer"])  # key access
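
To make the two access styles concrete, here is a minimal sketch of a dict subclass that also answers attribute lookups. It is a hypothetical stand-in written for illustration, not ZeroEval's actual DotDict implementation, and the sample row is invented.

```python
class DotDict(dict):
    """Hypothetical stand-in: a dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Invented sample row, shaped like the rows used on this page.
row = DotDict({"question": "Capital of France?", "answer": "Paris"})

# Both styles read the same underlying value.
assert row.question == row["question"]
print(row.question, row["answer"])
```

Because the class is still a dict, anything that expects a mapping (for example `dict(row)` or `row.items()`) keeps working.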

Index and slice

  • dataset[idx] returns a single row (DotDict)
  • dataset[start:end] returns a new Dataset with copied rows
first = dataset[0]
top_100 = dataset[:100]

print(type(first))    # DotDict
print(type(top_100))  # Dataset
Sliced datasets preserve backend metadata (dataset id/version/subset) when available, so they can still be evaluated and pushed in normal workflows.
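
The index and slice behavior described above can be sketched with a small stand-in class. `MiniDataset` and its fields are hypothetical names chosen for this example, and deep-copying the rows is an assumption about what "copied rows" means; the real Dataset class may differ.

```python
import copy


class MiniDataset:
    """Hypothetical stand-in for illustration; not the ZeroEval Dataset class."""

    def __init__(self, name, data, version_number=None, subset=None):
        self.name = name
        self.data = list(data)
        self.version_number = version_number
        self.subset = subset

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Copy the rows so edits to the slice do not leak into the parent,
            # and carry the metadata over to the new object.
            rows = copy.deepcopy(self.data[key])
            return MiniDataset(self.name, rows, self.version_number, self.subset)
        # An integer index returns a single row.
        return self.data[key]


full = MiniDataset("qa-demo", [{"question": str(i)} for i in range(5)], version_number=2)
head = full[:3]
head.data[0]["question"] = "edited"  # changes the copy only

print(len(head), head.version_number, full[0]["question"])
```

Returning a new object for slices, rather than a plain list, is what lets a subset keep its name, version, and subset label.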

Access columns and normalized data

print(dataset.columns)  # union of all row keys (excluding internal row_id)
print(dataset.data)     # row payloads without wrapper metadata
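
For intuition about what "union of all row keys" means when rows have different fields, here is a small sketch. `union_columns` is a hypothetical helper, the sample rows are invented, and the first-seen ordering is an assumption; the order of `dataset.columns` is not specified here.

```python
def union_columns(rows, internal_keys=("row_id",)):
    """Collect every key seen in any row, in first-seen order, skipping internal keys."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for row in rows:
        for key in row:
            if key not in internal_keys:
                seen.setdefault(key, None)
    return list(seen)


# Two rows with partly different fields.
rows = [
    {"row_id": "q1", "question": "6 * 7", "answer": "42"},
    {"row_id": "q2", "question": "2 + 2", "difficulty": "easy"},
]
print(union_columns(rows))
```

A row that lacks one of the resulting columns simply has no value for it, so code that reads columns across rows should be ready for missing keys.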

Minimal versioning example

dataset = ze.Dataset(
    "qa-demo",
    data=[{"row_id": "q1", "question": "6 * 7", "answer": "42"}],
)
dataset.push()

latest = ze.Dataset.pull("qa-demo")
pinned = ze.Dataset.pull("qa-demo", version_number=dataset.version_number)

Common loading errors

Dataset.pull(...) requires a valid ZeroEval initialization, so call ze.init() before pulling:
ze.init(api_key="sk_ze_...")
Also confirm the dataset name and the project context (API key/org mapping): Dataset.pull(...) looks for the dataset in the project resolved from your API credentials.
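
One way to surface both checks at the point of failure is a thin wrapper around the pull call. This is a sketch under assumptions: `pull_with_hint` is a hypothetical helper, it catches `Exception` broadly because the exception types raised by the SDK are not listed here, and `fake_pull` stands in for `ze.Dataset.pull` so the example runs offline.

```python
def pull_with_hint(pull, name, **kwargs):
    """Call pull(name, **kwargs); on failure, re-raise with a troubleshooting hint."""
    try:
        return pull(name, **kwargs)
    except Exception as exc:  # broad on purpose: no specific SDK exception types are assumed
        raise RuntimeError(
            f"Could not pull dataset {name!r}: {exc}. "
            "Check that ze.init() ran with a valid API key and that the dataset "
            "name exists in the project tied to that key."
        ) from exc


def fake_pull(name, **kwargs):
    # Stand-in for ze.Dataset.pull so the sketch runs without network access.
    raise LookupError(f"dataset {name} not found")


try:
    pull_with_hint(fake_pull, "capital-citiez")  # deliberately misspelled name
except RuntimeError as err:
    print(err)
```

With the SDK installed, the same helper would be called as `pull_with_hint(ze.Dataset.pull, "capital-cities")`, and keyword arguments such as `version_number` pass through unchanged.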