Start with the SDK quickstart

Build your first dataset, run a task, and compute quality metrics.
About the stack: LLM Stats runs on top of ZeroEval, an evaluation library developed by the same team.

What You Can Do

Upload benchmark data

Add data via git, CSV import, or the browser editor.

Load and inspect data

Pull datasets, access rows with dot notation, and work with slices/subsets.
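
A minimal sketch of that flow, assuming illustrative names (`ze.init`, `Dataset.pull`, and the `question` field are placeholders, not necessarily the documented API; see /python-sdk/quickstart for the real calls):

```python
import zeroeval as ze

ze.init()  # assumption: the client reads ZEROEVAL_API_KEY from the environment

# Hypothetical loader; the actual method name may differ.
ds = ze.Dataset.pull("my-benchmark")

row = ds[0]
print(row.question)  # dot-notation access to a column ("question" is illustrative)

subset = ds[:100]    # slices behave like smaller datasets
```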

Run evals

Execute tasks with configurable workers, retries, and checkpoints.
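
As a sketch of what a configured run could look like (the `@ze.task` decorator and the `ze.run` parameters are assumptions for illustration; the evals docs define the actual interface):

```python
import zeroeval as ze

ds = ze.Dataset.pull("my-benchmark")  # hypothetical, as in the loading sketch

@ze.task  # hypothetical decorator marking the per-row function
def answer(row):
    return call_model(row.question)  # call_model is a placeholder for your model call

results = ze.run(
    answer,
    dataset=ds,
    workers=8,        # parallel workers (parameter name illustrative)
    retries=2,        # retry transient failures
    checkpoint=True,  # resume from the last completed row after an interruption
)
```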

Score quality

Add row-, column-, and run-level evaluations for robust measurement.
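
For instance, a row-level evaluator might look like this (the `@ze.evaluator` registration is an assumed name; column- and run-level hooks would aggregate the same kind of score over the whole run):

```python
import zeroeval as ze

@ze.evaluator  # hypothetical registration; the documented API may differ
def exact_match(row, output):
    # Row-level score: 1.0 when the model output matches the reference answer.
    return float(output.strip() == row.answer)
```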

Inspect signals and traces

Emit runtime signals during execution and inspect them through traces.
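
A sketch of emitting a signal from inside a task (`ze.signal` is an assumed name for illustration; the signals docs define the real call):

```python
import zeroeval as ze

@ze.task  # hypothetical, as above
def answer(row):
    output, usage = call_model(row.question)       # call_model is a placeholder
    ze.signal("total_tokens", usage.total_tokens)  # attach a runtime signal to this row's trace
    return output
```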

Call models via the gateway

OpenAI-compatible chat, plus unified image, video, TTS, and STT endpoints.
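
Because the chat endpoint is OpenAI-compatible, the standard openai client should work once pointed at the gateway; the base URL and model id below are placeholders, not documented values:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder; substitute the documented gateway URL
    api_key=os.environ["ZEROEVAL_API_KEY"],
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # any model id the gateway exposes
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```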

Next Steps

1. Install and authenticate

Follow /python-sdk/installation to install zeroeval and configure ZEROEVAL_API_KEY.

2. Complete the first eval

Run the walkthrough in /python-sdk/quickstart.

3. Upload benchmark data

Push data via git, import CSVs, or use the browser editor. Learn about subsets, versioning, and the recommended repository layout.
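
A sketch of the CSV path (the `from_csv` and `push` method names are illustrative assumptions; the datasets docs define the real interface):

```python
import zeroeval as ze

ze.init()  # assumes ZEROEVAL_API_KEY is set

# Hypothetical import-and-upload; pushing again would create a new version.
ds = ze.Dataset.from_csv("benchmark.csv", name="my-benchmark")
ds.push()
```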

4. Productionize eval execution

Add scoring, repetition, and resume/reliability controls.
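
Putting those controls together, a production-leaning run might look like this sketch (parameter names are assumptions, reusing the task and evaluator from the earlier sketches):

```python
results = ze.run(
    answer,                    # the task from the run-evals sketch
    dataset=ds,
    evaluators=[exact_match],  # scoring attached to the run
    repetitions=3,             # repeat each row to measure variance
    resume=True,               # continue an interrupted run instead of restarting
)
```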

Documentation Scope

  • Getting Started: setup and first run
  • Datasets: creation, loading, versioning, subsets, multimodal
  • Evals: execution, scoring, metrics, repetitions, resume
  • Examples: end-to-end text and multimodal workflows