Start with the SDK quickstart
Build your first dataset, run a task, and compute quality metrics.
About the stack: LLM Stats runs on top of ZeroEval, an evaluation
library built by the same team.
What You Can Do
Upload benchmark data
Add data via git, CSV import, or the browser editor.
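For the CSV route, a benchmark file is just rows of columns. The sketch below builds one with the standard library; the `question`/`expected` column names are illustrative, not a schema the platform requires.

```python
import csv

# Hypothetical benchmark rows; "question"/"expected" are illustrative
# column names, not a schema the platform requires.
rows = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "Capital of France?", "expected": "Paris"},
]

# Write a CSV suitable for import.
with open("benchmark.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "expected"])
    writer.writeheader()
    writer.writerows(rows)
```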
Load and inspect data
Pull datasets, access rows with dot notation, and work with slices/subsets.
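The SDK reference documents ZeroEval's actual dataset classes; purely to illustrate the access pattern described here (dot notation on rows, slicing for subsets), a minimal stand-in looks like:

```python
from types import SimpleNamespace

# Stand-in dataset: a list of attribute-accessible rows. The real
# SDK's dataset type will differ; this only mirrors the access pattern.
dataset = [
    SimpleNamespace(question="What is 2 + 2?", expected="4"),
    SimpleNamespace(question="Capital of France?", expected="Paris"),
]

first = dataset[0]
print(first.question)    # dot notation on a row
subset = dataset[:1]     # slicing yields a smaller working set
print(len(subset))
```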
Run evals
Execute tasks with configurable workers, retries, and checkpoints.
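Conceptually, that execution model combines a worker pool, per-row retries, and periodic checkpoints. This stdlib-only sketch invents its own names (`run_task`, the checkpoint file), so it shows the mechanics rather than the SDK's interface:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_task(row, attempts=3):
    """Run one row with simple retries; the task body is a placeholder."""
    for attempt in range(attempts):
        try:
            return {"row": row, "output": row.upper()}  # placeholder task
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted retries; surface the failure

rows = ["alpha", "beta", "gamma"]
with ThreadPoolExecutor(max_workers=4) as pool:  # configurable workers
    results = list(pool.map(run_task, rows))

# Checkpoint: persist results so an interrupted run can resume later.
with open("checkpoint.json", "w") as f:
    json.dump(results, f)
```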
Score quality
Add row, column, and run-level evaluations for robust measurement.
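One way to picture the levels: a row evaluation scores each example, and a run evaluation aggregates across the whole run. A self-contained sketch (the names are illustrative, not the SDK's):

```python
results = [
    {"expected": "4", "output": "4"},
    {"expected": "Paris", "output": "paris"},
]

# Row-level: one score per example.
def exact_match(r):
    return 1.0 if r["output"] == r["expected"] else 0.0

row_scores = [exact_match(r) for r in results]

# Run-level: aggregate row scores into a single run metric.
accuracy = sum(row_scores) / len(row_scores)
print(row_scores, accuracy)
```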
Inspect signals and traces
Emit runtime signals during execution and inspect them through traces.
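The pattern behind this is an in-task emitter that appends timestamped events to a trace, which is then inspected after the run. A stdlib sketch with hypothetical names (`emit`, `trace`), not the SDK's API:

```python
import time

trace = []  # collected signals, inspected after the run

def emit(name, value):
    """Append a timestamped signal to the trace (illustrative only)."""
    trace.append({"name": name, "value": value, "ts": time.time()})

def task(row):
    emit("input_chars", len(row))
    output = row.upper()
    emit("output_chars", len(output))
    return output

task("hello")
for signal in trace:
    print(signal["name"], signal["value"])
```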
Call models via the gateway
OpenAI-compatible chat, plus unified image, video, TTS, and STT endpoints.
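"OpenAI-compatible" means clients send the standard `/chat/completions` request shape to the gateway's base URL. The sketch below only builds that payload; the base URL is a placeholder (take the real endpoint from the docs), and the model name is just an example of what a gateway might route:

```python
import json

# Placeholder base URL; substitute the gateway endpoint from the docs.
BASE_URL = "https://<gateway-host>/v1"

# Standard OpenAI chat-completions request body.
payload = {
    "model": "gpt-4o-mini",  # example model name the gateway might route
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
}

body = json.dumps(payload)
print(f"POST {BASE_URL}/chat/completions")
print(body)
```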
Recommended Path
1. Install and authenticate: follow /python-sdk/installation to install zeroeval and configure ZEROEVAL_API_KEY.
2. Complete the first eval: run the walkthrough in /python-sdk/quickstart.
3. Upload benchmark data: push data via git, import CSVs, or use the browser editor. Learn about subsets, versioning, and the recommended repository layout.
Documentation Scope
- Getting Started: setup and first run
- Datasets: creation, loading, versioning, subsets, multimodal
- Evals: execution, scoring, metrics, repetitions, resume
- Examples: end-to-end text and multimodal workflows