The LLM Stats gateway is a thin, OpenAI-compatible router. You send a request to a single endpoint, we pick the best healthy provider for that model, and we transparently retry on failure — without changing your client code.

Base URL

https://gateway.llm-stats.com
All endpoints documented in this section live under /v1.

Authentication

Send your API key as a Bearer token on every request.
Authorization: Bearer YOUR_API_KEY
Create and manage keys in the LLM Stats dashboard.
Keys starting with ze_ are LLM Stats keys. Don’t proxy upstream provider keys — you don’t need them. Routing, retries, and accounting all happen on our side.
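A minimal sketch of attaching the key, using only the Python standard library. The key value is a placeholder; create a real one in the dashboard.

```python
import urllib.request

GATEWAY = "https://gateway.llm-stats.com"

def auth_headers(api_key: str) -> dict:
    # Every request carries the LLM Stats key as a Bearer token.
    return {"Authorization": f"Bearer {api_key}"}

def build_request(path: str, api_key: str) -> urllib.request.Request:
    # Works for any endpoint under /v1 on the gateway base URL.
    return urllib.request.Request(GATEWAY + path, headers=auth_headers(api_key))

if __name__ == "__main__":
    req = build_request("/v1/chat/completions", "ze_your_key_here")  # placeholder key
    print(req.get_header("Authorization"))
```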

How routing works

1. You send one request. A normal request to a modality endpoint with a model field: no provider prefix, no provider-specific quirks.

2. We pick a healthy provider. The router scores every provider serving that model on live latency, throughput, error rate, and capacity, then dispatches to the best one.

3. We fail over transparently. If the chosen provider degrades or errors mid-flight, we retry on the next best option. Your code only sees the final, successful response (or a clean error envelope if every provider fails).
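The three steps above happen on the gateway's side, but they can be sketched as a simple score-and-fail-over loop. This is an illustration of the described behavior, not the gateway's actual implementation; the provider fields and error type are invented for the example.

```python
def route(providers: list, send) -> str:
    # Rank candidate providers by their health score (illustrative: the real
    # router scores on live latency, throughput, error rate, and capacity).
    ranked = sorted(providers, key=lambda p: p["score"], reverse=True)
    errors = []
    for provider in ranked:
        try:
            # Dispatch to the best remaining provider.
            return send(provider)
        except RuntimeError as exc:
            # Provider degraded or errored: record it and fail over.
            errors.append((provider["name"], exc))
    # Only if every provider fails does the caller see an error.
    raise RuntimeError(f"all providers failed: {errors}")
```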

Endpoints at a glance

| Modality | Endpoint | Shape |
| --- | --- | --- |
| Chat / LLM | POST /v1/chat/completions | OpenAI-compatible, sync or streaming |
| Image / Video | POST /v1/generations | Unified async resource with long-polling |
| Text-to-speech | POST /v1/tts/synthesize | Returns audio bytes |
| Speech-to-text | POST /v1/stt/transcribe | Multipart upload, JSON transcript |

Chat completions

OpenAI-compatible LLM inference with streaming.
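A stdlib-only sketch of a synchronous chat completion request. The model name "gpt-4o" is illustrative, not a guarantee of availability; for streaming, the OpenAI-compatible convention is to add "stream": true to the body.

```python
import json
import urllib.request

def chat_request(api_key: str, messages: list,
                 model: str = "gpt-4o") -> urllib.request.Request:
    # Plain model name only: the gateway picks the provider for you.
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://gateway.llm-stats.com/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = chat_request("ze_your_key", [{"role": "user", "content": "Hello"}])
    with urllib.request.urlopen(req) as resp:
        # Response follows the OpenAI chat-completion shape.
        print(json.load(resp)["choices"][0]["message"]["content"])
```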

Image & video generations

Single async API for image and video models.
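Because generations are an async resource, the client creates one and then polls it. The loop below is a generic polling sketch; the "status" field and the "succeeded"/"failed" values are assumptions for illustration, so check the Image & video generations reference for the actual resource schema.

```python
import time

def poll(fetch, interval_s: float = 2.0, timeout_s: float = 300.0) -> dict:
    # `fetch` should return the current generation resource as a dict
    # (e.g. by GETting the resource you created with POST /v1/generations).
    # Field names here are assumptions, not confirmed by this page.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resource = fetch()
        if resource.get("status") in ("succeeded", "failed"):
            return resource  # terminal state: stop polling
        time.sleep(interval_s)
    raise TimeoutError("generation did not finish before the timeout")
```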

Text-to-speech

Synthesize audio from text.
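A sketch of a synthesis call, assuming a JSON body; the "text" and "voice" field names are illustrative assumptions, so consult the Text-to-speech reference for the exact schema. The endpoint returns raw audio bytes.

```python
import json
import urllib.request

def tts_request(api_key: str, text: str) -> urllib.request.Request:
    # Body fields ("text", "voice") are assumed for illustration.
    body = json.dumps({"text": text, "voice": "default"}).encode()
    return urllib.request.Request(
        "https://gateway.llm-stats.com/v1/tts/synthesize",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    with urllib.request.urlopen(tts_request("ze_your_key", "Hello")) as resp:
        audio = resp.read()  # raw audio bytes, per the endpoint's contract
    with open("out.audio", "wb") as f:  # container format depends on the response
        f.write(audio)
```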

Speech-to-text

Transcribe audio files to text.
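Transcription takes a multipart upload. Below is a minimal stdlib multipart/form-data encoder; the "file" field name is an assumption for illustration, so check the Speech-to-text reference for the exact schema.

```python
import uuid

def multipart_body(field: str, filename: str, data: bytes,
                   content_type: str = "audio/wav") -> tuple:
    # Encode a single file part as multipart/form-data (RFC 7578 framing).
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# Usage sketch: POST the body to /v1/stt/transcribe with the returned
# Content-Type plus your Authorization header; the response is a JSON transcript.
```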

Errors

Errors share a single envelope across every modality:
{
  "error": {
    "code": "invalid_input",
    "message": "Human-readable explanation.",
    "param": "model"
  }
}
The code field is the contract: branch on it, never on message. See the full table in Errors.
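For example, a client can dispatch on the stable code field like this (the handling shown for invalid_input is illustrative):

```python
def handle_error(envelope: dict) -> None:
    # Branch on `code`, which is stable; `message` is free-form text
    # and may change without notice.
    err = envelope["error"]
    if err["code"] == "invalid_input":
        # `param` names the offending request field when applicable.
        raise ValueError(f"bad request field: {err.get('param')}")
    # Any code you don't handle explicitly: surface it as-is.
    raise RuntimeError(err["code"])
```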

Rate limits

Every response carries X-RateLimit-* headers so you can back off without guessing. When you exceed your quota, the gateway returns HTTP 429 with the standard error envelope and a Retry-After header.
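A backoff sketch based on those headers. The page documents the X-RateLimit-* prefix and Retry-After; the specific Remaining/Reset header names below are assumptions for illustration.

```python
def backoff_from_headers(status: int, headers: dict) -> float:
    # Returns how many seconds to sleep before the next request.
    if status == 429:
        # Quota exceeded: the gateway tells you exactly how long to wait.
        return float(headers.get("Retry-After", 1.0))
    remaining = headers.get("X-RateLimit-Remaining")  # assumed header name
    if remaining is not None and int(remaining) == 0:
        # Out of budget but not yet rejected: wait for the window to reset.
        return float(headers.get("X-RateLimit-Reset", 1.0))  # assumed header name
    return 0.0  # budget left: no need to wait
```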