Base URL
All endpoints are versioned under the /v1 path prefix.
Authentication
Send your API key as a Bearer token on every request.

How routing works
You send one request
A normal request to a modality endpoint with a
model field: no provider
prefix, no provider-specific quirks.

We pick a healthy provider
The router scores every provider serving that model on live latency,
throughput, error rate, and capacity, then dispatches to the best one.
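The two steps above can be sketched with the standard library. The host, API key, and model name are placeholders; only the /v1 prefix, the Bearer header, and the bare `model` field come from this page:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder host; only /v1 is documented

def build_chat_request(api_key, model, messages):
    """One routed request: a bare model name, no provider prefix anywhere."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # API key as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("sk-...", "llama-3.1-70b", [{"role": "user", "content": "Hi"}])
# urllib.request.urlopen(req) would dispatch it; the router, not the client,
# chooses the provider.
```

Note what is absent: nothing in the request names a provider, so a provider outage never requires a client-side change.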
Endpoints at a glance
| Modality | Endpoint | Shape |
|---|---|---|
| Chat / LLM | POST /v1/chat/completions | OpenAI-compatible, sync or streaming |
| Image / Video | POST /v1/generations | Unified async resource with long-polling |
| Text-to-speech | POST /v1/tts/synthesize | Returns audio bytes |
| Speech-to-text | POST /v1/stt/transcribe | Multipart upload, JSON transcript |
Chat completions
OpenAI-compatible LLM inference with streaming.
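A minimal sketch of both modes, assuming the OpenAI-compatible convention that streaming responses arrive as SSE `data:` lines ending with a `[DONE]` sentinel:

```python
import json

def chat_payload(model, messages, stream=False):
    """Body for POST /v1/chat/completions; stream=True requests SSE chunks."""
    payload = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    return payload

def parse_sse_line(line):
    """Decode one line of a streaming response. Returns None for blank
    keep-alive lines and for the OpenAI-style "[DONE]" sentinel."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":
        return None
    return json.loads(data)
```

A streaming client would read the response line by line, feed each line through `parse_sse_line`, and append each non-None chunk's delta content.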
Image & video generations
Single async API for image and video models.
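The async resource model usually means: create a generation, then poll until it settles. A sketch of the polling half, where `fetch` is any callable that GETs the resource as a dict, and the status names are assumptions (the real lifecycle lives in the generations reference):

```python
import time

TERMINAL = {"succeeded", "failed"}  # hypothetical terminal states

def poll_generation(fetch, generation_id, interval=2.0, timeout=120.0):
    """Long-poll a generation resource until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resource = fetch(generation_id)
        if resource.get("status") in TERMINAL:
            return resource
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"generation {generation_id} still pending after {timeout}s")
```

Video jobs tend to run far longer than image jobs, so the `timeout` and `interval` defaults deserve tuning per modality.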
Text-to-speech
Synthesize audio from text.
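Since the response body is raw audio bytes, the natural pattern is to write it straight to a file. A sketch with a placeholder host; the `voice` parameter name is an assumption:

```python
import json
import urllib.request

def tts_request(api_key, text, voice=None):
    """Build POST /v1/tts/synthesize; the response body is audio bytes."""
    body = {"text": text}
    if voice:
        body["voice"] = voice  # parameter name is an assumption
    return urllib.request.Request(
        "https://api.example.com/v1/tts/synthesize",  # placeholder host
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# with urllib.request.urlopen(tts_request("sk-...", "Hello")) as resp:
#     open("hello.audio", "wb").write(resp.read())  # body is audio, not JSON
```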
Speech-to-text
Transcribe audio files to text.
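The multipart upload can be built by hand with the standard library. The form field name (`file`) and the audio content type are assumptions:

```python
import uuid

def multipart_audio(filename, audio_bytes, field="file", content_type="audio/wav"):
    """Encode one audio file as multipart/form-data for POST /v1/stt/transcribe.
    Returns (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"
```

POST the returned body with the returned Content-Type header (plus the usual Bearer token) and parse the JSON transcript from the response.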
Errors
Errors share a single envelope across every modality. The code field is the contract: branch on it, never on message. See the full table in Errors.
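Branching on `code` might look like the sketch below. The envelope shape (`{"error": {"code": ..., "message": ...}}`) and the code names are illustrative assumptions; the authoritative list is in the Errors reference:

```python
# Hypothetical codes for illustration only.
RETRYABLE = {"rate_limited", "provider_unavailable"}

def should_retry(envelope):
    """Branch on the machine-readable code, never on the human message."""
    code = envelope.get("error", {}).get("code")
    return code in RETRYABLE
```

Messages are free to change wording at any time; codes are the stable surface, which is why string-matching on `message` breaks eventually.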
Rate limits
Every response carries X-RateLimit-* headers so you can back off without
guessing. When you exceed your quota, the gateway returns HTTP 429 with the
standard error envelope and a Retry-After header.
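A sketch of a backoff decision using those headers. `Retry-After` on 429 is documented above; the specific `X-RateLimit-Remaining` and `X-RateLimit-Reset` names are illustrative members of the `X-RateLimit-*` family:

```python
def backoff_seconds(status, headers):
    """Return seconds to wait before retrying, or None if no wait is needed."""
    if status == 429:
        # The gateway tells us exactly how long to wait.
        return float(headers.get("Retry-After", 1))
    # Field names below are assumptions within the X-RateLimit-* family.
    if int(headers.get("X-RateLimit-Remaining", 1)) == 0:
        return float(headers.get("X-RateLimit-Reset", 1))
    return None
```

Pairing this with the retryable-error check from the Errors section gives a complete retry policy with no guessing on the client side.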