Quickstart
Request body
The request body follows the OpenAI chat completions schema. The most common fields:

| Field | Description |
|---|---|
| `model` | Model ID without any provider prefix (e.g. `gpt-4o`, `claude-opus-4-6`, `llama-4-405b`). The router resolves it to the best provider. |
| `messages` | Conversation history. Each message has a `role` (`system`, `user`, `assistant`, or `tool`) and `content` (a string, or content parts for multimodal models). |
| `stream` | When `true`, the response is a Server-Sent Events stream of incremental deltas, using the same wire format as OpenAI. |
| `temperature` | Sampling temperature. Range and behavior depend on the underlying model. |
| `max_tokens` | Maximum number of tokens to generate. Capped per model where the provider enforces a limit. |
| `tools` | Tool / function definitions. Forwarded verbatim to providers that support tool calling. |
| `response_format` | Set `{ "type": "json_object" }` for JSON mode, or `{ "type": "json_schema", "json_schema": {...} }` for structured outputs (where supported). |

Anything else (`top_p`, `presence_penalty`, `frequency_penalty`, `seed`, `logprobs`, `stop`, `user`, …) is forwarded to the provider when supported.
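As a sketch, a minimal text-only request body can be assembled like this. The model name and message contents are arbitrary examples, and actually sending it requires your gateway URL and API key, which this document does not specify:

```python
import json

# A minimal chat completions request body. The field names follow the
# OpenAI schema; the model name and messages are illustrative placeholders.
payload = {
    "model": "gpt-4o",  # no provider prefix; the router resolves the provider
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name three uses for a paperclip."},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

body = json.dumps(payload)  # serialized, ready to POST
```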
Response
The response is an OpenAI `chat.completion` object:
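An illustrative (not verbatim) example of the shape, with placeholder IDs, timestamps, and token counts:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16 }
}
```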
Streaming
Set `stream: true` to receive a `text/event-stream` of `chat.completion.chunk` events:
The stream ends with `data: [DONE]`, exactly like OpenAI. Failover is handled before the first byte goes out: once streaming starts, the connection sticks with the chosen provider.
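A minimal consumer of that stream can be sketched in Python. The chunk shape follows the OpenAI delta format; this helper only handles the happy path and ignores SSE keep-alive lines:

```python
import json

def assemble_stream(lines):
    """Collect assistant text from decoded SSE lines of a
    chat.completion.chunk stream, stopping at the [DONE] sentinel."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel, same as OpenAI
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # first chunk may carry only a role
    return "".join(text)
```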
Multimodal input
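For illustration, a user message that pairs text with an image might look like the sketch below (the model name and image URL are placeholders):

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this picture?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/cat.png" } }
      ]
    }
  ]
}
```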
Models that accept images, audio, or files use the standard OpenAI content-parts shape.

Errors
Failures use the shared error envelope. Provider-classified failures map to standard HTTP statuses:

| Status | Meaning |
|---|---|
| 400 | Validation failed (`max_tokens` too high, malformed messages, …). |
| 401 | Missing or invalid API key. |
| 403 | Key has no access to that model. |
| 429 | Rate limited or quota exceeded; retry after `Retry-After` seconds. |
| 502 | All providers returned errors. Includes the last upstream message. |
| 504 | All providers timed out. Safe to retry. |
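The retry semantics above can be sketched as a small helper. The retryable status set and the default backoff are assumptions for illustration, not part of the gateway's contract:

```python
RETRYABLE = {429, 504}  # rate limited / all providers timed out

def retry_delay(status, headers, default=1.0):
    """Return seconds to wait before retrying, or None if not retryable.

    429 honors the Retry-After header when present; 504 is safe to retry
    after a short default backoff. 400/401/403 are terminal, and 502
    carries the last upstream error, which needs inspection first.
    """
    if status not in RETRYABLE:
        return None
    if status == 429:
        try:
            return float(headers.get("Retry-After", default))
        except ValueError:
            return default  # unparseable header value; fall back
    return default
```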