POST https://gateway.llm-stats.com/v1/chat/completions
The chat endpoint is drop-in compatible with the OpenAI SDK. Point any OpenAI client at our base URL and your existing code keeps working — we route the model to the best healthy provider behind the scenes.

Quickstart

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.llm-stats.com/v1",  # the gateway, not api.openai.com
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    max_tokens=1024,
)

print(response.choices[0].message.content)

Request body

The request body follows the OpenAI chat completions schema. The most common fields:
model (string, required)
Model ID without any provider prefix (e.g. gpt-4o, claude-opus-4-6, llama-4-405b). The router resolves it to the best provider.

messages (array, required)
Conversation history. Each message has a role (system, user, assistant, or tool) and content (a string, or content parts for multimodal models).

stream (boolean, default false)
When true, the response is a Server-Sent Events stream of incremental deltas, using the same wire format as OpenAI.

temperature (number)
Sampling temperature. Range and behavior depend on the underlying model.

max_tokens (integer)
Maximum number of tokens to generate. Capped per model where the provider enforces a limit.

tools (array)
Tool / function definitions. Forwarded verbatim to providers that support tool calling.

response_format (object)
Set { "type": "json_object" } for JSON mode, or { "type": "json_schema", "json_schema": {...} } for structured outputs where supported (see the sketch below).
Any other OpenAI-supported parameter (top_p, presence_penalty, frequency_penalty, seed, logprobs, stop, user, …) is forwarded to the provider when supported.
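As an example, here is a minimal structured-outputs request written against the quickstart client above. It is a sketch: the weather_report schema is purely illustrative, and whether json_schema is honored depends on the underlying model.

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Summarize today's weather in Berlin."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",  # illustrative schema, not part of the gateway API
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "summary": {"type": "string"},
                },
                "required": ["city", "summary"],
            },
        },
    },
)

print(response.choices[0].message.content)  # a JSON string matching the schema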

Response

The response is an OpenAI chat.completion object:
{
  "id": "chatcmpl_…",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "claude-opus-4-6",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hi! …" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 27,
    "total_tokens": 39
  }
}
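The SDK exposes the same fields as attributes, so the usage block above can be read directly:

print(response.usage.prompt_tokens)      # 12
print(response.usage.completion_tokens)  # 27
print(response.usage.total_tokens)       # 39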

Streaming

Set stream: true to receive a text/event-stream of chat.completion.chunk events:
stream = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Stream this."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
The stream terminates with data: [DONE], exactly like OpenAI. Failover is handled before the first byte goes out — once streaming starts, the connection sticks with the chosen provider.
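If you need the complete message as well as live output, one common pattern is to accumulate the deltas while printing. A sketch, using the same stream as above; the guard skips chunks that carry no content delta:

parts = []
for chunk in stream:
    # Some chunks (e.g. the final one) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        delta = chunk.choices[0].delta.content
        parts.append(delta)
        print(delta, end="", flush=True)

full_reply = "".join(parts)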

Multimodal input

Models that accept images, audio, or files use the standard OpenAI content-parts shape:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    {
      "type": "image_url",
      "image_url": { "url": "https://example.com/cat.png" }
    }
  ]
}
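Through the SDK, that shape drops straight into messages. A sketch, assuming an image-capable model and a placeholder image URL:

response = client.chat.completions.create(
    model="claude-opus-4-6",  # any image-capable model ID works here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)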

Errors

Failures use the shared error envelope. Provider-classified failures map to standard HTTP statuses:
Status  Meaning
400     Validation failed (max_tokens too high, malformed messages, …).
401     Missing or invalid API key.
403     Key has no access to that model.
429     Rate limited or quota exceeded; retry after Retry-After seconds.
502     All providers returned errors. Includes the last upstream message.
504     All providers timed out. Safe to retry.
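With the OpenAI SDK these statuses surface as typed exceptions, so a retry loop can honor them directly. A minimal sketch reusing the quickstart client; the three-attempt exponential backoff is an assumption, not a gateway requirement:

import time

import openai

for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="claude-opus-4-6",
            messages=[{"role": "user", "content": "Hello, how are you?"}],
        )
        break
    except openai.RateLimitError as e:
        # 429: wait for Retry-After if the gateway sent it, else back off.
        wait = float(e.response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    except openai.APIStatusError as e:
        if e.status_code == 504:
            # All providers timed out; per the table above, safe to retry.
            time.sleep(2 ** attempt)
        else:
            raise  # 400/401/403/502: retrying won't help without changes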