Every gateway error — across chat, generations, TTS, and STT — uses the same shape, so you only need to write the handling code once.

Envelope

{
  "error": {
    "code": "invalid_input",
    "message": "Human-readable explanation.",
    "param": "model"
  }
}
`error.code` (string): Stable, machine-readable error code. This is the contract: branch on it in your code.
`error.message` (string): Human-readable explanation. Display it to operators, log it, but never parse it.
`error.param` (string | null): Field that caused the error, when applicable (e.g. "model", "input.prompt").
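
As a minimal sketch of reading the envelope described above, the helper below turns a decoded JSON body into a small typed object. The names `GatewayError` and `parse_error` are hypothetical, and the fallback to `internal_error` for an unrecognizable body is an assumption rather than documented gateway behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayError:
    """Hypothetical container for the three envelope fields."""
    code: str
    message: str
    param: Optional[str] = None


def parse_error(body: object) -> GatewayError:
    """Extract the error envelope from an already-decoded JSON body."""
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        # Assumption: treat an unrecognizable body as an internal error.
        return GatewayError(code="internal_error", message="Unparseable error body")
    return GatewayError(
        code=str(err.get("code", "internal_error")),
        message=str(err.get("message", "")),
        param=err.get("param"),
    )
```

Callers would then branch on `.code` and only log or display `.message`.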

Codes

| error.code | HTTP | Meaning |
| --- | --- | --- |
| `invalid_input` | 400 | Validation failed. Read `param` for the offending field. |
| `unauthenticated` | 401 | Missing, malformed, or revoked API key. |
| `insufficient_quota` | 402 | Account out of credit or over plan limits. Top up to retry. |
| `model_unavailable` | 403 | Model isn’t enabled for your account or doesn’t exist. |
| `not_found` | 404 | Unknown resource id (e.g. on `GET /v1/generations/{id}`). |
| `content_policy` | 422 | Provider rejected the request for safety / policy reasons. |
| `rate_limited` | 429 | Slow down. Use `Retry-After` to back off. |
| `provider_unavailable` | 502 | Every healthy provider for this model returned an error. |
| `provider_timeout` | 504 | Every healthy provider timed out. Safe to retry. |
| `internal_error` | 500 | Bug on our side. Open a support ticket with the request id. |
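
One way to use the table in code is to keep it as a lookup and derive the retry decision from it. This is only a sketch: `EXPECTED_STATUS`, `RETRYABLE_CODES`, and `should_retry` are hypothetical names, and the retryable set is an assumption. It contains the three codes whose statuses (429, 502, 504) carry `Retry-After`, and it conservatively leaves out `internal_error`, which the defensive client at the end of this page does retry.

```python
# HTTP status documented for each error.code.
EXPECTED_STATUS = {
    "invalid_input": 400,
    "unauthenticated": 401,
    "insufficient_quota": 402,
    "model_unavailable": 403,
    "not_found": 404,
    "content_policy": 422,
    "rate_limited": 429,
    "internal_error": 500,
    "provider_unavailable": 502,
    "provider_timeout": 504,
}

# Assumption: only these codes are retried automatically.
RETRYABLE_CODES = frozenset(
    {"rate_limited", "provider_unavailable", "provider_timeout"}
)


def should_retry(code: str) -> bool:
    """Return True for codes in the retryable set; unknown codes are not retried."""
    return code in RETRYABLE_CODES
```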

Headers worth handling

| Header | When | What to do |
| --- | --- | --- |
| `Retry-After` | 429, 502, 504 | Wait that many seconds before retrying. |
| `X-RateLimit-Limit` | every response | Your bucket size for the current window. |
| `X-RateLimit-Remaining` | every response | Requests remaining in the current window. |
| `X-RateLimit-Reset` | every response | Unix seconds until the bucket refills. |
| `X-Request-Id` | every response | Cite this id when contacting support. |
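
The headers above can be collected into one small structure for logging or client-side throttling. The sketch below assumes the numeric headers hold plain integers and makes no attempt to interpret `X-RateLimit-Reset` beyond parsing it; `rate_limit_snapshot` is a hypothetical helper name.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value as an integer, returning None if absent or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def rate_limit_snapshot(headers: Mapping[str, str]) -> dict:
    """Collect the rate-limit and request-id headers from a response."""
    return {
        "limit": _to_int(headers.get("X-RateLimit-Limit")),
        "remaining": _to_int(headers.get("X-RateLimit-Remaining")),
        "reset": _to_int(headers.get("X-RateLimit-Reset")),
        "request_id": headers.get("X-Request-Id"),
    }
```

With the `requests` library, `res.headers` is a case-insensitive mapping and can be passed in directly; a plain `dict` must use the exact header casing shown.
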
1. Honor `Retry-After` first

For 429, 502, and 504, sleep for the value of `Retry-After` (plus a small jitter) before the next attempt.

2. Don’t retry terminal codes

400, 401, 402, 403, 404, and 422 are caller errors; retrying the same request won’t help. Surface them to the caller.

3. Bound everything

Apply an outer deadline to every request and cap retries (for example, at 3 attempts) so a degraded backend can’t trap your client in a retry loop.
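
The backoff and deadline rules above can be sketched as two small helpers. The names `retry_delay` and `sleep_before_retry`, the 0.25-second jitter, and the exponential fallback when `Retry-After` is missing or not a plain number are all assumptions for illustration, not documented gateway behavior.

```python
import math
import random
import time
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int, jitter: float = 0.25) -> float:
    """Seconds to wait before the next attempt.

    Uses Retry-After when it is a finite, non-negative number of seconds;
    otherwise falls back to exponential backoff (1s, 2s, 4s, ...).
    Adds up to `jitter` seconds of random delay.
    """
    fallback = float(2 ** attempt)
    try:
        base = float(retry_after) if retry_after is not None else fallback
    except ValueError:
        base = fallback  # e.g. the header carried an HTTP date instead
    if not math.isfinite(base) or base < 0:
        base = fallback
    return base + random.uniform(0.0, jitter)


def sleep_before_retry(delay: float, deadline: float) -> bool:
    """Sleep for `delay` seconds unless that would overrun `deadline`.

    `deadline` is a time.monotonic() timestamp. Returns False without
    sleeping when the remaining budget is too small, so the caller can give up.
    """
    if time.monotonic() + delay >= deadline:
        return False
    time.sleep(delay)
    return True
```

A caller might set `deadline = time.monotonic() + 30` once, before the first attempt, and stop retrying as soon as `sleep_before_retry` returns `False`.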

Example: defensive client

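import random
import time

import requests

# Codes documented as permanent: retrying the same request won't help.
TERMINAL_CODES = {"invalid_input", "unauthenticated", "insufficient_quota",
                  "model_unavailable", "not_found", "content_policy"}


def post_with_retries(url, payload, api_key, max_attempts=3):
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(max_attempts):
        res = requests.post(url, headers=headers, json=payload, timeout=120)
        if res.status_code < 400:
            return res.json()

        # An intermediary may answer with a body that isn't the JSON envelope.
        try:
            body = res.json()["error"]
            code = body["code"]
        except (ValueError, KeyError, TypeError):
            body, code = {}, "internal_error"

        # Permanent: don't retry.
        if code in TERMINAL_CODES:
            raise RuntimeError(f"{code}: {body.get('message')}")

        # Transient: back off, but don't sleep after the final attempt.
        if attempt == max_attempts - 1:
            break
        try:
            sleep_s = float(res.headers.get("Retry-After", 2 ** attempt))
        except ValueError:  # Retry-After was not a plain number of seconds
            sleep_s = 2 ** attempt
        time.sleep(sleep_s + random.uniform(0, 0.25))  # small jitter

    raise RuntimeError("Exhausted retries")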