5-minute guide

Quickstart

Go from API key to your first smart-routed LLM call. Drop in one endpoint, get automatic model selection that saves you up to 80% on AI costs.

1. Get your API key

Sign up for a free account and grab your API key in one step. No credit card required to start. The Free Demo is a one-time guided onboarding flow that runs one example workflow on open-source models; upgrade afterwards for continued usage.

Create Tenant

Already have a key? Skip to Step 2.

2. Make your first call

The gateway is OpenAI-compatible. Set model: "auto" and we'll pick the cheapest model that can handle your request.

curl
curl -X POST https://api.zero-human-labs.com/api/v1/gateway/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Translate hello to French"}
    ]
  }'
Python (requests)
quickstart.py
import requests

response = requests.post(
    "https://api.zero-human-labs.com/api/v1/gateway/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "YOUR_API_KEY",
    },
    json={
        "model": "auto",
        "messages": [
            {"role": "user", "content": "Translate hello to French"}
        ],
    },
)

data = response.json()
print(data["choices"][0]["message"]["content"])
print(f"Routed to: {data.get('x_routed_model', 'N/A')}")
JavaScript (fetch)
quickstart.js
const response = await fetch(
  "https://api.zero-human-labs.com/api/v1/gateway/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": "YOUR_API_KEY",
    },
    body: JSON.stringify({
      model: "auto",
      messages: [
        { role: "user", content: "Translate hello to French" },
      ],
    }),
  }
);

const data = await response.json();
console.log(data.choices[0].message.content);
console.log("Routed to:", data.x_routed_model);
3. See cost savings

The response includes extra fields that show exactly what happened. Here's what you'll get back:

response.json
{
  "id": "chatcmpl-abc123...",
  "object": "chat.completion",
  "model": "gpt-4.1-nano",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Bonjour"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 3,
    "total_tokens": 15
  },
  "x_cache": "MISS",
  "x_routed_model": "gpt-4.1-nano",
  "x_routing": "auto-routed: simple -> gpt-4.1-nano"
}

Response fields

x_cache: MISS on first request. Repeat the same query and it becomes HIT, served from cache at zero cost.
x_routed_model: The model that actually handled your request. You asked for "auto"; we picked gpt-4.1-nano because the query was simple.
x_routing: Why that model was chosen. Shows the complexity classification and routing decision.
usage.total_tokens: What you'll be billed for. Nano tokens cost ~80% less than GPT-4o tokens.

The savings: You asked for auto, we routed to gpt-4.1-nano because "Translate hello to French" is a simple task. You saved ~80% vs sending it to GPT-4o. Same result, fraction of the cost.
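Pulling that routing metadata out of each response takes one lookup per field. A minimal helper, assuming the response shape shown above (the x_* fields are gateway extensions, not part of the standard OpenAI schema):

```python
def routing_summary(data: dict) -> dict:
    """Extract the gateway's routing metadata from a chat completion response."""
    return {
        "model": data.get("x_routed_model"),
        "cache": data.get("x_cache"),  # "HIT" or "MISS"
        "why": data.get("x_routing"),
        "tokens": data.get("usage", {}).get("total_tokens"),
    }

# Using the sample response above:
sample = {
    "x_cache": "MISS",
    "x_routed_model": "gpt-4.1-nano",
    "x_routing": "auto-routed: simple -> gpt-4.1-nano",
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}
print(routing_summary(sample))
# {'model': 'gpt-4.1-nano', 'cache': 'MISS', 'why': 'auto-routed: simple -> gpt-4.1-nano', 'tokens': 15}
```

Using .get() keeps the helper safe on responses from providers or proxies that omit the extension fields.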

4. Try smart routing

The router classifies each request by complexity and picks the cheapest adequate model. Try these three examples to see it in action:

Simple: routes to gpt-4.1-nano
"Translate hello to French"

Short, single-step task. Translation, classification, simple Q&A all route to the cheapest model.

Medium: routes to gpt-4.1-mini
"Write a Python function to sort a list using mergesort"

Multi-step generation. Code writing, structured content, and moderate reasoning get a mid-tier model.

Complex: routes to gpt-4o / claude-sonnet-4
"Review this code for security vulnerabilities and suggest fixes"

Deep reasoning required. Code review, architecture, multi-step analysis, and long-form writing get a frontier model.

The router uses content patterns, message length, and conversation depth to classify complexity. You can also override with "budget": "low" or "budget": "high" to force a tier.

5. Check your usage

See your requests, token counts, and costs with the stats endpoint:

curl
curl https://api.zero-human-labs.com/api/v1/gateway/stats \
  -H "X-API-Key: YOUR_API_KEY"

Returns total requests, tokens used, costs by model, and cache statistics. Use /api/v1/gateway/requests to see individual request logs with routing decisions.
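The same stats call in Python, a sketch using requests that mirrors the curl above:

```python
import requests


def get_stats(api_key: str) -> dict:
    """Fetch aggregate usage stats (requests, tokens, costs, cache) from the gateway."""
    resp = requests.get(
        "https://api.zero-human-labs.com/api/v1/gateway/stats",
        headers={"X-API-Key": api_key},
        timeout=10,
    )
    resp.raise_for_status()  # surface auth or availability errors early
    return resp.json()
```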

6. Next steps

OpenAI-compatible. Drop-in replacement.

Already using the OpenAI API? Just change the base URL and set model: "auto". Your existing code works — and starts saving money immediately.