5-minute guide

Quickstart

Go from API key to your first smart-routed LLM call. Drop in one endpoint, get automatic model selection that saves you up to 80% on AI costs.

1. Get your API key

Sign up for a free account and grab your API key in one step. No credit card required to start. The Free Demo is a one-time guided onboarding flow that runs one example workflow on open-source models; upgrade afterwards for continued usage.

Create Tenant

Already have a key? Skip to Step 2.

2. Make your first call

The gateway is OpenAI-compatible. Set model: "auto" and we'll pick the cheapest model that can handle your request.

curl
curl -X POST https://api.zero-human-labs.com/api/v1/gateway/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Translate hello to French"}
    ]
  }'
Python (requests)
quickstart.py
import requests

response = requests.post(
    "https://api.zero-human-labs.com/api/v1/gateway/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "YOUR_API_KEY",
    },
    json={
        "model": "auto",
        "messages": [
            {"role": "user", "content": "Translate hello to French"}
        ],
    },
)

data = response.json()
print(data["choices"][0]["message"]["content"])
print(f"Routed to: {data.get('x_routed_model', 'N/A')}")
JavaScript (fetch)
quickstart.js
const response = await fetch(
  "https://api.zero-human-labs.com/api/v1/gateway/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": "YOUR_API_KEY",
    },
    body: JSON.stringify({
      model: "auto",
      messages: [
        { role: "user", content: "Translate hello to French" },
      ],
    }),
  }
);

const data = await response.json();
console.log(data.choices[0].message.content);
console.log("Routed to:", data.x_routed_model);
3. See cost savings

The response includes extra fields that show exactly what happened. Here's what you'll get back:

response.json
{
  "id": "chatcmpl-abc123...",
  "object": "chat.completion",
  "model": "gpt-4.1-nano",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Bonjour"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 3,
    "total_tokens": 15
  },
  "x_cache": "MISS",
  "x_routed_model": "gpt-4.1-nano",
  "x_routing": "auto-routed: simple -> gpt-4.1-nano"
}

Response fields

x_cache: MISS on first request. Repeat the same query and it becomes HIT, served from cache at zero cost.
x_routed_model: The model that actually handled your request. You asked for "auto"; we picked gpt-4.1-nano because the query was simple.
x_routing: Why that model was chosen. Shows the complexity classification and routing decision.
usage.total_tokens: What you'll be billed for. Nano tokens cost ~80% less than GPT-4o tokens.

The savings: You asked for auto, we routed to gpt-4.1-nano because "Translate hello to French" is a simple task. You saved ~80% vs sending it to GPT-4o. Same result, fraction of the cost.
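Pulling that routing metadata out of each response takes one lookup per field. A minimal helper, assuming the response shape shown above (the x_* fields are gateway extensions, not part of the standard OpenAI schema):

```python
def routing_summary(data: dict) -> dict:
    """Extract the gateway's routing metadata from a chat completion response."""
    return {
        "model": data.get("x_routed_model"),
        "cache": data.get("x_cache"),  # "HIT" or "MISS"
        "why": data.get("x_routing"),
        "tokens": data.get("usage", {}).get("total_tokens"),
    }

# Using the sample response above:
sample = {
    "x_cache": "MISS",
    "x_routed_model": "gpt-4.1-nano",
    "x_routing": "auto-routed: simple -> gpt-4.1-nano",
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}
print(routing_summary(sample))
# {'model': 'gpt-4.1-nano', 'cache': 'MISS', 'why': 'auto-routed: simple -> gpt-4.1-nano', 'tokens': 15}
```

Using .get() keeps the helper safe on responses from providers or proxies that omit the extension fields.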

4. Try smart routing

The router classifies each request by complexity and picks the cheapest adequate model. Try these three examples to see it in action:

Simple: routes to gpt-4.1-nano
"Translate hello to French"

Short, single-step task. Translation, classification, simple Q&A all route to the cheapest model.

Medium: routes to gpt-4.1-mini
"Write a Python function to sort a list using mergesort"

Multi-step generation. Code writing, structured content, and moderate reasoning get a mid-tier model.

Complex: routes to gpt-4o / claude-sonnet-4
"Review this code for security vulnerabilities and suggest fixes"

Deep reasoning required. Code review, architecture, multi-step analysis, and long-form writing get a frontier model.

The router uses content patterns, message length, and conversation depth to classify complexity. You can also override with "budget": "low" or "budget": "high" to force a tier.

5. Check your usage

See your requests, token counts, and costs with the stats endpoint:

curl
curl https://api.zero-human-labs.com/api/v1/gateway/stats \
  -H "X-API-Key: YOUR_API_KEY"

Returns total requests, tokens used, costs by model, and cache statistics. Use /api/v1/gateway/requests to see individual request logs with routing decisions.
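The same stats call in Python, a sketch using requests that mirrors the curl above:

```python
import requests


def get_stats(api_key: str) -> dict:
    """Fetch aggregate usage stats (requests, tokens, costs, cache) from the gateway."""
    resp = requests.get(
        "https://api.zero-human-labs.com/api/v1/gateway/stats",
        headers={"X-API-Key": api_key},
        timeout=10,
    )
    resp.raise_for_status()  # surface auth or availability errors early
    return resp.json()
```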

6. Next steps

OpenAI-compatible. Drop-in replacement.

Already using the OpenAI API? Just change the base URL and set model: "auto". Your existing code works — and starts saving money immediately.