Quickstart
Go from API key to your first smart-routed LLM call. Drop in one endpoint, get automatic model selection that saves you up to 80% on AI costs.
Get your API key
Sign up for a free account and grab your API key in one step. No credit card is required to start. The Free Demo is a one-time guided onboarding flow that runs one example workflow on open-source models; upgrade for continued usage.
Create Tenant
Already have a key? Skip to Step 2.
Make your first call
The gateway is OpenAI-compatible. Set model: "auto" and we'll pick the cheapest model that can handle your request.
curl -X POST https://api.zero-human-labs.com/api/v1/gateway/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{
"model": "auto",
"messages": [
{"role": "user", "content": "Translate hello to French"}
]
}'
Python (requests)
import requests
response = requests.post(
"https://api.zero-human-labs.com/api/v1/gateway/chat/completions",
headers={
"Content-Type": "application/json",
"X-API-Key": "YOUR_API_KEY",
},
json={
"model": "auto",
"messages": [
{"role": "user", "content": "Translate hello to French"}
],
},
)
data = response.json()
print(data["choices"][0]["message"]["content"])
print(f"Routed to: {data.get('x_routed_model', 'N/A')}")
JavaScript (fetch)
const response = await fetch(
"https://api.zero-human-labs.com/api/v1/gateway/chat/completions",
{
method: "POST",
headers: {
"Content-Type": "application/json",
"X-API-Key": "YOUR_API_KEY",
},
body: JSON.stringify({
model: "auto",
messages: [
{ role: "user", content: "Translate hello to French" },
],
}),
}
);
const data = await response.json();
console.log(data.choices[0].message.content);
console.log("Routed to:", data.x_routed_model);
See cost savings
The response includes extra fields that show exactly what happened. Here's what you'll get back:
{
"id": "chatcmpl-abc123...",
"object": "chat.completion",
"model": "gpt-4.1-nano",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Bonjour"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 3,
"total_tokens": 15
},
"x_cache": "MISS",
"x_routed_model": "gpt-4.1-nano",
"x_routing": "auto-routed: simple -> gpt-4.1-nano"
}
Response fields
x_cache: MISS on first request. Repeat the same query and it becomes HIT, served from cache at zero cost.
x_routed_model: The model that actually handled your request. You asked for "auto"; we picked gpt-4.1-nano because the query was simple.
x_routing: Why that model was chosen. Shows the complexity classification and routing decision.
usage.total_tokens: What you'll be billed for. Nano tokens cost ~80% less than GPT-4o tokens.
The savings: You asked for auto, and we routed to gpt-4.1-nano because "Translate hello to French" is a simple task. You saved ~80% vs sending it to GPT-4o. Same result, fraction of the cost.
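The cache behavior above is easy to verify yourself: send the same query twice and compare x_cache. A minimal sketch in Python, assuming the endpoint and headers from the earlier example (the `build_payload` and `ask` helpers are illustrative names, not part of any SDK):

```python
import requests

API_URL = "https://api.zero-human-labs.com/api/v1/gateway/chat/completions"
API_KEY = "YOUR_API_KEY"


def build_payload(content: str) -> dict:
    """Construct an auto-routed chat completion request body."""
    return {"model": "auto", "messages": [{"role": "user", "content": content}]}


def ask(content: str) -> tuple[str, str]:
    """Send one request; return (answer, cache status)."""
    resp = requests.post(
        API_URL,
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        json=build_payload(content),
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["choices"][0]["message"]["content"], data.get("x_cache", "N/A")


# Usage (requires a real key):
#   answer, cache = ask("Translate hello to French")   # expect cache "MISS"
#   answer, cache = ask("Translate hello to French")   # identical repeat: "HIT"
```

The second, identical call should come back with x_cache set to HIT and cost nothing.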
Try smart routing
The router classifies each request by complexity and picks the cheapest adequate model. Try these three examples to see it in action:
"Translate hello to French"
Short, single-step task. Translation, classification, and simple Q&A all route to the cheapest model.
"Write a Python function to sort a list using mergesort"
Multi-step generation. Code writing, structured content, and moderate reasoning get a mid-tier model.
"Review this code for security vulnerabilities and suggest fixes"
Deep reasoning required. Code review, architecture, multi-step analysis, and long-form writing get a frontier model.
The router uses content patterns, message length, and conversation depth to classify complexity. You can also override with "budget": "low" or "budget": "high" to force a tier.
Check your usage
See your requests, token counts, and costs with the stats endpoint:
curl https://api.zero-human-labs.com/api/v1/gateway/stats \
  -H "X-API-Key: YOUR_API_KEY"
Returns total requests, tokens used, costs by model, and cache statistics. Use /api/v1/gateway/requests to see individual request logs with routing decisions.
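The same two endpoints are easy to poll from Python. A minimal sketch, assuming both accept the X-API-Key header and return JSON (`get_json` is an illustrative helper, not part of any SDK):

```python
import requests

BASE = "https://api.zero-human-labs.com/api/v1/gateway"


def get_json(path: str, api_key: str) -> dict:
    """GET a gateway endpoint and return the parsed JSON body."""
    resp = requests.get(f"{BASE}{path}", headers={"X-API-Key": api_key}, timeout=30)
    resp.raise_for_status()
    return resp.json()


# Usage (requires a real key):
#   stats = get_json("/stats", KEY)     # totals, per-model costs, cache stats
#   logs = get_json("/requests", KEY)   # individual requests with routing decisions
```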
Next steps
OpenAI-compatible. Drop-in replacement.
Already using the OpenAI API? Just change the base URL and set model: "auto". Your existing code works — and starts saving money immediately.