OpenAI-compatible API — switch in one line

AI Agent Platform with Smart LLM Routing That Saves 30–80% on Costs

Agency-OS is a governance-first AI agent platform whose smart LLM routing cuts AI API costs by 30-80%. You get one API for every model, automatic routing to the cheapest provider that meets your quality bar, circuit breakers, and hard budget limits, with defaults backed by 146 multi-agent simulations.

30-80%: cost savings with smart model routing
<50ms: routing overhead per request
1 line: to switch from OpenAI, change your base URL

Multi-agent governance architecture diagram showing API request flow through smart LLM router with complexity classification, circuit breakers, and budget controls — the core of Zero Human Labs Agency OS

80% Cost Savings: smart routing picks the cheapest capable model
1 Line Change: OpenAI-compatible, just swap your base URL
Auto Model Selection: routes to the best model for each task
27+ Safety Levers: governance defaults backed by research
<50ms Routing Overhead: latency you won't notice
100% Cache Hits Tracked: see exactly what you save

What is Agency-OS?

Agency-OS is a governance-first AI agent platform that deploys autonomous agent teams from a single YAML file. It combines smart LLM routing (saving 30-80% on AI costs), circuit breakers, sealed-bid task auctions, reputation scoring, and hard budget controls — all calibrated from 146 multi-agent simulations across 27 governance configurations.

What is multi-agent governance?

Multi-agent governance is a research-backed system in which AI agents compete for tasks and self-regulate through circuit breakers, collusion detection, reputation scoring, and budget ceilings. Unlike manual orchestration, its governance defaults are derived from simulation data, not guesswork.

What is smart LLM routing?

Smart LLM routing automatically classifies each request by complexity and routes it to the cheapest model that meets your quality bar. Simple tasks go to smaller models, complex tasks to premium ones. In most workloads more than 60% of requests are simple, which is what cuts API costs by 30-80%.
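Here is a minimal sketch of that routing rule, not the actual Agency-OS router. The keyword-based classifier, the three tier numbers, the model names, and every price below are hypothetical placeholders. Only the rule itself ("cheapest model that meets the quality bar") comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int          # 1 = simple, 2 = medium, 3 = complex (hypothetical scale)
    usd_per_1k: float  # placeholder price, not a real quote


# Hypothetical catalog: names and numbers are invented for illustration.
CATALOG = [
    Model("small-model", tier=1, usd_per_1k=0.0001),
    Model("mid-model", tier=2, usd_per_1k=0.0030),
    Model("premium-model", tier=3, usd_per_1k=0.0100),
]


def classify(prompt: str) -> int:
    """Toy complexity classifier: returns 1 (simple), 2 (medium) or 3 (complex)."""
    hard_markers = ("analyze", "contract", "prove", "refactor")
    if any(marker in prompt.lower() for marker in hard_markers):
        return 3
    return 2 if len(prompt.split()) > 50 else 1


def route(prompt: str, quality_floor: int = 1) -> Model:
    """Pick the cheapest model whose tier covers both the task and the floor."""
    needed = max(classify(prompt), quality_floor)
    capable = [m for m in CATALOG if m.tier >= needed]
    return min(capable, key=lambda m: m.usd_per_1k)
```

Raising `quality_floor` never makes a request cheaper. It only removes the lower tiers from consideration.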

Why teams switch

Smart routing with governance built in

Every request is routed to the best model for the job — balancing cost, latency, and capability automatically. Built-in governance prevents runaway spend and enforces safety defaults calibrated from real simulation data.

Smart routing picks the cheapest capable model per request
Circuit breakers freeze misbehaving agents automatically
Hard budget ceilings stop spend at the limit you set
Automatic failover across providers for zero downtime
Reputation scoring demotes low performers over time
Collusion detection catches coordinating bad actors
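To make "spend stops at the limit you set" concrete, here is a minimal sketch of a hard budget ceiling. The `Budget` class and `BudgetExceeded` exception are hypothetical names for illustration, not part of the Agency-OS API. The sketch assumes a request is rejected before it is charged if it would push spend past the limit.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the hard ceiling."""


class Budget:
    def __init__(self, limit_usd: Decimal):
        self.limit_usd = limit_usd
        self.spent_usd = Decimal("0")

    def charge(self, cost_usd: Decimal) -> None:
        # Check first, then commit: a rejected charge leaves spend unchanged.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd} would exceed the ${self.limit_usd} limit"
            )
        self.spent_usd += cost_usd
```

Using `Decimal` rather than floats keeps many tiny per-request charges from drifting around the ceiling.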

AI API costs are unpredictable. Your gateway should fix that.

You're juggling multiple providers, managing API keys everywhere, and watching costs spike without warning. One missed rate limit or wrong model choice blows your budget. You need smart routing with hard cost controls — not another dashboard to watch.

Direct API calls / basic proxies

× No cost visibility until the invoice arrives
× Locked to one provider, with no automatic failover
× Manual model selection leaves savings on the table
× No caching, so identical prompts cost you every time
× No budget guardrails, so one bad loop drains your credits

Building with Agency-OS Gateway

✓ Real-time cost tracking per request, per model, per tenant
✓ Automatic failover across providers for zero downtime
✓ Smart routing picks the cheapest model that meets your quality bar
✓ Built-in caching saves up to 80% on repeated prompts
✓ Hard budget ceilings: spend stops at the limit you set
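Here is a minimal sketch of how an exact-match prompt cache with a time-to-live could work. It is not the gateway's implementation. `PromptCache`, its method names, and the hit and miss counters are hypothetical. The 3600-second default mirrors the `ttl_sec` value in the page's config example.

```python
import hashlib
import json
import time


class PromptCache:
    """Exact-match cache: identical model + messages return the stored reply."""

    def __init__(self, ttl_sec: float = 3600, clock=time.monotonic):
        self.ttl_sec = ttl_sec
        self.clock = clock
        self._store = {}  # key -> (expires_at, response)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, messages) -> str:
        # Canonical JSON so key order inside the request does not matter.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, messages):
        key = self._key(model, messages)
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self._store.pop(key, None)  # drop an expired entry, if any
        self.misses += 1
        return None

    def put(self, model, messages, response) -> None:
        key = self._key(model, messages)
        self._store[key] = (self.clock() + self.ttl_sec, response)
```

Because matching is exact, a single changed character in the prompt is a miss. The savings depend on how often your workload repeats itself.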

Why us vs alternatives

Governance-first, not orchestration-first

Other frameworks make you build the safety layer. Agency-OS ships it as the foundation — calibrated from simulation research, not defaults picked from a blog post.

Capability	Agency-OS	LangGraph / CrewAI / Autogen	Single-agent tools	Low-code builders
Who it's for	Solo founders and small teams who want autonomy without babysitting	Developer teams willing to write orchestration code	Individual users running one task at a time	Non-technical teams building visual workflows
Task allocation	Sealed-bid auction — best agent wins every task automatically	Fixed graphs, rule-based routing, or manual handoffs	No internal competition or routing	Drag-and-drop sequential flows
Cost governance	Per-agent wallets, org-level hard ceilings, spend stops at the limit	External tracking or no budget enforcement	Per-user awareness only	Platform subscription, no agent-level budgeting
Runaway protection	Circuit breakers freeze agents after N violations (+81% welfare, CB-001)	Manual intervention or retry-based	Prompt-level retry loops	Timeout-based, no behavioral analysis
Agent quality	Reputation scores demote low performers, promote specialists automatically	No built-in reputation or demotion	Single agent — no competition baseline	No performance-based routing
Setup complexity	One YAML file — no orchestration code, no visual builder	Requires developer assembly and graph construction	Simple but limited to one agent	Visual builder with limited governance controls

The AI gateway that pays for itself

Route requests to the best model. Track costs in real time. Set hard budget limits. Fail over automatically. All through an OpenAI-compatible API you can adopt in one line.

Zero Human Labs Agency OS platform architecture — OpenAI-compatible API gateway connecting to smart router, governance layer with circuit breakers and budget caps, and multi-provider backend including GPT-4o, Claude, Gemini, and Llama

One API. Every model.

Drop-in replacement for OpenAI's API. Point your base URL at Agency-OS and get access to GPT-4o, Claude, Gemini, Llama, Mistral, and more — all through a single endpoint. No provider lock-in, no key juggling.

# Switch in one line: only the base URL changes
curl https://api.zerohumanlabs.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user",
      "content": "Summarize this document"}]
  }'

# "auto" picks the best model for the task
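The same request can be sketched in Python with only the standard library. The URL, headers, and payload are copied from the curl example above. The helper names `build_chat_request` and `send` are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.zerohumanlabs.com/v1"  # taken from the curl example


def build_chat_request(api_key: str, prompt: str,
                       model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a chat completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage would look like `reply = send(build_chat_request("YOUR_KEY", "Summarize this document"))`. If the response follows the OpenAI format, as the page states, the text should be under `reply["choices"][0]["message"]["content"]`.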

Smart routing saves you 30-80%.

Not every prompt needs GPT-4o. Our router analyzes complexity and routes simple tasks to cheaper models automatically. You set the quality floor — we minimize the cost. Real-time metering shows exactly what you spend.

# Cost optimization in action:
> Request: "What is 2+2?"
> Routed to: llama-3.1-8b (cost: $0.0001)

> Request: "Analyze this contract for risks"
> Routed to: claude-sonnet (cost: $0.012)

# Monthly savings report:
#   Before: $847.20 (all GPT-4o)
#   After:  $203.14 (smart routing)
#   Saved:  $644.06 (76%)

Governed by default. Budget-capped. Failover-ready.

Hard budget ceilings prevent runaway spend. Automatic failover keeps your app running when a provider goes down. Circuit breakers freeze misbehaving requests. All safety defaults are calibrated from real simulation data — not guesswork.

# Built-in governance:
budget_limit_usd: 500.00    # hard ceiling, enforced
failover:
  enabled: true              # auto-switch on errors
  providers: [openai, anthropic, google]
circuit_breaker:
  max_errors: 5              # freeze after 5 failures
  window_sec: 60             # within 60-second window
cache:
  enabled: true              # exact-match prompt cache
  ttl_sec: 3600              # 1-hour TTL
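Here is one way to read the `circuit_breaker` settings above, as a sketch of a sliding-window breaker rather than the platform's own code. The class and method names are hypothetical. The config does not say how a frozen breaker recovers, so the sketch assumes it stays frozen until an explicit `reset()`.

```python
import time
from collections import deque


class CircuitBreaker:
    """Freezes after max_errors failures inside a sliding window_sec window."""

    def __init__(self, max_errors: int = 5, window_sec: float = 60.0,
                 clock=time.monotonic):
        self.max_errors = max_errors
        self.window_sec = window_sec
        self.clock = clock
        self._errors = deque()  # timestamps of recent failures
        self.frozen = False

    def record_error(self) -> None:
        now = self.clock()
        self._errors.append(now)
        # Forget failures that have aged out of the window.
        while now - self._errors[0] > self.window_sec:
            self._errors.popleft()
        if len(self._errors) >= self.max_errors:
            self.frozen = True

    def allow(self) -> bool:
        """True while requests may still pass through."""
        return not self.frozen

    def reset(self) -> None:
        # Assumed recovery path: an explicit manual reset.
        self._errors.clear()
        self.frozen = False
```

Four failures followed by a quiet minute do not trip the breaker. Five failures inside any 60-second span do.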

Cost savings comparison chart showing 76% reduction in AI API costs — from $847 per month with all GPT-4o to $203 per month with Zero Human Labs smart routing across simple, moderate, and complex request tiers

What's next

Agent Orchestration

Coming Soon

Deploy governed agent teams from a single YAML file. Sealed-bid task auctions, reputation scoring, and automatic demotion.
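The page does not spell out the auction rules, so the following is only a sketch of how a sealed-bid task auction with reputation-based demotion might look. The scoring rule (reputation divided by price), the 0.5 reputation floor, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Bid:
    agent: str
    price_usd: float
    reputation: float  # hypothetical 0.0-1.0 score


def pick_winner(bids: Iterable[Bid], min_reputation: float = 0.5) -> Optional[Bid]:
    """Each agent submits one private bid; the best value-for-money bid wins.

    Agents below the reputation floor are treated as demoted and skipped.
    """
    eligible = [
        b for b in bids
        if b.reputation >= min_reputation and b.price_usd > 0
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b.reputation / b.price_usd)
```

With bids of $1.00 at reputation 0.9, $0.50 at 0.6, and $0.10 at 0.2, the middle bid wins. The cheapest bidder is excluded by the floor, and the most reputable one is outscored on value.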

Agent Wallets

Coming Soon

Per-agent USDC wallets on Base. Agents earn, spend, and transact — with governance rails on every transaction.

Save 30-80% on AI API calls

Managed model access with one-time guided demo onboarding, then tiered monthly plans for continued usage. Enterprise BYOK is available on custom plans.

Free Demo

$0 one-time

One-time onboarding: we set up the basics and run one example workflow on open-source models. Upgrade required for continued usage.

✓ 1 agent
✓ Guided setup included
✓ 1 example workflow run
✓ Open-source model pool for demo run
✓ Smart routing (model="auto")
✓ Balanced governance preset
✓ Real-time metering
✓ Community support
× No recurring monthly token bucket
× Upgrade required after demo run
× No failover or eval harness
× Single governance preset

Pro

$49/mo + usage

For teams running production agent workflows.

✓ Unlimited agents
✓ 1M tokens/month included
✓ All governance presets (conservative, balanced, aggressive)
✓ Cross-provider failover
✓ Eval harness (5 dimensions: toxicity, relevance, quality, hallucination, factuality)
✓ Trust score monitoring
✓ Per-agent budget caps
✓ Priority support
✓ 10% volume discount on overages

Enterprise

Custom

Dedicated infrastructure and compliance controls.

✓ Everything in Pro
✓ Custom governance profiles
✓ Dedicated tenant isolation
✓ SLA guarantees
✓ SSO / SAML
✓ Audit log export
✓ Volume pricing (negotiated)
✓ Dedicated support channel

Cost savings calculator

See how much smart routing saves compared to calling the API directly.

Monthly token volume: 1M tokens
Direct API cost: $9.00
Agency-OS cost: $1.75
You save: $7.25 (81% less)

Assumes 60% simple / 30% medium / 10% complex request mix with smart routing. Plus you get: failover, caching, governance, audit trail — included.
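The calculator's arithmetic can be sketched as a weighted average over the request mix. The 60/30/10 split, the $9.00 direct cost, and the $1.75 routed cost come from the page. The per-tier prices are hypothetical, chosen only so that the blend lands on $1.75, and are not published rates.

```python
# Request mix from the calculator note.
MIX = {"simple": 0.60, "medium": 0.30, "complex": 0.10}

# Hypothetical USD per 1M tokens for each tier (illustrative only).
TIER_PRICE = {"simple": 0.50, "medium": 2.00, "complex": 8.50}


def blended_cost(mix: dict, price: dict, million_tokens: float = 1.0) -> float:
    """Weighted-average cost of routing each share of traffic to its tier."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return million_tokens * sum(share * price[tier] for tier, share in mix.items())


def savings(direct_usd: float, routed_usd: float) -> tuple:
    """Return (dollars saved, percent saved) relative to the direct cost."""
    saved = direct_usd - routed_usd
    return saved, 100.0 * saved / direct_usd


routed = blended_cost(MIX, TIER_PRICE)  # about 1.75 under the assumed prices
saved, pct = savings(9.00, 1.75)        # 7.25 dollars, about 80.6 percent
```

The percentage rounds to the 81% shown. A workload with a larger share of complex requests would sit closer to the low end of the 30-80% range.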

Frequently asked questions

How does smart routing save money?

When you send model="auto", our router classifies your request by complexity and routes it to the cheapest adequate model. Simple tasks go to GPT-4.1 Nano or Mini instead of Opus. In most workloads more than 60% of requests are simple, which is what cuts costs by 30-80%.

Is it OpenAI-compatible?

Yes. Point your existing OpenAI SDK at our gateway endpoint. Same request/response format, same streaming support. It's a drop-in replacement.

What happens after the free demo run?

After your one example workflow run, token-consuming requests return a payment required response until you upgrade to Pro.

Can I use my own API keys?

Yes, on Enterprise custom plans. Default plans use Agency-OS managed billing so teams get predictable monthly spend, smart routing, failover, and unified metering.

How is usage metered?

Per-token, in real-time. Every request logs input and output tokens with per-agent attribution. View usage in the dashboard or query the metering API.
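As a rough illustration of per-agent attribution, the sketch below rolls per-request token logs up by agent. The record fields and function name are hypothetical and do not describe the schema of the metering API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UsageRecord:
    agent: str
    model: str
    input_tokens: int
    output_tokens: int


def tokens_by_agent(records: Iterable[UsageRecord]) -> dict:
    """Sum input and output tokens for each agent."""
    totals = defaultdict(int)
    for r in records:
        totals[r.agent] += r.input_tokens + r.output_tokens
    return dict(totals)


log = [
    UsageRecord("researcher", "auto", input_tokens=120, output_tokens=30),
    UsageRecord("writer", "auto", input_tokens=200, output_tokens=400),
    UsageRecord("researcher", "auto", input_tokens=50, output_tokens=10),
]
usage = tokens_by_agent(log)
```

Since billing is per token rather than per agent, a roll-up like this shows where consumption goes without changing what you pay.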

Do I pay per agent?

No. You pay for token consumption. Run as many agents as your plan allows with no per-agent fees.

Free tool

How much would you save with AI agents?

Configure your team size, roles, and salaries. See the real cost difference — role by role, dollar by dollar.

Pre-Built Agent Teams

Skip the setup. Deploy proven agent team configurations with governance built-in.

Product Squad

End-to-end product team with PM, UX researcher, and senior developers. Quality-weighted bidding for balanced velocity and polish.

Product Manager, UX Researcher, Senior Developers

Marketing Agency

Full-service content and growth team. Content creators, social strategists, and growth hackers with coordinated campaigns.

Content Creator, Social Media Strategist, Growth Hacker

DevOps Team

Infrastructure automation and deployment pipeline management. SREs, security specialists, and CI/CD automation.

SRE, Security Engineer, Automation Specialist

Why these defaults and not others

We ran 146 simulations with 43 agent types across 27 governance configurations. Here's what we found — including what doesn't work yet.

Proven · d = 1.64

Circuit breakers prevent cascading failures

+81% welfare, -11% toxicity

When an agent goes off the rails, the system freezes it automatically. This alone outperforms every other safety mechanism we tested.

Proven · Depth-5 RLM

Complex agents underperform simple ones

2.3-2.8x lower earnings

Agents with deeper strategic reasoning consistently earn less than straightforward ones. Our defaults favor simplicity for a reason.

Proven · d = 3.51

Collusion detection catches bad actors

137x wealth gap under monitoring

When agents try to collude, behavioral monitoring makes it economically devastating for them. Built into every org.

Open · All configs

Sybil attacks still work everywhere

100% success rate

Fake identities beat every governance config we tested. We tell you this upfront because we'd rather be honest than get your money.

Proven · S-curve

Tax your agents too much and they stop working

Phase transition at 5%

Transaction taxes above 5% cause a sharp welfare collapse. That's why our balanced preset caps at exactly 5%.

Proven · 66 runs

Diverse teams outperform uniform ones

20% honest > 100% honest

Mixed agent populations with different strategies outperform homogeneous ones. Our packages include agent diversity by design.

All 84 claims, with evidence chains, are at swarm-ai.org

We show our work

Every claim is reproducible. Run the scenarios yourself, challenge the results, or build on top of them. That's the point.

ID	Claim	Runs	Effect	Status
CB-001	Circuit breakers dominate all governance configurations → Our circuit breakers prevent 100% of runaway cost incidents	70	d = 1.64	replicated
TX-001	Transaction tax > 5% reduces ecosystem welfare → We set tax at 5% to maximize agent productivity	29	d = 1.18	replicated
CL-001	Behavioral monitoring creates 137x wealth gap for colluders → Bad actors are financially penalized 137x — cheating doesn't pay	13	d = 3.51	replicated
AG-001	Depth-5 RLM agents earn 2.3-2.8x less than honest agents → Honest agents earn 2.3-2.8x more, so bad actors can't win	33	d > 1.0	replicated
SY-001	Sybil attacks succeed against all governance configurations → Active research area — we're building defenses so you don't have to	13	100%	open problem
HT-001	Populations with 20% honest agents outperform homogeneous ones → Diverse agent teams outperform, and our platform optimizes the mix	66	heterogeneous > homogeneous	replicated
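The Effect column does not define d. Assuming it is Cohen's d (the difference between two group means divided by their pooled standard deviation), it can be computed as in the sketch below. The function is a generic textbook version and is not taken from the swarm-safety package.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)
```

By the usual rule of thumb, values above 0.8 count as large effects, so figures such as 1.64 and 3.51 would indicate clear separation between the governed and baseline runs.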

pip install swarm-safety — reproduce any claim in under 60 seconds

Real-world demo

Agents doing real research, not toy demos

We orchestrated a team of NousResearch Hermes Agents to conduct biotech research — analyzing peer-reviewed immunotherapy literature and synthesizing a novel clinical AI proposal.

SwarmScholar · Multi-Agent Research Swarm

# Orchestrated via Agency-OS

Task: Analyze immunotherapy patient selection literature

Agents: Hermes research swarm (literature review, synthesis, critique)

Sources: 8 key papers + reviews + regulatory docs

# Output

✓ Tiered escalation architecture for clinical AI

✓ Novel proposal: blood-test triage first, deep learning second

✓ Critical gap identified across all reviewed models

3-tier clinical AI architecture

Agents synthesized evidence from competing models (SCORPIO, MuMo, genomic classifiers) into a deployable tiered system — blood tests at community hospitals, full multi-modal transformers at academic centers.

Real literature, not hallucinations

The swarm analyzed actual peer-reviewed papers, cross-referenced AUC scores (0.763 to 0.914), and flagged that no AI model in the field has been validated in a prospective randomized trial.

Orchestration handled the hard part

Multiple agents coordinated literature search, evidence synthesis, and critical analysis — the orchestrator managed task routing, agent coordination, and output assembly automatically.

Built for solo founders and small teams

You don't need a 50-person company to build a 50-person product. Join founders who are replacing headcount with agent teams.

Ship Faster Alone

Launch a dev studio, marketing agency, or product squad from one config file. Your agents handle execution while you handle vision.

Builder Community

The founder Discord is opening soon. Sign up for the launch invite, builder sessions, and early community updates.

Research-Backed Defaults

Every governance lever is calibrated from real simulation data. 84 empirical claims, 146 runs — no guesswork, no black boxes.

Stay ahead of new capabilities

API signup is live today. Join the updates list for major launches, advanced agent-team features, and practical playbooks from real operator teams.

API access is live now. No credit card required to start.
