How Agency-OS Works

From YAML to governed agent team in 5 minutes

Define your agents. Connect your API. Governance, smart routing, and budget controls handle the rest — no orchestration code, no babysitting.

01

Define your agent team in YAML

Declare agents, roles, model preferences, and governance rules in a single configuration file. No orchestration code, no visual builder — just a human-readable spec that lives in version control.

# agency.yaml — your entire agent team
name: product-squad
governance: balanced        # 5% tax, circuit breakers on

agents:
  - name: engineer
    role: "Write, test, and deploy code"
    model: auto              # smart routing picks the cheapest capable model
    budget_usd: 200

  - name: reviewer
    role: "Review PRs for quality and security"
    model: claude-sonnet
    budget_usd: 50

  - name: pm
    role: "Prioritize tasks from user feedback"
    model: auto
    budget_usd: 30

  - name: writer
    role: "Draft docs, changelogs, blog posts"
    model: auto
    budget_usd: 20

routing:
  failover: [openai, anthropic, google]
  cache_ttl_sec: 3600

budget:
  org_ceiling_usd: 500      # hard monthly limit
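
As a quick sanity check on a spec like the one above, the per-agent budgets can be compared against the organization ceiling before deploying. The sketch below is not part of Agency-OS: `check_budgets` is a hypothetical helper, the config is hand-copied into a plain dict rather than parsed from YAML, and the rule that agent budgets should sum to no more than `org_ceiling_usd` is an assumption the source does not state.

```python
# Hypothetical pre-deploy check; mirrors the budget fields of agency.yaml above.
config = {
    "name": "product-squad",
    "agents": [
        {"name": "engineer", "budget_usd": 200},
        {"name": "reviewer", "budget_usd": 50},
        {"name": "pm", "budget_usd": 30},
        {"name": "writer", "budget_usd": 20},
    ],
    "budget": {"org_ceiling_usd": 500},
}


def check_budgets(cfg):
    """Return (total agent budget, headroom under the org ceiling).

    Raises ValueError if the agent budgets add up to more than the ceiling.
    This rule is an assumption made for illustration.
    """
    total = sum(agent["budget_usd"] for agent in cfg["agents"])
    ceiling = cfg["budget"]["org_ceiling_usd"]
    if total > ceiling:
        raise ValueError(f"agent budgets (${total}) exceed org ceiling (${ceiling})")
    return total, ceiling - total


total, headroom = check_budgets(config)
print(f"allocated ${total} of ${total + headroom}, headroom ${headroom}")
# allocated $300 of $500, headroom $200
```

With the example team, the four agent budgets use $300 of the $500 ceiling, leaving $200 unallocated.
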
02

Connect via OpenAI-compatible API

Point your existing OpenAI SDK at the Agency-OS gateway. Same endpoints, same streaming, same request format. Smart routing, caching, failover, and governance are applied transparently; the only change is the client configuration (base URL and API key).

# Python: point the client at the gateway
from openai import OpenAI

client = OpenAI(
    base_url="https://api.zerohumanlabs.com/v1",
    api_key="YOUR_AGENCY_OS_KEY",
)

response = client.chat.completions.create(
    model="auto",  # routes to cheapest capable model
    messages=[
        {"role": "system", "content": "You are a code reviewer."},
        {"role": "user", "content": "Review this PR for security issues."},
    ],
)

# That's it. Smart routing, caching, failover,
# budget enforcement — all handled automatically.
print(response.choices[0].message.content)
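
To make "failover" concrete, here is a minimal sketch of ordered provider failover, using the `failover: [openai, anthropic, google]` order from the YAML. It is an illustration only, not the gateway's implementation: `call_with_failover`, `ProviderError`, and the stub providers are all hypothetical names, and a real gateway would presumably also handle timeouts, retries, and streaming.

```python
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Raised by a provider callable when the upstream call fails (hypothetical)."""


def call_with_failover(
    prompt: str,
    providers: Dict[str, Callable[[str], str]],
    order: List[str],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, reply) from the first success."""
    errors = []
    for name in order:
        try:
            return name, providers[name](prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real upstream APIs.
def provider_down(prompt: str) -> str:
    raise ProviderError("503 service unavailable")


def provider_up(prompt: str) -> str:
    return f"echo: {prompt}"


providers = {"openai": provider_down, "anthropic": provider_up, "google": provider_up}
used, reply = call_with_failover("ping", providers, ["openai", "anthropic", "google"])
print(used, reply)  # anthropic echo: ping
```

The caller sees a single successful reply; which provider served it is only visible in the returned name.
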
03

Governance runs automatically

Circuit breakers freeze misbehaving agents. Budget ceilings stop spend at the limit you set. Reputation scoring demotes underperformers. Collusion detection catches coordinating bad actors. All defaults are calibrated from 146 simulation runs — not guesswork.
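
A circuit breaker of this kind can be sketched as a sliding-window failure counter. The example below uses the threshold that appears in the sample log in this section (five failures within 60 seconds); the `CircuitBreaker` class, its parameter names, and the sliding-window approach are assumptions for illustration, not the Agency-OS implementation.

```python
from collections import deque


class CircuitBreaker:
    """Freeze an agent after max_failures failures inside window_sec seconds."""

    def __init__(self, max_failures: int = 5, window_sec: float = 60.0):
        self.max_failures = max_failures
        self.window_sec = window_sec
        self.failures = deque()  # timestamps of recent failures
        self.frozen = False

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now` (seconds); return True if the agent is frozen."""
        self.failures.append(now)
        # Drop failures that have aged out of the window.
        while self.failures and now - self.failures[0] > self.window_sec:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.frozen = True
        return self.frozen


breaker = CircuitBreaker()
states = [breaker.record_failure(t) for t in (0, 10, 20, 30, 40)]
print(states)  # [False, False, False, False, True]
```

Failures spread further apart than the window never accumulate to the threshold, so an agent with occasional errors keeps running.
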

# What happens behind the scenes:

> Agent "engineer" submits task bid: $0.04
> Agent "reviewer" submits task bid: $0.02
> Winner: reviewer (lower cost, 94% reputation)

> Agent "writer" fails 5 tasks in 60 seconds
> Circuit breaker activated — agent frozen
> Tasks reassigned to "pm" automatically

> Monthly spend: $487.20 / $500.00 ceiling
> Warning: approaching budget limit
> At $500.00 → all requests return 402

> Routing decision log:
>   "Summarize this doc" → llama-3.1-8b ($0.0001)
>   "Analyze contract"   → claude-sonnet ($0.012)
>   Cache hit rate: 34% (saved $42.10 this month)
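
The bid exchange at the top of the log can be illustrated with a small sealed-bid selection routine. The source only shows that the lower bid won and reports the winner's reputation, so the rule below (lowest price among bidders above a reputation floor, ties broken by higher reputation) is an assumed rule. `Bid`, `pick_winner`, the `min_reputation` floor, and the engineer's reputation value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    agent: str
    price_usd: float
    reputation: float  # 0.0 to 1.0


def pick_winner(bids: List[Bid], min_reputation: float = 0.5) -> Optional[Bid]:
    """Return the cheapest bid from agents at or above the reputation floor.

    Ties on price go to the higher-reputation agent. Returns None if nobody qualifies.
    """
    eligible = [b for b in bids if b.reputation >= min_reputation]
    if not eligible:
        return None
    return min(eligible, key=lambda b: (b.price_usd, -b.reputation))


bids = [
    Bid("engineer", 0.04, 0.90),  # reputation value is made up for the example
    Bid("reviewer", 0.02, 0.94),  # bid and reputation taken from the log above
]
winner = pick_winner(bids)
print(winner.agent)  # reviewer
```

Under this assumed rule, a very cheap bid from a low-reputation agent is filtered out before prices are compared.
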

Traditional team vs. Agency-OS

Estimated monthly cost comparisons for common team configurations.

SaaS Product Squad

Agents: Engineer, Reviewer, PM, Writer

Traditional
$8,000-15,000/mo
2-4 weeks hiring

4 contractors at market rates, manual coordination, no cost controls

Agency-OS
$200-500/mo
5 minutes

4 governed agents, automatic task routing, hard budget ceiling, real-time cost tracking

95-97%
cost reduction

Marketing Agency

Agents: Strategist, Copywriter, Designer, Analyst

Traditional
$12,000-20,000/mo
1-2 months ramp-up

Freelancer management, revision cycles, inconsistent quality, no performance tracking

Agency-OS
$300-600/mo
5 minutes

Agents compete for tasks via sealed-bid auctions, reputation scoring ensures quality, circuit breakers prevent bad outputs

96-98%
cost reduction

Research Team

Agents: Literature Reviewer, Analyst, Synthesizer, Critic

Traditional
$6,000-10,000/mo
Weeks of onboarding

Research assistants with varying skill levels, slow turnaround, no systematic quality checks

Agency-OS
$150-400/mo
5 minutes

Agents analyze peer-reviewed papers, cross-reference findings, and produce a structured synthesis, with an eval harness scoring every output

94-97%
cost reduction

What governance does for you

Every safety mechanism is on by default. Calibrated from 146 simulations, not blog-post defaults.

| Mechanism | What it prevents | Research evidence |
|---|---|---|
| Circuit breakers | Cascading failures from misbehaving agents | +81% welfare, -11% toxicity (CB-001, d=1.64, 70 runs) |
| Budget ceilings | Runaway spend from bad loops or misconfigurations | Hard enforcement at gateway level; cannot be exceeded |
| Reputation scoring | Low-quality agents winning high-value tasks | Complex agents earn 2.3-2.8x less than simple ones (AG-001, 33 runs) |
| Collusion detection | Agents coordinating to manipulate outcomes | 137x wealth gap for colluders under monitoring (CL-001, d=3.51) |
| Transaction tax (≤5%) | Free-rider exploitation of shared resources | Welfare collapse above 5% threshold (TX-001, d=1.18, 29 runs) |
| Smart routing | Overspending on simple tasks that don't need premium models | 30-80% cost savings on production workloads |
| Automatic failover | Downtime when a provider has an outage | Transparent retry across OpenAI, Anthropic, Google |
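
To show what hard enforcement of a budget ceiling could look like, here is a minimal sketch of a gate that returns an HTTP-style 402 (Payment Required) once a charge would exceed the ceiling. It assumes the cost of a request is known before it is admitted and that a warning fires at 95% of the ceiling; both are assumptions, as are the `BudgetGate` class and its method names. Amounts are integer cents to avoid floating-point drift.

```python
from typing import Tuple


class BudgetGate:
    """Reject any charge that would push spend past a hard ceiling."""

    def __init__(self, ceiling_cents: int, warn_ratio: float = 0.95):
        self.ceiling_cents = ceiling_cents
        self.warn_ratio = warn_ratio  # assumed warning threshold
        self.spent_cents = 0

    def charge(self, cost_cents: int) -> Tuple[int, str]:
        """Return (status, message): 402 if over the ceiling, otherwise 200."""
        if self.spent_cents + cost_cents > self.ceiling_cents:
            return 402, "budget ceiling reached"
        self.spent_cents += cost_cents
        if self.spent_cents >= self.warn_ratio * self.ceiling_cents:
            return 200, "warning: approaching budget limit"
        return 200, "ok"


gate = BudgetGate(ceiling_cents=50_000)  # $500.00 ceiling
print(gate.charge(48_720))  # (200, 'warning: approaching budget limit')
print(gate.charge(2_000))   # (402, 'budget ceiling reached')
print(gate.spent_cents)     # 48720
```

Because the check runs before spend is recorded, a rejected request leaves the running total unchanged.
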

Ready to replace headcount with agent teams?