The honest range: $5k to $200k+
"It depends" is the consultant's hedge. So here are the brackets we see across our own pipeline and 12 competitors we benchmarked: $5k–$200k+ for the initial build, depending on tier and scope. Most production agents land between $25k and $80k for a first version.
That spread isn't because pricing is mysterious. It's because "an AI agent" can mean very different things. Below: the three tiers we quote against, and what each actually gets you.
POC: $5k–$15k · 2–4 weeks
A working demo against your real data. Not production. Not multi-user. Not monitored. The point is to prove the agent can do the task at all — and to give you a real artifact to circulate internally before committing budget.
- One workflow, single happy path
- Run on the developer's machine or a sandbox deploy
- No auth, no logging, no rate limiting
- Eval against ~20 sample inputs
What you get out: a Loom video, a Notion doc, a working agent that you can poke at. What you don't get: something you can put in front of customers.
MVP: $15k–$45k · 4–8 weeks
The first version actual humans can use. Production deploy, basic auth, logging, eval harness, one round of iteration on prompts and tools after seeing real usage.
- Production deployment (Vercel, Fly, or your stack)
- Auth, rate limiting, basic observability (Langfuse or similar)
- Tool integrations: 2–4 real ones
- Eval set of 100–200 inputs, regression-tested before deploys
- Documentation and handoff
This is what 70% of our clients actually need. They don't realize it until they've tried to spec a "full system" and the scope balloons.
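For a sense of scale, the "regression-tested before deploys" item in the MVP list can start as a script like the one below. This is a minimal sketch, not our harness: the CSV columns, the exact-match scoring, the 90% threshold, and `toy_agent` are all assumptions made up for illustration. You would swap in your real agent call and a scoring rule that fits your task.

```python
import csv
import io
from typing import Callable

# Hypothetical eval set: in practice this would be a 100-200 row file on disk.
SAMPLE_CSV = """input,expected
refund order 1234,refund
where is my package,tracking
cancel my subscription,cancel
"""


def run_eval(rows_csv: str, agent: Callable[[str], str]) -> float:
    """Run the agent on every row and return the fraction of exact matches."""
    reader = csv.DictReader(io.StringIO(rows_csv))
    results = [
        agent(row["input"]).strip() == row["expected"].strip()
        for row in reader
    ]
    if not results:
        raise ValueError("eval set is empty")
    return sum(results) / len(results)


def deploy_allowed(pass_rate: float, threshold: float = 0.9) -> bool:
    """Gate a deploy on the pass rate. The 0.9 default is an assumed threshold."""
    return pass_rate >= threshold


def toy_agent(text: str) -> str:
    """Stand-in for a real agent call: a keyword router that misses 'cancel'."""
    if "refund" in text:
        return "refund"
    if "package" in text:
        return "tracking"
    return "other"


if __name__ == "__main__":
    rate = run_eval(SAMPLE_CSV, toy_agent)
    print(f"pass rate: {rate:.0%}, deploy allowed: {deploy_allowed(rate)}")
```

Run it in CI before each deploy and fail the build when `deploy_allowed` returns `False`. Exact match is the crudest possible scorer, but it is enough to catch a prompt change that breaks cases that used to pass.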
Production system: $45k–$200k+ · 8–16 weeks
Multi-tenant, multi-channel, real SLA. Custom data ingestion. Fine-tuned prompts. Eval pipelines. Human-in-the-loop for high-stakes outputs. Integrations across 5+ tools. Compliance work if you're in a regulated space (HIPAA, SOC 2).
This tier is where you stop calling it "the agent" and start calling it "the platform." Maintenance becomes a real line item.
Where the money goes
The number that surprises clients: model and prompt work is one of the smaller lines. Integration takes half the budget or more.
| Line item | % of budget |
|---|---|
| Integration engineering (APIs, webhooks, data plumbing) | 50–60% |
| Frontend / UX / dashboards | 10–15% |
| Prompts, tools, model selection, eval | 10–20% |
| Deployment, observability, security | 10–15% |
| PM + writing + docs | 5–10% |
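
To turn those percentages into dollars for a specific quote, the arithmetic is simple enough to script. A small sketch, assuming a hypothetical $40k MVP budget; the shares are copied from the table, and the function name is ours for illustration.

```python
# Budget shares from the table above, as (low, high) fractions of the build budget.
BUDGET_SHARES = {
    "Integration engineering": (0.50, 0.60),
    "Frontend / UX / dashboards": (0.10, 0.15),
    "Prompts, tools, model selection, eval": (0.10, 0.20),
    "Deployment, observability, security": (0.10, 0.15),
    "PM + writing + docs": (0.05, 0.10),
}


def split_budget(total_usd: float) -> dict[str, tuple[int, int]]:
    """Return the (low, high) dollar range for each line item."""
    return {
        item: (round(total_usd * low), round(total_usd * high))
        for item, (low, high) in BUDGET_SHARES.items()
    }


if __name__ == "__main__":
    # $40k is an assumed example budget, inside the MVP bracket.
    for item, (low, high) in split_budget(40_000).items():
        print(f"{item:<40} ${low:>7,} to ${high:>7,}")
```

On a $40k build, integration alone comes out at $20k to $24k, while prompts, tools, and eval together get $4k to $8k. The low ends of the ranges sum to 85% and the high ends to 120%, so treat each row as a band, not a fixed allocation.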
Monthly running cost (after launch)
Once it's live, you're paying for: model tokens, infra, observability tools, ongoing maintenance.
- Low-volume agent (under 5k runs/mo): $80–$300 model + $50–$150 infra; ~$200–$500/mo all-in once observability tooling is counted
- Medium volume (5k–50k runs/mo): $400–$2k model + $200–$500 infra; ~$800–$3k/mo all-in once observability tooling is counted
- High volume (50k+ runs/mo): negotiated rates with Anthropic/OpenAI and dedicated hosting; $3k–$20k+/mo depending on how many tokens each run consumes
Plus a retainer if you want us (or another team) to keep optimizing, typically $3k–$8k/mo for 1 day/week of engineering.
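
If you want these brackets in a planning spreadsheet or script, a lookup like the following is enough. It is a rough sketch: the dollar figures are the all-in monthly ranges quoted above, the names are hypothetical, and treating exactly 50k runs as "high" is our assumption since the brackets touch at that point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyEstimate:
    tier: str
    low_usd: int
    high_usd: int  # for the high tier this is a floor on the top end ("$20k+")


def estimate_monthly_cost(runs_per_month: int, retainer: bool = False) -> MonthlyEstimate:
    """Map run volume to the monthly cost bracket, optionally adding a retainer."""
    if runs_per_month < 5_000:
        tier, low, high = "low", 200, 500
    elif runs_per_month < 50_000:
        tier, low, high = "medium", 800, 3_000
    else:
        # Assumption: exactly 50k runs counts as high volume.
        tier, low, high = "high", 3_000, 20_000
    if retainer:
        # Retainer range quoted above: $3k-$8k/mo for 1 day/week of engineering.
        low += 3_000
        high += 8_000
    return MonthlyEstimate(tier, low, high)


if __name__ == "__main__":
    print(estimate_monthly_cost(12_000))
    print(estimate_monthly_cost(12_000, retainer=True))
```

Note how the retainer dominates at low and medium volume: a 12k-run agent goes from $800 to $3k a month to $3.8k to $11k once a day a week of engineering is added.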
Where you'll waste money
- Building before measuring. If you can't quantify the manual cost of the task today, you can't quantify the savings tomorrow. Measure two weeks first.
- Multi-agent orchestration for solo tasks. CrewAI and friends are sexy. Most use cases need one agent with three tools, not five agents talking to each other.
- Buying a platform before having a use case. That's $15k/year for an AI-ops platform you'll use for one workflow, when a free tier would do.
- Over-engineering eval. A spreadsheet with 50 inputs/outputs is fine for v1. You don't need a full eval framework day one.
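
On the first point, "measure first" comes down to two numbers: what the task costs by hand each month, and how many months the build takes to pay for itself. A back-of-envelope sketch follows. Every input in the example (2,000 tasks, 6 minutes each, $40/hour, a $30k build, $1.5k/mo running cost, 70% of tasks automated) is hypothetical, so plug in what your two weeks of measurement actually show.

```python
from typing import Optional


def monthly_manual_cost(tasks_per_month: int, minutes_per_task: float,
                        hourly_rate_usd: float) -> float:
    """Cost of doing the task by hand for one month."""
    return tasks_per_month * minutes_per_task / 60 * hourly_rate_usd


def payback_months(build_cost_usd: float, manual_cost_usd: float,
                   running_cost_usd: float,
                   automation_share: float = 1.0) -> Optional[float]:
    """Months until the build pays for itself, or None if it never does.

    automation_share is the fraction of the manual work the agent
    actually takes over (an assumption you should measure, not guess).
    """
    net_monthly_saving = manual_cost_usd * automation_share - running_cost_usd
    if net_monthly_saving <= 0:
        return None  # running costs eat all the savings
    return build_cost_usd / net_monthly_saving


if __name__ == "__main__":
    manual = monthly_manual_cost(tasks_per_month=2_000, minutes_per_task=6,
                                 hourly_rate_usd=40)
    months = payback_months(build_cost_usd=30_000, manual_cost_usd=manual,
                            running_cost_usd=1_500, automation_share=0.7)
    print(f"manual cost: ${manual:,.0f}/mo")
    print("payback:", "never" if months is None else f"{months:.1f} months")
```

If `payback_months` returns `None`, or a number longer than you expect the workflow to exist, that is the signal to stop at a POC.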
Most AI projects fail on integration, not on the model. Budget like that's true and you'll be right more often.