Anthropic API pricing 2026: cost calculator
- Anthropic API pricing is per-token, billed separately for input (what you send) and output (what Claude returns). Prices vary by model — Haiku is ~19x cheaper per input token than Opus.
- For most production workloads, Sonnet 4.5 is the right default: it costs $3.00/M input tokens and $15.00/M output tokens, versus $15/$75 for Opus 4. Use Opus only when the quality difference is measurable and the cost increase is in the budget.
- The three patterns that cause unexpected bills: long context windows on expensive models, high output-to-input ratios, and sub-agent loops that multiply per-call cost by task count.
Current pricing — verified April 2026
These are the published rates from anthropic.com/api as of April 2026. All prices are per million tokens. Anthropic adjusts pricing periodically; verify against the current pricing page before committing to a budget.
| Model | Input / M tokens | Output / M tokens | Context window |
|---|---|---|---|
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K |
| Claude Opus 4 | $15.00 | $75.00 | 200K |
The input-to-output cost ratio is 1:5 for all three models — output tokens cost exactly five times as much as input tokens. This asymmetry matters for workloads that generate long responses, like code generation or document drafting.
There is also a prompt caching tier for all models. Cached input tokens (a stable prompt prefix reused verbatim across requests) cost significantly less: $0.08/M for Haiku cache reads, $0.30/M for Sonnet cache reads, $1.50/M for Opus cache reads. Prompt caching is a meaningful lever for applications that send the same system prompt or context repeatedly.
Cost calculator — five common workloads
The table above gives the raw rates. What follows is what those rates mean for real workloads. All calculations use Sonnet 4.5 unless otherwise noted.
Workload 1: Single Claude Code task (scoped)
A typical scoped task: read 3 files (avg 200 lines each), conversation with 4 back-and-forth turns, output 150 lines of code.
Workload 2: Long Claude Code session (2 hours)
An extended session working across a codebase: 20+ file reads, multiple tasks, accumulated conversation history. No /compact.
Workload 3: PR review automation (per PR)
Automated PR review: system prompt, diff of ~400 lines, output a structured review with inline comments.
Workload 4: Sub-agent parallel task (5 agents)
Five parallel sub-agents, each with workload equivalent to Workload 1. Context is not shared — each agent carries its own copy.
Workload 5: Same as Workload 2, but on Opus 4
The two-hour extended session, switching from Sonnet 4.5 to Opus 4 without changing the workload.
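These five workloads can be priced with a few lines of Python. The rates come from the table above; the per-workload token counts are illustrative assumptions, not measurements — substitute figures from your own console before trusting the totals:

```python
# Pricing the five workloads at published per-token rates.
# Token counts are illustrative assumptions, not measurements.

RATES = {  # model: ($/M input tokens, $/M output tokens)
    "sonnet-4.5": (3.00, 15.00),
    "opus-4": (15.00, 75.00),
}

def workload_cost(input_tokens, output_tokens, model="sonnet-4.5"):
    """Dollar cost of one workload at the table's published rates."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical token counts per workload:
scoped_task  = workload_cost(15_000, 2_000)              # W1: ~$0.08
long_session = workload_cost(150_000, 12_000)            # W2: ~$0.63
pr_review    = workload_cost(6_000, 1_500)               # W3: ~$0.04
sub_agents   = 5 * workload_cost(15_000, 2_000)          # W4: ~$0.38
opus_session = workload_cost(150_000, 12_000, "opus-4")  # W5: ~$3.15
```

Note the jump from Workload 2 to Workload 5: identical token counts, five times the cost, purely from the model switch.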
Septim Drills — $29 · cost calibration exercises
Twelve structured exercises including a cost-projection drill: you estimate workload cost before running it, then compare against the Anthropic console. The delta closes fast. Includes the sub-agent budget worksheet and the prompt-caching setup guide.
Get Septim Drills — $29 →
The three patterns that cause unexpected bills
1. Opus 4 on tasks that Sonnet handles equivalently
The most common mistake: a developer sets their default model to Opus 4 because it is the most capable model, then runs it on workloads where Sonnet 4.5 produces identical results. Code formatting, documentation generation, test writing, and most code review tasks do not benefit from Opus 4's additional capability. At $15/$75 per million tokens versus $3/$15, this costs five times as much for the same output.
The correct default: start with Sonnet 4.5 and measure whether Opus 4 produces meaningfully better results on your specific workload before paying for it.
2. Long context windows with expensive models
A single request to Opus 4 that fills the 200K context window costs $3.00 in input tokens alone. If you are running dozens of these requests daily — document analysis, codebase review, large refactors — the cost compounds quickly. The context management guide covers the techniques for keeping context lean.
Prompt caching helps significantly here for repeated contexts: a 100K-token system prompt cached and reused costs $1.50/M on Opus versus $15/M uncached. If your application sends the same large context on every request, caching is likely your highest-leverage cost lever.
3. Sub-agent loops without a budget ceiling
Claude Code can spawn sub-agents. An agentic workflow that spawns 10 agents to work in parallel on a large codebase multiplies your single-session cost by 10. Without an explicit task budget defined in your CLAUDE.md, this is not a configuration error — it is Claude doing what you asked. The fix is explicit task scoping: define what each agent should read, what it should produce, and how many turns it is allowed.
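The multiplication is easy to sketch. The per-agent cost and run counts below are hypothetical, but they show how quickly an unbounded fan-out compounds:

```python
# Fan-out cost grows linearly with agent count: context is not shared,
# so each agent pays full freight. All figures below are hypothetical.

def fanout_cost(per_agent_cost, n_agents, runs_per_day=1):
    """Daily cost of a workflow that spawns n_agents per run."""
    return per_agent_cost * n_agents * runs_per_day

# A $0.30 single-agent task, fanned out to 10 agents, triggered 50x/day:
daily = fanout_cost(0.30, n_agents=10, runs_per_day=50)  # $150.00/day
```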
If you have already had a Tokenocalypse-style spike, Septim Rescue ($299) covers emergency remote intervention to diagnose the source and implement ceiling controls on your workloads.
Token estimation without running the call
Rough estimation rules that hold to within 20% for English-language content:
- 1,000 words of English prose ≈ 1,300–1,500 tokens
- 100 lines of TypeScript ≈ 800–1,000 tokens
- 100 lines of Python ≈ 700–900 tokens
- A 200-line diff ≈ 1,600–2,000 tokens
- The Anthropic tokenizer is available at console.anthropic.com/tokenizer for exact counts
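The rules of thumb above can be wrapped in a rough estimator. The multipliers are midpoints of the listed ranges — heuristics, not tokenizer output:

```python
# Rough token estimator implementing the rules of thumb above.
# Multipliers are midpoints of the listed ranges — accurate to roughly
# 20% for English content; use the console tokenizer for exact counts.

def estimate_tokens(words=0, ts_lines=0, py_lines=0, diff_lines=0):
    """~1.4 tokens/word, ~9/TS line, ~8/Python line, ~9/diff line."""
    return round(words * 1.4 + ts_lines * 9 + py_lines * 8 + diff_lines * 9)

estimate_tokens(words=1000)      # 1400 — within the 1,300-1,500 range
estimate_tokens(diff_lines=200)  # 1800 — within the 1,600-2,000 range
```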
For production applications, use the usage field in every API response to track actual token consumption. Log it from day one — reconstructing cost history from aggregated logs is much harder than collecting it in real time.
Claude Code vs. direct API: which costs more
Claude Code (the CLI tool) uses the same underlying API but adds overhead: the system prompt, tool descriptions, and the conversation management layer all consume tokens you do not pay for when making direct API calls. In practice, a Claude Code session costs roughly 15–25% more per unit of useful output than an equivalent direct API call optimized for the same task.
That overhead is the price of the agentic loop — the ability to iterate, read files, run commands, and course-correct. For structured, predictable API calls (classification, extraction, generation of a known format), the direct API is cheaper. For open-ended development tasks, Claude Code's overhead is worth it.
How prompt caching actually saves money
Prompt caching is the single highest-leverage cost lever that most developers do not use. Here is how it works and what it costs across models:
| Model | Cache write / M | Cache read / M | Vs. uncached input |
|---|---|---|---|
| Claude Haiku 3.5 | $1.00 | $0.08 | 10× cheaper on cache hits |
| Claude Sonnet 4.5 | $3.75 | $0.30 | 10× cheaper on cache hits |
| Claude Opus 4 | $18.75 | $1.50 | 10× cheaper on cache hits |
The math on a real workload: a 50,000-token system prompt sent to Sonnet 4.5 costs $0.15 per request uncached ($3.00/M × 50K). Cache it instead and serve 100 requests: the first request pays the cache write at $0.19 ($3.75/M × 50K), and each of the remaining 99 is a cache read at $0.015 ($0.30/M × 50K). Total: $0.19 + (99 × $0.015) ≈ $1.67 with caching versus $15.00 without. That is a 9× cost reduction for the same content.
The caveat: prompt caching requires a stable prefix — the cached content must be identical, and sit at the same position at the start of the request, on every call. If your system prompt or context block varies between requests, caching does not apply. Design for stability.
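The worked example reduces to a short script. The rates come from the table; the 50K-token prompt and 100-request volume are the example's assumptions:

```python
# Reproduces the Sonnet 4.5 caching example: one 50K-token prompt, 100
# requests. The first request pays the cache write; the rest read the cache.

def cached_cost(prompt_tokens, requests, write_rate, read_rate):
    write = prompt_tokens / 1e6 * write_rate
    reads = (requests - 1) * prompt_tokens / 1e6 * read_rate
    return write + reads

def uncached_cost(prompt_tokens, requests, input_rate):
    return requests * prompt_tokens / 1e6 * input_rate

cached = cached_cost(50_000, 100, write_rate=3.75, read_rate=0.30)  # ~$1.67
full = uncached_cost(50_000, 100, input_rate=3.00)                  # $15.00
ratio = full / cached                                               # ~9x
```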
The monthly budget worksheet
For teams moving from Claude Code Pro ($100/month flat rate) to API billing, this worksheet gives you a pre-buy estimate.
// Monthly API budget estimator — Sonnet 4.5
Sessions per day: 10
Average input per session: 60,000 tokens
Average output per session: 8,000 tokens
Working days per month: 22
Monthly input tokens: 10 × 60,000 × 22 = 13,200,000
Monthly output tokens: 10 × 8,000 × 22 = 1,760,000
Input cost: 13,200,000 / 1,000,000 × $3.00 = $39.60
Output cost: 1,760,000 / 1,000,000 × $15.00 = $26.40
Monthly total (no caching): $66.00
Monthly total (50% cache hit on input): ~$48.20
Break-even vs. $100 Pro plan: ~33 working days at this volume
At this session volume, direct API billing is cheaper than the flat Pro plan by roughly 34-52%. The crossover point goes the other way if you run longer sessions (180K+ tokens per session) or switch to Opus 4 for any significant portion of your work.
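The same worksheet as runnable Python. One modeling choice to flag: cache hits are billed at the Sonnet read rate ($0.30/M) rather than treated as free, and the one-time cache-write premium is ignored:

```python
# The budget worksheet above as a function. cache_hit is the fraction of
# input tokens served from cache, billed at the read rate (writes ignored).

IN_RATE, OUT_RATE, CACHE_READ = 3.00, 15.00, 0.30  # Sonnet 4.5, $/M tokens

def monthly_cost(sessions_per_day, in_tokens, out_tokens,
                 days=22, cache_hit=0.0):
    m_in = sessions_per_day * in_tokens * days
    m_out = sessions_per_day * out_tokens * days
    in_cost = m_in * ((1 - cache_hit) * IN_RATE + cache_hit * CACHE_READ) / 1e6
    return in_cost + m_out * OUT_RATE / 1e6

monthly_cost(10, 60_000, 8_000)                 # $66.00, no caching
monthly_cost(10, 60_000, 8_000, cache_hit=0.5)  # ~$48.18 with 50% cache hits
```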
Common mistakes that inflate the bill
Not running /compact on long Claude Code sessions. The /compact command summarizes conversation history, replacing the full transcript with a compressed summary. On a two-hour session, this can cut the input token count by 40-60%. Without it, every turn re-sends the entire conversation history. The cost difference on a session that runs to 150K tokens without compacting versus 80K with it: $0.45 versus $0.24 in input cost on Sonnet 4.5.
Sending full file contents when only a diff was needed. If Claude needs to understand a change, send the diff. If Claude needs to understand the file structure, send the outline or the relevant function. Most "read this file" tasks can be scoped to the relevant section. By the estimation rules above, a 600-line file costs roughly 4,000-6,000 tokens; the relevant 30-line function costs 200-300 tokens.
Not monitoring the usage field in API responses. Every Anthropic API response includes a usage field with input_tokens and output_tokens. Log this from day one. Reconstructing cost history from aggregated billing data two months later is much harder than a simple per-call log line.
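A minimal sketch of that per-call log line. The usage field with input_tokens and output_tokens is what the API actually returns; the record shape, rate constants, and model identifier here are this sketch's choices, not an Anthropic convention:

```python
import json
import time

SONNET_RATES = (3.00, 15.00)  # $/M input, $/M output

def usage_record(input_tokens, output_tokens,
                 model="claude-sonnet-4-5", rates=SONNET_RATES):
    """One structured log record per API call, priced at per-token rates."""
    in_rate, out_rate = rates
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return {
        "ts": time.time(),
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 6),
    }

# After each call, feed in the response's usage counts, e.g.:
# print(json.dumps(usage_record(resp.usage.input_tokens,
#                               resp.usage.output_tokens)))
```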
When this gets expensive — and what that signals
If your Claude Code API bill exceeds $200/month on solo development work, something structural is wrong. The most common causes:
- Opus 4 as the default model (5× Sonnet cost for most tasks)
- Sub-agents without explicit context scoping (each agent re-reads shared files independently)
- Sessions that run 4+ hours without /compact (context window accumulates without compression)
- Automated scripts that call Claude in a loop without a turn budget ceiling
A useful pre-commit check: if you are running any automated Claude Code workflow, add a token ceiling. Septim Tether ($19) ships 3 pre-commit hooks including a token-cap hook that runs before each commit, checks whether the session has crossed a configurable threshold, and exits with a warning if it has. It uses your own Anthropic key, POSIX shell plus inline Python, no external dependencies. It will not prevent you from continuing work — it surfaces the cost before you forget it.
Septim Tether — $19 · 3 Claude Code pre-commit hooks
Token-cap, secrets-scan, and context-audit hooks. Self-hosted, your Anthropic key. POSIX shell plus inline Python. No external dependencies. Runs at git commit before anything ships. Pay once, GitHub repo invite on purchase.