How is managed LLM search answer mode billed?

A managed /v1/search request with answer: true is billed in three legs that share one requestId: ~1 credit for the base search, ~1 credit per scraped result, and a token-metered LLM synthesis leg. The LLM leg uses a reserve-commit-refund ledger: it reserves a worst-case estimate up front, then refunds the difference once the real token cost is known. Plain searches without answer mode incur no LLM leg.

How does fastCRW bill managed search model usage?

Managed answer mode meters model usage in credits, deterministically and always rounding up so it never undercharges. A typical synthesis leg lands around 3 credits, and every request is capped at 8,000 credits. Price your own workload in credits against the live /pricing page.

What is the maximum cost of one managed search request?

Every managed search request is hard-capped at 8,000 credits (SEARCH_RESERVE_HARD_CAP_CREDITS). On top of that, each LLM leg is capped at 1,024 tokens by the engine, and the reserve-commit-refund ledger means you can never be charged beyond your wallet balance.

Which plans include managed answer mode?

Managed answer mode runs on the paid plans — HOBBY, STANDARD, GROWTH, and SCALE. LLM features require a paid plan; the FREE tier has no answer-synthesis path. Usage is metered in credits and hard-capped at 8,000 credits per request, so the worst case for one request is bounded regardless of tier.

Which model powers fastCRW's managed search by default?

Managed answer mode runs fastCRW's managed LLM — you do not pick or manage the model, and usage is metered in credits. Its low effective per-token cost is what keeps capped managed answer mode affordable. The same managed LLM powers fastCRW's LLM extraction path (formats: ["json"]) on paid plans.

Managed LLM Search API Costs: The Capped Credit Model

By the fastCRW team · Pricing/billing mechanics verified 2026-05-18 (managed-search config re-verified in prod 2026-05-30) · Verify independently before buying.

Managed LLM search API costs are the line item most teams forget to model: you price the search call, you price the scrape, and then answer-mode synthesis quietly adds a metered third leg that scales with how much the model writes. This page walks through exactly how fastCRW bills a managed /v1/search request with answer: true — how the managed LLM leg is metered in credits and the hard per-request cap — so you can forecast the spend instead of being surprised by it.

What a managed LLM search request charges

A plain /v1/search query is cheap and flat: 1 credit per query, plus 1 credit per result when you ask fastCRW to scrape the result content. That part is forecastable from request volume alone — no model is involved, so there is no usage meter.

The cost picture changes the moment you turn on answer synthesis. When you pass answer: true (or summarizeResults: true), the request runs in three legs: the base search, the per-result scrape, and an added LLM leg that reads the scraped content and writes an answer or per-result summaries. That LLM leg is metered in credits based on usage, and it is the only genuinely variable part of a managed search request.
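As a rough sketch of the rule above, the following checks whether a request body would pick up an LLM leg. The answer and summarizeResults flags come from the text; the helper name, the dict-shaped payload, and the query field are assumptions made for illustration, not fastCRW's implementation.

```python
def incurs_llm_leg(payload: dict) -> bool:
    """Return True when the request asks for an answer or per-result summaries.

    Only the `answer` and `summarizeResults` flags are taken from the article;
    everything else about the payload shape is hypothetical.
    """
    return bool(payload.get("answer")) or bool(payload.get("summarizeResults"))


# Hypothetical request bodies; the "query" field name is an assumption.
plain_search = {"query": "rust async runtimes"}
answer_search = {"query": "rust async runtimes", "answer": True}
summary_search = {"query": "rust async runtimes", "summarizeResults": True}

for body in (plain_search, answer_search, summary_search):
    print(body, "->", "LLM leg" if incurs_llm_leg(body) else "no LLM leg")
```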

The managed answer-mode default

On a paid plan, the managed path uses fastCRW's managed LLM. Its low effective per-token cost is the whole reason managed answer mode stays affordable: a typical synthesis leg costs only a handful of credits. Note the honest scope here — this is the managed search answer path. fastCRW's separate LLM extraction path (formats: ["json"]) also runs on the managed LLM on paid plans; do not conflate the two.

How managed answer usage is metered

Managed answer mode meters the model usage in credits, deterministically and always rounding up so fastCRW never undercharges. The two numbers that bound a request are customer-facing and predictable:

Mechanic	Value
Billing	Metered in credits based on model usage
Per-request hard cap	8,000 credits (`SEARCH_RESERVE_HARD_CAP_CREDITS`)
Per-leg token cap	1,024 tokens
Default model	fastCRW managed LLM
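To illustrate what "deterministic and always rounding up" can look like, here is a minimal sketch using integer ceiling division. The article does not publish a token-to-credit rate, so tokens_per_credit below is a made-up placeholder and the function name is hypothetical; only the round-up behaviour reflects the text.

```python
def tokens_to_credits(tokens: int, tokens_per_credit: int) -> int:
    """Convert token usage to whole credits, always rounding up.

    Integer ceiling division keeps the result deterministic (no float rounding).
    """
    if tokens < 0 or tokens_per_credit <= 0:
        raise ValueError("tokens must be >= 0 and tokens_per_credit must be > 0")
    return -(-tokens // tokens_per_credit)


# PLACEHOLDER rate for illustration only; it is not fastCRW's actual rate.
HYPOTHETICAL_TOKENS_PER_CREDIT = 500

print(tokens_to_credits(1000, HYPOTHETICAL_TOKENS_PER_CREDIT))  # exact multiple
print(tokens_to_credits(1001, HYPOTHETICAL_TOKENS_PER_CREDIT))  # one token over rounds up
```

Rounding up means any partial credit is billed as a whole credit, which is why the meter can never undercharge.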

A worked example: a few credits per answer

A typical managed synthesis leg lands around 3 credits. A full answer-mode request that scrapes three results therefore comes to about 1 credit for the base search + 3 credits for scraping + ~3 credits for the synthesis leg, or roughly 7 credits all-in. Your numbers will vary with result count and answer length, but the shape holds: the LLM leg is a small, bounded slice, not a runaway. Price your own workload in credits against the live /pricing page, where your plan's rates live.
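The arithmetic in this example can be written out as a small helper. The 1-credit base search and 1-credit-per-result scrape figures come from the article; the function and constant names are illustrative, and the synthesis figure is the article's "typical" value rather than a guaranteed rate.

```python
BASE_SEARCH_CREDITS = 1        # flat cost per query, per the article
SCRAPE_CREDITS_PER_RESULT = 1  # per scraped result, per the article


def estimate_request_credits(scraped_results: int, synthesis_credits: int = 0) -> int:
    """Estimate the credits for one request: base search + scrape + LLM leg."""
    return (
        BASE_SEARCH_CREDITS
        + SCRAPE_CREDITS_PER_RESULT * scraped_results
        + synthesis_credits
    )


# Three scraped results plus a typical ~3-credit synthesis leg.
print(estimate_request_credits(scraped_results=3, synthesis_credits=3))  # 7
# The same search without answer mode has no LLM leg.
print(estimate_request_credits(scraped_results=3))  # 4
```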

Bounding the per-request cost

Managed answer cost stays predictable because two ceilings bound every request. First, the per-leg max_tokens is capped at 1,024 by the engine, so no single synthesis call can write an essay's worth of tokens. Second, and more importantly for budgeting, every managed search request is hard-capped at 8,000 credits (SEARCH_RESERVE_HARD_CAP_CREDITS). That is the worst case for one request, full stop.
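For budgeting, the two ceilings reduce to simple arithmetic. The sketch below uses the two values quoted above; the function names and the idea of clamping a requested token count are assumptions for illustration, since the article does not show how the engine applies the caps internally.

```python
SEARCH_RESERVE_HARD_CAP_CREDITS = 8_000  # per-request cap, from the article
MAX_TOKENS_PER_LEG = 1_024               # per-leg token cap, from the article


def clamp_max_tokens(requested_tokens: int) -> int:
    """Clamp a requested token budget to the per-leg ceiling."""
    return min(requested_tokens, MAX_TOKENS_PER_LEG)


def worst_case_credits(request_count: int) -> int:
    """Upper bound on spend if every request hit the per-request hard cap."""
    return request_count * SEARCH_RESERVE_HARD_CAP_CREDITS


print(clamp_max_tokens(5_000))  # 1024
print(worst_case_credits(10))   # 80000
```

The worst-case figure is an absolute ceiling, not a forecast: a typical request in the worked example costs a handful of credits.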

Reserve, commit, refund: you never overspend your balance

Managed search uses a reserve-commit-refund ledger so a caller can never burn past their wallet. The flow, all under one requestId: the base search and scrape legs charge as usual; the LLM leg reserves a worst-case credit estimate up front; then a commit step reconciles and refunds the difference between the reserve and the actual token cost. Because the reserve happens first, checkAndConsumeQuota returns a 402 if you do not have the credits — the wallet clamp keeps your balance at or above zero. You are charged the real cost, but you can never be charged more than you have.
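The flow described above can be sketched as a toy in-memory ledger. This is not fastCRW's code: the Wallet class, its method names, and the exception standing in for the 402 response are all hypothetical, and the sketch assumes the actual cost never exceeds the worst-case reserve.

```python
from dataclasses import dataclass


class InsufficientCredits(Exception):
    """Stand-in for the HTTP 402 the article says is returned."""


@dataclass
class Wallet:
    balance: int

    def reserve(self, estimate: int) -> int:
        """Hold a worst-case estimate up front; refuse if the wallet cannot cover it."""
        if estimate > self.balance:
            raise InsufficientCredits(f"need {estimate}, have {self.balance}")
        self.balance -= estimate
        return estimate

    def commit(self, reserved: int, actual: int) -> int:
        """Reconcile against the real cost and refund the unused part of the reserve."""
        charged = min(actual, reserved)  # assumption: actual never exceeds the reserve
        refund = reserved - charged
        self.balance += refund
        return refund


wallet = Wallet(balance=100)
held = wallet.reserve(10)                     # worst-case estimate for the LLM leg
refunded = wallet.commit(held, actual=3)      # real token cost turned out to be 3
print(wallet.balance, refunded)  # 97 7
```

Because the reserve is checked before any model call happens, the balance in this sketch can never drop below zero, which mirrors the wallet clamp the article describes.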

Where managed mode is available

Managed answer mode runs on the paid plans: HOBBY, STANDARD, GROWTH, and SCALE. LLM features require a paid plan; FREE-tier callers do not have an answer-synthesis path. A HOBBY user's answer-mode leg is bounded by the same per-request cap and ledger as everyone else's, so even a heavy answer-mode month stays within the credits they bought.

Why the capped credit model

The managed model is the turnkey path: there are no model keys to manage, and metered credits buy you convenience plus a capped, single-line bill. Because the LLM leg is metered in credits and hard-capped at 8,000 credits per request, your worst case is bounded and computable. The end-to-end answer-engine setup is covered in the Perplexity-style answer engine tutorial.

Estimating monthly answer-mode spend

Because managed cost is metered, the honest way to forecast it is bottom-up from your own traffic. A quick worksheet:

1. Count answer-mode requests per month. Only requests with answer: true or summarizeResults: true incur an LLM leg; plain searches do not.
2. Estimate credits per request. Base search (1) + per-result scrape (1 each) + the synthesis leg. The worked example above puts the LLM leg around 3 credits for a typical managed answer; size it up if you summarize many results per request.
3. Multiply and compare to your plan allowance. Tie the total credits back to the monthly credits in your tier (see /pricing) rather than to a dollar figure, because credits are the unit you actually spend.
4. Size your plan to your volume. Below a few thousand answer requests a month, the managed credit cost is usually noise against the convenience; above that, move to a higher tier where the per-credit rate drops. Every request stays bounded by the same per-request cap and ledger.
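The worksheet can be run as a few lines of Python. The per-leg credit figures come from the article; the function names are illustrative, and the plan allowance in the example is a made-up number that you would replace with the monthly credits listed for your tier on /pricing.

```python
def monthly_answer_credits(
    requests_per_month: int,
    results_per_request: int,
    synthesis_credits_per_request: int = 3,  # the article's "typical" LLM leg
) -> int:
    """Bottom-up estimate: (base search + scrape + synthesis) x request count."""
    per_request = 1 + results_per_request + synthesis_credits_per_request
    return requests_per_month * per_request


def fits_plan(total_credits: int, plan_monthly_credits: int) -> bool:
    """Compare the estimate against a plan's monthly credit allowance."""
    return total_credits <= plan_monthly_credits


estimate = monthly_answer_credits(requests_per_month=2_000, results_per_request=3)
print(estimate)  # 14000

# HYPOTHETICAL allowance for illustration; read the real figure from /pricing.
print(fits_plan(estimate, plan_monthly_credits=20_000))  # True
```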

If you are new to the endpoint itself, the search API release notes and the search API for AI agents guide cover the request shape and how answer mode fits into an agent loop.

Honest scope and limits

To keep this useful rather than a sales sheet: fastCRW is stateless per request, so there is no cached cross-request answer reuse to amortize cost — each managed answer request pays its own LLM leg. There is no /v1/deep-research or /v1/agent endpoint that would orchestrate many synthesis legs into one billable task; managed answer mode is single-request synthesis over your search results. And managed mode does not let you pick the model — the managed LLM is the model. None of these is a billing trap; they are just the edges of what the capped managed answer mode covers.

Sources

Live pricing: fastcrw.com/pricing · repo github.com/us/crw
