By the fastCRW team · Pricing/billing mechanics verified 2026-05-18 (managed-search config re-verified in prod 2026-05-30) · fastCRW launch pricing expires 2026-06-01 · Verify independently before buying.
Managed LLM search API costs are the line item most teams forget to model: you price the search call, you price the scrape, and then answer-mode synthesis quietly adds a token-metered third leg that scales with how much the model writes. This page walks through exactly how fastCRW bills a managed /v1/search request with answer: true — the DeepSeek default, the 3x markup, and the hard per-request cap — so you can forecast the spend instead of being surprised by it. Every number here traces to marketing/CANONICAL-FACTS.md §6 and src/lib/llm-pricing.ts; we are not asserting any competitor's pricing.
What a managed LLM search request charges
A plain /v1/search query is cheap and flat: 1 credit per query, plus 1 credit per result when you ask fastCRW to scrape the result content (CANONICAL-FACTS §3). That part is forecastable from request volume alone — no model is involved, so there is no token meter.
The cost picture changes the moment you turn on answer synthesis. When you pass answer: true (or summarizeResults: true), the request has three legs: the base search, the per-result scrape, and an LLM leg that reads the scraped content and writes an answer or per-result summaries. That LLM leg is billed token-by-token, which makes it the only part of a managed search request whose cost is not fixed per unit.
The DeepSeek managed default
If you do not supply your own key, the managed path defaults to DeepSeek — pricing key deepseek-v4-flash, which the engine maps to the DeepSeek API model deepseek-chat (MANAGED_SEARCH_DEFAULT_MODEL, src/lib/llm-pricing.ts:54, verified 2026-05-30). DeepSeek's low per-token rate is the whole reason managed answer mode stays affordable: a typical synthesis leg costs fractions of a cent in raw provider spend. Note the honest scope here — the managed search default is DeepSeek specifically. fastCRW's separate LLM extraction path (formats: ["json"]) supports OpenAI and Anthropic providers only (§9); do not conflate the two.
The credit-to-dollar conversion
Managed answer mode does not charge you DeepSeek's dollars directly. It converts raw provider cost into fastCRW credits with a fixed, published formula. The markup is a transparent multiplier, not a hidden surcharge:
| Constant | Value | Source |
|---|---|---|
| Markup multiplier | 3x | MARKUP_MULTIPLIER (llm-pricing.ts:28) |
| Internal cost reference | $0.001 / credit | USD_PER_CREDIT (llm-pricing.ts) |
| Per-request hard cap | 8,000 credits | SEARCH_RESERVE_HARD_CAP_CREDITS (llm-pricing.ts:62) |
| Per-leg token cap | 1,024 tokens | engine SEARCH_LLM_MAX_TOKENS_PER_LEG |
The conversion is: creditsCharged = ceil(provider_usd × 3 / 0.001) = ceil(provider_usd × 3000). It always rounds up, so fastCRW never undercharges, and the raw provider cost is stored separately (without markup) for auditability.
A worked example: 903 microdollars to 3 credits
A real managed DeepSeek synthesis leg measured in production cost 903 microdollars (0.000903 USD) in raw provider spend. Apply the formula: 0.000903 × 3000 = 2.709, rounded up to 3 credits. So a full answer-mode request that scraped a few results lands around 1 credit for the base search + ~3 credits for the scrape + 3 credits for the synthesis leg, or roughly 7 credits all-in. Your numbers will vary with result count and answer length, but the shape holds: the LLM leg is a small, bounded slice, not a runaway.
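The conversion and the worked example can be sketched in a few lines of TypeScript. This is a minimal illustration, not the contents of src/lib/llm-pricing.ts: the helper name creditsForProviderCost and the constant MICRO_USD_PER_CREDIT are assumptions made for this example, and the input is taken as integer microdollars so the multiplication stays exact before rounding up.

```typescript
// Values mirror the constants table above. MICRO_USD_PER_CREDIT is an
// illustrative restatement of $0.001 per credit, not a name from the codebase.
const MARKUP_MULTIPLIER = 3;
const MICRO_USD_PER_CREDIT = 1_000;

// Hypothetical helper: raw provider cost in integer microdollars -> credits charged.
function creditsForProviderCost(providerMicroUsd: number): number {
  if (!Number.isInteger(providerMicroUsd) || providerMicroUsd < 0) {
    throw new RangeError("providerMicroUsd must be a non-negative integer");
  }
  // Apply the 3x markup, convert to credits, and always round up.
  return Math.ceil((providerMicroUsd * MARKUP_MULTIPLIER) / MICRO_USD_PER_CREDIT);
}

// The measured leg from the worked example: 903 * 3 / 1000 = 2.709, rounded up.
const llmLegCredits = creditsForProviderCost(903);

// Base search (1) + three scraped results (1 each) + the LLM leg.
const requestTotal = 1 + 3 + llmLegCredits;

console.log({ llmLegCredits, requestTotal });
```

Because the result is always rounded up, any non-zero provider cost charges at least 1 credit under this sketch.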
One subtlety worth knowing: $0.001/credit is the internal cost-to-credit reference, not what you pay per credit. You buy credits at plan rates (see /pricing), so the realized economics differ by tier — but the per-request credit math above is exactly what gets deducted from your balance.
Bounding the per-request cost
Managed answer cost stays predictable because two ceilings bound every request. First, the engine caps the per-leg max_tokens at 1,024, so no single synthesis call can write an essay's worth of tokens. Second, and more important for budgeting, every managed search request is hard-capped at 8,000 credits (SEARCH_RESERVE_HARD_CAP_CREDITS, llm-pricing.ts:62), which works out to about $2.67 of raw DeepSeek spend (8,000 × $0.001 ÷ 3). That is the worst case for one request, full stop.
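As a rough check on the ceiling, the sketch below derives the raw-dollar equivalent of the cap and shows a simple clamp. The helper clampToRequestCap is hypothetical, and exactly where the engine applies the cap (at reserve time, at commit time, or both) is an assumption not spelled out here.

```typescript
// Constants as quoted in this article; the helper below is illustrative only.
const SEARCH_RESERVE_HARD_CAP_CREDITS = 8_000;
const MARKUP_MULTIPLIER = 3;
const USD_PER_CREDIT = 0.001;

// Hypothetical helper: no single request is billed above the hard cap.
function clampToRequestCap(credits: number): number {
  return Math.min(credits, SEARCH_RESERVE_HARD_CAP_CREDITS);
}

// Raw provider dollars that a fully used cap would represent:
// credits * dollars-per-credit, with the 3x markup divided back out.
const rawUsdAtCap =
  (SEARCH_RESERVE_HARD_CAP_CREDITS * USD_PER_CREDIT) / MARKUP_MULTIPLIER;

console.log(rawUsdAtCap.toFixed(2)); // "2.67"
console.log(clampToRequestCap(12_000)); // 8000
```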
Reserve, commit, refund: you never overspend your balance
Managed search uses a reserve-commit-refund ledger so a caller can never burn past their wallet. The flow, all under one requestId: the base search and scrape legs charge as usual; the LLM leg reserves a worst-case credit estimate up front; then a commit step reconciles and refunds the difference between the reserve and the actual token cost. Because the reserve happens first, checkAndConsumeQuota returns a 402 if you do not have the credits — the wallet clamp keeps your balance at or above zero. You are charged the real cost, but you can never be charged more than you have.
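The reserve-commit-refund flow can be pictured with a small in-memory model. Everything in this sketch is hypothetical: the Wallet class, its method names, and the rule that a commit never charges more than the reserve are assumptions for illustration, not the production ledger or the real checkAndConsumeQuota.

```typescript
// In-memory stand-in for a credit wallet keyed by requestId.
class Wallet {
  private balance: number;
  private reserved = new Map<string, number>();

  constructor(initialCredits: number) {
    this.balance = initialCredits;
  }

  // Hold a worst-case estimate up front; mimic the 402 when credits are short.
  reserve(requestId: string, estimateCredits: number): { status: 200 | 402 } {
    if (estimateCredits > this.balance) return { status: 402 };
    this.balance -= estimateCredits;
    this.reserved.set(requestId, estimateCredits);
    return { status: 200 };
  }

  // Reconcile against the actual token cost and refund the unused reserve.
  commit(requestId: string, actualCredits: number): number {
    const held = this.reserved.get(requestId);
    if (held === undefined) throw new Error(`no reserve for ${requestId}`);
    this.reserved.delete(requestId);
    const charged = Math.min(actualCredits, held); // assumed: never above the reserve
    const refund = held - charged;
    this.balance += refund;
    return refund;
  }

  get available(): number {
    return this.balance;
  }
}

const wallet = new Wallet(100);
wallet.reserve("req-1", 40);              // 40 credits held, 60 available
const refund = wallet.commit("req-1", 3); // actual cost 3, so 37 come back
const rejected = wallet.reserve("req-2", 500);

console.log({ refund, available: wallet.available, rejected });
```

Reserving before the model call is what lets the balance check fail early, before any provider cost has been incurred.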
Where managed mode is available
Managed answer mode runs on the paid plans — HOBBY, STANDARD, GROWTH, and SCALE (CANONICAL-FACTS §6). FREE-tier callers cannot use the managed model, but they can still use answer synthesis by bringing their own key (BYOK), which is available on every plan including FREE. A HOBBY user with their plan's monthly credits can burn at most a small fraction of a dollar in real DeepSeek cost per month even in the pathological case where every credit went to the LLM — which cannot actually happen, since search and scrape legs always consume some of the budget too.
Managed vs BYOK for answer mode
fastCRW gives you two ways to pay for the model, and the cost difference lives entirely in the markup:
| Dimension | Managed | BYOK |
|---|---|---|
| Who supplies the key | fastCRW (no key needed) | You (llmApiKey + llmProvider) |
| Token markup | 3x on raw provider cost | None — only the flat infra fee |
| Default model | DeepSeek deepseek-v4-flash | Your provider's model |
| Providers | DeepSeek (managed default) | OpenAI / Anthropic / DeepSeek / Azure / OpenAI-compatible |
| Plans | HOBBY and up | Every plan, including FREE |
| Per-request cap | 8,000 credits | Your provider's billing applies |
Managed is the turnkey path: zero key management, the markup buys you the convenience and the capped, single-line bill. BYOK is the escape hatch: you pay your provider directly with no fastCRW token markup, trading a little setup for the lowest possible per-token cost. The detailed break-even for BYOK across extraction and search lives in our companion BYOK vs managed LLM extraction pricing piece, and the DeepSeek key setup is covered in the DeepSeek BYOK tutorial.
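In request terms, the difference between the two paths is whether you attach your own key. The sketch below only builds the JSON bodies. The fields answer, llmApiKey, and llmProvider come from the text above, while the query field name, the provider string "deepseek", and the helper buildSearchBody are assumptions for illustration; check the search API documentation for the actual request shape.

```typescript
// Request body sketch. `query` is an assumed field name, not confirmed here.
interface SearchBody {
  query: string;
  answer: boolean;
  llmApiKey?: string;
  llmProvider?: string;
}

// Hypothetical helper: omit the key for managed mode, attach it for BYOK.
function buildSearchBody(
  query: string,
  byok?: { apiKey: string; provider: string },
): SearchBody {
  const body: SearchBody = { query, answer: true };
  if (byok) {
    body.llmApiKey = byok.apiKey;
    body.llmProvider = byok.provider;
  }
  return body;
}

// Managed: no key, so the DeepSeek default and the 3x markup apply.
const managedBody = buildSearchBody("latest rust release");

// BYOK: placeholder key and an assumed provider identifier.
const byokBody = buildSearchBody("latest rust release", {
  apiKey: "sk-placeholder",
  provider: "deepseek",
});

console.log(JSON.stringify(managedBody));
console.log(JSON.stringify(byokBody));
```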
Estimating monthly answer-mode spend
Because managed cost is metered, the honest way to forecast it is bottom-up from your own traffic. A quick worksheet:
- Count answer-mode requests per month. Only requests with answer: true or summarizeResults: true incur an LLM leg; plain searches do not.
- Estimate credits per request. Base search (1) + per-result scrape (1 each) + the synthesis leg. The worked example above puts the LLM leg around 3 credits for a typical DeepSeek answer; size it up if you summarize many results per request.
- Multiply and compare to your plan allowance. Tie the total credits back to the monthly credits in your tier (see /pricing) rather than to a dollar figure, because credits are the unit you actually spend.
- Decide managed vs BYOK at your volume. Below a few thousand answer requests a month, the managed 3x markup is usually noise against the convenience. Above that, dropping the markup with BYOK starts to pay for the setup, and you also gain provider choice and data-residency control.
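The worksheet above reduces to one multiplication. The sketch below is a hedged starting point: the TrafficEstimate shape and estimateMonthlyCredits are hypothetical names, the traffic figures are made up, and planAllowance is a placeholder you should replace with your tier's real monthly credits from /pricing.

```typescript
// Inputs you would fill in from your own traffic; all names are illustrative.
interface TrafficEstimate {
  answerRequestsPerMonth: number;
  resultsScrapedPerRequest: number;
  llmCreditsPerRequest: number; // around 3 in the worked example above
}

// Base search (1) + per-result scrape (1 each) + synthesis leg, times volume.
function estimateMonthlyCredits(t: TrafficEstimate): number {
  const perRequest = 1 + t.resultsScrapedPerRequest + t.llmCreditsPerRequest;
  return perRequest * t.answerRequestsPerMonth;
}

const monthlyCredits = estimateMonthlyCredits({
  answerRequestsPerMonth: 2_000,
  resultsScrapedPerRequest: 3,
  llmCreditsPerRequest: 3,
});

// Placeholder allowance, NOT a real plan figure.
const planAllowance = 20_000;

console.log({ monthlyCredits, fitsPlan: monthlyCredits <= planAllowance });
```

With these made-up inputs the estimate is 7 credits per request across 2,000 requests, or 14,000 credits; rerun it with a higher llmCreditsPerRequest if you summarize many results per request.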
If you are new to the endpoint itself, the search API release notes and the search API for AI agents guide cover the request shape and how answer mode fits into an agent loop.
Honest scope and limits
To keep this useful rather than a sales sheet: fastCRW is stateless per request, so there is no cached cross-request answer reuse to amortize cost — each managed answer request pays its own LLM leg. There is no /v1/deep-research or /v1/agent endpoint that would orchestrate many synthesis legs into one billable task; managed answer mode is single-request synthesis over your search results. And the managed default is DeepSeek only — if you need a frontier model for synthesis, that is a BYOK decision, not a managed one. None of these is a billing trap; they are just the edges of what the capped managed model covers.
Sources
- fastCRW canonical fact sheet: marketing/CANONICAL-FACTS.md §6 (managed search, DeepSeek default, 3x markup, 8,000-credit cap), verified 2026-05-29/30
- Billing mechanics: .claude/rules/managed-search-billing.md (credit↔dollar formula, reserve-commit-refund) and src/lib/llm-pricing.ts (MARKUP_MULTIPLIER, SEARCH_RESERVE_HARD_CAP_CREDITS, MANAGED_SEARCH_DEFAULT_MODEL)
- Live pricing: fastcrw.com/pricing · repo github.com/us/crw
Related: BYOK vs managed LLM extraction pricing · CRW search API release · Search API for AI agents · DeepSeek BYOK tutorial
