Managed LLM Search API Costs: Capped DeepSeek

How managed LLM search adds model tokens to your bill: a 3x markup on DeepSeek with an 8,000-credit per-request cap that keeps answer-mode cost predictable.

June 8, 2026 · 9 min read · Last updated: June 2, 2026

By the fastCRW team · Pricing/billing mechanics verified 2026-05-18 (managed-search config re-verified in prod 2026-05-30) · fastCRW launch pricing expires 2026-06-01 · Verify independently before buying.

Managed LLM search API costs are the line item most teams forget to model: you price the search call, you price the scrape, and then answer-mode synthesis quietly adds a token-metered third leg that scales with how much the model writes. This page walks through exactly how fastCRW bills a managed /v1/search request with answer: true — the DeepSeek default, the 3x markup, and the hard per-request cap — so you can forecast the spend instead of being surprised by it. Every number here traces to marketing/CANONICAL-FACTS.md §6 and src/lib/llm-pricing.ts; we are not asserting any competitor's pricing.

What a managed LLM search request charges

A plain /v1/search query is cheap and flat: 1 credit per query, plus 1 credit per result when you ask fastCRW to scrape the result content (CANONICAL-FACTS §3). That part is forecastable from request volume alone — no model is involved, so there is no token meter.

The cost picture changes the moment you turn on answer synthesis. When you pass answer: true (or summarizeResults: true), the request has three legs: the base search, the per-result scrape, and an LLM leg that reads the scraped content and writes an answer or per-result summaries. That LLM leg is billed token-by-token, which makes it the only part of a managed search request whose cost depends on model output rather than on request shape.
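As a minimal sketch of that rule, the check below decides whether a request would incur the LLM leg. Only the answer and summarizeResults flags come from this article; the interface name and the query field are assumptions made for illustration, not the documented request schema.

```typescript
// Hypothetical request shape. Only `answer` and `summarizeResults` are named
// in the article; `query` is an assumed field used to make the sketch readable.
interface SearchRequestSketch {
  query: string;
  answer?: boolean;
  summarizeResults?: boolean;
}

// True when the request would add the token-metered LLM leg on top of the
// flat search and scrape legs.
function hasLlmLeg(req: SearchRequestSketch): boolean {
  return req.answer === true || req.summarizeResults === true;
}

// A plain search: flat credits only, no token meter.
const plainSearch: SearchRequestSketch = { query: "rust web crawlers" };

// An answer-mode search: flat credits plus the metered synthesis leg.
const answerSearch: SearchRequestSketch = { query: "rust web crawlers", answer: true };
```

Under this sketch, hasLlmLeg(plainSearch) is false and hasLlmLeg(answerSearch) is true.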

The DeepSeek managed default

If you do not supply your own key, the managed path defaults to DeepSeek — pricing key deepseek-v4-flash, which the engine maps to the DeepSeek API model deepseek-chat (MANAGED_SEARCH_DEFAULT_MODEL, src/lib/llm-pricing.ts:54, verified 2026-05-30). DeepSeek's low per-token rate is the whole reason managed answer mode stays affordable: a typical synthesis leg costs fractions of a cent in raw provider spend. Note the honest scope here — the managed search default is DeepSeek specifically. fastCRW's separate LLM extraction path (formats: ["json"]) supports OpenAI and Anthropic providers only (§9); do not conflate the two.

The credit-to-dollar conversion

Managed answer mode does not charge you DeepSeek's dollars directly. It converts raw provider cost into fastCRW credits with a fixed, published formula. The markup is a transparent multiplier, not a hidden surcharge:

| Constant | Value | Source |
| --- | --- | --- |
| Markup multiplier | 3x | MARKUP_MULTIPLIER (llm-pricing.ts:28) |
| Internal cost reference | $0.001 / credit | USD_PER_CREDIT (llm-pricing.ts) |
| Per-request hard cap | 8,000 credits | SEARCH_RESERVE_HARD_CAP_CREDITS (llm-pricing.ts:62) |
| Per-leg token cap | 1,024 tokens | engine SEARCH_LLM_MAX_TOKENS_PER_LEG |

The conversion is: creditsCharged = ceil(provider_usd × 3 / 0.001) = ceil(provider_usd × 3000). It always rounds up, so fastCRW never undercharges, and the raw provider cost is stored separately (without markup) for auditability.
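The formula can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of llm-pricing.ts: the constant MARKUP_MULTIPLIER mirrors the name cited above, while MICRO_USD_PER_CREDIT and the choice to take the provider cost as whole microdollars are assumptions made here so the arithmetic stays in integers and avoids floating-point drift before the round-up.

```typescript
// Mirrors the published constant name; the value comes from the table above.
const MARKUP_MULTIPLIER = 3;

// Assumption: the $0.001-per-credit reference expressed in microdollars.
const MICRO_USD_PER_CREDIT = 1000;

// Converts raw provider cost (whole microdollars) into credits, rounding up.
// Equivalent to ceil(provider_usd * 3 / 0.001) for the same amount in dollars.
function creditsCharged(providerMicroUsd: number): number {
  if (!Number.isInteger(providerMicroUsd) || providerMicroUsd < 0) {
    throw new RangeError("providerMicroUsd must be a non-negative integer");
  }
  return Math.ceil((providerMicroUsd * MARKUP_MULTIPLIER) / MICRO_USD_PER_CREDIT);
}
```

Because of the round-up, a leg costing 1,000 microdollars maps to exactly 3 credits, while 1,001 microdollars maps to 4.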

A worked example: 903 microdollars to 3 credits

A real managed DeepSeek synthesis leg measured in production cost 903 microdollars (0.000903 USD) in raw provider spend. Apply the formula: 0.000903 × 3000 = 2.71, rounded up to 3 credits. So a full answer-mode request that scraped a few results lands around: 1 credit base search + ~3 credits scrape + 3 credits for the synthesis leg — roughly 7 credits all-in for that request. Your numbers will vary with result count and answer length, but the shape holds: the LLM leg is a small, bounded slice, not a runaway.
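The same arithmetic, written out as a small sketch. The function name and parameters are hypothetical, and it assumes a single synthesis leg plus one credit per scraped result, as in the example above.

```typescript
// Flat charges described earlier in the article.
const BASE_SEARCH_CREDITS = 1;
const CREDITS_PER_SCRAPED_RESULT = 1;

// 3x markup at $0.001 per credit, applied to a cost given in microdollars.
function llmLegCredits(providerMicroUsd: number): number {
  return Math.ceil((providerMicroUsd * 3) / 1000);
}

// Hypothetical helper: total credits for one answer-mode request with a
// single synthesis leg.
function answerRequestCredits(scrapedResults: number, synthesisMicroUsd: number): number {
  return (
    BASE_SEARCH_CREDITS +
    scrapedResults * CREDITS_PER_SCRAPED_RESULT +
    llmLegCredits(synthesisMicroUsd)
  );
}

// The worked example: 3 scraped results and a 903-microdollar synthesis leg.
const workedExampleCredits = answerRequestCredits(3, 903); // 1 + 3 + 3 = 7
```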

One subtlety worth knowing: $0.001/credit is the internal cost-to-credit reference, not what you pay per credit. You buy credits at plan rates (see /pricing), so the realized economics differ by tier — but the per-request credit math above is exactly what gets deducted from your balance.

Bounding the per-request cost

Managed answer cost stays predictable because two ceilings bound every request. First, the engine caps the per-leg max_tokens at 1,024, so no single synthesis call can write an essay's worth of tokens. Second, and more importantly for budgeting, every managed search request is hard-capped at 8,000 credits (SEARCH_RESERVE_HARD_CAP_CREDITS, llm-pricing.ts:62). At 3,000 credits per provider dollar, that corresponds to about $2.67 of raw DeepSeek spend. That is the worst case for one request, full stop.
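A short sketch of how the two ceilings translate into numbers. The constant names follow the ones cited in this article; the helper functions are hypothetical, and where exactly the engine applies each clamp is an assumption here.

```typescript
const MARKUP_MULTIPLIER = 3;
const USD_PER_CREDIT = 0.001;
const SEARCH_RESERVE_HARD_CAP_CREDITS = 8000;
const SEARCH_LLM_MAX_TOKENS_PER_LEG = 1024;

// Raw provider spend that would exactly reach the per-request cap:
// 8,000 credits * $0.001 / 3, which is roughly $2.67.
const maxProviderUsdPerRequest =
  (SEARCH_RESERVE_HARD_CAP_CREDITS * USD_PER_CREDIT) / MARKUP_MULTIPLIER;

// Hypothetical helper: never let a per-request credit figure exceed the cap.
function capRequestCredits(uncappedCredits: number): number {
  return Math.min(uncappedCredits, SEARCH_RESERVE_HARD_CAP_CREDITS);
}

// Hypothetical helper: never let a leg ask for more output tokens than allowed.
function capLegMaxTokens(requestedTokens: number): number {
  return Math.min(requestedTokens, SEARCH_LLM_MAX_TOKENS_PER_LEG);
}
```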

Reserve, commit, refund: you never overspend your balance

Managed search uses a reserve-commit-refund ledger so a caller can never burn past their wallet. The flow, all under one requestId: the base search and scrape legs charge as usual; the LLM leg reserves a worst-case credit estimate up front; then a commit step reconciles and refunds the difference between the reserve and the actual token cost. Because the reserve happens first, checkAndConsumeQuota returns a 402 if you do not have the credits — the wallet clamp keeps your balance at or above zero. You are charged the real cost, but you can never be charged more than you have.
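The flow can be illustrated with an in-memory sketch. Everything here is hypothetical (the class, its method names, and the result type); it is not the fastCRW ledger or checkAndConsumeQuota, only a small model of reserving a worst case, committing the actual cost, and refunding the difference.

```typescript
// Result of a reservation attempt; a failed reserve models the 402 response.
type ReserveResult = { ok: true; reserved: number } | { ok: false; status: 402 };

class WalletSketch {
  private balance: number;
  private reservations: Record<string, number> = {};

  constructor(initialCredits: number) {
    this.balance = initialCredits;
  }

  getBalance(): number {
    return this.balance;
  }

  // Step 1: hold the worst-case estimate up front, or refuse if the wallet
  // cannot cover it. The balance therefore never goes below zero.
  reserve(requestId: string, worstCaseCredits: number): ReserveResult {
    if (worstCaseCredits > this.balance) {
      return { ok: false, status: 402 };
    }
    this.balance -= worstCaseCredits;
    this.reservations[requestId] = worstCaseCredits;
    return { ok: true, reserved: worstCaseCredits };
  }

  // Step 2: reconcile against the actual cost and refund the difference.
  // Returns the number of credits refunded.
  commit(requestId: string, actualCredits: number): number {
    const reserved = this.reservations[requestId];
    if (reserved === undefined) {
      throw new Error("no reservation for " + requestId);
    }
    delete this.reservations[requestId];
    const charged = Math.min(actualCredits, reserved);
    const refund = reserved - charged;
    this.balance += refund;
    return refund;
  }
}

// Usage: a 100-credit wallet reserves 40, the leg actually costs 3,
// so 37 credits come back and the balance ends at 97.
const wallet = new WalletSketch(100);
const reservation = wallet.reserve("req-1", 40);
const refunded = wallet.commit("req-1", 3);
```

In this sketch the actual charge is clamped to the reserved amount, which is one simple way to guarantee a request never costs more than was held for it.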

Where managed mode is available

Managed answer mode runs on the paid plans — HOBBY, STANDARD, GROWTH, and SCALE (CANONICAL-FACTS §6). FREE-tier callers cannot use the managed model, but they can still use answer synthesis by bringing their own key (BYOK), which is available on every plan including FREE. A HOBBY user with their plan's monthly credits can burn at most a small fraction of a dollar in real DeepSeek cost per month even in the pathological case where every credit went to the LLM — which cannot actually happen, since search and scrape legs always consume some of the budget too.
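The availability rule reduces to a small decision, sketched below. The plan names come from this article; the function and its return values are hypothetical, and it assumes that supplying your own key always selects BYOK.

```typescript
type Plan = "FREE" | "HOBBY" | "STANDARD" | "GROWTH" | "SCALE";
type LlmMode = "byok" | "managed" | "unavailable";

// Hypothetical helper: which answer-synthesis path a caller ends up on.
function resolveLlmMode(plan: Plan, hasOwnKey: boolean): LlmMode {
  // BYOK is available on every plan, including FREE.
  if (hasOwnKey) {
    return "byok";
  }
  // Without a key, only paid plans can use the managed DeepSeek default.
  return plan === "FREE" ? "unavailable" : "managed";
}
```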

Managed vs BYOK for answer mode

fastCRW gives you two ways to pay for the model, and the cost difference lives entirely in the markup:

| Dimension | Managed | BYOK |
| --- | --- | --- |
| Who supplies the key | fastCRW (no key needed) | You (llmApiKey + llmProvider) |
| Token markup | 3x on raw provider cost | None (only the flat infra fee) |
| Default model | DeepSeek deepseek-v4-flash | Your provider's model |
| Providers | DeepSeek (managed default) | OpenAI / Anthropic / DeepSeek / Azure / OpenAI-compatible |
| Plans | HOBBY and up | Every plan, including FREE |
| Per-request cap | 8,000 credits | Your provider's billing applies |

Managed is the turnkey path: zero key management, the markup buys you the convenience and the capped, single-line bill. BYOK is the escape hatch: you pay your provider directly with no fastCRW token markup, trading a little setup for the lowest possible per-token cost. The detailed break-even for BYOK across extraction and search lives in our companion BYOK vs managed LLM extraction pricing piece, and the DeepSeek key setup is covered in the DeepSeek BYOK tutorial.

Estimating monthly answer-mode spend

Because managed cost is metered, the honest way to forecast it is bottom-up from your own traffic. A quick worksheet:

  1. Count answer-mode requests per month. Only requests with answer: true or summarizeResults: true incur an LLM leg; plain searches do not.
  2. Estimate credits per request. Base search (1) + per-result scrape (1 each) + the synthesis leg. The worked example above puts the LLM leg around 3 credits for a typical DeepSeek answer; size it up if you summarize many results per request.
  3. Multiply and compare to your plan allowance. Tie the total credits back to the monthly credits in your tier (see /pricing) rather than to a dollar figure — credits are the unit you actually spend.
  4. Decide managed vs BYOK at your volume. Below a few thousand answer requests a month, the managed 3x markup is usually noise against the convenience. Above that, dropping the markup with BYOK starts to pay for the setup, and you also gain provider choice and data-residency control.
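The first three steps of the worksheet can be sketched as a small estimator. The type, function names, and example traffic figures are hypothetical; the plan allowance is a parameter because plan credit amounts are not stated in this article (see /pricing).

```typescript
// Inputs you would measure from your own traffic.
interface TrafficEstimate {
  answerRequestsPerMonth: number;
  scrapedResultsPerRequest: number;
  llmCreditsPerRequest: number; // e.g. about 3 for the worked example above
}

// Step 2: credits for one answer-mode request (base + scrape + synthesis).
function creditsPerAnswerRequest(t: TrafficEstimate): number {
  return 1 + t.scrapedResultsPerRequest + t.llmCreditsPerRequest;
}

// Step 3: total monthly credits spent on answer-mode requests.
function monthlyAnswerCredits(t: TrafficEstimate): number {
  return t.answerRequestsPerMonth * creditsPerAnswerRequest(t);
}

// Step 3, continued: compare against your plan's monthly allowance.
function fitsAllowance(t: TrafficEstimate, planMonthlyCredits: number): boolean {
  return monthlyAnswerCredits(t) <= planMonthlyCredits;
}

// Example with made-up traffic: 2,000 requests at 7 credits each.
const exampleTraffic: TrafficEstimate = {
  answerRequestsPerMonth: 2000,
  scrapedResultsPerRequest: 3,
  llmCreditsPerRequest: 3,
};
const exampleMonthlyCredits = monthlyAnswerCredits(exampleTraffic); // 14000
```

Plain searches without answer mode are not covered by this estimate; budget those separately at their flat rate.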

If you are new to the endpoint itself, the search API release notes and the search API for AI agents guide cover the request shape and how answer mode fits into an agent loop.

Honest scope and limits

To keep this useful rather than a sales sheet: fastCRW is stateless per request, so there is no cached cross-request answer reuse to amortize cost — each managed answer request pays its own LLM leg. There is no /v1/deep-research or /v1/agent endpoint that would orchestrate many synthesis legs into one billable task; managed answer mode is single-request synthesis over your search results. And the managed default is DeepSeek only — if you need a frontier model for synthesis, that is a BYOK decision, not a managed one. None of these is a billing trap; they are just the edges of what the capped managed model covers.

Sources

  • fastCRW canonical fact sheet: marketing/CANONICAL-FACTS.md §6 (managed search, DeepSeek default, 3x markup, 8,000-credit cap) — verified 2026-05-29/30
  • Billing mechanics: .claude/rules/managed-search-billing.md (credit↔dollar formula, reserve-commit-refund) and src/lib/llm-pricing.ts (MARKUP_MULTIPLIER, SEARCH_RESERVE_HARD_CAP_CREDITS, MANAGED_SEARCH_DEFAULT_MODEL)
  • Live pricing: fastcrw.com/pricing · repo github.com/us/crw

Related: BYOK vs managed LLM extraction pricing · CRW search API release · Search API for AI agents · DeepSeek BYOK tutorial

Frequently asked questions

How is managed LLM search answer mode billed?
A managed /v1/search request with answer: true charges across legs that share one requestId: 1 credit for the base search, 1 credit per scraped result, and a token-metered LLM synthesis leg. The LLM leg uses a reserve-commit-refund ledger: it pre-charges a worst-case estimate, then refunds the difference once the real token cost is known. Plain searches without answer mode incur no LLM leg.
What markup does fastCRW apply to managed search tokens?
A 3x markup on the raw provider cost (MARKUP_MULTIPLIER in src/lib/llm-pricing.ts:28). The conversion is creditsCharged = ceil(provider_usd × 3 / 0.001) = ceil(provider_usd × 3000), and it always rounds up. For example, a 903-microdollar DeepSeek synthesis leg becomes 3 credits (0.000903 × 3000 = 2.71, rounded up).
What is the maximum cost of one managed search request?
Every managed search request is hard-capped at 8,000 credits (SEARCH_RESERVE_HARD_CAP_CREDITS, llm-pricing.ts:62), which is about $2.67 of raw DeepSeek spend. On top of that, each LLM leg is capped at 1,024 tokens by the engine, and the reserve-commit-refund ledger means you can never be charged beyond your wallet balance.
Can I avoid the token markup with my own API key?
Yes. With BYOK you supply your own llmApiKey and llmProvider, and there is no token markup — you pay your provider directly, plus the flat infra fee. BYOK is available on every plan including FREE, and it supports OpenAI, Anthropic, DeepSeek, Azure, and OpenAI-compatible providers, versus managed mode which defaults to DeepSeek on paid plans only.
Which model powers fastCRW's managed search by default?
DeepSeek — pricing key deepseek-v4-flash, mapped to the DeepSeek API model deepseek-chat (MANAGED_SEARCH_DEFAULT_MODEL, src/lib/llm-pricing.ts:54, verified 2026-05-30). DeepSeek's low per-token rate is what keeps capped managed answer mode affordable. Note this is the managed search default specifically; fastCRW's LLM extraction path (formats: ["json"]) supports OpenAI and Anthropic only.
