By the fastCRW team · Tavily figures sourced from competitor profiling verified 2026-05-18 and flagged "reportedly" — re-verify against Tavily's live pricing before acting · fastCRW launch pricing expires 2026-06-01 · Verify independently.
Disclosure: We build fastCRW, so weight this accordingly. We've kept the Tavily figures explicitly "reported" rather than frozen, and there's a "Where Tavily genuinely wins" section below, because a cost comparison that hides the competitor's strengths isn't useful to you.
Tavily research endpoint cost: what the per-request range actually means
The headline number developers run into is that a single call to Tavily's Research endpoint reportedly costs anywhere from 15 to 250 credits (per Tavily docs and our competitor profiling, verified 2026-05-18; re-verify before relying on it). That is roughly a 17x spread on one endpoint, and the range matters for budgeting because the upper bound, not the lower one, is the number you have to plan against. If a single research task can quietly stack to 250 credits, your worst-case monthly bill is set by how often the agent hits the top of that band, something you usually don't control from the outside.
This page is scoped narrowly to the cost anatomy of that one endpoint: how the range is composed, why it has no hard ceiling, and what a capped alternative looks like. If you want a broader shopping list of predictable research APIs (neural search engines, build-your-own loops, other managed options), that lives in the sibling roundup Tavily research cost: predictable alternatives. For the general Tavily plan/credit breakdown, see Tavily pricing explained.
What the Tavily Research endpoint charges
Tavily's API exposes a few distinct operations, and they are not priced the same way. The common understanding (per competitor profiling, 2026-05-18 — reportedly, not a frozen figure) is roughly:
- Search — the cheapest leg, a lightweight credit cost per query.
- Extract — pulling clean content from URLs, priced per batch of URLs.
- Research — the expensive endpoint: a multi-step agentic task that internally fans out into search legs, extract legs, and a pro-model synthesis pass, reportedly metered at 15-250 credits per request.
The Research endpoint is expensive precisely because it is a composite. One Research call is not one operation; it is an orchestration that issues several searches, fetches and extracts several pages, and then runs one or more large-language-model passes to synthesise an answer. Each of those internal legs consumes credits, and they are summed into the single Research charge you see on your bill.
How the 15-250 range is composed
Think of the per-request cost as the sum of three weighted parts:
| Internal leg | What it does | Why it varies |
|---|---|---|
| Search legs | Issues N queries to find candidate sources | A harder question triggers more queries |
| Extract legs | Fetches and cleans the top sources | More sources read = more extract credits |
| Pro-model synthesis | LLM reasons over the gathered context | Deeper reasoning / more tokens = more cost |
A shallow lookup that needs one search and two pages lands near the floor. A genuinely open-ended research task — "compare the last three years of policy changes across these five jurisdictions" — fans out into many searches, many extracts, and a long synthesis, landing near the ceiling. Same endpoint, same code path, wildly different cost, and the difference is driven by the question, which is the thing your users supply at runtime.
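The additive structure in the table can be sketched in a few lines of Python. The per-leg rates here are hypothetical placeholders chosen only to make the arithmetic visible; they are not Tavily's published pricing, and the names `LegRates` and `research_credits` are ours, not part of any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegRates:
    """Hypothetical per-leg credit rates (illustrative, NOT Tavily's pricing)."""
    per_search: float = 1.0
    per_extract: float = 1.0
    per_1k_synthesis_tokens: float = 2.0


def research_credits(searches: int, extracts: int, synthesis_tokens: int,
                     rates: LegRates = LegRates()) -> float:
    """Sum the three internal legs into one per-request charge."""
    search_cost = searches * rates.per_search
    extract_cost = extracts * rates.per_extract
    synthesis_cost = synthesis_tokens / 1000 * rates.per_1k_synthesis_tokens
    return search_cost + extract_cost + synthesis_cost


# Same function, same code path; only the question-driven leg counts differ.
shallow = research_credits(searches=1, extracts=2, synthesis_tokens=2_000)
deep = research_credits(searches=20, extracts=40, synthesis_tokens=60_000)
print(shallow, deep)
```

The point of the sketch is the shape, not the numbers: every input to `research_credits` is decided inside the endpoint at runtime, so the caller has no argument to clamp.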
Why the Research cost is unbounded
The structural problem isn't the size of the range — it's that the range has no hard per-request ceiling enforced by the billing layer. The 250-credit figure is a reported observed maximum, not a contractual cap. When a single task can stack arbitrarily many search and extract legs before synthesis, the upper end is set by the agent's behaviour rather than by a number you can point to in advance.
Three things make this unbounded in practice:
- No leg cap you control. The depth of the fan-out is decided inside the endpoint based on the query, so a verbose or ambiguous prompt can quietly multiply the leg count.
- Pro-model synthesis inflates the upper end. The reasoning pass is the most expensive leg, and the more context the search/extract stage gathers, the more tokens the model chews through — so cost compounds rather than adding linearly.
- Pay-as-you-go metering makes the upper bound the planning number. On a PAYG meter, landing under budget earns you nothing back, while every credit over budget is billed in full. So the only safe assumption for a finance forecast is the top of the band times your request volume.
For an agent that runs research autonomously — where the inputs are end-user questions you never see — this is the real budget risk. A handful of pathological queries in a month can dominate the bill, and there's no balance-level guardrail that says "this one request cannot cost more than X."
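To see how wide the planning gap gets, here is a minimal sketch that multiplies the reported band by a request volume. The 15 and 250 figures are the reported range quoted above; the 10,000-requests-per-month volume and the function name are assumptions for illustration.

```python
FLOOR_CREDITS = 15     # reported lower bound per Research request
CEILING_CREDITS = 250  # reported upper bound; an observed maximum, not a cap


def monthly_credit_band(requests_per_month: int) -> tuple[int, int]:
    """Return (best_case, worst_case) monthly credits for a request volume."""
    return (requests_per_month * FLOOR_CREDITS,
            requests_per_month * CEILING_CREDITS)


best, worst = monthly_credit_band(10_000)  # hypothetical volume
print(f"best case:  {best:,} credits")
print(f"worst case: {worst:,} credits")
```

At that assumed volume the band runs from 150,000 to 2,500,000 credits, and only the second number is safe to hand to finance.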
The capped contrast: a managed answer mode with a hard ceiling
fastCRW does not ship a managed /v1/deep-research endpoint — that's an honest gap we state plainly (more on it below). What it does offer is a managed answer mode on /v1/search (answer: true), and the way that request is metered is structurally different: it is bounded.
- Reserve-commit-refund ledger. Before the LLM leg runs, the worst-case cost is reserved against your balance; after it completes, the difference between the reserve and the actual cost is refunded. The practical guarantee: a caller can never burn past their balance, because the reserve is checked first and the wallet clamps at zero (per the managed-search billing rules).
- Per-leg token cap. Each LLM leg is capped at max_tokens = 1024, so a single answer synthesis can't run away on length.
- A low-cost managed default. The managed path runs a managed LLM, metered in credits based on usage, at a low per-token rate that keeps the typical request far below the cap.
The base mechanics are flat and easy to reason about: /v1/search is 1 credit per query, and scraping a result costs 1 credit per result. The LLM answer leg is the only metered-by-tokens part, and it sits under the 8,000-credit ceiling. So where Tavily Research gives you an unbounded composite, fastCRW gives you a flat base plus one capped, refundable LLM leg.
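The reserve-commit-refund pattern described above can be sketched as a small in-memory ledger. This is an illustration of the general technique under our own assumptions, not fastCRW's actual implementation; `CreditLedger`, `InsufficientBalance`, and the method names are hypothetical.

```python
class InsufficientBalance(Exception):
    """Raised when the worst-case reserve exceeds the available balance."""


class CreditLedger:
    """Hypothetical reserve-commit-refund ledger (illustrative sketch only)."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._reserved: dict[str, int] = {}

    def reserve(self, request_id: str, worst_case: int) -> None:
        # Check the reserve first, so a request can never start
        # if its worst case would push the balance below zero.
        if worst_case > self.balance:
            raise InsufficientBalance(
                f"need {worst_case}, have {self.balance}")
        self.balance -= worst_case
        self._reserved[request_id] = worst_case

    def commit(self, request_id: str, actual: int) -> int:
        # Charge the actual cost, never more than was reserved,
        # and refund the unused remainder. Returns the refund.
        reserved = self._reserved.pop(request_id)
        charged = min(actual, reserved)
        refund = reserved - charged
        self.balance += refund
        return refund


ledger = CreditLedger(balance=100)
ledger.reserve("req-1", worst_case=40)       # balance drops to 60
refund = ledger.commit("req-1", actual=12)   # 28 credits come back
print(ledger.balance, refund)
```

Because the charge is clamped to the reserve and the reserve is clamped to the balance, the worst outcome for any single request is known before the LLM leg runs.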
Availability note: managed answer mode is available on paid plans (HOBBY and up); the FREE plan has no LLM features. You can derive your own per-request budget from the live tiers on /pricing.
Reading your own Research bill
If you're already on Tavily and trying to forecast spend, the useful exercise is to estimate the leg count of a typical request rather than trusting the floor figure.
- Count the legs of a representative task. How many distinct sources does your agent typically pull? How deep is the synthesis? A 3-source quick answer and a 20-source deep dive are different cost classes on the same endpoint.
- Multiply by the worst case, not the average. Because the meter only punishes overruns, your forecast should use the upper band (near 250 credits) for the fraction of requests that go deep. The average will understate the bill in any month with a few heavy queries.
- Find your unbounded inputs. Wherever an end-user's free-text question drives the research, that's where the upper end of the range becomes your real budget. Those paths are the ones a per-request cap would protect.
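The forecasting steps above can be sketched as a blended estimate: price the deep fraction at the top of the band and the rest at the floor. The 15 and 250 defaults are the reported range; the request volume, the 5% deep fraction, and the function name are assumptions for illustration.

```python
def forecast_credits(requests: int, deep_fraction: float,
                     shallow_credits: int = 15,
                     deep_credits: int = 250) -> tuple[int, float]:
    """Return (total_credits, share_of_total_from_deep_requests)."""
    deep_requests = round(requests * deep_fraction)
    shallow_requests = requests - deep_requests
    deep_total = deep_requests * deep_credits
    total = deep_total + shallow_requests * shallow_credits
    deep_share = deep_total / total if total else 0.0
    return total, deep_share


# Hypothetical month: 1,000 requests, 5% of which go deep.
total, deep_share = forecast_credits(requests=1_000, deep_fraction=0.05)
print(f"{total:,} credits, {deep_share:.0%} from deep requests")
```

Under these assumptions, 5% of requests account for roughly 47% of the credits, which is why an average-based forecast understates any month with a few heavy queries.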
The takeaway isn't "Tavily is expensive" — it's that the shape of the meter (unbounded composite, no balance-level guardrail) makes the upper bound the number you plan against. A model with a hard per-request ceiling and a reserve-refund ledger lets you plan against the cap instead, which for autonomous agents is the difference between a forecast and a guess.
Where Tavily genuinely wins
To keep this honest, here's where Tavily's Research endpoint is the better tool:
- It's a real turnkey research agent. One call does multi-step search, extraction, and synthesis. fastCRW has no managed /v1/deep-research or /v1/agent endpoint; to get the same outcome you compose the loop yourself over search + scrape primitives.
- Less orchestration to build. If you want a research answer without writing and maintaining a fan-out/synthesis loop, the extra credit burn may be worth it to avoid building one.
- Search benchmark caveat. Our search latency benchmark (avg 880 ms over 100 queries; triple-bench.ts) measures plain search, not Tavily's Research endpoint, so it's not a like-for-like comparison against Research, and we won't pretend it is. See /benchmarks for the full numbers.
If you've decided turnkey research is worth a separate, capped budget, the build-your-own contrast and the wider alternatives landscape are covered in building a deep-research agent on fastCRW and Tavily research cost: predictable alternatives.
Sources
- Tavily Research per-request range (15-250 credits) and leg composition: internal competitor profiling, verified 2026-05-18 — reported, not a locked figure. Re-verify against tavily.com pricing before acting.
- fastCRW repo and live pricing: github.com/us/crw · /pricing
Related: Tavily pricing explained · Tavily research cost: predictable alternatives · Deep-research agent on fastCRW
