By the fastCRW team · Benchmark figures verified 2026-05-18 against bench/server-runs/RESULT_3WAY_1000_FULL.md (2026-05-08) and benchmarks/triple-bench.ts · Verify independently before quoting internally.
Disclosure: we build fastCRW, so weight the latency framing accordingly. The whole point of this page is to hand you the p50/p90/p99 split honestly, including the p90 where fastCRW is the worst of the three tools we tested, so you can budget timeouts instead of being surprised by them.
LangGraph web scraping latency: why a retrieval node compounds
If you have already decided to add a live-web retrieval node to a LangGraph RAG agent, your problem is no longer "how do I scrape a page" — it is per-node latency inside the graph loop. A web-retrieval node sits on the critical path: nothing downstream (re-rank, synthesize, answer) runs until it returns, and in an agentic graph that node can fire several times per user turn as conditional edges loop back to fetch more context. That is why median scrape latency, not a vendor's best-case number, is the figure that decides how the loop feels.
This page is the latency-tuning companion to the build tutorial. If you have not wired the node yet, read the LangGraph web scraping agent tutorial first to stand up the node, then come back here to budget timeouts and retries against real percentiles.
The node sits on the critical path
In a typical web-aware RAG graph the flow is: classify intent → decide retrieve-vs-answer (conditional edge) → retrieve node (search + scrape) → grade documents → either answer or loop back to retrieve. Every iteration of that loop pays the retrieval node's latency again. A 2-second node called once is invisible; called three times across a re-plan loop it is the dominant cost of the turn. So the question to answer before tuning anything is: how many times does my graph realistically re-enter the retrieve node, and what is the per-call distribution?
p50 vs p90: which number a graph loop actually feels
A single average hides the part that hurts. A graph loop feels the median on the common path and the tail on the unlucky one — and because a loop re-rolls the dice each iteration, your effective exposure to the tail grows with loop depth. Call a node with a p90 of 14 seconds three times and the probability that at least one call lands in that tail is roughly 1 − 0.9³ ≈ 27%. That is why you must look at p50 and p90 separately, and why you set node timeouts off the tail, never the median.
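The tail-exposure arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes each call is independent of the others, which real traffic only approximates; the function name is ours, not part of any library.

```python
def tail_exposure(tail_fraction: float, calls: int) -> float:
    """Probability that at least one of `calls` independent calls lands in the tail."""
    return 1.0 - (1.0 - tail_fraction) ** calls


# 0.10 is the share of calls slower than the p90, by definition of p90.
for depth in (1, 2, 3, 5):
    print(f"{depth} call(s): {tail_exposure(0.10, depth):.1%} chance of hitting the tail")
```

At a loop depth of 3 this reproduces the roughly 27% figure, and it shows how quickly the exposure grows as the graph re-enters the node.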
The fastCRW latency picture for a graph node, told honestly
Here is the canonical performance data, with full provenance. These come from a single run of diagnose_3way.py over Firecrawl's own public 1,000-URL scrape-content-dataset-v1 (3,000 total requests, 2026-05-08), plus a separate 100-query search benchmark (triple-bench.ts).
| Metric | fastCRW | Crawl4AI | Firecrawl |
|---|---|---|---|
| p50 scrape latency | 1914 ms | 1916 ms | 2305 ms |
| p90 scrape latency | 14157 ms | 4754 ms | 6937 ms |
| p99 scrape latency | 15012 ms | 13749 ms | 21107 ms |
| Truth-recall (of 819 labeled URLs) | 63.74% | 59.95% | 56.04% |
Median scrape 1914 ms beats Firecrawl's 2305 ms
On the common path, fastCRW's p50 scrape latency of 1914 ms beats Firecrawl's 2305 ms (diagnose_3way.py, 2026-05-08) and is effectively tied with Crawl4AI (1916 ms — 2 ms apart). For a retrieve node that lands on the median most of the time, that is roughly 390 ms shaved off every common-path iteration versus Firecrawl. Across a 3-iteration loop that is over a second of wall-clock you are not spending, before any caching.
Search averages 880 ms over a 100-query benchmark
If your retrieve node does discovery first (search) then fetches (scrape), add the search leg. fastCRW search averaged 880 ms over a 100-query benchmark, with 73 of 100 latency wins against Firecrawl and Tavily (triple-bench.ts, 100 queries; a separate point-in-time measurement from the scrape run above). A search-then-scrape retrieve node therefore budgets to roughly 880 ms + 1914 ms ≈ 2.8 s on the median path. Cite those raw numbers, not a speed multiple — and keep the two benchmarks separate, because the search run does not measure scrape and vice versa.
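The common-path budget can be written down as plain arithmetic. The sketch below uses only the figures quoted above; note that adding a search average to a scrape median gives a rough planning number, not a true percentile of the combined call, and the two figures come from separate benchmarks.

```python
# Figures quoted in this article (diagnose_3way.py and triple-bench.ts).
FASTCRW_SCRAPE_P50_MS = 1914
FIRECRAWL_SCRAPE_P50_MS = 2305
FASTCRW_SEARCH_AVG_MS = 880

LOOP_ITERATIONS = 3  # illustrative loop depth, not a measured value

# Median-path saving versus Firecrawl across the loop.
saved_ms = (FIRECRAWL_SCRAPE_P50_MS - FASTCRW_SCRAPE_P50_MS) * LOOP_ITERATIONS

# Rough budget for one search-then-scrape retrieve call.
search_then_scrape_ms = FASTCRW_SEARCH_AVG_MS + FASTCRW_SCRAPE_P50_MS

print(f"saved over {LOOP_ITERATIONS} iterations: {saved_ms} ms")
print(f"search + scrape, common path: {search_then_scrape_ms} ms")
```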
The p90 14157 ms tail is the worst of three — and it is causal
Now the part most vendor pages bury: fastCRW's p90 of 14157 ms is the worst of the three tools tested (Crawl4AI 4754 ms, Firecrawl 6937 ms). We disclose it because it is not noise — it is causal. The chrome-stealth fallback that recovers the labeled URLs the other two miss (the same mechanism behind the highest truth-recall) is exactly what produces the slow tail. You are trading a fatter p90 for a higher recall. For a RAG node that is often the right trade, but only if you size your timeout for that tail deliberately rather than discover it in production. See scraping latency explained for why percentiles, not averages, are the only honest way to publish this.
Setting node-level timeouts and retries around the tail
LangGraph lets you bound work around a node (step timeouts, retry policies attached to nodes, and your own deadline inside the tool). The data above tells you exactly where to set them.
Pick a per-node timeout from the p90, not the p50
If you set a retrieve-node timeout at, say, 3 seconds because the median is under 2, you will kill at least the slowest 10% of fastCRW fetches, which are precisely the chrome-stealth fallbacks that were about to return the high-recall content you wanted. A defensible starting point is a timeout a little above the p90 you actually observe on your URL mix: on this dataset that is north of 14 seconds for scrape, lower if your traffic skews toward easy http/lightpanda pages. The honest rule is to set the timeout off the tail and accept that the median path finishes in ~2 s while only the unlucky tail uses the full budget.
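One way to enforce "your own deadline inside the tool" is to wrap the fetch in `asyncio.wait_for`. This is a sketch under assumptions: `fetch` stands in for whatever async scrape call you use (it is hypothetical, not a fastCRW client API), and the 10% headroom over the p90 is an arbitrary starting value to tune.

```python
import asyncio
from typing import Awaitable, Callable


def timeout_from_p90(p90_ms: float, headroom: float = 1.1) -> float:
    """Turn an observed p90 (ms) into a timeout in seconds, with some headroom."""
    return p90_ms * headroom / 1000.0


async def fetch_with_deadline(
    fetch: Callable[[str], Awaitable[str]],  # hypothetical async scrape function
    url: str,
    timeout_s: float,
) -> dict:
    """Run one fetch under a hard deadline and report the outcome as data."""
    try:
        markdown = await asyncio.wait_for(fetch(url), timeout=timeout_s)
        return {"url": url, "status": "ok", "markdown": markdown}
    except asyncio.TimeoutError:
        # Return a status instead of raising so the graph can decide what to do.
        return {"url": url, "status": "timeout", "markdown": None}
```

With the p90 from the table, `timeout_from_p90(14157)` comes out at about 15.6 seconds. Returning a status rather than raising keeps the retry-or-re-plan decision in the graph instead of in the tool.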
Retry vs re-plan on a timed-out retrieval edge
When a retrieve call does blow the deadline, you have two levers, and they are not interchangeable. A blind retry of the same URL on the same renderer often hits the same slow path again — you pay the tail twice. A smarter conditional edge re-plans: try a cheaper renderer or a different source on the first timeout, and only escalate back to the heavy fallback if recall demands it. Because fastCRW is stateless per request, the graph owns this decision — there is no server-side session to lean on, so encode the retry-vs-re-plan policy in your graph edges, not in the tool.
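The retry-versus-re-plan policy can live in a small routing function of the kind LangGraph accepts for a conditional edge. The sketch below is plain Python so it runs on its own; the state fields and node names (`retrieve_cheap`, `retrieve_heavy`, `grade_documents`, `answer`) are hypothetical and should be mapped onto your own graph.

```python
from typing import TypedDict


class RetrieveState(TypedDict):
    last_status: str       # "ok" or "timeout" from the most recent retrieve call
    timeouts: int          # timeouts seen so far in this turn
    recall_critical: bool  # whether the question needs the heavy fallback's recall


def route_after_retrieve(state: RetrieveState) -> str:
    """Pick the next node name after a retrieve attempt."""
    if state["last_status"] == "ok":
        return "grade_documents"
    if state["timeouts"] == 1:
        # First timeout: re-plan with a cheaper renderer or another source
        # instead of blindly repeating the same slow path.
        return "retrieve_cheap"
    if state["timeouts"] == 2 and state["recall_critical"]:
        # Escalate back to the heavy fallback only when recall demands it.
        return "retrieve_heavy"
    # Otherwise stop looping and answer with what is already in state.
    return "answer"
```

In a LangGraph build, a function like this would typically be passed to `add_conditional_edges` on the retrieve node, with the returned strings mapped to node names.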
Cache to avoid re-fetching across loop iterations
The cheapest latency is the request you never send. In a loop that may re-enter the retrieve node, keep a per-run cache of {url: markdown} in graph state so a second pass over the same URL is a dictionary hit, not a 1.9 s (or tail-case 14 s) round trip. This is the single highest-leverage tuning move for loop-heavy graphs and it costs you nothing but a few lines of state management. Pair it with the patterns in building a RAG pipeline with fastCRW for the indexing side.
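A per-run cache can be as small as the following sketch. It assumes the graph state is a dict with a `page_cache` field and that `fetch` is your own scrape call; both names are illustrative. The function returns a partial state update, which is the shape a LangGraph node usually returns.

```python
from typing import Callable


def retrieve_with_cache(state: dict, url: str, fetch: Callable[[str], str]) -> dict:
    """Return a state update, reusing cached markdown when the URL was already fetched."""
    cache = state.get("page_cache", {})
    if url in cache:
        # Dictionary hit: no network round trip on a repeat pass.
        return {"documents": [cache[url]], "cache_hit": True}
    markdown = fetch(url)
    return {
        "page_cache": {**cache, url: markdown},  # copy rather than mutate state in place
        "documents": [markdown],
        "cache_hit": False,
    }
```

Copying the cache instead of mutating it keeps the node a pure function of its input state, which makes replays and checkpoints easier to reason about.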
How accuracy keeps the graph from looping
Latency tuning usually stops at timeouts. It should not, because the cheapest way to lower total node latency is to not re-enter the node at all — and that is an accuracy property, not a speed property.
Highest truth-recall of the three tools tested
fastCRW posted the highest truth-recall of the three tools tested — 63.74% of 819 labeled URLs (522 of 819), versus Crawl4AI's 59.95% and Firecrawl's 56.04% (diagnose_3way.py, 2026-05-08). Paired honestly with its 87.7% scrape-success and 0 thrown errors across the 3,000 requests, that means the first fetch is more likely to return the content the answer actually needs.
Fewer empty retrievals means fewer re-plan iterations
An agentic RAG graph loops when the grade-documents node decides the retrieved context is insufficient. Every empty or thin retrieval is a vote to loop back and pay the retrieve node's latency again. Higher truth-recall means more first-pass retrievals clear the grading bar, which means fewer loop iterations, which means lower total latency for the turn — even though per-call fastCRW carries the worst p90. The tail you occasionally pay is offset by the loop iterations you do not.
Latency you do not pay because the first fetch succeeded
Put concretely: a tool with a faster p90 but lower recall that forces a second and third loop iteration can be slower end-to-end than one slow-tail fetch that succeeds on the first try. The right metric for a graph is not per-call latency in isolation — it is per-turn latency, which is per-call latency multiplied by expected loop iterations. Recall is the term that drives iterations down.
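The per-turn claim can be made concrete with a toy model. It assumes each loop iteration independently clears the grading step with a fixed probability and that the graph gives up after a maximum number of iterations. The latencies and pass rates below are invented for illustration; they are not benchmark results and are not the truth-recall figures from the table.

```python
def expected_iterations(pass_rate: float, max_iters: int) -> float:
    """Expected loop iterations when each pass clears grading with `pass_rate`."""
    # Iteration k+1 happens only if the first k iterations all failed.
    return sum((1.0 - pass_rate) ** k for k in range(max_iters))


def per_turn_latency_s(mean_call_s: float, pass_rate: float, max_iters: int = 3) -> float:
    """Per-turn latency = per-call latency x expected loop iterations."""
    return mean_call_s * expected_iterations(pass_rate, max_iters)


# Hypothetical tools: A is slower per call but clears grading more often.
tool_a = per_turn_latency_s(mean_call_s=2.5, pass_rate=0.8)
tool_b = per_turn_latency_s(mean_call_s=2.0, pass_rate=0.5)
print(f"tool A: {tool_a:.2f} s per turn, tool B: {tool_b:.2f} s per turn")
```

In this made-up case the slower-per-call tool finishes the turn sooner because it loops less. Whether that holds for your graph depends on your own pass rates and latencies.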
A worked latency-tuning example
Here is the loop to run on your own traffic; we are deliberately not pretending one dataset's percentiles are yours.
Instrument node latency in graph state
Add a small field to your graph state — a list of {node, started, ended, url, renderer, status} records — and append one entry every time the retrieve node runs. After a few hundred real turns you have your own p50/p90/p99 per node, which is the only distribution that matters for your timeouts. The benchmark numbers above are a starting hypothesis, not your production reality.
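A sketch of that instrumentation, assuming a dict-shaped state with a `latency_log` list and a synchronous `fetch` callable of your own (both hypothetical names). The percentile helper uses the simple nearest-rank method; other definitions interpolate and will give slightly different values on small samples.

```python
import time
from typing import Callable


def timed_retrieve(state: dict, url: str, fetch: Callable[[str], str],
                   renderer: str = "default") -> dict:
    """Run one fetch and append a latency record to the state's log."""
    started = time.monotonic()
    status, markdown = "ok", None
    try:
        markdown = fetch(url)
    except Exception:
        status = "error"
    ended = time.monotonic()
    record = {"node": "retrieve", "started": started, "ended": ended,
              "url": url, "renderer": renderer, "status": status}
    return {
        "latency_log": state.get("latency_log", []) + [record],
        "documents": [markdown] if markdown is not None else [],
    }


def percentile(values: list, pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct * n / 100
    return ordered[rank - 1]
```

After a few hundred turns, `percentile([r["ended"] - r["started"] for r in log], 90)` gives the observed p90 in seconds for the node.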
Tune timeouts against observed p50/p90
With instrumented data, set the retrieve-node timeout just above your observed p90 and watch two counters: timeout rate (with the timeout at the p90 it should sit at or just under 10%) and loop-iteration count per turn. If timeouts spike, your budget is too tight and you are killing high-recall fallbacks; if loop count spikes, recall is suffering, and a timeout that cuts off the fallback before it returns is a likely cause. The two counters together tell you which knob to turn.
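The two counters can be derived from the same latency records. This sketch assumes each record carries `node` and `status` fields like those described earlier, with `"timeout"` as the status of a call that hit the deadline; the function name and return shape are ours.

```python
def tuning_counters(latency_log: list, turns: int) -> dict:
    """Compute timeout rate and retrieve calls per turn from latency records."""
    calls = [r for r in latency_log if r["node"] == "retrieve"]
    timeouts = sum(1 for r in calls if r["status"] == "timeout")
    return {
        "timeout_rate": timeouts / len(calls) if calls else 0.0,
        "loops_per_turn": len(calls) / turns if turns else 0.0,
    }
```

Tracking both values per deployment or per day makes it easier to see whether a timeout change moved the timeout rate, the loop count, or both.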
Where to stop: the tail you accept for the recall you gain
There is a principled stopping point. Plot total per-turn latency against your timeout setting. As you raise the timeout you pay more tail latency per call but trigger fewer re-plan loops; as you lower it you cap per-call latency but loop more. The minimum of that curve is your answer — and for recall-sensitive RAG it usually sits at a higher timeout than intuition suggests, because the chrome-stealth fallback's recovered content is worth more than the seconds it costs. Compare your numbers against the public benchmarks before you generalize.
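The curve described above can be approximated from your own records before you plot anything. The sketch below is a crude model, not a fastCRW or LangGraph feature: it assumes each sample is a pair of (untimed latency in seconds, whether the content passed grading), that a timed-out or failed call triggers one more loop iteration, and that iterations are independent.

```python
def expected_turn_latency(samples: list, timeout_s: float, max_iters: int = 3) -> float:
    """Estimate per-turn latency for one timeout setting from (latency_s, passed) samples."""
    n = len(samples)
    # A call helps only if it finishes inside the timeout and clears grading.
    p_ok = sum(1 for lat, passed in samples if passed and lat <= timeout_s) / n
    # Every call costs its own latency, capped at the timeout.
    mean_call = sum(min(lat, timeout_s) for lat, _ in samples) / n
    expected_iters = sum((1.0 - p_ok) ** k for k in range(max_iters))
    return mean_call * expected_iters


def best_timeout(samples: list, candidates: list, max_iters: int = 3) -> float:
    """Return the candidate timeout with the lowest estimated per-turn latency."""
    return min(candidates, key=lambda t: expected_turn_latency(samples, t, max_iters))
```

Feeding it your recorded samples and a handful of candidate timeouts gives a first guess at the minimum. Treat the result as a hypothesis to confirm with the two counters, since the independence assumption rarely holds exactly.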
Limitations that affect latency budgeting
Stateless requests; manage state in the graph
fastCRW is stateless per request — there is no server-side session that carries cookies, auth, or partial progress between calls. For latency budgeting this is a feature (every call is independent and cacheable in your graph state) and a constraint (you cannot offload loop state to the engine). Keep all retrieval memory — cache, attempted URLs, renderer choices — in LangGraph state.
No /v1/agent endpoint to offload the loop
fastCRW has no /v1/agent (Spark-style) autonomous endpoint and no /v1/deep-research, so you cannot hand the whole retrieve-decide-retrieve loop to the engine and wait for one answer. The loop lives in your graph, which is exactly why per-node latency budgeting is your job and the subject of this page. If you specifically want a managed autonomous research loop, that is a genuine gap — Firecrawl's cloud-only agentic endpoints cover it and fastCRW does not.
Where Firecrawl genuinely wins here
An honest latency page has to concede this: at the p90, Firecrawl beats fastCRW. Its p90 of 6937 ms is less than half of fastCRW's 14157 ms (Crawl4AI's 4754 ms is tighter still), so if your graph is latency-capped and you can tolerate lower recall, Firecrawl's distribution is tighter on that part of the slow path. The order flips at the p99, where Firecrawl's 21107 ms is the slowest of the three and fastCRW's 15012 ms sits in the middle. And for the autonomous loop itself, Firecrawl's cloud-only agentic and deep-research endpoints have no fastCRW equivalent. Pick fastCRW when first-pass recall (fewer loop iterations) and the median win matter more to your per-turn latency than a tighter p90; pick Firecrawl when a bounded p90 is the hard constraint.
Sources
- Scrape benchmark of record: bench/server-runs/RESULT_3WAY_1000_FULL.md (diagnose_3way.py, Firecrawl public 1,000-URL dataset, 819 labeled, 3,000 requests, 2026-05-08).
- Search benchmark: benchmarks/triple-bench.ts (100 queries, single point-in-time measurement).
- fastCRW repo: github.com/us/crw · public benchmark write-up at /benchmarks.
Related: Build a LangGraph web scraping agent · Scraping latency explained · RAG pipeline with fastCRW
