Results from the 1,000-URL Firecrawl Dataset Benchmark
A 3-way benchmark of fastCRW, Crawl4AI, and Firecrawl on Firecrawl's own public 1,000-URL scrape-content dataset — truth-recall, scrape-success, and the full p50/p90 latency split.
On Firecrawl's own public 1,000-URL dataset, fastCRW has the highest truth-recall of three tools tested (63.74% vs Crawl4AI 59.95% and Firecrawl 56.04%) at a p50 latency of 1914 ms (Firecrawl: 2305 ms). Its p90 of 14157 ms is the worst of the three: the disclosed cost of a stealth fallback that recovers URLs the others drop.
Summary
On Firecrawl's own public 1,000-URL dataset, fastCRW returned the most accurate content of the three tools tested. It recovered the labeled content on 63.74% of matchable URLs, ahead of Crawl4AI's 59.95% and Firecrawl's 56.04%. It did this at a p50 latency of 1914 ms — faster at the median than Firecrawl's 2305 ms — while its p90 of 14157 ms is openly the slowest of the three.
This page is the canonical reference for those numbers. The benchmark is a 3-way run on Firecrawl's own published dataset, scored by an open harness, with a per-URL result of record anyone can audit.
What Was Measured
The dataset is scrape-content-dataset-v1: 1,000 URLs published by Firecrawl for evaluating scrape quality. Of those, 819 carry labeled ground-truth content and form the accuracy denominator. All three tools (fastCRW, Crawl4AI, and Firecrawl) were run against the same 1,000 URLs under the same conditions, for 3,000 total requests scored by the same harness.
The headline metric is truth-recall: did the tool actually return the page's real content? A scrape that returns 200 OK with an anti-bot interstitial "succeeds" but recovers nothing useful — so success rate alone is misleading. Truth-recall corrects for that.
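The difference between the two metrics can be made concrete with a small sketch. This is an illustration only, not the harness's actual scoring: the `UrlResult` record, the whitespace-insensitive substring match in `truth_recall`, and the example URLs are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlResult:
    """Hypothetical per-URL record; the real harness format may differ."""
    url: str
    returned_ok: bool        # the tool returned a response without error
    truth: Optional[str]     # labeled ground-truth snippet, None if unlabeled
    content: str             # text the tool returned


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences do not matter.
    return " ".join(text.lower().split())


def scrape_success(results: List[UrlResult]) -> float:
    # Share of ALL URLs for which the tool returned something.
    return sum(r.returned_ok for r in results) / len(results)


def truth_recall(results: List[UrlResult]) -> float:
    # Share of LABELED URLs whose returned text contains the labeled content.
    # Substring matching is an assumption made for this sketch.
    labeled = [r for r in results if r.truth is not None]
    hits = sum(normalize(r.truth) in normalize(r.content) for r in labeled)
    return hits / len(labeled)


results = [
    UrlResult("https://example.com/a", True, "Real article body",
              "Header. Real   article body. Footer."),
    # 200 OK, but the body is an anti-bot interstitial: success without recall.
    UrlResult("https://example.com/b", True, "Product specs",
              "Checking your browser before accessing..."),
    UrlResult("https://example.com/c", True, None, "Unlabeled page text"),
    UrlResult("https://example.com/d", False, "Pricing table", ""),
]

print(f"scrape-success: {scrape_success(results):.2%}")
print(f"truth-recall:   {truth_recall(results):.2%}")
```

In this toy set, three of four URLs "succeed" but only one of the three labeled URLs returns its labeled content, which is the gap truth-recall is meant to expose.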
Results
Source / provenance. Every number below is verbatim from the result of record, bench/server-runs/RESULT_3WAY_1000_FULL.md, a full 1,000-URL run dated 2026-05-08, scored by the open diagnose_3way.py harness. fastCRW does not measure competitors by hand here: all three tools run through the identical scoring pipeline.
| Metric | fastCRW | Crawl4AI | Firecrawl |
|---|---|---|---|
| Truth-recall (of 819 labeled URLs; fastCRW recovered 522) | 63.74% | 59.95% | 56.04% |
| Scrape-success (of 1,000) | 877 (87.7%) | 835 (83.5%) | 897 (89.7%) |
| Thrown errors (of 3,000 requests) | 0 | 0 | 0 |
| p50 latency | 1914 ms | 1916 ms | 2305 ms |
| p90 latency | 14157 ms | 4754 ms | 6937 ms |
| p99 latency | 15012 ms | 13749 ms | 21107 ms |
Read the rows together, not in isolation:
- fastCRW leads on truth-recall by a clear margin: +3.79 points over Crawl4AI, +7.70 over Firecrawl. On accuracy, the metric that decides whether a use case gets real content, fastCRW is first.
- fastCRW leads at the median (p50) and loses the tail (p90). At p50 it is essentially tied with Crawl4AI (1914 ms vs 1916 ms) and ~17% faster than Firecrawl. At p90 it is the slowest of the three; the next section explains why.
- Scrape-success and accuracy are different things. Firecrawl has the highest raw success rate (89.7%) but the lowest truth-recall (56.04%): it returns a response more often, but the right content less often.
- Zero thrown errors across 3,000 requests for all three tools — none of them is unreliable in the crash sense.
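The margins quoted in these bullets follow directly from the table. A short check, using only the published figures (the variable names are ours):

```python
# Figures copied from the results table above.
recall_pct = {"fastCRW": 63.74, "Crawl4AI": 59.95, "Firecrawl": 56.04}
p50_ms = {"fastCRW": 1914, "Crawl4AI": 1916, "Firecrawl": 2305}

# Truth-recall lead, in percentage points.
lead_over_crawl4ai = round(recall_pct["fastCRW"] - recall_pct["Crawl4AI"], 2)
lead_over_firecrawl = round(recall_pct["fastCRW"] - recall_pct["Firecrawl"], 2)

# Relative p50 improvement over Firecrawl, as a percentage.
p50_gain_pct = round(
    (p50_ms["Firecrawl"] - p50_ms["fastCRW"]) / p50_ms["Firecrawl"] * 100, 1
)

# fastCRW's recall recomputed from the raw counts in the table row label.
recall_from_counts = round(522 / 819 * 100, 2)

print(lead_over_crawl4ai)   # 3.79
print(lead_over_firecrawl)  # 7.7
print(p50_gain_pct)         # 17.0
print(recall_from_counts)   # 63.74
```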
Why the p90 Tail Is Honest, Not a Defect
fastCRW's p90 of 14157 ms is the worst of the three, and that is disclosed here deliberately rather than hidden behind an average.
The tail is a design trade-off. When a fast fetch returns thin, blocked, or interstitial content, fastCRW does not accept the failure — it retries the URL with a full chrome-stealth browser. That retry is slow, and a few hundred hard URLs taking 14+ seconds is what pushes the 90th percentile up. But that same retry is what recovers the labeled content the other two tools drop. The p90 tail and the truth-recall lead are the same mechanism — you cannot have one without the other.
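The shape of that trade-off can be sketched in a few lines. This is not fastCRW's implementation: the `MIN_CHARS` threshold, the `BLOCK_MARKERS` list, and the two fetcher callables are hypothetical stand-ins, used only to show how a content check in front of a slower retry produces both the recall gain and the latency tail.

```python
from typing import Callable, Tuple

# Hypothetical heuristics for "thin, blocked, or interstitial" content.
MIN_CHARS = 200
BLOCK_MARKERS = ("checking your browser", "verify you are human", "access denied")


def looks_blocked_or_thin(text: str) -> bool:
    lowered = text.lower()
    return len(text.strip()) < MIN_CHARS or any(m in lowered for m in BLOCK_MARKERS)


def scrape_with_fallback(
    url: str,
    fast_fetch: Callable[[str], str],
    stealth_fetch: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (content, path_taken). Only unusable fast results pay for the retry."""
    text = fast_fetch(url)
    if not looks_blocked_or_thin(text):
        return text, "fast"
    # Slow path: a full browser retry, which is where the latency tail comes from.
    return stealth_fetch(url), "stealth"


# Stand-in fetchers so the sketch runs without network access.
def fake_fast(url: str) -> str:
    if "hard" in url:
        return "Checking your browser before accessing the site."
    return "Real content. " * 50


def fake_stealth(url: str) -> str:
    return "Recovered content. " * 50


for u in ("https://example.com/easy", "https://example.com/hard"):
    content, path = scrape_with_fallback(u, fake_fast, fake_stealth)
    print(u, "->", path)
```

Under these assumptions the easy URL never touches the slow path, while the hard URL trades extra seconds for content instead of an interstitial.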
This is why the page reports the full p50/p90/p99 split instead of a single "average." An average would hide exactly the information a production team needs: most requests are fast (p50 1914 ms), a small minority of difficult pages are slow, and the slow ones come back with content instead of an error.
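A toy example shows how an average blurs a two-mode latency distribution. The latencies below are synthetic and are not benchmark data; the nearest-rank `percentile` helper is one common definition and may not match the one the harness uses.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in ms (NOT benchmark data): 85 fast requests and
# 15 slow ones that went through a retry.
latencies = [2000] * 85 + [14000] * 15

print("mean:", mean(latencies))            # 3800 ms, a latency no request had
print("p50: ", percentile(latencies, 50))  # 2000 ms, the typical request
print("p90: ", percentile(latencies, 90))  # 14000 ms, the slow minority
print("p99: ", percentile(latencies, 99))  # 14000 ms
```

The mean lands between the two modes and describes neither, while the percentile split shows both the typical request and the slow minority.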
How to Read These Numbers
Use this benchmark as:
- a starting point for a Firecrawl-replacement evaluation,
- a source of concrete, sourced metrics to weigh against your own workload,
- evidence that fastCRW is the most accurate of the three on a neutral, Firecrawl-published dataset.
Do not use it as proof that fastCRW wins every category for every site. It does not. Crawl4AI and Firecrawl both have a tighter latency tail; product maturity in adjacent workflows differs. The honest claim is narrow and defensible: highest truth-recall of three tools, median latency ahead of Firecrawl, with the slow tail fully disclosed.
What This Benchmark Does Not Prove
- It does not measure every site on the web — it measures Firecrawl's 1,000-URL sample.
- It does not measure bundled feature surface, dashboards, or support quality.
- It does not replace testing your own target sites — a benchmark is a prior, not a guarantee.
That is why this page should be read alongside the methodology page and the fastCRW vs Firecrawl comparison.
Reproduce It Yourself
The dataset is public and the harness is open source:
```bash
git clone https://github.com/us/crw
cd crw
# Firecrawl's scrape-content-dataset-v1 + diagnose_3way.py harness
python bench/diagnose_3way.py
```
Every per-URL outcome is recorded in RESULT_3WAY_1000_FULL.md.
Next Steps
- See pricing — managed plans and free self-hosting for the workload shown here.
- fastCRW vs Firecrawl — full feature, pricing, and migration comparison.
- Benchmark methodology — how fastCRW frames internal vs external numbers and sources its claims.
- Search benchmark — the separate 100-query search-API comparison.
