The Honest Benchmark: Firecrawl vs Crawl4AI vs fastCRW (2026)

Most scraper marketing pages cite a single average latency and call it a day. We refuse to. This page publishes the full p50/p90/p99 split, the dataset, the script, and the cases where fastCRW loses — because a benchmark that hides its tail is not a benchmark, it is an ad. Every number below cites its dataset, harness, and date.

Headline number: 63.74% truth-recall on Firecrawl's public 1,000-URL dataset (819 labeled URLs; diagnose_3way.py, 2026-05-08). +3.79 pp over Crawl4AI (59.95%), +7.70 pp over Firecrawl (56.04%).

Methodology

The dataset is Firecrawl's own publicly released scrape-content-dataset-v1 — 1,000 URLs, of which 819 carry labeled ground-truth extraction targets (the accuracy denominator). The harness is diagnose_3way.py, a single Python script that sends identical inputs to all three engines under the same network conditions and records the raw response payloads alongside per-request timing.

  • Truth-recall = fraction of the 819 labeled URLs for which the scraper's extracted text contains the ground-truth target. Source: diagnose_3way.py, run on 2026-05-08.
  • p50 / p90 / p99 latency = the 50th / 90th / 99th percentile end-to-end request duration, computed per engine over its 1,000 requests (3,000 requests in total across the 3 engines). We never publish a single average: averages hide tails, and tails are where scrapers actually differ.
  • Reproducibility: clone github.com/us/crw-opencore, set the API keys for all three engines, and run uv run python bench/diagnose_3way.py. All three engines receive the same input set, in the same order, with the same concurrency cap.

Metric                          fastCRW     Crawl4AI    Firecrawl
Truth-recall (of 819 labeled)   63.74%      59.95%      56.04%
p50 latency                     1914 ms     1916 ms     2305 ms
p90 latency                     14157 ms    4754 ms     6937 ms

Source: diagnose_3way.py (2026-05-08); see bench/server-runs/RESULT_3WAY_1000_FULL.md in github.com/us/crw-opencore. fastCRW's worst-in-class p90 has a direct cause: the chrome-stealth fallback that recovers the URLs the other engines miss is the same mechanism that produces the slow tail.
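The two metric definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of diagnose_3way.py: the record layout (`target`, `text`, `ms`), the substring match, and the nearest-rank percentile method are all assumptions, and the sample values are made up to show how a mean can hide a slow tail.

```python
import math
from statistics import mean

def truth_recall(records):
    """Fraction of labeled records whose extracted text contains the target."""
    # Unlabeled records (target is None) are excluded from the denominator.
    labeled = [r for r in records if r["target"] is not None]
    hits = sum(1 for r in labeled if r["target"] in r["text"])
    return hits / len(labeled)

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical records for one engine (illustrative values, not benchmark data).
records = [
    {"target": "alpha", "text": "page with alpha inside", "ms": 900},
    {"target": "beta", "text": "no match here", "ms": 1100},
    {"target": None, "text": "unlabeled page", "ms": 1000},
    {"target": "gamma", "text": "gamma found", "ms": 1200},
    {"target": "delta", "text": "delta found", "ms": 15000},
]

latencies = [r["ms"] for r in records]
recall = truth_recall(records)
p50 = percentile(latencies, 50)
p90 = percentile(latencies, 90)
avg = mean(latencies)

print(f"recall={recall:.2%} p50={p50} ms p90={p90} ms mean={avg} ms")
```

In this toy sample a single 15-second request pulls the mean far above the median, which is the reason the page reports percentiles rather than one average.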

The True Cost of Web Scraping at Scale

Sticker price per 1,000 credits is not the cost. The cost is price ÷ truth-recall, because credits spent on responses that miss the target are wasted. If a scraper has truth-recall R, then on average 1 / R calls are needed to land one correct extraction. The arithmetic below uses only the published truth-recall numbers above and the locked competitor prices in marketing/competitor-prices.lock.md (verified 2026-05-18) plus the fastCRW prices in PLAN_DISPLAY (the single pricing source of truth — see /pricing).

Effective cost per 1,000 correct extractions (Hobby tier)

  • Firecrawl Hobby: $16 / 5,000 credits = $3.20 / 1k raw credits. Divide by 0.5604 truth-recall → effective $5.71 / 1k correct extractions (56.04% recall, diagnose_3way.py, 2026-05-08; price lock 2026-05-18).
  • fastCRW Hobby: $13 / 5,000 credits = $2.60 / 1k raw credits (source: PLAN_DISPLAY). Divide by 0.6374 truth-recall → effective $4.08 / 1k correct extractions (63.74% recall, diagnose_3way.py, 2026-05-08).

Standard-tier comparison: Firecrawl Standard $83 / 100,000 credits vs fastCRW Standard $69 / 100,000 credits. Self-hosting the AGPL-3.0 OSS engine is $0 / 1,000 scrapes (you pay only the server).
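The arithmetic above can be checked with a short script. It is a sketch under two assumptions: one credit buys one scrape call, and the truth-recall measured on the 1,000-URL run carries over unchanged to every tier. The function and variable names are illustrative, and the Standard-tier effective costs are derived here from the listed prices rather than quoted from a published figure.

```python
def effective_cost_per_1k(price_usd, credits, recall):
    """Price per 1,000 correct extractions: raw price per 1k credits divided by recall."""
    raw_per_1k = price_usd / credits * 1000
    return raw_per_1k / recall

# (plan price in USD, credits included, truth-recall) taken from the text above.
plans = {
    ("Firecrawl", "Hobby"): (16, 5_000, 0.5604),
    ("fastCRW", "Hobby"): (13, 5_000, 0.6374),
    ("Firecrawl", "Standard"): (83, 100_000, 0.5604),
    ("fastCRW", "Standard"): (69, 100_000, 0.6374),
}

costs = {
    key: round(effective_cost_per_1k(*values), 2)
    for key, values in plans.items()
}

for (engine, tier), cost in costs.items():
    print(f"{engine} {tier}: ${cost:.2f} per 1k correct extractions")
```

Dividing by recall is equivalent to multiplying by the expected number of calls, 1 / R, needed to land one correct extraction.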

Reproducibility

The full harness lives in the open-core repo. The script, the dataset loader, and the result post-processor are all open and versioned. No private CSVs, no "trust us" numbers.

git clone https://github.com/us/crw-opencore
cd crw-opencore
uv venv && uv pip install -r bench/requirements.txt
export FIRECRAWL_API_KEY=... CRAWL4AI_API_KEY=... FASTCRW_API_KEY=...
uv run python bench/diagnose_3way.py --dataset 1000 --output results/

Script: https://github.com/us/crw-opencore/blob/main/bench/diagnose_3way.py. Methodology docs are mirrored at docs.fastcrw.com.
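For readers who want a feel for the "same inputs, same order, same concurrency cap" constraint before opening the script, here is a hedged sketch of that pattern. It is not the code in diagnose_3way.py: the engine names, the `fake_scrape` stub, the cap of 4, and the record fields are all placeholders chosen for illustration.

```python
import asyncio
import time

CONCURRENCY_CAP = 4  # assumed value; the real harness defines its own cap

async def run_engine(name, scrape, urls, cap=CONCURRENCY_CAP):
    """Scrape every URL with one engine, never exceeding `cap` requests in flight."""
    sem = asyncio.Semaphore(cap)

    async def one(url):
        async with sem:
            # Start timing only after a slot is free, so queueing is not counted.
            start = time.perf_counter()
            try:
                text = await scrape(url)
            except Exception:
                text = ""  # a failed request counts as an empty extraction
            elapsed_ms = (time.perf_counter() - start) * 1000
            return {"engine": name, "url": url, "text": text, "ms": elapsed_ms}

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(one(u) for u in urls))

async def fake_scrape(url):
    """Stand-in for a real engine client; sleeps briefly and returns dummy text."""
    await asyncio.sleep(0.01)
    return f"content of {url}"

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    engines = {"engine_a": fake_scrape, "engine_b": fake_scrape, "engine_c": fake_scrape}
    results = {}
    for name, scrape in engines.items():
        # Every engine gets the identical URL list and the identical cap.
        results[name] = await run_engine(name, scrape, urls)
    return urls, results

urls, results = asyncio.run(main())
print({name: len(rows) for name, rows in results.items()})
```

Keeping the raw text and the per-request timing in the same record means recall and latency can be recomputed later from one artifact.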

Detailed benchmarks

Each detail page below drills into one dataset, one harness, and the full set of caveats for that measurement.

  • Results from the 1,000-URL Firecrawl Dataset Benchmark

    A 3-way benchmark of fastCRW, Crawl4AI, and Firecrawl on Firecrawl's own public 1,000-URL scrape-content dataset — truth-recall, scrape-success, and the full p50/p90 latency split.

    Last reviewed: 2026-05-22

  • Search Benchmark: fastCRW vs Tavily vs Firecrawl

    100-query concurrent search benchmark comparing fastCRW, Tavily, and Firecrawl on latency, win rate, and reliability across 10 query categories.

    Last reviewed: 2026-05-17

  • fastCRW Benchmark Methodology

    How fastCRW frames internal and third-party benchmark claims, including metric definitions, source provenance, and interpretation rules.

    Last reviewed: 2026-03-11

What this benchmark does not measure

A benchmark is only as honest as its disclosed omissions. The 1,000-URL run measures truth-recall and end-to-end latency on a fixed, public corpus. It deliberately does not measure:

  • Anti-bot rotation under adversarial load. The dataset URLs are mostly cooperative; aggressive WAF / Cloudflare Turnstile / PerimeterX flows are not in scope.
  • Captcha-solver economics. No engine in this run was configured with a paid captcha-solving provider.
  • JS-heavy SPAs with auth walls. Pages that need a logged-in session, OAuth bounce, or a CSRF handshake are excluded because labeling them objectively is intractable.
  • Geo-restricted content. The run is executed from a single region; multi-region latency variance is a separate measurement.
  • Long-tail format coverage. Truth-recall here judges text extraction. Screenshot, PDF, and structured-JSON extraction modes have their own evaluation harnesses (see the detail pages above).

We publish the misses so readers know exactly where to push back — and so the next run can close the gap instead of pretending it wasn't there.