Engineering

What I Learned Benchmarking CRW Against Firecrawl and Crawl4AI

In-depth benchmark results from 500 URLs comparing CRW, Firecrawl, Crawl4AI, and Spider on latency, coverage, memory — with methodology, dataset breakdown, and reproducible scripts.

March 11, 2026 · 16 min read

Why I Ran This Benchmark

When I started building CRW, I needed to understand where it actually stood relative to established tools. Not to "win" a benchmark — that's a useless goal — but to understand which workloads it handles well and where it falls short. Honest benchmarks shape better product decisions.

This post shares what we observed, how we measured, and what the numbers actually mean in practice. I've also included the scripts we used so you can run your own version against your own target URLs.

What We're Measuring and Why

Before looking at numbers, it's worth being precise about what the metrics mean.

Latency Percentiles: p50, p95, Mean

p50 (median): The latency at which 50% of requests completed faster. This is the "typical" experience. It's more robust than mean because it ignores extreme outliers.

p95: The latency at which 95% of requests completed faster. This captures tail latency — the slow cases that happen regularly enough to matter in production. A p95 of 9,400 ms means 1 in 20 requests takes longer than 9.4 seconds.

Mean: The arithmetic average. Useful for cost calculations (total time / total requests) but can be misleading when outliers skew the distribution.

We report all three because they tell different stories. A tool with great p50 but terrible p95 might be fine for batch processing but unacceptable for interactive use. A tool with similar p50 and p95 has more predictable behavior.
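To make the distinction concrete, here's a small sketch (with made-up latency values) showing how the three statistics are computed, and how a single outlier drags the mean while barely moving the median:

```python
import statistics

def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Compute p50, p95, and mean from a list of per-request latencies (ms)."""
    xs = sorted(latencies_ms)

    def pct(p: float) -> float:
        # Linear interpolation between closest ranks.
        k = (len(xs) - 1) * p / 100
        f = int(k)
        c = min(f + 1, len(xs) - 1)
        return xs[f] + (xs[c] - xs[f]) * (k - f)

    return {"p50": pct(50), "p95": pct(95), "mean": statistics.mean(xs)}

# One pathological 30-second request: p50 stays at ~710 ms,
# but the mean jumps to several seconds.
sample = [700, 720, 710, 690, 30_000]
print(summarize(sample))
```

This is why we report all three: the median tells you the typical case, the tail percentile tells you the worst regular case, and the mean is what you multiply by request count for cost estimates.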

Wall-Clock Time

We measured wall-clock time: the elapsed real time from sending the HTTP request to receiving the complete response body. This includes:

  • DNS resolution
  • TCP connection establishment
  • TLS handshake
  • Server-side processing (fetch, parse, convert)
  • Network transfer of the response

We chose wall-clock over CPU time because wall-clock reflects what users actually experience. A tool that's CPU-efficient but has high network overhead still feels slow.
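The difference is easy to demonstrate: time spent blocked on the network shows up in wall-clock time but not in CPU time. A minimal illustration, using sleep as a stand-in for a network round-trip:

```python
import time

start_wall = time.perf_counter()   # wall-clock: includes blocked waits
start_cpu = time.process_time()    # CPU time: only time the process computes

time.sleep(0.2)  # stands in for DNS + TCP + TLS + server processing

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
# wall is ~200 ms; cpu is near zero, because sleeping burns no CPU.
print(f"wall={wall * 1000:.0f} ms  cpu={cpu * 1000:.0f} ms")
```

A CPU-time benchmark would score both a fast tool and a slow-network tool identically here, which is exactly the distortion we wanted to avoid.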

Coverage: What It Means, Precisely

Coverage = (URLs returning non-empty, parseable content) / (total URLs attempted) × 100.

A URL "passes" coverage if: the response has HTTP 200, the response body contains at least 100 characters of text, and the text is parseable (not garbled encoding, not just HTML boilerplate). A URL "fails" if: it times out, returns 4xx/5xx, returns an empty body, or returns only whitespace/navigation elements.

Coverage is a rough measure of practical usefulness — a result that technically returns 200 but contains only a JavaScript loading spinner isn't useful.
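As a sketch, the pass/fail rule can be expressed as a predicate. This is a simplification: the real check also rejects garbled encodings and boilerplate-only pages, which requires content analysis omitted here.

```python
def passes_coverage(status_code: int, body_text: str) -> bool:
    """Coverage rule: HTTP 200 and at least 100 characters of real text.
    `body_text` is assumed to be already-extracted content (e.g. markdown);
    encoding and boilerplate checks are left out of this sketch."""
    if status_code != 200:
        return False
    return len(body_text.strip()) >= 100

print(passes_coverage(200, "x" * 150))  # True
print(passes_coverage(200, "   "))      # False: whitespace only
print(passes_coverage(503, "x" * 500))  # False: server error
```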

Dataset Composition

We used 500 URLs sampled from Scrapeway's public benchmark dataset with adjustments to match our expected production workload distribution.

Breakdown by Site Type

| Category | Count | % of corpus | JS required |
| --- | --- | --- | --- |
| Documentation/technical blogs | 150 | 30% | ~10% |
| News articles | 125 | 25% | ~15% |
| E-commerce product pages | 100 | 20% | ~40% |
| Company/SaaS marketing pages | 75 | 15% | ~50% |
| Wikipedia / encyclopedia pages | 50 | 10% | <5% |

Roughly 25–30% of URLs in the corpus required JavaScript execution for meaningful content retrieval. The rest were static HTML or server-rendered pages. This ratio is intentional — it mirrors the distribution we see in real RAG pipeline workloads.

Why Dataset Composition Matters for Interpretation

A benchmark corpus biased toward SPAs would heavily favor Playwright-based tools (Firecrawl, Crawl4AI). A corpus biased toward static HTML would favor lightweight tools (CRW, Spider). Our corpus reflects a mixed workload — which is honest for most real-world use cases but means results shouldn't be extrapolated to all-SPA or all-static scenarios.

Benchmark Setup

Environment: All tools ran in Docker containers on the same hardware: 4 vCPU (AMD EPYC), 8 GB RAM, Ubuntu 22.04. Same network, same source IPs, same DNS resolver.

Test mode: Sequential (not parallel) to isolate per-request latency. Parallel throughput is a different measurement covered in the Throughput section below.

Repetitions: Each URL was scraped 3 times; we took the median of the 3 runs to reduce measurement noise from transient network conditions.

Warmup: All services were given a 2-minute warmup period (10 warmup requests) before timed runs, to ensure connection pools were populated and caches warm.
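The median-of-3 repetition rule is simple to express. Here's a sketch, with a hypothetical `measure` callable standing in for one timed scrape:

```python
import statistics

def median_of_runs(measure, url: str, runs: int = 3) -> float:
    """Scrape the same URL `runs` times and keep the median latency,
    damping transient network noise. `measure` is any callable that
    returns one request's latency in ms."""
    return statistics.median(measure(url) for _ in range(runs))

# Hypothetical values: one noisy run out of three is discarded by the median.
fake_latencies = iter([710.0, 4_200.0, 730.0])
result = median_of_runs(lambda url: next(fake_latencies), "https://example.com")
print(result)  # 730.0
```

The median (rather than the mean) of the three runs means a single transient slow run cannot contaminate a URL's recorded latency.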

Benchmark Setup Scripts

Here's the core benchmarking script we used. You can run a similar test against your own URL list:

#!/usr/bin/env python3
# benchmark.py — run against any Firecrawl-compatible API
import time, statistics, json, httpx, sys

TOOLS = {
    "crw":       "http://localhost:3002",
    "firecrawl": "http://localhost:3001",
}

def scrape_url(base_url: str, url: str, api_key: str = "test") -> tuple[float, bool]:
    start = time.perf_counter()
    try:
        r = httpx.post(
            f"{base_url}/v1/scrape",
            json={"url": url, "formats": ["markdown"]},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )
        elapsed = time.perf_counter() - start
        ok = r.status_code == 200 and len(r.json().get("data", {}).get("markdown", "")) > 100
        return elapsed, ok
    except Exception:
        return time.perf_counter() - start, False

def percentile(data: list[float], p: int) -> float:
    data.sort()
    k = (len(data) - 1) * p / 100
    f = int(k)
    c = f + 1
    return data[f] + (data[c] - data[f]) * (k - f) if c < len(data) else data[f]

urls = [line.strip() for line in open(sys.argv[1]) if line.strip()]

for name, base in TOOLS.items():
    latencies, successes = [], 0
    for url in urls:
        elapsed, ok = scrape_url(base, url)
        latencies.append(elapsed * 1000)  # ms
        if ok:
            successes += 1
        time.sleep(0.1)  # polite delay

    print(f"\n{name}:")
    print(f"  p50:      {percentile(latencies, 50):.0f} ms")
    print(f"  p95:      {percentile(latencies, 95):.0f} ms")
    print(f"  mean:     {statistics.mean(latencies):.0f} ms")
    print(f"  coverage: {successes}/{len(urls)} ({100*successes/len(urls):.1f}%)")

Run it with a text file of URLs (one per line):

python3 benchmark.py urls.txt

Latency Results

| Tool | p50 latency | p95 latency | Mean latency |
| --- | --- | --- | --- |
| CRW | 710 ms | 1,820 ms | 833 ms |
| Spider | 880 ms | 2,100 ms | 980 ms |
| Crawl4AI | 2,600 ms | 6,800 ms | 3,200 ms |
| Firecrawl | 3,900 ms | 9,400 ms | 4,600 ms |

The Rust-based tools (CRW and Spider) were substantially faster than the Node.js and Python-based alternatives on standard HTML content. The gap narrows on JavaScript-heavy pages — when browser rendering is required, render time dominates regardless of the wrapper language.

The p95 spread is notable: CRW's worst-case tail (1,820 ms) is better than Firecrawl's median (3,900 ms). This matters for interactive applications where even occasional slowness is visible to users.

Crawl Coverage Results

| Tool | Coverage (500 URLs) | Failed (timeout) | Failed (empty) |
| --- | --- | --- | --- |
| CRW | 92.0% | 3.4% | 4.6% |
| Spider | 91.2% | 4.0% | 4.8% |
| Crawl4AI | ~80% | 8.2% | 11.8% |
| Firecrawl | 77.2% | 12.4% | 10.4% |

Coverage surprised us. We expected Firecrawl's more mature stack to perform better here. In our dataset, lol-html's aggressive streaming parser handled malformed HTML more gracefully than Firecrawl's rendering pipeline — which occasionally timed out or returned empty responses for slow-loading pages.

Firecrawl's higher timeout rate (12.4%) is likely related to browser render timeouts: Chromium takes longer per page and has a stricter timeout budget. When pages don't load within the timeout window, the request fails completely.

Memory Usage

| Tool | Idle RAM | Under 50 concurrent requests |
| --- | --- | --- |
| CRW | 6.6 MB | ~120 MB |
| Spider | ~20 MB | ~180 MB |
| Crawl4AI | 300 MB+ | 1.2 GB+ |
| Firecrawl | 500 MB+ | 2 GB+ |

Memory Profiling Details

We measured memory using two tools: docker stats for RSS (Resident Set Size) and pmap -x for heap breakdown. "Idle" was measured after a 60-second warmup with zero active requests. "Under load" was measured at peak during a 50-concurrent-request burst sustained for 30 seconds.

CRW's memory profile under load:

$ pmap -x $(pgrep crw) | tail -20
# Under 50 concurrent requests:
# Heap:         ~80 MB   (connection buffers, parse state, response buffers)
# Stack:        ~8 MB    (async task stacks, ~16 KB each * 500 tasks)
# Code + data:  ~8 MB    (binary text + rodata + static data)
# Shared libs:  ~22 MB   (libc, libssl, libcrypto on glibc builds)
# Total RSS:    ~118 MB

Firecrawl's memory profile shows a fundamentally different shape: ~300 MB of the 500 MB idle is Chromium's private heap. This baseline can't be reclaimed regardless of traffic. Under load, Chromium spawns additional renderer processes, each adding ~80–100 MB.
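For reference, a small helper like the following can sample a container's RSS the same way we did with docker stats. This is a sketch: the container name, a running Docker daemon, and the `MiB / GiB` unit suffixes in the `MemUsage` column are all assumptions.

```python
import subprocess

def parse_mem(s: str) -> float:
    """Convert a docker-style size string ("6.6MiB", "1.2GiB") to MiB.
    Docker may also emit other suffixes (B, kB, MB); handle as needed."""
    units = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
    for unit, factor in units.items():
        if s.endswith(unit):
            return float(s[: -len(unit)]) * factor
    raise ValueError(f"unrecognized size: {s}")

def container_rss_mib(name: str) -> float:
    """Sample a running container's current memory usage in MiB."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", name],
        capture_output=True, text=True, check=True,
    ).stdout.strip()                  # e.g. "118.3MiB / 7.775GiB"
    return parse_mem(out.split("/")[0].strip())

print(parse_mem("118.3MiB"))  # 118.3
print(parse_mem("1.2GiB"))    # 1228.8
```

Sampling this once at idle and repeatedly during a concurrent burst (keeping the peak) reproduces the two columns in the table above.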

JavaScript-Heavy Pages: Separate Analysis

We isolated the 145 URLs in our corpus that required JavaScript execution for meaningful content (SPAs, lazy-loaded articles, client-rendered product pages).

| Tool | p50 (JS subset) | p95 (JS subset) | Coverage (JS subset) |
| --- | --- | --- | --- |
| CRW (LightPanda) | 2,100 ms | 5,800 ms | 74% |
| Crawl4AI (Playwright) | 3,400 ms | 8,200 ms | 86% |
| Firecrawl (Playwright) | 4,200 ms | 10,100 ms | 83% |
| Spider | 2,800 ms | 6,400 ms | 78% |

For JavaScript-heavy pages, CRW's latency advantage largely disappears — rendering time dominates. More importantly, CRW's coverage drops to 74% on JS-heavy pages compared to 92% overall. LightPanda is still maturing and doesn't yet implement the full browser API surface that Playwright (Chromium) covers.

The honest takeaway: if your workload is predominantly SPAs, Crawl4AI or Firecrawl's Playwright-based rendering gives better coverage today. CRW is a better fit for HTML-primary content.

Throughput vs. Latency: Different Workloads

The latency table above measures sequential requests — one at a time, measuring per-request duration. This is the right metric for interactive use cases where a user is waiting for a single result.

For batch pipelines, parallel throughput is what matters: how many pages can you process per second when running many requests concurrently?

| Tool | 10 workers (pages/sec) | 50 workers (pages/sec) | 100 workers (pages/sec) |
| --- | --- | --- | --- |
| CRW | 11.2 | 38.4 | 52.1 |
| Spider | 10.8 | 41.2 | 58.3 |
| Crawl4AI | 4.1 | 11.3 | 14.2 |
| Firecrawl | 2.8 | 7.4 | 9.1 |

Spider slightly edges CRW at high parallelism — its architecture is specifically optimized for bulk crawl throughput. CRW's throughput is still 4–6x higher than Firecrawl, which is memory-constrained at high concurrency (Chromium renderer processes are the bottleneck).

Note that throughput measurements are system-dependent. On a machine with more RAM, Firecrawl's numbers would improve. On a memory-constrained server, CRW and Spider maintain their throughput while Firecrawl degrades faster.
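A minimal way to measure parallel throughput yourself is a fixed pool of async workers draining a shared URL queue. The sketch below takes a pluggable `scrape` coroutine so you can wire in a real client (e.g. httpx's async API) or, as in the demo, a stub with a fixed simulated latency:

```python
import asyncio
import time

async def throughput(urls, scrape, workers: int = 50) -> float:
    """Pages/sec with `workers` concurrent tasks draining a queue.
    `scrape` is an async callable url -> bool (success flag)."""
    queue: asyncio.Queue = asyncio.Queue()
    for u in urls:
        queue.put_nowait(u)
    results = []

    async def worker():
        # Safe without locking: no await between empty() and get_nowait().
        while not queue.empty():
            url = queue.get_nowait()
            results.append(await scrape(url))

    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(workers)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed

# Demo: a stub that "scrapes" in 100 ms. 50 workers over 100 URLs run
# two batches (~0.2 s total), i.e. roughly 500 pages/sec.
async def fake_scrape(url: str) -> bool:
    await asyncio.sleep(0.1)
    return True

pages_per_sec = asyncio.run(throughput([f"u{i}" for i in range(100)], fake_scrape))
print(f"{pages_per_sec:.0f} pages/sec")
```

With per-request latency held constant, throughput scales with worker count until some resource saturates — which is exactly where the memory-heavy tools fall behind in the table above.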

How to Run Your Own Benchmark

The most meaningful benchmark is one run against your own target URLs. Here's a complete self-contained script:

#!/bin/bash
# run_benchmark.sh — requires Docker, Python 3, httpx
# Usage: ./run_benchmark.sh your_urls.txt

set -e
export URLS_FILE=${1:-urls.txt}

echo "Starting CRW..."
docker run -d --name bench-crw -p 3002:3000 -e CRW_API_KEY=test ghcr.io/us/crw:latest

echo "Starting Firecrawl (requires docker compose)..."
echo "See https://github.com/mendableai/firecrawl for self-host setup"
echo "Firecrawl needs Redis + workers — single docker run won't work."
echo "Assuming Firecrawl is already running on port 3001."

sleep 5  # wait for CRW to be ready

echo "Running benchmark..."
python3 - <<'PYEOF'
import time, statistics, json, httpx, sys

TOOLS = {
    "crw":       ("http://localhost:3002", "test"),
    "firecrawl": ("http://localhost:3001", "test"),
}

def scrape(base, key, url):
    start = time.perf_counter()
    try:
        r = httpx.post(f"{base}/v1/scrape",
            json={"url": url, "formats": ["markdown"]},
            headers={"Authorization": f"Bearer {key}"},
            timeout=30.0)
        ms = (time.perf_counter() - start) * 1000
        ok = r.status_code == 200 and len(r.json().get("data",{}).get("markdown","")) > 100
        return ms, ok
    except Exception:
        return (time.perf_counter() - start) * 1000, False

import os
urls_file = os.environ.get("URLS_FILE", "urls.txt")
with open(urls_file) as f:
    urls = [l.strip() for l in f if l.strip()][:100]

for name, (base, key) in TOOLS.items():
    lats, hits = [], 0
    for u in urls:
        ms, ok = scrape(base, key, u)
        lats.append(ms)
        hits += ok
        time.sleep(0.05)
    lats.sort()
    pct = lambda q: lats[min(int(len(lats) * q / 100), len(lats) - 1)]
    print(f"\n{name}: p50={pct(50):.0f}ms p95={pct(95):.0f}ms mean={sum(lats)/len(lats):.0f}ms coverage={hits}/{len(urls)}")
PYEOF

echo "Stopping CRW container..."
docker rm -f bench-crw

What Changed Since We First Ran This

Benchmarks are point-in-time snapshots. Our first run was in late 2025; the results above reflect early 2026.

Changes since the first run:

  • CRW p50 improved from ~900 ms to 710 ms — primarily from reqwest connection pool tuning and lol-html selector optimization
  • Firecrawl coverage improved — Firecrawl v1.5 added better timeout handling; coverage was lower (~70%) in our original test
  • Crawl4AI added async mode — their batch throughput improved significantly with async browser pooling

These results will continue to change as all tools evolve. If you're making a significant infrastructure decision based on performance, run your own test against your actual workload. We try to re-run our benchmark with each major release.

Where the Results Surprised Us

Coverage was higher than expected. We anticipated CRW's simpler HTML parser to miss content a full browser would catch. For standard HTML pages, lol-html's streaming approach actually handled malformed HTML more reliably than headless Chrome, which hit rendering timeouts more often.

Firecrawl's latency was higher than remembered from hosted API tests. Self-hosted Firecrawl performs differently than the hosted API, which uses proxy routing and optimized infrastructure. Don't conflate hosted-API benchmarks with self-hosted ones.

What These Numbers Mean in Practice

If you're scraping 10,000 pages/day sequentially:

  • CRW at 833 ms avg: completes in ~2.3 hours
  • Firecrawl at 4,600 ms avg: completes in ~12.8 hours

At 50 concurrent workers:

  • CRW: ~38 pages/second = 10,000 pages in ~4.4 minutes
  • Firecrawl: ~7 pages/second = 10,000 pages in ~24 minutes

For memory budgets, running 20 CRW instances on a 4 GB server leaves ~3.9 GB for actual request handling. Running 20 Firecrawl instances requires 10 GB minimum — you'd need a much larger server or fewer instances.
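These back-of-envelope figures follow directly from the measured mean latency and throughput; a few lines reproduce the arithmetic:

```python
def batch_hours(pages: int, mean_ms: float) -> float:
    """Sequential wall-clock time in hours for `pages` requests."""
    return pages * mean_ms / 1000 / 3600

def batch_minutes(pages: int, pages_per_sec: float) -> float:
    """Parallel wall-clock time in minutes at a measured throughput."""
    return pages / pages_per_sec / 60

print(f"CRW sequential:        {batch_hours(10_000, 833):.1f} h")
print(f"Firecrawl sequential:  {batch_hours(10_000, 4_600):.1f} h")
print(f"CRW @50 workers:       {batch_minutes(10_000, 38):.1f} min")
print(f"Firecrawl @50 workers: {batch_minutes(10_000, 7):.1f} min")
```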

Limitations of This Benchmark

  • Anti-bot performance: We only tested publicly accessible pages. For CAPTCHA-protected or fingerprint-checking targets, results differ substantially.
  • SPA coverage: Our corpus was biased toward HTML-heavy content. An all-SPA corpus would show different rankings.
  • Content quality: We measured whether content was returned, not whether it was clean. Qualitative comparison is harder.
  • Hosted vs. self-hosted: We tested self-hosted versions. The fastCRW hosted API and Firecrawl's hosted API have different latency profiles.

Try It Yourself

Self-host CRW and run your own benchmark:

docker run -p 3000:3000 -e CRW_API_KEY=your-key ghcr.io/us/crw:latest

Or use fastCRW — the managed version with 50 free credits, no credit card required.

Frequently Asked Questions

Is CRW really faster than Firecrawl?

On standard HTML pages, yes — CRW's p50 is 710 ms vs Firecrawl's 3,900 ms in our benchmark. On JavaScript-heavy pages requiring full browser rendering, the gap narrows significantly because render time dominates both tools. For mixed workloads, CRW is a better fit for teams prioritizing latency and throughput over SPA coverage.

How was the benchmark run?

All tools were deployed in Docker containers on identical hardware (4 vCPU, 8 GB RAM). Each of the 500 URLs was scraped 3 times sequentially with a 0.1-second delay between requests; we report the median of the 3 runs. Memory was measured with docker stats at idle and under a sustained 50-concurrent-request burst. Full methodology is in the "Benchmark Setup" section above.

Does CRW perform better on all pages?

No. CRW performs best on HTML-primary content — news articles, documentation, blog posts, and server-rendered pages. On JavaScript-heavy SPAs, CRW's LightPanda integration is functional but less complete than Playwright-based tools. Coverage for JS-heavy pages was 74% for CRW vs 86% for Crawl4AI in our isolated test.

What about JavaScript-heavy sites?

CRW uses LightPanda for JavaScript rendering. LightPanda is a lightweight Zig-based browser that handles many SPAs correctly, but it doesn't yet implement the full browser API surface. For complex React/Vue/Angular apps with heavy client-side routing, Playwright-based tools (Firecrawl, Crawl4AI) give better coverage today. CRW's JS coverage is actively improving.

How accurate are these benchmarks?

Directionally accurate for standard HTML workloads; treat with caution for all-SPA or all-protected-site scenarios. Benchmarks are point-in-time and tool versions matter. We try to re-run with each major CRW release. The most accurate benchmark is always one you run yourself against your own target URLs — the script in this post makes that straightforward.

Can I verify these results myself?

Yes. The benchmark setup script in this post will run CRW and any Firecrawl-compatible API against your own URL list. Provide a plain-text file of URLs and the script handles spinning up containers, running tests, and reporting results. Differences in your results are expected based on your network, your target URLs, and server hardware.

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.