The problem we kept hitting
This is a build-in-public write-up of fastCRW — what we set out to fix, the choices that worked, the choices we had to revisit, and the parts that are still in flight. If you are evaluating a self-hosted scraping engine for AI agents, this is the document we wish had existed when we started.
The recurring problem, in one sentence: AI agents need scrape + search + crawl as a real-time primitive, but the production-grade options force you to choose between a heavyweight multi-container stack (Node API plus Python workers) and a Python framework that requires a full headless browser per worker.
- Firecrawl's open-source stack is multi-container — Node API, Python workers, Redis, Playwright. Structural footprint of the full stack is ~2-3 GB total across 5 containers (source: the README §"Structural footprint", labeled as a structural fact, not a benchmark claim). Lovely engineering, but it does not fit a small VPS or a developer's laptop running three other services.
- Crawl4AI is excellent inside a Python notebook, but it requires Playwright + Chromium on every install. That is a ~2 GB Docker image and 300 MB+ idle RAM. Great for research, hard to deploy as a sidecar for an agent.
- Browser-automation libraries (Playwright, Puppeteer, Selenium) are general-purpose scrapers. They are not designed around an LLM consumer, do not output clean markdown, do not speak MCP, and carry a Chromium baseline (~200-300 MB per worker).
What we actually needed for AI agent traffic was different in shape. An agent makes many small scrape calls, sometimes 50-200 in a single tool-use chain. It wants clean markdown, not raw HTML. It wants predictable latency. It wants to call the engine from inside a $5 VPS, a CI job, or a developer's laptop. And it wants the same API surface across all three.
So we wrote one.
Why Rust
The first real decision was the language. We considered Go (the Colly stack is mature; static binaries are nice), and we considered staying in Python (the ecosystem is unbeatable, but the resident footprint is not). We picked Rust for four concrete reasons.
1. One async pipeline, not multiprocessing
CPython has a GIL, so a high-concurrency scraper in Python ends up as multiprocessing + headless Chrome + Redis queue + worker pool. Each piece is fine; the pile is the problem. Rust's Tokio runtime gives you genuine parallelism inside one process. A 100-URL crawl is 100 async tasks on one event loop, not a pool of child processes each holding a browser open.
2. Predictable memory, no GC pauses
Rust's ownership model frees memory deterministically when a value goes out of scope. For a long-running scraper that processes thousands of pages, this matters: there is no slow heap bloat, no occasional GC pause, no JVM-style warmup. The resident set tracks the actual work in flight, not the historical high-water mark.
3. Single static binary
The output of cargo build --release is one file. No interpreter, no virtualenv, no npm install, no bundled Chromium by default. The Docker image we ship is around 8 MB (structural fact from the README §"Structural footprint" table — single ~8 MB binary, 1 container, plus an optional sidecar). That is roughly two orders of magnitude smaller than a Firecrawl deployment.
4. The async ecosystem is finally mature
Five years ago, async Rust was painful. In 2026 the stack we use — tokio + axum + reqwest + scraper + lol-html + serde_json — is genuinely production-grade. Axum's extractor system is clean, reqwest's connection pooling Just Works, and lol-html is one of the fastest HTML parsers in any language. The ergonomics are no longer an excuse to stay in Python.
To make this concrete, here is the high-level shape difference between a Python multiprocessing scraper and the Rust pipeline we landed on:
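// Rust (fastCRW): single async runtime, N tasks
use tokio::task::JoinSet;

let mut tasks = JoinSet::new();
for url in urls {
    // Each URL becomes a lightweight task that shares one HTTP client.
    tasks.spawn(scrape_one(url, client.clone()));
}
while let Some(res) = tasks.join_next().await {
    handle(res?); // `?` surfaces a panicked or cancelled task
}
// One process. One event loop. Memory tracks in-flight work.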
# Python (Crawl4AI / Firecrawl-style): process pool + browsers
import multiprocessing

with multiprocessing.Pool(processes=8) as pool:
    # Each worker spawns a headless browser:
    #   ~200-300 MB resident per worker
    #   8 workers = ~1.6-2.4 GB before any work starts
    results = pool.map(scrape_with_playwright, urls)
The Python pattern works — it is what most production scrapers run on today — but the baseline cost is high enough that "run a scraper on a small VPS" is not really a supported deployment. With the Rust pipeline, it is.
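The arithmetic in the Python comments above can be reproduced as a short sketch. The per-worker range and the 50 MB idle figure are the ones quoted in this post; the 1 GB VPS size is an assumption for illustration only (plans vary), and the function name is hypothetical.

```python
def pool_baseline_mb(workers: int, per_worker_mb: tuple[int, int]) -> tuple[int, int]:
    """Return the (low, high) resident baseline in MB for a browser-per-worker pool."""
    low, high = per_worker_mb
    return workers * low, workers * high

# Figures quoted in this post: ~200-300 MB per headless-browser worker.
low, high = pool_baseline_mb(workers=8, per_worker_mb=(200, 300))
print(f"8 workers: {low}-{high} MB before any work starts")

# ASSUMPTION: a small VPS with 1024 MB of RAM; real plans vary.
VPS_RAM_MB = 1024
FASTCRW_IDLE_MB = 50  # idle figure quoted in this post

print("browser pool fits:", high <= VPS_RAM_MB)
print("single binary fits:", FASTCRW_IDLE_MB <= VPS_RAM_MB)
```

The point is the baseline, not the peak: both designs grow under load, but only one of them starts above the size of the box.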
The ~50 MB idle footprint
One of the metrics we tracked from day one was idle RAM. Not because idle RAM is a benchmark — it is not — but because it is a proxy for how cheaply you can deploy the thing. If idle is sub-100 MB, a $5 VPS is viable. If idle is 500 MB+, you are paying for a $20 box just to host the runtime.
fastCRW's idle footprint is around 50 MB RAM on a $5 VPS (structural fact, OSS README §"Structural footprint" — we phrase it as a structural fact, not a benchmark, because actual resident size will vary by kernel and libc). Concretely, that 50 MB includes:
- The Tokio runtime and its worker threads (one per core by default).
- The Axum router and registered routes for /v1/scrape, /v1/crawl, /v1/crawl/:id, /v1/map, /v1/search, /mcp, and /health.
- A reqwest client with idle connections (we cap idle connections per host to keep this bounded).
- The lol-html parser code, but no parsed DOM at rest — lol-html only holds memory while a page is in flight.
- A small SearXNG sidecar slot for search, optional.
What it does not include at idle, and why that matters:
- No pre-loaded headless browser. Chromium would be ~200-300 MB on its own. LightPanda (when used) is launched on demand and torn down after — it is not a long-lived process at idle.
- No interpreter. There is no Python runtime, no V8 heap, no JVM. The binary is the runtime.
- No queue broker. Crawl jobs use an in-memory task graph. Redis is not required for the default deployment.
The trade-off is honest: in-memory crawl state means a restart abandons in-progress crawl jobs. That is fine for the agent-traffic shape we optimised for (small, short crawls). For long, durable crawls, the right pattern is to add a queue layer externally — we do not bake that in, because most callers do not need it.
Under load, the resident set grows with active request state — connection buffers, parse state, crawl queues — but it grows proportionally to actual in-flight work, not baseline overhead.
LightPanda over Chromium for the fast path
The hardest call we had to make was the browser story. There are three honest options for "this page needs JavaScript to render":
- No browser at all. Refuse to render JS. Fine for HTML-primary pages, useless for SPAs.
- Full Chromium (via CDP). The Playwright/Puppeteer approach. Most reliable, heaviest.
- A lighter headless browser like LightPanda — designed for headless automation with a smaller footprint than Chromium.
We landed on a layered model. The default renderer is auto, which auto-selects with a chrome -> lightpanda -> http fallback. The fast path is HTTP-only with lol-html parsing. When that is not enough — JS-rendered content, hydration-dependent markup — we escalate to LightPanda. When LightPanda is not enough — heavy SPA, aggressive bot detection — we escalate to full Chrome (opt-in).
The escalation is driven by content heuristics (is the rendered HTML suspiciously empty? did we see a bot-detection challenge?) and by the explicit renderer body field on /v1/scrape. You can pin a renderer:
curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "renderer": "lightpanda",
    "formats": ["markdown"]
  }'
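The layered escalation and the renderer pin described above can be sketched in a few lines of Python. This is an illustration of the idea, not the engine's code: the renderer callables, the 200-character threshold, and the "captcha" marker are all assumptions made for the example.

```python
from typing import Callable, Optional

# HYPOTHETICAL renderer signature: takes a URL, returns HTML (possibly empty).
Renderer = Callable[[str], str]

# Escalation order described in the text: cheapest renderer first.
ESCALATION = ["http", "lightpanda", "chrome"]

def looks_insufficient(html: str) -> bool:
    """ASSUMED heuristics: a near-empty body or an obvious challenge marker."""
    text = html.strip()
    if len(text) < 200:
        return True
    return "captcha" in text.lower()

def render(url: str, renderers: dict[str, Renderer],
           pinned: Optional[str] = None) -> tuple[str, str]:
    """Return (renderer_used, html). A pinned renderer skips escalation."""
    if pinned is not None:
        return pinned, renderers[pinned](url)
    used, html = ESCALATION[0], ""
    for name in ESCALATION:
        if name not in renderers:  # e.g. chrome is opt-in and may be absent
            continue
        used, html = name, renderers[name](url)
        if not looks_insufficient(html):
            break
    return used, html

# Stub renderers: the HTTP path sees an empty SPA shell, LightPanda sees content.
fake = {
    "http": lambda url: "<div id='root'></div>",
    "lightpanda": lambda url: "<main>" + "content " * 50 + "</main>",
}
print(render("https://app.example.com/dashboard", fake)[0])  # lightpanda
```

Pinning trades the safety net for predictability: a pinned request never pays for a heavier renderer, but it also never recovers a page the pinned renderer cannot handle.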
The credit math reflects the cost. Source: the canonical credit-cost table in the README and on our pricing page.
- scrape with http or lightpanda renderer: 1 credit.
- scrape with chrome renderer: 2 credits.
- crawl: 1 credit per page (2 per page if chrome-rendered).
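As a worked example of the credit costs listed above, here is a small estimator. The numbers are the ones stated in this post; the function names and the per-renderer page breakdown are illustrative, and the pricing page remains the authority.

```python
# Credit costs as listed in this post; names here are illustrative only.
SCRAPE_CREDITS = {"http": 1, "lightpanda": 1, "chrome": 2}

def scrape_cost(renderer: str) -> int:
    """Credits for a single /v1/scrape call with the given renderer."""
    return SCRAPE_CREDITS[renderer]

def crawl_cost(pages_by_renderer: dict[str, int]) -> int:
    """Crawl is billed per page: 1 credit, or 2 if the page was chrome-rendered."""
    return sum(SCRAPE_CREDITS[r] * n for r, n in pages_by_renderer.items())

# A 100-page crawl where 10 pages needed chrome: 90*1 + 10*2 = 110 credits.
print(crawl_cost({"http": 90, "chrome": 10}))  # 110
```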
Why not Chromium-only? Because the agent traffic shape we see is dominated by docs sites, blogs, news, and product pages — most of which render fine without a full SPA browser. Forcing Chromium on every request would multiply the resident footprint by an order of magnitude for content that does not need it. The default Docker Compose ships LightPanda; the chrome variant is opt-in and roughly 500 MB image + 1 GB resident (structural facts from the README, not benchmarks).
Firecrawl-compatible by design
We made one large concession to migration ergonomics: the URL shapes match Firecrawl. This was deliberate. Most teams that try fastCRW are already running on Firecrawl or evaluating it. If migration is "change the base URL", they will try us. If migration is "rewrite every call", they will not.
So the endpoint surface deliberately tracks Firecrawl's:
- POST /v1/scrape: scrape one URL, optional formats array, optional jsonSchema for LLM extraction.
- POST /v1/crawl: start an async BFS crawl, returns a job id. Accepts maxDepth (cap 10) and maxPages (cap 1000); limit and max_pages are serde aliases.
- GET /v1/crawl/:id: crawl status and accumulated results.
- DELETE /v1/crawl/:id: cancel a crawl job.
- POST /v1/map: discover all URLs on a site.
- POST /v1/search: web search via the SearXNG sidecar, optional per-result scraping.
- POST /mcp: streamable HTTP MCP transport for agents that prefer it over stdio.
- GET /health: health check, no auth.
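A minimal sketch of what "change the base URL" looks like in client code, using only the Python standard library. The request body mirrors the curl examples in this post; the helper name is hypothetical, authentication is omitted (as in the self-hosted curl examples), and response field names are deliberately not assumed here.

```python
import json
import urllib.request
from typing import Optional

def build_scrape_request(base_url: str, url: str,
                         formats: Optional[list[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request against base_url."""
    body = {"url": url, "formats": formats or ["markdown"]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Only the base URL is deployment-specific; the path and body stay the same.
req = build_scrape_request("http://localhost:3000", "https://example.com")
print(req.get_method(), req.full_url)

# To actually send it (requires a running instance):
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       payload = json.load(resp)  # field names: see /api-reference
```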
Honest gaps (we are not Firecrawl, and we don't pretend to be):
- /v1/extract exists only on the managed cloud: a 5-credit convenience wrapper over /v1/scrape with formats: ["json"]. Self-hosters use /v1/scrape with a jsonSchema directly. It is single-URL only.
- There is no /v1/batch/scrape. For many URLs, iterate /v1/scrape concurrently or use /v1/crawl.
- There is no /v1/agent (Spark models), no /v1/deep-research, and no Fire-engine anti-bot.
- Screenshot output is not supported. A request for formats: ["screenshot"] returns HTTP 422.
- Response field names and error envelopes have minor divergence from Firecrawl. The shapes are close; they are not byte-identical.
- LLM extraction supports OpenAI and Anthropic providers only.
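For the missing batch endpoint, "iterate /v1/scrape concurrently" can be as small as a thread pool around a single-URL call. The sketch below takes the scrape function as a parameter so it makes no claim about the response shape; the `{"error": ...}` record for failed URLs is a convention invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def scrape_many(urls: list[str], scrape_one: Callable[[str], dict],
                max_workers: int = 8) -> dict[str, dict]:
    """Fan out one scrape call per URL; record per-URL failures instead of raising."""
    def safe(url: str) -> dict:
        try:
            return scrape_one(url)
        except Exception as exc:  # one bad URL should not sink the batch
            return {"error": str(exc)}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(safe, urls)))

# Stub standing in for a real HTTP call to /v1/scrape.
def fake_scrape(url: str) -> dict:
    if "bad" in url:
        raise RuntimeError("boom")
    return {"markdown": f"# {url}"}

results = scrape_many(["https://a.example", "https://bad.example"], fake_scrape)
print(results["https://bad.example"])  # {'error': 'boom'}
```

Keep `max_workers` modest: the concurrency limit on the client is the only back-pressure in this pattern.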
The canonical endpoint reference lives at /api-reference — that is the source of truth for the surface, not this blog post.
First-class MCP
The single design decision that took us from "another scraping API" to "a primitive an agent can call natively" was building MCP into the binary, not bolting it on as a separate service.
fastCRW's MCP server reuses the scraping engine directly. The npm package crw-mcp@0.6.0 (dist-tag latest) ships a stdio-transport server you launch with one command:
npx crw-mcp
Drop the resulting config into Claude Desktop, Cursor, or any MCP-compatible agent, and the agent gets scrape, crawl, crawl_status, map, and search as native tools. No HTTP wrapping, no JSON envelope translation, no glue code.
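For clients that use the common `mcpServers` config layout (Claude Desktop's claude_desktop_config.json does), the entry can be generated like this. The server label "crw" is an arbitrary name chosen for the example, and any environment variables the package may expect are not shown; treat this as a sketch and check the package documentation before relying on it.

```python
import json

# "crw" is an arbitrary label; command and args mirror `npx crw-mcp` from the text.
config = {
    "mcpServers": {
        "crw": {
            "command": "npx",
            "args": ["crw-mcp"],
        }
    }
}

# Print the JSON to paste into the client's MCP configuration file.
print(json.dumps(config, indent=2))
```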
The HTTP API also exposes MCP at POST /mcp via the streamable HTTP transport, for agents that prefer HTTP transport over stdio.
Why this matters: an agent's tool-use loop is sensitive to tool-call latency. Going through an extra HTTP hop, then a JSON-RPC bridge, then the scraper, adds tens of milliseconds per call. Multiply by 50-200 calls in a research chain and it shows up. Native MCP keeps the path short.
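To put rough numbers on that, here is the multiplication spelled out. The 10 ms and 50 ms overheads are assumed bounds for "tens of milliseconds", not measurements, and the calculation assumes the calls run one after another, as they do in a sequential tool-use chain.

```python
def added_seconds(calls: int, overhead_ms: float) -> float:
    """Total extra wall-clock time if the calls run sequentially."""
    return calls * overhead_ms / 1000

for calls in (50, 200):
    for overhead_ms in (10, 50):  # ASSUMED bounds for "tens of milliseconds"
        total = added_seconds(calls, overhead_ms)
        print(f"{calls} calls x {overhead_ms} ms = {total:.1f} s")
```

At the high end of both ranges, the extra hop alone costs about ten seconds per research chain.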
The 63.74% truth-recall benchmark
The single hardest part of writing a scraper is being honest about how good it is. The temptation is to publish a one-line speed multiplier and a flattering average latency. We do not.
Our headline accuracy number is 63.74% truth-recall on Firecrawl's own public 1,000-URL scrape-content dataset — 522 of 819 labeled URLs recovered (diagnose_3way.py, 2026-05-08). That is +3.79 percentage points over Crawl4AI's 59.95% (491 of 819) and +7.70 percentage points over Firecrawl's 56.04% (459 of 819) on the same labeled set. Scrape-success across the full 1,000 URLs is 87.7% (877). Zero thrown errors over 3,000 total requests across the three engines.
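The percentages follow directly from the counts quoted above, and anyone can re-derive them. This snippet only redoes the arithmetic; it is not part of the benchmark harness.

```python
LABELED = 819  # labeled URLs in the dataset, as quoted above
recovered = {"fastCRW": 522, "Crawl4AI": 491, "Firecrawl": 459}

# Truth-recall = recovered / labeled, as a percentage rounded to 2 places.
recall = {name: round(100 * n / LABELED, 2) for name, n in recovered.items()}
print(recall)

# Leads in percentage points over the other two engines.
print(round(recall["fastCRW"] - recall["Crawl4AI"], 2))   # 3.79
print(round(recall["fastCRW"] - recall["Firecrawl"], 2))  # 7.7
```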
The latency split — full p50 / p90 / p99 — is published because a single average would lie:
| Metric | fastCRW | Crawl4AI | Firecrawl |
|---|---|---|---|
| Truth-recall (of 819 labeled) | 63.74% (522) | 59.95% (491) | 56.04% (459) |
| Scrape-success (of 1,000) | 87.7% (877) | 83.5% (835) | 89.7% (897) |
| p50 latency | 1914 ms | 1916 ms | 2305 ms |
| p90 latency | 14157 ms | 4754 ms | 6937 ms |
| p99 latency | 15012 ms | 13749 ms | 21107 ms |
Read the table honestly:
- We lead on accuracy. fastCRW has the highest truth-recall of the three tools tested.
- We carry the median speed win. p50 1914 ms beats Firecrawl's 2305 ms, and is effectively tied with Crawl4AI (2 ms apart).
- We lose the tail at p90. fastCRW's p90 of 14157 ms is the worst of the three; at p99 our 15012 ms sits between Crawl4AI's 13749 ms and Firecrawl's 21107 ms. This is causal, not incidental: the chrome-stealth fallback that recovers the URLs the others miss is the same mechanism that produces the slow tail. The accuracy lead and the slow tail are the same trade.
That is the entire story. There is no "average latency" headline. There is no speed multiplier. The full benchmark methodology, dataset, and a one-command reproducible script live at /benchmarks. The harness is diagnose_3way.py — we publish the script and the raw run data so anyone can re-run it against their own URLs.
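A small illustration of why a single average misleads on a distribution with a fast path and a slow fallback. The data below is synthetic, chosen only to have that two-mode shape; it is not benchmark output, and the nearest-rank percentile used here is one of several common definitions.

```python
import math
import statistics

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# SYNTHETIC data: 85 fast-path responses and 15 slow fallback responses.
latencies_ms = [1900.0] * 85 + [14000.0] * 15

print("mean:", statistics.mean(latencies_ms))   # 3715.0
print("p50: ", percentile(latencies_ms, 50))    # 1900.0
print("p90: ", percentile(latencies_ms, 90))    # 14000.0
print("p99: ", percentile(latencies_ms, 99))    # 14000.0
```

The mean lands at a latency that no request in the sample actually experienced, which is why the table reports p50, p90, and p99 separately.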
If your traffic shape is "median matters, tail does not" — most agent traffic is exactly this — fastCRW is the right pick. If your traffic shape is "tail matters, accuracy is secondary" — for example, a real-time UI that must respond in 5 seconds or fail — Crawl4AI's 4754 ms p90 may be the better fit. We will not tell you otherwise.
What we got wrong and fixed
The build-in-public part. Three concrete things we shipped, found broken under real load, and fixed.
1. DB pool sizing under burst
The first managed-cloud burst we took (a customer migrating a Firecrawl workload) saturated our database connection pool inside the first minute. Symptom: requests queued, latency p99 went through the roof, p50 stayed fine. The pool was sized for steady-state, not for the way agent traffic actually arrives — long quiet stretches punctuated by 200-call research chains.
Fix: pool sizing was retuned and made config-driven so we can adjust per deployment without a rebuild. We also moved auth lookups to a separate pool from credit accounting so auth checks cannot starve credit writes.
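The isolation idea can be shown with a toy model: two independently sized pools, so saturating one leaves the other untouched. This is a sketch of the principle using semaphores as stand-ins for connection pools; the class, the pool names, and the sizes are all hypothetical and say nothing about the actual database client.

```python
import threading

class BoundedPool:
    """Toy stand-in for a DB connection pool: a fixed number of slots."""
    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self) -> bool:
        # Non-blocking: False means the pool is saturated.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()

def build_pools(config: dict[str, int]) -> dict[str, BoundedPool]:
    # Sizes come from configuration, so they can change without a rebuild.
    return {name: BoundedPool(size) for name, size in config.items()}

# HYPOTHETICAL sizes; in practice these would come from env or a config file.
pools = build_pools({"auth": 2, "credits": 2})

# Saturate the auth pool, as a burst of key lookups would.
while pools["auth"].try_acquire():
    pass

print(pools["auth"].try_acquire())     # False: auth is exhausted
print(pools["credits"].try_acquire())  # True: credit writes still get a slot
```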
2. SearXNG relevance is work in progress
The /v1/search endpoint is backed by a SearXNG sidecar. SearXNG is excellent infrastructure — federated, no API keys, no rate limits we did not set ourselves — but its default ranking is noisier than commercial search APIs. Our search benchmark (separate from the scrape benchmark above) shows good latency wins (880 ms average, 73 of 100 latency wins versus Firecrawl and Tavily, source: benchmarks/triple-bench.ts), but relevance for long-tail queries is a known weak spot. We are iterating on a re-ranking layer and per-engine weighting. If you hit a query where SearXNG's defaults rank a marketing landing page above the canonical doc, that is the thing we are fixing.
3. Auth backend errors that masqueraded as 401
Early on, when our auth backend was unhealthy, callers got HTTP 401 — which is the wrong code, because the credentials were fine, the lookup was broken. Agents would interpret 401 as "your key is invalid", regenerate the key, and the cycle repeated. We now return HTTP 503 with an explicit envelope when an auth backend dependency is down. 401 is reserved for the case where the credential itself is genuinely invalid.
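The distinction reduces to a three-way outcome instead of a boolean. A minimal sketch, with an invented enum and function names, of how the server-side mapping and the matching client-side reaction might look:

```python
from enum import Enum

class AuthOutcome(Enum):
    VALID = "valid"
    INVALID_KEY = "invalid_key"
    BACKEND_DOWN = "backend_down"  # the lookup failed; the key was never judged

def status_for(outcome: AuthOutcome) -> int:
    """Map an auth outcome to the HTTP status described in the text."""
    if outcome is AuthOutcome.BACKEND_DOWN:
        return 503  # dependency failure: the credential may be fine
    if outcome is AuthOutcome.INVALID_KEY:
        return 401  # the credential itself is wrong
    return 200

def client_should_retry(status: int) -> bool:
    """503 is worth retrying with backoff; 401 needs a different key, not a retry."""
    return status == 503

print(status_for(AuthOutcome.BACKEND_DOWN), client_should_retry(503))  # 503 True
print(status_for(AuthOutcome.INVALID_KEY), client_should_retry(401))   # 401 False
```

Collapsing the first two cases into one status is what sent agents into the regenerate-the-key loop.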
What's next
The public roadmap is the OSS repo at github.com/us/crw — issues labeled roadmap are the canonical source. Things we have publicly committed to and are actively building:
- Screenshot output. Not currently supported (HTTP 422). On the roadmap.
- PDF / DOCX parsing. Not currently supported. On the roadmap.
- Multi-URL /v1/extract. Currently single-URL on the managed cloud only; a batched form is on the roadmap.
- Optional Redis-backed crawl state. So a restart does not abandon in-progress jobs. Opt-in, not default.
- Re-ranking layer for /v1/search. Improve long-tail relevance on top of SearXNG.
- More LLM providers for extraction. Currently OpenAI and Anthropic; the BYOK story for /v1/search answer mode (added in v0.7.0) already covers DeepSeek, Azure, and OpenAI-compatible endpoints, and we are bringing that surface to /v1/scrape extraction next.
If you want to try it
Self-host, free (AGPL-3.0)
docker run -p 3000:3000 ghcr.io/us/crw:latest
curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
Single binary, no Redis, no Playwright, no Python environment. AGPL-3.0 for the engine — calling the API from a closed-source product is fine; modifying and redistributing the engine triggers source-sharing obligations. A commercial license is available if AGPL-3.0 is a concern.
Hosted via fastCRW
If you do not want to manage servers, fastcrw.com runs the same engine. Free tier ships 500 one-time lifetime credits (not a monthly meter). Paid tiers and current pricing are on fastcrw.com/pricing — single source of truth.
For agents
npx crw-mcp
Drop the config into Claude Desktop, Cursor, or any MCP-compatible agent and you get scrape, crawl, crawl_status, map, and search as native tools.
Further reading
- Firecrawl vs Crawl4AI vs fastCRW: The Honest Benchmark (2026) — full 3-way numbers and methodology.
- Inside CRW: Architecture of a Lightweight Rust Scraping API — Axum, lol-html, LightPanda integration deep dive.
- Rust vs Python Scrapers: An Architecture and Footprint Deep-Dive — sister piece on the systems-level trade-offs.
- $5 VPS Web Scraping: Run CRW Where Firecrawl Can't — the deployment recipe for the small-VPS case.
- Where CRW Still Falls Short — and What We're Improving — the honest gap list, kept current.
- /benchmarks: reproducible diagnose_3way.py script and full latency distribution.
- /api-reference: canonical endpoint reference.