Why did you build fastCRW in Rust instead of extending Firecrawl or Crawl4AI in Python?

We hit two recurring problems with the Python options. Firecrawl's open-source stack is multi-container (Node API, Python workers, Redis, Playwright) and has a heavy resident footprint that does not fit a $5 VPS or local dev. Crawl4AI requires a full headless browser (Chromium via Playwright) on every install, which carries a 200-300 MB browser process per worker and a ~2 GB Docker image. We wanted one async pipeline, one static binary, predictable memory, and a fast HTTP-only path for the 70-80% of pages that do not need JavaScript. Rust + Tokio + Axum + reqwest + lol-html gave us exactly that — a single binary with no runtime, no GC, and an idle footprint around 50 MB RAM (structural fact from the OSS README, not a benchmark claim).

How does fastCRW stay around 50 MB RAM idle when other scrapers need 200 MB or more?

Three structural decisions. (1) No headless browser is pre-loaded — the default Docker Compose ships LightPanda but only starts it on demand, and Chrome is opt-in (structural fact, OSS README). (2) The HTML path uses lol-html, Cloudflare's streaming Rust rewriter, which never builds a full DOM tree — memory is proportional to the largest element, not the whole page. (3) The Rust ownership model frees memory deterministically — there is no GC heap bloat. The Docker image is a single ~8 MB binary, versus Firecrawl's ~2-3 GB total across 5 containers (structural footprint section of the README).

When does LightPanda actually beat Chromium, and when should I still use Chrome?

LightPanda is the right pick when you need lightweight JavaScript rendering for pages that mostly hydrate from server-rendered HTML — docs sites, blogs, marketing pages, news, most e-commerce product pages. It starts fast and stays small. Chrome (the opt-in renderer, billed at 1 credit per scrape on the managed cloud — the same flat rate as any other renderer) is the right pick for heavy SPAs with complex client-side routing, sites behind aggressive bot detection, or anything that needs the full stealth fallback. Our default renderer is `auto`, which selects `chrome -> lightpanda -> http` with fallback, so most callers never have to choose. You can force a renderer with the `renderer: "lightpanda"` body field on `/v1/scrape`.

Is the fastCRW REST API actually drop-in compatible with Firecrawl?

By design, yes — for the endpoint shapes that matter for an agent migration. We copied the URL shapes: `/v1/scrape`, `/v1/crawl`, `/v1/crawl/:id` (status), `/v1/map`, `/v1/search`. Most migrations are a one-line base-URL swap. `/v1/extract` exists on the managed cloud (self-hosters use `/v1/scrape` with `formats: ["json"]` plus a `jsonSchema`) and accepts up to 50 URLs per request; a research endpoint (`/v2/search/research/papers`) is also available. There is no `/v1/agent`. Screenshot output is supported on the v2 scrape API (a request for `formats: ["screenshot"]` returns `data.screenshot` as a base64 PNG data URL); response field names and error envelopes have minor divergence. See /api-reference for the canonical endpoint list.

How We Built fastCRW: Rust, 50MB RAM, and the Path to Real-Time Web Scraping for AI Agents (2026)

The problem we kept hitting

This is a build-in-public write-up of fastCRW — what we set out to fix, the choices that worked, the choices we had to revisit, and the parts that are still in flight. If you are evaluating a self-hosted scraping engine for AI agents, this is the document we wish had existed when we started.

The recurring problem, in one sentence: AI agents need scrape + search + crawl as a real-time primitive, but the production-grade options force you to choose between a heavyweight multi-container stack (Node API, Python workers, Redis, Playwright) and a Python framework that requires a full headless browser per worker.

Firecrawl's open-source stack is multi-container — Node API, Python workers, Redis, Playwright. Structural footprint of the full stack is ~2-3 GB total across 5 containers (source: the README §"Structural footprint", labeled as a structural fact, not a benchmark claim). Lovely engineering, but it does not fit a small VPS or a developer's laptop running three other services.
Crawl4AI is excellent inside a Python notebook, but it requires Playwright + Chromium on every install. That is a ~2 GB Docker image and 300 MB+ idle RAM. Great for research, hard to deploy as a sidecar for an agent.
Browser-automation libraries (Playwright, Puppeteer, Selenium) are general-purpose automation tools, not scrapers. They are not designed around an LLM consumer, do not output clean markdown, do not speak MCP, and carry a Chromium baseline (~200-300 MB per worker).

What we actually needed for AI agent traffic was different in shape. An agent makes many small scrape calls, sometimes 50-200 in a single tool-use chain. It wants clean markdown, not raw HTML. It wants predictable latency. It wants to call the engine from inside a $5 VPS, a CI job, or a developer's laptop. And it wants the same API surface across all three.

So we wrote one.

Why Rust

The first real decision was the language. We considered Go (the Colly stack is mature; static binaries are nice), and we considered staying in Python (the ecosystem is unbeatable, but the resident footprint is not). We picked Rust for four concrete reasons.

1. One async pipeline, not multiprocessing

CPython has a GIL, so a high-concurrency scraper in Python ends up as multiprocessing + headless Chrome + Redis queue + worker pool. Each piece is fine; the pile is the problem. Rust's Tokio runtime gives you genuine parallelism inside one process. A 100-URL crawl is a set of async tasks scheduled on one runtime, not a pool of child processes each holding a browser open.

2. Predictable memory, no GC pauses

Rust's ownership model frees memory deterministically when a value goes out of scope. For a long-running scraper that processes thousands of pages, this matters: there is no slow heap bloat, no occasional GC pause, no JVM-style warmup. The resident set tracks the actual work in flight, not the historical high-water mark.

3. Single static binary

The output of cargo build --release is one file. No interpreter, no virtualenv, no npm install, no bundled Chromium by default. The Docker image we ship is around 8 MB (structural fact from the README §"Structural footprint" table — single ~8 MB binary, 1 container, plus an optional sidecar). That is roughly two orders of magnitude smaller than a Firecrawl deployment.

4. The async ecosystem is finally mature

Five years ago, async Rust was painful. In 2026 the stack we use — tokio + axum + reqwest + scraper + lol-html + serde_json — is genuinely production-grade. Axum's extractor system is clean, reqwest's connection pooling Just Works, and lol-html is one of the fastest HTML parsers in any language. The ergonomics are no longer an excuse to stay in Python.

To make this concrete, here is the high-level shape difference between a Python multiprocessing scraper and the Rust pipeline we landed on:

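// Rust (fastCRW): single async runtime, N tasks
use tokio::task::JoinSet;

let mut tasks = JoinSet::new();
for url in urls {
    // Each task gets its own handle to the shared HTTP client.
    tasks.spawn(scrape_one(url, client.clone()));
}
// Drain results as tasks finish; `?` propagates a JoinError (panic or cancellation).
while let Some(res) = tasks.join_next().await {
    handle(res?);
}
// One process. One runtime. Memory tracks in-flight work.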

# Python (Crawl4AI / Firecrawl-style): process pool + browsers
import multiprocessing

with multiprocessing.Pool(processes=8) as pool:
    # Each worker spawns a headless browser:
    #   ~200-300 MB resident per worker
    #   8 workers = ~1.6-2.4 GB before any work starts
    results = pool.map(scrape_with_playwright, urls)

The Python pattern works — it is what most production scrapers run on today — but the baseline cost is high enough that "run a scraper on a small VPS" is not really a supported deployment. With the Rust pipeline, it is.

The ~50 MB idle footprint

One of the metrics we tracked from day one was idle RAM. Not because idle RAM is a benchmark — it is not — but because it is a proxy for how cheaply you can deploy the thing. If idle is sub-100 MB, a $5 VPS is viable. If idle is 500 MB+, you are paying for a $20 box just to host the runtime.

fastCRW's idle footprint is around 50 MB RAM on a $5 VPS (structural fact, OSS README §"Structural footprint" — we phrase it as a structural fact, not a benchmark, because actual resident size will vary by kernel and libc). Concretely, that 50 MB includes:

The Tokio runtime and its worker threads (one per core by default).
The Axum router and registered routes for /v1/scrape, /v1/crawl, /v1/crawl/:id, /v1/map, /v1/search, /mcp, and /health.
A reqwest client with idle connections (we cap idle connections per host to keep this bounded).
The lol-html parser code, but no parsed DOM at rest — lol-html only holds memory while a page is in flight.
A small SearXNG sidecar slot for search, optional.

What it does not include at idle, and why that matters:

No pre-loaded headless browser. Chromium would be ~200-300 MB on its own. LightPanda (when used) is launched on demand and torn down after — it is not a long-lived process at idle.
No interpreter. There is no Python runtime, no V8 heap, no JVM. The binary is the runtime.
No queue broker. Crawl jobs use an in-memory task graph. Redis is not required for the default deployment.

The trade-off is honest: in-memory crawl state means a restart abandons in-progress crawl jobs. That is fine for the agent-traffic shape we optimised for (small, short crawls). For long, durable crawls, the right pattern is to add a queue layer externally — we do not bake that in, because most callers do not need it.

Under load, the resident set grows with active request state — connection buffers, parse state, crawl queues — but it grows proportionally to actual in-flight work, not baseline overhead.
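
The "no parsed DOM at rest" point can be illustrated with any streaming parser. The sketch below is an analogy in Python's standard library, not lol-html itself: it pulls links out of a page as chunks arrive and never builds a tree. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values as tags stream past; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(chunks):
    """Feed the document chunk by chunk, as it would arrive off the socket."""
    parser = LinkCollector()
    for chunk in chunks:
        parser.feed(chunk)  # incomplete tags are buffered until the next chunk
    parser.close()
    return parser.links


# A tag split across two chunks is still handled.
chunks = [
    '<html><body><a hr',
    'ef="/docs">Docs</a>',
    '<p>text</p><a href="/blog">Blog</a></body></html>',
]
print(collect_links(chunks))
```

The only state that survives between chunks is the unfinished tag and the list of results, which is the property that keeps memory tied to in-flight work.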

LightPanda over Chromium for the fast path

The hardest call we had to make was the browser story. There are three honest options for "this page needs JavaScript to render":

No browser at all. Refuse to render JS. Fine for HTML-primary pages, useless for SPAs.
Full Chromium (via CDP). The Playwright/Puppeteer approach. Most reliable, heaviest.
A lighter headless browser like LightPanda — designed for headless automation with a smaller footprint than Chromium.

We landed on a layered model. The default renderer is auto, which auto-selects with a chrome -> lightpanda -> http fallback. The fast path is HTTP-only with lol-html parsing. When that is not enough — JS-rendered content, hydration-dependent markup — we escalate to LightPanda. When LightPanda is not enough — heavy SPA, aggressive bot detection — we escalate to full Chrome (opt-in).

The escalation is driven by content heuristics (is the rendered HTML suspiciously empty? did we see a bot-detection challenge?) and by the explicit renderer body field on /v1/scrape. You can pin a renderer:

curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "renderer": "lightpanda",
    "formats": ["markdown"]
  }'
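
The escalation described above (HTTP first, then LightPanda, then Chrome) can be sketched as a loop over renderers with a content check between steps. This is a minimal illustration, not the engine's code: the challenge markers, the length threshold, and the function names are all assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical heuristics; the engine's real checks are not documented here.
CHALLENGE_MARKERS = ("captcha", "verify you are human")
MIN_HTML_CHARS = 200

# Cheapest renderer first, following the escalation described in the text.
ESCALATION_ORDER = ("http", "lightpanda", "chrome")


def looks_incomplete(html: str) -> bool:
    """True if the page is suspiciously empty or shows a bot challenge."""
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return True
    return len(html.strip()) < MIN_HTML_CHARS


def scrape_with_escalation(
    url: str,
    renderers: Dict[str, Callable[[str], str]],
    pinned: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (renderer_name, html). A pinned renderer skips escalation."""
    order = (pinned,) if pinned else ESCALATION_ORDER
    name, html = order[0], ""
    for name in order:
        html = renderers[name](url)
        if not looks_incomplete(html):
            break  # good enough; do not pay for a heavier renderer
    return name, html


# Stub renderers standing in for real fetches.
stubs = {
    "http": lambda url: "<div id='root'></div>",          # empty SPA shell
    "lightpanda": lambda url: "<p>" + "word " * 100 + "</p>",
    "chrome": lambda url: "<p>" + "word " * 100 + "</p>",
}
print(scrape_with_escalation("https://app.example.com/dashboard", stubs)[0])
```

If every renderer returns something that still looks incomplete, the sketch hands back the last attempt rather than raising, which leaves the decision to the caller.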

The credit math is flat across renderers. Source: the canonical credit-cost table in the README and on our pricing page.

scrape with any renderer (http, lightpanda, or chrome): 1 credit — flat rate, no renderer surcharge.
crawl: 1 credit per page, regardless of renderer.

Why not Chromium-only? Because the agent traffic shape we see is dominated by docs sites, blogs, news, and product pages — most of which render fine without a full SPA browser. Forcing Chromium on every request would multiply the resident footprint by an order of magnitude for content that does not need it. The default Docker Compose ships LightPanda; the chrome variant is opt-in and roughly 500 MB image + 1 GB resident (structural facts from the README, not benchmarks).

Firecrawl-compatible by design

We made one large concession to migration ergonomics: the URL shapes match Firecrawl. This was deliberate. Most teams that try fastCRW are already running on Firecrawl or evaluating it. If migration is "change the base URL", they will try us. If migration is "rewrite every call", they will not.

So the endpoint surface deliberately tracks Firecrawl's:

POST /v1/scrape — scrape one URL, optional formats array, optional jsonSchema for LLM extraction.
POST /v1/crawl — start an async BFS crawl, returns a job id. Accepts maxDepth (cap 10) and maxPages (cap 1000); limit and max_pages are serde aliases.
GET /v1/crawl/:id — crawl status and accumulated results.
DELETE /v1/crawl/:id — cancel a crawl job.
POST /v1/map — discover all URLs on a site.
POST /v1/search — web search via the SearXNG sidecar, optional per-result scraping.
POST /mcp — streamable HTTP MCP transport for agents that prefer it over stdio.
GET /health — health check, no auth.
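
As a rough sketch of how a client might drive the crawl endpoints listed above: start a job, then poll its status. The paths, maxDepth, and maxPages come from the list; the response field names (id, status) and the terminal status values are assumptions borrowed from the Firecrawl shape, so check /api-reference before relying on them.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:3000"  # self-hosted address used elsewhere in this post


def http_json(method, path, body=None):
    """Minimal JSON-over-HTTP helper using only the standard library."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def build_crawl_request(url, max_depth=2, max_pages=100):
    """Clamp to the documented caps: maxDepth <= 10, maxPages <= 1000."""
    return {
        "url": url,
        "maxDepth": min(max_depth, 10),
        "maxPages": min(max_pages, 1000),
    }


def crawl(url, send=http_json, poll_seconds=2.0, **limits):
    """Start a crawl, then poll until the job reaches a terminal state."""
    job = send("POST", "/v1/crawl", build_crawl_request(url, **limits))
    job_id = job["id"]  # assumed field name
    while True:
        status = send("GET", f"/v1/crawl/{job_id}")
        # Assumed terminal values; anything else is treated as still running.
        if status.get("status") in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_seconds)
```

The send parameter exists so the transport can be swapped for a stub in tests or for an authenticated client on the managed cloud.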

Where we deliberately diverge from Firecrawl's exact surface:

/v1/extract exists on the managed cloud — a convenience wrapper over /v1/scrape with formats: ["json"], billed as the 1-credit scrape plus the LLM token cost (usage-metered LLM credits). Self-hosters use /v1/scrape with a jsonSchema directly. It accepts up to 50 URLs per request.
A research endpoint (/v2/search/research/papers) fans out across Google, OpenAlex, Semantic Scholar, and arXiv for multi-source research.
There is no /v1/agent (Spark-style) endpoint. Anti-bot (block detection, UA rotation, stealth fingerprints, residential-proxy rotation) is built into the open core.
Screenshot output is supported on v2. A request for formats: ["screenshot"] returns data.screenshot as a base64 PNG data URL; formats: ["screenshot@fullPage"] captures the full page.
Response field names and error envelopes have minor divergence from Firecrawl. The shapes are close; they are not byte-identical.
LLM extraction runs on fastCRW's managed LLM on paid plans.

The canonical endpoint reference lives at /api-reference — that is the source of truth for the surface, not this blog post.
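
For self-hosters, the extraction path is a scrape request with formats: ["json"] and a jsonSchema. A small sketch of building that body, plus a helper that keeps batches within the 50-URL cap on /v1/extract. The schema and helper names are hypothetical, and the /v1/extract request body itself is not shown because its field names are not given here.

```python
def scrape_json_payload(url, schema):
    """Body for POST /v1/scrape with LLM extraction, using the fields named above."""
    return {"url": url, "formats": ["json"], "jsonSchema": schema}


def batches(urls, size=50):
    """Split a URL list into chunks that respect the 50-URL cap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Hypothetical schema for a product page.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

payload = scrape_json_payload("https://example.com/item/1", PRODUCT_SCHEMA)
print(payload["formats"], [len(b) for b in batches(list(range(120)))])
```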

First-class MCP

The single design decision that took us from "another scraping API" to "a primitive an agent can call natively" was building MCP into the binary, not bolting it on as a separate service.

fastCRW's MCP server reuses the scraping engine directly. The npm package crw-mcp@0.6.0 (dist-tag latest) ships a stdio-transport server you launch with one command:

npx crw-mcp

Drop the resulting config into Claude Desktop, Cursor, or any MCP-compatible agent, and the agent gets scrape, crawl, crawl_status, map, and search as native tools. No HTTP wrapping, no JSON envelope translation, no glue code.
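
For reference, a client entry for the stdio server might look like the output of the sketch below. It assumes the common mcpServers layout used by Claude Desktop-style clients; the server label "crw" is an arbitrary choice, and any environment variables the package may need (for example an API key for the managed cloud) are not shown.

```python
import json


def mcp_config(server_name="crw"):
    """Build a client config entry that launches the stdio server via npx."""
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["crw-mcp"],
            }
        }
    }


print(json.dumps(mcp_config(), indent=2))
```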

The HTTP API also exposes MCP at POST /mcp via the streamable HTTP transport, for agents that prefer HTTP transport over stdio.

Why this matters: an agent's tool-use loop is sensitive to tool-call latency. Going through an extra HTTP hop, then a JSON-RPC bridge, then the scraper, adds tens of milliseconds per call. Multiply by 50-200 calls in a research chain and it shows up. Native MCP keeps the path short.
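
A quick back-of-the-envelope for the claim above. The 20 ms and 50 ms figures are illustrative stand-ins for "tens of milliseconds", not measurements.

```python
def added_overhead_seconds(calls, per_call_ms):
    """Total extra wall-clock time from a fixed per-call overhead."""
    return calls * per_call_ms / 1000


for calls in (50, 200):
    for per_call_ms in (20, 50):  # assumed overheads
        total = added_overhead_seconds(calls, per_call_ms)
        print(f"{calls} calls x {per_call_ms} ms = {total:.1f} s")
```

Under these assumptions, a 200-call chain picks up between 4 and 10 seconds of pure transport overhead.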

The 63.74% truth-recall benchmark

The single hardest part of writing a scraper is being honest about how good it is. The temptation is to publish a one-line speed multiplier and a flattering average latency. We do not.

Our headline accuracy number is 63.74% truth-recall on Firecrawl's own public 1,000-URL scrape-content dataset — 522 of 819 labeled URLs recovered, the highest of the three tools (diagnose_3way.py, 2026-05-08). That is +3.79 percentage points over Crawl4AI's 59.95% (491 of 819) and +7.70 percentage points over Firecrawl's 56.04% (459 of 819) on the same labeled set. Scrape-success is 91.8% of reachable URLs (877/955). Zero thrown errors over 3,000 total requests across the three engines.
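
The percentages above follow directly from the published counts. A short check, using only the numbers quoted in this section:

```python
LABELED = 819
recovered = {"fastCRW": 522, "Crawl4AI": 491, "Firecrawl": 459}


def pct(part, whole, digits=2):
    """Percentage rounded to the precision used in the write-up."""
    return round(100 * part / whole, digits)


recall = {name: pct(count, LABELED) for name, count in recovered.items()}
lead_over_crawl4ai = round(recall["fastCRW"] - recall["Crawl4AI"], 2)
lead_over_firecrawl = round(recall["fastCRW"] - recall["Firecrawl"], 2)
scrape_success = pct(877, 955, digits=1)

print(recall, lead_over_crawl4ai, lead_over_firecrawl, scrape_success)
```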

The latency split — full p50 / p90 / p99 — is published because a single average would lie:

Metric	fastCRW	Crawl4AI	Firecrawl
Truth-recall (of 819 labeled)	63.74% (522)	59.95% (491)	56.04% (459)
Scrape-success (of reachable URLs)	91.8% (877/955)	83.5% (835)	see /benchmarks
p50 latency	1914 ms	1916 ms	2305 ms
p90 latency (fast mode)	4348 ms	4754 ms	6937 ms
p99 latency	15012 ms	13749 ms	21107 ms
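
To see why a single average would mislead, here is a small sketch with a made-up, long-tailed latency sample (not benchmark data). It uses the nearest-rank percentile definition, which is an assumption; the benchmark harness may compute percentiles differently.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical sample: most requests are fast, a few are very slow.
latencies_ms = [1900] * 85 + [4500] * 13 + [15000] * 2

print("mean", mean(latencies_ms))
for p in (50, 90, 99):
    print(f"p{p}", percentile(latencies_ms, p))
```

In this sample the mean describes no request at all: it sits above what the typical caller sees and far below what the slowest callers see.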

Read the table honestly:

We lead on accuracy. fastCRW has the highest truth-recall of the three tools tested.
We carry the median speed win. p50 1914 ms beats Firecrawl's 2305 ms, and is effectively tied with Crawl4AI (2 ms apart).
We win the p90 in fast mode. fastCRW's fast-mode p90 of 4348 ms is the lowest of the three.
We add unique coverage. The 34 URLs only fastCRW recovers represent 70% more unique coverage than the other two combined, a direct measure of what the chrome-stealth fallback adds.
We do not win the p99. Crawl4AI's 13749 ms beats fastCRW's 15012 ms; both beat Firecrawl's 21107 ms.

That is the entire story. There is no "average latency" headline. There is no speed multiplier. The full benchmark methodology, dataset, and a one-command reproducible script live at /benchmarks. The harness is diagnose_3way.py — we publish the script and the raw run data so anyone can re-run it against their own URLs.

fastCRW leads on accuracy, p50, fast-mode p90, and unique URL recovery. If your traffic shape is "median matters" — most agent traffic is exactly this — fastCRW is the right pick. The full p50/p90/p99 split is on /benchmarks.

What we got wrong and fixed

The build-in-public part. Three concrete things we shipped, found broken under real load, and fixed.

1. DB pool sizing under burst

The first managed-cloud burst we took (a customer migrating a Firecrawl workload) saturated our database connection pool inside the first minute. Symptom: requests queued and p99 latency spiked while p50 stayed fine. The pool was sized for steady state, not for the way agent traffic actually arrives: long quiet stretches punctuated by 200-call research chains.

Fix: pool sizing was retuned and made config-driven so we can adjust per deployment without a rebuild. We also moved auth lookups to a separate pool from credit accounting so auth checks cannot starve credit writes.
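
The shape of that fix, two independently sized pools read from configuration, can be sketched with semaphores standing in for connection pools. This is an illustration of the idea only: the environment variable names, defaults, and class are hypothetical, and the production service is not written in Python.

```python
import asyncio
import os


def pool_size(env_var, default):
    """Read a pool size from the environment so it can change without a rebuild."""
    return max(1, int(os.environ.get(env_var, default)))


class Pools:
    """Separate slots for auth lookups and credit writes."""

    def __init__(self, auth_size=None, credit_size=None):
        # Hypothetical variable names and defaults.
        self.auth = asyncio.Semaphore(auth_size or pool_size("AUTH_POOL_SIZE", 8))
        self.credits = asyncio.Semaphore(credit_size or pool_size("CREDIT_POOL_SIZE", 8))


async def with_slot(pool, work):
    """Run work() while holding one slot from the given pool."""
    async with pool:
        return await work()


async def demo():
    pools = Pools(auth_size=1, credit_size=1)
    release = asyncio.Event()

    async def slow_auth():
        await release.wait()  # simulates an auth lookup that hangs
        return "auth ok"

    async def credit_write():
        return "credit written"

    stuck = asyncio.create_task(with_slot(pools.auth, slow_auth))
    await asyncio.sleep(0)  # let the auth task take the only auth slot
    # The credit write still completes because it draws from its own pool.
    result = await asyncio.wait_for(with_slot(pools.credits, credit_write), timeout=1)
    release.set()
    await stuck
    return result


print(asyncio.run(demo()))
```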

2. Search relevance keeps improving

The /v1/search endpoint is backed by a SearXNG sidecar: federated, no API keys, no rate limits we did not set ourselves. Our search benchmark (separate from the scrape benchmark above) shows an 880 ms average and the lowest latency on 73 of 100 queries versus Firecrawl and Tavily (source: benchmarks/triple-bench.ts). We keep iterating on a content-aware re-ranking layer and per-engine weighting to push long-tail relevance further.

3. Auth backend errors that masqueraded as 401

Early on, when our auth backend was unhealthy, callers got HTTP 401. That is the wrong code: the credentials were fine; the lookup was broken. Agents would interpret 401 as "your key is invalid", regenerate the key, and the cycle repeated. We now return HTTP 503 with an explicit envelope when an auth backend dependency is down. 401 is reserved for the case where the credential itself is genuinely invalid.
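
The rule reduces to a three-way mapping from auth outcome to status code. A minimal sketch, with hypothetical envelope fields (the real error envelope is documented at /api-reference):

```python
from enum import Enum


class AuthOutcome(Enum):
    VALID = "valid"
    INVALID = "invalid"            # the credential itself is wrong
    BACKEND_DOWN = "backend_down"  # the lookup failed; the credential may be fine


def auth_response(outcome):
    """Map an auth outcome to (HTTP status, error envelope or None)."""
    if outcome is AuthOutcome.VALID:
        return 200, None
    if outcome is AuthOutcome.INVALID:
        return 401, {"success": False, "error": "invalid_api_key"}
    # Dependency failure: tell the caller to retry, not to rotate keys.
    return 503, {"success": False, "error": "auth_backend_unavailable", "retryable": True}


for outcome in AuthOutcome:
    print(outcome.name, auth_response(outcome)[0])
```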

What's next

The public roadmap is the OSS repo at github.com/us/crw; issues labeled roadmap are the canonical source. Things we have publicly committed to, with their current status:

Screenshot output. Shipped — formats: ["screenshot"] returns data.screenshot as a base64 PNG data URL on the v2 scrape API, with a screenshot@fullPage variant for full-page captures.
PDF parsing. Shipped — PDF URLs are auto-detected and parsed server-side.
Multi-URL /v1/extract. Shipped — accepts up to 50 URLs per request on the managed cloud.
Optional Redis-backed crawl state. So a restart does not abandon in-progress jobs. Opt-in, not default.
Re-ranking layer for /v1/search. Improve long-tail relevance on top of SearXNG.
Managed LLM across the LLM surface. As of v0.11.0 the managed LLM powers /v1/search answer mode on paid plans with no key to manage; we are bringing the same managed extraction experience to /v1/scrape next.

If you want to try it

Self-host, free (AGPL-3.0)

docker run -p 3000:3000 ghcr.io/us/crw:latest

curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'

Single binary, no Redis, no Playwright, no Python environment. AGPL-3.0 for the engine: calling the API from a closed-source product is fine; modifying the engine and then redistributing it, or serving the modified version to users over a network, triggers source-sharing obligations. A commercial license is available if AGPL-3.0 is a concern.

Hosted via fastCRW

If you do not want to manage servers, fastcrw.com runs the same engine. Free tier ships 500 one-time lifetime credits (not a monthly meter). Paid tiers and current pricing are on fastcrw.com/pricing — single source of truth.

For agents

npx crw-mcp

Drop the config into Claude Desktop, Cursor, or any MCP-compatible agent and you get scrape, crawl, crawl_status, map, and search as native tools.