How We Built fastCRW: Rust, 50MB RAM, and the Path to Real-Time Web Scraping for AI Agents (2026)

A build-in-public engineering write-up of fastCRW — why we wrote it in Rust, how the binary stays around 50 MB RAM idle on a $5 VPS, when LightPanda beats Chromium, the Firecrawl-compatible REST surface, the built-in MCP server, the 63.74% truth-recall benchmark (diagnose_3way.py, 2026-05-08), and the things we got wrong along the way.

By fastcrw · May 27, 2026 · 17 min read

The problem we kept hitting

This is a build-in-public write-up of fastCRW — what we set out to fix, the choices that worked, the choices we had to revisit, and the parts that are still in flight. If you are evaluating a self-hosted scraping engine for AI agents, this is the document we wish had existed when we started.

The recurring problem, in one sentence: AI agents need scrape + search + crawl as a real-time primitive, but the production-grade options force you to choose between a heavyweight multi-container stack (Node API plus Python workers) and a Python framework that requires a full headless browser per worker.

  • Firecrawl's open-source stack is multi-container — Node API, Python workers, Redis, Playwright. Structural footprint of the full stack is ~2-3 GB total across 5 containers (source: the README §"Structural footprint", labeled as a structural fact, not a benchmark claim). Lovely engineering, but it does not fit a small VPS or a developer's laptop running three other services.
  • Crawl4AI is excellent inside a Python notebook, but it requires Playwright + Chromium on every install. That is a ~2 GB Docker image and 300 MB+ idle RAM. Great for research, hard to deploy as a sidecar for an agent.
  • Browser-automation libraries (Playwright, Puppeteer, Selenium) are general-purpose scrapers. They are not designed around an LLM consumer, do not output clean markdown, do not speak MCP, and carry a Chromium baseline (~200-300 MB per worker).

What we actually needed for AI agent traffic was different in shape. An agent makes many small scrape calls, sometimes 50-200 in a single tool-use chain. It wants clean markdown, not raw HTML. It wants predictable latency. It wants to call the engine from inside a $5 VPS, a CI job, or a developer's laptop. And it wants the same API surface across all three.

So we wrote one.

Why Rust

The first real decision was the language. We considered Go (the Colly stack is mature; static binaries are nice), and we considered staying in Python (the ecosystem is unbeatable, but the resident footprint is not). We picked Rust for four concrete reasons.

1. One async pipeline, not multiprocessing

CPython has a global interpreter lock (GIL), so a high-concurrency scraper in Python ends up as multiprocessing + headless Chrome + Redis queue + worker pool. Each piece is fine; the pile is the problem. Rust's Tokio runtime gives you genuine parallelism inside one process. A 100-URL crawl is 100 async tasks on one runtime, not 100 child processes each holding a browser open.

2. Predictable memory, no GC pauses

Rust's ownership model frees memory deterministically when a value goes out of scope. For a long-running scraper that processes thousands of pages, this matters: there is no slow heap bloat, no occasional GC pause, no JVM-style warmup. The resident set tracks the actual work in flight, not the historical high-water mark.

3. Single static binary

The output of cargo build --release is one file. No interpreter, no virtualenv, no npm install, no bundled Chromium by default. The Docker image we ship is around 8 MB (structural fact from the README §"Structural footprint" table — single ~8 MB binary, 1 container, plus an optional sidecar). That is roughly two orders of magnitude smaller than a Firecrawl deployment.

4. The async ecosystem is finally mature

Five years ago, async Rust was painful. In 2026 the stack we use — tokio + axum + reqwest + scraper + lol-html + serde_json — is genuinely production-grade. Axum's extractor system is clean, reqwest's connection pooling Just Works, and lol-html is one of the fastest HTML parsers in any language. The ergonomics are no longer an excuse to stay in Python.

To make this concrete, here is the high-level shape difference between a Python multiprocessing scraper and the Rust pipeline we landed on:

// Rust (fastCRW): single async runtime, N tasks
use tokio::task::JoinSet;

let mut tasks = JoinSet::new();
for url in urls {
    // Each URL becomes one lightweight task sharing the same HTTP client.
    tasks.spawn(scrape_one(url, client.clone()));
}
while let Some(res) = tasks.join_next().await {
    // `?` propagates a JoinError if a task panicked or was cancelled.
    handle(res?);
}
// One process. One runtime. Memory tracks in-flight work.

# Python (Crawl4AI / Firecrawl-style): process pool + browsers
import multiprocessing

with multiprocessing.Pool(processes=8) as pool:
    # Each worker spawns a headless browser:
    #   ~200-300 MB resident per worker
    #   8 workers = ~1.6-2.4 GB before any work starts
    results = pool.map(scrape_with_playwright, urls)

The Python pattern works — it is what most production scrapers run on today — but the baseline cost is high enough that "run a scraper on a small VPS" is not really a supported deployment. With the Rust pipeline, it is.

The ~50 MB idle footprint

One of the metrics we tracked from day one was idle RAM. Not because idle RAM is a benchmark — it is not — but because it is a proxy for how cheaply you can deploy the thing. If idle is sub-100 MB, a $5 VPS is viable. If idle is 500 MB+, you are paying for a $20 box just to host the runtime.

fastCRW's idle footprint is around 50 MB RAM on a $5 VPS (structural fact, OSS README §"Structural footprint" — we phrase it as a structural fact, not a benchmark, because actual resident size will vary by kernel and libc). Concretely, that 50 MB includes:

  • The Tokio runtime and its worker threads (one per core by default).
  • The Axum router and registered routes for /v1/scrape, /v1/crawl, /v1/crawl/:id, /v1/map, /v1/search, /mcp, and /health.
  • A reqwest client with idle connections (we cap idle connections per host to keep this bounded).
  • The lol-html parser code, but no parsed DOM at rest — lol-html only holds memory while a page is in flight.
  • A small SearXNG sidecar slot for search, optional.

What it does not include at idle, and why that matters:

  • No pre-loaded headless browser. Chromium would be ~200-300 MB on its own. LightPanda (when used) is launched on demand and torn down after — it is not a long-lived process at idle.
  • No interpreter. There is no Python runtime, no V8 heap, no JVM. The binary is the runtime.
  • No queue broker. Crawl jobs use an in-memory task graph. Redis is not required for the default deployment.

The trade-off is honest: in-memory crawl state means a restart abandons in-progress crawl jobs. That is fine for the agent-traffic shape we optimised for (small, short crawls). For long, durable crawls, the right pattern is to add a queue layer externally — we do not bake that in, because most callers do not need it.
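To picture what "in-memory crawl state" means in practice, here is a minimal Python sketch of a breadth-first crawl whose queue lives entirely in process memory. It is an illustration only, not the engine's Rust implementation: `bfs_crawl`, `get_links`, the `max_depth` and `max_pages` parameters, and the toy link graph are all hypothetical names chosen for this example.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(seed: str,
              get_links: Callable[[str], Iterable[str]],
              max_depth: int = 2,
              max_pages: int = 100) -> list[str]:
    """Visit pages breadth-first, holding all crawl state in memory."""
    seen = {seed}
    queue = deque([(seed, 0)])  # (url, depth); gone if the process restarts
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not expand links past the depth cap
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph standing in for real fetches (hypothetical data).
GRAPH = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/blog/post-1"],
    "/docs/api": [],
    "/blog/post-1": [],
}
pages = bfs_crawl("/", lambda u: GRAPH.get(u, []), max_depth=1)
```

Because `queue` and `seen` are ordinary in-process objects, nothing survives a restart, which is exactly the trade-off described above. Making the crawl durable would mean moving those two structures into an external store.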

Under load, the resident set grows with active request state — connection buffers, parse state, crawl queues — but it grows proportionally to actual in-flight work, not baseline overhead.

LightPanda over Chromium for the fast path

The hardest call we had to make was the browser story. There are three honest options for "this page needs JavaScript to render":

  1. No browser at all. Refuse to render JS. Fine for HTML-primary pages, useless for SPAs.
  2. Full Chromium (via CDP). The Playwright/Puppeteer approach. Most reliable, heaviest.
  3. A lighter headless browser like LightPanda — designed for headless automation with a smaller footprint than Chromium.

We landed on a layered model. The default renderer is auto, which auto-selects with a chrome -> lightpanda -> http fallback. The fast path is HTTP-only with lol-html parsing. When that is not enough — JS-rendered content, hydration-dependent markup — we escalate to LightPanda. When LightPanda is not enough — heavy SPA, aggressive bot detection — we escalate to full Chrome (opt-in).

The escalation is driven by content heuristics (is the rendered HTML suspiciously empty? did we see a bot-detection challenge?) and by the explicit renderer body field on /v1/scrape. You can pin a renderer:

curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "renderer": "lightpanda",
    "formats": ["markdown"]
  }'
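The content-driven escalation described above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not fastCRW's actual heuristics: the 200-character threshold, the challenge marker strings, and every function name here are hypothetical.

```python
import re

ORDER = ["http", "lightpanda", "chrome"]  # cheapest to heaviest
CHALLENGE_MARKERS = ("captcha", "verify you are human")  # hypothetical markers


def visible_text_len(html: str) -> int:
    """Rough count of visible characters: drop scripts, styles, and tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    return len(re.sub(r"(?s)<[^>]+>", "", html).strip())


def needs_escalation(html: str, min_chars: int = 200) -> bool:
    """True if the page looks blocked or suspiciously empty."""
    lowered = html.lower()
    looks_blocked = any(marker in lowered for marker in CHALLENGE_MARKERS)
    looks_empty = visible_text_len(html) < min_chars
    return looks_blocked or looks_empty


def next_renderer(current: str, html: str, chrome_enabled: bool = False):
    """Return the next renderer to try, or None to accept the current result."""
    if not needs_escalation(html):
        return None
    idx = ORDER.index(current)
    if idx + 1 >= len(ORDER):
        return None  # already on the heaviest renderer
    candidate = ORDER[idx + 1]
    if candidate == "chrome" and not chrome_enabled:
        return None  # Chrome is opt-in
    return candidate
```

An SPA shell with an empty root element and a large script tag has almost no visible text, so `next_renderer("http", html)` would suggest `lightpanda`. A server-rendered article clears the threshold and stays on the HTTP path.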

The credit math reflects the cost. Source: the canonical credit-cost table in the README and on our pricing page.

  • scrape with http or lightpanda renderer: 1 credit.
  • scrape with chrome renderer: 2 credits.
  • crawl: 1 credit per page (2 per page if chrome-rendered).
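For budgeting, the credit rules above reduce to a lookup table and a sum. The helper names in this Python sketch are hypothetical, and the pricing page remains the source of truth if the numbers ever change.

```python
# Credits per scraped page, by renderer, as listed above.
SCRAPE_CREDITS = {"http": 1, "lightpanda": 1, "chrome": 2}


def scrape_credits(renderer: str) -> int:
    """Cost of a single scrape with the given renderer."""
    return SCRAPE_CREDITS[renderer]


def crawl_credits(pages_by_renderer: dict[str, int]) -> int:
    """Crawl cost: 1 credit per page, 2 per chrome-rendered page."""
    return sum(SCRAPE_CREDITS[r] * n for r, n in pages_by_renderer.items())


# A 100-page crawl where 5 pages needed Chrome: 80 + 15 + (5 * 2) credits.
total = crawl_credits({"http": 80, "lightpanda": 15, "chrome": 5})
```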

Why not Chromium-only? Because the agent traffic shape we see is dominated by docs sites, blogs, news, and product pages — most of which render fine without a full SPA browser. Forcing Chromium on every request would multiply the resident footprint by an order of magnitude for content that does not need it. The default Docker Compose ships LightPanda; the chrome variant is opt-in and roughly 500 MB image + 1 GB resident (structural facts from the README, not benchmarks).

Firecrawl-compatible by design

We made one large concession to migration ergonomics: the URL shapes match Firecrawl. This was deliberate. Most teams that try fastCRW are already running on Firecrawl or evaluating it. If migration is "change the base URL", they will try us. If migration is "rewrite every call", they will not.

So the endpoint surface deliberately tracks Firecrawl's:

  • POST /v1/scrape — scrape one URL, optional formats array, optional jsonSchema for LLM extraction.
  • POST /v1/crawl — start an async BFS crawl, returns a job id. Accepts maxDepth (cap 10) and maxPages (cap 1000); limit and max_pages are serde aliases.
  • GET /v1/crawl/:id — crawl status and accumulated results.
  • DELETE /v1/crawl/:id — cancel a crawl job.
  • POST /v1/map — discover all URLs on a site.
  • POST /v1/search — web search via the SearXNG sidecar, optional per-result scraping.
  • POST /mcp — streamable HTTP MCP transport for agents that prefer it over stdio.
  • GET /health — health check, no auth.
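As an example of the surface above, here is a minimal Python sketch that builds a POST /v1/crawl request with the documented caps checked on the client side. Two assumptions are worth flagging: the `url` body field is assumed to match the one shown for /v1/scrape, and whether the server rejects or clamps out-of-range values is not stated here, so this sketch simply refuses to build such a request. The function name and `base_url` default are hypothetical.

```python
import json
import urllib.request

MAX_DEPTH_CAP = 10    # documented cap on maxDepth
MAX_PAGES_CAP = 1000  # documented cap on maxPages


def crawl_request(url: str, max_depth: int, max_pages: int,
                  base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/crawl request."""
    if not 0 <= max_depth <= MAX_DEPTH_CAP:
        raise ValueError(f"maxDepth must be between 0 and {MAX_DEPTH_CAP}")
    if not 1 <= max_pages <= MAX_PAGES_CAP:
        raise ValueError(f"maxPages must be between 1 and {MAX_PAGES_CAP}")
    body = json.dumps({
        "url": url,              # assumed field name, mirroring /v1/scrape
        "maxDepth": max_depth,
        "maxPages": max_pages,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/crawl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = crawl_request("https://example.com", max_depth=3, max_pages=50)
```

Sending it is one call to `urllib.request.urlopen(req)`. The response carries the job id to pass to GET /v1/crawl/:id; check /api-reference for the exact field names.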

Honest gaps (we are not Firecrawl, and we don't pretend to be):

  • /v1/extract exists only on the managed cloud — a 5-credit convenience wrapper over /v1/scrape with formats: ["json"]. Self-hosters use /v1/scrape with a jsonSchema directly. It is single-URL only.
  • There is no /v1/batch/scrape. For many URLs, iterate /v1/scrape concurrently or use /v1/crawl.
  • There is no /v1/agent (Spark models), no /v1/deep-research, and no Fire-engine anti-bot.
  • Screenshot output is not supported. A request for formats: ["screenshot"] returns HTTP 422.
  • Response field names and error envelopes have minor divergence from Firecrawl. The shapes are close; they are not byte-identical.
  • LLM extraction supports OpenAI and Anthropic providers only.

The canonical endpoint reference lives at /api-reference — that is the source of truth for the surface, not this blog post.

First-class MCP

The single design decision that took us from "another scraping API" to "a primitive an agent can call natively" was building MCP into the binary, not bolting it on as a separate service.

fastCRW's MCP server reuses the scraping engine directly. The npm package crw-mcp@0.6.0 (dist-tag latest) ships a stdio-transport server you launch with one command:

npx crw-mcp

Drop the resulting config into Claude Desktop, Cursor, or any MCP-compatible agent, and the agent gets scrape, crawl, crawl_status, map, and search as native tools. No HTTP wrapping, no JSON envelope translation, no glue code.
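For Claude Desktop, the config in question is typically a small JSON entry under `mcpServers` in `claude_desktop_config.json`. The Python snippet below prints such an entry as a sketch. It assumes the server needs no extra arguments or environment variables, which may not hold for your deployment, and the `crw` label is an arbitrary name chosen for this example.

```python
import json

config = {
    "mcpServers": {
        "crw": {                    # arbitrary label shown in the client UI
            "command": "npx",
            "args": ["crw-mcp"],    # same command as above
        }
    }
}

# Paste the printed JSON into the client's MCP config file.
print(json.dumps(config, indent=2))
```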

The HTTP API also exposes MCP at POST /mcp via the streamable HTTP transport, for agents that prefer HTTP transport over stdio.

Why this matters: an agent's tool-use loop is sensitive to tool-call latency. Going through an extra HTTP hop, then a JSON-RPC bridge, then the scraper, adds tens of milliseconds per call. Multiply by 50-200 calls in a research chain and it shows up. Native MCP keeps the path short.

The 63.74% truth-recall benchmark

The single hardest part of writing a scraper is being honest about how good it is. The temptation is to publish a one-line speed multiplier and a flattering average latency. We do not.

Our headline accuracy number is 63.74% truth-recall on Firecrawl's own public 1,000-URL scrape-content dataset — 522 of 819 labeled URLs recovered (diagnose_3way.py, 2026-05-08). That is +3.79 percentage points over Crawl4AI's 59.95% (491 of 819) and +7.70 percentage points over Firecrawl's 56.04% (459 of 819) on the same labeled set. Scrape-success across the full 1,000 URLs is 87.7% (877). Zero thrown errors over 3,000 total requests across the three engines.
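The percentages and leads follow directly from the recovered counts, so they are easy to re-derive. A short Python check, using only the figures quoted above:

```python
LABELED = 819  # labeled URLs in the dataset
recovered = {"fastCRW": 522, "Crawl4AI": 491, "Firecrawl": 459}

# Truth-recall as a percentage of the labeled set, rounded to 2 decimals.
recall = {name: round(100 * n / LABELED, 2) for name, n in recovered.items()}

# Leads are in percentage points (a difference of percentages, not a ratio).
lead_over_crawl4ai = round(recall["fastCRW"] - recall["Crawl4AI"], 2)
lead_over_firecrawl = round(recall["fastCRW"] - recall["Firecrawl"], 2)
```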

The latency split — full p50 / p90 / p99 — is published because a single average would lie:

| Metric | fastCRW | Crawl4AI | Firecrawl |
| --- | --- | --- | --- |
| Truth-recall (of 819 labeled) | 63.74% (522) | 59.95% (491) | 56.04% (459) |
| Scrape-success (of 1,000) | 87.7% (877) | 83.5% (835) | 89.7% (897) |
| p50 latency | 1914 ms | 1916 ms | 2305 ms |
| p90 latency | 14157 ms | 4754 ms | 6937 ms |
| p99 latency | 15012 ms | 13749 ms | 21107 ms |
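To see why a single average hides this kind of distribution, consider a synthetic example in Python. The latencies below are made up for illustration and are not benchmark data: most requests take the fast path, and a minority take a slow fallback. `percentile` is a simple nearest-rank implementation written for this sketch.

```python
import math
import statistics


def percentile(samples: list[int], p: int) -> int:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic: 85 fast-path requests, 15 slow-fallback requests.
latencies_ms = [1900] * 85 + [14000] * 15

mean = statistics.mean(latencies_ms)  # 3715: matches no real request
p50 = percentile(latencies_ms, 50)    # 1900: what most callers see
p90 = percentile(latencies_ms, 90)    # 14000: what the slow tail sees
```

The mean lands at 3715 ms, a latency that no request in the sample actually experienced. The p50 and p90 describe the two populations that really exist.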

Read the table honestly:

  • We lead on accuracy. fastCRW has the highest truth-recall of the three tools tested.
  • We carry the median speed win. p50 1914 ms beats Firecrawl's 2305 ms, and is effectively tied with Crawl4AI (2 ms apart).
  • We lose the p90 tail. fastCRW's p90 of 14157 ms is the worst of the three (at p99, our 15012 ms sits between Crawl4AI's 13749 ms and Firecrawl's 21107 ms). This is causal, not incidental: the chrome-stealth fallback that recovers the URLs the others miss is the same mechanism that produces the slow tail. The accuracy lead and the slow tail are the same trade.

That is the entire story. There is no "average latency" headline. There is no speed multiplier. The full benchmark methodology, dataset, and a one-command reproducible script live at /benchmarks. The harness is diagnose_3way.py — we publish the script and the raw run data so anyone can re-run it against their own URLs.

If your traffic shape is "median matters, tail does not" — most agent traffic is exactly this — fastCRW is the right pick. If your traffic shape is "tail matters, accuracy is secondary" — for example, a real-time UI that must respond in 5 seconds or fail — Crawl4AI's 4754 ms p90 may be the better fit. We will not tell you otherwise.

What we got wrong and fixed

The build-in-public part. Three concrete things we shipped, found broken under real load, and fixed.

1. DB pool sizing under burst

The first managed-cloud burst we took (a customer migrating a Firecrawl workload) saturated our database connection pool inside the first minute. Symptom: requests queued, latency p99 went through the roof, p50 stayed fine. The pool was sized for steady-state, not for the way agent traffic actually arrives — long quiet stretches punctuated by 200-call research chains.

Fix: pool sizing was retuned and made config-driven so we can adjust per deployment without a rebuild. We also moved auth lookups to a separate pool from credit accounting so auth checks cannot starve credit writes.
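The isolation idea is simple enough to sketch. The Python below models each pool as a bounded semaphore with a size read from configuration. It is a toy model of the pattern, not our production code: `BoundedPool`, the `AUTH_POOL_SIZE` and `CREDIT_POOL_SIZE` environment variable names, and the default sizes are all hypothetical.

```python
import os
import threading
from contextlib import contextmanager


class BoundedPool:
    """Stand-in for a DB connection pool: at most `size` concurrent holders."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.size = size
        self._slots = threading.BoundedSemaphore(size)

    @contextmanager
    def connection(self, timeout: float = 0.05):
        # Fail fast instead of queueing forever when the pool is saturated.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError(f"{self.name} pool exhausted")
        try:
            yield f"{self.name}-conn"
        finally:
            self._slots.release()


def pool_size(env_var: str, default: int) -> int:
    """Read a pool size from the environment so it can change without a rebuild."""
    return int(os.environ.get(env_var, default))


# Separate pools: saturating one cannot consume slots from the other.
auth_pool = BoundedPool("auth", pool_size("AUTH_POOL_SIZE", 2))
credit_pool = BoundedPool("credit", pool_size("CREDIT_POOL_SIZE", 2))
```

With a single shared pool, a burst of auth lookups can hold every slot and leave credit writes waiting. With two pools, the worst case for a saturated auth pool is slow auth.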

2. SearXNG relevance is work in progress

The /v1/search endpoint is backed by a SearXNG sidecar. SearXNG is excellent infrastructure — federated, no API keys, no rate limits we did not set ourselves — but its default ranking is noisier than commercial search APIs. Our search benchmark (separate from the scrape benchmark above) shows good latency wins (880 ms average, 73 of 100 latency wins versus Firecrawl and Tavily, source: benchmarks/triple-bench.ts), but relevance for long-tail queries is a known weak spot. We are iterating on a re-ranking layer and per-engine weighting. If you hit a query where SearXNG's defaults rank a marketing landing page above the canonical doc, that is the thing we are fixing.

3. Auth backend errors that masqueraded as 401

Early on, when our auth backend was unhealthy, callers got HTTP 401. That is the wrong code: the credentials were fine, but the lookup was broken. Agents would interpret 401 as "your key is invalid", regenerate the key, and repeat the cycle. We now return HTTP 503 with an explicit envelope when an auth backend dependency is down. 401 is reserved for the case where the credential itself is genuinely invalid.
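The distinction comes down to keeping "the backend said no" separate from "the backend could not answer". A minimal Python sketch of that mapping follows. The `Lookup` enum, the function name, and the envelope fields are hypothetical and do not reflect the actual response format.

```python
from enum import Enum


class Lookup(Enum):
    VALID = "valid"
    INVALID = "invalid"            # backend answered: no such key
    BACKEND_DOWN = "backend_down"  # backend could not answer at all


def auth_status(lookup: Lookup) -> tuple[int, dict]:
    """Map an auth lookup outcome to an HTTP status and error envelope."""
    if lookup is Lookup.VALID:
        return 200, {}
    if lookup is Lookup.INVALID:
        # The credential itself is bad; a new key is the right fix.
        return 401, {"error": "invalid_api_key"}
    # The credential may be fine; the caller should retry, not rotate keys.
    return 503, {"error": "auth_backend_unavailable", "retryable": True}
```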

What's next

The public roadmap is the OSS repo at github.com/us/crw — issues labeled roadmap are the canonical source. Things we have publicly committed to and are actively building:

  • Screenshot output. Not currently supported (HTTP 422). On the roadmap.
  • PDF / DOCX parsing. Not currently supported. On the roadmap.
  • Multi-URL /v1/extract. Currently single-URL on the managed cloud only; a batched form is on the roadmap.
  • Optional Redis-backed crawl state. So a restart does not abandon in-progress jobs. Opt-in, not default.
  • Re-ranking layer for /v1/search. Improve long-tail relevance on top of SearXNG.
  • More LLM providers for extraction. Currently OpenAI and Anthropic; the BYOK story for /v1/search answer mode (added in v0.7.0) already covers DeepSeek, Azure, and OpenAI-compatible endpoints — we are bringing that surface to /v1/scrape extraction next.

If you want to try it

Self-host, free (AGPL-3.0)

docker run -p 3000:3000 ghcr.io/us/crw:latest

curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'

Single binary, no Redis, no Playwright, no Python environment. AGPL-3.0 for the engine — calling the API from a closed-source product is fine; modifying and redistributing the engine triggers source-sharing obligations. A commercial license is available if AGPL-3.0 is a concern.

Hosted via fastCRW

If you do not want to manage servers, fastcrw.com runs the same engine. Free tier ships 500 one-time lifetime credits (not a monthly meter). Paid tiers and current pricing are on fastcrw.com/pricing — single source of truth.

For agents

npx crw-mcp

Drop the config into Claude Desktop, Cursor, or any MCP-compatible agent and you get scrape, crawl, crawl_status, map, and search as native tools.

Frequently asked questions

Why did you build fastCRW in Rust instead of extending Firecrawl or Crawl4AI in Python?
We hit two recurring problems with the Python options. Firecrawl's open-source stack is multi-container (Node API, Python workers, Redis, Playwright) and has a heavy resident footprint that does not fit a $5 VPS or local dev. Crawl4AI requires a full headless browser (Chromium via Playwright) on every install, which carries a 200-300 MB browser process per worker and a ~2 GB Docker image. We wanted one async pipeline, one static binary, predictable memory, and a fast HTTP-only path for the 70-80% of pages that do not need JavaScript. Rust + Tokio + Axum + reqwest + lol-html gave us exactly that — a single binary with no runtime, no GC, and an idle footprint around 50 MB RAM (structural fact from the OSS README, not a benchmark claim).
How does fastCRW stay around 50 MB RAM idle when other scrapers need 200 MB or more?
Three structural decisions. (1) No headless browser is pre-loaded — the default Docker Compose ships LightPanda but only starts it on demand, and Chrome is opt-in (structural fact, OSS README). (2) The HTML path uses lol-html, Cloudflare's streaming Rust rewriter, which never builds a full DOM tree — memory is proportional to the largest element, not the whole page. (3) The Rust ownership model frees memory deterministically — there is no GC heap bloat. The Docker image is a single ~8 MB binary, versus Firecrawl's ~2-3 GB total across 5 containers (structural footprint section of the README).
When does LightPanda actually beat Chromium, and when should I still use Chrome?
LightPanda is the right pick when you need lightweight JavaScript rendering for pages that mostly hydrate from server-rendered HTML — docs sites, blogs, marketing pages, news, most e-commerce product pages. It starts fast and stays small. Chrome (the opt-in renderer, billed at 2 credits per scrape on the managed cloud) is the right pick for heavy SPAs with complex client-side routing, sites behind aggressive bot detection, or anything that needs the full stealth fallback. Our default renderer is `auto`, which selects `chrome -> lightpanda -> http` with fallback, so most callers never have to choose. You can force a renderer with the `renderer: "lightpanda"` body field on `/v1/scrape`.
What is the headline accuracy number for fastCRW, and how do I reproduce it?
63.74% truth-recall on Firecrawl's own public 1,000-URL scrape-content dataset — 522 of 819 labeled URLs recovered, with 87.7% scrape-success across the full 1,000 and 0 thrown errors over 3,000 requests (`diagnose_3way.py`, 2026-05-08). That is +3.79 percentage points over Crawl4AI's 59.95% and +7.70 percentage points over Firecrawl's 56.04% on the same labeled set. p50 latency is 1914 ms; p90 is 14157 ms — we publish the full p50/p90/p99 split because a single average would hide the chrome-stealth fallback's slow tail. The harness, dataset, and one-command repro are at /benchmarks.
Is the fastCRW REST API actually drop-in compatible with Firecrawl?
By design, yes — for the endpoint shapes that matter for an agent migration. We copied the URL shapes: `/v1/scrape`, `/v1/crawl`, `/v1/crawl/:id` (status), `/v1/map`, `/v1/search`. Most migrations are a one-line base-URL swap. The honest gaps: `/v1/extract` exists on the managed cloud only (self-hosters use `/v1/scrape` with `formats: ["json"]` plus a `jsonSchema`); there is no `/v1/batch/scrape`, no `/v1/agent`, no `/v1/deep-research`; screenshot output is not supported (a request for `formats: ["screenshot"]` returns HTTP 422); response field names and error envelopes have minor divergence. See /api-reference for the canonical endpoint list.
