Engineering

Why Low Memory Usage Matters in Self-Hosted Scraping

How idle RAM affects your hosting costs and concurrent throughput — and why CRW's 6.6 MB footprint changes the economics.

March 12, 2026 · 7 min read

Why Memory Is the Most Underrated Metric in Scraping

When developers compare scraping tools, they focus on features: does it support JavaScript? Does it have an SDK? Does it extract structured data? These matter. But for teams self-hosting scraping infrastructure, memory usage is often the most important practical metric — because it directly determines how much your infrastructure costs.

The Economics of Idle RAM

Every service you self-host has a fixed memory floor — the baseline RAM consumed before it processes a single request. This "idle memory" is the price you pay just to have the service running.
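You can measure this memory floor yourself. A minimal sketch for Linux hosts, reading the resident set size (VmRSS) from /proc — the PID you pass would be that of whatever service you're checking:

```python
import os

def rss_mb(pid: int) -> float:
    """Return resident memory of `pid` in MB (Linux only: reads /proc)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                kb = int(line.split()[1])  # VmRSS is reported in kB
                return kb / 1024
    raise RuntimeError(f"no VmRSS entry for pid {pid}")

# Probe the current process as a demonstration; in practice you'd pass
# the PID of the idle scraping service you want to audit.
print(f"this process idles at ~{rss_mb(os.getpid()):.1f} MB")
```

Run it against a freshly started service, before sending any traffic, to see the idle floor you'll be paying for.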

Consider a team running 10 concurrent scraping workers:

Tool        Idle RAM per instance   10 instances   Min server size   Monthly cost (DigitalOcean)
CRW         6.6 MB                  ~66 MB         1 GB ($6/mo)      $6
Spider      ~20 MB                  ~200 MB        1 GB ($6/mo)      $6
Crawl4AI    300 MB                  3 GB           4 GB ($24/mo)     $24
Firecrawl   500 MB                  5 GB           8 GB ($48/mo)     $48

The difference between CRW and Firecrawl for 10 workers: $6/mo vs $48/mo. Over a year: $72 vs $576. For a team running 50 workers: $30/mo vs $240/mo. These numbers compound significantly at scale.
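The sizing logic behind those numbers can be sketched as a back-of-envelope calculation. The (RAM, price) tiers below mirror the table above and are illustrative, not quoted from DigitalOcean's current price list:

```python
# Smallest server tier that fits N idle workers, with some headroom
# left for the OS and request spikes. Tiers are (RAM in GB, $/month).
TIERS = [(1, 6), (2, 12), (4, 24), (8, 48), (16, 96), (32, 240)]

def monthly_cost(idle_mb_per_worker: float, workers: int, headroom: float = 0.8) -> int:
    """Price of the smallest tier whose usable RAM covers total idle usage."""
    need_gb = idle_mb_per_worker * workers / 1024
    for ram_gb, price in TIERS:
        if need_gb <= ram_gb * headroom:
            return price
    raise ValueError("workload exceeds largest tier")

print(monthly_cost(6.6, 10))   # CRW: ~66 MB idle -> 1 GB tier -> 6
print(monthly_cost(500, 10))   # Firecrawl: ~5 GB idle -> 8 GB tier -> 48
```

The same function makes it easy to project costs for 50 or 100 workers before committing to a tool.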

Why Does Firecrawl Use 500 MB at Idle?

Firecrawl's stack includes Node.js (V8 heap ~50 MB baseline), Playwright (~100 MB), and a Chromium browser instance (~300 MB). Chromium is loaded at startup to avoid per-request browser cold starts. This is a reasonable engineering tradeoff for a service that needs to render JavaScript on every request — but it means you're paying for a full browser runtime even when you're not using it.

Why Does CRW Use 6.6 MB at Idle?

CRW is a Rust binary with no garbage-collected runtime, no V8 heap, and no pre-loaded browser. It uses:

  • A Tokio async runtime (~2 MB)
  • An Axum HTTP server (~1 MB)
  • Connection pools for outbound requests (~2 MB)
  • Miscellaneous process overhead (~1 MB)

When JavaScript rendering is needed, CRW spawns LightPanda on-demand and releases it after the request. Memory scales with actual load, not with the number of idle workers.

Memory Scaling Under Load

Idle memory is only part of the story. How memory grows under concurrent load is equally important:

Tool        Idle     10 concurrent req   50 concurrent req
CRW         6.6 MB   ~40 MB              ~120 MB
Firecrawl   500 MB   ~700 MB             ~2 GB+
Crawl4AI    300 MB   ~600 MB             ~1.5 GB+

CRW grows roughly linearly with load because each request is handled by a lightweight async task. Node.js and Python services have higher per-request overhead, and browser-based rendering adds significant memory spikes for JavaScript-heavy pages.
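That growth pattern can be approximated with a simple linear model. The per-request figure below is a rough estimate derived from the table above (6.6 MB idle to ~120 MB at 50 concurrent requests), not a published constant:

```python
# Naive linear memory model: total = idle + per_request * concurrency.
# Useful for capacity planning, but real usage will spike above this
# for JavaScript-heavy pages that trigger on-demand rendering.

def estimated_mb(idle_mb: float, per_req_mb: float, concurrency: int) -> float:
    return idle_mb + per_req_mb * concurrency

# Estimated from the table: (120 - 6.6) MB over 50 requests ~= 2.3 MB each.
crw_per_req = (120 - 6.6) / 50
print(round(estimated_mb(6.6, crw_per_req, 10)))  # ~29 MB, in the ballpark of the ~40 MB measured
```

A model like this gives a lower bound for sizing; measure under your own workload before trusting the numbers.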

The Sidecar Pattern

One of the most common self-hosting patterns is running a scraping service as a sidecar to your main application — on the same server, sharing resources. This is where memory efficiency matters most.

If your main application uses 1.5 GB of RAM on a 4 GB server, you have 2.5 GB available for the scraping sidecar. CRW fits comfortably and leaves headroom. Firecrawl's idle memory alone would take most of that budget.

The practical implication: CRW can run as a sidecar on virtually any application server. Firecrawl requires a dedicated instance or a significantly larger shared server.
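The sidecar budget check is simple arithmetic, sketched here with the numbers from the example above (a 4 GB server running a 1.5 GB application; the 0.5 GB OS reserve is an assumption, tune it for your host):

```python
# Will a scraping sidecar fit next to the main application?

def sidecar_fits(server_gb: float, app_gb: float, peak_sidecar_mb: float,
                 reserve_gb: float = 0.5) -> bool:
    """True if the sidecar's peak memory fits in what's left after the
    main app and an OS reserve are accounted for."""
    headroom_mb = (server_gb - app_gb - reserve_gb) * 1024
    return peak_sidecar_mb <= headroom_mb

print(sidecar_fits(4, 1.5, 120))    # CRW at ~50 concurrent requests -> True
print(sidecar_fits(4, 1.5, 2100))   # a "2 GB+" service under load -> False
```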

Memory and Deployment Flexibility

Low memory enables deployment patterns that high-memory services can't support:

Serverless-adjacent: Functions with 256 MB memory limits can't run Firecrawl. CRW's binary can be packaged for environments like Fly.io Machines or Railway with tiny memory allocations.

ARM instances: Cloud ARM instances (AWS Graviton, Ampere) offer the best price-performance ratio but often have smaller memory options. CRW runs efficiently on 512 MB ARM instances.

Edge environments: Running scraping closer to your users reduces latency. Edge compute typically limits memory to 128–512 MB per function. CRW's footprint fits; most alternatives don't.

When Memory Efficiency Is Less Important

Memory efficiency matters most when you're self-hosting at scale or in constrained environments. In these cases, it's less critical:

  • Low-frequency scraping: If you scrape 10 pages/day, hosting cost is trivial regardless of tool choice.
  • Managed cloud services: If you're using Firecrawl's hosted API, you don't pay for their infrastructure directly — you pay per request.
  • Feature requirements outweigh cost: If you need screenshot capture or document parsing, the memory premium for Firecrawl may be acceptable.

The Long-Term View

Infrastructure costs are often underestimated in early stages. A $42/month hosting premium seems trivial until your scraping workload scales. At 100 concurrent workers: $60/mo (CRW) vs $480/mo (Firecrawl). Over 3 years: $2,160 vs $17,280.

The memory difference isn't just about current costs — it's about the operational headroom you have to scale without re-architecting your infrastructure.

Getting Started

Self-host CRW on a $5/month server:

docker run -d --restart unless-stopped -p 3000:3000 ghcr.io/us/crw:latest

Or use fastCRW — the managed version — if you prefer not to manage infrastructure at all.

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.