Engineering

Why Low Memory Usage Matters in Self-Hosted Scraping

How idle RAM affects your hosting costs and concurrent throughput — and why CRW's 6.6 MB footprint changes the economics.

March 12, 2026 · 7 min read

Why Memory Is the Most Underrated Metric in Scraping

When developers compare scraping tools, they focus on features: does it support JavaScript? Does it have an SDK? Does it extract structured data? These matter. But for teams self-hosting scraping infrastructure, memory usage is often the most important practical metric — because it directly determines how much your infrastructure costs.

The Economics of Idle RAM

Every service you self-host has a fixed memory floor — the baseline RAM consumed before it processes a single request. This "idle memory" is the price you pay just to have the service running.
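You can measure this memory floor yourself. A minimal sketch for Linux hosts, reading the resident set size (VmRSS) from /proc — the PID you pass would be that of whatever service you're checking:

```python
import os

def rss_mb(pid: int) -> float:
    """Return resident memory of `pid` in MB (Linux only: reads /proc)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                kb = int(line.split()[1])  # VmRSS is reported in kB
                return kb / 1024
    raise RuntimeError(f"no VmRSS entry for pid {pid}")

# Probe the current process as a demonstration; in practice you'd pass
# the PID of the idle scraping service you want to audit.
print(f"this process idles at ~{rss_mb(os.getpid()):.1f} MB")
```

Run it against a freshly started service, before sending any traffic, to see the idle floor you'll be paying for.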

Consider a team running 10 concurrent scraping workers:

Tool        Idle RAM per instance   10 instances   Min server size   Monthly cost (DigitalOcean)
CRW         6.6 MB                  ~66 MB         1 GB ($6/mo)      $6
Spider      ~20 MB                  ~200 MB        1 GB ($6/mo)      $6
Crawl4AI    300 MB                  3 GB           4 GB ($24/mo)     $24
Firecrawl   500 MB                  5 GB           8 GB ($48/mo)     $48

The difference between CRW and Firecrawl for 10 workers: $6/mo vs $48/mo. Over a year: $72 vs $576. For a team running 50 workers: $30/mo vs $240/mo. These numbers compound significantly at scale.
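The sizing logic behind those numbers can be sketched as a back-of-envelope calculation. The (RAM, price) tiers below mirror the table above and are illustrative, not quoted from DigitalOcean's current price list:

```python
# Smallest server tier that fits N idle workers, with some headroom
# left for the OS and request spikes. Tiers are (RAM in GB, $/month).
TIERS = [(1, 6), (2, 12), (4, 24), (8, 48), (16, 96), (32, 240)]

def monthly_cost(idle_mb_per_worker: float, workers: int, headroom: float = 0.8) -> int:
    """Price of the smallest tier whose usable RAM covers total idle usage."""
    need_gb = idle_mb_per_worker * workers / 1024
    for ram_gb, price in TIERS:
        if need_gb <= ram_gb * headroom:
            return price
    raise ValueError("workload exceeds largest tier")

print(monthly_cost(6.6, 10))   # CRW: ~66 MB idle -> 1 GB tier -> 6
print(monthly_cost(500, 10))   # Firecrawl: ~5 GB idle -> 8 GB tier -> 48
```

The same function makes it easy to project costs for 50 or 100 workers before committing to a tool.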

Why Does Firecrawl Use 500 MB at Idle?

Firecrawl's stack includes Node.js (V8 heap ~50 MB baseline), Playwright (~100 MB), and a Chromium browser instance (~300 MB). Chromium is loaded at startup to avoid per-request browser cold starts. This is a reasonable engineering tradeoff for a service that needs to render JavaScript on every request — but it means you're paying for a full browser runtime even when you're not using it.

Why Does CRW Use 6.6 MB at Idle?

CRW is a Rust binary with no garbage-collected runtime, no V8 heap, and no pre-loaded browser. It uses:

  • A Tokio async runtime (~2 MB)
  • An Axum HTTP server (~1 MB)
  • Connection pools for outbound requests (~2 MB)
  • Miscellaneous process overhead (~1 MB)

When JavaScript rendering is needed, CRW spawns LightPanda on-demand and releases it after the request. Memory scales with actual load, not with the number of idle workers.

Memory Scaling Under Load

Idle memory is only part of the story. How memory grows under concurrent load is equally important:

Tool        Idle     10 concurrent req   50 concurrent req
CRW         6.6 MB   ~40 MB              ~120 MB
Firecrawl   500 MB   ~700 MB             ~2 GB+
Crawl4AI    300 MB   ~600 MB             ~1.5 GB+

CRW grows roughly linearly with load because each request is handled by a lightweight async task. Node.js and Python services have higher per-request overhead, and browser-based rendering adds significant memory spikes for JavaScript-heavy pages.
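That growth pattern can be approximated with a simple linear model. The per-request figure below is a rough estimate derived from the table above (6.6 MB idle to ~120 MB at 50 concurrent requests), not a published constant:

```python
# Naive linear memory model: total = idle + per_request * concurrency.
# Useful for capacity planning, but real usage will spike above this
# for JavaScript-heavy pages that trigger on-demand rendering.

def estimated_mb(idle_mb: float, per_req_mb: float, concurrency: int) -> float:
    return idle_mb + per_req_mb * concurrency

# Estimated from the table: (120 - 6.6) MB over 50 requests ~= 2.3 MB each.
crw_per_req = (120 - 6.6) / 50
print(round(estimated_mb(6.6, crw_per_req, 10)))  # ~29 MB, in the ballpark of the ~40 MB measured
```

A model like this gives a lower bound for sizing; measure under your own workload before trusting the numbers.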

The Sidecar Pattern

One of the most common self-hosting patterns is running a scraping service as a sidecar to your main application — on the same server, sharing resources. This is where memory efficiency matters most.

If your main application uses 1.5 GB of RAM on a 4 GB server, you have 2.5 GB available for the scraping sidecar. CRW fits comfortably and leaves headroom. Firecrawl's idle memory alone would take most of that budget.

The practical implication: CRW can run as a sidecar on virtually any application server. Firecrawl requires a dedicated instance or a significantly larger shared server.
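The sidecar budget check is simple arithmetic, sketched here with the numbers from the example above (a 4 GB server running a 1.5 GB application; the 0.5 GB OS reserve is an assumption, tune it for your host):

```python
# Will a scraping sidecar fit next to the main application?

def sidecar_fits(server_gb: float, app_gb: float, peak_sidecar_mb: float,
                 reserve_gb: float = 0.5) -> bool:
    """True if the sidecar's peak memory fits in what's left after the
    main app and an OS reserve are accounted for."""
    headroom_mb = (server_gb - app_gb - reserve_gb) * 1024
    return peak_sidecar_mb <= headroom_mb

print(sidecar_fits(4, 1.5, 120))    # CRW at ~50 concurrent requests -> True
print(sidecar_fits(4, 1.5, 2100))   # a "2 GB+" service under load -> False
```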

Memory and Deployment Flexibility

Low memory enables deployment patterns that high-memory services can't support:

Serverless-adjacent: Functions with 256 MB memory limits can't run Firecrawl. CRW's binary can be packaged for environments like Fly.io Machines or Railway with tiny memory allocations.

ARM instances: Cloud ARM instances (AWS Graviton, Ampere) offer the best price-performance ratio but often have smaller memory options. CRW runs efficiently on 512 MB ARM instances.

Edge environments: Running scraping closer to your users reduces latency. Edge compute typically limits memory to 128–512 MB per function. CRW's footprint fits; most alternatives don't.

When Memory Efficiency Is Less Important

Memory efficiency matters most when you're self-hosting at scale or in constrained environments. In these cases, it's less critical:

  • Low-frequency scraping: If you scrape 10 pages/day, hosting cost is trivial regardless of tool choice.
  • Managed cloud services: If you're using Firecrawl's hosted API, you don't pay for their infrastructure directly — you pay per request.
  • Feature requirements outweigh cost: If you need screenshot capture or document parsing, the memory premium for Firecrawl may be acceptable.

The Long-Term View

Infrastructure costs are often underestimated in early stages. A $42/month hosting premium seems trivial until your scraping workload scales. At 100 concurrent workers: $60/mo (CRW) vs $480/mo (Firecrawl). Over 3 years: $2,160 vs $17,280.

The memory difference isn't just about current costs — it's about the operational headroom you have to scale without re-architecting your infrastructure.

Getting Started

Self-host CRW on a $5/month server:

docker run -d --restart unless-stopped -p 3000:3000 ghcr.io/us/crw:latest

Or use fastCRW — the managed version — if you prefer not to manage infrastructure at all.

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.