Engineering

The Real Cost of Self-Hosting vs Cloud Scraping APIs

Self-hosted vs cloud scraping API costs — TCO breakdown with real calculations for VPS, engineering time, and CRW's lightweight edge.

[Fast]
C
R
W
March 30, 202612 min read

The Question Nobody Actually Calculates

Every team building a scraping pipeline asks the same question: should we self-host or use a cloud API? Most answer based on gut feeling — "cloud is easier" or "self-hosting is cheaper." Very few actually run the numbers. And the numbers are where the surprises live.

This post breaks down the real total cost of ownership (TCO) for web scraping at three scales: hobby/startup (1K–10K pages/month), growth (50K–200K pages/month), and scale (500K+ pages/month). We'll compare cloud scraping APIs (Firecrawl, ScrapingBee, Apify) against self-hosted options, and show where CRW's lightweight architecture changes the economics.

What Goes Into TCO

The sticker price of a cloud API or VPS is never the full cost. Real TCO includes:

Cloud API costs

  • Per-credit pricing: Most cloud scraping APIs charge per "credit" or "request." A credit typically corresponds to one page scrape, but JavaScript rendering, screenshots, or proxy usage can multiply the credit cost per request.
  • Overage charges: Exceeding your plan's credit limit often triggers higher per-credit rates.
  • Feature-gated pricing: Premium features (residential proxies, CAPTCHA solving, structured extraction) are often on higher-tier plans.

Self-hosting costs

  • Infrastructure: VPS or cloud VM costs (monthly recurring).
  • Engineering time: Initial setup, ongoing maintenance, updates, monitoring. This is the cost most teams underestimate.
  • Bandwidth: Egress fees on cloud providers, usually negligible for scraping but worth tracking.
  • Proxy costs: If you need proxy rotation for anti-bot bypass, this is a separate line item regardless of self-hosted or cloud.

Scale 1: Hobby / Startup (1K–10K Pages/Month)

At this scale, you're building an MVP, running a side project, or scraping for personal research. Volume is low and predictable.

Cloud API cost

ProviderPlanCredits/monthMonthly costCost per 10K pages
FirecrawlHobby500$19~$380
FirecrawlStandard50,000$149~$30
ScrapingBeeFreelance150,000$49~$3.30
ApifyStarterVariable (compute-based)$49~$10–40
fastCRWFree tier50$0Paid plans from $10

Note: Pricing accurate as of March 2026. Cloud APIs frequently change pricing — check current rates before making decisions.

Self-hosted CRW cost

ComponentMonthly cost
DigitalOcean Droplet (1 vCPU, 512 MB RAM)$4
CRW license$0 (AGPL-3.0)
Setup time (1 hour, one-time)~$0 amortized
Total$4/month

At hobby scale, the math is simple: $4/month for unlimited self-hosted scraping vs $19–149/month for cloud credits. CRW's 6.6 MB idle RAM means even the smallest VPS tier is overkill — a $4 droplet handles this volume without breaking a sweat.

When cloud wins at this scale

If you need residential proxies for anti-bot bypass, the cloud APIs bundle this into their pricing. Self-hosting proxies at this scale is more expensive than the cloud API itself. If your target sites are heavily protected, a cloud API with built-in proxy rotation is likely cheaper total.

Scale 2: Growth (50K–200K Pages/Month)

This is a production scraping pipeline: feeding an AI agent, building a search index, monitoring competitor prices, or populating a content database. Volume is meaningful and growing.

Cloud API cost

ProviderPlanCredits/monthMonthly costCost per 100K pages
FirecrawlStandard50,000$149~$298
FirecrawlGrowth500,000$999~$200
ScrapingBeeBusiness3,000,000$249~$8.30
ApifyScaleVariable$499~$50–100
fastCRWGrowth100,000$49~$49

Self-hosted CRW cost

ComponentMonthly cost
DigitalOcean Droplet (2 vCPU, 2 GB RAM)$18
CRW license$0 (AGPL-3.0)
Maintenance (~2 hours/month)~$100 (at $50/hr eng time)
Total$118/month

At growth scale, the savings from self-hosting become substantial — especially compared to premium cloud APIs. A $18/month VPS running CRW handles 200K pages/month comfortably. The same volume on Firecrawl's cloud would cost $149–999/month depending on the plan.

The engineering time question

The $100/month for maintenance is the controversial number. Some teams spend zero hours maintaining CRW — it runs unattended. Others spend time on monitoring, updates, and debugging edge cases. Your actual cost depends on your team's operational maturity and how mission-critical the scraping pipeline is.

If you're already running Docker containers in production and have basic monitoring in place, CRW adds negligible operational overhead. If you don't have any infrastructure experience, the cloud API's simplicity has real value.

The hidden cloud cost: credit math

Cloud scraping APIs often charge extra credits for specific features. A "1 credit per scrape" baseline can become 5–10 credits per scrape when you add JavaScript rendering, proxy rotation, or structured extraction. At 100K pages/month with 5× credit multipliers, you're actually consuming 500K credits — which pushes you into a higher pricing tier.

Self-hosted CRW doesn't have credit multipliers. JavaScript rendering via LightPanda, structured extraction, and crawling are all included at zero marginal cost. The $18/month VPS cost is the same whether you scrape with or without JavaScript rendering.

Scale 3: Enterprise (500K+ Pages/Month)

At this scale, you're running scraping as a core business function: large-scale price monitoring, comprehensive search indexing, or AI training data collection.

Cloud API cost

ProviderPlanMonthly costCost at 1M pages
FirecrawlGrowth (500K credits)$999~$1,998
FirecrawlEnterpriseCustomNegotiated
ScrapingBeeBusiness+$499+~$100–200
fastCRWScale$199~$399

Self-hosted CRW cost

ComponentMonthly cost
Dedicated server (4 vCPU, 8 GB RAM)$48
CRW license$0 (AGPL-3.0)
Maintenance (~4 hours/month)~$200
Monitoring (Grafana/Prometheus, shared)~$10
Total~$258/month

At enterprise scale, self-hosting CRW saves thousands per month compared to cloud APIs. A $48/month dedicated server handles 1M+ pages/month because CRW's resource consumption doesn't scale linearly with volume — the streaming parser is inherently efficient.

Why CRW changes the self-hosting math

Traditional self-hosted scraping (Selenium, Playwright, Firecrawl self-hosted) requires beefy infrastructure because browsers consume significant resources. Self-hosting Firecrawl at 1M pages/month needs a server with 8+ GB RAM for the Node.js + Playwright stack — pushing VPS costs to $96+/month before you add Redis and worker processes.

CRW's Rust + lol-html architecture means the server cost stays low even at high volume. The 8 MB Docker image and 6.6 MB idle RAM mean you're paying for compute capacity you actually use, not browser overhead. See our post on memory economics for a detailed analysis of why this matters.

The Self-Hosting Tipping Point

Based on the numbers above, the tipping point where self-hosting CRW becomes clearly cheaper than cloud APIs is around 10K–20K pages/month. Below that, cloud APIs (especially free tiers) are hard to beat on total convenience. Above that, the self-hosting savings compound monthly.

Monthly volumeCloud API (typical)Self-hosted CRWSavings
1K pages$0–19$4Marginal
10K pages$30–150$4$26–146/mo
50K pages$99–499$18$81–481/mo
200K pages$299–999$18$281–981/mo
1M pages$999–2,000+$48$951–1,952/mo

Self-hosted costs exclude engineering time. Add $50–200/month for maintenance depending on team and scale.

What Cloud APIs Give You That Self-Hosting Doesn't

The cost comparison isn't complete without acknowledging what you get from a cloud API beyond raw scraping:

Proxy infrastructure

Cloud scraping APIs bundle proxy rotation — datacenter and residential IPs — into their pricing. Self-hosting proxies is expensive: residential proxy services charge $5–15/GB, and maintaining your own proxy pool requires ongoing work. If your target sites need proxy rotation, factor this into the self-hosting cost.

Anti-bot bypass

Cloud APIs invest heavily in anti-bot technology: CAPTCHA solving, browser fingerprinting, IP rotation strategies. If you're scraping heavily protected sites (e-commerce, travel, social media), the cloud API's anti-bot capabilities may be worth the premium.

Zero operational overhead

No servers to manage, no Docker containers to monitor, no updates to apply. For teams without infrastructure expertise, this simplicity is genuine value — not just convenience.

Compliance and SLAs

Enterprise cloud APIs offer SLAs, compliance certifications, and dedicated support. For regulated industries or mission-critical pipelines, these guarantees have real value.

What Self-Hosting Gives You That Cloud Doesn't

No per-request costs

Once the server is running, every additional request costs approximately zero. There are no credits to track, no overage charges, no anxiety about hitting plan limits. This changes how you architect — you can scrape liberally without cost-per-page math slowing you down.

Data sovereignty

Scraped data never leaves your infrastructure. For teams handling sensitive content or operating under data residency requirements, self-hosting eliminates the compliance risk of sending URLs and receiving content through a third-party API.

Latency control

When CRW runs on the same network as your application, scraping latency is just network-to-target-site + parsing time. No additional round-trip to a cloud API. For real-time AI agent workflows, this difference matters.

Customization

Self-hosted CRW gives you full configuration control: API keys, rate limits, allowed domains, custom headers. You can tailor the scraping behavior to your exact use case without waiting for a cloud provider to add a feature.

The Middle Path: fastCRW

fastCRW is the managed cloud version of CRW. It offers a middle ground between full self-hosting and traditional cloud scraping APIs:

  • Same Firecrawl-compatible API as self-hosted CRW
  • Managed infrastructure with scaling handled for you
  • Lower cost per credit than most cloud APIs
  • Easy migration path: start with fastCRW, move to self-hosted when volume justifies it
  • Same code works on both — just change the URL

For teams that want cloud convenience with the option to self-host later, fastCRW provides a smooth on-ramp. Your integration code is identical whether you're hitting https://fastcrw.com/api or http://localhost:3000.

Decision Framework

Choose a cloud scraping API when:

  • You're scraping fewer than 10K pages/month and don't want to manage infrastructure
  • You need residential proxies and CAPTCHA solving bundled into one service
  • Your target sites are heavily protected and need sophisticated anti-bot bypass
  • Your team has no infrastructure/DevOps experience
  • You need enterprise SLAs and compliance certifications

Choose self-hosted CRW when:

  • You're scraping more than 10K pages/month and want predictable costs
  • You want unlimited scraping without per-request pricing
  • You need data sovereignty — scraped data stays on your infrastructure
  • You're running on constrained infrastructure (small VPS, edge deployments)
  • You need low-latency scraping for real-time AI agent workflows
  • Your target sites don't require advanced anti-bot bypass

Choose fastCRW when:

  • You want cloud convenience with CRW's performance and pricing
  • You want to start quickly and move to self-hosting later
  • You want a managed service without operating servers
  • You want the same API regardless of deployment model

The Bottom Line

The cost difference between self-hosted CRW and cloud scraping APIs is real and grows with volume. At 50K pages/month, self-hosting saves $80–480/month. At 1M pages/month, the savings are $950–1,950/month. Over a year, that's $11K–23K in infrastructure savings alone.

The reason CRW changes the self-hosting math is its resource footprint. Traditional self-hosted scrapers (Firecrawl, Selenium-based) need large servers because browsers are resource-heavy. CRW doesn't run a browser for most pages — its streaming parser handles HTML directly, keeping resource usage minimal and server costs low.

Self-hosting isn't free — there's engineering time and operational overhead. But for teams that already run Docker containers in production, the marginal effort of adding CRW is small, and the cost savings are significant.

Try CRW

Open-Source Path — Self-Host for Free

CRW is AGPL-3.0 licensed. Run it on your own infrastructure at zero software cost:

docker run -p 3000:3000 ghcr.io/us/crw:latest

View the source on GitHub · Read the docs

Hosted Path — Use fastCRW

Don't want to manage servers? fastCRW is the managed cloud version — same Firecrawl-compatible API, same low-latency engine, with infrastructure and scaling handled for you. Start with 500 free credits, no credit card required.

Further Reading

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.