The Question Nobody Actually Calculates
Every team building a scraping pipeline asks the same question: should we self-host or use a cloud API? Most answer based on gut feeling — "cloud is easier" or "self-hosting is cheaper." Very few actually run the numbers. And the numbers are where the surprises live.
This post breaks down the real total cost of ownership (TCO) for web scraping at three scales: hobby/startup (1K–10K pages/month), growth (50K–200K pages/month), and scale (500K+ pages/month). We'll compare cloud scraping APIs (Firecrawl, ScrapingBee, Apify) against self-hosted options, and show where CRW's lightweight architecture changes the economics.
What Goes Into TCO
The sticker price of a cloud API or VPS is never the full cost. Real TCO includes:
Cloud API costs
- Per-credit pricing: Most cloud scraping APIs charge per "credit" or "request." A credit typically corresponds to one page scrape, but JavaScript rendering, screenshots, or proxy usage can multiply the credit cost per request.
- Overage charges: Exceeding your plan's credit limit often triggers higher per-credit rates.
- Feature-gated pricing: Premium features (residential proxies, CAPTCHA solving, structured extraction) are often on higher-tier plans.
Self-hosting costs
- Infrastructure: VPS or cloud VM costs (monthly recurring).
- Engineering time: Initial setup, ongoing maintenance, updates, monitoring. This is the cost most teams underestimate.
- Bandwidth: Egress fees on cloud providers, usually negligible for scraping but worth tracking.
- Proxy costs: If you need proxy rotation for anti-bot bypass, this is a separate line item regardless of self-hosted or cloud.
Scale 1: Hobby / Startup (1K–10K Pages/Month)
At this scale, you're building an MVP, running a side project, or scraping for personal research. Volume is low and predictable.
Cloud API cost
| Provider | Plan | Credits/month | Monthly cost | Cost per 10K pages |
|---|---|---|---|---|
| Firecrawl | Hobby | 500 | $19 | ~$380 |
| Firecrawl | Standard | 50,000 | $149 | ~$30 |
| ScrapingBee | Freelance | 150,000 | $49 | ~$3.30 |
| Apify | Starter | Variable (compute-based) | $49 | ~$10–40 |
| fastCRW | Free tier | 50 | $0 | Paid plans from $10 |
Note: Pricing accurate as of March 2026. Cloud APIs frequently change pricing — check current rates before making decisions.
Self-hosted CRW cost
| Component | Monthly cost |
|---|---|
| DigitalOcean Droplet (1 vCPU, 512 MB RAM) | $4 |
| CRW license | $0 (AGPL-3.0) |
| Setup time (1 hour, one-time) | ~$0 amortized |
| Total | $4/month |
At hobby scale, the math is simple: $4/month for unlimited self-hosted scraping vs $19–149/month for cloud credits. CRW's 6.6 MB idle RAM means even the smallest VPS tier is overkill — a $4 droplet handles this volume without breaking a sweat.
When cloud wins at this scale
If you need residential proxies for anti-bot bypass, the cloud APIs bundle this into their pricing. Self-hosting proxies at this scale is more expensive than the cloud API itself. If your target sites are heavily protected, a cloud API with built-in proxy rotation is likely cheaper total.
Scale 2: Growth (50K–200K Pages/Month)
This is a production scraping pipeline: feeding an AI agent, building a search index, monitoring competitor prices, or populating a content database. Volume is meaningful and growing.
Cloud API cost
| Provider | Plan | Credits/month | Monthly cost | Cost per 100K pages |
|---|---|---|---|---|
| Firecrawl | Standard | 50,000 | $149 | ~$298 |
| Firecrawl | Growth | 500,000 | $999 | ~$200 |
| ScrapingBee | Business | 3,000,000 | $249 | ~$8.30 |
| Apify | Scale | Variable | $499 | ~$50–100 |
| fastCRW | Growth | 100,000 | $49 | ~$49 |
Self-hosted CRW cost
| Component | Monthly cost |
|---|---|
| DigitalOcean Droplet (2 vCPU, 2 GB RAM) | $18 |
| CRW license | $0 (AGPL-3.0) |
| Maintenance (~2 hours/month) | ~$100 (at $50/hr eng time) |
| Total | $118/month |
At growth scale, the savings from self-hosting become substantial — especially compared to premium cloud APIs. A $18/month VPS running CRW handles 200K pages/month comfortably. The same volume on Firecrawl's cloud would cost $149–999/month depending on the plan.
The engineering time question
The $100/month for maintenance is the controversial number. Some teams spend zero hours maintaining CRW — it runs unattended. Others spend time on monitoring, updates, and debugging edge cases. Your actual cost depends on your team's operational maturity and how mission-critical the scraping pipeline is.
If you're already running Docker containers in production and have basic monitoring in place, CRW adds negligible operational overhead. If you don't have any infrastructure experience, the cloud API's simplicity has real value.
The hidden cloud cost: credit math
Cloud scraping APIs often charge extra credits for specific features. A "1 credit per scrape" baseline can become 5–10 credits per scrape when you add JavaScript rendering, proxy rotation, or structured extraction. At 100K pages/month with 5× credit multipliers, you're actually consuming 500K credits — which pushes you into a higher pricing tier.
Self-hosted CRW doesn't have credit multipliers. JavaScript rendering via LightPanda, structured extraction, and crawling are all included at zero marginal cost. The $18/month VPS cost is the same whether you scrape with or without JavaScript rendering.
Scale 3: Enterprise (500K+ Pages/Month)
At this scale, you're running scraping as a core business function: large-scale price monitoring, comprehensive search indexing, or AI training data collection.
Cloud API cost
| Provider | Plan | Monthly cost | Cost at 1M pages |
|---|---|---|---|
| Firecrawl | Growth (500K credits) | $999 | ~$1,998 |
| Firecrawl | Enterprise | Custom | Negotiated |
| ScrapingBee | Business+ | $499+ | ~$100–200 |
| fastCRW | Scale | $199 | ~$399 |
Self-hosted CRW cost
| Component | Monthly cost |
|---|---|
| Dedicated server (4 vCPU, 8 GB RAM) | $48 |
| CRW license | $0 (AGPL-3.0) |
| Maintenance (~4 hours/month) | ~$200 |
| Monitoring (Grafana/Prometheus, shared) | ~$10 |
| Total | ~$258/month |
At enterprise scale, self-hosting CRW saves thousands per month compared to cloud APIs. A $48/month dedicated server handles 1M+ pages/month because CRW's resource consumption doesn't scale linearly with volume — the streaming parser is inherently efficient.
Why CRW changes the self-hosting math
Traditional self-hosted scraping (Selenium, Playwright, Firecrawl self-hosted) requires beefy infrastructure because browsers consume significant resources. Self-hosting Firecrawl at 1M pages/month needs a server with 8+ GB RAM for the Node.js + Playwright stack — pushing VPS costs to $96+/month before you add Redis and worker processes.
CRW's Rust + lol-html architecture means the server cost stays low even at high volume. The 8 MB Docker image and 6.6 MB idle RAM mean you're paying for compute capacity you actually use, not browser overhead. See our post on memory economics for a detailed analysis of why this matters.
The Self-Hosting Tipping Point
Based on the numbers above, the tipping point where self-hosting CRW becomes clearly cheaper than cloud APIs is around 10K–20K pages/month. Below that, cloud APIs (especially free tiers) are hard to beat on total convenience. Above that, the self-hosting savings compound monthly.
| Monthly volume | Cloud API (typical) | Self-hosted CRW | Savings |
|---|---|---|---|
| 1K pages | $0–19 | $4 | Marginal |
| 10K pages | $30–150 | $4 | $26–146/mo |
| 50K pages | $99–499 | $18 | $81–481/mo |
| 200K pages | $299–999 | $18 | $281–981/mo |
| 1M pages | $999–2,000+ | $48 | $951–1,952/mo |
Self-hosted costs exclude engineering time. Add $50–200/month for maintenance depending on team and scale.
What Cloud APIs Give You That Self-Hosting Doesn't
The cost comparison isn't complete without acknowledging what you get from a cloud API beyond raw scraping:
Proxy infrastructure
Cloud scraping APIs bundle proxy rotation — datacenter and residential IPs — into their pricing. Self-hosting proxies is expensive: residential proxy services charge $5–15/GB, and maintaining your own proxy pool requires ongoing work. If your target sites need proxy rotation, factor this into the self-hosting cost.
Anti-bot bypass
Cloud APIs invest heavily in anti-bot technology: CAPTCHA solving, browser fingerprinting, IP rotation strategies. If you're scraping heavily protected sites (e-commerce, travel, social media), the cloud API's anti-bot capabilities may be worth the premium.
Zero operational overhead
No servers to manage, no Docker containers to monitor, no updates to apply. For teams without infrastructure expertise, this simplicity is genuine value — not just convenience.
Compliance and SLAs
Enterprise cloud APIs offer SLAs, compliance certifications, and dedicated support. For regulated industries or mission-critical pipelines, these guarantees have real value.
What Self-Hosting Gives You That Cloud Doesn't
No per-request costs
Once the server is running, every additional request costs approximately zero. There are no credits to track, no overage charges, no anxiety about hitting plan limits. This changes how you architect — you can scrape liberally without cost-per-page math slowing you down.
Data sovereignty
Scraped data never leaves your infrastructure. For teams handling sensitive content or operating under data residency requirements, self-hosting eliminates the compliance risk of sending URLs and receiving content through a third-party API.
Latency control
When CRW runs on the same network as your application, scraping latency is just network-to-target-site + parsing time. No additional round-trip to a cloud API. For real-time AI agent workflows, this difference matters.
Customization
Self-hosted CRW gives you full configuration control: API keys, rate limits, allowed domains, custom headers. You can tailor the scraping behavior to your exact use case without waiting for a cloud provider to add a feature.
The Middle Path: fastCRW
fastCRW is the managed cloud version of CRW. It offers a middle ground between full self-hosting and traditional cloud scraping APIs:
- Same Firecrawl-compatible API as self-hosted CRW
- Managed infrastructure with scaling handled for you
- Lower cost per credit than most cloud APIs
- Easy migration path: start with fastCRW, move to self-hosted when volume justifies it
- Same code works on both — just change the URL
For teams that want cloud convenience with the option to self-host later, fastCRW provides a smooth on-ramp. Your integration code is identical whether you're hitting https://fastcrw.com/api or http://localhost:3000.
Decision Framework
Choose a cloud scraping API when:
- You're scraping fewer than 10K pages/month and don't want to manage infrastructure
- You need residential proxies and CAPTCHA solving bundled into one service
- Your target sites are heavily protected and need sophisticated anti-bot bypass
- Your team has no infrastructure/DevOps experience
- You need enterprise SLAs and compliance certifications
Choose self-hosted CRW when:
- You're scraping more than 10K pages/month and want predictable costs
- You want unlimited scraping without per-request pricing
- You need data sovereignty — scraped data stays on your infrastructure
- You're running on constrained infrastructure (small VPS, edge deployments)
- You need low-latency scraping for real-time AI agent workflows
- Your target sites don't require advanced anti-bot bypass
Choose fastCRW when:
- You want cloud convenience with CRW's performance and pricing
- You want to start quickly and move to self-hosting later
- You want a managed service without operating servers
- You want the same API regardless of deployment model
The Bottom Line
The cost difference between self-hosted CRW and cloud scraping APIs is real and grows with volume. At 50K pages/month, self-hosting saves $80–480/month. At 1M pages/month, the savings are $950–1,950/month. Over a year, that's $11K–23K in infrastructure savings alone.
The reason CRW changes the self-hosting math is its resource footprint. Traditional self-hosted scrapers (Firecrawl, Selenium-based) need large servers because browsers are resource-heavy. CRW doesn't run a browser for most pages — its streaming parser handles HTML directly, keeping resource usage minimal and server costs low.
Self-hosting isn't free — there's engineering time and operational overhead. But for teams that already run Docker containers in production, the marginal effort of adding CRW is small, and the cost savings are significant.
Try CRW
Open-Source Path — Self-Host for Free
CRW is AGPL-3.0 licensed. Run it on your own infrastructure at zero software cost:
docker run -p 3000:3000 ghcr.io/us/crw:latest
View the source on GitHub · Read the docs
Hosted Path — Use fastCRW
Don't want to manage servers? fastCRW is the managed cloud version — same Firecrawl-compatible API, same low-latency engine, with infrastructure and scaling handled for you. Start with 500 free credits, no credit card required.