By the fastCRW team · Last reviewed 2026-05-18
Disclosure: fastCRW is the open-core engine described here, built by the author. The framework for evaluating "openness" below is meant to apply to any tool, including ours — judge us by it too.
"Open source" is doing a lot of work in that search query
People searching for an "open-source Firecrawl-compatible scraper" usually want one of four very different things, and conflating them leads to bad choices:
- A license they can audit and trust (security/compliance review).
- A way to run it themselves for free (cost ceiling).
- Data that never leaves their infrastructure (privacy/residency).
- Freedom from a single vendor's roadmap and pricing (lock-in).
A tool can technically have an open-source repo and still fail three of these. So the useful question isn't "is there a repo on GitHub?" — it's "which of my four needs does this actually satisfy in practice?"
Open-core vs open-washing
Two patterns dominate this category, and they are not the same:
- Open-washing: a repo exists, but self-hosting it is a heavyweight multi-service stack that's impractical to actually run, and/or the self-host build is missing the features that make the cloud product good. The openness is technically true and practically hollow — it exists to claim the keyword, not to be run.
- Open-core (done honestly): the actual engine that does the work is open-source and genuinely runnable by one developer, with the same capabilities the managed product is built on. The cloud is a convenience layer (proxies, scale, billing), not a different, better-secret product.
The litmus test: "Can one engineer run the real engine, with its real capabilities, on a small box, this afternoon, for free?" If the honest answer is "only a crippled subset, and only with a platform team," it's open-washing for your purposes even if the license is OSI-approved.
How Firecrawl's openness scores on the four needs
Firecrawl publishes an open-source repository (AGPL-3.0). Honestly scored against the four needs:
| Need | Firecrawl self-host |
|---|---|
| Auditable license | Yes — AGPL-3.0 core exists |
| Practically self-runnable for free | Heavy: multi-service stack (API + workers + queue + store + browser), multiple GB, ongoing maintenance |
| Data stays on your infra | Possible if you operate the full stack; not the default path |
| Free of vendor roadmap/pricing | Partial: self-host excludes some cloud-only features, so most users stay on metered cloud |
So Firecrawl is genuinely open by license but, for the "I want to actually run it myself, lightly, with full capability" interpretation, it's the heavy path. That's the gap an open-core alternative targets.
The single-binary open-core alternative
fastCRW is built specifically to satisfy all four needs with the same engine:
- License: AGPL-3.0 core. Auditable; commercial license available for orgs that need it.
- Practically self-runnable: a single statically-linked Rust binary — roughly a 6MB binary, ~8MB Docker image, tiny idle RAM.
docker run -p 3000:3000 ghcr.io/us/crw:lateston a $5 VPS, a CI runner, or a laptop offline. One process, not a stack. - Data stays local: in self-host mode scraped content and target URLs never touch a third party. Works air-gapped, no key required for local dev.
- No vendor capture: the same Firecrawl-compatible API (
/v1/scrape,/v1/crawl,/v1/map,/v1/search) runs on the self-hosted binary and the optional Managed Cloud — the cloud is convenience, not a hostage situation. Same engine both sides.
That last point is the structural one: because the open-core engine and the managed cloud are the same Firecrawl-compatible engine, you can move between "free, self-hosted, on my infra" and "managed, someone else runs the proxies" with a base-URL change. The escape hatch is permanent, not a downgrade.
Migration is a base-URL change, not a port
Critically for anyone leaving Firecrawl: the official Firecrawl SDK works with fastCRW by changing the api_url. So "switch to the open-source alternative" is not a rewrite — it's redirecting your existing client at a binary you now control:
from firecrawl import FirecrawlApp
app = FirecrawlApp(
api_key="local-or-cloud-key",
api_url="http://localhost:3000", # the open-core engine you run
)
doc = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
Honest limitations of the open-core route
Openness has trade-offs; pretending otherwise is its own kind of open-washing:
- You own ops. Self-host means you run, monitor, and upgrade it — even if it's just one container, it's still yours.
- No giant managed proxy pool in self-host. For hostile anti-bot targets needing a huge residential-IP rotation, a proxy-giant or the managed cloud is the answer.
- AGPL-3.0 friction. AGPL scares some enterprises; the commercial license mitigates it but it's a real procurement conversation.
- Smaller ecosystem than the category reference. Firecrawl has more tutorials and community examples; an open-core challenger wins on substitution and economics, not on mindshare.
An evaluation checklist for "open-source Firecrawl alternative"
- Run the real engine in 10 minutes. If you can't get the actual scraping engine working on one box quickly and free, the openness is hollow for your needs.
- Check feature parity see pricing. If self-host is a crippled subset, you're not actually escaping the meter.
- Confirm the data path. In self-host mode, does anything phone home? It shouldn't.
- Verify the API is compatible. Can your existing Firecrawl client point at it with only a base-URL change? That determines migration cost.
- Read the license for your context. AGPL-3.0 obligations matter for some orgs; know them before adopting.
Bottom line
"Open-source Firecrawl alternative" only means something if the open part is the part that does the work and you can actually run it. Judge candidates by the four needs and the 10-minute test, not by the presence of a repo. An open-core single-binary engine that exposes the same Firecrawl-compatible API is the configuration that satisfies all four — auditable, self-runnable for free, data-local, and lock-in-proof — while keeping migration to a one-line change.
The four needs, scored side by side
To make the abstract concrete, here is how the two honest configurations score against the four reasons people search for an open-source alternative:
| Need | Firecrawl OSS (heavy stack) | Open-core single binary |
|---|---|---|
| Auditable license | Yes (AGPL-3.0) | Yes (AGPL-3.0) |
| Practically self-runnable, free, full capability | Partial — multi-service, multi-GB, some cloud-only features excluded | Yes — one container, $5 VPS, same engine as cloud |
| Data never leaves your infra | Only if you operate the whole stack | Yes — default in self-host mode, works air-gapped |
| Free of vendor roadmap/pricing | Partial — most users stay on metered cloud | Yes — escape hatch is one config line, permanent |
The license row is a tie; the other three are where "open source" either delivers on its promise for your needs or quietly does not. This table, not a GitHub star count, is the comparison that matters.
AGPL-3.0: what it actually obliges you to do
Because both honest options here are AGPL-3.0, it is worth being precise rather than fearful. AGPL-3.0's distinctive clause extends copyleft to network use: if you modify the engine and offer it to users over a network, you must make the modified source available to those users. The practical reality for most teams:
- Internal use, unmodified: running the stock engine for your own pipelines triggers no source-disclosure obligation. This is the overwhelmingly common case.
- Internal use, modified, not offered to third parties: still effectively unproblematic for most internal deployments.
- Offering a modified version as a public network service: the clause engages — you would publish your modifications. If that is incompatible with your product, a commercial license exists precisely to remove the obligation.
The point: AGPL friction is real for one specific pattern (reselling a modified version as a service) and largely a non-issue for the "we self-host it for our own use" pattern that motivates most open-source-alternative searches. Decide based on your actual deployment, not the license's reputation.
A migration sketch that proves the openness is real
The simplest way to verify an "open-source alternative" is not hollow is to exercise it end to end in minutes:
# 1. run the real engine, no signup, no card
docker run -p 3000:3000 ghcr.io/us/crw:latest
# 2. point existing code at it (the entire migration)
export SCRAPE_API_URL=http://localhost:3000
# 3. run your real scrape/crawl/map calls; diff vs the reference
# on 20 representative URLs. Note any divergence.
If that works on your real URLs with a short, boring divergence list, the openness is genuine: you ran the actual engine, with its real capabilities, on your own box, for free, and your existing client did not change. That is the difference between open-core and open-washing made empirical — and it is the only test whose result you should trust over any marketing claim, including this post's.
Sources
- fastCRW repo (AGPL-3.0 core): github.com/us/crw
- Firecrawl open-source repository / docs: docs.firecrawl.dev
Related: Self-hosting Firecrawl guide · Best open-source web crawlers
