What does AGPL-3.0 mean for a web scraper?

AGPL-3.0 is a copyleft license with a network clause (section 13): if you modify the scraper and let users interact with that modified version over a network, you must offer those users the modified source. Running it unmodified, or calling an unmodified AGPL engine over its API from your own app, triggers no obligation. The trade is that improvements to a publicly-offered service flow back to the commons, and the engine can never be quietly closed-sourced.

Which open-source web scrapers are AGPL-licensed?

fastCRW ships its engine under AGPL-3.0 as a single static Rust binary (github.com/us/crw), and Firecrawl's core is also AGPL-3.0 but cloud-first and heavy to self-run (~2-3 GB across 5 containers). Most other well-known open scrapers are permissive rather than copyleft: Scrapy is BSD, Crawl4AI is Apache-2.0, and Playwright/Puppeteer are Apache-2.0 browser-automation tools you build a scraper around.

Do I have to open-source my code if I self-host an AGPL scraper internally?

No. AGPL's obligation only fires when you modify the engine AND offer that modified version to users over a network. Running the unmodified engine inside your own VPC, called by your own pipeline with no external users hitting it directly, sits entirely outside the network clause. Code that merely calls the engine over its REST API is not a derivative work, so your application, prompts, and data stay private.

Is there a commercial license if AGPL doesn't fit my company?

Yes. The open-core model leaves an escape hatch: the managed cloud at fastcrw.com is the same engine offered under commercial terms, so AGPL-averse buyers can consume it as a hosted service without copyleft touching their stack. The license is a choice between self-hosting the AGPL binary or using the commercially-licensed managed service, both on the same Firecrawl-compatible API.

Open-Source AGPL-3.0 Web Scrapers: Options

By the fastCRW team · Footprint and pricing facts verified 2026-05-18 · fastCRW launch pricing reverts to regular price 2026-06-01 · Verify license terms independently before adopting.

Disclosure: We build fastCRW, which is AGPL-3.0. This is a vendor-authored roundup, so weight it accordingly — but the license analysis below is the same regardless of which engine you pick, and we point out where other tools genuinely fit better.

Open-source web scrapers under AGPL-3.0: why the license matters

If you are searching specifically for an open-source web scraper under AGPL-3.0, you are usually doing one of two things: confirming that a tool you like is actually copyleft (not "source-available" marketing), or checking whether AGPL's network clause creates an obligation your company can live with. Both questions matter more for scrapers than for most software, because a scraper is exactly the kind of thing teams expose over a network — internal APIs, RAG ingestion services, agent tool endpoints. AGPL was written for that case.

This guide covers what AGPL-3.0 triggers, how it compares to permissive licenses like MIT and Apache-2.0, which mature scraping engines actually ship under it, and what you really get when you self-host one. fastCRW is one of those engines — a single static Rust binary at github.com/us/crw — and we will be explicit about where it fits and where it does not.

Copyleft over the network: what triggers the obligation

The GPL family says: if you distribute modified software, you must offer the modified source under the same license. The gap GPL-2/GPL-3 left open is the "SaaS loophole" — if you run modified software on a server and only expose it over a network, you never distribute a binary, so the source-sharing obligation never fires. AGPL-3.0 closes that gap with section 13: if you modify the program and let users interact with it remotely over a network, you must offer those users the corresponding source of your modified version.

The two words that do all the work are modify and over a network. Run the engine unmodified, internally, behind your own pipeline, and AGPL asks essentially nothing of you. Patch the engine and expose that patched build to outside users, and you owe them your changes.

AGPL vs permissive (MIT/Apache) for self-hosters

MIT and Apache-2.0 are permissive: take the code, modify it, ship it closed, keep your changes private. Apache-2.0 adds an explicit patent grant that MIT lacks, which is why risk-averse companies often prefer it. AGPL is the opposite trade — it guarantees that improvements to a publicly-offered service flow back to the commons. For a self-hoster the practical difference is narrow: if you are not redistributing a modified network service, AGPL and MIT feel identical in day-to-day use. The difference only appears at the edges, which is exactly where procurement teams get nervous.

When AGPL is an asset vs a procurement objection

AGPL is an asset when you are the buyer: it is a hard guarantee that the engine can never be quietly closed-sourced out from under you, and that any hosted version is built on code you can also run yourself. AGPL is an objection when your legal team has a blanket "no copyleft" policy, or when you intend to build a closed commercial SaaS directly on top of the engine's code. Both reactions are legitimate; the rest of this post is about telling them apart for a scraping workload specifically. (For a deeper SaaS-focused treatment, see AGPL-3.0 for SaaS explained.)

AGPL-3.0 and open-core web scrapers worth knowing

"Open source" gets stretched a lot in the scraping market. Plenty of tools are cloud-only with a thin open SDK, or "source-available" under a non-OSI license that forbids competing services. Below are the engines that genuinely ship a usable scraping core under AGPL-3.0.

fastCRW: AGPL-3.0 single Rust binary, Firecrawl-compatible

fastCRW's engine is a single static Rust binary — no Redis, no Node.js, no container orchestration required — released under AGPL-3.0 at github.com/us/crw. It speaks a Firecrawl-compatible REST API (/v1/scrape, /v1/crawl, /v1/map, /v1/search), so adopting it is a base-URL swap rather than a rewrite, and leaving it later is the same swap in reverse. On Firecrawl's own public dataset it posted the highest truth-recall of the three tools tested — 63.74% of 819 labeled URLs, with 91.8% scrape-success of reachable URLs and 0 thrown errors (diagnose_3way.py, 2026-05-08). In fast mode, fastCRW's p90 is 4348 ms — the lowest of the three.

Firecrawl core (AGPL but cloud-first, heavy to self-run)

Firecrawl is the category reference and its core is also AGPL-3.0 — so on the license badge alone it qualifies for this list. The practical catch is operational weight: self-hosting Firecrawl is a multi-service stack (API, workers, queue, datastore, browser runtime) that runs to roughly 2-3 GB across 5 containers, and several capabilities are cloud-only. It is genuinely open, but it is open-source you operate like a platform, not open-source you drop on a box. See the Firecrawl open-source alternative breakdown for the full self-host comparison.

Other open scraping engines and their licenses

Most of the well-known open scrapers are permissive, not copyleft, so they will not show up in an AGPL search — but they are worth knowing if the license is what you care about:

Scrapy — BSD-licensed Python framework. Permissive, mature, but a library you assemble, not a turnkey API.
Crawl4AI — Apache-2.0. LLM-friendly extraction; in the same 3-way run it reached 59.95% truth-recall of 819 labeled URLs (diagnose_3way.py, 2026-05-08).
Playwright / Puppeteer — Apache-2.0 / Apache-2.0 browser automation, not scrapers per se; you build the scraper around them.

If you specifically need copyleft, the field narrows fast — which is part of why the AGPL search exists. For a broader, license-agnostic survey, see best open-source web crawlers and the open-source web scraping guide.

What you actually get when you self-host AGPL

Unlimited self-host requests, no license fee

Self-hosting an AGPL-3.0 engine is free — you pay only for the server it runs on. There is no per-request meter, no credit ceiling, and no tier you can outgrow. For fastCRW that works out to $0 per 1,000 scrapes self-hosted (you cover the VPS), versus Firecrawl-hosted's $0.83-5.33 per 1,000 scrapes across its tiers (competitor-prices.lock.md, verified 2026-05-18). The managed cloud exists for teams that would rather not run infra; the AGPL core means you are never forced into it.

Footprint matters: one binary vs a multi-container stack

License freedom is theoretical if the thing is miserable to operate. The headline structural fact here — labeled structural, not a benchmark — is footprint: fastCRW is a single ~8 MB binary in 1 container (plus an optional sidecar), against Firecrawl's ~2-3 GB across 5 containers. The default Docker Compose ships the lightweight lightpanda renderer; chrome is opt-in (a heavier ~500 MB image, ~1 GB resident) for JS-heavy targets. For self-hosters that gap is the difference between a platform-team project and one docker run.

Forking, patching, and running it forever

The real durability guarantee of AGPL is that the source is yours to keep. If the project's direction ever diverges from your needs, you fork it. If the company behind it gets acquired or changes its hosted pricing, your self-hosted build keeps running unchanged. Copyleft makes that permanent: the code can never be relicensed out of your reach, and any future hosted version is legally bound to the same open core.

AGPL obligations in practice

If you modify and offer it over a network, share the source

The one rule to internalize: modify the engine + expose your modified build to network users = offer those users your modified source. That is the whole obligation. It does not require you to open-source your application, your prompts, your data, or any code that merely calls the engine over its REST API. Calling an unmodified AGPL service from your closed app is not a derivative work and triggers nothing.

Why most internal pipelines are unaffected

The most common scraping deployment — run the unmodified engine inside your own VPC, called by your own pipeline, with no external users hitting the engine directly — sits entirely outside AGPL's network clause. You did not modify it and you are not offering it as a remote service to third parties. This is why the AGPL panic is usually misplaced for data and RAG teams: their architecture never crosses the trigger.

Commercial-license escape hatch for AGPL-averse buyers

If your legal team simply will not allow AGPL in the codebase — even via API — the open-core model leaves an exit: a commercial license. The managed cloud at fastcrw.com is the same engine offered under commercial terms, so AGPL-averse buyers can consume it as a hosted service without the copyleft obligation touching their stack. The point of open-core is that the license is a choice, not a trap.

Choosing an AGPL scraper for your stack

Self-host-only vs open-core + managed cloud

Some AGPL projects are self-host-only — great if you always want to run infra, limiting if you sometimes do not. Open-core engines (fastCRW, Firecrawl) give you both ends: run the AGPL binary yourself for sensitive or high-volume work, or hit the managed cloud for zero-ops scale, on the same API. For most teams the right answer is "self-host the engine for what has to stay private, use the cloud for burst" — and because it is one API, that is a routing decision, not two integrations.

API compatibility so you are never locked in

The license protects the code; API compatibility protects the integration. fastCRW's Firecrawl-compatible REST surface means your client code does not care which backend answers — self-hosted binary, fastCRW cloud, or Firecrawl cloud — so switching is a config value, not a migration. The honest caveat: response field names and error envelopes diverge slightly from Firecrawl, so validate the short known list before cutover (see Firecrawl API compatibility).

Where the heavier alternatives genuinely win

An AGPL single binary is not the answer for everything, and pretending otherwise would not help you. fastCRW has no built-in Fire-engine anti-bot and no managed residential proxy pool, so against hardened, hostile targets a proxy-first vendor like Bright Data or Oxylabs is the honest recommendation. It is stateless per request (you orchestrate scheduling and audit logging yourself), it has no screenshot output (a formats: ["screenshot"] request returns HTTP 422), and LLM extraction supports OpenAI and Anthropic providers only. If those gaps sit on your critical path, the right tool is a heavier one.

Sources

fastCRW canonical facts: github.com/us/crw (AGPL-3.0, single static Rust binary, structural footprint) · benchmark of record diagnose_3way.py, 819 labeled URLs, 2026-05-08
Cost comparison: competitor-prices.lock.md (Firecrawl $0.83-5.33 per 1,000 scrapes, verified 2026-05-18)
GNU Affero GPL v3.0: gnu.org/licenses/agpl-3.0.html (section 13, network use)
credit-based pricing: /pricing · benchmark detail: /benchmarks