By the fastCRW team · Last reviewed 2026-05-18
Disclosure: fastCRW is a Firecrawl-compatible engine built by the author. This tutorial works against any Firecrawl-compatible backend; we use fastCRW in the examples.
The one fact this whole post turns on
The official Firecrawl SDKs take a configurable base URL — api_url in Python, apiUrl in Node. They don't hard-require https://api.firecrawl.dev. That single parameter is what makes "switch scraping backends" a config change instead of a rewrite. This post is the careful, production-grade way to use it.
Python: the minimal swap
```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(
    api_key="your-key",
    api_url="https://your-fastcrw-host",  # the whole migration
)
doc = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(doc["markdown"])  # scrape_url returns the page object (the API response's "data")
```
Same class, same method, same return shape. Point it at a self-hosted single-binary instance (http://localhost:3000) or a managed Firecrawl-compatible cloud — the calling code does not change.
Node / TypeScript
```typescript
import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({
  apiKey: process.env.SCRAPE_API_KEY!,
  apiUrl: process.env.SCRAPE_API_URL ?? "https://api.firecrawl.dev",
});
const doc = await app.scrapeUrl("https://example.com", { formats: ["markdown"] });
```
Note the pattern: read both the key and the URL from environment. Never hardcode the base URL in application code — that's the difference between "swap is an env change" and "swap is a deploy."
Framework loaders count too
You don't have to use the raw SDK for this to work. The common framework integrations also accept a base URL:
```python
# LangChain
from langchain_community.document_loaders import FirecrawlLoader

loader = FirecrawlLoader(
    url="https://example.com",
    api_key="your-key",
    api_url="https://your-fastcrw-host",  # same idea
    mode="scrape",
)
docs = loader.load()
```
LlamaIndex's Firecrawl reader follows the same convention. If your use case ingests via a framework loader, the backend swap is still one parameter; you do not have to drop down to the SDK.
The config pattern that makes this safe
Treat the scraping backend as injected configuration with three variables:
```bash
# .env
SCRAPE_API_URL=https://api.firecrawl.dev
SCRAPE_API_KEY=fc-...
SCRAPE_BACKEND=firecrawl  # label for logs/metrics only
```
Then a single factory builds the client:
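```python
import os

from firecrawl import FirecrawlApp


def make_client() -> FirecrawlApp:
    # Both values come from the environment; nothing vendor-specific is hardcoded.
    return FirecrawlApp(
        api_key=os.environ["SCRAPE_API_KEY"],
        api_url=os.environ["SCRAPE_API_URL"],
    )
```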
Now switching backends — to a self-hosted binary, to a managed cloud, back to Firecrawl — is an env change plus a restart. Rollback is the same. Nothing in your business logic knows or cares which backend answered.
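The factory above fails with a bare KeyError when a variable is missing and passes the URL through untouched. A slightly stricter variant is sketched below, assuming the three variables from the .env example: it validates configuration once at startup and normalizes the base URL to a host root, on the assumption that the SDK appends /v1/... itself. ScrapeConfig, normalize_api_url, and make_client_from_config are hypothetical names for this post, not part of any SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_api_url(raw: str) -> str:
    """Reduce a base URL to the host root (hypothetical helper).

    Assumes the SDK appends /v1/... itself, so a trailing slash or a
    trailing /v1 segment would yield a doubled or malformed path.
    Query strings and fragments are dropped.
    """
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) base URL: {raw!r}")
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


@dataclass(frozen=True)
class ScrapeConfig:
    api_url: str
    api_key: str
    backend: str  # label for logs/metrics only, never used for branching

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ScrapeConfig":
        env = os.environ if env is None else env
        missing = [k for k in ("SCRAPE_API_URL", "SCRAPE_API_KEY") if not env.get(k)]
        if missing:
            # Fail at startup rather than on the first scrape in production.
            raise RuntimeError("missing scraping config: " + ", ".join(missing))
        return cls(
            api_url=normalize_api_url(env["SCRAPE_API_URL"]),
            api_key=env["SCRAPE_API_KEY"],
            backend=env.get("SCRAPE_BACKEND", "unknown"),
        )


def make_client_from_config(cfg: Optional[ScrapeConfig] = None):
    # Lazy import so the config logic above can be unit-tested without the SDK.
    from firecrawl import FirecrawlApp

    cfg = cfg or ScrapeConfig.from_env()
    return FirecrawlApp(api_key=cfg.api_key, api_url=cfg.api_url)
```

Passing an explicit mapping to from_env keeps the config logic testable without touching the real process environment.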
A parity test harness you should actually run
Before trusting a swapped backend in production, run real URLs through both and diff. This harness is ~30 lines and pays for itself:
```python
import json

from firecrawl import FirecrawlApp

A = FirecrawlApp(api_key="fc-...", api_url="https://api.firecrawl.dev")
B = FirecrawlApp(api_key="key", api_url="https://your-fastcrw-host")
URLS = [ ... ]  # 50-100 of YOUR representative URLs, not example.com


def norm(md: str) -> str:
    return " ".join((md or "").split())  # ignore whitespace-only diffs


def markdown_of(app: FirecrawlApp, url: str) -> str:
    # scrape_url returns the page object, so markdown is a top-level key
    page = app.scrape_url(url, params={"formats": ["markdown"]})
    return page.get("markdown") or ""


mismatches = []
for u in URLS:
    try:
        a, b = markdown_of(A, u), markdown_of(B, u)
    except Exception as exc:  # one failing URL should not abort the whole run
        mismatches.append((u, "error", str(exc)))
        continue
    # heuristic: large length divergence => investigate
    ra, rb = len(norm(a)), len(norm(b))
    if ra and abs(ra - rb) / ra > 0.25:
        mismatches.append((u, ra, rb))

print(json.dumps(mismatches, indent=2))
```
Whitespace and minor structural differences between engines are expected and fine. A >25% content-length divergence is your signal to eyeball that page. Most of your URL set should pass clean if the overlap surface is genuinely compatible.
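Length alone can miss a page that comes back at roughly the right size but with the wrong content, such as a cookie wall or a login interstitial. A minimal sketch of a second signal, assuming a plain word-set Jaccard similarity is good enough for triage: flag a page when either the length diverges or the vocabulary overlap is low. The 0.6 threshold is an assumed starting point to tune on your own URLs, and unlike the harness above this version also flags pages where the reference output is empty.

```python
import re


def norm(md: str) -> str:
    return " ".join((md or "").split())  # collapse all whitespace runs


def length_divergence(a: str, b: str) -> float:
    """Relative length difference of normalized text, measured against a."""
    la, lb = len(norm(a)), len(norm(b))
    if la == 0:
        # Empty reference: treat any content on the other side as maximal divergence.
        return 0.0 if lb == 0 else 1.0
    return abs(la - lb) / la


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lowercase word sets; 1.0 means identical vocabulary."""
    ta = set(re.findall(r"\w+", (a or "").lower()))
    tb = set(re.findall(r"\w+", (b or "").lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def needs_review(a: str, b: str, max_divergence: float = 0.25,
                 min_jaccard: float = 0.6) -> bool:
    # 0.25 mirrors the harness above; 0.6 is an assumed threshold.
    return length_divergence(a, b) > max_divergence or token_jaccard(a, b) < min_jaccard
```

Dropping needs_review(a, b) into the harness loop in place of the length check is enough; a short "please enable cookies" page of similar length passes the length test but fails the vocabulary test.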
What to validate beyond scrape
- Crawl: submit the same crawl on both, compare discovered URL count and per-page document shape.
- Map: compare discovered link-set overlap (cheap, fast, a great first check).
- Structured JSON: if you extract, run your schema on 20+ pages and compare field fill-rate. On a Firecrawl-compatible engine like fastCRW, extraction is a scrape with the JSON format — no separate endpoint or subscription.
- Errors: request a 404 and a blocked page; confirm your error classification (by HTTP status) still holds.
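The map and structured-JSON checks above reduce to two small numbers: how much the discovered link sets overlap, and how often each schema field is filled. A minimal sketch, assuming links that differ only in scheme/host case, a trailing slash, or a fragment should count as the same page (adjust canonical if your site treats them differently). The function names are illustrative, not part of any SDK.

```python
from typing import Iterable, Mapping, Sequence
from urllib.parse import urlsplit


def canonical(url: str) -> str:
    """Lowercase scheme/host, drop the fragment and trailing slash (assumed equivalence)."""
    p = urlsplit(url.strip())
    path = p.path.rstrip("/") or "/"
    query = f"?{p.query}" if p.query else ""
    return f"{p.scheme.lower()}://{p.netloc.lower()}{path}{query}"


def link_overlap(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard overlap of two map results after canonicalization."""
    sa, sb = {canonical(u) for u in a}, {canonical(u) for u in b}
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def fill_rate(records: Sequence[Mapping], fields: Sequence[str]) -> dict:
    """Share of extracted records in which each schema field is present and non-empty."""
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, "", [], {}))
        rates[field] = filled / total if total else 0.0
    return rates
```

Comparing fill_rate for the same schema on both backends shows which fields one engine systematically leaves empty, which is more actionable than a single pass/fail.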
Gotchas to expect
- Trailing slashes / paths: set api_url to the host root the SDK expects; the SDK appends /v1/... itself. Don't include the version path twice.
- Auth header: the SDK sends Authorization: Bearer <key>. A self-hosted instance must be configured with a matching key, or it will return 401 by design.
- Timeouts on crawl: long crawls need a generous client/proxy read timeout; set it explicitly.
- Error-envelope shape: classify failures by HTTP status first; don't assert on vendor-specific error JSON.
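Classifying by HTTP status first can be a single pure function that both backends share. The buckets below are an assumed taxonomy for illustration, not something Firecrawl or fastCRW defines. How you obtain the status from an SDK exception depends on the SDK version, so the sketch simply takes the integer (or None when no response arrived) as input.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    OK = "ok"
    AUTH = "auth"              # 401: key mismatch, e.g. self-hosted key not configured
    BLOCKED = "blocked"        # 403: the target or the backend refused the request
    NOT_FOUND = "not_found"    # 404/410: the page is gone; retrying will not help
    TRANSIENT = "transient"    # 408/429/5xx or no response at all: retry with backoff
    CLIENT = "client"          # any other 4xx: the request itself needs fixing
    UNEXPECTED = "unexpected"  # 1xx/3xx: should not surface from a scrape call


def classify(status: Optional[int]) -> Failure:
    if status is None:  # timeout / connection error: no HTTP status to read
        return Failure.TRANSIENT
    if 200 <= status < 300:
        return Failure.OK
    if status == 401:
        return Failure.AUTH
    if status == 403:
        return Failure.BLOCKED
    if status in (404, 410):
        return Failure.NOT_FOUND
    if status in (408, 429) or status >= 500:
        return Failure.TRANSIENT
    if 400 <= status < 500:
        return Failure.CLIENT
    return Failure.UNEXPECTED


def retryable(failure: Failure) -> bool:
    return failure is Failure.TRANSIENT
```

Because nothing here inspects the error JSON body, the same classification holds when the backend's error envelope differs.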
Multi-environment configuration done properly
A mature setup does not have one backend — it has different backends per environment, and the same code serving all of them. A sensible matrix:
- Local development: point at a self-hosted single-binary engine on localhost (or run it in the dev docker-compose). No key, no card, no quota anxiety, and scraped data never leaves the laptop. The free local mode exists precisely for this.
- CI: the same local engine, spun up as a service container in the pipeline. Tests that exercise scraping run hermetically and for free, instead of burning shared cloud credits or flaking on a rate limit during a release.
- Staging: point at whichever backend production will use, so staging actually validates the production path including its error envelope and latency profile.
- Production: the managed cloud for elastic capacity, or a self-hosted cluster for cost/privacy — chosen by the same environment variable, never by a code branch.
The anti-pattern to avoid is environment-specific code paths (if env == "prod" picking a different client). That reintroduces exactly the coupling the configurable base URL was supposed to remove. One client factory, one set of env vars, four environments — that is the whole design, and it falls out naturally once the base URL is configuration rather than a constant.
Testing the swap without hitting the network
Because the client is now injected configuration behind a thin adapter, your application tests should not call any real backend at all. Mock the adapter, not the SDK internals:
```python
class FakeScraper:
    def scrape(self, url, opts):
        return {"data": {"markdown": "# Fixture\n\nstub body",
                         "metadata": {"sourceURL": url, "statusCode": 200}}}

# inject FakeScraper in unit tests; inject the real adapter in
# a small, separately-gated integration suite that runs against
# a local single-binary engine (free, hermetic, no quota)
```
This gives you a clean test pyramid: fast unit tests with no network, plus a thin integration layer that runs against a free local Firecrawl-compatible engine in CI. You get real end-to-end coverage of the scrape path without spending credits or depending on an external service's uptime during your build — a direct, practical dividend of the local-first, self-hostable, API-compatible design.
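The fake above implies a matching real adapter and an interface both satisfy. One way to express that, sketched with typing.Protocol: Scraper, FirecrawlAdapter, and ingest are hypothetical names, and the adapter assumes the 1.x-style scrape_url(url, params=...) call used throughout this post, which returns the page object directly. The adapter re-wraps that object in the same {"data": ...} envelope FakeScraper returns, so business logic sees one shape regardless of which side is injected.

```python
from typing import Any, Dict, List, Protocol


class Scraper(Protocol):
    """The only scraping surface application code is allowed to see."""

    def scrape(self, url: str, opts: Dict[str, Any]) -> Dict[str, Any]: ...


class FirecrawlAdapter:
    """Hypothetical real adapter around an SDK client such as FirecrawlApp."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def scrape(self, url: str, opts: Dict[str, Any]) -> Dict[str, Any]:
        # Assumes scrape_url(url, params=...) returns the page object;
        # wrap it so callers get the same envelope as FakeScraper.
        return {"data": self._client.scrape_url(url, params=opts)}


def ingest(scraper: Scraper, urls: List[str]) -> List[Dict[str, str]]:
    """Business logic depends on the Scraper protocol, never on a vendor SDK."""
    corpus = []
    for url in urls:
        page = scraper.scrape(url, {"formats": ["markdown"]})["data"]
        if page.get("markdown"):
            corpus.append({"url": url, "markdown": page["markdown"]})
    return corpus
```

In unit tests, ingest(FakeScraper(), urls) runs with no network; in the gated integration suite, ingest(FirecrawlAdapter(make_client()), urls) exercises the real path against the local engine.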
Why this is the most leverage you'll get from one parameter
For a hosted-only product, the entrenched SDK code is the retention moat. The configurable api_url dissolves it: your scraping backend becomes a runtime choice, not an architectural commitment. With an open-core Firecrawl-compatible engine (fastCRW: single ~6MB AGPL-3.0 Rust binary, self-host unlimited, or managed cloud) that choice includes "run it myself for free" and "let someone run the proxies" — the same SDK, the same code, decided by an environment variable. Architect for the swap now and you never have to do a migration project later; you just change a value.
A worked example: the same RAG ingestion job on two backends
Concrete beats abstract. Here is a small but realistic ingestion job — discover a docs site, scrape every page to markdown, and write it out — expressed once, run against either backend by a single environment variable.
```python
import os

from firecrawl import FirecrawlApp

client = FirecrawlApp(
    api_key=os.environ["SCRAPE_API_KEY"],
    api_url=os.environ["SCRAPE_API_URL"],
)

site = "https://docs.example.com"
links = client.map_url(site)["links"]
doc_urls = [u for u in links if "/docs/" in u]

corpus = []
for url in doc_urls:
    try:
        res = client.scrape_url(url, params={"formats": ["markdown"]})
        md = res.get("markdown")  # scrape_url returns the page object itself
        if md and len(md.split()) > 30:  # skip empty/near-empty pages
            corpus.append({"url": url, "markdown": md})
    except Exception as exc:  # classify by status upstream
        print(f"skip {url}: {exc}")

print(f"ingested {len(corpus)} / {len(doc_urls)} pages")
```
Nothing in this job names a vendor. SCRAPE_API_URL=https://api.firecrawl.dev runs it on Firecrawl; SCRAPE_API_URL=http://localhost:3000 runs the identical code against a self-hosted single-binary engine you control. The map call, the scrape call, the response shape, and the error handling are all on the Firecrawl-compatible overlap surface, so the job is genuinely backend-neutral. This is the practical payoff of the discipline: your ingestion code outlives any single vendor relationship.
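The filter doc_urls = [u for u in links if "/docs/" in u] in the job is a substring match, so it would also keep off-site links or paths such as /blog/docs/, and it can scrape the same page twice via fragment variants. A stricter, still backend-neutral filter is sketched below, assuming the documentation lives under a /docs/ path on the same host as site; select_doc_urls is a hypothetical helper, not an SDK function.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def select_doc_urls(links: Iterable[str], site: str, prefix: str = "/docs/") -> List[str]:
    """Keep same-host links whose path starts with prefix.

    Fragments are stripped and duplicates dropped, preserving map order.
    """
    host = urlsplit(site).netloc.lower()
    seen = set()
    selected = []
    for link in links:
        p = urlsplit(link)
        if p.netloc.lower() != host or not p.path.startswith(prefix):
            continue
        clean = urlunsplit((p.scheme, p.netloc, p.path, p.query, ""))
        if clean not in seen:
            seen.add(clean)
            selected.append(clean)
    return selected
```

In the job above it would replace the list comprehension as doc_urls = select_doc_urls(links, site), leaving the rest unchanged.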
Operationalizing the swap in a real deployment
Beyond the code, treat the backend as a first-class operational concern:
- Tag your telemetry with the backend label. Emit SCRAPE_BACKEND as a dimension on latency, error-rate, and cost metrics so a swap produces a clean before/after in your dashboards rather than an unexplained step change.
- Stage the swap. Flip the env var in staging first, run the parity harness on production-representative URLs, and only then promote the change to production config. Because it is config, promotion is a value change, not a code deploy.
- Keep both credentials valid during the cutover window. For the first week, both keys should work so rollback is instantaneous. Decommission the old credential only after a clean week.
- Document the divergence list in-repo. Keep a short markdown note next to the adapter listing the known field/error-envelope differences you validated. Future maintainers should not have to rediscover them.
None of this is exotic — it is the same hygiene you would apply to swapping any infrastructure dependency. The point of API compatibility is that it lets you treat the scraping backend like swappable infrastructure rather than a load-bearing architectural decision baked into application code.
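The telemetry point above amounts to recording every scrape call under a (backend, operation) key. A minimal in-memory sketch, standing in for whatever metrics client you actually use: BackendMetrics is a hypothetical class, and a real setup would forward the same key as a metric dimension instead of keeping lists in memory.

```python
import time
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Key = Tuple[str, str]  # (backend label, operation name)


class BackendMetrics:
    """In-memory stand-in for a real metrics client (hypothetical)."""

    def __init__(self, backend: str) -> None:
        self.backend = backend
        self.latency: DefaultDict[Key, List[float]] = defaultdict(list)
        self.errors: DefaultDict[Key, int] = defaultdict(int)

    def timed(self, op: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Call fn, recording latency and any exception under (backend, op)."""
        key = (self.backend, op)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            self.errors[key] += 1
            raise
        finally:
            self.latency[key].append(time.perf_counter() - start)

    def error_rate(self, op: str) -> float:
        key = (self.backend, op)
        calls = len(self.latency[key])
        return self.errors[key] / calls if calls else 0.0
```

Usage would look like metrics = BackendMetrics(os.environ.get("SCRAPE_BACKEND", "unknown")) followed by metrics.timed("scrape", client.scrape_url, url, params={"formats": ["markdown"]}); the backend label comes from the same environment variable that selects the backend, so the dashboards split cleanly at the swap.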
Sources
- Firecrawl SDK docs: docs.firecrawl.dev
- fastCRW repo: github.com/us/crw
Related: Migrate from Firecrawl · Firecrawl API compatibility