Python Web Scraping API — fastCRW [Firecrawl-Compatible]
Scrape, crawl, and search the web from Python with fastCRW — a Firecrawl-compatible REST API backed by a single Rust binary. Async httpx, asyncio.TaskGroup, and the crw Python SDK. AGPL-3.0, self-host free.
Call fastCRW from Python with plain httpx or the crw SDK — the same Firecrawl-compatible REST surface, a static Rust binary under the hood, and clean Markdown out the other side. asyncio.TaskGroup + a Semaphore keeps batch jobs bounded and fast.
Verdict
fastCRW is a Firecrawl-compatible web scraping API — POST /v1/scrape, get back clean Markdown. For Python teams that means two paths: the crw PyPI SDK, which runs a self-contained local engine with no separate server, or an httpx client pointed at https://api.fastcrw.com (or your self-hosted instance). Either path gives you the same Firecrawl-shaped REST surface, a static Rust binary under the hood, and results you can feed directly into text splitters, embeddings, or a React/Vue front-end without an HTML-to-text pass.
Who This Is For
- Python developers building scrapers: you want a clean Markdown API instead of parsing raw HTML.
- RAG / AI pipeline engineers: you need live web content turned into embeddable text with high fidelity.
- Teams migrating off Firecrawl: your existing scrape()/crawl() calls work unchanged with an api_url override.
- Self-hosting shops: you want the whole ingestion path on your own infrastructure under AGPL-3.0 at $0 per 1,000 scrapes.
Setup
1. Install
pip install crw # PyPI — includes the local engine
# or, with uv:
uv add crw
For the REST-only path (managed cloud or self-hosted Docker):
pip install httpx # or: uv add httpx
2. Get an API key
Sign up at fastcrw.com, copy the API key from the dashboard, and export it:
export FASTCRW_API_KEY="fcrw_..."
Our pricing includes 500 one-time lifetime credits, enough to validate a pipeline. A plain scrape costs 1 credit, a crawl costs 1 credit per page, and a search costs 1 credit per query.
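To budget those credits before committing to a pipeline, the per-operation prices can be turned into a quick estimate. This is a minimal sketch: estimate_credits is a hypothetical helper, not part of the crw SDK, and it hard-codes the prices stated on this page (including the 5-credit cost of formats: ["json"] noted under Structured JSON Extraction), so it needs updating if pricing changes.

```python
# Hypothetical helper: per-operation credit costs as stated on this page.
CREDIT_COST = {"scrape": 1, "crawl_page": 1, "search": 1, "json_scrape": 5}
LIFETIME_CREDITS = 500


def estimate_credits(scrapes: int = 0, crawl_pages: int = 0,
                     searches: int = 0, json_scrapes: int = 0) -> int:
    """Return the total credits a planned workload would consume."""
    return (scrapes * CREDIT_COST["scrape"]
            + crawl_pages * CREDIT_COST["crawl_page"]
            + searches * CREDIT_COST["search"]
            + json_scrapes * CREDIT_COST["json_scrape"])


# Example plan: one 25-page crawl, 10 searches, 20 JSON extractions.
needed = estimate_credits(crawl_pages=25, searches=10, json_scrapes=20)
print(needed, "credits;", LIFETIME_CREDITS - needed, "left of", LIFETIME_CREDITS)
```

JSON extraction dominates this plan: 20 extractions cost more than the crawl and the searches combined.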
Quickstart: Scrape a Page
Using the crw SDK (local engine)
from crw import CrwClient
client = CrwClient() # starts the Rust engine in-process
result = client.scrape("https://example.com", formats=["markdown"])
print(result["data"]["markdown"])
Using httpx against the managed cloud
import os
import httpx
API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
def scrape(url: str) -> str:
    r = httpx.post(
        f"{BASE}/v1/scrape",
        headers=HEADERS,
        json={"url": url, "formats": ["markdown"], "onlyMainContent": True},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"]["markdown"]
markdown = scrape("https://docs.fastcrw.com")
print(markdown[:500])
Crawl a Whole Site
/v1/crawl starts an async breadth-first crawl and returns a job ID. Poll until complete:
import os
import time
import httpx
API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
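def crawl(seed_url: str, limit: int = 50, max_depth: int = 3) -> list[dict]:
    """Crawl a site and return a list of page dicts with markdown and metadata."""
    # Start the async crawl job
    r = httpx.post(
        f"{BASE}/v1/crawl",
        headers=HEADERS,
        json={
            "url": seed_url,
            "limit": limit,  # cap: 1000
            "maxDepth": max_depth,  # cap: 10
            "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
        },
        timeout=30,
    )
    r.raise_for_status()
    job_id = r.json()["id"]
    # Poll until the job reaches a terminal state; give up after 10 minutes
    # so a stuck or failed job cannot hang the caller forever.
    deadline = time.monotonic() + 600
    while True:
        poll = httpx.get(f"{BASE}/v1/crawl/{job_id}", headers=HEADERS, timeout=30)
        poll.raise_for_status()
        data = poll.json()
        status = data.get("status")
        if status == "completed":
            return data["data"]
        if status in ("failed", "cancelled"):  # assumed Firecrawl-style terminal states
            raise RuntimeError(f"crawl {job_id} ended with status {status!r}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"crawl {job_id} still {status!r} after 600 s")
        time.sleep(2)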
pages = crawl("https://docs.fastcrw.com", limit=25)
for page in pages:
    url = page.get("metadata", {}).get("sourceURL")
    words = len((page.get("markdown") or "").split())
    print(f"{words:>6} words {url}")
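The crawl() loop polls at a fixed 2-second interval. For long crawls, a reusable poller with exponential backoff and a deadline sends fewer status requests. This is a sketch under assumptions: wait_for_job and its fetch_status argument are hypothetical names, and the terminal status strings "failed" and "cancelled" are assumed from Firecrawl's API rather than confirmed for fastCRW. The demo uses a fake status source so it runs offline.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    first_delay: float = 1.0,
    max_delay: float = 15.0,
) -> dict:
    """Poll fetch_status() until the job completes, fails, or times out.

    fetch_status must return the parsed JSON body of a status request,
    e.g. GET /v1/crawl/{id}. Returns that body once status == "completed".
    """
    deadline = time.monotonic() + timeout_s
    delay = first_delay
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body
        # Assumed Firecrawl-style terminal states; adjust if fastCRW differs.
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"job ended with status {status!r}")
        # Stop if the next sleep would overshoot the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Offline demo: a fake status endpoint that completes on the third poll.
_replies = iter([{"status": "scraping"}, {"status": "scraping"},
                 {"status": "completed", "data": [{"markdown": "# Hi"}]}])
result = wait_for_job(lambda: next(_replies), first_delay=0.01)
print(len(result["data"]), "page(s)")
```

Against the real API, fetch_status could be lambda: httpx.get(f"{BASE}/v1/crawl/{job_id}", headers=HEADERS, timeout=30).json(), with the caller reading result["data"] as before.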
Web Search
import os
import httpx
API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
def search(query: str, limit: int = 5) -> list[dict]:
    r = httpx.post(
        f"{BASE}/v1/search",
        headers=HEADERS,
        json={"query": query, "limit": limit},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"]
for result in search("python web scraping api 2026"):
    print(result["title"], "→", result["url"])
Async Batch Scraping with TaskGroup and Semaphore
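asyncio.TaskGroup (added in Python 3.11, and the modern alternative to bare create_task() and gather()) combined with an asyncio.Semaphore gives you structured, bounded concurrency: the TaskGroup owns every task's lifetime, and the semaphore caps how many requests are in flight at once. Without the semaphore, an unbounded fan-out can exhaust the client's connection pool and file descriptors and trip rate limits: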
import asyncio
import os
import httpx
API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# p90 latency on the 3-way benchmark was 14157 ms; set a generous timeout
# so chrome-stealth recoveries are not killed prematurely.
REQUEST_TIMEOUT = 25.0
MAX_CONCURRENCY = 8  # tune via Little's Law: concurrency ≈ target rps × latency (s)


async def scrape_one(
    client: httpx.AsyncClient,
    sem: asyncio.Semaphore,
    url: str,
) -> dict:
    async with sem:  # at most MAX_CONCURRENCY requests in flight
        r = await client.post(
            f"{BASE}/v1/scrape",
            headers=HEADERS,
            json={"url": url, "formats": ["markdown"], "onlyMainContent": True},
            timeout=REQUEST_TIMEOUT,
        )
    r.raise_for_status()
    markdown = r.json()["data"].get("markdown") or ""
    return {"url": url, "chars": len(markdown), "markdown": markdown}


async def batch_scrape(urls: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    async with httpx.AsyncClient() as client:
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(scrape_one(client, sem, url)) for url in urls]
    # TaskGroup waits for every task on exit. If any task raises, the rest are
    # cancelled and the errors surface together as an ExceptionGroup.
    return [t.result() for t in tasks]
urls = [
"https://docs.fastcrw.com",
"https://fastcrw.com/pricing",
"https://fastcrw.com/integrations/langchain",
]
results = asyncio.run(batch_scrape(urls))
for r in results:
    print(f"{r['chars']:>8} chars {r['url']}")
Latency note: On Firecrawl's public 1,000-URL scrape-content-dataset-v1 (diagnose_3way.py, 2026-05-08), fastCRW's p50 was 1914 ms and p90 was 14157 ms. It had the highest truth-recall of the three (63.74% of 819 labeled URLs), but also the widest tail. Set REQUEST_TIMEOUT above the p90; the slow tail is the chrome-stealth fallback recovering pages the others miss. The full p50/p90/p99 breakdown is on /benchmarks/firecrawl-dataset.
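The MAX_CONCURRENCY comment points to Little's Law (average in-flight requests = throughput × latency). As a worked example with the benchmark figures above and a hypothetical target of 4 scrapes per second: Little's Law strictly uses mean latency, so this sketch assumes the mean falls somewhere between p50 and p90 and uses the two as a lower and upper bracket.

```python
import math

# Figures from the latency note (fastCRW on scrape-content-dataset-v1).
P50_S = 1.914
P90_S = 14.157


def concurrency_for(target_rps: float, latency_s: float) -> int:
    """Little's Law: average in-flight requests = arrival rate x time in system.

    Rounded up so the limit never falls below the steady-state requirement.
    """
    return math.ceil(target_rps * latency_s)


TARGET_RPS = 4  # hypothetical throughput goal, in scrapes per second
low = concurrency_for(TARGET_RPS, P50_S)   # 4 * 1.914  = 7.656  -> 8
high = concurrency_for(TARGET_RPS, P90_S)  # 4 * 14.157 = 56.628 -> 57
print(f"size the semaphore between {low} and {high}")
```

The p50 bracket lands on the example's MAX_CONCURRENCY = 8. Sizing toward the upper bracket keeps throughput steady when many requests hit the slow tail, at the cost of more simultaneous connections against your rate limit.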
Structured JSON Extraction
Pass formats: ["json"] with a JSON Schema to extract typed records instead of prose:
import os
import httpx
API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
schema = {
"type": "object",
"properties": {
"productName": {"type": "string"},
"priceUsd": {"type": "number", "description": "Current price in USD"},
"inStock": {"type": "boolean"},
},
"required": ["productName", "priceUsd"],
}
r = httpx.post(
f"{BASE}/v1/scrape",
headers=HEADERS,
json={
"url": "https://example.com/products/widget",
"formats": ["json"],
"jsonSchema": schema,
},
timeout=60,
)
r.raise_for_status()
product = r.json()["data"]["json"]
print(product)
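LLM-extracted output may not always match the schema, so it is worth checking the record before it reaches a database. The sketch below is a deliberately minimal, hypothetical check that covers only "required" and the three primitive types used in this schema; for full JSON Schema validation, the third-party jsonschema package (jsonschema.validate) is the usual choice.

```python
# Same schema as in the request above.
schema = {
    "type": "object",
    "properties": {
        "productName": {"type": "string"},
        "priceUsd": {"type": "number", "description": "Current price in USD"},
        "inStock": {"type": "boolean"},
    },
    "required": ["productName", "priceUsd"],
}

# JSON Schema primitive types used above, mapped to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes.

    Covers only "required" and the primitive "type" keywords above; it is
    not a general JSON Schema validator.
    """
    problems = [f"missing required field {key!r}"
                for key in schema.get("required", []) if key not in record]
    for key, spec in schema.get("properties", {}).items():
        if key not in record:
            continue  # missing required keys are reported above
        value, kind = record[key], spec.get("type")
        expected = _PY_TYPES.get(kind)
        if expected is None:
            continue  # type keyword not handled by this sketch
        # bool subclasses int in Python, so reject it explicitly for "number".
        wrong = not isinstance(value, expected) or (kind == "number" and isinstance(value, bool))
        if wrong:
            problems.append(f"{key!r} should be {kind}, got {type(value).__name__}")
    return problems


print(check_record({"productName": "Widget", "priceUsd": "9.99"}, schema))
```

Calling check_record(product, schema) before storing product would, for example, flag a price that came back as a string.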
Cost: formats: ["json"] is a 5-credit operation vs 1 credit for markdown. LLM extraction supports OpenAI and Anthropic providers only. There is no batch /v1/extract endpoint; iterate /v1/scrape concurrently or use /v1/crawl.
MCP Setup
fastCRW ships a Model Context Protocol (MCP) server (crw-mcp on npm) for AI agents that need live web data. It exposes scrape, crawl, map, and search as MCP tools, so no separate HTTP client code is needed:
{
"mcpServers": {
"fastcrw": {
"command": "npx",
"args": ["-y", "crw-mcp@latest"],
"env": {
"FASTCRW_API_KEY": "fcrw_...",
"FASTCRW_API_URL": "https://api.fastcrw.com"
}
}
}
}
See /integrations/mcp for full configuration options.
Limits and Honest Gaps
- No screenshot output: formats: ["screenshot"] returns HTTP 422.
- Stateless per request: no session is carried across calls; multi-step authenticated flows must be reconstructed in your Python code.
- LLM extraction: supports OpenAI and Anthropic only.
- No /v1/batch/scrape: iterate /v1/scrape concurrently or use /v1/crawl.