
Python Web Scraping API — fastCRW [Firecrawl-Compatible]

Scrape, crawl, and search the web from Python with fastCRW — a Firecrawl-compatible REST API backed by a single Rust binary. Async httpx, asyncio.TaskGroup, and the crw Python SDK. AGPL-3.0, self-host free.

Published: June 13, 2026

Updated: June 13, 2026

Verdict

fastCRW is a Firecrawl-compatible web scraping API — POST /v1/scrape, get back clean Markdown. For Python teams that means two paths: the crw PyPI SDK, which runs a self-contained local engine with no separate server, or an httpx client pointed at https://api.fastcrw.com (or your self-hosted instance). Either path gives you the same Firecrawl-shaped REST surface, a static Rust binary under the hood, and results you can feed directly into text splitters, embeddings, or a React/Vue front-end without an HTML-to-text pass.

Who This Is For

Python developers building scrapers — you want a clean Markdown API instead of parsing raw HTML.
RAG / AI pipeline engineers — you need live web content turned into embeddable text with high fidelity.
Teams migrating off Firecrawl — your existing scrape() / crawl() calls work unchanged with an api_url override.
Self-hosting shops — you want the whole ingestion path on your own infrastructure under AGPL-3.0 at $0 per 1,000 scrapes.

Setup

1. Install

pip install crw          # PyPI — includes the local engine
# or, with uv:
uv add crw

For the REST-only path (managed cloud or self-hosted Docker):

pip install httpx        # or: uv add httpx

2. Get an API key

export FASTCRW_API_KEY="fcrw_..."

The free tier ships 500 one-time lifetime credits — enough to validate a pipeline. Plain scrape is 1 credit; crawl is 1 credit per page; search is 1 credit per query.
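The credit rules above reduce to simple addition, so a pipeline's budget can be estimated before any request is sent. The sketch below is illustrative only: `estimate_credits` and `FREE_TIER_CREDITS` are hypothetical names, not part of the crw SDK, and the estimate ignores the managed-LLM token cost that JSON extraction adds.

```python
FREE_TIER_CREDITS = 500  # one-time free-tier allowance quoted above


def estimate_credits(scrapes: int = 0, crawl_pages: int = 0, searches: int = 0) -> int:
    """1 credit per plain scrape, per crawled page, and per search query."""
    return scrapes + crawl_pages + searches


# Hypothetical validation run: 100 single-page scrapes, one 250-page crawl,
# and 40 search queries.
planned = estimate_credits(scrapes=100, crawl_pages=250, searches=40)
remaining = FREE_TIER_CREDITS - planned
print(f"planned: {planned} credits, remaining on free tier: {remaining}")
```

With these example numbers the run uses 390 credits and leaves 110 of the free allowance.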

Quickstart: Scrape a Page

Using the crw SDK (local engine)

from crw import CrwClient

client = CrwClient()  # starts the Rust engine in-process

result = client.scrape("https://example.com", formats=["markdown"])
print(result["data"]["markdown"])

Using httpx against the managed cloud

import os
import httpx

API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def scrape(url: str) -> str:
    r = httpx.post(
        f"{BASE}/v1/scrape",
        headers=HEADERS,
        json={"url": url, "formats": ["markdown"], "onlyMainContent": True},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"]["markdown"]

markdown = scrape("https://docs.fastcrw.com")
print(markdown[:500])

Crawl a Whole Site

/v1/crawl starts an async breadth-first crawl and returns a job ID. Poll until complete:

import os
import time
import httpx

API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def crawl(seed_url: str, limit: int = 50, max_depth: int = 3) -> list[dict]:
    """Crawl a site and return a list of page dicts with markdown and metadata."""
    # Start the async crawl job
    r = httpx.post(
        f"{BASE}/v1/crawl",
        headers=HEADERS,
        json={
            "url": seed_url,
            "limit": limit,          # cap: 1000
            "maxDepth": max_depth,   # cap: 10
            "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
        },
        timeout=30,
    )
    r.raise_for_status()
    job_id = r.json()["id"]

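    # Poll until the job finishes, but never loop forever: give up after 10 minutes.
    # The "failed"/"cancelled" status names are an assumption borrowed from
    # Firecrawl's crawl-status responses; adjust if your instance reports others.
    deadline = time.monotonic() + 600
    while time.monotonic() < deadline:
        poll = httpx.get(f"{BASE}/v1/crawl/{job_id}", headers=HEADERS, timeout=30)
        poll.raise_for_status()
        data = poll.json()
        if data["status"] == "completed":
            return data["data"]
        if data["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"crawl {job_id} ended with status {data['status']}")
        time.sleep(2)
    raise TimeoutError(f"crawl {job_id} did not complete within 10 minutes")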


pages = crawl("https://docs.fastcrw.com", limit=25)
for page in pages:
    url = page.get("metadata", {}).get("sourceURL")
    words = len((page.get("markdown") or "").split())
    print(f"{words:>6} words  {url}")

Web Search

import os
import httpx

API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def search(query: str, limit: int = 5) -> list[dict]:
    r = httpx.post(
        f"{BASE}/v1/search",
        headers=HEADERS,
        json={"query": query, "limit": limit},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"]


for result in search("python web scraping api 2026"):
    print(result["title"], "→", result["url"])

Async Batch Scraping with TaskGroup and Semaphore

asyncio.TaskGroup (stable since 3.11, the recommended pattern in Python 3.13) combined with an asyncio.Semaphore gives you structured, bounded concurrency. Without the semaphore, unbounded fan-out exhausts file descriptors and trips rate limits:

import asyncio
import os
import httpx

API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Fast-mode p90 is ~4 s on the 3-way benchmark; a 25 s timeout leaves room
# for slow pages so chrome-stealth recoveries are not killed prematurely.
REQUEST_TIMEOUT = 25.0
MAX_CONCURRENCY = 8  # Little's Law: concurrency ≈ target requests/s × p90 latency (s)


async def scrape_one(
    client: httpx.AsyncClient,
    sem: asyncio.Semaphore,
    url: str,
) -> dict:
    async with sem:
        r = await client.post(
            f"{BASE}/v1/scrape",
            headers=HEADERS,
            json={"url": url, "formats": ["markdown"], "onlyMainContent": True},
            timeout=REQUEST_TIMEOUT,
        )
        r.raise_for_status()
        data = r.json()["data"]
        # "markdown" may be missing or null; normalise both cases to "".
        markdown = data.get("markdown") or ""
        return {
            "url": url,
            "chars": len(markdown),
            "markdown": markdown,
        }


async def batch_scrape(urls: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    # One shared client gives connection pooling across all tasks.
    async with httpx.AsyncClient() as client:
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(scrape_one(client, sem, url)) for url in urls]

    # Leaving the TaskGroup block waits for every task. If any task fails,
    # the remaining ones are cancelled and the errors surface as an ExceptionGroup.
    return [t.result() for t in tasks]


urls = [
    "https://docs.fastcrw.com",
    "https://fastcrw.com/pricing",
    "https://fastcrw.com/integrations/langchain",
]

results = asyncio.run(batch_scrape(urls))
for r in results:
    print(f"{r['chars']:>8} chars  {r['url']}")

Latency note: On Firecrawl's public 1,000-URL scrape-content dataset (diagnose_3way.py, 2026-05-08), fastCRW in fast mode leads every latency percentile (p50 1361 ms, p90 4062 ms, p99 5588 ms) and returns the highest truth-recall of the three tools tested (63.74% of 819 labeled URLs, up to 67.6% in recall mode). Switching between fast mode and recall mode is a single config change. The full p50/p90/p99 breakdown is on /benchmarks/firecrawl-dataset.
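The MAX_CONCURRENCY comment in the batch example points to Little's Law, which says the number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. A small worked calculation, assuming a hypothetical target of 2 requests per second and the latency figures quoted above (`suggest_concurrency` is an illustrative helper, not a fastCRW API):

```python
import math


def suggest_concurrency(target_rps: float, latency_s: float) -> int:
    """Little's Law: requests in flight = arrival rate x time in system."""
    return max(1, math.ceil(target_rps * latency_s))


# Sized for the p90 latency (4062 ms): a conservative upper estimate.
print(suggest_concurrency(2, 4.062))
# Sized for the p50 latency (1361 ms): the typical case.
print(suggest_concurrency(2, 1.361))
```

This prints 9 and 3. Little's Law is strictly stated in terms of mean latency, so sizing from p90 deliberately over-provisions; starting near the p90 figure and lowering it if you hit rate limits is one reasonable approach.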

Structured JSON Extraction

Pass formats: ["json"] with a JSON Schema to extract typed records instead of prose:

import os
import httpx

API_KEY = os.environ["FASTCRW_API_KEY"]
BASE = "https://api.fastcrw.com"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

schema = {
    "type": "object",
    "properties": {
        "productName": {"type": "string"},
        "priceUsd": {"type": "number", "description": "Current price in USD"},
        "inStock": {"type": "boolean"},
    },
    "required": ["productName", "priceUsd"],
}

r = httpx.post(
    f"{BASE}/v1/scrape",
    headers=HEADERS,
    json={
        "url": "https://example.com/products/widget",
        "formats": ["json"],
        "jsonSchema": schema,
    },
    timeout=60,
)
r.raise_for_status()
product = r.json()["data"]["json"]
print(product)

Cost: formats: ["json"] is billed as 1 scrape credit plus the managed-LLM token cost for that request, settled from real usage, versus 1 credit for plain markdown. LLM extraction is powered by fastCRW's managed LLM (paid plans only; the FREE plan has no LLM features). Extract across many URLs with concurrent /v1/scrape calls, /v2/batch/scrape, or /v1/crawl.
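Because the extracted record is produced by an LLM, it can be worth checking it against the schema before storing it. The sketch below is a minimal, standard-library check covering only required keys and the three primitive types used in the schema above; `check_record` and the sample record are hypothetical, and a full JSON Schema validator would cover far more.

```python
def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    type_map = {"string": str, "number": (int, float), "boolean": bool}
    problems = []
    for key in schema.get("required", []):
        if record.get(key) is None:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        value = record.get(key)
        expected = type_map.get(spec.get("type"))
        if value is None or expected is None:
            continue  # absent optional field, or a type this sketch ignores
        # bool is a subclass of int in Python, so reject it for "number".
        is_bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {spec['type']}, got {type(value).__name__}"
            )
    return problems


schema = {
    "type": "object",
    "properties": {
        "productName": {"type": "string"},
        "priceUsd": {"type": "number"},
        "inStock": {"type": "boolean"},
    },
    "required": ["productName", "priceUsd"],
}

# Hypothetical extraction result where the price came back as a string.
sample = {"productName": "Widget", "priceUsd": "19.99"}
print(check_record(sample, schema))
```

For the sample record this reports a single problem, that priceUsd is a string rather than a number, while the missing optional inStock field is accepted.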

MCP Setup

fastCRW ships an MCP server (crw-mcp on npm) for AI agents that need live web data. It exposes scrape, crawl, map, and search as MCP tools — no separate HTTP client code needed:

{
  "mcpServers": {
    "fastcrw": {
      "command": "npx",
      "args": ["-y", "crw-mcp@latest"],
      "env": {
        "FASTCRW_API_KEY": "fcrw_...",
        "FASTCRW_API_URL": "https://api.fastcrw.com"
      }
    }
  }
}

See /integrations/mcp for full configuration options.

Good to Know

Screenshots — formats: ["screenshot"] is served by /v2/scrape.
Batch scraping — iterate /v1/scrape concurrently, use /v1/crawl for site-wide jobs, or call /v2/batch/scrape directly for a single-request batch.
Stateless per request — no session is carried across calls; multi-step authenticated flows must be reconstructed in your Python code.
LLM extraction — powered by the managed LLM, available on paid plans (the FREE plan has no LLM features).

Sources

fastCRW scrape docs: /docs/scrape
fastCRW Python SDK (crw on PyPI): https://pypi.org/project/crw/
fastCRW 3-way scrape benchmark: /benchmarks/firecrawl-dataset
fastCRW search docs: /docs/search

FAQ

Is there an official Python SDK for fastCRW?

Yes: crw on PyPI. CrwClient() starts a self-contained local engine in-process, so you scrape from Python with no separate server and no cloud round-trip. For cloud access, point an httpx client or the Firecrawl Python SDK at https://api.fastcrw.com with an api_url override; the REST surface is wire-compatible.

How do I scrape multiple URLs concurrently in Python?

Use asyncio.TaskGroup (stable since Python 3.11, the recommended pattern in 3.13) with an asyncio.Semaphore to cap in-flight requests. Reuse one httpx.AsyncClient across all tasks for connection pooling. Without the semaphore, unbounded fan-out exhausts file descriptors and trips the target site's rate limits.

What is the p50 latency for fastCRW scrape requests?

In fast mode on Firecrawl's public 1,000-URL scrape-content dataset (diagnose_3way.py, 2026-05-08), fastCRW leads every latency percentile: p50 1361 ms (41% lower than Firecrawl's 2305 ms), p90 4062 ms (ahead of Crawl4AI's 4754 ms and Firecrawl's 6937 ms), and p99 5588 ms. Switching between fast mode and recall mode is a single config change; use recall mode when you want maximum correct content. Setting a per-request timeout of 20–25 s keeps a slow page from tying up a worker.

Can I run the fastCRW engine locally without cloud?

Yes. Install the crw PyPI package and call CrwClient() — it starts a self-contained Rust engine in-process with no separate server. Alternatively, run the Docker image (docker run -p 3000:3000 ghcr.io/us/crw:latest) and point your httpx client at http://localhost:3000. Self-hosting under AGPL-3.0 is free — you pay only your server costs.
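Switching between the managed cloud and a local instance then comes down to changing the base URL. A minimal sketch, assuming the FASTCRW_API_URL variable name shown in the MCP configuration is reused for your own scripts and that a self-hosted instance may run without an API key (both are assumptions; `client_settings` is a hypothetical helper):

```python
import os

DEFAULT_BASE = "https://api.fastcrw.com"


def client_settings(env=os.environ) -> tuple[str, dict]:
    """Return (base_url, headers) from environment-style settings."""
    base = env.get("FASTCRW_API_URL", DEFAULT_BASE).rstrip("/")
    headers = {}
    key = env.get("FASTCRW_API_KEY")
    if key:
        # Only send the Authorization header when a key is configured.
        headers["Authorization"] = f"Bearer {key}"
    return base, headers


# Simulate a self-hosted setup without touching the real environment.
base, headers = client_settings({"FASTCRW_API_URL": "http://localhost:3000/"})
print(base, headers)
```

The returned pair can be passed straight to the httpx calls shown earlier, for example `httpx.post(f"{base}/v1/scrape", headers=headers, ...)`.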

How much does structured JSON extraction cost in credits?

Any request using formats: ["json"] with a jsonSchema is billed as 1 scrape credit plus the managed-LLM token cost for that request (settled from real usage), versus 1 credit for plain markdown. Extract across many URLs with concurrent /v1/scrape calls, /v2/batch/scrape, or /v1/crawl. LLM extraction is powered by fastCRW's managed LLM, available on paid plans (the FREE plan has no LLM features). See /pricing for current managed-cloud tiers.
