By the fastCRW team · Benchmark figures verified 2026-05-18 against the 2026-05-08 run · Verify independently before quoting internally.
Smolagents + fastCRW: web grounding without the bloat
If you reached for Hugging Face smolagents, you did it on purpose: it is a deliberately tiny agent framework — a few thousand lines, code agents that write and run Python rather than emit JSON blobs, minimal dependencies. The fastest way to ruin that is to bolt a multi-gigabyte web-data service onto the side of it. This guide wires smolagents to fastCRW for web search and scraping while keeping the whole stack lean: fastCRW is a single ~8 MB AGPL-3.0 Rust binary running in one container, it exposes a Firecrawl-compatible REST API you can reach with a base-URL swap, and it posts the highest truth-recall of the three scrapers we benchmarked (63.74% of 819 labeled URLs, diagnose_3way.py, 2026-05-08).
Disclosure: we build fastCRW. This is a vendor-authored tutorial, so weight it accordingly — but the limitations section below states plainly what fastCRW does not do, and where Firecrawl genuinely wins, so you can decide on evidence rather than marketing.
Smolagents' minimalist philosophy and the web
Code agents that call tools as Python
A smolagents CodeAgent does not pick tools from a menu of JSON schemas; it writes Python that calls your tools as ordinary functions, runs that code, observes the result, and iterates. A tool is just a Python callable decorated with @tool and a docstring. That means your web layer should look like a normal function that returns clean text — not a heavyweight SDK with its own runtime, queue, and browser pool. fastCRW fits that shape: one HTTP call, markdown back.
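As a rough illustration of why a tool can stay this small, the sketch below uses only the standard library to read the two things a decorator like @tool has to work with: the type hints and the docstring. The function shout and the helper describe_callable are hypothetical stand-ins and are not part of smolagents.

```python
import inspect
from typing import get_type_hints


def shout(text: str) -> str:
    """Return the text in upper case.

    Args:
        text: The string to convert.
    """
    return text.upper()


def describe_callable(fn) -> dict:
    """Collect the name, typed parameters, return type, and docstring summary of fn."""
    hints = get_type_hints(fn)
    # Every parameter is assumed to carry a type hint, as in shout above.
    params = {name: hints[name].__name__ for name in inspect.signature(fn).parameters}
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "params": params,
        "returns": hints["return"].__name__,
        "summary": doc.splitlines()[0] if doc else "",
    }


spec = describe_callable(shout)
```

Everything a model needs in order to call the function is already in its signature and docstring, which is why a web tool can be one function rather than an SDK.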
Why a lean web backend fits the smolagents ethos
The smolagents pitch is that you can read the whole framework in an afternoon and run it anywhere. A web-data dependency that needs five containers and a couple of gigabytes of RAM breaks that promise: your "tiny agent" now drags a platform-team-sized stack behind it. fastCRW is the opposite: a single statically linked binary, with no Redis, no Node.js, and no browser farm required for the common case. The README presents the footprint as a structural fact rather than a benchmark (one ~8 MB binary in 1 container vs Firecrawl's ~2–3 GB across 5 containers), so it holds regardless of load.
Write a fastCRW tool for smolagents
A @tool function calling the REST API
fastCRW speaks a Firecrawl-compatible REST surface, so the call is a plain POST /v1/scrape. Point it at your managed endpoint (https://fastcrw.com) or a locally self-hosted engine — the only difference is the base URL. Here is the whole tool:
- Import requests and the smolagents @tool decorator.
- Read the base URL and key from the environment so the same tool works against cloud or local.
- Return data.markdown (clean, LLM-ready text) and nothing else.
In code:
import os

import requests
from smolagents import CodeAgent, InferenceClientModel, tool

# The same tool works against the managed endpoint or a self-hosted engine.
BASE = os.environ.get("CRW_BASE_URL", "https://fastcrw.com")
KEY = os.environ["CRW_API_KEY"]


@tool
def scrape_page(url: str) -> str:
    """Fetch a web page and return its main content as clean markdown.

    Args:
        url: The absolute URL to scrape.
    """
    r = requests.post(
        f"{BASE}/v1/scrape",
        headers={"Authorization": f"Bearer {KEY}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=30,
    )
    r.raise_for_status()  # surface HTTP errors to the agent instead of returning junk
    return r.json()["data"]["markdown"]
That is the entire integration. Because fastCRW mirrors Firecrawl's request shape, anyone already calling Firecrawl from a smolagents tool can switch by changing BASE — no rewrite. If you prefer the Python SDK over raw requests, the crw package (PyPI) exposes CrwClient() and can run a self-contained local engine, which we use below for the zero-cloud variant.
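To make the base-URL swap concrete, here is a minimal sketch that assembles the same /v1/scrape request for either target from environment-style settings. The helper build_scrape_request is hypothetical, the local address http://localhost:3000 is a placeholder rather than a documented default, and treating the API key as optional for a self-hosted engine is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE = "https://fastcrw.com"


def build_scrape_request(url: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Assemble the pieces of a POST /v1/scrape call from environment settings."""
    env = os.environ if env is None else env
    base = env.get("CRW_BASE_URL", DEFAULT_BASE).rstrip("/")
    headers = {}
    key = env.get("CRW_API_KEY")
    if key:  # assumption: a self-hosted engine may run without a key
        headers["Authorization"] = f"Bearer {key}"
    return {
        "endpoint": f"{base}/v1/scrape",
        "headers": headers,
        "json": {"url": url, "formats": ["markdown"]},
    }


# Managed endpoint: only the key is set, so the default base URL applies.
cloud = build_scrape_request("https://example.com", {"CRW_API_KEY": "demo-key"})
# Self-hosted: only the base URL changes (placeholder address).
local = build_scrape_request("https://example.com", {"CRW_BASE_URL": "http://localhost:3000/"})
```

The request body is identical in both cases; the result can be passed to requests.post(req["endpoint"], headers=req["headers"], json=req["json"], timeout=30).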
Returning clean markdown to the code agent
The reason to return markdown rather than raw HTML is that the code agent will pass this string straight into the model's context. HTML burns tokens on tags, scripts, and nav chrome the model has to ignore; fastCRW's extraction strips the page down to the article body. The accuracy of that strip is exactly what truth-recall measures (see below) — and it directly decides how much of the real content your agent gets to reason over.
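A toy comparison shows how much of a raw page is markup the model would have to ignore. The extractor below is a deliberately naive standard-library stand-in, not fastCRW's extraction, and the sample page is invented for the example.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Keep visible text, skipping script, style, and nav blocks."""

    SKIP = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many skipped elements we are currently inside
        self.chunks = []  # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


PAGE = (
    "<html><head><style>body{margin:0}</style>"
    "<script>window.track('pageview')</script></head>"
    "<body><nav><a href='/'>Home</a><a href='/docs'>Docs</a></nav>"
    "<article><h1>Release notes</h1><p>Version 2 adds search.</p></article>"
    "</body></html>"
)

text = visible_text(PAGE)
ratio = len(text) / len(PAGE)  # fraction of the raw page that is article text
```

Even on this tiny page the article text is a small fraction of the raw HTML; real pages with heavy scripts and navigation are usually far more lopsided.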
Adding /v1/search for discovery
A research agent usually does not start with a URL — it starts with a question. Add a second tool over /v1/search so the agent can discover URLs before scraping:
@tool
def web_search(query: str) -> str:
    """Search the web and return the top result URLs and snippets.

    Args:
        query: A natural-language search query.
    """
    r = requests.post(
        f"{BASE}/v1/search",
        headers={"Authorization": f"Bearer {KEY}"},
        json={"query": query, "limit": 5},
        timeout=30,
    )
    r.raise_for_status()
    # One line per result: the URL, then its description.
    return "\n".join(f"{x['url']} — {x.get('description', '')}" for x in r.json()["data"])
Search costs 1 credit per query; the agent can then feed any returned URL to scrape_page. For a Python-side getting-started walkthrough of these endpoints, see the Python scraping quickstart.
Zero-bloat infrastructure
Single ~8 MB AGPL-3.0 binary, 1 container
fastCRW's engine is one statically linked Rust binary: no Redis, no Node.js, no separate worker tier. The Docker image is roughly 8 MB and runs as a single container (the default Compose file ships the lightweight lightpanda renderer; chrome is opt-in). Compare that to a scraper stack that wants an API service, a worker pool, a queue, a datastore, and a browser runtime: five containers and a couple of gigabytes. For a framework whose whole identity is "small," that footprint difference is the point. We unpack it further in single-binary infra and low-memory scraping.
Self-host locally with the Python SDK crw
If you want the web layer to cost $0 and never leave your machine, skip the cloud entirely. The crw Python SDK runs a self-contained local engine, so your smolagents tool can call it without any external service:
from crw import CrwClient
from smolagents import tool

client = CrwClient()  # runs a local engine, no API key, no egress

@tool
def scrape_local(url: str) -> str:
    """Scrape a URL locally and return markdown.

    Args:
        url: The absolute URL to scrape.
    """
    return client.scrape(url, formats=["markdown"]).markdown
The engine is AGPL-3.0, so self-hosting is free — you pay only for the box it runs on, and a $5 VPS is plenty for a single-agent workload.
Footprint vs a heavy multi-container stack
| Dimension | fastCRW | Typical heavy scraper |
|---|---|---|
| Docker image | single ~8 MB binary | ~2–3 GB total |
| Containers | 1 (+ optional sidecar) | 5 |
| Runtime deps | none (static Rust) | Node.js, queue, datastore, browser |
| Local mode | yes — CrwClient() | cloud-only or heavy compose |
These are structural facts from the repo README, not load-test numbers — they describe what each system is, not how it performed on a given day.
A worked example: a research code agent
Search, scrape, summarize loop
Wire both tools into a CodeAgent and the agent will compose them itself — the framework's whole appeal is that you do not script the loop, the model writes Python that does:
agent = CodeAgent(
    tools=[web_search, scrape_page],
    model=InferenceClientModel(),
)
answer = agent.run("Summarize the latest changes in the Rust 2024 edition.")
Internally the agent will typically call web_search, pick a couple of promising URLs, call scrape_page on each, and synthesize an answer from the markdown — all as generated Python, which is exactly what smolagents is built to run.
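For intuition, the Python a code agent generates for this loop might resemble the sketch below. It is illustrative only: real agent output varies from run to run, the research and parse_urls helpers are hypothetical, and the two fake tools stand in for web_search and scrape_page so the example runs offline.

```python
from typing import Callable, List


def parse_urls(search_output: str) -> List[str]:
    """Take the first whitespace-separated token of each line as the URL."""
    urls = []
    for line in search_output.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0].startswith(("http://", "https://")):
            urls.append(parts[0])
    return urls


def research(question: str, search: Callable[[str], str],
             scrape: Callable[[str], str], top_k: int = 2) -> str:
    """Search, scrape the first top_k hits, and return the notes to summarize."""
    urls = parse_urls(search(question))[:top_k]
    pages = [scrape(u) for u in urls]
    return "\n\n".join(f"# {u}\n{p}" for u, p in zip(urls, pages))


def fake_search(query: str) -> str:
    # Mimics the URL-then-description lines returned by web_search.
    return "\n".join([
        "https://example.com/a \u2014 first result",
        "https://example.com/b \u2014 second result",
        "https://example.com/c \u2014 third result",
    ])


def fake_scrape(url: str) -> str:
    return f"markdown for {url}"


notes = research("example question", fake_search, fake_scrape)
```

In a real run the model decides how many URLs to follow and then writes the summary itself; the fixed top_k here is only a stand-in for that judgment.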
Iterating URLs since requests are stateless
fastCRW is stateless per request: there is no session that remembers the last page or carries cookies between calls. For a research agent that is usually fine — each scrape is independent — but it means you own the loop. If the agent needs five pages, it makes five scrape_page calls; there is no single batch-extract call that takes a list of URLs (more on that below). For crawling a whole site rather than hand-picked pages, use /v1/crawl, which walks the site and bills 1 credit per page.
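Because each scrape is independent, the loop parallelizes cleanly. The following is one possible sketch using a thread pool. The helper scrape_many is hypothetical, the worker count of 4 is an arbitrary assumption rather than a documented limit, and fake_scrape stands in for any callable that maps a URL to text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def scrape_many(urls: Iterable[str], scrape: Callable[[str], str],
                max_workers: int = 4) -> Dict[str, str]:
    """Scrape each URL independently; map URL to markdown or an error note."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order

    def safe(url: str) -> str:
        try:
            return scrape(url)
        except Exception as exc:  # one bad page should not sink the batch
            return f"[scrape failed: {exc}]"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe, unique))  # map preserves input order
    return dict(zip(unique, results))


def fake_scrape(url: str) -> str:
    if "bad" in url:
        raise RuntimeError("boom")
    return f"md:{url}"


out = scrape_many(["https://a.test", "https://bad.test", "https://a.test"], fake_scrape)
```

Each page still costs one request; the pool only overlaps the waiting, which matters most when a few pages land in the slow tail.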
Accuracy and latency, disclosed
Highest truth-recall of the three tools tested
Against Firecrawl's own public scrape-content-dataset-v1 (1,000 URLs, 819 of them carrying labeled ground truth), fastCRW recovered the most labeled content of the three scrapers measured: 63.74% truth-recall (522 of 819 labeled URLs), versus Crawl4AI's 59.95% and Firecrawl's 56.04% (diagnose_3way.py, single run of 3,000 requests, 2026-05-08). We pair that with the honest companions from the same run: an 87.7% scrape-success rate (Firecrawl edged it at 89.7%) and 0 thrown errors across all 3,000 requests. For a code agent, recall is the number that matters — content the scraper missed is content the model never sees, and the answer degrades silently.
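The headline percentage can be re-derived from the counts quoted above. This sketch uses only the figures stated in this section; the variable names are ours.

```python
LABELED_URLS = 819   # URLs carrying labeled ground truth
FASTCRW_HITS = 522   # labeled URLs fastCRW recovered

recall_pct = round(100 * FASTCRW_HITS / LABELED_URLS, 2)
missed = LABELED_URLS - FASTCRW_HITS

# Reported truth-recall per tool, in percent.
reported = {"fastCRW": 63.74, "Crawl4AI": 59.95, "Firecrawl": 56.04}

# Lead over each other tool, in percentage points.
gaps = {
    name: round(reported["fastCRW"] - pct, 2)
    for name, pct in reported.items()
    if name != "fastCRW"
}
```

So the lead is 3.79 points over Crawl4AI and 7.7 points over Firecrawl, with 297 labeled URLs still missed, all from a single run.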
p50 win, p90 tail honesty
On latency the picture is genuinely mixed, and we publish the full split rather than a flattering average. fastCRW's median scrape latency was 1914 ms, beating Firecrawl's 2305 ms and effectively tied with Crawl4AI (1916 ms). But fastCRW's p90 was 14157 ms — the worst of the three (Crawl4AI 4754 ms, Firecrawl 6937 ms). That tail is causal, not incidental: the chrome-stealth fallback that recovers the hard URLs the other tools miss is the same mechanism that produces the slow tail. So budget a generous timeout (think tens of seconds, sized off the p90, not the p50) on your scrape_page tool, and the agent will tolerate the occasional slow page in exchange for the higher recall. Search is a separate, faster story: fastCRW search averaged 880 ms over a 100-query benchmark, with 73 of 100 latency wins against Firecrawl and Tavily (triple-bench.ts).
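One way to turn that advice into a number is to size the timeout from the p90 with some headroom. The 2x headroom factor below is an arbitrary assumption for illustration, not a recommendation from the benchmark, and timeout_seconds is a hypothetical helper.

```python
import math

# Scrape latencies in milliseconds from the 2026-05-08 run quoted above.
P50_MS = {"fastCRW": 1914, "Crawl4AI": 1916, "Firecrawl": 2305}
P90_MS = {"fastCRW": 14157, "Crawl4AI": 4754, "Firecrawl": 6937}


def timeout_seconds(p90_ms: int, headroom: float = 2.0) -> int:
    """Round p90 times the headroom factor up to whole seconds."""
    return math.ceil(p90_ms * headroom / 1000)


budget = timeout_seconds(P90_MS["fastCRW"])
# How much slower the p90 is than the median for fastCRW.
tail_ratio = round(P90_MS["fastCRW"] / P50_MS["fastCRW"], 1)
```

With these inputs the budget comes out at 29 seconds, close to the timeout=30 used in scrape_page, while sizing from the median instead would give only 4 seconds and cut off the slow pages that the fallback recovers.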
Limitations
No /v1/agent harness
fastCRW gives you scrape, crawl, map, and search — not an autonomous agent endpoint. There is no /v1/agent (no Spark-style models) and no /v1/deep-research. That is by design here: smolagents is your agent harness, so fastCRW only needs to be the web layer. If you wanted the scraper itself to run a multi-step research loop server-side, that is a Firecrawl-cloud capability fastCRW does not replicate — compose the loop in smolagents instead.
No batch /v1/extract
There is no multi-URL batched extract endpoint. The managed /v1/extract is a single-URL, 5-credit convenience wrapper over /v1/scrape with formats: ["json"]; self-hosters use /v1/scrape + jsonSchema directly, also single-URL. For many URLs you iterate scrape_page concurrently or run a crawl. Two more honest gaps worth knowing: screenshot output is not supported (a formats: ["screenshot"] request returns HTTP 422), and LLM-based JSON extraction supports OpenAI and Anthropic providers only (managed search answer mode defaults to DeepSeek). Where Firecrawl genuinely wins: its larger ecosystem, cloud-only anti-bot depth, batch extract, and the agent/deep-research endpoints. If you depend on those, stay on Firecrawl — and because the API is compatible, you can keep your smolagents tool and just change the base URL.
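Since a screenshot request is rejected with HTTP 422, a tool can fail fast before spending a round trip. This is a minimal client-side guard: check_formats is a hypothetical helper, and the unsupported set lists only the one format this section names.

```python
from typing import Iterable, List

# Only the format this guide states is rejected; extend as you verify others.
UNSUPPORTED_FORMATS = {"screenshot"}


def check_formats(formats: Iterable[str]) -> List[str]:
    """Return the formats unchanged, or raise if any is known to be unsupported."""
    requested = list(formats)
    rejected = sorted(set(requested) & UNSUPPORTED_FORMATS)
    if rejected:
        raise ValueError(f"unsupported formats for fastCRW: {', '.join(rejected)}")
    return requested


ok = check_formats(["markdown"])

try:
    check_formats(["markdown", "screenshot"])
    error = None
except ValueError as exc:
    error = str(exc)
```

Raising a clear ValueError inside the tool gives a code agent a readable message to react to, rather than an opaque HTTP error surfaced by raise_for_status().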
Sources
- fastCRW canonical fact sheet: internal benchmark of record (bench/server-runs/RESULT_3WAY_1000_FULL.md, diagnose_3way.py, 2026-05-08; benchmarks/triple-bench.ts, 100 queries).
- fastCRW open-source engine and README: github.com/us/crw (AGPL-3.0).
- Hugging Face smolagents documentation: github.com/huggingface/smolagents.
- Live pricing and credit costs: /pricing.
Related: Python scraping quickstart · Single-binary infra · Low-memory scraping
