Smolagents + fastCRW: Web Grounding, Zero Bloat

Add web search and scraping to Hugging Face smolagents with fastCRW: a single ~8 MB binary keeps the stack lean, and it posted the highest truth-recall of the three scrapers benchmarked.

fastcrw
June 7, 2026 · 9 min read · Last updated: June 2, 2026

By the fastCRW team · Benchmark figures verified 2026-05-18 against the 2026-05-08 run · Verify independently before quoting internally.

Smolagents + fastCRW: web grounding without the bloat

If you reached for Hugging Face smolagents, you did it on purpose: it is a deliberately tiny agent framework — a few thousand lines, code agents that write and run Python rather than emit JSON blobs, minimal dependencies. The fastest way to ruin that is to bolt a multi-gigabyte web-data service onto the side of it. This guide wires smolagents to fastCRW for web search and scraping while keeping the whole stack lean: fastCRW is a single ~8 MB AGPL-3.0 Rust binary running in one container, it exposes a Firecrawl-compatible REST API you can reach with a base-URL swap, and it posts the highest truth-recall of the three scrapers we benchmarked (63.74% of 819 labeled URLs, diagnose_3way.py, 2026-05-08).

Disclosure: we build fastCRW. This is a vendor-authored tutorial, so weight it accordingly — but the limitations section below states plainly what fastCRW does not do, and where Firecrawl genuinely wins, so you can decide on evidence rather than marketing.

Smolagents' minimalist philosophy and the web

Code agents that call tools as Python

A smolagents CodeAgent does not pick tools from a menu of JSON schemas; it writes Python that calls your tools as ordinary functions, runs that code, observes the result, and iterates. A tool is just a Python callable decorated with @tool and a docstring. That means your web layer should look like a normal function that returns clean text — not a heavyweight SDK with its own runtime, queue, and browser pool. fastCRW fits that shape: one HTTP call, markdown back.

Why a lean web backend fits the smolagents ethos

The smolagents pitch is that you can read the whole framework in an afternoon and run it anywhere. A web-data dependency that needs five containers and a couple of gigabytes of RAM breaks that promise — your "tiny agent" now drags a platform-team-sized stack behind it. fastCRW is the opposite: a single statically-linked binary, no Redis, no Node.js, no browser farm required for the common case. The README labels the footprint as a structural fact (one ~8 MB binary / 1 container vs Firecrawl's ~2–3 GB across 5 containers), not a benchmark, so it holds regardless of load.

Write a fastCRW tool for smolagents

A @tool function calling the REST API

fastCRW speaks a Firecrawl-compatible REST surface, so the call is a plain POST /v1/scrape. Point it at your managed endpoint (https://fastcrw.com) or a locally self-hosted engine — the only difference is the base URL. Here is the whole tool:

  • import requests and the smolagents @tool decorator.
  • Read the base URL and key from environment so the same tool works against cloud or local.
  • Return data.markdown — clean, LLM-ready text — and nothing else.

In code:

from smolagents import tool, CodeAgent, InferenceClientModel
import os
import requests

# The same tool works against cloud or a self-hosted engine: only the base URL changes.
BASE = os.environ.get("CRW_BASE_URL", "https://fastcrw.com")
KEY = os.environ["CRW_API_KEY"]  # raises KeyError if the key is not set

@tool
def scrape_page(url: str) -> str:
  """Fetch a web page and return its main content as clean markdown.

  Args:
    url: The absolute URL to scrape.
  """
  # Firecrawl-compatible scrape call; ask for markdown only.
  r = requests.post(f"{BASE}/v1/scrape",
    headers={"Authorization": f"Bearer {KEY}"},
    json={"url": url, "formats": ["markdown"]}, timeout=30)
  r.raise_for_status()  # surface HTTP errors rather than parsing an error body
  return r.json()["data"]["markdown"]

That is the entire integration. Because fastCRW mirrors Firecrawl's request shape, anyone already calling Firecrawl from a smolagents tool can switch by changing BASE — no rewrite. If you prefer the Python SDK over raw requests, the crw package (PyPI) exposes CrwClient() and can run a self-contained local engine, which we use below for the zero-cloud variant.
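Since the only difference between cloud and self-hosted is the base URL, it can help to resolve the endpoint and headers in one place. The following is a minimal sketch, not part of fastCRW or its SDK: `resolve_crw_config` is a hypothetical helper, and it assumes that a self-hosted engine with no auth configured accepts requests without an Authorization header.

```python
import os
from typing import Mapping

DEFAULT_BASE = "https://fastcrw.com"  # managed endpoint named in this guide


def resolve_crw_config(env: Mapping[str, str] = os.environ) -> tuple[str, dict[str, str]]:
    """Return (base_url, headers) for fastCRW calls from environment-style settings."""
    # Strip a trailing slash so f"{base}/v1/scrape" never produces a double slash.
    base = env.get("CRW_BASE_URL", DEFAULT_BASE).rstrip("/")
    key = env.get("CRW_API_KEY", "")
    # Assumption: an engine without auth configured tolerates a missing header,
    # so the header is only sent when a key is actually present.
    headers = {"Authorization": f"Bearer {key}"} if key else {}
    return base, headers
```

A tool body can then call `base, headers = resolve_crw_config()` and post to `f"{base}/v1/scrape"`, so switching backends stays a pure environment change.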

Returning clean markdown to the code agent

The reason to return markdown rather than raw HTML is that the code agent will pass this string straight into the model's context. HTML burns tokens on tags, scripts, and nav chrome the model has to ignore; fastCRW's extraction strips the page down to the article body. The accuracy of that strip is exactly what truth-recall measures (see below) — and it directly decides how much of the real content your agent gets to reason over.

Adding /v1/search for discovery

A research agent usually does not start with a URL — it starts with a question. Add a second tool over /v1/search so the agent can discover URLs before scraping:

@tool
def web_search(query: str) -> str:
  """Search the web and return the top result URLs and snippets.
  Args:
    query: A natural-language search query."""
  r = requests.post(f"{BASE}/v1/search",
    headers={"Authorization": f"Bearer {KEY}"},
    json={"query": query, "limit": 5}, timeout=30)
  r.raise_for_status()
  return "\n".join(f"{x['url']} — {x.get('description','')}" for x in r.json()["data"])

Search costs 1 credit per query; the agent can then feed any returned URL to scrape_page. For a Python-side getting-started walkthrough of these endpoints, see the Python scraping quickstart.

Zero-bloat infrastructure

Single ~8 MB AGPL-3.0 binary, 1 container

fastCRW's engine is one statically-linked Rust binary — no Redis, no Node.js, no separate worker tier. The Docker image is roughly 8 MB and runs as a single container (the default Compose ships the lightweight lightpanda renderer; chrome is opt-in). Compare that to a scraper stack that wants an API service, a worker pool, a queue, a datastore, and a browser runtime — five containers and a couple of gigabytes. For a framework whose whole identity is "small," that footprint difference is the point. We unpack it further in single-binary infra and low-memory scraping.

Self-host locally with the Python SDK crw

If you want the web layer to cost $0 and never leave your machine, skip the cloud entirely. The crw Python SDK runs a self-contained local engine, so your smolagents tool can call it without any external service:

from crw import CrwClient
from smolagents import tool

client = CrwClient() # runs a local engine, no API key, no egress

@tool
def scrape_local(url: str) -> str:
  """Scrape a URL locally and return markdown.

  Args:
    url: The absolute URL to scrape.
  """
  return client.scrape(url, formats=["markdown"]).markdown

The engine is AGPL-3.0, so self-hosting is free — you pay only for the box it runs on, and a $5 VPS is plenty for a single-agent workload.

Footprint vs a heavy multi-container stack

| Dimension | fastCRW | Typical heavy scraper |
| --- | --- | --- |
| Docker image | single ~8 MB binary | ~2–3 GB total |
| Containers | 1 (+ optional sidecar) | 5 |
| Runtime deps | none (static Rust) | Node.js, queue, datastore, browser |
| Local mode | yes, via CrwClient() | cloud-only or heavy compose |

These are structural facts from the repo README, not load-test numbers — they describe what each system is, not how it performed on a given day.

A worked example: a research code agent

Search, scrape, summarize loop

Wire both tools into a CodeAgent and the agent will compose them itself — the framework's whole appeal is that you do not script the loop, the model writes Python that does:

agent = CodeAgent(tools=[web_search, scrape_page],
  model=InferenceClientModel())
answer = agent.run("Summarize the latest changes in the Rust 2024 edition.")

Internally the agent will typically call web_search, pick a couple of promising URLs, call scrape_page on each, and synthesize an answer from the markdown — all as generated Python, which is exactly what smolagents is built to run.
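To make that loop concrete, here is a rough sketch of the kind of Python a code agent might write for itself. It is illustrative only: `extract_urls` and `research` are hypothetical names, the search and scrape functions are passed in as plain callables, and it assumes each search result line starts with the URL, as the web_search tool above formats it.

```python
from typing import Callable


def extract_urls(search_output: str, limit: int = 2) -> list[str]:
    """Pull the leading URL from each result line of web_search-style output."""
    urls = []
    for line in search_output.splitlines():
        parts = line.split()
        # Keep only lines whose first token looks like an absolute URL.
        if parts and parts[0].startswith(("http://", "https://")):
            urls.append(parts[0])
    return urls[:limit]


def research(
    question: str,
    search: Callable[[str], str],
    scrape: Callable[[str], str],
    limit: int = 2,
) -> str:
    """Search, scrape the top hits, and join the markdown for the model to summarize."""
    pages = [scrape(url) for url in extract_urls(search(question), limit)]
    return "\n\n---\n\n".join(pages)
```

In practice the agent decides how many results to follow and may re-search when a page turns out to be irrelevant; the fixed `limit` here only keeps the sketch short.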

Iterating URLs since requests are stateless

fastCRW is stateless per request: there is no session that remembers the last page or carries cookies between calls. For a research agent that is usually fine — each scrape is independent — but it means you own the loop. If the agent needs five pages, it makes five scrape_page calls; there is no single batch-extract call that takes a list of URLs (more on that below). For crawling a whole site rather than hand-picked pages, use /v1/crawl, which walks the site and bills 1 credit per page.
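Because each scrape is independent, the loop parallelizes easily. The sketch below uses only the Python standard library; `scrape_many` is a hypothetical helper, and the `scrape` argument stands in for any single-URL function such as the scrape_page tool's underlying call. The worker count of 4 is an arbitrary illustration, not a documented fastCRW limit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional


def scrape_many(
    urls: Iterable[str],
    scrape: Callable[[str], str],
    max_workers: int = 4,
) -> dict[str, Optional[str]]:
    """Scrape each URL independently; map url -> markdown, or None on failure."""
    results: dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One stateless request per URL; remember which future belongs to which URL.
        futures = {pool.submit(scrape, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failed page should not sink the rest of the batch.
                results[url] = None
    return results
```

Returning None for failures lets the agent see which pages are missing and decide whether to retry them or move on.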

Accuracy and latency, disclosed

Highest truth-recall of the three tools tested

Against Firecrawl's own public scrape-content-dataset-v1 (1,000 URLs, 819 of them carrying labeled ground truth), fastCRW recovered the most labeled content of the three scrapers measured: 63.74% truth-recall (522 of 819 labeled URLs), versus Crawl4AI's 59.95% and Firecrawl's 56.04% (diagnose_3way.py, single run of 3,000 requests, 2026-05-08). We pair that with the honest companions from the same run: an 87.7% scrape-success rate (Firecrawl edged it at 89.7%) and 0 thrown errors across all 3,000 requests. For a code agent, recall is the number that matters — content the scraper missed is content the model never sees, and the answer degrades silently.

p50 win, p90 tail honesty

On latency the picture is genuinely mixed, and we publish the full split rather than a flattering average. fastCRW's median scrape latency was 1914 ms, beating Firecrawl's 2305 ms and effectively tied with Crawl4AI (1916 ms). But fastCRW's p90 was 14157 ms — the worst of the three (Crawl4AI 4754 ms, Firecrawl 6937 ms). That tail is causal, not incidental: the chrome-stealth fallback that recovers the hard URLs the other tools miss is the same mechanism that produces the slow tail. So budget a generous timeout (think tens of seconds, sized off the p90, not the p50) on your scrape_page tool, and the agent will tolerate the occasional slow page in exchange for the higher recall. Search is a separate, faster story: fastCRW search averaged 880 ms over a 100-query benchmark, with 73 of 100 latency wins against Firecrawl and Tavily (triple-bench.ts).
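One way to turn "sized off the p90" into a number is to multiply the measured p90 by a headroom factor and enforce a floor. This is a sketch under stated assumptions: `timeout_seconds` is a hypothetical helper, and the 2x headroom and 10-second floor are illustrative defaults, not fastCRW recommendations.

```python
def timeout_seconds(p90_ms: float, headroom: float = 2.0, floor_s: float = 10.0) -> float:
    """Derive a request timeout in seconds from a measured p90 latency in milliseconds."""
    # Convert to seconds, leave room above the p90, and never go below the floor.
    return max(floor_s, p90_ms / 1000.0 * headroom)


# With the p90 reported above (14157 ms) this gives roughly 28 seconds,
# in line with the "tens of seconds" guidance.
SCRAPE_TIMEOUT = timeout_seconds(14157)
```

The requests library also accepts a `(connect, read)` tuple for `timeout`, so a call such as `timeout=(5, SCRAPE_TIMEOUT)` keeps connection failures fast while still tolerating slow pages; the 5-second connect value is again just an example.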

Limitations

No /v1/agent harness

fastCRW gives you scrape, crawl, map, and search — not an autonomous agent endpoint. There is no /v1/agent (no Spark-style models) and no /v1/deep-research. That is by design here: smolagents is your agent harness, so fastCRW only needs to be the web layer. If you wanted the scraper itself to run a multi-step research loop server-side, that is a Firecrawl-cloud capability fastCRW does not replicate — compose the loop in smolagents instead.

No batch /v1/extract

There is no multi-URL batched extract endpoint. The managed /v1/extract is a single-URL, 5-credit convenience wrapper over /v1/scrape with formats: ["json"]; self-hosters use /v1/scrape + jsonSchema directly, also single-URL. For many URLs you iterate scrape_page concurrently or run a crawl.

Two more honest gaps worth knowing: screenshot output is not supported (a formats: ["screenshot"] request returns HTTP 422), and LLM-based JSON extraction supports OpenAI and Anthropic providers only (managed search answer mode defaults to DeepSeek).

Where Firecrawl genuinely wins: its larger ecosystem, cloud-only anti-bot depth, batch extract, and the agent/deep-research endpoints. If you depend on those, stay on Firecrawl. Because the API is compatible, you can keep your smolagents tool and just change the base URL.
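The single-URL JSON extraction path can be sketched as a request body for /v1/scrape. Treat the details as assumptions: this guide names formats: ["json"] and jsonSchema but does not show the exact body, so placing `jsonSchema` at the top level is a guess to check against the API reference. Both helper names are hypothetical.

```python
UNSUPPORTED_FORMATS = {"screenshot"}  # per this guide, such requests return HTTP 422


def build_json_scrape_payload(url: str, json_schema: dict) -> dict:
    """Build a single-URL /v1/scrape body that asks for schema-guided JSON."""
    # Assumption: "jsonSchema" is a top-level field of the request body.
    return {"url": url, "formats": ["json"], "jsonSchema": json_schema}


def check_formats(formats: list[str]) -> list[str]:
    """Fail fast locally rather than spend a request on a format that will be rejected."""
    bad = UNSUPPORTED_FORMATS.intersection(formats)
    if bad:
        raise ValueError(f"unsupported formats: {sorted(bad)}")
    return formats
```

For several URLs, build one payload per URL and send them one at a time or concurrently, since no batched form exists.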

Sources

  • fastCRW canonical fact sheet — internal benchmark of record (bench/server-runs/RESULT_3WAY_1000_FULL.md, diagnose_3way.py, 2026-05-08; benchmarks/triple-bench.ts, 100 queries).
  • fastCRW open-source engine and README: github.com/us/crw (AGPL-3.0).
  • Hugging Face smolagents documentation: github.com/huggingface/smolagents.
  • Live pricing and credit costs: /pricing.

Related: Python scraping quickstart · Single-binary infra · Low-memory scraping

Frequently asked questions

How do I add web access to a smolagents code agent?

Write a Python function decorated with smolagents' @tool that POSTs to fastCRW's Firecrawl-compatible REST API (/v1/scrape for a known URL, /v1/search for discovery) and returns the cleaned markdown. Pass the tool(s) to a CodeAgent and the agent will call them as ordinary Python during its loop. The whole integration is a few lines; no SDK or extra runtime is required.

How big is the fastCRW footprint?

fastCRW's engine is a single statically-linked Rust binary: roughly an 8 MB Docker image running in one container, with no Redis, Node.js, or browser farm required for the common case. The repo README frames this as a structural fact versus a heavier scraper stack of about 2–3 GB across five containers. For a minimalist framework like smolagents, that keeps the stack lean.

Can smolagents run fastCRW locally with no cloud?

Yes. The crw Python SDK exposes CrwClient(), which runs a self-contained local engine with no API key and no network egress. Call it from inside your @tool function and scraped content never leaves your machine. The engine is AGPL-3.0, so self-hosting is free; you pay only for the server it runs on.

Does fastCRW give clean markdown or raw HTML?

Request formats: ['markdown'] and fastCRW returns the main content as clean, LLM-ready markdown with nav chrome, scripts, and boilerplate stripped, which saves tokens and gives the code agent better context than raw HTML. JSON-schema structured extraction is also available via formats: ['json'] (5 credits, single URL, OpenAI/Anthropic providers).

Is fastCRW more accurate than Firecrawl at extraction?

On the one public benchmark we ran, yes: against Firecrawl's own scrape-content-dataset-v1, fastCRW posted the highest truth-recall of the three tools tested: 63.74% of 819 labeled URLs versus Crawl4AI 59.95% and Firecrawl 56.04% (diagnose_3way.py, 2026-05-08). That is a single run on one dataset, not a universal guarantee, and Firecrawl edged scrape-success (89.7% vs 87.7%). fastCRW's p90 latency tail (14157 ms) is also the worst of the three, the cost of the chrome-stealth fallback that recovers the missed URLs.
