LangChain Web Scraping Integration — fastCRW [Firecrawl-Compatible]
Wire LangChain document loaders into fastCRW with a single api_url override. Same Firecrawl-compatible API, 6.6 MB RAM runtime, 92% coverage on the 1,000-URL benchmark.
Use fastCRW as a drop-in replacement for Firecrawl inside LangChain document loaders, agent tools, and retrieval pipelines.
Why LangChain + fastCRW
LangChain is the dominant orchestration layer for retrieval pipelines and agent tools. fastCRW slots underneath as the scraping primitive that keeps the LangChain stack honest about latency and memory. The LangChain community already standardized around the Firecrawl document loader interface, and fastCRW is Firecrawl-compatible by design — so plugging fastCRW into a LangChain project is a one-line change. You keep every chain, retriever, and agent loop you already wrote, and you replace the heavy Firecrawl runtime with a 6.6 MB binary that hits 92% coverage at 833 ms average latency on our 1,000-URL benchmark.
Setup
- Install LangChain and the community loaders package.
- Sign up at fastcrw.com and grab an API key from the dashboard.
- Export the key as `FASTCRW_API_KEY` in your shell or `.env` file.
- Point the existing Firecrawl loader at the fastCRW base URL via the `api_url` argument.
```shell
pip install -U langchain langchain-community
export FASTCRW_API_KEY="fcrw_..."
```
You do not need a separate fastCRW LangChain package. The standard `FireCrawlLoader` already accepts a custom `api_url` because the fastCRW endpoints are wire-compatible with Firecrawl.
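If the key is missing, the loader fails deep inside a request call with an unhelpful traceback. A small convenience helper (the `require_api_key` name is ours, not part of any fastCRW or LangChain API) can fail fast with a readable message instead:

```python
import os


def require_api_key(var: str = "FASTCRW_API_KEY") -> str:
    """Read the fastCRW API key from the environment, failing fast if absent."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it or add it to your .env file")
    return key
```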
Code Example
```python
import os

from langchain_community.document_loaders import FireCrawlLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# fastCRW is Firecrawl-compatible. Override api_url and the rest is identical.
loader = FireCrawlLoader(
    url="https://example.com/blog",
    api_key=os.environ["FASTCRW_API_KEY"],
    api_url="https://api.fastcrw.com",
    mode="scrape",  # or "crawl"
)
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

print(f"Loaded {len(docs)} document(s) from fastCRW")
print(f"Split into {len(chunks)} chunks for the LangChain vector store")
```
For a LangChain agent tool that calls fastCRW for ad-hoc scraping inside a reasoning loop:
```python
import os

import requests
from langchain_core.tools import tool


@tool
def fastcrw_scrape(url: str) -> str:
    """Scrape a URL via fastCRW and return the Markdown."""
    response = requests.post(
        "https://api.fastcrw.com/v1/scrape",
        headers={"Authorization": f"Bearer {os.environ['FASTCRW_API_KEY']}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["data"]["markdown"]
```
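If you would rather surface readable errors to the agent than a bare `KeyError` on an unexpected response, the JSON handling can be pulled into a small parsing helper. A sketch (the `extract_markdown` name is ours; the `success`/`data.markdown` nesting follows the Firecrawl wire format that fastCRW mirrors):

```python
def extract_markdown(payload: dict) -> str:
    """Pull the Markdown body out of a Firecrawl-style scrape response.

    The response nests content under data.markdown and carries a top-level
    success flag, per the Firecrawl wire format fastCRW is compatible with.
    """
    if not payload.get("success", False):
        raise ValueError(f"scrape failed: {payload.get('error', 'unknown error')}")
    markdown = payload.get("data", {}).get("markdown")
    if markdown is None:
        raise KeyError("response has no data.markdown; was the markdown format requested?")
    return markdown
```

Returning the error string to the agent (instead of raising) is also reasonable, since LangChain tool exceptions otherwise abort the reasoning loop.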
When to Use This
- RAG ingestion — feed LangChain vector stores from live URLs without standing up a separate scraping service.
- LangChain agents that browse — give an agent a `fastcrw_scrape` tool so it can fetch arbitrary pages mid-reasoning.
- Document loaders for evals — run fastCRW inside LangChain pipelines to build evaluation datasets from real web content.
- Migrating from Firecrawl — keep the LangChain code unchanged and swap only the API base URL to cut runtime cost.
Limits + Gotchas
- The
FirecrawlLoadermode argument supports"scrape"and"crawl". For deep-crawl jobs, prefer fastCRW crawl with explicitmaxDepthto keep token spend bounded. - LangChain document metadata is derived from the fastCRW response. If you depend on a specific Firecrawl metadata field that we have not yet shipped, file an issue.
- LangChain JS uses the `@langchain/community` package. The same `apiUrl` override applies, but the field names follow camelCase.
- Long-running crawls inside a LangChain agent loop can blow the agent's iteration budget. Run crawls outside the agent and pass the results back through context.
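For the bounded-crawl advice above, it helps to build the crawl body in one place so the caps are never forgotten. A sketch (the `build_crawl_payload` helper is ours; the `maxDepth` and `limit` field names follow the Firecrawl crawl API that fastCRW mirrors):

```python
def build_crawl_payload(url: str, max_depth: int = 2, limit: int = 50) -> dict:
    """Build a bounded crawl request body for POST /v1/crawl.

    maxDepth and limit cap how far the crawl fans out, which keeps the
    token spend of downstream LangChain chains predictable.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    return {"url": url, "maxDepth": max_depth, "limit": limit}
```

Run the crawl with this payload outside the agent loop, then hand the resulting documents to the agent through context.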