
LangChain Web Scraping Integration — fastCRW [Firecrawl-Compatible]

Wire LangChain document loaders into fastCRW with a single api_url override. Same Firecrawl-compatible API, 6.6 MB RAM runtime, 92% coverage on the 1,000-URL benchmark.

Published: April 29, 2026
Updated: April 29, 2026
Category: integrations
Verdict

Use fastCRW as a drop-in replacement for Firecrawl inside LangChain document loaders, agent tools, and retrieval pipelines.

  • Drop-in FireCrawlLoader replacement via api_url override
  • Markdown output ready for embedding and chunking
  • Works with LangChain agents, retrievers, and RAG chains
  • 6.6 MB RAM runtime instead of multi-hundred MB scraper containers

Why LangChain + fastCRW

LangChain is the dominant orchestration layer for retrieval pipelines and agent tools. fastCRW slots underneath as the scraping primitive that keeps the LangChain stack honest about latency and memory. The LangChain community already standardized around the Firecrawl document loader interface, and fastCRW is Firecrawl-compatible by design — so plugging fastCRW into a LangChain project is a one-line change. You keep every chain, retriever, and agent loop you already wrote, and you replace the heavy Firecrawl runtime with a 6.6 MB binary that hits 92% coverage at 833 ms average latency on our 1,000-URL benchmark.

Setup

  1. Install LangChain and the community loaders package.
  2. Sign up at fastcrw.com and grab an API key from the dashboard.
  3. Export the key as FASTCRW_API_KEY in your shell or .env file.
  4. Point the existing Firecrawl loader at the fastCRW base URL via the api_url argument.
pip install -U langchain langchain-community
export FASTCRW_API_KEY="fcrw_..."

You do not need a separate fastCRW LangChain package. The standard FireCrawlLoader already accepts a custom api_url because the fastCRW endpoints are wire-compatible with Firecrawl.
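If you want to see exactly what "wire-compatible" means, the request the loader issues boils down to a bearer-authorized POST with a small JSON body. Here is a minimal sketch that builds (but does not send) that request; the endpoint and field set mirror the /v1/scrape call used in the agent-tool snippet on this page, and the exact envelope should be treated as an assumption:

```python
import os

# Sketch of the Firecrawl-compatible request the loader issues under the hood.
# Endpoint path and JSON fields mirror the /v1/scrape call used elsewhere on
# this page; treat the exact field set as an assumption, not a spec.
def build_scrape_request(url: str, api_url: str = "https://api.fastcrw.com") -> dict:
    api_key = os.environ.get("FASTCRW_API_KEY", "fcrw_placeholder")
    return {
        "endpoint": f"{api_url}/v1/scrape",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"url": url, "formats": ["markdown"]},
    }

req = build_scrape_request("https://example.com/blog")
print(req["endpoint"])
```

Because only the base URL differs from Firecrawl, swapping providers is a matter of changing `api_url` and the API key, nothing else.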

Code Example

import os
from langchain_community.document_loaders import FireCrawlLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# fastCRW is Firecrawl-compatible. Override api_url and the rest is identical.
loader = FireCrawlLoader(
    api_key=os.environ["FASTCRW_API_KEY"],
    api_url="https://api.fastcrw.com",
    url="https://example.com/blog",
    mode="scrape",  # or "crawl"
)

docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

print(f"Loaded {len(docs)} document(s) from fastCRW")
print(f"Split into {len(chunks)} chunks for the LangChain vector store")

For a LangChain agent tool that calls fastCRW for ad-hoc scraping inside a reasoning loop:

import os

import requests
from langchain_core.tools import tool

@tool
def fastcrw_scrape(url: str) -> str:
    """Scrape a URL via fastCRW and return the Markdown."""
    response = requests.post(
        "https://api.fastcrw.com/v1/scrape",
        headers={"Authorization": f"Bearer {os.environ['FASTCRW_API_KEY']}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["data"]["markdown"]
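The tool above assumes the happy path. If a scrape fails partway, indexing directly into the response will raise a bare KeyError; a small defensive helper gives the agent a clearer error to reason about. This sketch assumes the Firecrawl-style {"data": {"markdown": ...}} envelope used in the tool above, and the fallback behavior is an illustrative choice, not fastCRW-mandated:

```python
# Defensive extraction from the Firecrawl-style response envelope.
# Assumes the {"data": {"markdown": ...}} shape used in the tool above;
# raising with the top-level keys makes agent-side failures easier to read.
def extract_markdown(payload: dict) -> str:
    data = payload.get("data") or {}
    markdown = data.get("markdown")
    if not markdown:
        raise ValueError(f"no markdown in response: keys={sorted(payload)}")
    return markdown

sample = {"success": True, "data": {"markdown": "# Example\n\nBody text."}}
print(extract_markdown(sample))
```

Inside the tool, you would call `extract_markdown(response.json())` in place of the direct key lookup.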

When to Use This

  • RAG ingestion — feed LangChain vector stores from live URLs without standing up a separate scraping service.
  • LangChain agents that browse — give an agent a fastcrw_scrape tool so it can fetch arbitrary pages mid-reasoning.
  • Document loaders for evals — run fastCRW inside LangChain pipelines to build evaluation datasets from real web content.
  • Migrating from Firecrawl — keep the LangChain code unchanged and swap only the API base URL to cut runtime cost.

Limits + Gotchas

  • The FireCrawlLoader mode argument supports "scrape" and "crawl". For deep-crawl jobs, prefer a fastCRW crawl with an explicit maxDepth to keep token spend bounded.
  • LangChain document metadata is derived from the fastCRW response. If you depend on a specific Firecrawl metadata field that we have not yet shipped, file an issue.
  • LangChain JS uses the @langchain/community package. The same apiUrl override applies, but the field names follow camelCase.
  • Long-running crawls inside a LangChain agent loop can blow the agent's iteration budget. Run crawls outside the agent and pass results back through context.
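One way to honor the last gotcha: run the crawl up front, then condense the pages into a bounded context string before handing it to the agent. A stdlib-only sketch, where the per-page dict shape and the character budget are illustrative assumptions, not a fastCRW contract:

```python
# Condense crawled pages into a bounded context block for an agent prompt.
# The page shape ({"url", "markdown"}) and max_chars budget are assumptions
# for illustration; tune the budget to your model's context window.
def build_context(pages: list[dict], max_chars: int = 8000) -> str:
    sections, used = [], 0
    for page in pages:
        block = f"## {page['url']}\n{page['markdown'].strip()}\n"
        if used + len(block) > max_chars:
            break  # stop before blowing the budget rather than truncating mid-page
        sections.append(block)
        used += len(block)
    return "\n".join(sections)

pages = [
    {"url": "https://example.com/a", "markdown": "Alpha content."},
    {"url": "https://example.com/b", "markdown": "Beta content."},
]
print(build_context(pages))
```

The agent then receives a fixed-size digest instead of issuing open-ended crawl calls inside its loop, so its iteration budget is spent on reasoning, not on waiting for pages.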
