By the fastCRW team · Benchmark and credit figures verified 2026-05-18 against the canonical fact sheet · Verify independently before relying on numbers.
Build a chat-with-website bot: the architecture that actually works
To build a chat-with-website bot you need five stages in sequence: ingest the site, chunk the text, embed the chunks, retrieve the relevant ones at query time, and answer with an LLM. Four of those five are solved problems with off-the-shelf LangChain components. The remaining one, ingestion, quietly decides whether your bot is useful or whether it confidently makes things up. This tutorial walks the whole pipeline in Python, using fastCRW for ingestion and LangChain for the RAG chain, and it is honest about where the cost and the limits actually live.
The reason ingestion matters most: a retrieval-augmented bot can only answer from the text you embedded. If your crawler hands you garbled HTML, half-rendered JavaScript, or navigation chrome instead of the actual content, no amount of prompt engineering downstream will fix it. That is why we lead with extraction quality rather than the chain wiring.
What a chat-with-website bot needs under the hood
Ingest, chunk, embed, retrieve, answer
The canonical RAG loop is: (1) ingest — turn web pages into clean text; (2) chunk — split that text into retrieval-sized pieces; (3) embed — convert chunks to vectors and store them; (4) retrieve — at query time, find the chunks nearest the question; (5) answer — feed the retrieved chunks plus the question to an LLM. LangChain gives you batteries-included primitives for steps 2 through 5. Step 1 is where most "chat with my docs" projects fall over.
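To make the retrieve step concrete, here is a toy sketch using only the standard library. It assumes a bag-of-words "embedding" and cosine similarity purely for illustration; a real bot would use a learned embedding model and a vector store, and the function names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks whose vectors are nearest the question's vector."""
    query_vector = embed(question)
    ranked = sorted(
        chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True
    )
    return ranked[:k]


chunks = [
    "To rotate an API key, open Settings and click Rotate.",
    "Billing is monthly.",
    "Webhooks retry three times.",
]
top = retrieve("How do I rotate an API key?", chunks, k=1)
```

The point of the sketch is the dependency it exposes: `retrieve` can only rank text that made it into `chunks`, so anything the crawler dropped is unreachable no matter how good the later steps are.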
Why clean ingestion is the part that breaks
Cleaner extraction is not a nicety here — it is the accuracy ceiling for the whole bot. In a 3-way scrape benchmark on Firecrawl's own public dataset, fastCRW had the highest truth-recall of the three tools tested: 63.74% of 819 labeled URLs, versus 59.95% for Crawl4AI and 56.04% for Firecrawl (diagnose_3way.py, 2026-05-08). Truth-recall is the share of labeled ground-truth content the scraper actually recovered. The higher that number, the more of each page's real content lands in your vector store — and the fewer times your bot answers "I don't know" or, worse, hallucinates a plausible-sounding answer from a half-scraped page.
From the same run: in fast mode, fastCRW's p90 latency is 4348 ms, the lowest of the three (Crawl4AI 4754 ms, Firecrawl 6937 ms). The chrome-stealth fallback that recovers the hard pages others miss is the same mechanism that wins on recall and keeps the tail competitive. For a chat-with-website bot, crawl latency matters little in any case: crawling is an offline batch job you run before any user asks a question, so tail latency on a handful of stubborn pages is invisible at chat time.
Step 1: Crawl the whole site with /v1/crawl
maxDepth and maxPages caps
The /v1/crawl endpoint runs an async breadth-first crawl from a seed URL and returns a job ID you poll for results. It accepts maxDepth (how many link-hops from the seed, cap 10) and maxPages (how many pages total, cap 1000). For a docs site or knowledge base, set both deliberately: maxDepth: 3 and maxPages: 200 is a sane starting point that keeps you from accidentally crawling an entire blog archive when you only wanted the product docs.
In Python, kick off the crawl and poll the job ID until it completes: start = requests.post(f"{BASE}/v1/crawl", headers=HEADERS, json={"url": url, "maxDepth": 3, "maxPages": 200, "scrapeOptions": {"formats": ["markdown"]}}).json(), grab job_id = start["id"], then loop on requests.get(f"{BASE}/v1/crawl/{job_id}", headers=HEADERS).json() with a short time.sleep(2) between polls until status["status"] equals "completed" (or "failed"), and read the pages from status["data"].
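The start-and-poll flow described above can be sketched as a single function. The endpoint paths, request body, and the `id`, `status`, and `data` fields come from the description above; the timeout, the `raise_for_status()` calls, and the injectable `session` and `sleep` parameters are assumptions added so the loop cannot spin forever and can be exercised without a network.

```python
import time


def crawl_site(session, base_url, headers, seed_url,
               max_depth=3, max_pages=200,
               poll_seconds=2.0, timeout_seconds=600.0, sleep=time.sleep):
    """Start a /v1/crawl job, poll until it finishes, and return its pages.

    `session` is any object with requests-style .post/.get methods,
    for example requests.Session().
    """
    start = session.post(
        f"{base_url}/v1/crawl",
        headers=headers,
        json={
            "url": seed_url,
            "maxDepth": max_depth,
            "maxPages": max_pages,
            "scrapeOptions": {"formats": ["markdown"]},
        },
    )
    start.raise_for_status()
    job_id = start.json()["id"]

    # Assumption: give up after timeout_seconds rather than polling forever.
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = session.get(f"{base_url}/v1/crawl/{job_id}", headers=headers)
        response.raise_for_status()
        status = response.json()
        if status["status"] == "completed":
            return status["data"]
        if status["status"] == "failed":
            raise RuntimeError(f"crawl {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl {job_id} still running after {timeout_seconds}s")
        sleep(poll_seconds)
```

With the `requests` package installed, a call would look like `crawl_site(requests.Session(), BASE, HEADERS, "https://docs.example.com")`.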
Getting LLM-ready markdown out of crawl
Pass scrapeOptions: { formats: ["markdown"] } so each crawled page comes back as clean, LLM-ready markdown instead of raw HTML. This is the step that makes chunking stable: markdown strips out the navigation, scripts, and styling noise, leaving headings, paragraphs, lists, and tables — exactly the structure a chunker and an embedder want.
Firecrawl-SDK base-URL swap with LangChain
fastCRW speaks a Firecrawl-compatible REST API, so it is a drop-in after a single base-URL swap. If you already use LangChain's FireCrawlLoader, point it at your fastCRW instance and the existing loader code keeps working — managed cloud or self-hosted, same call shape:
The loader call is one constructor: loader = FireCrawlLoader(api_key=os.environ["CRW_API_KEY"], api_url=os.environ["CRW_BASE_URL"], url="https://docs.example.com", mode="crawl", params={"maxDepth": 3, "maxPages": 200, "scrapeOptions": {"formats": ["markdown"]}}), then docs = loader.load(). The only thing that differs from a stock Firecrawl setup is api_url pointing at your fastCRW instance.
The honest caveat: "compatible" means the overlap surface most pipelines use — scrape, crawl, map, search — not byte-for-byte every field. Validate the short list of fields your loader reads before you cut over, and see crawling an entire website for the full crawl-tuning playbook.
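One way to run that validation is a small check over a sample of crawled pages before cutting over. This is a sketch under assumptions: the field names `markdown` and `metadata.sourceURL` follow the Firecrawl document shape and are not confirmed here for fastCRW, so substitute the fields your own loader actually reads. The helper name `missing_fields` is hypothetical.

```python
def missing_fields(page, required=("markdown", "metadata.sourceURL")):
    """Return the dotted field paths from `required` that are absent or empty.

    Assumption: `page` is one crawled-page dict, and the default `required`
    paths mirror Firecrawl's document shape. Adjust them to your loader.
    """
    missing = []
    for path in required:
        node = page
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = None
                break
        if node in (None, "", [], {}):
            missing.append(path)
    return missing


sample = {"markdown": "# Title", "metadata": {"sourceURL": "https://docs.example.com/"}}
problems = missing_fields(sample)
```

Running this over the first few dozen pages of a test crawl and failing loudly on any non-empty result is usually enough to catch a field mismatch before it reaches the vector store.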
Step 2: Chunk and embed the pages
Choosing a chunk size from clean markdown
Because the crawl handed you markdown, you can chunk on structure rather than blindly on character count. A recursive splitter that respects headings and paragraphs (chunk size ~1000 characters, ~150 overlap) is a solid default for prose-heavy sites; long API-reference pages often want smaller chunks so a single retrieved chunk maps to a single method. We cover the trade-offs in depth in chunking strategies for RAG.
Split and embed in three lines: build a RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150), call chunks = splitter.split_documents(docs), then persist them with vectorstore = Chroma.from_documents(chunks, embedding=OpenAIEmbeddings(), persist_directory="./site_index").
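To show what chunking on structure means mechanically, here is a standard-library sketch of the recursive idea: try the coarsest separator first (headings), fall back to finer ones (paragraphs, lines, words), and hard-cut only as a last resort. It is an illustration, not LangChain's implementation; it omits chunk overlap for brevity, and the function name and separator list are assumptions.

```python
def split_recursive(text, chunk_size=1000,
                    separators=("\n## ", "\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring the coarsest structural boundary that is present."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    for index, sep in enumerate(separators):
        if sep not in text:
            continue
        parts = text.split(sep)
        # Keep each separator attached to the piece that follows it.
        pieces = [parts[0]] + [sep + part for part in parts[1:]]
        finer = separators[index + 1:]

        chunks, current = [], ""
        for piece in pieces:
            if len(current) + len(piece) <= chunk_size:
                current += piece  # still fits: keep merging
                continue
            if current.strip():
                chunks.append(current)
            if len(piece) <= chunk_size:
                current = piece
            else:
                # Piece is too big on its own: recurse with finer separators.
                chunks.extend(split_recursive(piece, chunk_size, finer))
                current = ""
        if current.strip():
            chunks.append(current)
        return chunks

    # No separator left to try: cut at fixed character offsets.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


doc = "# Title\n\nIntro paragraph.\n## A\n" + "word " * 300 + "\n## B\nshort"
pieces = split_recursive(doc, chunk_size=200)
```

A limitation worth knowing: when a section is much longer than `chunk_size`, its heading can end up in a chunk of its own, separated from the body. Production splitters mitigate this with overlap or by copying the heading into each chunk's metadata.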
Loading into a vector store
Any LangChain-supported vector store works — Chroma for a local prototype, pgvector or a hosted store for production. The key detail is to persist the index so you are not re-embedding the whole site on every restart. The source URL travels with each chunk in its metadata, which you will need in Step 3 for citations.
Step 3: Build the LangChain retrieval chain
Retriever + prompt + LLM
The chain is the easy part once ingestion is clean. Wrap the vector store as a retriever, write a prompt that forbids answering outside the retrieved context, and pipe both into an LLM:
Assemble the chain with qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(model="gpt-4o-mini", temperature=0), retriever=vectorstore.as_retriever(search_kwargs={"k": 5}), return_source_documents=True), then answer a question with result = qa.invoke({"query": "How do I rotate an API key?"}) and read result["result"] for the answer plus result["source_documents"] for the citations. As written, this call uses the chain's default prompt; to enforce the context-only rule, pass your own template through chain_type_kwargs={"prompt": prompt}.
Adding source citations to answers
Because return_source_documents=True is set, every answer comes back with the chunks retrieved for it, each carrying the source URL from ingestion metadata. Render those URLs under the answer so users can verify the bot against the real page. Citations are not decoration: they are how you keep a chat-with-website bot honest, and they make hallucinations easy to catch because a fabricated claim will not appear on any of the cited pages.
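Rendering the citations can be as small as the sketch below. It assumes the result dict has the `result` and `source_documents` keys described above and that each document exposes a `metadata` dict. The metadata key holding the URL depends on your loader, so the candidate keys tried here (`sourceURL`, `source`, `url`) are assumptions to adjust.

```python
def render_answer(result, url_keys=("sourceURL", "source", "url")):
    """Format an answer followed by a de-duplicated list of source URLs.

    Assumption: the URL lives under one of `url_keys` in each document's
    metadata; check what your loader actually writes.
    """
    urls = []
    for doc in result.get("source_documents", []):
        metadata = getattr(doc, "metadata", None) or {}
        url = next((metadata[key] for key in url_keys if metadata.get(key)), None)
        if url and url not in urls:
            urls.append(url)  # keep first-seen order, drop repeats

    lines = [result["result"].strip()]
    if urls:
        lines.append("")
        lines.append("Sources:")
        lines.extend(f"- {url}" for url in urls)
    return "\n".join(lines)
```

Several chunks from one page collapse to a single line, so the list stays short even with k set to 5.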
Step 4: Keep the bot fresh
Re-crawling on a schedule
A website changes; your index does not, unless you refresh it. fastCRW is stateless per request — it does not remember your previous crawls or hold any session — so freshness is something you own, not a feature you toggle. Run the crawl-and-embed job on a cron schedule (nightly for docs that change often, weekly otherwise). See building a RAG pipeline with fastCRW for the end-to-end scheduling pattern.
Incremental updates vs full re-index
A full re-crawl and re-embed is the simplest correct approach and is fine for sites under a few hundred pages. For larger sites, store a content hash per URL, re-crawl, and only re-embed the chunks whose page hash changed — deleting the old vectors for that URL first so stale answers do not linger. The stateless model means you keep the hash table; fastCRW just returns the current content each run.
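The hash-and-diff step can be sketched as follows. The shapes are assumptions: `pages` maps each URL to its freshly crawled markdown, and `stored_hashes` is the table you persisted on the previous run. SHA-256 is one reasonable choice of hash, not a requirement.

```python
import hashlib


def content_hash(markdown):
    """Stable fingerprint of a page's markdown."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def plan_reindex(pages, stored_hashes):
    """Compare a fresh crawl against the previous run's hash table.

    Returns (to_embed, to_delete, new_hashes):
      to_embed   - URLs that are new or whose content changed
      to_delete  - URLs whose old vectors must be removed first
                   (changed pages plus pages that disappeared)
      new_hashes - the table to persist for the next run
    """
    new_hashes = {url: content_hash(markdown) for url, markdown in pages.items()}
    to_embed = sorted(
        url for url, digest in new_hashes.items() if stored_hashes.get(url) != digest
    )
    vanished = set(stored_hashes) - set(new_hashes)
    changed = {url for url in to_embed if url in stored_hashes}
    to_delete = sorted(vanished | changed)
    return to_embed, to_delete, new_hashes
```

Apply the plan in order: delete vectors for every URL in `to_delete`, embed the pages in `to_embed`, then persist `new_hashes`. Persisting last means a crash mid-run leaves the old table in place, so the next run simply repeats the work.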
Cost: self-host vs managed for whole-site crawls
Crawl credit math per page
On managed fastCRW, a crawl costs 1 credit per page, flat, regardless of which renderer (auto, http, lightpanda, or chrome) handles the page. So a 200-page docs site indexes for roughly 200 credits per full re-crawl, which is cheap and easy to forecast because there is no JS-rendering surcharge. For exact tier allowances and the launch-vs-regular price, check /pricing rather than trusting a hard-coded number.
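The forecast is simple multiplication, sketched here. The 1-credit-per-page rate is from the paragraph above; the schedule figures (about 30 nightly runs or 4 weekly runs per month) are assumptions for illustration.

```python
def credits_per_month(pages, recrawls_per_month, credits_per_page=1):
    """Credits consumed by full re-crawls of a site over one month."""
    return pages * recrawls_per_month * credits_per_page


# Assumed schedules: roughly 30 nightly runs or 4 weekly runs per month.
nightly = credits_per_month(pages=200, recrawls_per_month=30)  # 6000
weekly = credits_per_month(pages=200, recrawls_per_month=4)    # 800
```

The gap between the two schedules is the practical argument for incremental updates on sites that change rarely.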
When to self-host the ingestion engine
If you re-crawl large sites frequently, self-hosting changes the math entirely: the AGPL-3.0 engine is a single ~8 MB binary in one container, and self-hosted crawls cost $0 in credits — you pay only for the server. That is the difference between a per-page cloud bill and a flat VPS cost for unlimited re-indexing. The same Firecrawl-compatible API runs in both modes, so you can prototype on managed and move ingestion in-house later without touching your LangChain code. For the LangChain-specific wiring, see our LangChain + fastCRW RAG tutorial.
Honest limits
Two things to scope plainly. First, fastCRW has no screenshot output — a request for formats: ["screenshot"] returns HTTP 422 — so a chat-with-website bot built this way answers from text, not from images of pages. Second, there is no multi-URL batch extract endpoint; /v1/crawl handles whole-site ingestion well, but if you want structured JSON from many specific URLs you iterate /v1/scrape concurrently rather than calling a batch API. Neither limit affects the text-RAG bot in this tutorial, but you should know them before you scope a bigger system.
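Iterating /v1/scrape concurrently can be sketched with a thread pool. The function that performs one scrape is passed in, so this block makes no claim about the exact /v1/scrape request body; `scrape_many` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_many(urls, scrape_one, max_workers=8):
    """Call scrape_one(url) for each URL in parallel.

    Returns (results, errors): two dicts keyed by URL, so one failing
    page does not discard the pages that succeeded.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors
```

In practice `scrape_one` would wrap a POST to /v1/scrape for a single URL. Keep `max_workers` modest and retry the entries in `errors` separately, since rate limits vary by deployment.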
Sources
- fastCRW canonical fact sheet — scrape benchmark, crawl endpoint and caps, footprint, product identity
- 3-way scrape benchmark of record: bench/server-runs/RESULT_3WAY_1000_FULL.md (diagnose_3way.py, 819 labeled URLs, 2026-05-08)
- fastCRW repo and pricing: github.com/us/crw · fastcrw.com
- LangChain document loaders: python.langchain.com
Related: LangChain + fastCRW RAG tutorial · Build a RAG pipeline with fastCRW · Best chunking strategies for RAG · Crawl an entire website
