Use fastCRW to turn websites into markdown and structured payloads for retrieval workflows without a heavy ingestion stack.
RAG pipelines do not want raw HTML and browser noise. They want clean, structured text.
Markdown is useful because it keeps the document readable, compact, and easier to split into chunks.
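Heading-aware chunking shows why markdown is convenient here. The sketch below is illustrative and not part of fastCRW. It assumes the scraped page uses ATX-style headings (`#`, `##`, ...), starts a new chunk at each heading, and packs oversized sections paragraph by paragraph. The `chunk_markdown` name and the 1,200-character default are hypothetical choices.

```python
import re

# Matches ATX-style markdown headings such as "# Title" or "## Section".
HEADING = re.compile(r"^#{1,6}\s")


def chunk_markdown(markdown: str, max_chars: int = 1200) -> list[str]:
    """Split markdown at headings, then split oversized sections on blank lines."""
    # Group lines into sections; each heading opens a new section.
    sections: list[list[str]] = [[]]
    for line in markdown.splitlines():
        if HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)

    chunks: list[str] = []
    for section in sections:
        text = "\n".join(section).strip()
        if not text:
            continue
        if len(text) <= max_chars:
            chunks.append(text)
            continue
        # Oversized section: pack whole paragraphs until the next one would
        # exceed max_chars. A single paragraph longer than max_chars is kept
        # whole rather than cut mid-sentence.
        current = ""
        for para in re.split(r"\n\s*\n", text):
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Splitting on headings keeps each chunk about one topic, which is the structural hint raw HTML buries under layout markup.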
Use fastCRW at these stages of your ingestion layer:
| Stage | fastCRW role |
|---|---|
| Discovery | Map a domain or crawl a section |
| Extraction | Scrape into markdown or structured output |
| Preparation | Chunk, deduplicate, and filter the result |
| Retrieval | Send clean text into your vector or ranking layer |
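The deduplicate and filter steps of the Preparation stage can be sketched in a few lines. This is a hypothetical helper, not a fastCRW feature. It assumes chunks arrive as plain strings, drops any chunk shorter than an arbitrary `min_chars` threshold, and removes exact duplicates after normalizing whitespace and case.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies hash alike.
    return re.sub(r"\s+", " ", text).strip().lower()


def prepare_chunks(chunks: list[str], min_chars: int = 40) -> list[str]:
    """Drop short chunks and normalized exact duplicates, keeping first-seen order."""
    seen: set[str] = set()
    kept: list[str] = []
    for chunk in chunks:
        norm = normalize(chunk)
        if len(norm) < min_chars:
            continue  # too short to be useful context (e.g. stray nav labels)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # repeated boilerplate, such as a footer on every page
        seen.add(digest)
        kept.append(chunk)
    return kept
```

Hashing normalized text catches boilerplate repeated verbatim across pages. Near-duplicates (pages that differ by a date or one sentence) need a fuzzier technique such as MinHash.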
The docs are organized around this flow so you can test each stage separately.
Most RAG ingestion problems are not caused by embeddings or vector databases. They start earlier, when pages are discovered, extracted, and cleaned.
fastCRW is useful when you want to make that front half of the pipeline simpler and more observable.
Ingestion cost is cumulative. A pipeline that refreshes thousands of pages every day benefits from faster responses and a deployment model that does not require a large crawler setup just to keep a knowledge base current.
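To make "cumulative" concrete, the sketch below estimates the wall-clock time of one full refresh. It is a back-of-the-envelope model with stated assumptions (constant per-page latency, perfectly parallel workers, no rate limiting), and the example figures are hypothetical, not fastCRW measurements.

```python
def daily_fetch_hours(pages: int, seconds_per_page: float, concurrency: int = 1) -> float:
    """Hours to fetch every page once.

    Simplifying assumptions: every page takes the same time and the
    workers never wait on each other or on rate limits.
    """
    if pages < 0 or seconds_per_page < 0 or concurrency < 1:
        raise ValueError("pages and latency must be non-negative; concurrency must be >= 1")
    return pages * seconds_per_page / concurrency / 3600
```

With 10,000 pages, 4 workers, and a hypothetical 2 s per page, one refresh takes about 1.4 hours. At 1 s per page it drops to about 0.7 hours. Because the total scales linearly with per-page latency, every latency saving is repeated on every refresh.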
A practical starting sequence:

1. `map` to understand site structure.
2. `scrape` with markdown on representative pages.

That order keeps the pipeline easier to debug and usually leads to better retrieval quality.
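The map-then-scrape sequence can be sketched as a small client. Treat every API detail here as an assumption modeled on a Firecrawl-style API: the endpoint paths (`/v1/map`, `/v1/scrape`), the request fields, the response shapes, and the `http://localhost:3000` base URL are placeholders to check against the fastCRW API reference. The `post` parameter lets you swap in a fake transport for testing.

```python
import json
import urllib.request
from typing import Callable
from urllib.parse import urlparse

# ASSUMPTION: placeholder base URL; point it at your fastCRW deployment.
BASE_URL = "http://localhost:3000"

# A transport takes (path, JSON payload) and returns the decoded JSON response.
PostFn = Callable[[str, dict], dict]


def http_post(path: str, payload: dict) -> dict:
    """POST a JSON payload to BASE_URL + path and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def pick_representatives(links: list[str], limit: int) -> list[str]:
    """Keep the first URL seen in each top-level path section (/, /docs, /blog, ...)."""
    picked: dict[str, str] = {}
    for link in links:
        section = urlparse(link).path.strip("/").split("/")[0]  # "" for the home page
        picked.setdefault(section, link)
    return list(picked.values())[:limit]


def map_then_scrape(site: str, limit: int = 5, post: PostFn = http_post) -> dict[str, str]:
    """Map a site, then scrape a few representative pages as markdown."""
    # Step 1: discover URLs.
    # ASSUMPTION: Firecrawl-style endpoint returning {"links": [...]}.
    links = post("/v1/map", {"url": site}).get("links", [])

    # Step 2: scrape a small sample before committing to a full crawl.
    # ASSUMPTION: Firecrawl-style endpoint returning {"data": {"markdown": "..."}}.
    pages: dict[str, str] = {}
    for url in pick_representatives(links, limit):
        result = post("/v1/scrape", {"url": url, "formats": ["markdown"]})
        pages[url] = result.get("data", {}).get("markdown", "")
    return pages
```

Picking one URL per top-level path section is a simple stand-in for "representative pages". Sites whose page templates are not reflected in the URL path need a different selection rule.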