Blog
Web scraping for AI agents, RAG pipelines, and Rust infrastructure.
An honest look at CRW's current limitations — screenshots, PDF parsing, anti-bot, SPA coverage — and the roadmap for each.
CRW v0.0.11 adds automatic stealth JavaScript injection to bypass bot detection, Chrome as a fallback renderer for complex SPAs, Cloudflare challenge auto-retry, and HTTP-to-CDP auto-escalation.
The case for single-binary deployment in developer infrastructure — operational simplicity, CI speed, and why CRW ships as one 8 MB file.
A technical deep-dive into CRW's Axum-based API, lol-html parser, LightPanda integration, and how it achieves 6.6 MB idle RAM.
CRW v0.0.11 adds automatic stealth JavaScript injection and Cloudflare challenge retry. Here's how it works under the hood, and how to configure it for maximum success rate.
How idle RAM affects your hosting costs and concurrent throughput — and why CRW's 6.6 MB footprint changes the economics.
CRW v0.0.8 fixes Wikipedia extraction with onlyMainContent, adds bring-your-own-key LLM extraction, introduces 3-tier noise matching, and hardens the content cleaning pipeline.
CRW v0.0.10 adds configurable rate limiting, a crawl cancel endpoint, machine-readable error codes on every error response, fenced code blocks, and cleaner markdown output for RAG pipelines.
In-depth benchmark results from 500 URLs comparing CRW, Firecrawl, Crawl4AI, and Spider on latency, coverage, and memory.
A practical look at Rust and Python for building production scraping infrastructure — performance, memory, operability, and when each makes sense.
Run a Firecrawl-compatible scraping API on your own server in under 60 seconds using CRW's single Docker image.
Deploy a full Firecrawl-compatible scraping API on a $5/month VPS with 512 MB RAM. CRW's 6.6 MB memory footprint makes it possible — here's the complete guide.
Turn any web page into clean, noise-free markdown ready for LLMs using CRW's scrape endpoint. No selectors, no regex.
CRW v0.0.2 adds CSS/XPath extraction, RAG-ready chunking with BM25 and cosine scoring, stealth mode for bot detection bypass, per-request proxy, and a setup command for JS rendering.
Connect CRW's built-in MCP server to Claude, Cursor, or any MCP-compatible AI agent for live web scraping in agentic workflows.
Step-by-step guide to scraping websites, converting to clean markdown, and feeding into a RAG pipeline using CRW's API.
An honest comparison of self-hosted web scrapers — Firecrawl, Crawl4AI, Spider, and CRW — for AI agents, RAG pipelines, and structured extraction.
Give Claude Code web scraping superpowers with CRW's built-in MCP server. One command, zero config — scrape any website directly from your terminal AI assistant.
The story behind CRW — why Rust, why single-binary, and why Firecrawl-compatible for AI agent and RAG use cases.
Compare CRW and Crawl4AI for AI agent and RAG workflows. Covers deployment, API design, memory, and the key tradeoffs.
A detailed three-way comparison of Firecrawl, Crawl4AI, and CRW — covering deployment, performance, memory, API design, and which tool fits which team.
A detailed comparison of CRW and Firecrawl covering performance, memory usage, deployment, and which tool fits which use case.