Blog

Engineering & Insights

Web scraping for AI agents, RAG pipelines, and Rust infrastructure.

Engineering·10 min read

Where CRW Still Falls Short — and What We're Improving

An honest look at CRW's current limitations — screenshots, PDF parsing, anti-bot, SPA coverage — and the roadmap for each.

Mar 14, 2026

Engineering·10 min read

CRW v0.0.11: Stealth Anti-Bot Bypass, Chrome Failover, and Cloudflare Challenge Retry

CRW v0.0.11 adds automatic stealth JavaScript injection to bypass bot detection, Chrome as a fallback renderer for complex SPAs, Cloudflare challenge auto-retry, and HTTP-to-CDP auto-escalation.

Mar 14, 2026

Engineering·7 min read

Single-Binary Infrastructure: Why It Matters for Developer Tools

The case for single-binary deployment in developer infrastructure — operational simplicity, CI speed, and why CRW ships as one 8 MB file.

Mar 13, 2026

Engineering·11 min read

Inside CRW: Architecture of a Lightweight Rust Scraping API

A technical deep-dive into CRW's Axum-based API, lol-html parser, LightPanda integration, and how it achieves 6.6 MB idle RAM.

Mar 13, 2026

Tutorial·20 min read

How to Scrape Cloudflare-Protected Sites with CRW's Stealth Mode

CRW v0.0.11 adds automatic stealth JavaScript injection and Cloudflare challenge retry. Here's how it works under the hood, and how to configure it for maximum success rate.

Mar 13, 2026

Engineering·7 min read

Why Low Memory Usage Matters in Self-Hosted Scraping

How idle RAM affects your hosting costs and concurrent throughput — and why CRW's 6.6 MB footprint changes the economics.

Mar 12, 2026

Engineering·10 min read

CRW v0.0.8: Wikipedia Fix, BYOK Extraction, and Smarter Noise Detection

CRW v0.0.8 fixes Wikipedia extraction with onlyMainContent, adds bring-your-own-key LLM extraction, introduces 3-tier noise matching, and hardens the content cleaning pipeline.

Mar 12, 2026

Engineering·7 min read

CRW v0.0.10: Rate Limiting, Crawl Cancel, and Machine-Readable Error Codes

CRW v0.0.10 adds configurable rate limiting, a crawl cancel endpoint, machine-readable error codes on every error response, fenced code blocks, and cleaner markdown output for RAG pipelines.

Mar 12, 2026

Engineering·16 min read

What I Learned Benchmarking CRW Against Firecrawl and Crawl4AI

In-depth benchmark results from 500 URLs comparing CRW, Firecrawl, Crawl4AI, and Spider on latency, coverage, and memory.

Mar 11, 2026

Engineering·9 min read

Rust vs Python for Web Scraping Infrastructure

A practical look at Rust and Python for building production scraping infrastructure — performance, memory, operability, and when each makes sense.

Mar 10, 2026

Tutorial·6 min read

How to Self-Host a Firecrawl-Like API with a Single Binary

Run a Firecrawl-compatible scraping API on your own server in under 60 seconds using CRW's single Docker image.

Mar 9, 2026

Tutorial·16 min read

$5 VPS Web Scraping: Run CRW Where Firecrawl Can't

Deploy a full Firecrawl-compatible scraping API on a $5/month VPS with 512 MB RAM. CRW's 6.6 MB memory footprint makes it possible — here's the complete guide.

Mar 9, 2026

Tutorial·16 min read

How to Convert Websites to Clean Markdown for LLMs

Turn any web page into clean, noise-free markdown ready for LLMs using CRW's scrape endpoint. No selectors, no regex.

Mar 8, 2026

Engineering·8 min read

CRW v0.0.2: CSS Selectors, Chunking, BM25 Scoring, and Stealth Mode

CRW v0.0.2 adds CSS/XPath extraction, RAG-ready chunking with BM25 and cosine scoring, stealth mode for bot detection bypass, per-request proxy, and a setup command for JS rendering.

Mar 8, 2026

Tutorial·20 min read

How to Expose Web Scraping to AI Agents with MCP

Connect CRW's built-in MCP server to Claude, Cursor, or any MCP-compatible AI agent for live web scraping in agentic workflows.

Mar 7, 2026

Tutorial·22 min read

How to Build a RAG Pipeline from Websites Using CRW

Step-by-step guide to scraping websites, converting to clean markdown, and feeding into a RAG pipeline using CRW's API.

Mar 6, 2026

Alternatives·16 min read

Best Self-Hosted Web Scraping Tools for AI Agents and RAG (2026)

An honest comparison of self-hosted web scrapers — Firecrawl, Crawl4AI, Spider, and CRW — for AI agents, RAG pipelines, and structured extraction.

Mar 5, 2026

Tutorial·18 min read

How to Add Web Scraping to Claude Code in 30 Seconds

Give Claude Code web scraping superpowers with CRW's built-in MCP server. One command, zero config — scrape any website directly from your terminal AI assistant.

Mar 5, 2026

Engineering·18 min read

Why I Built CRW: A Lightweight Firecrawl-Compatible Scraper in Rust

The story behind CRW — why Rust, why single-binary, and why Firecrawl-compatible for AI agent and RAG use cases.

Mar 4, 2026

Comparison·14 min read

CRW vs Crawl4AI: Rust REST API vs Python Framework for AI Scraping

Compare CRW and Crawl4AI for AI agent and RAG workflows. Covers deployment, API design, memory, and the key tradeoffs.

Mar 3, 2026

Comparison·18 min read

Firecrawl vs Crawl4AI vs CRW: Best Tool for Self-Hosted AI Scraping?

A detailed three-way comparison of Firecrawl, Crawl4AI, and CRW — covering deployment, performance, memory, API design, and which tool fits which team.

Mar 2, 2026

Comparison·15 min read

CRW vs Firecrawl: A Practical Comparison for Self-Hosted Web Scraping

A detailed comparison of CRW and Firecrawl covering performance, memory usage, deployment, and which tool fits which use case.

Mar 1, 2026