Blog category
Engineering
Engineering notes on architecture, performance, benchmarks, releases, and infrastructure tradeoffs behind fastCRW.
CRW v0.7.0: LLM Summary and Search Answer (BYOK, No Token Markup)
v0.7.0 adds AI summaries to /scrape, Perplexity-style answers with citations to /search, and per-result LLM summaries — all BYOK, no CRW credit markup on tokens.
CRW v0.0.10: Rate Limiting, Crawl Cancel, and Machine-Readable Error Codes
CRW v0.0.10 adds configurable rate limiting, a crawl cancel endpoint, machine-readable error codes on every error response, fenced code blocks, and cleaner markdown output for RAG pipelines.
The Real Cost of Self-Hosting vs Cloud Scraping APIs
Self-hosted vs cloud scraping API costs — TCO breakdown with real calculations for VPS, engineering time, and CRW's lightweight edge.
CRW v0.0.2: CSS Selectors, Chunking, BM25 Scoring, and Stealth Mode
CRW v0.0.2 adds CSS/XPath extraction, RAG-ready chunking with BM25 and cosine scoring, stealth mode for bot detection bypass, per-request proxy, and a setup command for JS rendering.
CRW v0.0.11: Stealth Anti-Bot Bypass, Chrome Failover, and Cloudflare Challenge Retry
CRW v0.0.11 adds automatic stealth JavaScript injection to bypass bot detection, Chrome as a fallback renderer for complex SPAs, Cloudflare challenge auto-retry, and HTTP-to-CDP auto-escalation.
Single-Binary Infrastructure: Why It Matters for Developer Tools
The case for single-binary deployment in developer infrastructure — operational simplicity, CI speed, and why CRW ships as one 8 MB file.
Rust vs Python Web Scraping (2026) — 3-10x Faster, 6.6 MB RAM [Benchmarked]
Rust web scrapers run 3-10x faster than Python with 1/40th the RAM. We benchmarked fastCRW (Rust) against Scrapy, BeautifulSoup, and Playwright — latency, memory, throughput, and which to pick for your stack.
Why Every AI Agent Needs a Web Context Layer
Why AI agents need a web context layer — live scraping as infrastructure to reduce hallucinations. Build one with MCP, RAG, and CRW.
Why Low Memory Usage Matters in Self-Hosted Scraping
How idle RAM affects your hosting costs and concurrent throughput — and why CRW's 6.6 MB footprint changes the economics.
Inside CRW: Architecture of a Lightweight Rust Scraping API
A technical deep-dive into CRW's Axum-based API, lol-html parser, LightPanda integration, and how it achieves 6.6 MB idle RAM.
Where CRW Still Falls Short — and What We're Improving
An honest look at CRW's current limitations — screenshots, PDF parsing, anti-bot, SPA coverage, retry logic, caching — and the roadmap for each.
Introducing Search: Find, Scrape, and Extract in One API Call
CRW now includes a search endpoint. Search the web, get structured results, and optionally scrape every result page — all in a single API call.
CRW v0.0.8: Wikipedia Fix, BYOK Extraction, and Smarter Noise Detection
CRW v0.0.8 fixes Wikipedia extraction with onlyMainContent, adds bring-your-own-key LLM extraction, introduces 3-tier noise matching, and hardens the content cleaning pipeline.
What I Learned Benchmarking CRW Against Firecrawl and Crawl4AI
In-depth benchmark results from 500 URLs comparing CRW, Firecrawl, and Crawl4AI on latency, coverage, and memory, with methodology, dataset breakdown, and reproducible scripts.
Why I Built CRW: A Lightweight Firecrawl-Compatible Scraper in Rust
The story behind CRW — why Rust, why single-binary, and why Firecrawl-compatible for AI agent and RAG use cases.
This category contains 15 of 68 total posts in the fastCRW blog archive.