Short Answer
When developers search "cargo vs playwright", they usually want to compare two ways of scraping: a Rust program built with Cargo on the reqwest/scraper/tokio crate stack, and Playwright's headless-browser automation. The honest answer is that they solve different problems.
- Rust HTTP scraping (Cargo + reqwest + scraper): Best for server-rendered HTML at scale — low RAM, low latency, single statically-linked binary, no browser.
- Playwright: Best when you genuinely need a browser — complex SPAs, form submission, click-driven content, anti-bot fingerprinting, screenshots.
- fastCRW: Rust-speed scraping as a REST API — you get the Rust engine's performance without writing or maintaining the crate stack.
| | Rust (Cargo crates) | Playwright | fastCRW |
|---|---|---|---|
| Approach | HTTP + HTML parse | Headless browser automation | HTTP + HTML parse (Rust engine) |
| Language | Rust | JS, Python, Java, C# | Any (REST API) |
| Browser required | No | Yes (Chromium/Firefox/WebKit) | No (LightPanda opt-in) |
| RAM per worker | Tiny (no browser baseline) | 150–400 MB per browser | Tiny (no browser baseline) |
| JS-heavy SPAs | No | ✅ Full browser | Via LightPanda |
| Page interactions | No | ✅ Click, type, scroll | No |
| Markdown output | Manual | Manual | ✅ Built-in |
| MCP server | Manual | No | ✅ Built-in |
| JSON extraction | Manual | Manual | ✅ JSON schema via API |
| Deployment | Single binary | ~1.5 GB Docker image | Single ~8 MB binary |
| License | MIT/Apache (crates) | Apache 2.0 | AGPL-3.0 |
What "Cargo" Means in This Context
Cargo is Rust's package manager and build system — the equivalent of npm for Node or pip for Python. When someone says "use Cargo for web scraping," they mean writing a Rust program that pulls in the right crates via Cargo.toml and compiles to a single statically-linked binary. The three crates that form the scraping core are:
- reqwest: An ergonomic, async HTTP client with TLS, redirect following, cookie handling, and optional JSON support. The Rust equivalent of Python's httpx or Node's undici.
- scraper: A CSS selector–based HTML parser built on html5ever, the HTML parser developed by the Servo project. You write CSS selectors, it returns matching elements.
- tokio: The async runtime that lets you run many concurrent HTTP requests without blocking threads. This is why Rust scrapers can handle high concurrency on minimal RAM: there is no thread-per-request overhead.
Together, these give you a scraper that fetches HTML over HTTP and parses it with CSS selectors. What they do not give you is JavaScript execution, browser rendering, or any ability to interact with a running page.
What Playwright Actually Is
Playwright is Microsoft's browser automation library. It controls Chromium, Firefox, and WebKit through a common API, supports JavaScript, Python, Java, and C#, and includes features like auto-waiting, network interception, and codegen. Originally built for end-to-end testing, it is widely used for scraping because any page a human can see in a browser, Playwright can extract.
The tradeoff is resource cost. Every Playwright session launches a real browser process — Chromium alone idles at 80–150 MB and climbs further with each open tab. Loading a single page takes 2–5 seconds including the browser render cycle. At ten concurrent sessions you are looking at 1.5 GB+ just for browser processes, before any application logic runs.
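The ten-session figure above is simple multiplication, sketched here in Rust. The function name is hypothetical, and the per-process range is the 80–150 MB idle figure quoted in the paragraph; real usage climbs further once pages are loaded.

```rust
/// Rough idle-RAM range, in MB, for `sessions` concurrent browser processes,
/// given a per-process idle range. Page content comes on top of this.
fn browser_ram_range_mb(sessions: u32, idle_low_mb: u32, idle_high_mb: u32) -> (u32, u32) {
    (sessions * idle_low_mb, sessions * idle_high_mb)
}

fn main() {
    // Figures taken from the paragraph above: 80-150 MB idle per Chromium process.
    let (low, high) = browser_ram_range_mb(10, 80, 150);
    println!("10 sessions: {low}-{high} MB idle, before any page content");
}
```

The upper bound of that range is where the 1.5 GB figure comes from.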
The Core Architecture Difference
The split between these two approaches is not about language preference. It is about where the content lives when you need to read it.
When the HTML arrives in the HTTP response
For server-rendered pages — news articles, documentation, product listings, search results, most content sites — the HTML you need is present in the HTTP response body. No JavaScript has to run. In this case:
- A Rust scraper (or fastCRW) makes an HTTP GET, streams the response, and parses HTML as bytes arrive. No browser spawns. No JavaScript engine. No GPU.
- Playwright makes an HTTP GET, hands the response to a full browser engine, executes any JavaScript, waits for the DOM to stabilize, then lets you read the content. That takes two to five seconds and leaves 200+ MB resident in memory.
For server-rendered HTML, bringing a browser is strictly overhead. The content was in the HTTP response the whole time.
When the HTML only exists after JavaScript runs
For SPAs built with React, Vue, or Angular where the HTTP response is just <div id="root"></div> and all content loads client-side, you cannot read the page without executing JavaScript. A Rust HTTP scraper sees an empty shell. Playwright sees the fully rendered page. This is the one scenario where Playwright's overhead is the price of admission.
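One way to tell the two cases apart from the HTTP response alone is to check how much visible text is left once tags and scripts are removed. The sketch below is a crude heuristic using only the standard library; the function names and the character threshold are assumptions for illustration, and a real scraper would more likely use a proper HTML parser.

```rust
/// Returns `html` with tags and the contents of <script>/<style> removed.
/// This is a crude scanner for illustration, not an HTML parser.
fn visible_text(html: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to the original string.
    let lower = html.to_ascii_lowercase();
    let mut out = String::new();
    let mut i = 0;
    while i < html.len() {
        let rest = &lower[i..];
        if rest.starts_with("<script") || rest.starts_with("<style") {
            let close = if rest.starts_with("<script") { "</script>" } else { "</style>" };
            // Skip past the closing tag, or to the end if it is missing.
            i = match rest.find(close) {
                Some(pos) => i + pos + close.len(),
                None => html.len(),
            };
        } else if rest.starts_with('<') {
            // Skip a single tag.
            i = match rest.find('>') {
                Some(pos) => i + pos + 1,
                None => html.len(),
            };
        } else {
            // Copy text up to the next tag.
            let end = rest.find('<').map_or(html.len(), |pos| i + pos);
            out.push_str(&html[i..end]);
            i = end;
        }
    }
    out
}

/// Heuristic: the page ships scripts but almost no visible text.
fn looks_like_spa_shell(html: &str, min_text_chars: usize) -> bool {
    let text = visible_text(html);
    let visible: usize = text.split_whitespace().map(|w| w.chars().count()).sum();
    let has_scripts = html.to_ascii_lowercase().contains("<script");
    has_scripts && visible < min_text_chars
}

fn main() {
    let shell = r#"<html><body><div id="root"></div><script src="/main.js"></script></body></html>"#;
    let page = "<html><body><h1>Widget Pro</h1><p>A sturdy widget for everyday use.</p></body></html>";
    println!("shell: {}", looks_like_spa_shell(shell, 20));
    println!("page:  {}", looks_like_spa_shell(page, 20));
}
```

The threshold of 20 characters is only there to keep the example small; a production value would need tuning against real pages.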
Rust Web Scraping: A Real Code Example
Here is a minimal Rust scraper using Cargo with the three core crates. This is what you add to Cargo.toml and what a basic scraper looks like.
Cargo.toml
[package]
name = "my-scraper"
version = "0.1.0"
edition = "2021"
[dependencies]
reqwest = { version = "0.12", features = ["json"] }
scraper = "0.20"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
src/main.rs — scrape a product page
use reqwest::Client;
use scraper::{Html, Selector};
use serde::Serialize;

#[derive(Serialize, Debug)]
struct Product {
    name: String,
    price: String,
    description: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        .user_agent("Mozilla/5.0 (compatible; MyScraper/1.0)")
        .build()?;

    let html = client
        .get("https://example.com/product")
        .send()
        .await?
        .text()
        .await?;

    let document = Html::parse_document(&html);
    let name_sel = Selector::parse("h1.product-title").unwrap();
    let price_sel = Selector::parse(".price").unwrap();
    let desc_sel = Selector::parse(".product-description").unwrap();

    let product = Product {
        name: document
            .select(&name_sel)
            .next()
            .map(|el| el.text().collect::<String>())
            .unwrap_or_default(),
        price: document
            .select(&price_sel)
            .next()
            .map(|el| el.text().collect::<String>())
            .unwrap_or_default(),
        description: document
            .select(&desc_sel)
            .next()
            .map(|el| el.text().collect::<String>())
            .unwrap_or_default(),
    };

    println!("{}", serde_json::to_string_pretty(&product)?);
    Ok(())
}
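Running cargo build --release produces a single binary at target/release/my-scraper with every Rust crate linked in statically. On the default Linux target it still links dynamically against glibc, and against the system OpenSSL when reqwest uses its default native-tls backend, so it runs on machines with compatible system libraries. For a fully static binary that you can copy to any Linux machine, build for a musl target such as x86_64-unknown-linux-musl and switch reqwest to its rustls-tls feature with default features disabled. Either way there is no language runtime to install and no Docker image needed beyond the binary itself. This is the deployment story Rust gives you.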
Concurrent scraping with tokio
// Requires `futures = "0.3"` in Cargo.toml alongside reqwest and tokio.
use futures::stream::{self, StreamExt};
use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        .user_agent("Mozilla/5.0 (compatible; MyScraper/1.0)")
        .build()?;

    let urls = vec![
        "https://example.com/page-1",
        "https://example.com/page-2",
        "https://example.com/page-3",
        // ... hundreds more
    ];

    // Keep at most 10 requests in flight; no browser processes, just HTTP connections
    let results = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move {
                client.get(url).send().await?.text().await
            }
        })
        .buffer_unordered(10)
        .collect::<Vec<_>>()
        .await;

    // Each item is a Result, so count only the requests that succeeded
    let ok = results.iter().filter(|r| r.is_ok()).count();
    println!("Scraped {} of {} pages", ok, results.len());
    Ok(())
}
With Playwright, ten concurrent page loads mean ten open pages, and Chromium typically backs each one with its own renderer process on top of the shared browser process. With tokio, ten concurrent requests share the same lightweight event loop and connection pool. The memory footprint scales with the number of in-flight HTTP responses, not the number of browser processes.
The Playwright Equivalent
Here is what the same product scrape looks like in Playwright (Node.js). The code is about the same length, but the runtime profile is completely different.
import { chromium } from "playwright";
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto("https://example.com/product");
const product = {
  name: await page.textContent("h1.product-title"),
  price: await page.textContent(".price"),
  description: await page.textContent(".product-description"),
};
await browser.close();
console.log(product);
// Runtime: ~3 seconds, ~300 MB RAM for the Chromium process
For a static product page, the Playwright version spends most of its time waiting for Chromium to boot, connect to the DevTools Protocol, render the page, and stabilize the DOM — for content that arrived in the first HTTP response. The Rust version reads that HTTP response directly.
When Playwright Wins
Playwright is the right tool when you genuinely need what only a browser provides.
1. Single-page applications
React/Vue/Angular SPAs where the HTTP response is a near-empty shell. Playwright executes the JavaScript bundle, waits for the app to hydrate, and lets you read the rendered DOM. A Rust HTTP scraper or fastCRW's default HTTP mode cannot see content that only exists after JavaScript runs. (fastCRW falls back to LightPanda for these pages, which covers many SPAs but is not at Playwright's maturity for the most complex client-side routing.)
2. Form interaction and login flows
Logging in — typing credentials, clicking a button, handling MFA redirects — requires a browser that can execute JavaScript event handlers and manage session cookies across page navigations. Neither a Rust HTTP client nor fastCRW simulates user interaction. Playwright's auto-wait API handles this reliably.
3. Anti-bot fingerprint requirements
Some sites use advanced bot detection that validates a real browser fingerprint: TLS JA3/JA4 hash, Canvas/WebGL fingerprint, Chromium's V8 heap signatures. With stealth plugins (playwright-extra + puppeteer-extra-plugin-stealth), Playwright can pass many of these checks. A bare reqwest request sends an obvious non-browser TLS fingerprint and is rejected by such checks right away. (Neither approach is reliable against the hardest enterprise anti-bot systems without additional proxy infrastructure.)
4. Screenshots and visual capture
If your workflow requires screenshots of rendered pages, browser automation is the only option. fastCRW does not currently support screenshot output (HTTP 422 on formats: ["screenshot"]). Playwright renders the full page and can capture it as PNG or PDF.
5. E2E testing alongside scraping
If your team already uses Playwright for end-to-end testing and wants to share selectors, fixtures, and infrastructure between tests and scrapers, the ergonomics of staying in Playwright may outweigh the resource cost for your specific workload.
When Rust / HTTP-First Wins
For the majority of web scraping use cases — especially at scale or in AI pipeline contexts — browser automation is unnecessary overhead.
1. High-volume content extraction
At 1,000 pages per run, browser automation needs a large machine just for browser processes. A Rust scraper or fastCRW handles the same volume with a tiny fraction of the RAM, because there is no browser to spawn and no JavaScript to execute. This is not a marginal difference — it is the difference between needing a 32 GB server and fitting comfortably on a 1 GB VPS.
2. AI agent pipelines and RAG
AI agents need clean text, not rendered DOM. fastCRW outputs markdown directly — the format LLMs consume — without requiring you to post-process Playwright's DOM output into something your model can read. For scrape-to-RAG pipelines, the HTTP-first approach eliminates both the browser overhead and the DOM-to-text conversion step.
3. Constrained infrastructure
Playwright on a small VPS is painful: Chromium alone may consume all available RAM, leaving nothing for your application. The Rust/HTTP-first approach, whether you write the crates yourself or call fastCRW, runs on the smallest VPS tier. fastCRW's Docker image contains a single ~8 MB binary (source: OSS README §"Structural footprint"), versus a ~1.5 GB Playwright Docker image.
4. Server-rendered content sites
News articles, documentation, blog posts, product listings, job boards — the vast majority of web content is server-rendered. These pages do not need JavaScript to extract their content. Using a headless browser for them is engineering overhead with no benefit.
5. Scheduled, unattended pipelines
A binary that runs and exits cleanly is easier to schedule and monitor than a process that manages browser lifecycles. Rust scrapers and fastCRW have no warm-up time, no browser process leak risk, and no Chromium version mismatch to debug after an auto-update.
The Maintenance Burden of Writing Your Own Rust Scraper
Building a Rust scraper with Cargo is genuinely attractive for its performance and deployment story, but there is a maintenance surface that is easy to underestimate before you start.
- TLS and HTTP/2: reqwest handles most of this, but you will deal with certificate pinning failures, HTTP/2 multiplexing quirks, and connection pool tuning for high-concurrency workloads.
- Rate limiting and retry logic: You implement this yourself — exponential backoff, jitter, per-domain rate limits, retry budgets. These are not hard to write, but they add code surface.
- Content cleaning: reqwest gives you raw HTML. Converting that to clean text for an LLM — stripping navigation, ads, footers, extracting main content — requires either a good HTML parser strategy or integrating a library like readability-rs.
- JavaScript fallback: When you hit a page that requires JavaScript, you need to detect it and either skip the page, flag it for manual review, or integrate a separate browser-based fallback. This is the part that takes a tidy single-binary story and turns it into a multi-component architecture.
- Cross-compilation: If your deployment target is ARM or musl Linux, Rust's cross-compilation story is good but not zero-effort, especially once you have native dependencies in your transitive tree.
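To give a sense of the retry bullet above, here is a minimal sketch of exponential backoff with "full jitter" using only the standard library. The function name and the 500 ms / 30 s parameters are assumptions for illustration. The random factor is passed in by the caller because the standard library has no random number generator; in a real scraper it would come from a crate such as rand. Per-domain rate limits and retry budgets are not covered here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `cap`,
/// then scaled by `rand01`, a caller-supplied random value in [0, 1].
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, rand01: f64) -> Duration {
    // Saturating arithmetic avoids overflow panics on large attempt numbers.
    let exponential = base.saturating_mul(2u32.saturating_pow(attempt));
    let capped = exponential.min(cap);
    capped.mul_f64(rand01.clamp(0.0, 1.0))
}

fn main() {
    let (base, cap) = (Duration::from_millis(500), Duration::from_secs(30));
    for attempt in 0..8 {
        // rand01 = 1.0 prints the upper bound of each jittered window.
        println!("attempt {attempt}: up to {:?}", backoff_delay(attempt, base, cap, 1.0));
    }
}
```

Scaling the whole window by a random factor spreads retries from many workers over time, so they do not all hit the same host at the same instant.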
fastCRW absorbs all of this. The Rust engine, lol-html parser, LightPanda fallback, content cleaning, and retry logic are already built in. You call a REST endpoint.
fastCRW: Rust-Speed Scraping Without the Crate Stack
fastCRW is built on the same Rust-first, HTTP-first philosophy as a hand-rolled reqwest scraper, but exposed as a Firecrawl-compatible REST API. You call it from any language — Python, TypeScript, Go, Java — and get the Rust engine's throughput and memory profile without maintaining Rust dependencies or managing a binary build pipeline.
Simple scrape — one API call
curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "formats": ["markdown"]
  }'
Structured JSON extraction — no CSS selectors needed
import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({
  apiKey: "YOUR_API_KEY",
  apiUrl: "https://api.fastcrw.com",
});

const result = await app.scrapeUrl("https://example.com/product", {
  formats: ["json"],
  jsonSchema: {
    type: "object",
    properties: {
      name: { type: "string" },
      price: { type: "number" },
      description: { type: "string" },
      in_stock: { type: "boolean" },
    },
    required: ["name", "price"],
  },
});

console.log(result.json?.name); // "Widget Pro"
console.log(result.json?.price); // 29.99
The fastCRW API implements Firecrawl's REST interface, so it works with the official Firecrawl SDK and any LangChain or LlamaIndex integration that accepts an api_url override. Change the base URL, keep every line of your existing client code.
Self-host for free
docker run -p 3000:3000 ghcr.io/us/crw:latest
The AGPL-3.0 open-core engine runs on your own infrastructure at zero per-request cost. The Docker image is a single ~8 MB binary (source: OSS README §"Structural footprint"). No Redis, no Node.js, no Playwright browser bundle — just the binary.
Performance Numbers From the Canonical Benchmark
fastCRW was benchmarked against Firecrawl and Crawl4AI on Firecrawl's own public scrape-content-dataset-v1 — 1,000 URLs, 819 of which carry labeled ground truth. Harness: diagnose_3way.py, single run, 3,000 requests, 2026-05-08 (source: bench/server-runs/RESULT_3WAY_1000_FULL.md).
| Metric | fastCRW | Crawl4AI | Firecrawl |
|---|---|---|---|
| Truth-recall (of 819 labeled) | 63.74% (522) | 59.95% (491) | 56.04% (459) |
| Scrape-success (of 1,000) | 87.7% (877) | 83.5% (835) | 89.7% (897) |
| Thrown errors (of 3,000) | 0 | 0 | 0 |
| p50 latency | 1,914 ms | 1,916 ms | 2,305 ms |
| p90 latency | 14,157 ms | 4,754 ms | 6,937 ms |
| p99 latency | 15,012 ms | 13,749 ms | 21,107 ms |
The honest story: fastCRW leads on truth-recall (+3.79 percentage points over Crawl4AI, +7.70 over Firecrawl) and ties Crawl4AI on median latency (1,914 ms vs 1,916 ms). Its p90 tail (14,157 ms) is the widest of the three — this is the cost of the chrome-stealth fallback that recovers the hard pages the other tools miss. The same mechanism that produces the recall win also produces the slow tail. Full latency distribution and one-command repro are on /benchmarks.
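The recall percentages and leads quoted above follow directly from the raw counts in the table. A small sketch that recomputes them (the helper name is hypothetical; the counts are the ones in the table):

```rust
/// Percentage of `hits` out of `total`, rounded to two decimals.
fn pct(hits: u32, total: u32) -> f64 {
    (hits as f64 / total as f64 * 10_000.0).round() / 100.0
}

fn main() {
    let labeled = 819; // URLs with labeled ground truth
    let fastcrw = pct(522, labeled);
    let crawl4ai = pct(491, labeled);
    let firecrawl = pct(459, labeled);

    println!("truth-recall: {fastcrw:.2}% / {crawl4ai:.2}% / {firecrawl:.2}%");
    println!(
        "fastCRW lead: +{:.2} pp over Crawl4AI, +{:.2} pp over Firecrawl",
        fastcrw - crawl4ai,
        fastcrw - firecrawl
    );
}
```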
These numbers compare HTTP-first scrapers. A Playwright-based benchmark would show dramatically different absolute latencies due to browser render overhead, but that is measuring a different workload — one where you genuinely need JavaScript execution.
Decision Framework: Which to Use
Use Rust crates (Cargo + reqwest + scraper) when:
- You are already writing a Rust application and want scraping as a native library, not an external service
- You need the tightest possible control over HTTP behavior (custom TLS config, connection pooling, header manipulation)
- Your target pages are server-rendered HTML and you want a single binary with no external dependencies
- You are building a high-throughput pipeline where even a minimal API call overhead matters
Use Playwright when:
- The page is an SPA where content only exists after JavaScript executes
- You need to interact with the page — click, type, scroll, wait for user-triggered events
- You need screenshot or visual capture as part of the workflow
- You need to pass browser fingerprint checks on heavily protected sites
- You are already running Playwright for E2E testing and want to share that infrastructure
Use fastCRW when:
- You want Rust-speed HTTP scraping from Python, TypeScript, Go, or any other language without owning a Rust codebase
- You need clean markdown output for LLMs or RAG pipelines without post-processing DOM output
- You want JSON schema–based structured extraction instead of maintaining CSS selectors that break on page redesigns
- You need MCP server integration for AI agent workflows — fastCRW ships a built-in MCP server
- You want to self-host a full scraping API on a small VPS without browser overhead
- You are already using Firecrawl's API and want a compatible self-hosted alternative
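The three lists above can be condensed into a small decision function. This is a deliberately simplified sketch: the type and field names are hypothetical, and it ignores softer criteria such as existing E2E infrastructure or the need for markdown and MCP output.

```rust
#[derive(Debug, PartialEq)]
enum Tool {
    RustCrates,
    Playwright,
    FastCrw,
}

/// Requirements of a scraping workload (hypothetical field names).
#[derive(Default)]
struct Needs {
    js_only_content: bool,     // content exists only after JavaScript runs
    page_interaction: bool,    // click, type, scroll, login flows
    screenshots: bool,         // visual capture of rendered pages
    browser_fingerprint: bool, // must pass browser fingerprint checks
    rust_codebase: bool,       // scraping as a native library in a Rust app
}

fn choose_tool(needs: &Needs) -> Tool {
    if needs.js_only_content
        || needs.page_interaction
        || needs.screenshots
        || needs.browser_fingerprint
    {
        // Anything that genuinely requires a browser goes to Playwright.
        Tool::Playwright
    } else if needs.rust_codebase {
        Tool::RustCrates
    } else {
        Tool::FastCrw
    }
}

fn main() {
    let static_docs = Needs::default();
    let login_flow = Needs { page_interaction: true, ..Needs::default() };
    let rust_service = Needs { rust_codebase: true, ..Needs::default() };
    println!(
        "{:?} {:?} {:?}",
        choose_tool(&static_docs),
        choose_tool(&login_flow),
        choose_tool(&rust_service)
    );
}
```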
The Hybrid Pattern
In production, most teams end up with a hybrid: an HTTP-first scraper for the 80–90% of pages that are server-rendered, with a browser fallback for the remainder. fastCRW implements this as its default renderer selection — http → lightpanda → chrome fallback chain, auto-selected per page. If you are writing Rust directly, you can replicate this by detecting SPA shells (empty body, script-only HTML) and routing those requests to a separate Playwright service.
The key insight is that committing fully to Playwright for all pages means paying browser overhead for every page, even the ones where it adds no value. The HTTP-first approach optimizes for the common case and pays the browser cost only when necessary.
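The fallback chain can be sketched as a loop over renderers, cheapest first. This is an illustration of the pattern only, not fastCRW's implementation: the enum, the function, and both callbacks are hypothetical, and the stub fetcher in main stands in for real HTTP and browser calls.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Renderer {
    Http,
    LightPanda,
    Chrome,
}

/// Tries each renderer in order of cost and returns the first result that
/// the caller-supplied `is_usable` check accepts.
fn render_with_fallback(
    url: &str,
    fetch: impl Fn(Renderer, &str) -> Option<String>,
    is_usable: impl Fn(&str) -> bool,
) -> Option<(Renderer, String)> {
    for renderer in [Renderer::Http, Renderer::LightPanda, Renderer::Chrome] {
        if let Some(html) = fetch(renderer, url) {
            if is_usable(html.as_str()) {
                return Some((renderer, html));
            }
        }
    }
    None
}

fn main() {
    // Stub fetcher: plain HTTP sees an empty shell, the light browser sees content.
    let fetch = |renderer: Renderer, _url: &str| match renderer {
        Renderer::Http => Some(String::from("<div id=\"root\"></div>")),
        Renderer::LightPanda => Some(String::from("<div id=\"root\"><h1>Widget Pro</h1></div>")),
        Renderer::Chrome => None,
    };
    // Toy usability check; a real one would measure visible text.
    let is_usable = |html: &str| html.contains("<h1>");

    let picked = render_with_fallback("https://example.com/app", fetch, is_usable);
    println!("{:?}", picked.map(|(renderer, _)| renderer));
}
```

Because the loop stops at the first usable result, server-rendered pages never pay for a browser, which is the point of the hybrid pattern.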
Try fastCRW
Managed Cloud
The fastest path: fastCRW cloud gives you 500 one-time lifetime credits (see plan pricing) with no credit card required. Same Firecrawl-compatible API, Rust engine, and built-in MCP, with the infrastructure handled for you.
Self-Host (Free, AGPL-3.0)
docker run -p 3000:3000 ghcr.io/us/crw:latest
View source on GitHub · Read the docs
Further Reading
- Rust vs Python Web Scraping: Lower Latency, Tiny Footprint
- Playwright vs Puppeteer vs fastCRW: AI Scraping Compared
- Browser Automation for AI Agents: Playwright, Stagehand, Browser Use, and APIs
- Rust vs Python Scrapers: An Architecture and Footprint Deep-Dive
- Public Benchmark: fastCRW vs Firecrawl vs Crawl4AI
