Is Rust faster than Playwright for web scraping?

For static or server-rendered HTML, yes. A Rust HTTP scraper (reqwest + scraper) skips the browser entirely, so there is no JavaScript execution, no CSS rendering, and no GPU overhead. Each request is a plain HTTP(S) exchange, and the response body is parsed directly into a DOM tree. Playwright must launch and maintain a Chromium, Firefox, or WebKit process, which adds hundreds of megabytes of RAM and renders the full page before you can read any data. For JS-heavy SPAs where content only exists after JavaScript runs, Playwright wins because a real browser engine is the practical way to get the rendered DOM without writing a custom JavaScript executor.

Can I replace Playwright with a Rust scraper for all sites?

No. Rust's reqwest + scraper combination works extremely well for server-rendered HTML — news articles, documentation, product pages, search engine results, most API endpoints that return HTML. It does not work for single-page applications where the page is a blank shell until JavaScript executes, or for pages behind anti-bot systems that require a real browser fingerprint (TLS fingerprint, Canvas, WebGL, etc.). Use Playwright when you need to interact with the page, handle redirects through JavaScript, or extract content that only appears after user events. Use Rust HTTP scraping — or fastCRW — for everything else.

What Rust crates do I need for web scraping?

The minimal stack is: `reqwest` (async HTTP client with TLS and redirect following; cookie storage sits behind the `cookies` feature), `scraper` (CSS selector–based HTML parser built on html5ever), and `tokio` (async runtime). Add `serde` + `serde_json` for structured output, and `tokio::time` or `tokio-retry` for rate limiting and retries. For the Cargo.toml: `reqwest = { version = "0.12", features = ["json"] }`, `scraper = "0.20"`, `tokio = { version = "1", features = ["full"] }`. This gives you a fully async, concurrent scraper that compiles to a single native binary (fully static if you build for a musl target and use reqwest's `rustls-tls` feature with default features disabled).

What is fastCRW and how does it relate to Rust web scraping?

fastCRW is a managed web scraping API built on a Rust engine that uses lol-html (Cloudflare's streaming HTML parser) for most pages, with an optional LightPanda fallback for JavaScript-heavy pages. It exposes a Firecrawl-compatible REST API, so any language can call it without maintaining Rust dependencies. You get the same low-latency, low-memory profile as writing your own Rust scraper, plus built-in markdown output, JSON schema extraction, MCP server support, and a managed cloud option — without owning the crate stack or the binary build pipeline.

Does Playwright work for scraping sites that block bots?

Playwright with stealth plugins (playwright-extra with puppeteer-extra-plugin-stealth) can bypass many common bot-detection systems by patching browser fingerprints. It does not bypass the most sophisticated enterprise anti-bot services (Cloudflare Enterprise, PerimeterX Pro, DataDome) without additional residential proxy and browser fingerprint infrastructure. A plain Rust HTTP scraper is even easier to detect because it sends a non-browser TLS fingerprint. If anti-bot bypass is a hard requirement, neither a bare Rust scraper nor stock Playwright will reliably handle the most hardened targets; you need a dedicated proxy and fingerprint rotation service on top.

Cargo (Rust) vs Playwright for Web Scraping: When to Use Each

Short Answer

When developers search "cargo vs playwright" they usually mean Rust's Cargo-built crate stack (reqwest, scraper, tokio) versus Playwright's headless-browser automation for scraping. The honest answer is that they solve different problems.

Rust HTTP scraping (Cargo + reqwest + scraper): Best for server-rendered HTML at scale. Low RAM, low latency, a single native binary, no browser.
Playwright: Best when you genuinely need a browser for complex SPAs, form submission, click-driven content, anti-bot fingerprinting, or screenshots.
fastCRW: Rust-speed scraping as a REST API. You get the Rust engine's performance without writing or maintaining the crate stack.

	Rust (Cargo crates)	Playwright	fastCRW
Approach	HTTP + HTML parse	Headless browser automation	HTTP + HTML parse (Rust engine)
Language	Rust	JS, Python, Java, C#	Any (REST API)
Browser required	No	Yes (Chromium/Firefox/WebKit)	No (LightPanda opt-in)
RAM per worker	Tiny (no browser baseline)	150–400 MB per browser	Tiny (no browser baseline)
JS-heavy SPAs	No	✅ Full browser	Via LightPanda
Page interactions	No	✅ Click, type, scroll	No
Markdown output	Manual	Manual	✅ Built-in
MCP server	Manual	No	✅ Built-in
JSON extraction	Manual	Manual	✅ JSON schema via API
Deployment	Single binary	~1.5 GB Docker image	Single ~8 MB binary
License	MIT/Apache (crates)	Apache 2.0	AGPL-3.0

What "Cargo" Means in This Context

Cargo is Rust's package manager and build system, the equivalent of npm for Node or pip for Python. When someone says "use Cargo for web scraping," they mean writing a Rust program that pulls in the right crates via Cargo.toml and compiles to a single native binary (fully static if you build for a musl target). The three crates that form the scraping core are:

reqwest: An ergonomic, async HTTP client with TLS and redirect following, plus optional cookie and JSON support behind feature flags. The Rust equivalent of Python's httpx or Node's undici.
scraper: A CSS selector–based HTML parser built on top of html5ever, the HTML5 parser from the Servo project. You write CSS selectors, it returns matching elements.
tokio: The async runtime that lets you run many concurrent HTTP requests without blocking threads. This is why Rust scrapers can handle high concurrency on minimal RAM — there is no thread-per-request overhead.

Together, these give you a scraper that fetches HTML over HTTP and parses it with CSS selectors. What they do not give you is JavaScript execution, browser rendering, or any ability to interact with a running page.

What Playwright Actually Is

Playwright is Microsoft's browser automation library. It controls Chromium, Firefox, and WebKit through a common API, supports JavaScript, Python, Java, and C#, and includes features like auto-waiting, network interception, and codegen. Originally built for end-to-end testing, it is widely used for scraping because any page a human can see in a browser, Playwright can extract.

The tradeoff is resource cost. Every Playwright session launches a real browser process — Chromium alone idles at 80–150 MB and climbs further with each open tab. Loading a single page takes 2–5 seconds including the browser render cycle. At ten concurrent sessions you are looking at 1.5 GB+ just for browser processes, before any application logic runs.
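The arithmetic behind that estimate is easy to reproduce. The sketch below is illustrative only: the function names are hypothetical, and the per-session figure is an assumption you should replace with a measurement from your own workload.

```rust
/// Rough RAM needed for browser processes alone, in megabytes.
/// `per_session_mb` is an assumed figure, not a measured one.
fn browser_ram_mb(sessions: u64, per_session_mb: u64) -> u64 {
    sessions * per_session_mb
}

/// How many browser sessions fit on a machine after reserving
/// `reserved_mb` for the OS and the application itself.
fn max_sessions(total_ram_mb: u64, reserved_mb: u64, per_session_mb: u64) -> u64 {
    // saturating_sub avoids underflow when the reservation exceeds total RAM.
    total_ram_mb.saturating_sub(reserved_mb) / per_session_mb
}
```

With the upper idle figure of 150 MB, `browser_ram_mb(10, 150)` gives 1,500 MB, which matches the 1.5 GB estimate. Going the other way, `max_sessions(1024, 256, 150)` suggests that a 1 GB machine with 256 MB reserved has room for about five idle sessions.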

The Core Architecture Difference

The split between these two approaches is not about language preference. It is about where the content lives when you need to read it.

When the HTML arrives in the HTTP response

For server-rendered pages — news articles, documentation, product listings, search results, most content sites — the HTML you need is present in the HTTP response body. No JavaScript has to run. In this case:

A Rust scraper (or fastCRW) makes an HTTP GET, reads the response body, and parses the HTML directly. No browser spawns. No JavaScript engine. No GPU.
Playwright makes an HTTP GET, hands the response to a full browser engine, executes any JavaScript, waits for the DOM to stabilize, then lets you read the content. Three to five seconds later, with 200+ MB resident in memory.

For server-rendered HTML, bringing a browser is strictly overhead. The content was in the HTTP response the whole time.

When the HTML only exists after JavaScript runs

For SPAs built with React, Vue, or Angular where the HTTP response is just <div id="root"></div> and all content loads client-side, you cannot read the page without executing JavaScript. A Rust HTTP scraper sees an empty shell. Playwright sees the fully rendered page. This is the one scenario where Playwright's overhead is the price of admission.

Rust Web Scraping: A Real Code Example

Here is a minimal Rust scraper using Cargo with the three core crates. This is what you add to Cargo.toml and what a basic scraper looks like.

Cargo.toml

[package]
name = "my-scraper"
version = "0.1.0"
edition = "2021"

[dependencies]
reqwest = { version = "0.12", features = ["json"] }
scraper = "0.20"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
futures = "0.3"  # needed only for the concurrent example below

src/main.rs — scrape a product page

use reqwest::Client;
use scraper::{Html, Selector};
use serde::Serialize;

#[derive(Serialize, Debug)]
struct Product {
    name: String,
    price: String,
    description: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        .user_agent("Mozilla/5.0 (compatible; MyScraper/1.0)")
        .build()?;

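    // Fetch the page. error_for_status() turns 4xx/5xx responses into errors,
    // so an error page is never parsed as product data.
    let html = client
        .get("https://example.com/product")
        .send()
        .await?
        .error_for_status()?
        .text()
        .await?;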

    let document = Html::parse_document(&html);

    let name_sel = Selector::parse("h1.product-title").unwrap();
    let price_sel = Selector::parse(".price").unwrap();
    let desc_sel  = Selector::parse(".product-description").unwrap();

    let product = Product {
        name: document
            .select(&name_sel)
            .next()
            .map(|el| el.text().collect::<String>())
            .unwrap_or_default(),
        price: document
            .select(&price_sel)
            .next()
            .map(|el| el.text().collect::<String>())
            .unwrap_or_default(),
        description: document
            .select(&desc_sel)
            .next()
            .map(|el| el.text().collect::<String>())
            .unwrap_or_default(),
    };

    println!("{}", serde_json::to_string_pretty(&product)?);
    Ok(())
}

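Running cargo build --release produces a single binary at target/release/my-scraper with every Rust dependency linked in. On the default Linux target that binary still links dynamically against glibc and, with reqwest's default TLS backend, the system OpenSSL, so it is portable only across machines with compatible versions of those libraries. For a fully static binary that you can copy to any Linux machine of the same architecture, build for a musl target such as x86_64-unknown-linux-musl and switch reqwest to its rustls-tls feature with default features disabled. Either way there is no language runtime to install and no Docker image is required. This is the deployment story Rust gives you.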

Concurrent scraping with tokio

// Requires `futures = "0.3"` in Cargo.toml (provides StreamExt::buffer_unordered).
use futures::stream::{self, StreamExt};
use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        .user_agent("Mozilla/5.0 (compatible; MyScraper/1.0)")
        .build()?;

    let urls = vec![
        "https://example.com/page-1",
        "https://example.com/page-2",
        "https://example.com/page-3",
        // ... hundreds more
    ];

    // Keep at most 10 requests in flight at once: no browser processes, just HTTP connections
    let results = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move {
                client.get(url).send().await?.text().await
            }
        })
        .buffer_unordered(10)
        .collect::<Vec<_>>()
        .await;

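    // `results` holds one Result per URL, so count the successes separately.
    let ok = results.iter().filter(|r| r.is_ok()).count();
    println!("Scraped {} of {} pages", ok, results.len());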
    Ok(())
}

With Playwright, ten concurrent page loads mean ten live pages, each typically backed by its own Chromium renderer process on top of the shared browser process. With tokio, ten concurrent requests share the same lightweight event loop and connection pool. The memory footprint scales with the number of in-flight HTTP responses, not the number of browser processes.

The Playwright Equivalent

Here is what the same product scrape looks like in Playwright (Node.js). The code is about the same length, but the runtime profile is completely different.

import { chromium } from "playwright";

const browser = await chromium.launch();
const page = await browser.newPage();

await page.goto("https://example.com/product");

const product = {
  name:        await page.textContent("h1.product-title"),
  price:       await page.textContent(".price"),
  description: await page.textContent(".product-description"),
};

await browser.close();
console.log(product);
// Runtime: ~3 seconds, ~300 MB RAM for the Chromium process

For a static product page, the Playwright version spends most of its time waiting for Chromium to boot, connect to the DevTools Protocol, render the page, and stabilize the DOM — for content that arrived in the first HTTP response. The Rust version reads that HTTP response directly.

When Playwright Wins

Playwright is the right tool when you genuinely need what only a browser provides.

1. Single-page applications

React/Vue/Angular SPAs where the HTTP response is a near-empty shell. Playwright executes the JavaScript bundle, waits for the app to hydrate, and lets you read the rendered DOM. A Rust HTTP scraper or fastCRW's default HTTP mode cannot see content that only exists after JavaScript runs. (fastCRW falls back to LightPanda for these pages, which covers many SPAs but is not at Playwright's maturity for the most complex client-side routing.)

2. Form interaction and login flows

Logging in (typing credentials, clicking a button, handling MFA redirects) often depends on JavaScript event handlers and on session cookies that persist across page navigations. A plain form POST can be replayed from an HTTP client with a cookie store, but neither a Rust HTTP client nor fastCRW simulates user interaction, so script-driven login flows need a browser. Playwright's auto-wait API handles this reliably.

3. Anti-bot fingerprint requirements

Some sites use advanced bot detection that validates a real browser fingerprint: TLS JA3/JA4 hash, Canvas/WebGL fingerprint, and JavaScript environment checks such as navigator.webdriver. Playwright drives a real browser, so its TLS fingerprint is genuine, and stealth plugins (playwright-extra with puppeteer-extra-plugin-stealth) patch many of the JavaScript-level tells. A bare reqwest request presents a non-browser TLS fingerprint and is usually rejected before any HTML is served. (Neither approach is reliable against the hardest enterprise anti-bot systems without additional proxy infrastructure.)

4. Screenshots and visual capture

If your workflow requires screenshots of rendered pages, you have options. fastCRW supports screenshot output on the v2 scrape API (a formats: ["screenshot"] or ["screenshot@fullPage"] request returns data.screenshot as a base64 PNG data URL, captured via CDP/Chrome). Playwright also renders the full page and can capture it as PNG or PDF, with finer-grained control over the capture if you need it.

5. E2E testing alongside scraping

If your team already uses Playwright for end-to-end testing and wants to share selectors, fixtures, and infrastructure between tests and scrapers, the ergonomics of staying in Playwright may outweigh the resource cost for your specific workload.

When Rust / HTTP-First Wins

For the majority of web scraping use cases — especially at scale or in AI pipeline contexts — browser automation is unnecessary overhead.

1. High-volume content extraction

Browser memory scales with concurrency, not page count, so high-volume runs get expensive quickly. At the 150–400 MB per browser cited in the comparison table, 100 concurrent sessions need roughly 15–40 GB for browser processes alone. A Rust scraper or fastCRW handles the same concurrency with a tiny fraction of that RAM, because there is no browser to spawn and no JavaScript to execute. This is not a marginal difference: it can be the difference between needing a 32 GB server and fitting comfortably on a 1 GB VPS.

2. AI agent pipelines and RAG

AI agents need clean text, not rendered DOM. fastCRW outputs markdown directly — the format LLMs consume — without requiring you to post-process Playwright's DOM output into something your model can read. For scrape-to-RAG pipelines, the HTTP-first approach eliminates both the browser overhead and the DOM-to-text conversion step.

3. Constrained infrastructure

Playwright on a small VPS is painful: Chromium alone may consume all available RAM, leaving nothing for your application. The Rust/HTTP-first approach, whether you write the crates yourself or call fastCRW, runs on the smallest VPS tier. fastCRW's Docker image ships a single ~8 MB binary (source: OSS README §"Structural footprint"), versus a ~1.5 GB Playwright Docker image.

4. Server-rendered content sites

News articles, documentation, blog posts, product listings, job boards — the vast majority of web content is server-rendered. These pages do not need JavaScript to extract their content. Using a headless browser for them is engineering overhead with no benefit.

5. Scheduled, unattended pipelines

A binary that runs and exits cleanly is easier to schedule and monitor than a process that manages browser lifecycles. Rust scrapers and fastCRW have no warm-up time, no browser process leak risk, and no Chromium version mismatch to debug after an auto-update.

The Maintenance Burden of Writing Your Own Rust Scraper

Building a Rust scraper with Cargo is genuinely attractive for its performance and deployment story, but there is a maintenance surface that is easy to underestimate before you start.

TLS and HTTP/2: reqwest handles most of this, but you will deal with certificate pinning failures, HTTP/2 multiplexing quirks, and connection pool tuning for high-concurrency workloads.
Rate limiting and retry logic: You implement this yourself — exponential backoff, jitter, per-domain rate limits, retry budgets. These are not hard to write, but they add code surface.
Content cleaning: reqwest gives you raw HTML. Converting that to clean text for an LLM — stripping navigation, ads, footers, extracting main content — requires either a good HTML parser strategy or integrating a library like readability-rs.
JavaScript fallback: When you hit a page that requires JavaScript, you need to detect it and either skip the page, flag it for manual review, or integrate a separate browser-based fallback. This is the part that takes a tidy single-binary story and turns it into a multi-component architecture.
Cross-compilation: If your deployment target is ARM or musl Linux, Rust's cross-compilation story is good but not zero-effort, especially once you have native dependencies in your transitive tree.
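To give a sense of the retry logic mentioned in the list above, here is a minimal, std-only sketch of capped exponential backoff with "full jitter". The names backoff_delay and is_retryable are hypothetical, and the random fraction is passed in by the caller (from whatever RNG you use) so the function stays deterministic and easy to test. Treat it as a starting point under those assumptions, not a complete retry policy.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): a random point between
/// zero and min(cap, base * 2^attempt).
///
/// `jitter` is a caller-supplied random fraction in [0.0, 1.0]; values
/// outside that range are clamped.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // 2^attempt, falling back to u32::MAX if the power overflows.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // base * 2^attempt, clamped to the cap (also used if the multiply overflows).
    let ceiling = base.checked_mul(factor).unwrap_or(cap).min(cap);
    // Scale the ceiling by the jitter fraction.
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}

/// Whether an HTTP status is usually worth retrying:
/// 429 (rate limited) and 5xx (server errors).
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}
```

In an async scraper you would typically sleep for the returned duration with tokio::time::sleep before re-sending the request, and stop after a fixed retry budget. Per-domain rate limits are a separate concern and are not covered by this sketch.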

fastCRW absorbs all of this. The Rust engine, lol-html parser, LightPanda fallback, content cleaning, and retry logic are already built in. You call a REST endpoint.

fastCRW: Rust-Speed Scraping Without the Crate Stack

fastCRW is built on the same Rust-first, HTTP-first philosophy as a hand-rolled reqwest scraper, but exposed as a Firecrawl-compatible REST API. You call it from any language — Python, TypeScript, Go, Java — and get the Rust engine's throughput and memory profile without maintaining Rust dependencies or managing a binary build pipeline.

Simple scrape — one API call

curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "formats": ["markdown"]
  }'

Structured JSON extraction — no CSS selectors needed

import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({
  apiKey: "YOUR_API_KEY",
  apiUrl: "https://api.fastcrw.com",
});

const result = await app.scrapeUrl("https://example.com/product", {
  formats: ["json"],
  jsonSchema: {
    type: "object",
    properties: {
      name:        { type: "string" },
      price:       { type: "number" },
      description: { type: "string" },
      in_stock:    { type: "boolean" },
    },
    required: ["name", "price"],
  },
});

console.log(result.json?.name);  // "Widget Pro"
console.log(result.json?.price); // 29.99

The fastCRW API implements Firecrawl's REST interface, so it works with the official Firecrawl SDK and any LangChain or LlamaIndex integration that accepts an api_url override. Change the base URL, keep every line of your existing client code.

Self-host for free

docker run -p 3000:3000 ghcr.io/us/crw:latest

The AGPL-3.0 open-core engine runs on your own infrastructure at zero per-request cost. The Docker image is a single ~8 MB binary (source: OSS README §"Structural footprint"). No Redis, no Node.js, no Playwright browser bundle — just the binary.

Performance Numbers From the Canonical Benchmark

fastCRW was benchmarked against Firecrawl and Crawl4AI on Firecrawl's own public scrape-content-dataset-v1 — 1,000 URLs, 819 of which carry labeled ground truth. Harness: diagnose_3way.py, single run, 3,000 requests, 2026-05-08 (source: bench/server-runs/RESULT_3WAY_1000_FULL.md).

Metric	fastCRW	Crawl4AI	Firecrawl
Truth-recall (of 819 labeled)	63.74% (522)	59.95% (491)	56.04% (459)
Scrape-success (of reachable URLs)	91.8%	—	—
Thrown errors (of 3,000)	0	0	0
p50 latency	1,914 ms	1,916 ms	2,305 ms
p90 latency (fast mode)	4,348 ms	4,754 ms	6,937 ms
p99 latency	15,012 ms	13,749 ms	21,107 ms

fastCRW leads on truth-recall (+3.79 percentage points over Crawl4AI, +7.70 over Firecrawl) and ties Crawl4AI on median latency (1,914 ms vs 1,916 ms). In fast mode, fastCRW's p90 of 4,348 ms is the lowest of the three (Crawl4AI 4,754 ms, Firecrawl 6,937 ms). fastCRW also recovers 34 URLs that neither other tool reaches — 70% more unique recoveries than both combined. Full latency distribution and one-command repro are on /benchmarks.
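The recall percentages and gaps quoted above follow directly from the raw counts in the table. As a quick sanity check, the sketch below recomputes them; the helper names recall_pct and point_gap are hypothetical and exist only for this illustration.

```rust
/// Percentage of labeled URLs recovered, rounded to two decimals.
fn recall_pct(recovered: u32, labeled: u32) -> f64 {
    let pct = f64::from(recovered) / f64::from(labeled) * 100.0;
    (pct * 100.0).round() / 100.0
}

/// Gap between two recall figures in percentage points, rounded to two decimals.
fn point_gap(a: f64, b: f64) -> f64 {
    ((a - b) * 100.0).round() / 100.0
}
```

Calling recall_pct(522, 819), recall_pct(491, 819), and recall_pct(459, 819) reproduces the 63.74%, 59.95%, and 56.04% figures, and point_gap on those values gives the 3.79 and 7.70 percentage-point leads.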

These numbers compare HTTP-first scrapers. A Playwright-based benchmark would show dramatically different absolute latencies due to browser render overhead, but that is measuring a different workload — one where you genuinely need JavaScript execution.

Decision Framework: Which to Use

Use Rust crates (Cargo + reqwest + scraper) when:

You are already writing a Rust application and want scraping as a native library, not an external service
You need the tightest possible control over HTTP behavior (custom TLS config, connection pooling, header manipulation)
Your target pages are server-rendered HTML and you want a single binary with no external dependencies
You are building a high-throughput pipeline where even a minimal API call overhead matters

Use Playwright when:

The page is an SPA where content only exists after JavaScript executes
You need to interact with the page — click, type, scroll, wait for user-triggered events
You need fine-grained, scripted control over screenshots or visual capture (fastCRW already covers basic full-page screenshots via its v2 scrape API)
You need to pass browser fingerprint checks on heavily protected sites
You are already running Playwright for E2E testing and want to share that infrastructure

Use fastCRW when:

You want Rust-speed HTTP scraping from Python, TypeScript, Go, or any other language without owning a Rust codebase
You need clean markdown output for LLMs or RAG pipelines without post-processing DOM output
You want JSON schema–based structured extraction instead of maintaining CSS selectors that break on page redesigns
You need MCP server integration for AI agent workflows — fastCRW ships a built-in MCP server
You want to self-host a full scraping API on a small VPS without browser overhead
You are already using Firecrawl's API and want a compatible self-hosted alternative

The Hybrid Pattern

In production, most teams end up with a hybrid: an HTTP-first scraper for the 80–90% of pages that are server-rendered, with a browser fallback for the remainder. fastCRW implements this as its default renderer selection — http → lightpanda → chrome fallback chain, auto-selected per page. If you are writing Rust directly, you can replicate this by detecting SPA shells (empty body, script-only HTML) and routing those requests to a separate Playwright service.

The key insight is that committing fully to Playwright for all pages means paying browser overhead for every page, even the ones where it adds no value. The HTTP-first approach optimizes for the common case and pays the browser cost only when necessary.
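If you are building the hybrid yourself, the SPA-shell check can start as a simple heuristic: count the visible text in the body, ignoring script, style, and noscript content, and route nearly empty pages to the browser. The sketch below is std-only and deliberately naive (it does not handle HTML comments or malformed markup). The Renderer enum, the function names, and the threshold parameter are all assumptions made for illustration, not part of fastCRW or any crate.

```rust
/// Where a fetched page should be processed (hypothetical routing target).
#[derive(Debug, PartialEq)]
enum Renderer {
    Http,
    Browser,
}

/// Elements whose contents are never visible text.
const HIDDEN_ELEMENTS: [&str; 3] = ["script", "style", "noscript"];

/// Count non-whitespace characters outside tags, starting at <body> if present
/// and skipping the contents of script, style, and noscript elements.
fn visible_text_len(html: &str) -> usize {
    let lower = html.to_ascii_lowercase();
    let mut rest = lower.find("<body").map(|i| &lower[i..]).unwrap_or(lower.as_str());
    let mut count = 0;
    while let Some(lt) = rest.find('<') {
        // Text before the next tag counts as visible.
        count += rest[..lt].chars().filter(|c| !c.is_whitespace()).count();
        rest = &rest[lt..];
        // If the tag opens a hidden element, jump ahead to its closing tag.
        let hidden = HIDDEN_ELEMENTS
            .iter()
            .copied()
            .find(|name| rest[1..].starts_with(*name));
        if let Some(name) = hidden {
            let close = format!("</{}", name);
            rest = match rest.find(&close) {
                Some(pos) => &rest[pos..],
                None => "",
            };
        }
        // Skip to the end of the current tag.
        rest = match rest.find('>') {
            Some(gt) => &rest[gt + 1..],
            None => "",
        };
    }
    count + rest.chars().filter(|c| !c.is_whitespace()).count()
}

/// Route pages with almost no visible text to the browser fallback.
fn choose_renderer(html: &str, min_visible_chars: usize) -> Renderer {
    if visible_text_len(html) < min_visible_chars {
        Renderer::Browser
    } else {
        Renderer::Http
    }
}
```

A reasonable threshold depends on your targets, so tune min_visible_chars against a sample of known server-rendered and client-rendered pages. A production version would more likely run the check on the DOM you already parsed with scraper rather than on the raw string.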

Try fastCRW

Managed Cloud

The fastest path: fastCRW cloud gives you 500 one-time lifetime credits on the Free tier with no credit card required. Same Firecrawl-compatible API, Rust engine, built-in MCP — infrastructure handled for you.

Self-Host (Free, AGPL-3.0)

docker run -p 3000:3000 ghcr.io/us/crw:latest
