Tutorial

Browser Automation for AI Agents: Playwright, Stagehand, Browser Use, and APIs (2026)

Playwright, Puppeteer, Stagehand, Browser Use, Browserbase, or a scraping API? A practical guide to browser automation for AI agents in 2026.

fastcrw
April 11, 2026 · 20 min read

Browser automation for AI agents has exploded in 2026 — Playwright, Stagehand, Browser Use, Browserbase, Playwright MCP, and scraping APIs all compete for the same job. This guide tells you exactly which tool fits which use case.

The Browser Automation Landscape for AI Agents in 2026

Browser automation for AI agents now spans five distinct categories: classic frameworks (Playwright, Puppeteer), AI-native control layers (Stagehand, Browser Use), managed browser infrastructure (Browserbase, Anchor Browser), MCP browser tools (Playwright MCP, Browserbase MCP), and scraping APIs (CRW, Firecrawl). Each solves a different slice of the problem.

The rise of AI agents changed what "browser automation" means. Traditionally, browser automation was for QA testing: simulate a user, fill forms, click buttons, assert results. It ran on a CI server, maybe once a day, and a 10-second page load was acceptable.

AI agents have completely different requirements:

  • Latency matters. An agent scraping a page mid-reasoning loop needs a response in under 2 seconds, not 15.
  • Volume is unpredictable. A user asking "research these 50 companies" might trigger 50 scrapes in parallel.
  • Memory is a hard constraint. A Playwright instance with a live browser uses 200–600 MB. Running 10 in parallel on a cheap server isn't possible.
  • Anti-bot is the default. Most production sites block headless browsers. Cloudflare, DataDome, PerimeterX, and similar systems detect and block browser automation instantly unless you specifically engineer around them.

This is why search interest in "browser automation" peaked at 100/100 in February 2026 — every team building agents had to solve this problem at once.

The Four Approaches

There are four ways to give an AI agent the ability to read web pages:

  1. Full browser (Playwright / Puppeteer / Selenium) — run a real browser, render JavaScript, click and interact
  2. Plain HTTP client (httpx, requests, curl) — fetch raw HTML directly, with no browser and no JavaScript execution
  3. Scraping API (CRW, Firecrawl, Apify) — call a managed service that handles the browser, anti-bot, and output formatting for you
  4. AI-native search API (Tavily, Exa) — purpose-built for LLMs, returns pre-summarized search results

For most AI agent use cases, you'll end up with option 1 or option 3. Options 2 and 4 are useful complements but hit limits quickly.
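To see why option 2 hits its limits, here's a minimal plain-HTTP fetch using only the standard library. It returns whatever HTML the server sends, so any content rendered client-side by JavaScript simply won't be there:

```python
from urllib.request import urlopen

def fetch_raw_html(url: str) -> str:
    # Plain HTTP GET: no browser, no JavaScript execution.
    # SPAs that build the page client-side return a near-empty shell.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

This works well for static pages and server-rendered sites; for a React or Vue app you'll often get back little more than an empty root div.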

Full Browser: Playwright, Puppeteer, Selenium

Playwright

Playwright (Microsoft) is the current best-in-class full browser framework. It supports Chromium, Firefox, and WebKit, has a great async Python/JavaScript API, and handles most modern web patterns well.

import asyncio
from playwright.async_api import async_playwright

async def scrape(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        content = await page.content()
        await browser.close()
        return content

asyncio.run(scrape("https://example.com"))

What's great: Complete JavaScript rendering, can interact with dynamic content, officially supported by major frameworks, good documentation.

What breaks for agents:

  • ~200–600 MB RAM per browser instance — 10 parallel scrapes = 2–6 GB RAM
  • Cold start time 2–5 seconds just to launch the browser
  • Immediate detection by Cloudflare, DataDome, and similar systems without significant stealth configuration
  • You maintain the infrastructure: updates, Chromium versions, memory limits, crash recovery

Puppeteer

Puppeteer (Google) is the older Node.js-only option. It only controls Chromium and has been largely superseded by Playwright for new projects, but it's still widely used and well-documented.

const puppeteer = require('puppeteer');

async function scrape(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const content = await page.content();
  await browser.close();
  return content;
}

scrape('https://example.com').then(console.log);

Verdict: If you're starting a new project, choose Playwright. If you're maintaining existing Puppeteer code, it still works fine. The agent-scale problems are identical — it's the same Chromium engine underneath.

Selenium

Selenium is the oldest option and is primarily a testing framework. It requires a separate WebDriver binary (chromedriver, geckodriver) and is significantly slower and more complex to operate than Playwright. For AI agent use cases, there's almost no reason to choose Selenium over Playwright in 2026 unless you're maintaining legacy code.

AI-Native Browser Control: Stagehand, Browser Use, and Browserbase

Between raw Playwright and a scraping API sits a newer category: AI-native browser control frameworks. These are purpose-built for LLMs and handle the translation between "what the agent wants to do" and "which DOM elements to click."

Stagehand (by Browserbase)

Stagehand wraps Playwright with LLM-powered act(), extract(), and observe() methods. Instead of writing brittle CSS selectors, you write natural language:

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod"; // extract() schemas are defined with Zod

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();
await stagehand.page.goto("https://example.com");
await stagehand.act({ action: "click the login button" });
const price = await stagehand.extract({
  instruction: "extract the product price",
  schema: z.object({ price: z.number() })
});

Best for: Tasks that require actual page interaction — filling forms, clicking, navigating multi-step flows. Not useful if you only need to read content.

Browser Use

Browser Use is a Python library that gives LLMs direct control over a browser. It builds a DOM tree the LLM can reason over, executes actions, and handles the agent loop for you:

import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic

async def main():
    agent = Agent(
        task="Go to amazon.com and find the price of the top-selling laptop",
        llm=ChatAnthropic(model="claude-sonnet-4-6"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())

Best for: Agentic browsing tasks where the agent needs to navigate, interact, and extract across multiple steps without predefined paths.

Browserbase

Browserbase is managed cloud infrastructure for Playwright and Puppeteer. You get remote browsers that handle stealth, residential proxies, and session management. Stagehand runs on Browserbase by default.

# Connect to a remote Browserbase session
from playwright.sync_api import sync_playwright
from browserbase import Browserbase

bb = Browserbase(api_key="...")
session = bb.sessions.create(project_id="...")

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(session.connect_url)
    page = browser.contexts[0].pages[0]  # CDP sessions expose a default context
    page.goto("https://example.com")

Playwright MCP

Playwright MCP (from Microsoft) exposes browser actions as MCP tools — playwright_navigate, playwright_click, playwright_fill, playwright_screenshot. This lets any MCP client (Claude Desktop, Cursor) control a browser directly.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}

Limitation: Playwright MCP exposes the full browser interaction surface, which is powerful but also means your agent can accidentally navigate away, trigger popups, or fill forms incorrectly. For read-only tasks, CRW's scraping MCP is safer and faster.

The Real Problem: Anti-Bot Detection

Here's what the documentation for these tools won't tell you upfront: a default headless browser gets blocked by most production websites.

Cloudflare's bot protection, for example, checks dozens of browser fingerprint signals: the navigator.webdriver flag, missing browser plugins, canvas fingerprint anomalies, timing patterns, and more. A stock Playwright launch fails these checks immediately and gets served a challenge page instead of content.

Working around this requires:

  • Stealth plugins like playwright-extra with puppeteer-extra-plugin-stealth
  • Real residential proxies (not datacenter IPs)
  • Realistic browser fingerprints and user-agent strings
  • Human-like timing and mouse movement patterns
  • Rotating sessions to avoid IP-based rate limiting
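The last bullet can be sketched in a few lines — a round-robin rotator that cycles a proxy pool so consecutive sessions never share an exit IP (the pool entries here are placeholders, not real proxies):

```python
import itertools

def make_proxy_rotator(proxies: list[str]):
    # Cycle through the pool round-robin; each call returns the
    # proxy URL to use for the next browser session.
    pool = itertools.cycle(proxies)
    return lambda: next(pool)

next_proxy = make_proxy_rotator([
    "http://user:pass@res-proxy-1:8080",   # placeholder residential proxies
    "http://user:pass@res-proxy-2:8080",
])
```

In Playwright you'd feed this into the launch options, e.g. `p.chromium.launch(proxy={"server": next_proxy()})`, creating a fresh session per target.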

This is an arms race. Anti-bot vendors update detection constantly. Maintaining effective stealth configuration is ongoing engineering work, not a one-time setup.

API-First Approach: Scraping APIs

Scraping APIs — CRW, Firecrawl, Apify, and similar — handle the browser, stealth, anti-bot, and infrastructure for you. You make an HTTP request to the API; you get clean content back.

CRW (fastcrw.com)

import httpx

response = httpx.post(
    "https://api.fastcrw.com/v1/scrape",
    headers={"Authorization": "Bearer crw_live_..."},
    json={
        "url": "https://example.com",
        "formats": ["markdown"],
        "waitFor": 1000
    }
)
data = response.json()
print(data["markdown"])

Or with the Python SDK:

from crw import CRW

client = CRW(api_key="crw_live_...")
result = client.scrape("https://example.com", formats=["markdown"])
print(result.markdown)

The same API call works for:

  • Scraping — single page, clean markdown output
  • Crawling — follow links from a root URL, scrape all discovered pages
  • Mapping — discover all URLs on a site without scraping content
  • Extracting — return structured JSON matching a schema you define
  • Searching — search the web and return scraped results directly

Why API-First Changes the Math for Agents

When your agent makes an API call instead of launching a browser, several things improve:

  • Latency: CRW averages 833ms for a scrape. Playwright with a cold browser start takes 3–8 seconds before the page even starts loading.
  • Memory: One API call uses ~5 MB on your agent's side. Running 50 Playwright instances would require 10+ GB RAM.
  • Anti-bot: CRW's infrastructure handles stealth — residential proxies, fingerprint rotation, Cloudflare bypass — so you don't.
  • Scale: 100 parallel scrapes is a config change in an API, not an infrastructure project.
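The scale point in practice: with an API, fan-out is just bounded concurrency on the client side. A sketch using asyncio — the body of `scrape_one` is a stub standing in for the real CRW HTTP call:

```python
import asyncio

async def scrape_one(url: str, sem: asyncio.Semaphore) -> str:
    async with sem:
        # Stand-in for the real API call (e.g. an httpx POST to /v1/scrape).
        await asyncio.sleep(0)
        return f"markdown for {url}"

async def scrape_many(urls: list[str], concurrency: int = 100) -> list[str]:
    # The semaphore caps in-flight requests; the cost is per-coroutine,
    # not per-browser, so hundreds of tasks fit in a few MB of RAM.
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(scrape_one(u, sem) for u in urls))

results = asyncio.run(scrape_many([f"https://example.com/{i}" for i in range(50)]))
```

Raising the fan-out from 50 to 500 is a one-line change to `concurrency` — the equivalent change with local browsers means provisioning tens of GB of RAM.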

Full Comparison Table

| Tool | Category | Avg latency | RAM | Anti-bot | JS rendering | Interaction |
| --- | --- | --- | --- | --- | --- | --- |
| Playwright | Browser framework | 3–8s cold start | 200–600 MB | ❌ Manual | ✅ Full | ✅ Yes |
| Puppeteer | Browser framework | 3–8s cold start | 200–600 MB | ❌ Manual | ✅ Full | ✅ Yes |
| Stagehand | AI-native control | 3–8s + LLM overhead | 200–600 MB | ✅ Via Browserbase | ✅ Full | ✅ Natural language |
| Browser Use | AI-native control | 3–8s + LLM overhead | 200–600 MB | ⚠️ Partial | ✅ Full | ✅ Agentic |
| Browserbase | Managed infra | ~2s (remote) | Cloud | ✅ Built-in | ✅ Full | ✅ Yes |
| Playwright MCP | MCP tool | 3–8s | 200–600 MB | ❌ Manual | ✅ Full | ✅ Yes |
| httpx / requests | HTTP client | 0.2–0.5s | <5 MB | ❌ None | ❌ None | ❌ No |
| CRW API | Scraping API | 833ms avg | <5 MB client | ✅ Built-in | ✅ Full | ❌ Read-only |

MCP Integration: Browser Automation Without Browser Code

The Model Context Protocol (MCP) is the fastest-growing way to give AI agents tool access in 2026. Instead of writing agent code that calls an API directly, you expose tools through MCP and let the model decide when and how to use them.

CRW ships with a built-in MCP server. Once configured, any MCP-compatible agent (Claude, Cursor, Claude Desktop, or your own LangChain/LangGraph agent with an MCP client) gets access to crw_scrape, crw_crawl, crw_map, crw_extract, and crw_search as native tools.

Claude Desktop config

{
  "mcpServers": {
    "crw": {
      "command": "crw",
      "args": ["mcp"],
      "env": {
        "CRW_API_KEY": "crw_live_..."
      }
    }
  }
}

Once added, Claude can say "scrape this URL and summarize it" and the tool call happens automatically — no extra code on your side.

LangGraph agent with MCP

import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

async def main():
    async with MultiServerMCPClient({
        "crw": {
            "command": "crw",
            "args": ["mcp"],
            "env": {"CRW_API_KEY": "crw_live_..."},
            "transport": "stdio",
        }
    }) as client:
        tools = await client.get_tools()
        agent = create_react_agent(ChatAnthropic(model="claude-sonnet-4-6"), tools)
        result = await agent.ainvoke({
            "messages": [{"role": "user", "content": "Scrape https://news.ycombinator.com and summarize the top 5 posts"}]
        })
        print(result["messages"][-1].content)

asyncio.run(main())

This pattern — MCP tools + LangGraph agent + CRW — is the current best-practice for production AI agents that need live web data.

Decision Tree: Which Tool for Which Task?

| Task | Best tool |
| --- | --- |
| Read/extract content from a URL | CRW scrape API |
| Research all pages on a domain | CRW crawl API |
| Search web + get scraped content | CRW search API |
| Extract structured JSON from a page | CRW extract API |
| Log in + fill a form + submit | Playwright / Stagehand |
| Multi-step navigation without fixed paths | Browser Use / Stagehand |
| Browser automation without local Chromium | Browserbase |
| Give Claude Desktop browser control | Playwright MCP |
| QA testing your own application | Playwright |

The short version: if your agent is reading web content, use a scraping API. If your agent is interacting with web UIs, use a browser tool. Most agents that seem to need browser automation actually only need scraping.

When to Use Full Browser Automation

Browser automation (Playwright/Puppeteer) still makes sense for specific use cases:

  • Form interaction: If your agent needs to log in, fill forms, click buttons, or navigate through multi-step flows, a full browser is required. CRW handles JavaScript rendering but not stateful interaction.
  • Screenshot capture: If your agent needs to see a page as a user does (visual reasoning, screenshot-based QA), you need a real browser.
  • Authentication with cookies: Sites where you need to maintain a session across multiple requests may require a persistent browser context.
  • Testing your own application: For QA automation of your own app, Playwright is still the right tool.

For everything else — reading content, extracting data, researching competitors, monitoring prices, feeding RAG pipelines — an API-first approach is faster, cheaper, and more reliable.

When to Use a Search API Instead

If your agent is answering questions rather than extracting structured data, a search API may be a better fit than browser automation altogether:

  • Tavily / Exa: Return pre-summarized search results optimized for LLM context. Good when you want "find me information about X" rather than "scrape this specific URL."
  • CRW search: Returns actual scraped content from search results — you get both the structured search data and the raw page content, which is useful for RAG pipelines that need full page text.

Practical Architecture for AI Agent Web Access

A production AI agent stack in 2026 typically looks like this:

# Tool routing logic (pseudocode)

def get_web_tool(task):
    if task.needs_interaction:          # login, forms, clicks
        return playwright_tool
    elif task.is_search_query:          # "find articles about X"
        return crw_search_tool
    elif task.is_single_url:            # scrape this specific page
        return crw_scrape_tool
    elif task.is_site_research:         # research everything on example.com
        return crw_crawl_tool
    else:
        return crw_scrape_tool          # sensible default

Most agents only need crw_scrape and crw_search. The browser launcher is a fallback for rare interaction-heavy tasks.

Quick Start: Add Web Access to Your Agent in 3 Minutes

The fastest path from "no web access" to "full browser automation capabilities" for an AI agent:

1. Get an API key

Sign up at fastcrw.com — 500 free credits, no credit card required.

2. Install the SDK

pip install crw        # Python
npm install crw-ts     # TypeScript/Node

3. Add to your agent

from crw import CRW

client = CRW(api_key="crw_live_...")

# As a plain tool function
def scrape_url(url: str) -> str:
    result = client.scrape(url, formats=["markdown"])
    return result.markdown

# As an OpenAI-compatible tool
scrape_tool = {
    "type": "function",
    "function": {
        "name": "scrape_url",
        "description": "Scrape a URL and return clean markdown content",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to scrape"}
            },
            "required": ["url"]
        }
    }
}
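Once the model returns a tool call, you still have to route it back to the function. A minimal dispatcher — the tool-call shape follows the OpenAI function-calling format, and the registry below stubs `scrape_url` for illustration:

```python
import json

def handle_tool_call(tool_call: dict, registry: dict) -> str:
    # Look up the named function and invoke it with the JSON-decoded arguments.
    fn = registry[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Stubbed registry; in the agent this maps to the real scrape_url above.
registry = {"scrape_url": lambda url: f"# markdown for {url}"}
call = {"function": {"name": "scrape_url",
                     "arguments": '{"url": "https://example.com"}'}}
print(handle_tool_call(call, registry))
```

The same dispatcher works unchanged as you add more tools — register `crawl_url` or `search_web` alongside `scrape_url` and let the model pick.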

That's it. Your agent now has the same page-reading capability as a full browser, at API latency, with no infrastructure to maintain.

Self-Hosting Option

If you want zero ongoing API costs and full control, CRW ships as a single ~6 MB binary that runs on any Linux server:

curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
crw  # starts on http://localhost:3000

# Then point the Python SDK at the local instance:
client = CRW(api_url="http://localhost:3000")

Self-hosted CRW uses the same API surface as the cloud version. No API key required. See the quick start guide for setup instructions.


FAQ

Can CRW handle JavaScript-heavy SPAs?

Yes. CRW renders JavaScript by default and waits for network activity to settle before returning content. You can also set a custom waitFor duration in milliseconds or wait for a specific CSS selector to appear before returning.

Does CRW bypass Cloudflare?

CRW includes built-in stealth mode that handles Cloudflare's standard bot protection. Sites with extremely aggressive challenge modes (requiring human CAPTCHA completion) may not be reachable via any automated tool.

What's the difference between scrape, crawl, and map?

scrape returns content from one URL. crawl follows links from a starting URL and returns content from all discovered pages (up to a configurable limit). map discovers all URLs on a site via sitemap and link traversal without returning content.

How do I handle rate limits at scale?

CRW's cloud tier handles rate limiting transparently. For self-hosted deployments, you configure rate limits per domain to avoid being blocked. The API returns 429 responses that you can use to implement exponential backoff in your agent.
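A common way to act on those 429s is full-jitter exponential backoff: sleep a random amount up to an exponentially growing cap. A sketch (the `do_request` callable is a stand-in for your actual API call):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # Full-jitter backoff: uniform in [0, min(cap, base * 2^attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))

def with_retries(do_request, max_attempts: int = 5):
    for attempt in range(max_attempts):
        status, body = do_request()
        if status != 429:
            return body
        # In real code: time.sleep(backoff_delay(attempt))
    raise RuntimeError("rate limited after retries")
```

The jitter matters: without it, 50 parallel agent tasks that get throttled together retry together, hammering the same window again.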

Is browser automation required for agentic computer use?

Agentic computer use (Claude Computer Use, OpenAI Operator) uses a full browser because it needs to interact with the UI — clicking, typing, navigating. For the common agent use case of reading web content rather than interacting with web UIs, an API is significantly better.

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.
