Tutorial

Browser Automation for AI Agents: Playwright, Stagehand, Browser Use, and APIs (2026)

Playwright, Puppeteer, Stagehand, Browser Use, Browserbase, or a scraping API? A practical guide to browser automation for AI agents in 2026.

fastcrw
April 11, 2026 · 20 min read

Browser automation for AI agents has exploded in 2026 — Playwright, Stagehand, Browser Use, Browserbase, Playwright MCP, and scraping APIs all compete for the same job. This guide tells you exactly which tool fits which use case.

The Browser Automation Landscape for AI Agents in 2026

Browser automation for AI agents now spans five distinct categories: classic frameworks (Playwright, Puppeteer), AI-native control layers (Stagehand, Browser Use), managed browser infrastructure (Browserbase, Anchor Browser), MCP browser tools (Playwright MCP, Browserbase MCP), and scraping APIs (CRW, Firecrawl). Each solves a different slice of the problem.

The rise of AI agents changed what "browser automation" means. Traditionally, browser automation was for QA testing: simulate a user, fill forms, click buttons, assert results. It ran on a CI server, maybe once a day, and a 10-second page load was acceptable.

AI agents have completely different requirements:

  • Latency matters. An agent scraping a page mid-reasoning loop needs a response in under 2 seconds, not 15.
  • Volume is unpredictable. A user asking "research these 50 companies" might trigger 50 scrapes in parallel.
  • Memory is a hard constraint. A Playwright instance with a live browser uses 200–600 MB. Running 10 in parallel on a cheap server isn't possible.
  • Anti-bot is the default. Most production sites block headless browsers. Cloudflare, DataDome, PerimeterX, and similar systems detect and block browser automation instantly unless you specifically engineer around them.

This is why search interest in "browser automation" peaked at 100/100 in February 2026 — every team building agents had to solve this problem at once.

The Four Approaches

There are four ways to give an AI agent the ability to read web pages:

  1. Full browser (Playwright / Puppeteer / Selenium) — run a real browser, render JavaScript, click and interact
  2. Plain HTTP client (httpx, requests, curl) — fetch raw HTML directly, with no browser and no JavaScript execution
  3. Scraping API (CRW, Firecrawl, Apify) — call a managed service that handles the browser, anti-bot, and output formatting for you
  4. AI-native search API (Tavily, Exa) — purpose-built for LLMs, returns pre-summarized search results

For most AI agent use cases, you'll end up with option 1 or option 3. Options 2 and 4 are useful complements but hit limits quickly.
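To see why option 2 hits its limits, here's a minimal plain-HTTP fetch using only the standard library. It returns whatever HTML the server sends, so any content rendered client-side by JavaScript simply won't be there:

```python
from urllib.request import urlopen

def fetch_raw_html(url: str) -> str:
    # Plain HTTP GET: no browser, no JavaScript execution.
    # SPAs that build the page client-side return a near-empty shell.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

This works well for static pages and server-rendered sites; for a React or Vue app you'll often get back little more than an empty root div.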

Full Browser: Playwright, Puppeteer, Selenium

Playwright

Playwright (Microsoft) is the current best-in-class full browser framework. It supports Chromium, Firefox, and WebKit, has a great async Python/JavaScript API, and handles most modern web patterns well.

import asyncio
from playwright.async_api import async_playwright

async def scrape(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        content = await page.content()
        await browser.close()
        return content

asyncio.run(scrape("https://example.com"))

What's great: Complete JavaScript rendering, can interact with dynamic content, officially supported by major frameworks, good documentation.

What breaks for agents:

  • ~200–600 MB RAM per browser instance — 10 parallel scrapes = 2–6 GB RAM
  • Cold start time 2–5 seconds just to launch the browser
  • Immediate detection by Cloudflare, DataDome, and similar systems without significant stealth configuration
  • You maintain the infrastructure: updates, Chromium versions, memory limits, crash recovery

Puppeteer

Puppeteer (Google) is the older Node.js-only option. It only controls Chromium and has been largely superseded by Playwright for new projects, but it's still widely used and well-documented.

const puppeteer = require('puppeteer');

async function scrape(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const content = await page.content();
  await browser.close();
  return content;
}

scrape('https://example.com').then(console.log);

Verdict: If you're starting a new project, choose Playwright. If you're maintaining existing Puppeteer code, it still works fine. The agent-scale problems are identical — it's the same Chromium engine underneath.

Selenium

Selenium is the oldest option and is primarily a testing framework. It requires a separate WebDriver binary (chromedriver, geckodriver) and is significantly slower and more complex to operate than Playwright. For AI agent use cases, there's almost no reason to choose Selenium over Playwright in 2026 unless you're maintaining legacy code.

AI-Native Browser Control: Stagehand, Browser Use, and Browserbase

Between raw Playwright and a scraping API sits a newer category: AI-native browser control frameworks. These are purpose-built for LLMs and handle the translation between "what the agent wants to do" and "which DOM elements to click."

Stagehand (by Browserbase)

Stagehand wraps Playwright with LLM-powered act(), extract(), and observe() methods. Instead of writing brittle CSS selectors, you write natural language:

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod"; // extract() schemas are defined with Zod

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();
await stagehand.page.goto("https://example.com");
await stagehand.act({ action: "click the login button" });
const price = await stagehand.extract({
  instruction: "extract the product price",
  schema: z.object({ price: z.number() })
});

Best for: Tasks that require actual page interaction — filling forms, clicking, navigating multi-step flows. Not useful if you only need to read content.

Browser Use

Browser Use is a Python library that gives LLMs direct control over a browser. It builds a DOM tree the LLM can reason over, executes actions, and handles the agent loop for you:

import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic

async def main():
    agent = Agent(
        task="Go to amazon.com and find the price of the top-selling laptop",
        llm=ChatAnthropic(model="claude-sonnet-4-6"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())

Best for: Agentic browsing tasks where the agent needs to navigate, interact, and extract across multiple steps without predefined paths.

Browserbase

Browserbase is managed cloud infrastructure for Playwright and Puppeteer. You get remote browsers that handle stealth, residential proxies, and session management. Stagehand runs on Browserbase by default.

# Connect to a remote Browserbase session
from playwright.sync_api import sync_playwright
from browserbase import Browserbase

bb = Browserbase(api_key="...")
session = bb.sessions.create(project_id="...")

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(session.connect_url)
    page = browser.contexts[0].pages[0]  # CDP sessions expose a default context
    page.goto("https://example.com")

Playwright MCP

Playwright MCP (from Microsoft) exposes browser actions as MCP tools — playwright_navigate, playwright_click, playwright_fill, playwright_screenshot. This lets any MCP client (Claude Desktop, Cursor) control a browser directly.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}

Limitation: Playwright MCP exposes the full browser interaction surface, which is powerful but also means your agent can accidentally navigate away, trigger popups, or fill forms incorrectly. For read-only tasks, CRW's scraping MCP is safer and faster.

The Real Problem: Anti-Bot Detection

Here's what the documentation for these tools won't tell you upfront: a default headless browser gets blocked by most production websites.

Cloudflare's bot protection, for example, checks dozens of browser fingerprint signals: the navigator.webdriver flag, missing browser plugins, canvas fingerprint anomalies, timing patterns, and more. A stock Playwright launch fails these checks immediately and gets served a challenge page instead of content.

Working around this requires:

  • Stealth plugins like playwright-extra with puppeteer-extra-plugin-stealth
  • Real residential proxies (not datacenter IPs)
  • Realistic browser fingerprints and user-agent strings
  • Human-like timing and mouse movement patterns
  • Rotating sessions to avoid IP-based rate limiting
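The last bullet can be sketched in a few lines — a round-robin rotator that cycles a proxy pool so consecutive sessions never share an exit IP (the pool entries here are placeholders, not real proxies):

```python
import itertools

def make_proxy_rotator(proxies: list[str]):
    # Cycle through the pool round-robin; each call returns the
    # proxy URL to use for the next browser session.
    pool = itertools.cycle(proxies)
    return lambda: next(pool)

next_proxy = make_proxy_rotator([
    "http://user:pass@res-proxy-1:8080",   # placeholder residential proxies
    "http://user:pass@res-proxy-2:8080",
])
```

In Playwright you'd feed this into the launch options, e.g. `p.chromium.launch(proxy={"server": next_proxy()})`, creating a fresh session per target.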

This is an arms race. Anti-bot vendors update detection constantly. Maintaining effective stealth configuration is ongoing engineering work, not a one-time setup.

API-First Approach: Scraping APIs

Scraping APIs — CRW, Firecrawl, Apify, and similar — handle the browser, stealth, anti-bot, and infrastructure for you. You make an HTTP request to the API; you get clean content back.

CRW (fastcrw.com)

import httpx

response = httpx.post(
    "https://api.fastcrw.com/v1/scrape",
    headers={"Authorization": "Bearer crw_live_..."},
    json={
        "url": "https://example.com",
        "formats": ["markdown"],
        "waitFor": 1000
    }
)
data = response.json()
print(data["markdown"])

Or with the Python SDK:

from crw import CRW

client = CRW(api_key="crw_live_...")
result = client.scrape("https://example.com", formats=["markdown"])
print(result.markdown)

The same API call works for:

  • Scraping — single page, clean markdown output
  • Crawling — follow links from a root URL, scrape all discovered pages
  • Mapping — discover all URLs on a site without scraping content
  • Extracting — return structured JSON matching a schema you define
  • Searching — search the web and return scraped results directly

Why API-First Changes the Math for Agents

When your agent makes an API call instead of launching a browser, several things improve:

  • Latency: CRW averages 833ms for a scrape. Playwright with a cold browser start takes 3–8 seconds before the page even starts loading.
  • Memory: One API call uses ~5 MB on your agent's side. Running 50 Playwright instances would require 10+ GB RAM.
  • Anti-bot: CRW's infrastructure handles stealth — residential proxies, fingerprint rotation, Cloudflare bypass — so you don't.
  • Scale: 100 parallel scrapes is a config change in an API, not an infrastructure project.
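The scale point in practice: with an API, fan-out is just bounded concurrency on the client side. A sketch using asyncio — the body of `scrape_one` is a stub standing in for the real CRW HTTP call:

```python
import asyncio

async def scrape_one(url: str, sem: asyncio.Semaphore) -> str:
    async with sem:
        # Stand-in for the real API call (e.g. an httpx POST to /v1/scrape).
        await asyncio.sleep(0)
        return f"markdown for {url}"

async def scrape_many(urls: list[str], concurrency: int = 100) -> list[str]:
    # The semaphore caps in-flight requests; the cost is per-coroutine,
    # not per-browser, so hundreds of tasks fit in a few MB of RAM.
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(scrape_one(u, sem) for u in urls))

results = asyncio.run(scrape_many([f"https://example.com/{i}" for i in range(50)]))
```

Raising the fan-out from 50 to 500 is a one-line change to `concurrency` — the equivalent change with local browsers means provisioning tens of GB of RAM.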

Full Comparison Table

| Tool | Category | Avg latency | RAM | Anti-bot | JS rendering | Interaction |
| --- | --- | --- | --- | --- | --- | --- |
| Playwright | Browser framework | 3–8s cold start | 200–600 MB | ❌ Manual | ✅ Full | ✅ Yes |
| Puppeteer | Browser framework | 3–8s cold start | 200–600 MB | ❌ Manual | ✅ Full | ✅ Yes |
| Stagehand | AI-native control | 3–8s + LLM overhead | 200–600 MB | ✅ Via Browserbase | ✅ Full | ✅ Natural language |
| Browser Use | AI-native control | 3–8s + LLM overhead | 200–600 MB | ⚠️ Partial | ✅ Full | ✅ Agentic |
| Browserbase | Managed infra | ~2s (remote) | Cloud | ✅ Built-in | ✅ Full | ✅ Yes |
| Playwright MCP | MCP tool | 3–8s | 200–600 MB | ❌ Manual | ✅ Full | ✅ Yes |
| httpx / requests | HTTP client | 0.2–0.5s | <5 MB | ❌ None | ❌ None | ❌ No |
| CRW API | Scraping API | 833ms avg | <5 MB client | ✅ Built-in | ✅ Full | ❌ Read-only |

MCP Integration: Browser Automation Without Browser Code

The Model Context Protocol (MCP) is the fastest-growing way to give AI agents tool access in 2026. Instead of writing agent code that calls an API directly, you expose tools through MCP and let the model decide when and how to use them.

CRW ships with a built-in MCP server. Once configured, any MCP-compatible agent (Claude, Cursor, Claude Desktop, or your own LangChain/LangGraph agent with an MCP client) gets access to crw_scrape, crw_crawl, crw_map, crw_extract, and crw_search as native tools.

Claude Desktop config

{
  "mcpServers": {
    "crw": {
      "command": "crw",
      "args": ["mcp"],
      "env": {
        "CRW_API_KEY": "crw_live_..."
      }
    }
  }
}

Once added, Claude can say "scrape this URL and summarize it" and the tool call happens automatically — no extra code on your side.

LangGraph agent with MCP

import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

async def main():
    async with MultiServerMCPClient({
        "crw": {
            "command": "crw",
            "args": ["mcp"],
            "env": {"CRW_API_KEY": "crw_live_..."},
            "transport": "stdio",
        }
    }) as client:
        tools = await client.get_tools()
        agent = create_react_agent(ChatAnthropic(model="claude-sonnet-4-6"), tools)
        result = await agent.ainvoke({
            "messages": [{"role": "user", "content": "Scrape https://news.ycombinator.com and summarize the top 5 posts"}]
        })
        print(result["messages"][-1].content)

asyncio.run(main())

This pattern — MCP tools + LangGraph agent + CRW — is the current best-practice for production AI agents that need live web data.

Decision Tree: Which Tool for Which Task?

| Task | Best tool |
| --- | --- |
| Read/extract content from a URL | CRW scrape API |
| Research all pages on a domain | CRW crawl API |
| Search web + get scraped content | CRW search API |
| Extract structured JSON from a page | CRW extract API |
| Log in + fill a form + submit | Playwright / Stagehand |
| Multi-step navigation without fixed paths | Browser Use / Stagehand |
| Browser automation without local Chromium | Browserbase |
| Give Claude Desktop browser control | Playwright MCP |
| QA testing your own application | Playwright |

The short version: if your agent is reading web content, use a scraping API. If your agent is interacting with web UIs, use a browser tool. Most agents that seem to need browser automation actually only need scraping.

When to Use Full Browser Automation

Browser automation (Playwright/Puppeteer) still makes sense for specific use cases:

  • Form interaction: If your agent needs to log in, fill forms, click buttons, or navigate through multi-step flows, a full browser is required. CRW handles JavaScript rendering but not stateful interaction.
  • Screenshot capture: If your agent needs to see a page as a user does (visual reasoning, screenshot-based QA), you need a real browser.
  • Authentication with cookies: Sites where you need to maintain a session across multiple requests may require a persistent browser context.
  • Testing your own application: For QA automation of your own app, Playwright is still the right tool.

For everything else — reading content, extracting data, researching competitors, monitoring prices, feeding RAG pipelines — an API-first approach is faster, cheaper, and more reliable.

When to Use a Search API Instead

If your agent is answering questions rather than extracting structured data, a search API may be a better fit than browser automation altogether:

  • Tavily / Exa: Return pre-summarized search results optimized for LLM context. Good when you want "find me information about X" rather than "scrape this specific URL."
  • CRW search: Returns actual scraped content from search results — you get both the structured search data and the raw page content, which is useful for RAG pipelines that need full page text.

Practical Architecture for AI Agent Web Access

A production AI agent stack in 2026 typically looks like this:

# Tool routing logic (pseudocode)

def get_web_tool(task):
    if task.needs_interaction:          # login, forms, clicks
        return playwright_tool
    elif task.is_search_query:          # "find articles about X"
        return crw_search_tool
    elif task.is_single_url:            # scrape this specific page
        return crw_scrape_tool
    elif task.is_site_research:         # research everything on example.com
        return crw_crawl_tool
    else:
        return crw_scrape_tool          # sensible default

Most agents only need crw_scrape and crw_search. The browser launcher is a fallback for rare interaction-heavy tasks.

Quick Start: Add Web Access to Your Agent in 3 Minutes

The fastest path from "no web access" to "full browser automation capabilities" for an AI agent:

1. Get an API key

Sign up at fastcrw.com — 500 free credits, no credit card required.

2. Install the SDK

pip install crw        # Python
npm install crw-ts     # TypeScript/Node

3. Add to your agent

from crw import CRW

client = CRW(api_key="crw_live_...")

# As a plain tool function
def scrape_url(url: str) -> str:
    result = client.scrape(url, formats=["markdown"])
    return result.markdown

# As an OpenAI-compatible tool
scrape_tool = {
    "type": "function",
    "function": {
        "name": "scrape_url",
        "description": "Scrape a URL and return clean markdown content",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to scrape"}
            },
            "required": ["url"]
        }
    }
}
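Once the model returns a tool call, you still have to route it back to the function. A minimal dispatcher — the tool-call shape follows the OpenAI function-calling format, and the registry below stubs `scrape_url` for illustration:

```python
import json

def handle_tool_call(tool_call: dict, registry: dict) -> str:
    # Look up the named function and invoke it with the JSON-decoded arguments.
    fn = registry[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Stubbed registry; in the agent this maps to the real scrape_url above.
registry = {"scrape_url": lambda url: f"# markdown for {url}"}
call = {"function": {"name": "scrape_url",
                     "arguments": '{"url": "https://example.com"}'}}
print(handle_tool_call(call, registry))
```

The same dispatcher works unchanged as you add more tools — register `crawl_url` or `search_web` alongside `scrape_url` and let the model pick.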

That's it. Your agent now has the same page-reading capability as a full browser, at API latency, with no infrastructure to maintain.

Self-Hosting Option

If you want zero ongoing API costs and full control, CRW ships as a single ~6 MB binary that runs on any Linux server:

curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
crw  # starts on http://localhost:3000

# Then point the Python SDK at the local instance:
client = CRW(api_url="http://localhost:3000")

Self-hosted CRW uses the same API surface as the cloud version. No API key required. See the quick start guide for setup instructions.


FAQ

Can CRW handle JavaScript-heavy SPAs?

Yes. CRW renders JavaScript by default and waits for network activity to settle before returning content. You can also set a custom waitFor duration in milliseconds or wait for a specific CSS selector to appear before returning.

Does CRW bypass Cloudflare?

CRW includes built-in stealth mode that handles Cloudflare's standard bot protection. Sites with extremely aggressive challenge modes (requiring human CAPTCHA completion) may not be reachable via any automated tool.

What's the difference between scrape, crawl, and map?

scrape returns content from one URL. crawl follows links from a starting URL and returns content from all discovered pages (up to a configurable limit). map discovers all URLs on a site via sitemap and link traversal without returning content.

How do I handle rate limits at scale?

CRW's cloud tier handles rate limiting transparently. For self-hosted deployments, you configure rate limits per domain to avoid being blocked. The API returns 429 responses that you can use to implement exponential backoff in your agent.
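A common way to act on those 429s is full-jitter exponential backoff: sleep a random amount up to an exponentially growing cap. A sketch (the `do_request` callable is a stand-in for your actual API call):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # Full-jitter backoff: uniform in [0, min(cap, base * 2^attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))

def with_retries(do_request, max_attempts: int = 5):
    for attempt in range(max_attempts):
        status, body = do_request()
        if status != 429:
            return body
        # In real code: time.sleep(backoff_delay(attempt))
    raise RuntimeError("rate limited after retries")
```

The jitter matters: without it, 50 parallel agent tasks that get throttled together retry together, hammering the same window again.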

Is browser automation required for agentic computer use?

Agentic computer use (Claude Computer Use, OpenAI Operator) uses a full browser because it needs to interact with the UI — clicking, typing, navigating. For the common agent use case of reading web content rather than interacting with web UIs, an API is significantly better.

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.
