
Self-Hosted Web Scraping API

Run fastCRW on your own infrastructure — a single ~8 MB Docker image, no Redis or Node.js required, full Firecrawl-compatible API. Deploy on a $5 VPS or inside your own VPC for complete data control, privacy, and zero per-scrape fees.

Published: March 11, 2026
Updated: June 13, 2026

Why Teams Self-Host a Scraping API

Hosted scraping APIs solve the infrastructure problem — you get an endpoint and start scraping in minutes. But hosted APIs introduce a different set of constraints:

Per-scrape billing. Every page you scrape consumes credits or money. For high-volume workloads (millions of pages/month), managed pricing adds up quickly: Firecrawl charges $0.83–5.33 per 1,000 scrapes across its tiers (source: marketing/competitor-prices.lock.md, verified 2026-05-18). Self-hosted scrapes carry no per-scrape fee ($0 per 1,000, CANONICAL-FACTS §8), so at scale the bill shifts from a volume-based charge to a fixed server cost.
Data egress. Every URL you scrape and every page you receive passes through a third-party API. For regulated industries (healthcare, fintech, legal) or when scraping proprietary internal data sources, that egress is often a compliance or security problem.
Network topology. If your scraping targets are inside a private network (internal documentation, intranet pages, staging environments), a public cloud API can't reach them. A self-hosted instance inside your VPN can.
Operational predictability. Managed APIs can throttle, rate-limit, or reprice. Self-hosting gives you a fixed infrastructure cost and full control over throughput.
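
To put rough numbers on the per-scrape billing point, here is a minimal break-even sketch. The per-1,000 rates are the Firecrawl range quoted above and the $5 server figure is the VPS price mentioned in the introduction; the monthly volumes are illustrative assumptions, and the calculation deliberately ignores ops time, bandwidth, and the throughput ceiling of a single small server.

```python
# Rough monthly cost comparison: managed per-scrape billing vs. a fixed-cost server.
# Rates per 1,000 scrapes are the range quoted above; volumes are illustrative.


def managed_cost(scrapes_per_month: int, usd_per_1000: float) -> float:
    """Monthly managed cost in USD at a flat per-1,000-scrape rate."""
    return scrapes_per_month / 1000 * usd_per_1000


def break_even_scrapes(server_usd_per_month: float, usd_per_1000: float) -> int:
    """Monthly volume at which managed cost equals the fixed server cost."""
    return round(server_usd_per_month / usd_per_1000 * 1000)


if __name__ == "__main__":
    for volume in (10_000, 1_000_000, 5_000_000):
        low = managed_cost(volume, 0.83)
        high = managed_cost(volume, 5.33)
        print(f"{volume:>9,} scrapes/month: ${low:,.2f} to ${high:,.2f} managed")
    print("Break-even vs. a $5 server at $0.83 per 1k:", break_even_scrapes(5.0, 0.83))
```

Treat the break-even figure as a lower bound on where self-hosting starts to pay off, since real deployments also carry maintenance time.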

fastCRW is designed to make self-hosting as simple as possible. The goal is not "run your own large crawler platform" — it is "expose a Firecrawl-compatible scraping API with as few moving parts as possible."

The fastCRW Self-Hosting Architecture

Default: one container

The minimal fastCRW deployment is a single Docker container running a static Rust binary. No Redis. No Node.js. No message queue. No separate worker process. The image size is approximately 8 MB (CANONICAL-FACTS §7: "Docker image — single ~8 MB binary").

┌─────────────────────────────────┐
│  fastCRW container (~8 MB)      │
│  POST /v1/scrape                │
│  POST /v1/crawl                 │
│  POST /v1/map                   │
│  POST /v1/search                │
│  GET  /health                   │
└─────────────────────────────────┘

This handles HTTP scraping (the http renderer) out of the box. Most static and server-rendered sites respond correctly to HTTP scraping without JavaScript execution.

Add LightPanda for JavaScript rendering

LightPanda is a lightweight browser sidecar that handles most JavaScript-rendered pages without the full overhead of Chrome. Add it to your Docker Compose file when your targets include React SPAs, Next.js apps, and other client-rendered sites.

┌──────────────────┐     ┌──────────────────────┐
│  fastCRW         │────▶│  LightPanda sidecar   │
│  (~8 MB)         │     │  (lightweight browser) │
└──────────────────┘     └──────────────────────┘

LightPanda scraping costs 1 credit — same as HTTP. In managed mode, the default renderer is auto, which tries HTTP first and falls back to LightPanda for dynamic content.

Add Chrome for heavy anti-bot targets

For sites with sophisticated bot detection (Cloudflare challenges, fingerprinting, browser environment checks), Chrome is the most effective renderer. It is also the heaviest: roughly 500 MB image size plus approximately 1 GB resident RAM when active (CANONICAL-FACTS §7: "The opt-in chrome Compose variant is ~500 MB image + ~1 GB resident").

┌──────────────────┐     ┌──────────────────────┐
│  fastCRW         │────▶│  Chrome sidecar       │
│  (~8 MB)         │     │  (~500 MB image,      │
└──────────────────┘     │   ~1 GB resident RAM) │
                         └──────────────────────┘

Chrome is opt-in. Start without it, test your real targets, and only add Chrome if you find your success rate on critical targets is unsatisfactory. Most scraping workloads — including most market research and enrichment pipelines — don't need Chrome.

Comparison: fastCRW Self-Hosted vs. Alternatives

All structural facts from CANONICAL-FACTS §7 (marketing/CANONICAL-FACTS.md, verified 2026-05-22).

	fastCRW (self-hosted)	Firecrawl (self-hosted)	Managed cloud API
Container count (minimal)	1	5	0 (managed for you)
Docker image size	~8 MB	~2–3 GB total	N/A
External service deps	None	Redis required	N/A
Renderer options	HTTP, LightPanda, Chrome	Playwright/Chrome	Varies
Per-scrape fee	$0 (pay your server)	$0 (pay your server)	Per-credit billing
License	AGPL-3.0	AGPL-3.0	Proprietary
API compatibility	Firecrawl-compatible	—	Depends on provider
Data egress	None (stays in your network)	None	Passes through provider

Deployment Walkthrough

Minimal: single container on a VPS

# On your VPS (Ubuntu/Debian)
# Install Docker
curl -fsSL https://get.docker.com | sh

# Pull and run fastCRW
docker run -d \
  --name fastcrw \
  --restart unless-stopped \
  -p 3002:3002 \
  -e CRW_API_KEY=your-secret-key \
  ghcr.io/us/crw:latest

# Test
curl -X POST http://localhost:3002/v1/scrape \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
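
If you script the deployment, it can help to wait for the container to come up before sending the first scrape. The sketch below polls the /health endpoint shown in the architecture diagram; it assumes a healthy instance answers with HTTP 200, which is not confirmed here, and the function names are hypothetical.

```python
import time
import urllib.request


def health_ok(base_url: str) -> bool:
    """True when GET {base_url}/health answers with HTTP 200 (assumed healthy signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, connection refused, and timeouts
        return False


def wait_until_healthy(base_url, attempts=10, delay_s=1.0, probe=health_ok) -> bool:
    """Call probe(base_url) up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # container may still be starting
    return False


if __name__ == "__main__":
    ready = wait_until_healthy("http://localhost:3002")
    print("ready" if ready else "not ready")
```

The `probe` parameter exists only so the retry loop can be exercised without a running container.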

Standard: Docker Compose with LightPanda

# docker-compose.yml
services:
  fastcrw:
    image: ghcr.io/us/crw:latest
    restart: unless-stopped
    ports:
      - "3002:3002"
    environment:
      CRW_API_KEY: "your-secret-key"
      LIGHTPANDA_URL: "http://lightpanda:9222"
    depends_on:
      - lightpanda

  lightpanda:
    image: ghcr.io/us/lightpanda:latest
    restart: unless-stopped
    # No port exposure needed — internal network only

docker compose up -d

Full: Docker Compose with Chrome

# docker-compose.chrome.yml
services:
  fastcrw:
    image: ghcr.io/us/crw:latest
    restart: unless-stopped
    ports:
      - "3002:3002"
    environment:
      CRW_API_KEY: "your-secret-key"
      LIGHTPANDA_URL: "http://lightpanda:9222"
      CHROME_URL: "http://chrome:9223"
    depends_on:
      - lightpanda
      - chrome

  lightpanda:
    image: ghcr.io/us/lightpanda:latest
    restart: unless-stopped

  chrome:
    image: browserless/chrome:latest
    restart: unless-stopped
    environment:
      MAX_CONCURRENT_SESSIONS: "5"
      # Budget ~1 GB RAM for Chrome

Reverse proxy with TLS (Caddy)

# Caddyfile — TLS is automatic via Let's Encrypt
scrape.yourcompany.com {
  reverse_proxy localhost:3002
}

# Install Caddy and run
caddy run --config Caddyfile

Your self-hosted fastCRW API is now available at https://scrape.yourcompany.com/v1/scrape with automatic HTTPS.

Migrating from Managed Firecrawl to Self-Hosted fastCRW

fastCRW is Firecrawl-compatible. The API shapes — request bodies, response envelopes, endpoint paths — match Firecrawl's /v1 surface. The migration is a base-URL swap:

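import os
import requests

# For the self-hosted instance, this is the value you set as CRW_API_KEY
API_KEY = os.environ["CRW_API_KEY"]

# Before (Firecrawl managed)
CRW_BASE_URL = "https://api.firecrawl.dev/v1"

# After (fastCRW self-hosted)
CRW_BASE_URL = "http://your-server:3002/v1"

# Everything else stays the same
response = requests.post(
    f"{CRW_BASE_URL}/scrape",
    json={"url": "https://example.com", "formats": ["markdown"]},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()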

Minor divergences from Firecrawl exist in response field names and error envelopes (CANONICAL-FACTS §9 — "Response field names and error envelopes have minor divergence from Firecrawl"). Test your integration against the migration guide to catch any field name differences.
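
One way to surface those field-name differences is to scrape the same URL through both services and compare the key paths of the two JSON bodies. The helper below is a sketch of that comparison only; the sample payloads and field names in it are hypothetical and are not taken from either API.

```python
def key_paths(value, prefix=""):
    """Return the set of dotted key paths found in a nested JSON-like value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list) and value:
        # Inspect only the first element; list items are assumed homogeneous.
        paths |= key_paths(value[0], f"{prefix}[]")
    return paths


def diff_responses(firecrawl_resp: dict, fastcrw_resp: dict) -> dict:
    """Report key paths present in one response but not the other."""
    a, b = key_paths(firecrawl_resp), key_paths(fastcrw_resp)
    return {"only_in_firecrawl": sorted(a - b), "only_in_fastcrw": sorted(b - a)}


if __name__ == "__main__":
    # Hypothetical payloads for illustration; real field names may differ.
    old = {"success": True, "data": {"markdown": "# Hi", "metadata": {"sourceURL": "x"}}}
    new = {"success": True, "data": {"markdown": "# Hi", "metadata": {"sourceUrl": "x"}}}
    print(diff_responses(old, new))
```

Running this against a handful of representative URLs before cutting over gives a short list of fields your parsing code should double-check.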

Using the Same API from Your Applications

Once self-hosted, your existing application code works without changes — just update the base URL and API key.

Python

import requests

CRW_BASE_URL = "http://your-server:3002/v1"
CRW_API_KEY = "your-secret-key"

# Scrape a page
def scrape(url: str) -> dict:
    resp = requests.post(
        f"{CRW_BASE_URL}/scrape",
        json={"url": url, "formats": ["markdown"]},
        headers={"Authorization": f"Bearer {CRW_API_KEY}"},
        timeout=30
    )
    resp.raise_for_status()
    return resp.json()

# Map a domain
def map_domain(domain: str) -> list[str]:
    resp = requests.post(
        f"{CRW_BASE_URL}/map",
        json={"url": f"https://{domain}"},
        headers={"Authorization": f"Bearer {CRW_API_KEY}"},
        timeout=30
    )
    resp.raise_for_status()
    return resp.json().get("urls", [])

# Crawl a site
def start_crawl(url: str, max_pages: int = 100) -> str:
    resp = requests.post(
        f"{CRW_BASE_URL}/crawl",
        json={"url": url, "maxPages": max_pages, "maxDepth": 3},
        headers={"Authorization": f"Bearer {CRW_API_KEY}"},
        timeout=30
    )
    resp.raise_for_status()
    return resp.json().get("id")  # crawl job ID

result = scrape("https://news.ycombinator.com")
print(result["data"]["markdown"][:500])
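
The start_crawl helper above returns a job ID, so a caller usually needs a polling loop as well. The sketch below uses the GET /v1/crawl/{id} route shown in the curl examples and only the standard library. It assumes the status body carries a "status" field that ends in "completed" or "failed"; those values are an assumption, so check them against your instance.

```python
import json
import time
import urllib.request

CRW_BASE_URL = "http://your-server:3002/v1"
CRW_API_KEY = "your-secret-key"


def fetch_crawl_status(crawl_id: str) -> dict:
    """GET /v1/crawl/{id} and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{CRW_BASE_URL}/crawl/{crawl_id}",
        headers={"Authorization": f"Bearer {CRW_API_KEY}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_crawl(crawl_id, fetch=fetch_crawl_status, poll_s=2.0, max_polls=150):
    """Poll until the job reports a terminal status, then return the last body.

    ASSUMPTION: the body has a "status" field ending in "completed" or "failed".
    """
    for _ in range(max_polls):
        body = fetch(crawl_id)
        if body.get("status") in ("completed", "failed"):
            return body
        time.sleep(poll_s)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")
```

A typical call would be `wait_for_crawl(start_crawl("https://example.com"))`; the `fetch` parameter lets you substitute a stub when testing.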

JavaScript/TypeScript

const BASE_URL = "http://your-server:3002/v1";
const API_KEY = process.env.CRW_API_KEY!;

const headers = {
  Authorization: `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

// Scrape
async function scrape(url: string) {
  const res = await fetch(`${BASE_URL}/scrape`, {
    method: "POST",
    headers,
    body: JSON.stringify({ url, formats: ["markdown"] }),
  });
  return res.json();
}

// Map
async function mapDomain(url: string): Promise<string[]> {
  const res = await fetch(`${BASE_URL}/map`, {
    method: "POST",
    headers,
    body: JSON.stringify({ url }),
  });
  const data = await res.json();
  return data.urls ?? [];
}

// Extract with schema
async function extract(url: string, schema: object) {
  const res = await fetch(`${BASE_URL}/scrape`, {
    method: "POST",
    headers,
    body: JSON.stringify({ url, formats: ["json"], jsonSchema: schema }),
  });
  const data = await res.json();
  return data?.data?.json;
}

// Example
const result = await scrape("https://example.com");
console.log(result.data.markdown);

curl — all core endpoints

# Health check (no auth)
curl http://localhost:3002/health

# Scrape (markdown)
curl -X POST http://localhost:3002/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'

# Scrape with structured extraction
curl -X POST http://localhost:3002/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["json"],
    "jsonSchema": {
      "type": "object",
      "properties": {
        "title":       { "type": "string" },
        "description": { "type": "string" }
      }
    }
  }'

# Map a domain
curl -X POST http://localhost:3002/v1/map \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Start a crawl
curl -X POST http://localhost:3002/v1/crawl \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "maxPages": 50, "maxDepth": 2}'

# Check crawl status (use the ID from the crawl response)
curl http://localhost:3002/v1/crawl/your-crawl-id \
  -H "Authorization: Bearer $CRW_API_KEY"

Choosing Renderer: HTTP vs. LightPanda vs. Chrome

The renderer choice affects both scrape quality and cost. fastCRW's auto mode selects intelligently, but you can force a specific renderer with the renderer field.

Renderer	What it handles	Cost	RAM per request	When to use
`http`	Static HTML, server-rendered pages	1 credit	Minimal	Default for most sites
`lightpanda`	JavaScript SPAs, lazy-loaded content	1 credit	~50–100 MB	Most dynamic pages
`chrome`	Heavy anti-bot, fingerprinting, CAPTCHA-guarded	2 credits	~1 GB	Only when lightpanda fails
`auto` (default)	Tries http → lightpanda → chrome	1–2 credits	Varies	Production default

Credit costs from CANONICAL-FACTS §3 (marketing/CANONICAL-FACTS.md, verified 2026-05-29).

A practical approach: start with auto. After your first batch of scrapes, check which URLs consistently returned thin content. For those URLs, test lightpanda explicitly. Only add Chrome when lightpanda still fails on a critical target.
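
The same escalation can be done by hand when you want to see which renderer a given URL actually needs. This is a sketch, not fastCRW's own auto logic: the 200-character "thin content" threshold is a hypothetical value, and `scrape` is any callable you supply that sends the request with the renderer field set.

```python
THIN_CONTENT_CHARS = 200  # hypothetical threshold; tune against your own targets
RENDERER_ORDER = ("http", "lightpanda", "chrome")  # cheapest first


def is_thin(result: dict, min_chars: int = THIN_CONTENT_CHARS) -> bool:
    """True when the scraped markdown is missing or shorter than min_chars."""
    markdown = (result.get("data") or {}).get("markdown") or ""
    return len(markdown.strip()) < min_chars


def scrape_with_escalation(url, scrape, renderers=RENDERER_ORDER):
    """Try each renderer in order; return (renderer, result) for the first non-thin result.

    `scrape` is any callable (url, renderer) -> dict, for example a wrapper that POSTs
    {"url": url, "formats": ["markdown"], "renderer": renderer} to /v1/scrape.
    """
    last = None
    for renderer in renderers:
        last = (renderer, scrape(url, renderer))
        if not is_thin(last[1]):
            return last
    return last  # every renderer came back thin; the caller decides what to do
```

Logging the renderer returned for each URL over a first batch shows how often LightPanda or Chrome is needed, which informs whether the Chrome sidecar is worth its RAM.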

In self-hosted mode, you're not paying per-credit — you're paying for server resources. The Chrome sidecar's ~1 GB RAM cost is a fixed overhead whether you use it or not, so size your server accordingly before enabling it.

MCP Integration: fastCRW as a Tool for AI Agents

fastCRW ships an MCP (Model Context Protocol) transport at /mcp, which means your self-hosted instance works directly as a tool server for Claude, Claude Code, and any MCP-compatible AI agent framework.

Install the MCP package:

# npm
npm install -g crw-mcp@0.6.0

# or via bunx
bunx crw-mcp@0.6.0

Point it at your self-hosted instance:

// Claude Desktop or Claude Code config
{
  "mcpServers": {
    "fastcrw": {
      "command": "npx",
      "args": ["crw-mcp"],
      "env": {
        "CRW_API_URL": "http://your-server:3002",
        "CRW_API_KEY": "your-secret-key"
      }
    }
  }
}

Your AI agent now calls scrape, crawl, map, and search as native tools against your self-hosted instance. No data transits any cloud service — the agent calls your server, your server scrapes the target, results return to the agent. See MCP integration for the full configuration guide.

Production Hardening

Authentication

fastCRW requires a bearer token (Authorization: Bearer yourkey) for all API calls. Set a strong, randomly generated key (32+ characters) via the CRW_API_KEY environment variable. If you expose the API externally, rotate the key periodically and use a secrets manager (Vault, AWS SSM, Doppler) rather than hardcoding it in Compose files.
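
A key of that strength can be generated with Python's standard `secrets` module. The helper name is hypothetical; the resulting string is what you would store in your secrets manager and pass to the container as CRW_API_KEY.

```python
import secrets


def generate_api_key(n_bytes: int = 32) -> str:
    """Return a URL-safe random token; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(n_bytes)


if __name__ == "__main__":
    print(generate_api_key())
```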

TLS

Never expose the raw HTTP port (3002) to the public internet. Put fastCRW behind a reverse proxy (Caddy, Nginx, Traefik) that terminates TLS. Caddy's automatic HTTPS via Let's Encrypt is the easiest path.

Rate limiting

Implement rate limiting at the reverse proxy level to prevent runaway scrape loops from exhausting your server's bandwidth or triggering IP bans on target sites. Nginx's limit_req_zone or Traefik's RateLimit middleware are straightforward options.
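
Proxy-level limits are the control described above. As a complementary guard inside your own scraping client (a different layer, not a replacement), a small token bucket can stop a runaway loop before requests ever leave the application. The class below is a generic sketch and is not part of fastCRW.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Call `try_acquire()` before each scrape request and skip or delay the request when it returns False.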

Network isolation

If fastCRW should only be accessible inside your private network, bind the container to the internal network interface rather than 0.0.0.0. For Kubernetes deployments, use a ClusterIP service and expose only through your ingress.

Resource limits

Set container CPU and memory limits in your Compose file or Kubernetes manifests to prevent a burst of heavy Chrome-rendered scrapes from OOM-killing other services on the same host.

Logging and monitoring

fastCRW logs each request with URL, renderer, status, and latency. Ship these logs to your existing log aggregation stack (Loki, Datadog, CloudWatch). Set an alert if the scrape success rate drops below 80% — that signals target-site changes or network issues.
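
The 80% alert rule can be expressed as a small check over a window of parsed log records. The record shape used here (a dict with a numeric "status" key) is a hypothetical stand-in, since the exact log format is not specified in this article; adapt the field access to whatever your log pipeline emits.

```python
def success_rate(records) -> float:
    """Fraction of records whose HTTP status is in the 2xx range."""
    records = list(records)
    if not records:
        return 1.0  # no traffic in the window: nothing to alert on
    ok = sum(1 for r in records if 200 <= r["status"] < 300)
    return ok / len(records)


def should_alert(records, threshold: float = 0.80) -> bool:
    """True when the success rate over the window falls below the threshold."""
    return success_rate(records) < threshold


if __name__ == "__main__":
    window = [{"status": 200}] * 7 + [{"status": 502}] * 3
    print(success_rate(window), should_alert(window))
```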

Read the full hardening guide at /docs/self-hosting-hardening.

AGPL-3.0: What It Means for Self-Hosters

fastCRW is licensed AGPL-3.0. For most teams self-hosting, the practical implications are:

If you run fastCRW without modifying the source: no obligation. You can use it commercially, internally, or as an API for your own products.
If you modify the fastCRW source and deploy it as a network service: you must publish your modifications under AGPL-3.0. This is the "network use is distribution" clause specific to AGPL.
If you build a product on top of fastCRW (e.g., your own managed scraping service): consult a lawyer. AGPL copyleft may require you to open-source your wrapper, depending on how tightly coupled it is.

For the vast majority of self-hosting use cases — running fastCRW inside your own infrastructure to power internal tools, data pipelines, or AI agent workflows — AGPL-3.0 has no practical impact.

When to Self-Host vs. Use Managed Cloud

Self-hosting is the right choice when:

Cost at scale. You're scraping millions of pages per month and per-scrape fees are a significant line item. Self-hosted cost is fixed server overhead; managed cost scales with volume.
Data control and privacy. Your scraping targets contain sensitive data, or your organization's policy prohibits data egress to third parties.
Private network access. Your targets are inside a VPN or private network that the public cloud can't reach.
Compliance requirements. HIPAA, SOC 2, GDPR data-residency requirements, or similar constraints that require data to stay in specific jurisdictions.
Custom SLA. You need guaranteed throughput and uptime SLAs that a shared managed API can't provide.

Managed cloud (fastcrw.com) is the right choice when:

Speed to first scrape. You want an API key and production-ready endpoint in under 5 minutes with no ops work.
Burst capacity. Your scraping volume is unpredictable and you want elastic throughput without provisioning servers.
Minimal ops. No team member wants to own the infrastructure. Managed handles scaling, browser sidecars, updates, and uptime.
Low volume. Under ~50,000 scrapes per month, the Hobby or Standard plan is often cheaper than a dedicated VPS once ops time is factored in.

The two are not mutually exclusive. Run self-hosted for your steady-state high-volume workloads; use the managed API for burst capacity or experimentation. Because the API is identical, you can route traffic between the two without code changes.
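
A minimal sketch of that routing idea follows: prefer the self-hosted instance and overflow to managed once a chosen concurrency limit is reached. The capacity number is something you would measure yourself, and the managed base URL below is a deliberate placeholder, since the managed endpoint is not stated in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str


SELF_HOSTED = Backend("self-hosted", "http://your-server:3002/v1")
# Placeholder URL: replace with the managed endpoint from your account.
MANAGED = Backend("managed", "https://managed.example.invalid/v1")


def pick_backend(in_flight: int, self_hosted_capacity: int) -> Backend:
    """Prefer the self-hosted instance; overflow to managed once it is saturated."""
    return SELF_HOSTED if in_flight < self_hosted_capacity else MANAGED


if __name__ == "__main__":
    print(pick_backend(in_flight=3, self_hosted_capacity=10).name)
    print(pick_backend(in_flight=10, self_hosted_capacity=10).name)
```

Each backend needs its own API key, so in practice the `Backend` record would also carry a reference to the right credential.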

Good Fits for Self-Hosting

Privacy-sensitive workloads — healthcare, legal, and fintech teams where scraped data cannot leave the organization's network
High-volume pipelines — millions of pages per month where per-scrape managed fees are prohibitive
Internal knowledge ingestion — scraping private intranet pages, documentation sites, or staging environments inside a VPN
Cost-sensitive startups — teams that want production-grade scraping without a large managed API bill
Platform engineering teams — building an internal scraping microservice that other teams call, rather than each team integrating a managed API separately
AI agent infrastructure — self-hosted fastCRW as the scraping backend for LLM agents that must keep browsing activity private

When Self-Hosting Is the Wrong Choice

Immediate throughput, minimal ops: If no one on your team wants to own infrastructure, managed is faster and simpler. Self-hosting requires initial setup and ongoing maintenance.
Tiny volume: Below ~10,000 scrapes per month, the time cost of operating a server likely exceeds the cost of a Hobby plan ($13/mo — launch price, was $19, 3,000 credits — CANONICAL-FACTS §2).
Elastic burst needs: If your scraping volume spikes unpredictably (seasonal campaign, viral traffic), managed cloud handles elasticity automatically. Self-hosted capacity is fixed at what you provisioned.
Aggressive anti-bot targets at scale: Running a fleet of Chrome browsers at scale is operationally significant. If your primary use case is bypassing heavy anti-bot protection at high volume, a managed API with a distributed browser fleet may be more effective.

Related Resources

Self-hosting guide — step-by-step deployment instructions including Docker Compose files
Self-hosting hardening guide — security, TLS, rate limiting, and access control
Firecrawl self-hosted Rust alternative — detailed comparison of fastCRW vs. Firecrawl when self-hosting
MCP integration — connect your self-hosted instance to Claude and AI agent frameworks
Pricing — managed cloud plan rates; useful for comparing managed vs. self-hosted cost at your volume
Benchmarks — accuracy and latency data for evaluating whether self-hosted performance meets your requirements
Lead enrichment — a common self-hosting use case: enrichment pipelines where CRM data can't leave the VPC
Market research — another self-hosting candidate: competitive intelligence pipelines with proprietary internal data

Sources

fastCRW open-source engine — README and self-hosting guide

https://github.com/us/crw

AGPL-3.0 license — what it means for self-hosting

https://www.gnu.org/licenses/agpl-3.0.html

Docker deployment best practices

https://docs.docker.com/engine/security/

FAQ

How small can my server be to run fastCRW?

For HTTP-only scraping (no JavaScript rendering), fastCRW runs on the smallest available VPS — a 1-CPU, 512 MB RAM instance is sufficient. The Docker image is a single ~8 MB static Rust binary (CANONICAL-FACTS §7); idle memory footprint is minimal. For LightPanda-rendered scraping, 1 CPU and 1 GB RAM is comfortable. For Chrome-rendered scraping (the opt-in heavy sidecar), budget ~1 GB resident RAM for Chrome alone — a 2 CPU, 2 GB RAM instance handles modest Chrome load.

What do I need to operate fastCRW — Redis, databases, queues?

Nothing. The default fastCRW container is a single static binary with no external service dependencies (CANONICAL-FACTS §7: single ~8 MB binary, no Redis, no Node.js, no containers required). It is stateless per request. For async crawl jobs, the job state is held in process memory — restart the container and in-flight crawl jobs clear. For persistence of crawl results, write results to your own database from your application code. If you need persistent job queues, wrap fastCRW with a standard queue (BullMQ, Celery) from your application layer.
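
Because crawl state does not survive a restart, persisting results is the application's job. The sketch below writes pages into SQLite using only the standard library. The page shape (a dict with "url" and "markdown" keys) and the table layout are assumptions made for illustration; map them to the fields your crawl responses actually contain.

```python
import sqlite3


def open_store(path: str = "crawl_results.db") -> sqlite3.Connection:
    """Open (or create) the results database and ensure the pages table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, "
        "markdown TEXT NOT NULL, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_pages(conn: sqlite3.Connection, pages) -> int:
    """Insert or overwrite one row per page; returns how many rows were submitted."""
    rows = [(p["url"], p["markdown"]) for p in pages]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO pages (url, markdown) VALUES (?, ?)", rows
        )
    return len(rows)
```

Calling `save_pages` as each batch of results arrives keeps already-fetched pages safe even if the container restarts mid-crawl.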

Is self-hosting really free?

Yes — fastCRW is AGPL-3.0 licensed. Self-hosting the engine costs nothing in software fees. You pay only for your server (a $5–10/month VPS covers low-to-medium volume), bandwidth, and any optional browser sidecars. There are no per-scrape fees, no API credit meter, and no vendor billing. The only catch is the AGPL copyleft: if you modify the fastCRW source and deploy it as a network service, you must publish those modifications under AGPL-3.0. Most teams using fastCRW without modifications are unaffected.

How does self-hosted fastCRW compare to self-hosted Firecrawl?

Firecrawl's self-hosted stack requires 5 containers: the API server, a Redis instance, a Playwright/browser service, a worker service, and a Bull Queue dashboard — roughly 2–3 GB total image size (CANONICAL-FACTS §7). fastCRW's default stack is 1 container at ~8 MB. For teams with limited ops bandwidth or small infrastructure budgets, this difference is significant: fewer failure points, faster deploys, smaller attack surface, and much lower baseline RAM consumption.

Can I expose the self-hosted API outside my network?

Yes, but follow the hardening guide first. The key steps: put fastCRW behind a reverse proxy (Nginx, Caddy, or Traefik) that terminates TLS; set a strong `CRW_API_KEY`; restrict the port so the proxy is the only public entry point; and read the self-hosting hardening guide at `/docs/self-hosting-hardening` for rate limiting and request validation. Never expose the raw container port directly to the public internet.

What happens when I need more capacity than a single server?

fastCRW is stateless for scrape requests: each request is independent. Horizontal scaling is straightforward: run multiple container replicas behind a load balancer. Since there is no shared state between instances (no Redis, no shared queue), you can spin up additional replicas and route traffic to them immediately. One caveat follows from the in-memory crawl state: an async crawl job lives in the replica that accepted it, so status polls for that job must reach the same replica (for example, via sticky routing). For managed auto-scaling, the fastCRW cloud at fastcrw.com handles this for you. You can run self-hosted for cost-sensitive workloads and use the managed API for burst capacity without changing your integration code.
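
If you would rather balance on the client side than run a load balancer, a round-robin pool is enough for independent requests. The sketch below also remembers which replica accepted each crawl job, because crawl job state is held in process memory and a status poll sent to a different replica would not find the job. The class and the replica URLs are hypothetical.

```python
import itertools


class ReplicaPool:
    """Round-robin over replicas, remembering which replica owns each crawl job."""

    def __init__(self, base_urls):
        if not base_urls:
            raise ValueError("at least one replica URL is required")
        self._cycle = itertools.cycle(list(base_urls))
        self._job_home = {}  # crawl job id -> base URL that accepted it

    def next_url(self) -> str:
        """Replica for the next independent request (scrape, map, or a new crawl)."""
        return next(self._cycle)

    def remember_job(self, job_id: str, base_url: str) -> None:
        """Record the replica that returned this crawl job ID."""
        self._job_home[job_id] = base_url

    def url_for_job(self, job_id: str) -> str:
        """Replica that accepted this crawl job; status polls should go there."""
        return self._job_home[job_id]
```

With a real load balancer in front, the equivalent is session affinity or a routing rule keyed on the crawl job ID.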


