Vercel AI SDK Web Scraping Integration — fastCRW [Firecrawl-Compatible]
Register fastCRW as a tool in the Vercel AI SDK so generateText and streamText can scrape live web pages. It is a drop-in alternative to Firecrawl with a 6.6 MB RAM runtime and 833 ms latency on the 1,000-URL benchmark.
Setup takes about two minutes. Both generateText and streamText can invoke scrape, crawl, and search operations the same way they call any other LLM tool, and fastCRW returns clean Markdown ready for LLM context windows.
Why Vercel AI SDK + fastCRW
Vercel AI SDK is the TypeScript-first toolchain for building AI applications with Next.js, Svelte, and plain Node.js. The tool-calling feature lets models invoke arbitrary functions — perfect for web scraping. fastCRW integrates as a REST API tool, giving your LLM the ability to fetch and understand live web pages without standing up a separate scraping service.
The standard pattern: the model decides it needs to research a topic, calls your fastCRW tool, receives Markdown, and reasons about it inline. Unlike integrating Firecrawl directly, fastCRW runs as a 6.6 MB binary that deploys anywhere — your laptop, a serverless function, a container, or fastcrw.com. The Vercel AI SDK doesn't care where fastCRW lives; it just makes HTTP calls.
Setup
- Install Vercel AI SDK in your Next.js or Node.js project.
- Sign up at fastcrw.com and grab an API key.
- Set FASTCRW_API_KEY in your .env.local file.
- Define your fastCRW tools using the Vercel AI SDK tool() helper.
- Pass the tools to generateText or streamText.
npm install ai @ai-sdk/openai zod
export FASTCRW_API_KEY="fcrw_..."
Code Example: Scrape Tool Registration
Define a fastCRW scrape tool in your API route:
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
// Register the fastCRW scrape tool
const fastcrwScrape = tool({
description:
"Scrape a single URL via fastCRW and return Markdown content",
parameters: z.object({
url: z.string().url("Must be a valid URL"),
formats: z
.array(z.enum(["markdown", "html", "json"]))
.optional()
.default(["markdown"]),
}),
execute: async ({ url, formats }) => {
const response = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url,
formats,
}),
});
if (!response.ok) {
throw new Error(
`fastCRW scrape failed: ${response.statusText}`
);
}
const result = await response.json();
return result.data.markdown || result.data.html;
},
});
// Register the fastCRW search tool
const fastcrwSearch = tool({
description: "Search the web via fastCRW and return top results",
parameters: z.object({
query: z.string(),
limit: z.number().min(1).max(10).optional().default(5),
}),
execute: async ({ query, limit }) => {
const response = await fetch("https://fastcrw.com/api/v1/search", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
query,
limit,
}),
});
if (!response.ok) {
throw new Error(`fastCRW search failed: ${response.statusText}`);
}
const result = await response.json();
return JSON.stringify(result.data.results, null, 2);
},
});
// Use both tools with generateText
export async function POST(request: Request) {
const { userMessage } = await request.json();
const result = await generateText({
model: openai("gpt-4o-mini"),
tools: {
scrape: fastcrwScrape,
search: fastcrwSearch,
},
// Allow multi-step runs so the model can answer after receiving tool results
maxSteps: 5,
system:
"You are a research assistant. Use fastCRW tools to fetch live web content and answer questions based on current information.",
messages: [
{
role: "user",
content: userMessage,
},
],
});
return Response.json({
content: result.text,
});
}
Streaming Example with streamText
For real-time streaming responses where the model calls fastCRW mid-stream:
import { streamText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
const fastcrwScrape = tool({
description: "Scrape a URL and return clean Markdown",
parameters: z.object({
url: z.string().url(),
}),
execute: async ({ url }) => {
const response = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url, formats: ["markdown"] }),
});
const data = await response.json();
return data.data.markdown;
},
});
export async function POST(request: Request) {
const { userMessage } = await request.json();
const stream = streamText({
model: openai("gpt-4o-mini"),
tools: {
scrape: fastcrwScrape,
},
// Let the model continue streaming text after the tool call completes
maxSteps: 5,
system:
"Fetch live web content using fastCRW when needed to answer questions accurately.",
messages: [
{
role: "user",
content: userMessage,
},
],
});
return stream.toDataStreamResponse();
}
API Route Example (Next.js App Router)
Create app/api/research/route.ts:
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
const scrapeUrl = tool({
description: "Fetch and parse a web page",
parameters: z.object({
url: z.string().url(),
}),
execute: async ({ url }) => {
const res = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url }),
});
const data = await res.json();
return data.data.markdown;
},
});
export async function POST(request: Request) {
const { topic } = await request.json();
const response = await generateText({
model: openai("gpt-4o-mini"),
tools: { scrapeUrl },
// Allow multi-step runs so the model can answer after receiving tool results
maxSteps: 5,
system:
"Research topics by scraping relevant web pages. Be thorough and cite sources.",
messages: [
{
role: "user",
content: `Research this topic and summarize findings: ${topic}`,
},
],
});
return Response.json({ result: response.text });
}
When to Use This
- AI-powered research assistants — let the model fetch articles, documentation, and news in real time.
- Summarization pipelines — scrape long-form content (blog posts, API docs) and summarize.
- Q&A bots over live websites — the model scrapes your documentation site and answers questions.
- Market research agents — scrape product pages, pricing, reviews, and synthesize competitive analysis.
- Vercel deployments — fastCRW runs on Vercel Edge Functions or serverless containers.
- Multimodal chat — combine fastCRW scrape with vision models to understand pages as text + images.
Limits + Gotchas
- Rate limiting — fastCRW enforces per-minute and per-day rate limits. If the model calls scrape too frequently, implement throttling in your tool definition.
- Context window — scraped Markdown can be large. Summarize or truncate before passing to the model so you don't blow the context budget.
- Error handling — fastCRW returns non-200 status codes for blocked sites or timeouts. Wrap tool execution in try-catch and surface errors gracefully to the model.
- API key exposure — never call fastCRW from the browser. Always route through a Next.js API route or edge function.
- Streaming latency — tool calls in streamText wait synchronously for the fastCRW response. For slow sites, the stream pauses. Consider caching or pre-fetching.
- CORS — fastCRW API is backend-only. If you need frontend scraping, host a proxy endpoint.
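Two of the safeguards above, throttling and context-budget truncation, can be sketched as small helpers wrapped around your tool's execute function. This is a minimal sketch: the 1-second gap and 20,000-character cap are arbitrary assumptions for illustration, not fastCRW requirements.

```typescript
// Minimal throttle + truncation helpers (illustrative values, not fastCRW limits).
const MIN_GAP_MS = 1_000; // assumed minimum gap between fastCRW calls
let lastCallAt = 0;

// Delay each call so at most one request starts per MIN_GAP_MS window.
export async function throttled<T>(fn: () => Promise<T>): Promise<T> {
  const wait = lastCallAt + MIN_GAP_MS - Date.now();
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  lastCallAt = Date.now();
  return fn();
}

// Cap how much scraped Markdown reaches the model's context window.
export function truncateForContext(markdown: string, maxChars = 20_000): string {
  if (markdown.length <= maxChars) return markdown;
  return markdown.slice(0, maxChars) + "\n\n[truncated]";
}
```

Inside an execute handler you would call `throttled(() => fetch(...))` and pass the response body through `truncateForContext` before returning it to the model.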
Performance Notes
- Median latency: 833 ms for HTTP scraping on the Firecrawl benchmark.
- JS rendering: LightPanda adds ~2s, Chrome rendering adds ~4–6s.
- Parallelism: The Vercel AI SDK can register multiple tools; the model decides which to call. Parallel fastCRW calls are subject to rate limits.
- Caching: Implement URL-based caching in your API route to avoid redundant scrapes.
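The caching point above can be sketched as a small in-memory, TTL-based wrapper. The 10-minute TTL and the `fetcher` callback are assumptions for illustration, not part of the fastCRW API:

```typescript
// In-memory, per-URL cache for scrape results (illustrative sketch).
type CacheEntry = { markdown: string; fetchedAt: number };

const scrapeCache = new Map<string, CacheEntry>();
const TTL_MS = 10 * 60 * 1000; // assumed 10-minute freshness window

export async function cachedScrape(
  url: string,
  fetcher: (url: string) => Promise<string>
): Promise<string> {
  const hit = scrapeCache.get(url);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) {
    return hit.markdown; // fresh enough: skip the network round-trip
  }
  const markdown = await fetcher(url);
  scrapeCache.set(url, { markdown, fetchedAt: Date.now() });
  return markdown;
}
```

In the API routes above, `fetcher` would be the existing fastCRW fetch call; repeated tool calls for the same URL within the TTL then cost nothing extra.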