Vercel AI SDK Web Scraping Integration — fastCRW [Firecrawl-Compatible]
Register fastCRW as a tool in the Vercel AI SDK so generateText and streamText can scrape live web pages. It is a drop-in alternative to Firecrawl with a 6.6 MB RAM runtime and 833 ms latency on the 1,000-URL benchmark.
Setup takes about two minutes. Both generateText and streamText can invoke scrape, crawl, and search operations the same way they call any other LLM tool, and fastCRW returns clean Markdown ready for LLM context windows.
Why Vercel AI SDK + fastCRW
Vercel AI SDK is the TypeScript-first toolchain for building AI applications with Next.js, Svelte, and plain Node.js. The tool-calling feature lets models invoke arbitrary functions — perfect for web scraping. fastCRW integrates as a REST API tool, giving your LLM the ability to fetch and understand live web pages without standing up a separate scraping service.
The standard pattern: the model decides it needs to research a topic, calls your fastCRW tool, receives Markdown, and reasons about it inline. Unlike integrating Firecrawl directly, fastCRW runs as a 6.6 MB binary that deploys anywhere — your laptop, a serverless function, a container, or fastcrw.com. The Vercel AI SDK doesn't care where fastCRW lives; it just makes HTTP calls.
Setup
- Install Vercel AI SDK in your Next.js or Node.js project.
- Sign up at fastcrw.com and grab an API key.
- Set FASTCRW_API_KEY in your .env.local file.
- Define your fastCRW tools using the Vercel AI SDK tool() helper.
- Pass the tools to generateText or streamText.
npm install ai @ai-sdk/openai zod
export FASTCRW_API_KEY="fcrw_..."
Code Example: Scrape Tool Registration
Define a fastCRW scrape tool in your API route:
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
// Register the fastCRW scrape tool
const fastcrwScrape = tool({
description:
"Scrape a single URL via fastCRW and return Markdown content",
parameters: z.object({
url: z.string().url("Must be a valid URL"),
formats: z
.array(z.enum(["markdown", "html", "json"]))
.optional()
.default(["markdown"]),
}),
execute: async ({ url, formats }) => {
const response = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url,
formats,
}),
});
if (!response.ok) {
throw new Error(
`fastCRW scrape failed: ${response.statusText}`
);
}
const result = await response.json();
return result.data.markdown || result.data.html;
},
});
// Register the fastCRW search tool
const fastcrwSearch = tool({
description: "Search the web via fastCRW and return top results",
parameters: z.object({
query: z.string(),
limit: z.number().min(1).max(10).optional().default(5),
}),
execute: async ({ query, limit }) => {
const response = await fetch("https://fastcrw.com/api/v1/search", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
query,
limit,
}),
});
if (!response.ok) {
throw new Error(`fastCRW search failed: ${response.statusText}`);
}
const result = await response.json();
return JSON.stringify(result.data.results, null, 2);
},
});
// Use both tools with generateText
export async function POST(request: Request) {
const { userMessage } = await request.json();
const result = await generateText({
model: openai("gpt-4o-mini"),
tools: {
scrape: fastcrwScrape,
search: fastcrwSearch,
},
// Allow multi-step runs so the model can answer after receiving tool results
maxSteps: 5,
system:
"You are a research assistant. Use fastCRW tools to fetch live web content and answer questions based on current information.",
messages: [
{
role: "user",
content: userMessage,
},
],
});
return Response.json({
content: result.text,
});
}
Streaming Example with streamText
For real-time streaming responses where the model calls fastCRW mid-stream:
import { streamText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
const fastcrwScrape = tool({
description: "Scrape a URL and return clean Markdown",
parameters: z.object({
url: z.string().url(),
}),
execute: async ({ url }) => {
const response = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url, formats: ["markdown"] }),
});
const data = await response.json();
return data.data.markdown;
},
});
export async function POST(request: Request) {
const { userMessage } = await request.json();
const stream = streamText({
model: openai("gpt-4o-mini"),
tools: {
scrape: fastcrwScrape,
},
// Let the model continue streaming text after the tool call completes
maxSteps: 5,
system:
"Fetch live web content using fastCRW when needed to answer questions accurately.",
messages: [
{
role: "user",
content: userMessage,
},
],
});
return stream.toDataStreamResponse();
}
API Route Example (Next.js App Router)
Create app/api/research/route.ts:
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
const scrapeUrl = tool({
description: "Fetch and parse a web page",
parameters: z.object({
url: z.string().url(),
}),
execute: async ({ url }) => {
const res = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.FASTCRW_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url }),
});
const data = await res.json();
return data.data.markdown;
},
});
export async function POST(request: Request) {
const { topic } = await request.json();
const response = await generateText({
model: openai("gpt-4o-mini"),
tools: { scrapeUrl },
// Allow multi-step runs so the model can answer after receiving tool results
maxSteps: 5,
system:
"Research topics by scraping relevant web pages. Be thorough and cite sources.",
messages: [
{
role: "user",
content: `Research this topic and summarize findings: ${topic}`,
},
],
});
return Response.json({ result: response.text });
}
When to Use This
- AI-powered research assistants — let the model fetch articles, documentation, and news in real time.
- Summarization pipelines — scrape long-form content (blog posts, API docs) and summarize.
- Q&A bots over live websites — the model scrapes your documentation site and answers questions.
- Market research agents — scrape product pages, pricing, reviews, and synthesize competitive analysis.
- Vercel deployments — fastCRW runs on Vercel Edge Functions or serverless containers.
- Multimodal chat — combine fastCRW scrape with vision models to understand pages as text + images.
Limits + Gotchas
- Rate limiting — fastCRW enforces per-minute and per-day rate limits. If the model calls scrape too frequently, implement throttling in your tool definition.
- Context window — scraped Markdown can be large. Summarize or truncate before passing to the model so you don't blow the context budget.
- Error handling — fastCRW returns non-200 status codes for blocked sites or timeouts. Wrap tool execution in try-catch and surface errors gracefully to the model.
- API key exposure — never call fastCRW from the browser. Always route through a Next.js API route or edge function.
- Streaming latency — tool calls in streamText wait synchronously for the fastCRW response. For slow sites, the stream pauses. Consider caching or pre-fetching.
- CORS — fastCRW API is backend-only. If you need frontend scraping, host a proxy endpoint.
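Two of the safeguards above, throttling and context-budget truncation, can be sketched as small helpers wrapped around your tool's execute function. This is a minimal sketch: the 1-second gap and 20,000-character cap are arbitrary assumptions for illustration, not fastCRW requirements.

```typescript
// Minimal throttle + truncation helpers (illustrative values, not fastCRW limits).
const MIN_GAP_MS = 1_000; // assumed minimum gap between fastCRW calls
let lastCallAt = 0;

// Delay each call so at most one request starts per MIN_GAP_MS window.
export async function throttled<T>(fn: () => Promise<T>): Promise<T> {
  const wait = lastCallAt + MIN_GAP_MS - Date.now();
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  lastCallAt = Date.now();
  return fn();
}

// Cap how much scraped Markdown reaches the model's context window.
export function truncateForContext(markdown: string, maxChars = 20_000): string {
  if (markdown.length <= maxChars) return markdown;
  return markdown.slice(0, maxChars) + "\n\n[truncated]";
}
```

Inside an execute handler you would call `throttled(() => fetch(...))` and pass the response body through `truncateForContext` before returning it to the model.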
Performance Notes
- Median latency: 833 ms for HTTP scraping on the Firecrawl benchmark.
- JS rendering: LightPanda adds ~2s, Chrome rendering adds ~4–6s.
- Parallelism: The Vercel AI SDK can register multiple tools; the model decides which to call. Parallel fastCRW calls are subject to rate limits.
- Caching: Implement URL-based caching in your API route to avoid redundant scrapes.
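The caching point above can be sketched as a small in-memory, TTL-based wrapper. The 10-minute TTL and the `fetcher` callback are assumptions for illustration, not part of the fastCRW API:

```typescript
// In-memory, per-URL cache for scrape results (illustrative sketch).
type CacheEntry = { markdown: string; fetchedAt: number };

const scrapeCache = new Map<string, CacheEntry>();
const TTL_MS = 10 * 60 * 1000; // assumed 10-minute freshness window

export async function cachedScrape(
  url: string,
  fetcher: (url: string) => Promise<string>
): Promise<string> {
  const hit = scrapeCache.get(url);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) {
    return hit.markdown; // fresh enough: skip the network round-trip
  }
  const markdown = await fetcher(url);
  scrapeCache.set(url, { markdown, fetchedAt: Date.now() });
  return markdown;
}
```

In the API routes above, `fetcher` would be the existing fastCRW fetch call; repeated tool calls for the same URL within the TTL then cost nothing extra.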