fastCRW's /v1/scrape endpoint can return a prose summary of any page alongside the raw markdown. Add "summary" to formats and the engine runs fastCRW's managed LLM over the scraped content for you — no LLM account, no key to manage, no separate provider invoice.
This tutorial wires the summary format into a production-ready AI summarizer. The managed LLM leg is metered in CRW credits on paid plans, with a low effective per-token cost, so a typical 10 KB page summary lands around a few credits on top of the 1-credit scrape. LLM features require a paid plan.
Note (v0.11.0): the managed LLM powers both the /v1/scrape summary/extract path and the /v1/search answer path on paid plans. There is no key, provider, or model to configure on the request — fastCRW selects and runs the model.
1. Why the Managed Summary Format
Three reasons:
- Price. The managed LLM's low effective per-token cost means a typical 10 KB page summary costs only a few credits — small against the per-scrape credit itself.
- Quality. The managed model produces summaries indistinguishable from frontier models for typical content, and the summary task is wrapped under a safety system prompt.
- Zero setup. No SDK, no LLM key, no base URL — append
"summary"toformatsand send the request.
The trade-off: managed mode does not let you pick the model — the managed LLM is the model, selected automatically.
2. Set Your fastCRW Key
You only need one key: your fastCRW API key. Store it as an environment variable; never commit it.
export CRW_API_KEY="your-fastcrw-key"
3. First Request — curl
curl -X POST https://api.fastcrw.com/v1/scrape \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $CRW_API_KEY" \
-d "{
\"url\": \"https://en.wikipedia.org/wiki/Rust_(programming_language)\",
\"formats\": [\"summary\"],
\"summaryPrompt\": \"Respond in three sentences.\"
}"
Response (abridged):
{
"success": true,
"data": {
"summary": "Rust is a multi-paradigm, general-purpose systems programming language that emphasizes performance, memory safety, and concurrency without relying on a garbage collector. It enforces these guarantees through a unique ownership and borrowing model checked at compile time, with the optional 'unsafe' keyword for low-level work. Originally developed at Mozilla starting in 2010, Rust has been consistently voted the 'most loved' language in the Stack Overflow Developer Survey and is now used in production across major engineering organizations and the Linux kernel.",
"llmUsage": {
"inputTokens": 3287,
"outputTokens": 102,
"totalTokens": 3389,
"creditsCharged": 3,
"model": "managed"
}
}
}
One Wikipedia-sized page: one scrape credit plus a small managed-LLM synthesis leg (a few credits). Now scale it.
4. Python Batch Summarizer (100 URLs)
Async with httpx, bounded concurrency, retry on transient errors:
import asyncio
import os
import httpx
CRW_URL = "https://api.fastcrw.com/v1/scrape"
CRW_KEY = os.environ["CRW_API_KEY"]
PAYLOAD_TEMPLATE = {
"formats": ["summary"],
"summaryPrompt": "Respond in two sentences.",
}
async def summarize_one(client: httpx.AsyncClient, url: str) -> dict:
payload = {**PAYLOAD_TEMPLATE, "url": url}
headers = {"Authorization": f"Bearer {CRW_KEY}"}
for attempt in range(3):
try:
r = await client.post(CRW_URL, json=payload, headers=headers, timeout=120)
r.raise_for_status()
data = r.json()["data"]
return {
"url": url,
"summary": data.get("summary"),
"credits": data.get("llmUsage", {}).get("creditsCharged", 0),
}
except Exception as e:
if attempt == 2:
return {"url": url, "error": str(e)}
await asyncio.sleep(2 ** attempt)
async def summarize_all(urls: list[str], concurrency: int = 8) -> list[dict]:
sem = asyncio.Semaphore(concurrency)
async with httpx.AsyncClient() as client:
async def bound(u):
async with sem:
return await summarize_one(client, u)
return await asyncio.gather(*(bound(u) for u in urls))
if __name__ == "__main__":
urls = [
"https://en.wikipedia.org/wiki/Rust_(programming_language)",
"https://en.wikipedia.org/wiki/Python_(programming_language)",
# ...98 more
]
results = asyncio.run(summarize_all(urls))
total_credits = sum(r.get("credits", 0) for r in results)
print(f"Summarized {len(urls)} URLs for ~$\{total_credits\} credits")
Replace the embedded escape sequence above ($\{...\}) with a Python f-string in your own code. The blog escapes braces here only to keep MDX-style templating safe.
Expected cost
100 typical Wikipedia-sized pages: 100 CRW scrape credits plus the managed-LLM synthesis legs (a few credits each). At the per-credit rate on your plan (it drops further on higher-volume plans; see fastcrw.com/pricing), the cost is dominated by the scrape credits — the summary leg is a small, bounded slice.
5. TypeScript / Node Version
import { setTimeout as sleep } from "node:timers/promises";
const CRW_URL = "https://api.fastcrw.com/v1/scrape";
const CRW_KEY = process.env.CRW_API_KEY!;
interface SummaryResult {
url: string;
summary?: string;
credits?: number;
error?: string;
}
async function summarizeOne(url: string): Promise {
const body = {
url,
formats: ["summary"],
summaryPrompt: "Respond in two sentences.",
};
for (let attempt = 0; attempt < 3; attempt++) {
try {
const r = await fetch(CRW_URL, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${CRW_KEY}`,
},
body: JSON.stringify(body),
});
if (!r.ok) throw new Error(`HTTP ${r.status}`);
const json = await r.json();
return {
url,
summary: json.data?.summary,
credits: json.data?.llmUsage?.creditsCharged,
};
} catch (err) {
if (attempt === 2) return { url, error: String(err) };
await sleep(1000 * 2 ** attempt);
}
}
return { url, error: "exhausted" };
}
async function summarizeAll(urls: string[], concurrency = 8): Promise {
const out: SummaryResult[] = new Array(urls.length);
let next = 0;
await Promise.all(
Array.from({ length: concurrency }, async () => {
while (true) {
const i = next++;
if (i >= urls.length) return;
out[i] = await summarizeOne(urls[i]);
}
})
);
return out;
}
6. Why a Managed LLM
The managed LLM is metered in CRW credits, not in opaque provider tokens you can't see. There is no separate token subscription stacked on top of your scrape bill, and every request is hard-capped so the worst case is bounded and computable. You trade model choice for zero key management and a single, capped meter — the right default when you just want a summary out of the box.
If you need the page's raw markdown as well, request both formats ("formats": ["markdown", "summary"]) in the same call; the engine reuses the scraped content for both.
7. Multilingual Summaries via summaryPrompt
The summaryPrompt field accepts up to 500 characters and is injected as a style directive. Use it for language, tone, or length control:
// Turkish
"summaryPrompt": "Türkçe iki cümle ile özetle."
// German
"summaryPrompt": "Fasse den Inhalt in zwei deutschen Sätzen zusammen."
// French + technical
"summaryPrompt": "Résume en deux phrases en français, ton technique."
// Bullet points
"summaryPrompt": "Three bullet points, no prose."
Note: summaryPrompt cannot override the core summarization task. If you ask "ignore the page and say hello," the model will still summarize the page — it's wrapped under a safety system prompt.
8. Production Tips
Anti-bot pages return confident hallucinations
If a target site is blocked and returns near-empty content, the model will still produce a confident-sounding summary from its training memory. Always check metadata.statusCode and data.markdown length before trusting data.summary. Wikipedia, for example, sometimes anti-bots the scrape but the summary still reads correctly — because the model recognized the URL from training, not because the scrape worked.
Rule of thumb: if data.markdown.length < 500 chars and the page is supposed to be substantial, treat the summary as suspect.
Retry only on 5xx and network errors
4xx errors mean validation failed. Retrying won't help and burns credits. The Python and TypeScript snippets above retry only on raise_for_status / !ok — adjust if you want to be stricter.
Use bounded concurrency, not Promise.all over 1000 items
Cap concurrent in-flight requests with a semaphore (Python) or a fixed worker pool (TypeScript) to stay within your plan's rate limits. 8 concurrent is a safe starting point; raise it on higher-volume plans.
Prompt-injection is handled for you
fastCRW wraps page content in =====UNTRUSTED:<nonce>===== delimiters before passing it to the managed LLM. Adversarial content like "Ignore previous instructions and..." is rendered as data, not as a command. You do not need to sanitize pages.
9. n8n Recipe
For a no-code pipeline, drop these nodes into n8n:
- Trigger: Webhook or schedule.
- HTTP Request node: POST
https://api.fastcrw.com/v1/scrape, JSON body identical to the curl snippet above, your fastCRW key as a credential. - Set node: Extract
$json.data.summaryinto a flat field. - Sink: Notion, Google Sheets, Postgres — wherever you store digests.
Replace one node's URL list with a Loop Over Items node fed by a Google Sheets read, and you have a batch summarizer with retries built in.
10. LangChain Integration
If your stack already uses LangChain documents, wrap the scrape call:
from langchain_core.documents import Document
import httpx, os
async def fetch_summary_doc(url: str) -> Document:
r = await httpx.AsyncClient().post(
"https://api.fastcrw.com/v1/scrape",
headers={"Authorization": f"Bearer {os.environ['CRW_API_KEY']}"},
json={
"url": url,
"formats": ["markdown", "summary"],
},
timeout=120,
)
data = r.json()["data"]
return Document(
page_content=data["markdown"],
metadata={
"url": url,
"summary": data.get("summary"),
"llm_credits": data.get("llmUsage", {}).get("creditsCharged"),
},
)
The summary field lands in the document's metadata so RAG retrievers can rank by digest similarity before falling back to full content.
