Skip to main content
Tutorial

fastCRW AI Web Summaries: A Managed-LLM Scrape-Summary Tutorial

Build a production AI web summarizer with fastCRW's managed LLM. Add a summary format to /v1/scrape — no LLM key to manage, usage metered in CRW credits on paid plans. Full Python and TypeScript code.

fastcrw
By RecepMay 13, 202612 min readLast updated: May 30, 2026

fastCRW's /v1/scrape endpoint can return a prose summary of any page alongside the raw markdown. Add "summary" to formats and the engine runs fastCRW's managed LLM over the scraped content for you — no LLM account, no key to manage, no separate provider invoice.

This tutorial wires the summary format into a production-ready AI summarizer. The managed LLM leg is metered in CRW credits on paid plans, with a low effective per-token cost, so a typical 10 KB page summary lands around a few credits on top of the 1-credit scrape. LLM features require a paid plan.

Note (v0.11.0): the managed LLM powers both the /v1/scrape summary/extract path and the /v1/search answer path on paid plans. There is no key, provider, or model to configure on the request — fastCRW selects and runs the model.

1. Why the Managed Summary Format

Three reasons:

  1. Price. The managed LLM's low effective per-token cost means a typical 10 KB page summary costs only a few credits — small against the per-scrape credit itself.
  2. Quality. The managed model produces summaries indistinguishable from frontier models for typical content, and the summary task is wrapped under a safety system prompt.
  3. Zero setup. No SDK, no LLM key, no base URL — append "summary" to formats and send the request.

The trade-off: managed mode does not let you pick the model — the managed LLM is the model, selected automatically.

2. Set Your fastCRW Key

You only need one key: your fastCRW API key. Store it as an environment variable; never commit it.

export CRW_API_KEY="your-fastcrw-key"

3. First Request — curl

curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -d "{
    \"url\": \"https://en.wikipedia.org/wiki/Rust_(programming_language)\",
    \"formats\": [\"summary\"],
    \"summaryPrompt\": \"Respond in three sentences.\"
  }"

Response (abridged):

{
  "success": true,
  "data": {
    "summary": "Rust is a multi-paradigm, general-purpose systems programming language that emphasizes performance, memory safety, and concurrency without relying on a garbage collector. It enforces these guarantees through a unique ownership and borrowing model checked at compile time, with the optional 'unsafe' keyword for low-level work. Originally developed at Mozilla starting in 2010, Rust has been consistently voted the 'most loved' language in the Stack Overflow Developer Survey and is now used in production across major engineering organizations and the Linux kernel.",
    "llmUsage": {
      "inputTokens": 3287,
      "outputTokens": 102,
      "totalTokens": 3389,
      "creditsCharged": 3,
      "model": "managed"
    }
  }
}

One Wikipedia-sized page: one scrape credit plus a small managed-LLM synthesis leg (a few credits). Now scale it.

4. Python Batch Summarizer (100 URLs)

Async with httpx, bounded concurrency, retry on transient errors:

import asyncio
import os
import httpx

CRW_URL = "https://api.fastcrw.com/v1/scrape"
CRW_KEY = os.environ["CRW_API_KEY"]

PAYLOAD_TEMPLATE = {
    "formats": ["summary"],
    "summaryPrompt": "Respond in two sentences.",
}

async def summarize_one(client: httpx.AsyncClient, url: str) -> dict:
    payload = {**PAYLOAD_TEMPLATE, "url": url}
    headers = {"Authorization": f"Bearer {CRW_KEY}"}
    for attempt in range(3):
        try:
            r = await client.post(CRW_URL, json=payload, headers=headers, timeout=120)
            r.raise_for_status()
            data = r.json()["data"]
            return {
                "url": url,
                "summary": data.get("summary"),
                "credits": data.get("llmUsage", {}).get("creditsCharged", 0),
            }
        except Exception as e:
            if attempt == 2:
                return {"url": url, "error": str(e)}
            await asyncio.sleep(2 ** attempt)

async def summarize_all(urls: list[str], concurrency: int = 8) -> list[dict]:
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:
        async def bound(u):
            async with sem:
                return await summarize_one(client, u)
        return await asyncio.gather(*(bound(u) for u in urls))

if __name__ == "__main__":
    urls = [
        "https://en.wikipedia.org/wiki/Rust_(programming_language)",
        "https://en.wikipedia.org/wiki/Python_(programming_language)",
        # ...98 more
    ]
    results = asyncio.run(summarize_all(urls))
    total_credits = sum(r.get("credits", 0) for r in results)
    print(f"Summarized {len(urls)} URLs for ~$\{total_credits\} credits")

Replace the embedded escape sequence above ($\{...\}) with a Python f-string in your own code. The blog escapes braces here only to keep MDX-style templating safe.

Expected cost

100 typical Wikipedia-sized pages: 100 CRW scrape credits plus the managed-LLM synthesis legs (a few credits each). At the per-credit rate on your plan (it drops further on higher-volume plans; see fastcrw.com/pricing), the cost is dominated by the scrape credits — the summary leg is a small, bounded slice.

5. TypeScript / Node Version

import { setTimeout as sleep } from "node:timers/promises";

const CRW_URL = "https://api.fastcrw.com/v1/scrape";
const CRW_KEY = process.env.CRW_API_KEY!;

interface SummaryResult {
  url: string;
  summary?: string;
  credits?: number;
  error?: string;
}

async function summarizeOne(url: string): Promise {
  const body = {
    url,
    formats: ["summary"],
    summaryPrompt: "Respond in two sentences.",
  };
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      const r = await fetch(CRW_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${CRW_KEY}`,
        },
        body: JSON.stringify(body),
      });
      if (!r.ok) throw new Error(`HTTP ${r.status}`);
      const json = await r.json();
      return {
        url,
        summary: json.data?.summary,
        credits: json.data?.llmUsage?.creditsCharged,
      };
    } catch (err) {
      if (attempt === 2) return { url, error: String(err) };
      await sleep(1000 * 2 ** attempt);
    }
  }
  return { url, error: "exhausted" };
}

async function summarizeAll(urls: string[], concurrency = 8): Promise {
  const out: SummaryResult[] = new Array(urls.length);
  let next = 0;
  await Promise.all(
    Array.from({ length: concurrency }, async () => {
      while (true) {
        const i = next++;
        if (i >= urls.length) return;
        out[i] = await summarizeOne(urls[i]);
      }
    })
  );
  return out;
}

6. Why a Managed LLM

The managed LLM is metered in CRW credits, not in opaque provider tokens you can't see. There is no separate token subscription stacked on top of your scrape bill, and every request is hard-capped so the worst case is bounded and computable. You trade model choice for zero key management and a single, capped meter — the right default when you just want a summary out of the box.

If you need the page's raw markdown as well, request both formats ("formats": ["markdown", "summary"]) in the same call; the engine reuses the scraped content for both.

7. Multilingual Summaries via summaryPrompt

The summaryPrompt field accepts up to 500 characters and is injected as a style directive. Use it for language, tone, or length control:

// Turkish
"summaryPrompt": "Türkçe iki cümle ile özetle."

// German
"summaryPrompt": "Fasse den Inhalt in zwei deutschen Sätzen zusammen."

// French + technical
"summaryPrompt": "Résume en deux phrases en français, ton technique."

// Bullet points
"summaryPrompt": "Three bullet points, no prose."

Note: summaryPrompt cannot override the core summarization task. If you ask "ignore the page and say hello," the model will still summarize the page — it's wrapped under a safety system prompt.

8. Production Tips

Anti-bot pages return confident hallucinations

If a target site is blocked and returns near-empty content, the model will still produce a confident-sounding summary from its training memory. Always check metadata.statusCode and data.markdown length before trusting data.summary. Wikipedia, for example, sometimes anti-bots the scrape but the summary still reads correctly — because the model recognized the URL from training, not because the scrape worked.

Rule of thumb: if data.markdown.length < 500 chars and the page is supposed to be substantial, treat the summary as suspect.

Retry only on 5xx and network errors

4xx errors mean validation failed. Retrying won't help and burns credits. The Python and TypeScript snippets above retry only on raise_for_status / !ok — adjust if you want to be stricter.

Use bounded concurrency, not Promise.all over 1000 items

Cap concurrent in-flight requests with a semaphore (Python) or a fixed worker pool (TypeScript) to stay within your plan's rate limits. 8 concurrent is a safe starting point; raise it on higher-volume plans.

Prompt-injection is handled for you

fastCRW wraps page content in =====UNTRUSTED:<nonce>===== delimiters before passing it to the managed LLM. Adversarial content like "Ignore previous instructions and..." is rendered as data, not as a command. You do not need to sanitize pages.

9. n8n Recipe

For a no-code pipeline, drop these nodes into n8n:

  1. Trigger: Webhook or schedule.
  2. HTTP Request node: POST https://api.fastcrw.com/v1/scrape, JSON body identical to the curl snippet above, your fastCRW key as a credential.
  3. Set node: Extract $json.data.summary into a flat field.
  4. Sink: Notion, Google Sheets, Postgres — wherever you store digests.

Replace one node's URL list with a Loop Over Items node fed by a Google Sheets read, and you have a batch summarizer with retries built in.

10. LangChain Integration

If your stack already uses LangChain documents, wrap the scrape call:

from langchain_core.documents import Document
import httpx, os

async def fetch_summary_doc(url: str) -> Document:
    r = await httpx.AsyncClient().post(
        "https://api.fastcrw.com/v1/scrape",
        headers={"Authorization": f"Bearer {os.environ['CRW_API_KEY']}"},
        json={
            "url": url,
            "formats": ["markdown", "summary"],
        },
        timeout=120,
    )
    data = r.json()["data"]
    return Document(
        page_content=data["markdown"],
        metadata={
            "url": url,
            "summary": data.get("summary"),
            "llm_credits": data.get("llmUsage", {}).get("creditsCharged"),
        },
    )

The summary field lands in the document's metadata so RAG retrievers can rank by digest similarity before falling back to full content.

What's Next

FAQ

Frequently asked questions

Do I need an LLM key to use the summary format?
No. The summary format runs on fastCRW's managed LLM on paid plans — there is no key, provider, or base URL to configure on the request. You only need your fastCRW API key. LLM features require a paid plan.
How is the summary leg billed?
The managed LLM leg is metered in CRW credits and capped per request. A scrape with formats: ['summary'] adds a small synthesis leg (a few credits) on top of the 1-credit scrape. The managed LLM's low effective per-token cost keeps that leg small, and the per-request cap bounds the worst case.
Can I choose the model that writes the summary?
No. Managed mode does not let you pick the model — the managed LLM is the model, selected automatically on paid plans. That keeps the request shape minimal and the bill on one capped meter.
Does fastCRW store any of my data?
The page content is processed per-request to produce the summary and is not retained for that purpose. If you're self-hosting CRW, the LLM dispatch lives in crates/crw-extract/src/llm.rs and runs on your own infrastructure.
How do I summarize a PDF instead of HTML?
Same payload. fastCRW's /v1/scrape handles PDFs transparently — the engine detects content-type, extracts text, and passes it to the managed LLM. Some scanned PDFs may need OCR; check data.markdown.length before trusting the summary.
What's the maximum content size the LLM sees?
100 KB by default (maxContentChars), hard cap 200 KB. Content beyond that is truncated. For very long documents, you can either lower maxContentChars to save credits or pre-chunk with the scrape endpoint's chunkStrategy and summarize each chunk.

Get Started

Try CRW Free

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.

Continue exploring

More tutorial posts

View category archive