How to use the fastCRW scrape flow to turn a single URL into markdown, HTML, or structured data.
Use scrape when you want one page turned into usable content without starting a wider crawl job. It is the right default for one-off, single-page extraction. A minimal request looks like this:
```bash
curl -X POST https://fastcrw.com/api/v1/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"url":"https://example.com","formats":["markdown"]}'
```
If you are not sure where to start, use this shape first:
```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "renderJs": null
}
```
That gives you clean markdown output, keeps extraction focused on the main body, and leaves JavaScript rendering to the engine's default behavior.
| Field | Type | Default | Description |
|---|---|---|---|
| url | string | required | The target page URL |
| formats | string[] | ["markdown"] | Output formats: markdown, html, rawHtml, plainText, links, json, extract |
| onlyMainContent | boolean | true | Extract primary content area only (removes nav, footer, sidebar) |
| renderJs | boolean \| null | null | true = force JS rendering, false = skip, null = auto-detect |
| waitFor | number | — | Milliseconds to wait after JS rendering |
| cssSelector | string | — | CSS selector to narrow content |
| xpath | string | — | XPath expression to narrow content |
| includeTags | string[] | [] | Only include these HTML tags |
| excludeTags | string[] | [] | Remove these HTML tags |
| jsonSchema | object | — | JSON Schema for structured extraction (requires formats to include json) |
| headers | object | {} | Custom HTTP headers to send with the request |
| stealth | boolean | — | Override stealth mode for this request. When true, rotates the user-agent from a realistic browser pool and injects standard browser headers |
| proxy | string | — | Per-request HTTP proxy URL |
| chunkStrategy | object | — | Chunking config: { "type": "sentence" \| "regex" \| "topic", "maxChars": 1000 } |
| query | string | — | Query for BM25/cosine chunk filtering |
| filterMode | string | — | "bm25" (keyword density with saturation) or "cosine" (TF-IDF vector similarity); BM25 is recommended for most use cases |
| topK | number | 5 | Number of top chunks to return when filtering |
| llmApiKey | string | — | Per-request LLM API key for structured extraction (BYOK); overrides server config |
| llmProvider | string | "anthropic" | LLM provider: "anthropic" or "openai" |
| llmModel | string | "claude-sonnet-4-20250514" | Model to use for structured extraction |
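As a sketch of how the anti-bot options from the table combine in one request, the payload below layers stealth, a per-request proxy, and a custom header. The proxy URL and header value are placeholders, not real endpoints:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "stealth": true,
  "proxy": "http://user:pass@proxy.example.com:8080",
  "headers": { "Accept-Language": "en-US" }
}
```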
Most integrations only need one of these patterns:
["markdown"] for retrieval, search, summarization, and LLM inputs.["markdown", "links"] when you want the content plus outbound link discovery.["html"] when you need cleaned markup instead of markdown.["rawHtml"] when downstream logic expects the original HTML source.["json"] when you are doing schema-driven extraction.Requesting more formats is convenient for debugging, but in production it is better to ask only for what you will actually store or process.
The default extraction path works well for many pages, but it is not magic. If you know the site structure, tighten the request:
- cssSelector when there is a stable content container,
- xpath when selectors are easier to express that way,
- includeTags and excludeTags to keep or remove specific markup families,
- onlyMainContent kept on unless you explicitly want navigation, footer, or sidebar content.

The common mistake is combining too many narrowing options at once. Start broad, inspect the result, then add one targeting primitive at a time.
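A targeted request might look like the sketch below. The selector and tag names are illustrative; substitute whatever is stable on the site you are scraping:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "cssSelector": "article.post-body",
  "excludeTags": ["aside", "form"]
}
```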
Use renderJs: true only when the page clearly needs a browser. Browser rendering increases latency and operational cost, so treat it as a deliberate choice rather than the universal default.
When you do need it:

- set renderJs: true,
- start with waitFor: 1000 or 2000,
- increase waitFor only when the page still hydrates too slowly.

If the response metadata shows an HTTP-only fallback or the output is suspiciously empty, read the JS rendering guide.
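Putting those settings together, a deliberate browser-rendering request is a small variation on the default shape:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "renderJs": true,
  "waitFor": 1500
}
```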
- chunkStrategy alone splits the markdown and returns all chunks.
- chunkStrategy + query + filterMode scores and ranks chunks, returning the top topK.
- topK without query/filterMode still truncates the chunk array to topK items (no scoring).
- query or filterMode without chunkStrategy is silently ignored; chunking must be enabled first.

In practice:
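A full scoring-and-ranking request combines all four fields. The query string here is a placeholder; use whatever phrase describes the content you want surfaced:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "chunkStrategy": { "type": "sentence", "maxChars": 1000 },
  "query": "pricing tiers",
  "filterMode": "bm25",
  "topK": 5
}
```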
- sentence when you want stable natural-language chunks,
- regex when you already know the structural separator,
- topic chunking as an advanced option that should be tested on real data before wide rollout.

You do not need a separate endpoint for extraction: scrape can also return schema-shaped JSON when formats includes json and jsonSchema is present.
That means a single API surface can support both plain content scraping and schema-driven structured extraction.
If your schema is the primary output, read the dedicated Structured extraction guide.
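As a minimal sketch, a structured-extraction request pairs formats: ["json"] with a jsonSchema (the schema fields below are examples, not a required shape):

```json
{
  "url": "https://example.com",
  "formats": ["json"],
  "jsonSchema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "string" }
    }
  },
  "llmProvider": "anthropic",
  "llmModel": "claude-sonnet-4-20250514"
}
```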
The main response pattern is:
- success for overall request outcome,
- data for returned content,
- warning for degraded but non-fatal situations,
- metadata for context such as title, status code, final URL, and elapsed time.

Do not ignore warnings. A page blocked by anti-bot protection can still produce content that looks valid at first glance.
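A successful response might look roughly like this. The exact key names inside metadata are illustrative assumptions, not confirmed field names:

```json
{
  "success": true,
  "data": { "markdown": "# Example Domain ..." },
  "warning": null,
  "metadata": {
    "title": "Example Domain",
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "elapsedMs": 420
  }
}
```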
The scrape flow is the foundation for the rest of the fastCRW API, including wider crawl jobs.
Use the playground if you want to validate output before wiring the endpoint into production, then move to curl, scripts, or your application code once the payload shape looks right.