The exact output formats fastCRW accepts today, with one explicit non-goal.
| Format | Meaning |
|---|---|
markdown | Clean markdown output (default) |
html | Cleaned HTML after extraction pipeline |
rawHtml | Raw fetched HTML |
plainText | Plain text view |
links | Extracted absolute links |
json | Structured extraction result when jsonSchema is provided |
extract | Alias for json — accepted for Firecrawl compatibility |
You can request multiple formats in a single call: formats: ["markdown", "html", "links"].
The practical rule is simple:
markdown when the output is headed into search, RAG, summarization, or LLM prompts,html when you still want cleaned structure,rawHtml only when you truly need the original source,links when discovery matters as much as page content,json when the end result needs to be schema-shaped.For most product and retrieval workflows, markdown is the best default because it is compact, readable, and easier to inspect than raw markup.
| Combination | Good for |
|---|---|
["markdown"] | Default page extraction |
["markdown", "links"] | Content plus local link discovery |
["html", "rawHtml"] | Debugging the extraction pipeline |
["json"] | Structured extraction only |
["markdown", "json"] | Human-readable content plus structured fields |
Each format populates a corresponding field in the response data object:
| Format | Response field | Type |
|---|---|---|
markdown | markdown | string |
html | html | string |
rawHtml | rawHtml | string |
plainText | plainText | string |
links | links | string[] |
json / extract | json | object |
Every API response follows this envelope:
{
"success": true, // false if the request or target failed
"data": { ... }, // present on success (scrape/crawl data)
"error": "...", // present on failure — human-readable message
"warning": "..." // present when something non-fatal happened
}
The exact shape of data depends on what you requested. Do not assume every field is always present.
data object (scrape)| Field | Type | Present when |
|---|---|---|
markdown | string | null | formats includes markdown or json |
html | string | null | formats includes html |
rawHtml | string | null | formats includes rawHtml |
plainText | string | null | formats includes plainText |
links | string[] | null | formats includes links |
json | object | null | formats includes json AND jsonSchema provided AND LLM configured |
chunks | ChunkResult[] | null | chunkStrategy provided |
warning | string | null | Target returned error status, anti-bot detected, etc. |
metadata | object | Always |
metadata object| Field | Type | Description |
|---|---|---|
title | string | null | Page <title> |
description | string | null | Meta description |
ogTitle | string | null | Open Graph title |
ogDescription | string | null | Open Graph description |
ogImage | string | null | Open Graph image URL |
canonicalUrl | string | null | Canonical link |
sourceURL | string | Final URL after redirects |
language | string | null | <html lang> value |
statusCode | number | Target HTTP status code |
renderedWith | string | null | "cdp", "http_only", or "http_only_fallback" |
elapsedMs | number | Total processing time in ms |
ChunkResult object| Field | Type | Description |
|---|---|---|
content | string | Chunk text |
score | number | null | Relevance score (present when query + filterMode set) |
index | number | Original chunk position |
"extract" and "llm-extract" are accepted as aliases for "json". The canonical name is json. All three behave identically — they require jsonSchema for structured extraction.
Three habits keep format usage sane in production:
metadata with the stored output so later debugging is easier,data.json in your own application before trusting it as final truth.If you are debugging extraction quality, request both markdown and json for a while. That makes it easy to compare the page text against the structured output.
screenshot — not implemented. Requesting it will return a 422 error.actions — click/scroll/wait actions are not yet supported. Sending actions will return a 400 error with a message suggesting cssSelector or xpath as alternatives.If your workload depends on screenshot capture or browser actions, do not assume they exist in the managed cloud.