
Formats Reference

The exact output formats fastCRW accepts today, with one explicit non-goal.

Published: March 11, 2026
Updated: March 11, 2026
Category: docs
Repo-truth format list · Screenshot explicitly unsupported · Good for extraction planning

Supported values

Format     Meaning
---------  --------------------------------------------------------
markdown   Clean markdown output (default)
html       Cleaned HTML after extraction pipeline
rawHtml    Raw fetched HTML
plainText  Plain text view
links      Extracted absolute links
json       Structured extraction result when jsonSchema is provided
extract    Alias for json, accepted for Firecrawl compatibility

You can request multiple formats in a single call: formats: ["markdown", "html", "links"].
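
As a rough illustration, a request body with several formats could be assembled like this in Python. The formats and jsonSchema names come from this page; the helper name build_scrape_body and the url field are assumptions made for the sketch, and no endpoint is shown because this page does not specify one.

```python
import json


def build_scrape_body(url, formats=("markdown",), json_schema=None):
    """Build a request body dict (hypothetical helper, not part of fastCRW)."""
    # Drop duplicate format names while keeping the caller's order.
    body = {"url": url, "formats": list(dict.fromkeys(formats))}
    if json_schema is not None:
        # jsonSchema is only meaningful together with the json format.
        body["jsonSchema"] = json_schema
    return body


body = build_scrape_body("https://example.com", ["markdown", "html", "links"])
print(json.dumps(body, indent=2))
```

Building the body in one place makes it easier to keep the requested formats in sync with the fields your code later reads from the response.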

Which Format Should You Choose?

The practical rule is simple:

  • choose markdown when the output is headed into search, RAG, summarization, or LLM prompts,
  • choose html when you still want cleaned structure,
  • choose rawHtml only when you truly need the original source,
  • choose links when discovery matters as much as page content,
  • and choose json when the end result needs to be schema-shaped.

For most product and retrieval workflows, markdown is the best default because it is compact, readable, and easier to inspect than raw markup.

Common Format Combinations

Combination            Good for
---------------------  ---------------------------------------------
["markdown"]           Default page extraction
["markdown", "links"]  Content plus local link discovery
["html", "rawHtml"]    Debugging the extraction pipeline
["json"]               Structured extraction only
["markdown", "json"]   Human-readable content plus structured fields

Response shape

Each format populates a corresponding field in the response data object:

Format          Response field  Type
--------------  --------------  --------
markdown        markdown        string
html            html            string
rawHtml         rawHtml         string
plainText       plainText       string
links           links           string[]
json / extract  json            object

Full response schema

Every API response follows this envelope:

{
  "success": true,           // false if the request or target failed
  "data": { ... },           // present on success (scrape/crawl data)
  "error": "...",            // present on failure — human-readable message
  "warning": "..."           // present when something non-fatal happened
}

The exact shape of data depends on what you requested. Do not assume every field is always present.
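
One way to handle the envelope defensively is sketched below in Python. The names unwrap and ScrapeError are hypothetical, and the sample envelope is made up for illustration; only the success, data, error, and warning fields come from this page.

```python
class ScrapeError(Exception):
    """Raised when the envelope reports success == false (hypothetical name)."""


def unwrap(envelope):
    """Return (data, warnings) from a response envelope, or raise ScrapeError."""
    if not envelope.get("success"):
        raise ScrapeError(envelope.get("error") or "unknown error")
    data = envelope.get("data") or {}
    # A warning can sit on the envelope; the scrape data object also
    # documents its own warning field, so collect both.
    warnings = [w for w in (envelope.get("warning"), data.get("warning")) if w]
    return data, warnings


data, warnings = unwrap({
    "success": True,
    "data": {"markdown": "# Hello", "metadata": {"statusCode": 200}},
    "warning": "example warning",
})
markdown = data.get("markdown")  # None when the format was not requested
```

Reading fields with .get() rather than indexing keeps the caller from crashing on formats that were not requested.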

data object (scrape)

Field      Type                  Present when
---------  --------------------  ----------------------------------------------------------------
markdown   string | null         formats includes markdown or json
html       string | null         formats includes html
rawHtml    string | null         formats includes rawHtml
plainText  string | null         formats includes plainText
links      string[] | null       formats includes links
json       object | null         formats includes json AND jsonSchema provided AND LLM configured
chunks     ChunkResult[] | null  chunkStrategy provided
warning    string | null         Target returned error status, anti-bot detected, etc.
metadata   object                Always

metadata object

Field          Type           Description
-------------  -------------  ------------------------------------------
title          string | null  Page <title>
description    string | null  Meta description
ogTitle        string | null  Open Graph title
ogDescription  string | null  Open Graph description
ogImage        string | null  Open Graph image URL
canonicalUrl   string | null  Canonical link
sourceURL      string         Final URL after redirects
language       string | null  <html lang> value
statusCode     number         Target HTTP status code
renderedWith   string | null  "cdp", "http_only", or "http_only_fallback"
elapsedMs      number         Total processing time in ms
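
The metadata fields are useful for lightweight logging. The Python sketch below turns a metadata dict into a list of flags worth recording; the function name metadata_flags and the choice of which conditions to flag are assumptions for illustration, not behavior defined by fastCRW.

```python
def metadata_flags(metadata):
    """Return human-readable flags for one scrape (illustrative only)."""
    flags = []
    status = metadata.get("statusCode")
    if status is not None and status >= 400:
        flags.append(f"target returned HTTP {status}")
    # Assumption: a fallback render is worth noting when debugging quality.
    if metadata.get("renderedWith") == "http_only_fallback":
        flags.append("rendered via http_only_fallback")
    canonical = metadata.get("canonicalUrl")
    if canonical and canonical != metadata.get("sourceURL"):
        flags.append("canonical URL differs from sourceURL")
    return flags
```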

ChunkResult object

Field    Type           Description
-------  -------------  ---------------------------------------------------
content  string         Chunk text
score    number | null  Relevance score (present when query + filterMode set)
index    number         Original chunk position
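
Because score can be null, code that ranks chunks needs to handle both cases. A minimal Python sketch follows, assuming a higher score means a more relevant chunk (this page does not state the direction); top_chunks is a hypothetical helper name.

```python
def top_chunks(chunks, limit=3):
    """Return up to `limit` chunks: scored ones first (highest score first),
    then unscored ones in their original index order."""
    scored = [c for c in chunks if c.get("score") is not None]
    unscored = [c for c in chunks if c.get("score") is None]
    scored.sort(key=lambda c: c["score"], reverse=True)
    unscored.sort(key=lambda c: c["index"])
    return (scored + unscored)[:limit]
```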

Format aliases

"extract" and "llm-extract" are accepted as aliases for "json". The canonical name is json. All three behave identically — they require jsonSchema for structured extraction.

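If your own code branches on the requested formats, it can help to normalize the aliases to the canonical name first. A small Python sketch, where FORMAT_ALIASES and canonical_formats are hypothetical names chosen for this example:

```python
# Aliases listed on this page, mapped to the canonical format name.
FORMAT_ALIASES = {"extract": "json", "llm-extract": "json"}


def canonical_formats(formats):
    """Map aliases to canonical names and drop duplicates, keeping order."""
    seen = []
    for name in formats:
        canonical = FORMAT_ALIASES.get(name, name)
        if canonical not in seen:
            seen.append(canonical)
    return seen
```
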
Implementation Guidance

Three habits keep format usage sane in production:

  • request only the formats you really consume,
  • keep metadata with the stored output so later debugging is easier,
  • and validate data.json in your own application before trusting it as final truth.

If you are debugging extraction quality, request both markdown and json for a while. That makes it easy to compare the page text against the structured output.
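
The validation habit can be as simple as a required-field check before the result is stored. The Python sketch below is one possible shape; validate_fields, REQUIRED, and the sample data are all made up for illustration, and a full JSON Schema validator would be a reasonable replacement.

```python
def validate_fields(record, required):
    """Return a list of problems; an empty list means the record passed."""
    if not isinstance(record, dict):
        return ["data.json is missing or not an object"]
    problems = []
    for field, expected_type in required.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


# Hypothetical expectations for a product page; adjust to your own schema.
REQUIRED = {"name": str, "price": float}

data = {
    "markdown": "# Widget\nPrice: 9.99",
    "json": {"name": "Widget", "price": "9.99"},
    "metadata": {"sourceURL": "https://example.com/widget", "statusCode": 200},
}
problems = validate_fields(data.get("json"), REQUIRED)
# Keep metadata next to the output so a failed check can be traced later.
stored = {"json": data.get("json"), "metadata": data["metadata"], "problems": problems}
```

Here the price arrives as a string, so the check reports a type problem instead of letting the value flow downstream unnoticed.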

Not supported in this release

  • screenshot — not implemented. Requesting it will return a 422 error.
  • actions — click/scroll/wait actions are not yet supported. Sending actions will return a 400 error with a message suggesting cssSelector or xpath as alternatives.

If your workload depends on screenshot capture or browser actions, do not assume they exist in the managed cloud.
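
A client-side guard can catch these cases before a request is sent. The Python sketch below assumes the request body is a dict with a formats list and an optional actions key; the function name preflight is hypothetical.

```python
def preflight(body):
    """Return reasons this request body would be rejected, per this page."""
    problems = []
    if "screenshot" in body.get("formats", []):
        problems.append("screenshot is not implemented (expect a 422 error)")
    if "actions" in body:
        problems.append(
            "actions are not supported (expect a 400 error); "
            "consider cssSelector or xpath instead"
        )
    return problems
```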