How to turn a page into structured JSON with fastCRW when markdown alone is not enough.
Use extraction when you need shape, not just text. Send `formats: ["json"]` together with a `jsonSchema` to get structured output.
```json
{
  "url": "https://news.ycombinator.com",
  "formats": ["json"],
  "jsonSchema": {
    "type": "object",
    "properties": {
      "stories": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "url": { "type": "string" }
          }
        }
      }
    }
  }
}
```
Firecrawl compatibility:
`formats: ["extract"]` is accepted as an alias for `"json"`. Both work identically, but `"json"` is the canonical format name.
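The alias handling can be sketched as a small client-side normalization step. This helper is hypothetical (not part of the API); it folds the documented aliases into the canonical `"json"` name before a request is built:

```python
# Firecrawl-compatible aliases that the API treats as "json".
JSON_ALIASES = {"extract", "llm-extract", "json"}

def normalize_formats(formats: list[str]) -> list[str]:
    """Map format aliases to their canonical names, dropping duplicates."""
    out = []
    for f in formats:
        canonical = "json" if f in JSON_ALIASES else f
        if canonical not in out:  # avoid requesting the same format twice
            out.append(canonical)
    return out

print(normalize_formats(["extract", "markdown"]))  # ['json', 'markdown']
```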
Use structured extraction when the downstream consumer expects fields, not prose.
If your next step is retrieval, summarization, or semantic search, markdown is often the better primary output. If your next step is validation, storage, enrichment, or analytics, JSON is usually the better fit.
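That rule of thumb can be written down as a tiny helper. The step names here are illustrative labels, not API values:

```python
def pick_primary_format(next_step: str) -> str:
    """Suggest a primary output format based on the downstream step."""
    markdown_steps = {"retrieval", "summarization", "semantic-search"}
    json_steps = {"validation", "storage", "enrichment", "analytics"}
    if next_step in markdown_steps:
        return "markdown"
    if next_step in json_steps:
        return "json"
    return "markdown"  # default to the more forgiving format

print(pick_primary_format("analytics"))  # json
```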
```shell
curl -X POST https://fastcrw.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "formats": ["json"],
    "jsonSchema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "string"},
        "availability": {"type": "string"}
      },
      "required": ["title"]
    }
  }'
```
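The same request can be built from Python with the standard library. This sketch mirrors the curl call above (endpoint, headers, and payload are the same); the network send is left commented out so the snippet stays offline:

```python
import json
import urllib.request

def build_scrape_request(api_key: str, url: str) -> urllib.request.Request:
    """Construct the POST request for the /api/v1/scrape endpoint."""
    payload = {
        "url": url,
        "formats": ["json"],
        "jsonSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "string"},
                "availability": {"type": "string"},
            },
            "required": ["title"],
        },
    }
    return urllib.request.Request(
        "https://fastcrw.com/api/v1/scrape",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_scrape_request("YOUR_API_KEY", "https://example.com/product/123")
# urllib.request.urlopen(req) would send it.
```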
Start with the smallest schema that is genuinely useful. Overspecified schemas fail more often and are harder to debug.
Strong extraction schemas share a few traits: they are flat, small, mostly string-typed, and mark only the fields you truly need as required.
Good first schema:
```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "author": { "type": "string" },
    "publishedAt": { "type": "string" }
  }
}
```
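Output from a flat schema like this is easy to sanity-check without a full JSON Schema validator. A minimal sketch, assuming a flat object whose properties are all strings (it does not cover nested schemas):

```python
def check_flat_strings(schema: dict, data: dict) -> list[str]:
    """Return a list of problems; an empty list means the data fits.
    Only handles flat objects of string properties."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, value in data.items():
        if name not in props:
            problems.append(f"unexpected field: {name}")
        elif props[name].get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} should be a string")
    return problems

article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "publishedAt": {"type": "string"},
    },
}
print(check_flat_strings(article_schema, {"title": "Hello", "author": 42}))
# ['author should be a string']
```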
Risky first schema:
```json
{
  "type": "object",
  "properties": {
    "sections": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "subsections": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "title": { "type": "string" },
                "bullets": {
                  "type": "array",
                  "items": { "type": "string" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
The second schema may be valid, but it asks the model to infer a lot of structure that may not exist clearly on the page.
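One way to catch an overspecified schema before sending it is to measure its nesting depth. This is a hypothetical client-side heuristic, not an API check; the good schema above scores 1, while the risky one scores 6:

```python
def schema_depth(schema: dict) -> int:
    """Count how many object/array levels a JSON Schema nests."""
    t = schema.get("type")
    if t == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(c) for c in children), default=0)
    if t == "array":
        return 1 + schema_depth(schema.get("items", {}))
    return 0  # scalar leaf

flat = {"type": "object", "properties": {"title": {"type": "string"}}}
print(schema_depth(flat))  # 1
```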
You can pass your own LLM API key per-request instead of relying on server configuration:
```json
{
  "url": "https://example.com",
  "formats": ["json"],
  "jsonSchema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "description": { "type": "string" }
    }
  },
  "llmApiKey": "sk-ant-your-key-here",
  "llmProvider": "anthropic",
  "llmModel": "claude-sonnet-4-20250514"
}
```
| Field | Default | Description |
|---|---|---|
| `llmApiKey` | — | Your API key. Required if the server has no key configured. |
| `llmProvider` | `"anthropic"` | `"anthropic"` or `"openai"`. |
| `llmModel` | `"claude-sonnet-4-20250514"` | Model used for structured extraction. |
When a per-request key is provided, it takes priority over server configuration.
Extraction is usually best as a second step: scrape markdown first to confirm the page actually renders the content you expect, then add `formats: ["json"]` with a schema. That saves time when a target page is blocked, incomplete, or structurally noisy.
`jsonSchema` is required: if you send `formats: ["json"]` without a `jsonSchema`, the API returns a 400 error. `formats: ["extract"]` works but `"json"` is preferred; `formats: ["llm-extract"]` is also accepted. If the server has no LLM key configured and no per-request `llmApiKey` is set, the API returns a 422 error with guidance.