Technical_Specifications_v0.2

Documentation.

Interface protocols for autonomous agents and data-driven systems.

92%
Coverage
1K URLs
833ms
Avg Latency
5.5x faster
6.6 MB
Idle RAM
75x less
$0
Cost
self-hosted

Benchmarked against Firecrawl scrape-content-dataset-v1 — 1,000 real-world URLs. Full benchmark comparison ↗

01

Authentication

Include your secret key in the Authorization header. All production endpoints require a valid Bearer token.

Authorization: Bearer YOUR_API_KEY
02

Power Terminal

The Playground is a live debugging interface.

Authenticated_Sync

Logged-in users execute real-time extractions. The terminal handles Job IDs and polling automatically, displaying live progress.

Credit_System

Successful playground executions deduct credits from your account. Insufficient credits will block execution until upgrade.

03

Endpoints

POST

/v1/scrape

Synchronous

Instantly extract content from a single URL. Returns the final data payload immediately.

Request_Sample

{
  "url": "https://example.com",
  "formats": ["markdown"],
  "onlyMainContent": true,
  // Optional: target specific elements
  "cssSelector": "article",
  "xpath": "//h1",
  // Optional: chunking for RAG
  "chunkStrategy": { "type": "topic" },"
  "query": "memory safety",
  "filterMode": "bm25",
  "topK": 5,
  // Optional: stealth & proxy
  "stealth": true,
  "proxy": "http://user:pass@host:8080"
}

Verified_Output

{
  "success": true,
  "data": { "markdown": "...", "metadata": { ... } }
}
POST

/v1/map

Async_Enabled

Build complete sitemaps. For larger domains, this endpoint returns an id for asynchronous polling.

POST

/v1/crawl

Asynchronous

Deep crawling engine. Always returns a Job ID immediately.

Polling_Request

curl https://api.fastcrw.com/v1/crawl/JOB_ID

Job_Response

{
  "success": true,
  "id": "589a4d6c-0e7a..."
}
STRATEGY

Job Polling

Extractions like Crawl and Map may take up to 2 minutes. Clients should poll the status endpoint every 2-5 seconds until status: "completed" is reached.

Job_Lifecycle

  • active/ Job queued or running
  • completed/ Data payload ready
  • failed/ Upstream error encountered
06

Advanced

CSS Selector & XPath

Extract a specific part of the page before conversion. Pass cssSelector or xpath to narrow the DOM — readability scoring is bypassed automatically.

CSS_Selector

{
  "url": "https://news.ycombinator.com",
  "formats": ["markdown"],
  "cssSelector": "td.title",
  "onlyMainContent": false
}

XPath

{
  "url": "https://news.ycombinator.com",
  "formats": ["markdown"],
  "xpath": "//span[@class='titleline']/a"
}

Chunking & Filtering

Split scraped Markdown into chunks for vector databases and RAG pipelines. Results appear in data.chunks. Combine with query + filterMode to rank by relevance.

Topic_Chunks

{
  "url": "https://en.wikipedia.org/wiki/Rust",
  "chunkStrategy": { "type": "topic" },
  "query": "memory safety ownership",
  "filterMode": "bm25",
  "topK": 5
}

Sentence_Chunks

{
  "chunkStrategy": {
    "type": "sentence",
    "maxChars": 500
  }
}

Strategy_Reference

  • topicSplit on # headings
  • sentenceSplit on .!? boundaries
  • regexCustom delimiter pattern

Filter_Mode

  • bm25Keyword relevance
  • cosineTF-IDF similarity

Stealth & Proxy

Reduce bot-detection fingerprinting. When stealth is true, CRW rotates the User-Agent from a pool of real Chrome, Firefox, and Safari strings and injects 12 browser-like headers.

Stealth_Mode

{
  "url": "https://example.com",
  "stealth": true
}

Per_Request_Proxy

{
  "url": "https://example.com",
  "proxy": "http://user:pass@host:8080"
}

Self-Hosting (AGPL-3.0)

Run the FASTCRW engine on your own hardware. Zero monthly fees, infinite concurrency.

cargo install crw-server crw-server --port 3000