
Crawl Endpoint Guide

How to use the fastCRW crawl flow for asynchronous multi-page extraction and job polling.

Published: March 11, 2026
Updated: March 11, 2026
Category: docs
Asynchronous job model · Polling-based progress · Good fit for site sections and corpora

Overview

Use crawl when you need multiple pages instead of a single response payload. It is the right tool for:

  • documentation sections,
  • knowledge-base ingestion,
  • internal search refreshes,
  • and recursive collection jobs that start from one known URL.

A minimal starting request looks like this:

curl -X POST https://fastcrw.com/api/v1/crawl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"url":"https://example.com","limit":50}'

The initial call returns a job identifier. Poll the job endpoint until the crawl is complete.

Start, Then Poll

The crawl API is asynchronous by design.

  1. POST /crawl starts the job.
  2. The API returns a crawl id.
  3. GET /crawl/:id returns progress and newly available results.
  4. Continue polling until the status becomes completed or a terminal error is returned.

Poll the job with a plain GET:

curl https://fastcrw.com/api/v1/crawl/CRAWL_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

That flow is easy to drive from shell scripts, job runners, background workers, and dashboards.
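The start-then-poll loop above can be sketched as a small client-side function. The response field names used here ("status", and terminal values like "completed" and "failed") are assumptions for illustration; check the actual fastCRW response schema before relying on them. The HTTP call is injected as a callable so the loop itself stays easy to test.

```python
import time

def poll_crawl(fetch_status, crawl_id, interval=2.0, max_attempts=30):
    """Poll a crawl job until it reaches a terminal state.

    fetch_status: callable that GETs /crawl/:id and returns the parsed
    JSON body as a dict (injected so the loop is network-free in tests).
    """
    for _ in range(max_attempts):
        job = fetch_status(crawl_id)
        status = job.get("status")
        if status == "completed":
            return job                      # all pages materialized
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"crawl {crawl_id} ended with status {status!r}")
        time.sleep(interval)                # wait before asking again
    raise TimeoutError(f"crawl {crawl_id} did not finish in {max_attempts} polls")
```

In a real worker, `fetch_status` would wrap your HTTP client and pass the Authorization header shown in the curl example above.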

Common Request Fields

The exact crawl body can vary by workflow, but the most common fields are:

Field                  Description
url                    Required starting URL
limit                  Maximum number of pages to collect
maxPages / max_pages   Supported aliases for the crawl cap, kept for compatibility-oriented migrations

Start small. A crawl with limit: 5 is much easier to inspect than a crawl with limit: 500.
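Because the cap can arrive under any of the alias names in the table above, a request builder can normalize them before sending. The last-resort precedence here (an explicit `limit` wins over either alias) is an illustrative choice, not documented API behavior.

```python
def build_crawl_body(url, limit=None, **extra):
    """Assemble a crawl request body, folding the page-cap aliases into 'limit'."""
    if not url:
        raise ValueError("url is required")
    body = {"url": url}
    cap = limit
    # Accept either alias, but only if no explicit limit was given.
    for alias in ("maxPages", "max_pages"):
        value = extra.pop(alias, None)
        if cap is None:
            cap = value
    if cap is not None:
        body["limit"] = int(cap)
    body.update(extra)
    return body
```

Sending `build_crawl_body("https://example.com", max_pages=5)` keeps the first evaluation crawl small, as recommended above.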

Best Uses

  • knowledge-base ingestion,
  • site audits,
  • internal search index refreshes,
  • and agent workflows that need to recurse beyond a starting page.

A Practical Evaluation Loop

The safest way to evaluate a new site is:

  1. run map first to understand the reachable section,
  2. launch a crawl with a low page cap,
  3. inspect the resulting markdown or extraction output,
  4. then widen the scope only after the first batch looks good.

That sequence saves credits and helps you catch bad starting URLs early.
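The four steps above can be strung together as one function. The callables `map_site`, `start_crawl`, and `inspect` are illustrative stand-ins for the real API calls; the specific cap values are just the "start small, then widen" pattern made concrete.

```python
def evaluate_site(map_site, start_crawl, inspect, url, probe_limit=5, full_limit=50):
    """Map first, probe with a small crawl, inspect, then widen the scope."""
    urls = map_site(url)                          # 1. understand the reachable section
    if not urls:
        return None                               # bad starting URL: stop before spending crawl credits
    probe = start_crawl(url, limit=probe_limit)   # 2. crawl with a low page cap
    if not inspect(probe):                        # 3. check the first batch of output
        return probe                              # output looks wrong: don't widen
    return start_crawl(url, limit=full_limit)     # 4. widen only after the probe looks good
```

The early returns are the point: each gate stops the run before the next, more expensive step.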

Credit and Retry Behavior

Crawl billing differs from scrape billing because results materialize over time.

  • starting a crawl consumes the initial crawl credit,
  • polling is tied to newly materialized pages,
  • and transient upstream failures should be handled with retry logic rather than blind rapid polling.

If the API returns 429, respect Retry-After. If the target site itself is slow or hostile, reducing crawl size usually gives you a clearer signal than hammering the same job harder.
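A retry wrapper that honors Retry-After might look like the sketch below. `do_request` returns a `(status_code, headers, body)` tuple, which is an illustrative stand-in for whatever HTTP client you use; the exponential backoff on 5xx is a common default, not a documented requirement.

```python
import time

def fetch_with_retry(do_request, max_retries=5, base_delay=1.0):
    """Retry transient failures, respecting Retry-After on 429 responses."""
    for attempt in range(max_retries):
        status, headers, body = do_request()
        if status == 429:
            # Prefer the server's Retry-After value; fall back to backoff.
            delay = float(headers.get("Retry-After", base_delay * 2 ** attempt))
            time.sleep(delay)
            continue
        if 500 <= status < 600:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff on transient errors
            continue
        return status, body
    raise RuntimeError("request still failing after retries")
```

Pairing this wrapper with the polling interval above keeps the poll loop polite instead of hammering a slow job.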

Design Note

The polling model is explicit today. That keeps the API easy to understand from scripts, pipelines, and dashboard tooling. It also makes billing and progress reporting easier to reason about than hidden background behavior.