By the fastCRW team · Benchmark and credit facts verified 2026-05-18 · Numbers trace to the canonical 3-way scrape benchmark (diagnose_3way.py, 2026-05-08) · Verify independently before relying on them.
Concurrent requests are how you scale a stateless scraper
When you need to scrape thousands of URLs, the lever that matters is concurrent requests, not a faster single machine. A scraper spends almost all of its wall-clock time waiting on the network and on target sites rendering their pages — CPU is rarely the bottleneck. So the way to scale throughput is to have many requests in flight at once, then size that parallelism against two ceilings: the politeness limit of the sites you hit, and the slow latency tail of your own scraper. This guide walks through how to do that concretely against a Firecrawl-compatible API like fastCRW, including the part most guides skip: there is deliberately no batch endpoint, so you iterate /v1/scrape concurrently yourself.
Why parallelism beats a bigger machine
Doubling CPU cores does almost nothing for a workload that is 95% I/O wait. Doubling the number of in-flight requests roughly doubles throughput until you hit a real ceiling. The job is therefore to find that ceiling and park your worker count just below it.
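Little's law is a quick way to see why the in-flight count is the lever. In steady state, throughput equals the number of requests in flight divided by the average time each one takes. The sketch below plugs in the 1914 ms benchmark median purely as an example latency. It assumes latency stays flat as concurrency grows, which is only true below the ceilings discussed later in this guide.

```python
def throughput_per_second(in_flight: int, latency_s: float) -> float:
    """Little's law rearranged: throughput = requests in flight / latency.

    Assumes steady state and a latency that does not grow with concurrency,
    which only holds below the real ceiling (target politeness or your own server).
    """
    if in_flight < 0 or latency_s <= 0:
        raise ValueError("in_flight must be >= 0 and latency_s must be > 0")
    return in_flight / latency_s


if __name__ == "__main__":
    median_latency_s = 1.914  # benchmark p50, used only as an illustrative input
    for workers in (1, 10, 20):
        rate = throughput_per_second(workers, median_latency_s)
        print(f"{workers:>2} in flight -> about {rate:.1f} URLs/s")
```

At that latency a single sequential worker manages roughly 0.5 URLs per second, while 20 in flight approach 10.4. Neither number depends on how fast the client's CPU is.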
The stateless model makes concurrency trivial
fastCRW is stateless per request: each /v1/scrape call is self-contained, carrying its own URL, renderer choice, and any cookies or headers you pass in. Nothing is held between calls. That means every worker is identical and there is no session affinity, no sticky routing, and no shared cookie jar to coordinate — you can fire N requests in parallel and they neither block nor corrupt each other. If you want the architectural argument for why this matters at scale, see why a stateless request model beats sessions.
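The sketch below makes "self-contained" concrete by building one /v1/scrape body per URL. It assumes the Firecrawl-style fields url, formats and headers. The renderer selector is fastCRW-specific and is left out because its field name is not given in this guide. Cookies travel as an ordinary Cookie header inside each request's own payload, so workers have nothing to share.

```python
import json
from typing import Optional


def build_scrape_body(url: str, headers: Optional[dict] = None) -> dict:
    """Build one self-contained /v1/scrape request body.

    Assumed Firecrawl-style fields: "url", "formats", "headers".
    Everything the call needs travels in this dict; no session state is reused.
    """
    body = {"url": url, "formats": ["markdown"]}
    if headers:
        # Copy so two workers never share (and mutate) the same header dict.
        body["headers"] = dict(headers)
    return body


if __name__ == "__main__":
    a = build_scrape_body("https://example.com/a", {"Cookie": "session=abc"})
    b = build_scrape_body("https://example.org/b")
    print(json.dumps(a))
    print(json.dumps(b))
```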
Handling many URLs without a batch endpoint
This is the design decision to plan around before you write a line of code.
There is no /v1/batch/scrape
fastCRW deliberately does not expose a /v1/batch/scrape endpoint. You cannot hand it an array of 5,000 URLs and get one job back. This is an honest gap relative to Firecrawl Cloud's batch convenience, and we concede it plainly: if you specifically want the vendor to manage the fan-out and give you a single job handle, Firecrawl's batch path genuinely wins on ergonomics. fastCRW instead gives you two primitives and expects you to compose them.
Iterate /v1/scrape concurrently
For a known list of URLs, the pattern is a bounded worker pool that pulls from a queue and calls /v1/scrape once per URL. Each call is 1 credit regardless of which renderer is used — HTTP, Lightpanda, or Chrome. Because the engine is stateless, the pool needs no coordination beyond the queue itself:
- Put your URLs in an in-memory or durable queue.
- Start a fixed number of workers (the concurrency you tuned for).
- Each worker loops: dequeue a URL, POST /v1/scrape, store the result, repeat.
- Retry transient failures with backoff; record permanent failures and move on.
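The worker-pool steps above can be sketched with asyncio. This is a minimal version that assumes an in-memory queue. The HTTP call is a hypothetical async `scrape` function you supply, so the sketch does not tie you to any particular client. Failures are only recorded here; wrap `scrape` with your own retry logic to get backoff.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

# Hypothetical hook: an async function that POSTs one URL to /v1/scrape and
# returns the parsed JSON. Plug in whichever HTTP client you already use.
ScrapeFn = Callable[[str], Awaitable[dict]]


async def run_pool(
    urls: Iterable[str], scrape: ScrapeFn, workers: int = 16
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Fan URLs out over a fixed number of workers sharing one in-memory queue."""
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    results: Dict[str, dict] = {}
    failures: Dict[str, str] = {}

    async def worker() -> None:
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained: this worker exits
            try:
                results[url] = await scrape(url)  # one /v1/scrape call per URL
            except Exception as exc:  # record the failure and keep going
                failures[url] = repr(exc)

    # At most `workers` scrapes are in flight at any moment.
    await asyncio.gather(*(worker() for _ in range(workers)))
    return results, failures


if __name__ == "__main__":
    async def fake_scrape(url: str) -> dict:
        await asyncio.sleep(0.01)  # stand-in for network latency
        return {"url": url}

    ok, failed = asyncio.run(
        run_pool([f"https://example.com/p/{i}" for i in range(50)], fake_scrape, workers=10)
    )
    print(len(ok), "scraped,", len(failed), "failed")
```

Because every call is stateless, workers need no coordination beyond the shared queue.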
This is more code than a single batch call, but it gives you exact control over concurrency, retries, and timeouts — which is what you actually need at scale.
Or use /v1/crawl for whole sites
If your "many URLs" are really "every page under one domain," do not iterate scrape at all — use /v1/crawl. It runs an async breadth-first crawl and returns a job ID you poll, and it is governed by explicit maxDepth (cap 10) and maxPages (cap 1000) limits so a discovery pass can never run away. Crawl is billed 1 credit per page regardless of renderer. The rule of thumb: iterate scrape when you have a discrete URL list; crawl when you have a domain. For the crawl-first pattern, see crawl an entire website.
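Here is a minimal sketch of the crawl path. It assumes the request uses the maxDepth and maxPages field names given above. It also assumes the job-status response carries a "status" field with values such as "completed", which is a Firecrawl-style convention; check fastCRW's API reference for the exact shape. The HTTP calls are left as a hypothetical `get_status` callable so the polling logic stands on its own. Clamping the limits client-side is optional and simply keeps the request inside the documented caps.

```python
import time
from typing import Callable

MAX_DEPTH_CAP = 10    # documented cap on maxDepth
MAX_PAGES_CAP = 1000  # documented cap on maxPages
DONE_STATES = {"completed", "failed", "cancelled"}  # assumed status values


def crawl_body(url: str, max_depth: int, max_pages: int) -> dict:
    """Build a /v1/crawl request body, clamping the limits to the documented caps."""
    return {
        "url": url,
        "maxDepth": min(max_depth, MAX_DEPTH_CAP),
        "maxPages": min(max_pages, MAX_PAGES_CAP),
    }


def poll_crawl(
    get_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a crawl job until it reports a terminal status or timeout_s elapses.

    get_status is a hypothetical wrapper around the job-status call for the
    job ID that /v1/crawl returned; it must return the parsed JSON.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.get("status") in DONE_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"crawl still running after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
```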
Rate limiting: polite and self-protective
Concurrency has two ceilings. The first is external — how hard you can hit a target site without being throttled, blocked, or simply being rude. The second is internal — how many requests your own scraper handles before its latency degrades. Rate limiting addresses both.
Respecting target-site limits
Concurrency is per-target, not global. Twenty workers all hammering one domain will get rate-limited or banned long before twenty workers spread across twenty domains. Practical defaults:
- Cap concurrency per host, not just globally. A common starting point is 2-5 simultaneous requests to any single domain.
- Add a small inter-request delay per host (a few hundred milliseconds) so you are not bursting.
- Treat HTTP 429 and 503 as backpressure signals, not errors to retry immediately.
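The per-host defaults above can be enforced with one semaphore per hostname plus a minimum gap between request starts. The sketch below uses an illustrative 3 in flight and a 0.3 s gap, both hypothetical defaults inside the suggested ranges. Reacting to 429 and 503 belongs in the retry path, not in this limiter.

```python
import asyncio
import time
from collections import defaultdict
from typing import Awaitable, Callable, TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")


class PerHostLimiter:
    """Cap simultaneous requests per host and space out their start times.

    Defaults (3 in flight, 0.3 s between starts) are illustrative values inside
    the ranges suggested for per-host politeness, not tuned numbers.
    """

    def __init__(self, per_host: int = 3, min_gap_s: float = 0.3) -> None:
        self._slots = defaultdict(lambda: asyncio.Semaphore(per_host))
        self._gates = defaultdict(asyncio.Lock)
        self._last_start = defaultdict(lambda: float("-inf"))
        self._min_gap_s = min_gap_s

    async def run(self, url: str, fn: Callable[[str], Awaitable[T]]) -> T:
        host = urlsplit(url).hostname or ""
        async with self._slots[host]:            # per-host concurrency cap
            async with self._gates[host]:        # one start decision at a time per host
                wait = self._last_start[host] + self._min_gap_s - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)    # enforce the inter-request gap
                self._last_start[host] = time.monotonic()
            return await fn(url)
```

Requests to different hosts get separate semaphores, so a global pool of 20 workers spread over many domains is not slowed by the cap on any one of them.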
robots.txt respected by default
fastCRW respects robots.txt by default; you can override it, but only do so when you have the legal right to. That default is a feature for scaled scraping, not a limitation: it stops a runaway worker pool from crawling paths a site has explicitly asked automated clients to avoid, which is exactly the behavior that gets an IP range blocked.
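fastCRW already honors robots.txt server-side by default. If you also want to keep disallowed paths out of your own queue, Python's standard `urllib.robotparser` can pre-filter a URL list. The sketch below handles one host's robots.txt text, which you fetch yourself. The user agent "my-scraper" is a placeholder.

```python
from typing import Iterable, List
from urllib import robotparser


def allowed_urls(robots_txt: str, urls: Iterable[str], user_agent: str = "my-scraper") -> List[str]:
    """Drop URLs that one host's robots.txt disallows for `user_agent`.

    Optional client-side pre-filter; fastCRW already honors robots.txt by default.
    `robots_txt` is the file's text for a single host; "my-scraper" is a placeholder agent.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```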
Backoff and retry patterns
Transient failures are normal at scale. The pattern that survives contact with real sites is exponential backoff with jitter: on a 429/503/timeout, wait base × 2^attempt plus a random jitter, retry up to a small fixed number of times, then mark the URL failed and continue. Jitter matters — without it, a wave of simultaneous failures retries in lockstep and re-creates the same spike. Cap retries (3 is reasonable) so one dead URL cannot stall a worker forever.
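The sketch below implements the backoff formula above: base × 2^attempt plus uniform random jitter, with a cap on retries. It assumes your scrape call raises a hypothetical `RetryableError` on 429, 503 or a timeout. The 0.5 s base and 0.5 s jitter are illustrative values.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 503}


class RetryableError(Exception):
    """Hypothetical: raise this from your scrape call on 429, 503, or a timeout."""


def is_retryable(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base_s: float = 0.5, jitter_s: float = 0.5,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry `attempt` (0-based): base * 2**attempt + jitter."""
    source = rng if rng is not None else random
    return base_s * (2 ** attempt) + source.uniform(0, jitter_s)


async def with_retries(call: Callable[[], Awaitable[T]], max_retries: int = 3,
                       sleep: Callable[[float], Awaitable[None]] = asyncio.sleep) -> T:
    """Run call(), retrying retryable failures up to max_retries times."""
    attempt = 0
    while True:
        try:
            return await call()
        except RetryableError:
            if attempt >= max_retries:
                raise  # give up: the caller marks the URL failed and moves on
            await sleep(backoff_delay(attempt))
            attempt += 1
```

The jitter term means workers that failed together wake at different moments instead of retrying in lockstep.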
Tuning parallelism for latency and the slow tail
The single most important number for sizing a concurrent scraper is not the average latency — it is the tail.
Why the p90 tail shapes your timeout
On the canonical 3-way scrape benchmark (diagnose_3way.py, Firecrawl's public dataset, 819 labeled URLs, 2026-05-08), fastCRW's p50 latency is 1914 ms, the fastest median of the three and ahead of Firecrawl's 2305 ms. In fast mode its p90 is 4348 ms, also the lowest of the three tools tested (Crawl4AI 4754 ms, Firecrawl 6937 ms). The chrome-stealth fallback recovers URLs the other tools miss and earns fastCRW the highest truth-recall of the three (63.74% of 819 labeled URLs). It is also the mechanism that produces a longer tail on the hardest pages. We disclose that tail rather than hide it because it directly determines how you tune.
The implication for concurrency: size your per-request timeout off the p90, not the p50. If you set a 3-second timeout because the median is under 2 seconds, you will kill some of the hard, valuable pages the fallback was busy rescuing. A timeout in the 8-15 second range covers the tail comfortably in fast mode. For the full latency breakdown, see scraping latency explained.
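A timeout sized off the tail can be derived from your own measured latencies. The sketch below takes the p90 of a latency sample, applies a headroom multiplier, and clamps the result to the 8-15 s fast-mode range suggested above. The 2x headroom is a hypothetical choice, not a recommendation from the benchmark.

```python
import statistics
from typing import Sequence


def timeout_from_tail(latencies_s: Sequence[float], headroom: float = 2.0,
                      floor_s: float = 8.0, ceiling_s: float = 15.0) -> float:
    """Per-request timeout sized off the p90 of observed latencies, not the median.

    headroom is an illustrative multiplier; the clamp mirrors the 8-15 s range.
    Needs at least two latency samples.
    """
    p90 = statistics.quantiles(latencies_s, n=10)[8]  # 9 cut points; index 8 is p90
    return min(max(p90 * headroom, floor_s), ceiling_s)
```

With a p90 of 4.348 s, the fast-mode benchmark figure, 2x headroom gives about 8.7 s. A 3 s timeout based on the median would sit below the p90 itself, so it would kill at least one request in ten.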
Bounded worker pools
A long tail also means a few slow requests will hold workers. With an unbounded "fire everything" approach, a burst of tail-latency pages can pin every worker simultaneously and stall the queue. A bounded pool of N workers, each with a tail-sized timeout, keeps throughput steady: fast requests cycle workers quickly, and at most N can be stuck on slow pages at once. Start N around 10-20 and adjust based on measured throughput and error rate.
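In an asyncio-based pool, `asyncio.wait_for` is one way to stop a slow page from holding a worker past its timeout. It cancels the pending call and hands the worker back. The sketch below assumes a hypothetical async `scrape` function and returns None on timeout so the caller can record the URL and move on.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def scrape_with_timeout(scrape: Callable[[str], Awaitable[dict]], url: str,
                              timeout_s: float) -> Optional[dict]:
    """Run one scrape, but give the worker back after timeout_s.

    Returns None on timeout; wait_for cancels the underlying call rather than
    leaving it running and holding the worker.
    """
    try:
        return await asyncio.wait_for(scrape(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
```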
Per-plan concurrency on managed cloud
On the managed fastCRW cloud, your effective concurrency ceiling is also a function of your plan's credit budget and any per-account limits — check /pricing for current tiers. Firecrawl publishes explicit per-tier concurrency limits, and if you depend on a hard, documented concurrency SLA per tier, that is a place to compare carefully. Self-hosting the AGPL-3.0 engine removes the credit meter entirely — your only concurrency ceiling becomes your own server — at $0 per 1,000 scrapes plus your infrastructure cost.
A scale-ready concurrent scrape pattern
Putting it together, here is the shape of a production fan-out that scales to thousands of URLs.
Worker pool over a URL queue
- Queue — load all target URLs into a queue (durable if the job must survive restarts).
- Bounded pool — spawn N workers (start at 10-20), each pulling one URL at a time.
- Per-host caps — limit simultaneous requests per domain (2-5) so concurrency is polite.
- Tail-sized timeout — set per-request timeouts off the p90/p99 (15-20 s), not the median.
- Backoff + jitter — retry 429/503/timeout with exponential backoff and jitter, capped at 3 retries per URL.
- Idempotent results — because each scrape is stateless, retries are safe; key results by URL so a retried call simply overwrites.
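The knobs in the list above can be collected in one place, with results keyed by URL so that retries overwrite. The sketch below is a hypothetical arrangement whose defaults are the suggested starting points. The store is in-memory; a durable version would upsert rows keyed by URL in the same way.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FanoutConfig:
    """Starting knobs for the fan-out. Defaults are starting points, not tuned limits."""

    workers: int = 16        # bounded pool size (start at 10-20)
    per_host: int = 3        # simultaneous requests per domain (2-5)
    timeout_s: float = 15.0  # per-request timeout sized off the tail, not the median
    max_retries: int = 3     # retries on 429/503/timeout, with backoff + jitter


class ResultStore:
    """In-memory results keyed by URL, so a retried scrape overwrites, never duplicates."""

    def __init__(self) -> None:
        self._by_url: Dict[str, dict] = {}

    def put(self, url: str, result: dict) -> None:
        self._by_url[url] = result  # last write wins: safe because scrapes are stateless

    def get(self, url: str) -> Optional[dict]:
        return self._by_url.get(url)

    def __len__(self) -> int:
        return len(self._by_url)
```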
Credit budgeting at scale
Throughput and cost are the same dial. A run of 10,000 scrapes is 10,000 credits: every renderer costs the same flat 1 credit per page, so the renderer mix does not affect your total. What you do need to forecast before launching the pool is the format mix. If structured JSON extraction is in the loop, each formats: ["json"] request is 5 credits, and that line item dominates an extraction-heavy job. For modeling a monthly bill at this scale, see the cost of web scraping at scale and Firecrawl credits and rate limits.
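The forecast is simple arithmetic: plain scrapes at 1 credit each plus JSON-extraction requests at 5 credits each. The sketch below assumes a JSON request costs 5 credits in total rather than 5 on top of the base credit, which is how the rate is stated above.

```python
SCRAPE_CREDITS = 1  # any renderer: HTTP, Lightpanda, or Chrome
JSON_CREDITS = 5    # a request with formats: ["json"] (assumed total, not a surcharge)


def estimate_credits(total_urls: int, json_urls: int = 0) -> int:
    """Forecast credits for a run; the renderer choice does not change the total."""
    if not 0 <= json_urls <= total_urls:
        raise ValueError("json_urls must be between 0 and total_urls")
    plain = total_urls - json_urls
    return plain * SCRAPE_CREDITS + json_urls * JSON_CREDITS
```

For example, 10,000 URLs with 2,000 of them using JSON extraction come to 8,000 + 10,000 = 18,000 credits. The extraction fifth of the job accounts for more than half the bill.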
Where Firecrawl Cloud genuinely wins
To be candid: if you want the vendor to own the fan-out, Firecrawl's batch scrape endpoint and published per-tier concurrency are real ergonomic advantages over iterating /v1/scrape yourself. fastCRW trades that managed convenience for a stateless, self-hostable engine where you control the pool, the retries, and the timeouts directly. For most teams the worker-pool pattern above is a few dozen lines and worth the control; if it is not for you, that is a legitimate reason to use Firecrawl's batch path.
Sources
- fastCRW canonical fact sheet (credit costs, API surface and the absence of /v1/batch/scrape, honest gaps and stateless model, self-host cost): github.com/us/crw
- Canonical 3-way scrape benchmark (p50 1914 ms, fastest; fast-mode p90 4348 ms, lowest; truth-recall 63.74% of 819 labeled URLs; diagnose_3way.py, 2026-05-08): see /benchmarks
- Firecrawl per-tier concurrency and batch endpoint: docs.firecrawl.dev (verified 2026-05-18)
Related: Why a stateless request model beats sessions · Crawl an entire website · Cost of web scraping at scale · Firecrawl credits and rate limits
