Concurrent Requests & Rate Limiting: Scaling Scraping

Scale web scraping with concurrent requests and rate limiting: tune parallelism, respect target sites, and iterate /v1/scrape since there's no batch endpoint.

By Recep · June 30, 2026 · 9 min read · Last updated: June 2, 2026

By the fastCRW team · Benchmark and credit facts verified 2026-05-18 · Numbers trace to the canonical 3-way scrape benchmark (diagnose_3way.py, 2026-05-08) · Verify independently before relying on them.

Concurrent requests are how you scale a stateless scraper

When you need to scrape thousands of URLs, the lever that matters is concurrent requests, not a faster single machine. A scraper spends almost all of its wall-clock time waiting on the network and on target sites rendering their pages — CPU is rarely the bottleneck. So the way to scale throughput is to have many requests in flight at once, then size that parallelism against two ceilings: the politeness limit of the sites you hit, and the slow latency tail of your own scraper. This guide walks through how to do that concretely against a Firecrawl-compatible API like fastCRW, including the part most guides skip: there is deliberately no batch endpoint, so you iterate /v1/scrape concurrently yourself.

Why parallelism beats a bigger machine

Doubling CPU cores does almost nothing for a workload that is 95% I/O wait. Doubling the number of in-flight requests roughly doubles throughput until you hit a real ceiling. The job is therefore to find that ceiling and park your worker count just below it.
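A rough way to see the effect is Little's law: steady-state throughput is approximately the number of in-flight requests divided by mean latency. The sketch below assumes a 2-second mean latency purely for illustration (it is not a benchmark figure), and it ignores the ceilings discussed later in this guide.

```python
def estimated_throughput(in_flight: int, mean_latency_s: float) -> float:
    """Little's law: requests per second ~= in-flight requests / mean latency."""
    if in_flight < 0 or mean_latency_s <= 0:
        raise ValueError("in_flight must be >= 0 and mean_latency_s must be > 0")
    return in_flight / mean_latency_s


# Assumed mean latency of 2.0 s, for illustration only.
for workers in (10, 20, 40):
    print(workers, "workers ->", estimated_throughput(workers, 2.0), "req/s")
```

The estimate holds only while added requests do not raise latency or error rate; once they do, you have found the ceiling.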

The stateless model makes concurrency trivial

fastCRW is stateless per request: each /v1/scrape call is self-contained, carrying its own URL, renderer choice, and any cookies or headers you pass in. Nothing is held between calls. That means every worker is identical and there is no session affinity, no sticky routing, and no shared cookie jar to coordinate — you can fire N requests in parallel and they neither block nor corrupt each other. If you want the architectural argument for why this matters at scale, see why a stateless request model beats sessions.

Handling many URLs without a batch endpoint

This is the design decision to plan around before you write a line of code.

There is no /v1/batch/scrape

fastCRW deliberately does not expose a /v1/batch/scrape endpoint. You cannot hand it an array of 5,000 URLs and get one job back. This is an honest gap relative to Firecrawl Cloud's batch convenience: if you specifically want the vendor to manage the fan-out and give you a single job handle, Firecrawl's batch path genuinely wins on ergonomics. fastCRW instead gives you two primitives, /v1/scrape and /v1/crawl, and expects you to compose them.

Iterate /v1/scrape concurrently

For a known list of URLs, the pattern is a bounded worker pool that pulls from a queue and calls /v1/scrape once per URL. Each call is 1 credit regardless of which renderer is used — HTTP, Lightpanda, or Chrome. Because the engine is stateless, the pool needs no coordination beyond the queue itself:

  • Put your URLs in an in-memory or durable queue.
  • Start a fixed number of workers (the concurrency you tuned for).
  • Each worker loops: dequeue a URL, POST /v1/scrape, store the result, repeat.
  • Retry transient failures with backoff; record permanent failures and move on.

This is more code than a single batch call, but it gives you exact control over concurrency, retries, and timeouts — which is what you actually need at scale.
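The steps above can be sketched with asyncio as follows. This is a minimal illustration, not fastCRW's client: the scrape callable is injected, and fake_scrape is a hypothetical stand-in for a real POST /v1/scrape call (its response fields are invented) so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable

ScrapeFn = Callable[[str], Awaitable[dict]]


async def run_pool(urls: list[str], scrape: ScrapeFn, workers: int = 10) -> tuple[dict, dict]:
    """Scrape every URL with at most `workers` calls in flight.

    Returns (results, failures), both keyed by URL.
    """
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    results: dict = {}
    failures: dict = {}

    async def worker() -> None:
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained, this worker exits
            try:
                results[url] = await scrape(url)
            except Exception as exc:  # record the failure and move on
                failures[url] = repr(exc)

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results, failures


async def fake_scrape(url: str) -> dict:
    """Hypothetical stand-in for POST /v1/scrape; response shape is invented."""
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise RuntimeError("simulated failure")
    return {"url": url, "markdown": f"# content of {url}"}


if __name__ == "__main__":
    demo_urls = ["https://a.example/1", "https://a.example/bad", "https://b.example/2"]
    ok, failed = asyncio.run(run_pool(demo_urls, fake_scrape, workers=2))
    print(len(ok), "succeeded;", len(failed), "failed")
```

In a real job you would swap fake_scrape for a function that sends the HTTP request with your client of choice, and replace the in-memory queue with a durable one if the run must survive restarts.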

Or use /v1/crawl for whole sites

If your "many URLs" are really "every page under one domain," do not iterate scrape at all — use /v1/crawl. It runs an async breadth-first crawl and returns a job ID you poll, and it is governed by explicit maxDepth (cap 10) and maxPages (cap 1000) limits so a discovery pass can never run away. Crawl is billed 1 credit per page regardless of renderer. The rule of thumb: iterate scrape when you have a discrete URL list; crawl when you have a domain. For the crawl-first pattern, see crawl an entire website.
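As a small illustration of those limits, the helper below builds a crawl request body and clamps the two limits to the caps quoted above. The maxDepth and maxPages names come from this article; the "url" field name and the helper itself are assumptions, so check the API reference before relying on the exact body shape.

```python
MAX_DEPTH_CAP = 10    # cap stated in this article
MAX_PAGES_CAP = 1000  # cap stated in this article


def crawl_payload(start_url: str, max_depth: int, max_pages: int) -> dict:
    """Build a /v1/crawl request body, clamping limits to the stated caps.

    The "url" key is an assumed field name, used here for illustration.
    """
    if max_depth < 0 or max_pages < 1:
        raise ValueError("max_depth must be >= 0 and max_pages must be >= 1")
    return {
        "url": start_url,
        "maxDepth": min(max_depth, MAX_DEPTH_CAP),
        "maxPages": min(max_pages, MAX_PAGES_CAP),
    }


print(crawl_payload("https://example.com", max_depth=25, max_pages=5000))
```

Clamping client-side keeps your own job configuration honest about what the crawl can actually cover.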

Rate limiting: polite and self-protective

Concurrency has two ceilings. The first is external — how hard you can hit a target site without being throttled, blocked, or simply being rude. The second is internal — how many requests your own scraper handles before its latency degrades. Rate limiting addresses both.

Respecting target-site limits

Concurrency is per-target, not global. Twenty workers all hammering one domain will get rate-limited or banned long before twenty workers spread across twenty domains. Practical defaults:

  • Cap concurrency per host, not just globally. A common starting point is 2-5 simultaneous requests to any single domain.
  • Add a small inter-request delay per host (a few hundred milliseconds) so you are not bursting.
  • Treat HTTP 429 and 503 as backpressure signals, not errors to retry immediately.
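One way to apply the first two defaults is a small per-host limiter that wraps each call. The sketch below is an assumption-laden illustration: HostLimiter is a hypothetical helper, the cap of 3 and the 0.3-second delay are example values within the ranges above, and the wrapped call is whatever coroutine performs your scrape.

```python
import asyncio
import time
from urllib.parse import urlsplit


class HostLimiter:
    """Caps in-flight requests per host and spaces out request starts."""

    def __init__(self, per_host: int = 3, min_delay_s: float = 0.3) -> None:
        self._per_host = per_host
        self._min_delay_s = min_delay_s
        self._slots: dict = {}     # host -> asyncio.Semaphore
        self._gates: dict = {}     # host -> asyncio.Lock
        self._ready_at: dict = {}  # host -> earliest next start (monotonic)

    async def run(self, url: str, call):
        host = urlsplit(url).netloc.lower()
        # Create per-host primitives lazily, inside the running event loop.
        if host not in self._slots:
            self._slots[host] = asyncio.Semaphore(self._per_host)
            self._gates[host] = asyncio.Lock()
        async with self._slots[host]:
            async with self._gates[host]:
                # Space request starts on this host by at least min_delay_s.
                now = time.monotonic()
                ready_at = self._ready_at.get(host, now)
                if ready_at > now:
                    await asyncio.sleep(ready_at - now)
                self._ready_at[host] = max(ready_at, now) + self._min_delay_s
            return await call(url)
```

Inside a worker you would call `await limiter.run(url, scrape)` instead of `await scrape(url)`, so the global pool size and the per-host cap are enforced independently.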

robots.txt respected by default

fastCRW respects robots.txt by default and only lets you override it when you have the legal right to do so. That is a feature for scaled scraping, not a limitation: it stops a runaway worker pool from crawling paths a site has explicitly asked automated clients to avoid, which is exactly the behavior that gets an IP range blocked.

Backoff and retry patterns

Transient failures are normal at scale. The pattern that survives contact with real sites is exponential backoff with jitter: on a 429/503/timeout, wait base × 2^attempt plus a random jitter, retry up to a small fixed number of times, then mark the URL failed and continue. Jitter matters — without it, a wave of simultaneous failures retries in lockstep and re-creates the same spike. Cap retries (3 is reasonable) so one dead URL cannot stall a worker forever.
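A minimal sketch of that retry loop follows. TransientError is a hypothetical exception your scrape function would raise on a 429, 503, or timeout; the 0.5-second base and 30-second cap are example values, not fastCRW defaults.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for 429/503/timeout outcomes."""


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """base * 2^attempt (capped), plus random jitter of up to the same amount."""
    delay = min(cap_s, base_s * (2 ** attempt))
    return delay + random.uniform(0, delay)


async def with_retries(call, url: str, max_attempts: int = 3, base_s: float = 0.5):
    """Await call(url), retrying transient failures with backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return await call(url)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller mark the URL failed
            await asyncio.sleep(backoff_delay(attempt, base_s))
```

Only the transient class is retried here; any other exception propagates immediately, which keeps permanent failures (for example a 404) from burning attempts.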

Tuning parallelism for latency and the slow tail

The single most important number for sizing a concurrent scraper is not the average latency — it is the tail.

Why the p90 tail shapes your timeout

On the canonical 3-way scrape benchmark (diagnose_3way.py, Firecrawl's public dataset, 819 labeled URLs, 2026-05-08), fastCRW's p50 latency is 1914 ms — the fastest median of the three, beating Firecrawl's 2305 ms. In fast mode, the p90 is 4348 ms — the lowest of the three tools tested (Crawl4AI 4754 ms, Firecrawl 6937 ms). The chrome-stealth fallback that recovers the URLs the other tools miss — and earns fastCRW the highest truth-recall of the three, 63.74% of 819 labeled URLs — is the same mechanism that produces a longer tail on the hardest pages. We disclose it rather than hide it because it directly determines how you tune.

The implication for concurrency: size your per-request timeout off the p90, not the p50. If you set a 3-second timeout because the median is under 2 seconds, you will kill some of the hard, valuable pages the fallback was busy rescuing. A timeout in the 8-15 second range covers the tail comfortably in fast mode. For the full latency breakdown, see scraping latency explained.
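If you log latencies from your own runs, you can derive the timeout from your measured p90 rather than from the published figure. The sketch below uses a nearest-rank percentile; the 2x headroom factor is an assumption, and the 8 and 15 second bounds simply mirror the range suggested above.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggested_timeout_s(
    samples_ms: list[float],
    headroom: float = 2.0,   # assumed multiplier over the p90
    floor_s: float = 8.0,    # lower bound of the range suggested above
    ceiling_s: float = 15.0, # upper bound of the range suggested above
) -> float:
    """Per-request timeout sized off the measured p90, clamped to a range."""
    p90_s = percentile(samples_ms, 90) / 1000
    return min(ceiling_s, max(floor_s, headroom * p90_s))


# With a p90 equal to the benchmark's 4348 ms, 2x headroom lands inside the range.
print(suggested_timeout_s([4348.0] * 10))
```

Recompute this periodically: the tail depends on the mix of sites you scrape, so a timeout tuned on one URL list may not suit the next.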

Bounded worker pools

A long tail also means a few slow requests will hold workers. With an unbounded "fire everything" approach, a burst of tail-latency pages can pin every worker simultaneously and stall the queue. A bounded pool of N workers, each with a tail-sized timeout, keeps throughput steady: fast requests cycle workers quickly, and at most N can be stuck on slow pages at once. Start N around 10-20 and adjust based on measured throughput and error rate.
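To illustrate how the timeout frees a stuck worker, here is a compact variant that bounds in-flight calls with a semaphore and wraps each call in asyncio.wait_for. The scrape callable is injected and hypothetical; a timed-out URL is recorded as None rather than retried, which is a simplification.

```python
import asyncio


async def scrape_all(urls, scrape, workers: int = 10, timeout_s: float = 10.0) -> dict:
    """Run scrape(url) for every URL, at most `workers` at a time.

    A call that exceeds timeout_s is cancelled and recorded as None,
    so a slow page can hold a slot for at most timeout_s.
    """
    slots = asyncio.Semaphore(workers)

    async def one(url: str):
        async with slots:
            try:
                return url, await asyncio.wait_for(scrape(url), timeout_s)
            except asyncio.TimeoutError:
                return url, None  # slot is released; caller decides what to do

    return dict(await asyncio.gather(*(one(u) for u in urls)))
```

With N slots and a timeout of T seconds, the worst case is N slow pages each occupying a slot for T seconds before the pool recovers.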

Per-plan concurrency on managed cloud

On the managed fastCRW cloud, your effective concurrency ceiling is also a function of your plan's credit budget and any per-account limits — check /pricing for current tiers. Firecrawl publishes explicit per-tier concurrency limits, and if you depend on a hard, documented concurrency SLA per tier, that is a place to compare carefully. Self-hosting the AGPL-3.0 engine removes the credit meter entirely — your only concurrency ceiling becomes your own server — at $0 per 1,000 scrapes plus your infrastructure cost.

A scale-ready concurrent scrape pattern

Putting it together, here is the shape of a production fan-out that scales to thousands of URLs.

Worker pool over a URL queue

  1. Queue: load all target URLs into a queue (durable if the job must survive restarts).
  2. Bounded pool: spawn N workers (start at 10-20), each pulling one URL at a time.
  3. Per-host caps: limit simultaneous requests per domain (2-5) so concurrency is polite.
  4. Tail-sized timeout: set per-request timeouts off the p90 (8-15 s in fast mode), not the median.
  5. Backoff + jitter: retry 429/503/timeout with exponential backoff and jitter, capped at 3 attempts.
  6. Idempotent results: because each scrape is stateless, retries are safe; key results by URL so a retried call simply overwrites.

Credit budgeting at scale

Throughput and cost are the same dial. A run of 10,000 scrapes is 10,000 credits: every renderer costs the same flat 1 credit per page, so the renderer mix does not affect your total. What does move the total is extraction. Forecast how many requests will use structured JSON extraction before you launch the pool, because each formats: ["json"] request is 5 credits, and that line item dominates an extraction-heavy job. For modeling a monthly bill at this scale, see the cost of web scraping at scale and Firecrawl credits and rate limits.
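The arithmetic is simple enough to put in a helper and run before every job. This sketch assumes a JSON extraction request costs 5 credits in total (not 5 on top of the 1-credit scrape); confirm that reading against the pricing page before budgeting.

```python
SCRAPE_CREDITS = 1  # flat per page, any renderer (per this article)
JSON_CREDITS = 5    # per request using formats: ["json"] (per this article)


def estimate_credits(plain_scrapes: int, json_extractions: int = 0) -> int:
    """Forecast total credits for a job from its request counts."""
    if plain_scrapes < 0 or json_extractions < 0:
        raise ValueError("request counts must be non-negative")
    return plain_scrapes * SCRAPE_CREDITS + json_extractions * JSON_CREDITS


print(estimate_credits(10_000))        # 10,000 plain scrapes
print(estimate_credits(8_000, 2_000))  # same URL count, 20% with JSON extraction
```

In the second case, moving a fifth of the requests to JSON extraction raises the total from 10,000 to 18,000 credits, which is why the extraction share is the number to forecast.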

Where Firecrawl Cloud genuinely wins

To be candid: if you want the vendor to own the fan-out, Firecrawl's batch scrape endpoint and published per-tier concurrency are real ergonomic advantages over iterating /v1/scrape yourself. fastCRW trades that managed convenience for a stateless, self-hostable engine where you control the pool, the retries, and the timeouts directly. For most teams the worker-pool pattern above is a few dozen lines and worth the control; if it is not for you, that is a legitimate reason to use Firecrawl's batch path.

Sources

  • fastCRW canonical fact sheet — credit costs, API surface and the absence of /v1/batch/scrape, honest gaps and stateless model, self-host cost : github.com/us/crw
  • Canonical 3-way scrape benchmark — p50 1914 ms (fastest), fast-mode p90 4348 ms (lowest), truth-recall 63.74% of 819 labeled URLs (diagnose_3way.py, 2026-05-08): see /benchmarks
  • Firecrawl per-tier concurrency and batch endpoint: docs.firecrawl.dev (verified 2026-05-18)

Related: Why a stateless request model beats sessions · Crawl an entire website · Cost of web scraping at scale · Firecrawl credits and rate limits

Frequently asked questions

How many concurrent scrape requests can I run?
There is no single right number — tune it against two ceilings. Per target host, keep simultaneous requests low (2-5) to stay polite and avoid 429/503 throttling. For total worker-pool size, start around 10-20 and raise it while throughput keeps climbing and error rate stays flat. On managed fastCRW cloud your effective ceiling is also bounded by your plan's credit budget; self-hosting removes the credit meter so your only ceiling is your own server.
Is there a batch scrape endpoint for many URLs?
No. fastCRW deliberately has no /v1/batch/scrape endpoint (this is an honest gap versus Firecrawl Cloud's batch feature). For a discrete list of URLs you iterate POST /v1/scrape concurrently with your own bounded worker pool. For every page under one domain, use POST /v1/crawl instead — it runs an async breadth-first crawl and returns a job ID, governed by maxDepth (cap 10) and maxPages (cap 1000) limits.
How do I rate-limit a scraper politely?
Cap concurrency per host rather than only globally (2-5 simultaneous requests per domain is a common start), add a small inter-request delay so you are not bursting, and treat HTTP 429 and 503 as backpressure to back off from, not errors to retry instantly. fastCRW also respects robots.txt by default, which stops a runaway pool from hitting paths a site has asked automated clients to avoid.
How should the p90 tail affect my request timeout?
Size your per-request timeout off the p90, not the median. On the canonical 3-way benchmark (diagnose_3way.py, 819 labeled URLs, 2026-05-08) fastCRW's p50 is 1914 ms, the fastest of the three, and its fast-mode p90 is 4348 ms, the lowest of the three. The tail still runs well past the median because the chrome-stealth fallback that recovers hard pages takes longer on stubborn URLs. A 3-second timeout would cut into valuable pages the fallback was rescuing; an 8-15 second timeout covers the fast-mode tail comfortably.
How do I scale a scrape job over thousands of URLs?
Use a bounded worker pool over a URL queue. Load the URLs into a queue, spawn a fixed number of workers (10-20 to start), cap simultaneous requests per host, set tail-sized timeouts, and retry transient failures with exponential backoff plus jitter (capped at ~3 attempts). Because every fastCRW scrape is stateless, retries are idempotent and safe — just key results by URL. Forecast credits up front: each scrape is 1 credit for any renderer and each JSON extraction is 5.
