Why is a stateless scraping architecture easier to scale?

Because no request depends on any prior request, you scale by adding identical, interchangeable nodes. The load balancer can route freely with plain round-robin — there is no warm context to preserve, so a cold node behaves exactly like a warm one. This is the same reason scaling a CDN is trivial while scaling a stateful database is hard. fastCRW is stateless per request, so horizontal scaling is just more replicas.

Does a stateless model need session affinity or sticky routing?

No. Session affinity and sticky routing exist only to send follow-up requests back to the node holding your live session. A stateless model has no live session, so it needs neither — no sticky cookies at the load balancer, no session store to provision. You can drain and replace a node the instant its in-flight requests finish.

How can a scraper run as a single binary with no Redis?

Redis in a scraping stack usually stores sessions, queues, and rate-limit counters. A stateless engine processes each request in-process and returns, so it has nothing to persist between calls and no need for Redis. fastCRW ships as a single static Rust binary — structurally a ~8 MB image running as 1 container — versus the roughly 5-container, 2–3 GB stack of the session-based reference architecture it is API-compatible with.

Are scrape requests retry-safe in a stateless model?

Yes. A scrape request carries no dependency on prior state, so re-issuing an identical request to any node yields an equivalent result with no half-open session to clean up. Size your retry timeout against the real latency tail, not the median: fastCRW's canonical benchmark reports a p50 of 1914 ms (the fastest of the three tools tested) and, in fast mode, a p90 of 4348 ms (the lowest of the three); source: diagnose_3way.py, 819 labeled URLs, 2026-05-08. The full distribution is at /benchmarks.

What is the downside of a stateless request model?

You give up persistent interactive sessions. fastCRW cannot keep a logged-in browser context alive across separate API calls — that's a deliberate scope choice, not a gap in anti-bot or rendering capability, both of which run per request via the built-in escalation ladder and proxy rotation. For multi-step interactive flows, you forward cookies and headers per call, the same way you would for any auth-gated read. The one stateful exception in fastCRW is the async crawl job, which is keyed by an opaque job ID rather than a per-connection session.

Why a Stateless Request Model Beats Sessions

By the fastCRW team · Structural and benchmark facts verified 2026-05-29 · Verify independently.

Stateless web scraping architecture vs session-based architectures

A stateless web scraping architecture treats every request as self-contained: the server holds nothing about you between calls, so any node can serve any request and a retry is just another identical request. A session-based architecture does the opposite: it parks a live browser, a cookie jar, and per-user state on one specific machine, and every subsequent call has to find its way back to that same machine. That single design choice ripples through how you scale, how you recover from failure, and how heavy your deployment is. This post makes the introductory case for the stateless model, using fastCRW as the worked example: it is stateless per request by design, which is exactly why it ships as a single static Rust binary rather than a multi-service stack.

What state forces on your infrastructure

The moment your scraping API keeps a session alive (a logged-in browser context, a warm cookie jar, an in-progress multi-step flow), that state lives somewhere specific. Now your load balancer cannot route freely; it has to send request N+1 to the same node that handled request N. You need a session store (often Redis), sticky routing rules, session timeouts, eviction policies, and a plan for what happens when the node holding a session crashes mid-flow. Each live session also pins memory: a headless browser context can occupy hundreds of MB, and it stays resident for as long as the session is open, whether or not you are actively using it. State is not free; it is a standing operational tax.

What statelessness buys you

Drop the server-side session and the tax disappears. Every call to POST /v1/scrape carries everything it needs — the URL, the renderer choice, any headers or cookies you want forwarded — and the server processes it, returns the result, and forgets you. There is no session to lose, no affinity to honor, no warm context to keep resident. The trade is real and we name it plainly later: you give up persistent interactive sessions. But for the overwhelming majority of scraping and extraction workloads, that is a trade you never notice — and you get a system that is dramatically simpler to scale and operate in return.

Horizontal scaling without session affinity

Scaling is where the stateless model pays off most visibly. Because no request depends on any prior request, scaling out is the simplest pattern in distributed systems: add more identical nodes.

Any node can serve any request

With no session affinity, your load balancer can use plain round-robin or least-connections routing. Spin up three replicas or thirty — they are interchangeable. A request that lands on a cold node behaves exactly like one that lands on a warm node, because there is no "warm" in the session sense. This is the difference between scaling a CDN (trivial — every edge is identical) and scaling a stateful database (hard — replication, leader election, consistency). A stateless scraper sits firmly on the easy side of that line.
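As a rough illustration of why routing gets simple, here is a minimal round-robin picker. The node names are hypothetical and this is not fastCRW's or any load balancer's actual implementation; it only shows that, with no affinity to honor, choosing a node needs no knowledge of who handled the previous request.

```python
from itertools import cycle
from typing import Callable, Iterable


def make_round_robin(nodes: Iterable[str]) -> Callable[[], str]:
    """Return a picker that hands out nodes in strict rotation."""
    pool = list(nodes)
    if not pool:
        raise ValueError("need at least one node")
    rotation = cycle(pool)
    return lambda: next(rotation)


# Hypothetical replica names; any request may land on any of them.
pick = make_round_robin(["node-a", "node-b", "node-c"])
assignments = [pick() for _ in range(6)]
```

Adding a replica means adding one entry to the list; nothing about existing traffic has to move with it.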

No sticky routing or session store

Sticky sessions are a well-known source of production pain: they create hotspots when one user's traffic concentrates on one node, they complicate rolling deploys (you cannot drain a node without breaking its live sessions), and they make autoscaling lag because new nodes start empty. A stateless model needs none of this. There is no Redis to provision for session storage, no sticky cookie to configure at the load balancer, and a node can be drained and replaced the instant its in-flight requests finish — typically seconds, not the lifetime of a session.
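The drain behavior described above can be sketched as a small in-flight counter. The class and method names here are hypothetical, and a real deployment would normally rely on its orchestrator's own drain mechanism; the sketch only shows that "stop admitting, wait for in-flight work to reach zero" is the whole procedure when there are no sessions to migrate.

```python
import threading


class DrainableNode:
    """Tracks in-flight requests so the node can be removed once they finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def try_begin(self) -> bool:
        """Admit a request unless the node is draining."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def finish(self) -> None:
        """Mark one request as done and wake any waiting drain call."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout_s: float) -> bool:
        """Stop admitting work, then wait for in-flight requests to finish.

        Returns True if the node emptied within the timeout, else False.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout_s
            )
```

Because no request depends on this node in particular, anything rejected by `try_begin` during the drain can simply be sent to another replica.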

Retry-safe and idempotent by default

Because a scrape request carries no dependency on prior state, retrying it is safe. If a call times out or a node dies, you re-issue the identical request, to any node, and get an equivalent result. There is no half-open session to clean up, no "did the login step already happen?" ambiguity. This idempotence is what lets you build robust concurrent pipelines: a worker pool over a URL queue can retry failures aggressively without corrupting anything. Sizing those retries against the real latency tail matters: fastCRW's canonical 3-way scrape benchmark reports a p50 of 1914 ms and, in fast mode, a p90 of 4348 ms, the lowest of the three tools tested (diagnose_3way.py, Firecrawl public dataset, 819 labeled URLs, 2026-05-08). See /benchmarks for the full distribution.
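A minimal sketch of such a retry loop follows. It assumes a caller-supplied `send(node, url, timeout_s)` function (hypothetical, standing in for whatever HTTP client you use) and sizes the timeout from the fast-mode p90 quoted above; the 1.5x headroom factor is an arbitrary assumption, not a fastCRW recommendation.

```python
import random
from typing import Callable, Optional, Sequence

P90_MS = 4348  # fast-mode p90 from the benchmark cited in the text
TIMEOUT_S = 1.5 * P90_MS / 1000  # 1.5x headroom is an assumed safety margin


def scrape_with_retry(
    url: str,
    nodes: Sequence[str],
    send: Callable[[str, str, float], str],
    max_attempts: int = 3,
    rng: Optional[random.Random] = None,
) -> str:
    """Re-issue the identical request, to any node, until one attempt succeeds."""
    rng = rng or random.Random()
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        node = rng.choice(nodes)  # no affinity: every node is a valid target
        try:
            return send(node, url, TIMEOUT_S)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc  # nothing to clean up; just try again
    raise RuntimeError(f"{max_attempts} attempts failed for {url}") from last_error
```

In a real pipeline you would likely add backoff between attempts; it is left out here to keep the sketch focused on the fact that a retry is just the same request again.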

A smaller operational footprint

Statelessness is not just a scaling property — it is the reason a scraper can be small. If you never have to hold sessions, coordinate workers across a queue, or persist intermediate state, you do not need the services that normally do those jobs.

Single ~8 MB binary, 1 container

fastCRW's engine is a single static Rust binary: no Redis, no Node.js, no separate worker tier required. As a structural fact (not a benchmark claim), that is a roughly 8 MB Docker image running as 1 container (plus an optional sidecar). The default Docker Compose setup ships the lightweight Lightpanda renderer; full Chrome is opt-in and, when enabled, adds an image of roughly 500 MB and about 1 GB of resident memory. The point is that the baseline is tiny because the architecture does not demand more.

No Redis or Node sidecars required

In a session-based design, Redis is usually load-bearing: it is where sessions, queues, and rate-limit counters live. A stateless engine that processes each request in-process and returns has nowhere to put that state and no need for it. That removes an entire class of operational concerns — Redis memory pressure, eviction tuning, persistence configuration, and the failure mode where the session store goes down and takes your whole API with it.

Contrast with a 5-container stack

The reference architecture fastCRW is API-compatible with runs as a multi-service stack — as a structural comparison, roughly a 5-container deployment totaling ~2–3 GB versus fastCRW's 1 container and ~8 MB image. That gap is not an accident of optimization; it is a consequence of the design choice. A system that holds sessions needs the supporting cast. A stateless one does not. If you have ever self-hosted both, the difference is "one docker run" versus "a platform-team project," and the root cause is statelessness.

How fastCRW implements per-request execution

Each scrape is self-contained

A call to POST /v1/scrape takes a URL and options, renders the page (auto-selecting chrome → lightpanda → http, or a renderer you pin), extracts the content, and returns it. Nothing about that request is retained afterward. The same is true for /v1/map (discover a site's URLs, 1 credit) and /v1/search (query the web, 1 credit per query). Each is a pure function of its inputs from the caller's perspective.
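To make "a pure function of its inputs" concrete, here is a sketch that builds (but does not send) a scrape request. Only the POST /v1/scrape path comes from the text; the base URL and the JSON field names `url` and `renderer` are assumptions for illustration, so check the actual API reference before relying on them.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:3000"  # hypothetical host and port


def build_scrape_request(url: str, renderer: Optional[str] = None) -> Request:
    """Build a self-contained scrape request; nothing is read from prior calls."""
    payload = {"url": url}  # assumed field name
    if renderer is not None:
        payload["renderer"] = renderer  # assumed field name for pinning a renderer
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return Request(
        f"{BASE_URL}/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The same inputs always produce byte-identical request bodies.
first = build_scrape_request("https://example.com", renderer="lightpanda")
second = build_scrape_request("https://example.com", renderer="lightpanda")
```

Because the request is fully described by its arguments, it can be logged, queued, or replayed against any replica without extra context.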

Cookies and headers passed in, not stored

The natural question is how you reach anything behind a login if the server keeps no session. The answer is that you pass the relevant cookies and headers in the request, per call, rather than asking the server to remember them. The state lives on your side — in your client, your secrets store — and travels with each request. This keeps requests idempotent and keeps the server stateless, at the cost of you owning credential lifecycle. For more on the auth-gated trade-offs specifically, see stateless vs stateful scraping.
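A minimal sketch of the cookie-forwarding pattern, assuming the request body accepts a `headers` object (an assumed field name, not confirmed by this post). The cookie values live in the client and are serialized into every call; the helper names are hypothetical.

```python
import json
from typing import Mapping, Optional


def cookie_header(cookies: Mapping[str, str]) -> str:
    """Serialize client-held cookies into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def scrape_payload(
    url: str,
    cookies: Mapping[str, str],
    extra_headers: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a JSON body that carries the caller's auth state with the request."""
    headers = dict(extra_headers or {})
    if cookies:
        headers["Cookie"] = cookie_header(cookies)
    # "headers" is an assumed field name for forwarded request headers.
    return json.dumps({"url": url, "headers": headers})
```

In practice the cookie values would come from your secrets store rather than literals, and refreshing them when they expire is the client's job, which is the credential-lifecycle cost mentioned above.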

Async crawl jobs: the one stateful exception

Honesty requires naming the exception. POST /v1/crawl kicks off an asynchronous breadth-first crawl and returns a job ID; you then poll GET /v1/crawl/:id for status and results. That job is genuinely stateful — it has to be, because a crawl runs longer than a single request and produces results over time. But notice how it is scoped: the statefulness is confined to a job record keyed by an opaque ID, not a per-connection session with affinity requirements. You can poll from any client, the job survives independently of your connection, and the request/response calls around it stay stateless. It is the minimum state the problem actually requires, not state leaking into every request.
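The polling side of that flow might look like the sketch below. The GET /v1/crawl/:id path is from the text; the `status` field and its `completed` / `failed` values are assumptions, and `get_json` is a hypothetical caller-supplied function that performs the HTTP GET and returns the decoded body.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    job_id: str,
    get_json: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a crawl job by its opaque ID until it reaches a terminal state."""
    path = f"/v1/crawl/{job_id}"
    for _ in range(max_polls):
        body = get_json(path)  # each poll is itself a stateless request
        if body.get("status") in ("completed", "failed"):  # assumed values
            return body
        sleep(interval_s)
    raise TimeoutError(f"crawl {job_id} not finished after {max_polls} polls")
```

The only thing the client has to remember is the job ID, so a different process or machine can take over polling at any point.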

The trade-off by design

No persistent interactive session, on purpose

fastCRW does not hold a persistent session or expose an interactive-session endpoint: you cannot open a browser, log in, click through three pages, and have the server hold that exact live context open across separate API calls. That is a deliberate scope choice, not an oversight — every request stays a pure function of its inputs, which is what keeps the server stateless and horizontally scalable. Anti-bot handling (12-signal block detection, user-agent rotation, stealth fingerprints, and proxy rotation with a residential-proxy egress tier) runs per request as part of the render escalation ladder, independent of any session.

When that matters and when it does not

For the common cases — fetch a known URL, extract structured JSON, crawl a site, run a search, build a RAG index — you never touch persistent interactive state, so the trade costs you nothing and you keep all the scaling and ops wins. Most teams discover they are in this bucket: they thought they needed sessions, but forwarding cookies per request covers their auth-gated reads, and the stateless model is simply less to operate. For genuine multi-step interactive flows — driving a wizard, maintaining an authenticated SPA session across many actions — that state lives on your side and travels with each request, same as the cookie-forwarding pattern above.

Sources

fastCRW canonical fact sheet (structural footprint, stateless-per-request gap, endpoint surface, credit costs): github.com/us/crw — crw-opencore/README.md
3-way scrape benchmark of record (diagnose_3way.py, Firecrawl public dataset, 819 labeled URLs, 2026-05-08): bench/server-runs/RESULT_3WAY_1000_FULL.md
Firecrawl /interact persistent-session reference: docs.firecrawl.dev (verified 2026-05-18)