By the fastCRW team · Structural and benchmark facts verified 2026-05-29 against marketing/CANONICAL-FACTS.md · Verify independently.
Stateless web scraping architecture vs session-based architectures
A stateless web scraping architecture treats every request as self-contained: the server holds nothing about you between calls, so any node can serve any request and a retry is just another identical request. A session-based architecture does the opposite — it parks a live browser, a cookie jar, and per-user state on one specific machine, and every subsequent call has to find its way back to that same machine. That single design choice ripples through how you scale, how you recover from failure, and how heavy your deployment is. This post is the awareness-level case for the stateless model, using fastCRW as the worked example: it is stateless per request by design, which is exactly why it ships as a single static Rust binary rather than a multi-service stack.
What state forces on your infrastructure
The moment your scraping API keeps a session alive (a logged-in browser context, a warm cookie jar, an in-progress multi-step flow), that state lives on one specific machine. Your load balancer can no longer route freely: it has to send request N+1 to the same node that handled request N. You now need a session store (often Redis), sticky routing rules, session timeouts, eviction policies, and a plan for what happens when the node holding a session crashes mid-flow. Each live session also pins memory: a headless browser context can occupy hundreds of MB, and it stays resident for as long as the session is open, whether or not you are actively using it. State is not free; it is a standing operational tax.
What statelessness buys you
Drop the server-side session and the tax disappears. Every call to POST /v1/scrape carries everything it needs — the URL, the renderer choice, any headers or cookies you want forwarded — and the server processes it, returns the result, and forgets you. There is no session to lose, no affinity to honor, no warm context to keep resident. The trade is real and we name it plainly later: you give up persistent interactive sessions. But for the overwhelming majority of scraping and extraction workloads, that is a trade you never notice — and you get a system that is dramatically simpler to scale and operate in return.
Horizontal scaling without session affinity
Scaling is where the stateless model pays off most visibly. Because no request depends on any prior request, scaling out is the simplest pattern in distributed systems: add more identical nodes.
Any node can serve any request
With no session affinity, your load balancer can use plain round-robin or least-connections routing. Spin up three replicas or thirty — they are interchangeable. A request that lands on a cold node behaves exactly like one that lands on a warm node, because there is no "warm" in the session sense. This is the difference between scaling a CDN (trivial — every edge is identical) and scaling a stateful database (hard — replication, leader election, consistency). A stateless scraper sits firmly on the easy side of that line.
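To make the routing difference concrete, here is a toy Python model. It is not fastCRW code, and `Node`, `RoundRobinBalancer`, and `StickyBalancer` are hypothetical names. The round-robin balancer needs nothing but a cycle over interchangeable replicas. The sticky one has to keep a session-to-node table, and that table is exactly the state that would need a shared store in production.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stateless replica: its answer depends only on the request."""
    name: str
    served: List[str] = field(default_factory=list)  # bookkeeping for the demo only

    def handle(self, request: dict) -> dict:
        self.served.append(request["url"])
        return {"url": request["url"], "served_by": self.name}


class RoundRobinBalancer:
    """Stateless routing: cycle through interchangeable nodes, no session table."""

    def __init__(self, nodes: List[Node]):
        self._cycle = itertools.cycle(nodes)

    def route(self, request: dict) -> dict:
        return next(self._cycle).handle(request)


class StickyBalancer:
    """For contrast: session affinity needs a session -> node table.

    In production this table has to be shared across balancers, expired,
    and repaired when a node dies. A stateless design avoids that tax.
    """

    def __init__(self, nodes: List[Node]):
        self._next = itertools.cycle(nodes)
        self._affinity: Dict[str, Node] = {}

    def route(self, session_id: str, request: dict) -> dict:
        if session_id not in self._affinity:
            self._affinity[session_id] = next(self._next)  # pin on first sight
        return self._affinity[session_id].handle(request)
```

In this model, scaling out just means constructing the round-robin balancer with more nodes. Nothing has to migrate, because no node holds anything another node would need.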
No sticky routing or session store
Sticky sessions are a well-known source of production pain: they create hotspots when one user's traffic concentrates on one node, they complicate rolling deploys (you cannot drain a node without breaking its live sessions), and they make autoscaling lag because new nodes start empty. A stateless model needs none of this. There is no Redis to provision for session storage, no sticky cookie to configure at the load balancer, and a node can be drained and replaced the instant its in-flight requests finish — typically seconds, not the lifetime of a session.
Retry-safe and idempotent by default
Because a scrape request carries no dependency on prior state, retrying it is safe. If a call times out or a node dies, you re-issue the identical request to any node and get an equivalent result. There is no half-open session to clean up and no "did the login step already happen?" ambiguity. This idempotence lets you build robust concurrent pipelines: a worker pool over a URL queue can retry failures aggressively without corrupting anything. Size those retries against the real latency tail, though. fastCRW's canonical 3-way scrape benchmark reports a p50 of 1914 ms but a p90 of 14157 ms, the worst of the three tools tested. The cause is the chrome-stealth fallback: the same path that recovers hard URLs also produces the slow tail (diagnose_3way.py, Firecrawl public dataset, 819 labeled URLs, 2026-05-08). Set your per-request timeout against that p90, not the median.
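A minimal retry sketch in Python follows. It assumes a caller-supplied `send(node, request, timeout_s)` transport, which is hypothetical, so swap in your own HTTP client. The function re-issues the identical request, rotates across nodes, and applies a per-attempt timeout derived from the benchmark's 14157 ms p90. The 20% headroom factor is our assumption, not a figure from the benchmark.

```python
import random
import time
from typing import Callable, Sequence

# p90 reported by the 3-way scrape benchmark, in milliseconds.
P90_MS = 14157
# Assumed 20% headroom over the p90 (about 17 s); tune for your workload.
TIMEOUT_S = P90_MS / 1000 * 1.2


def scrape_with_retry(
    send: Callable[[str, dict, float], dict],
    nodes: Sequence[str],
    request: dict,
    max_attempts: int = 3,
    base_backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Re-issue the identical request, possibly to a different node, until it succeeds.

    This is safe only because the request has no dependency on server-side state.
    `send` is a hypothetical transport supplied by the caller.
    """
    last_error = None
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # any node will do; rotate for spread
        try:
            return send(node, request, TIMEOUT_S)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt + 1 < max_attempts:
                # Exponential backoff with jitter between identical retries.
                sleep(base_backoff_s * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The function never rebuilds or mutates the request between attempts. That is only correct because the request carries no dependency on state that a failed attempt might have half-created.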
A smaller operational footprint
Statelessness is not just a scaling property — it is the reason a scraper can be small. If you never have to hold sessions, coordinate workers across a queue, or persist intermediate state, you do not need the services that normally do those jobs.
Single ~8 MB binary, 1 container
fastCRW's engine is a single static Rust binary: no Redis, no Node.js, and no separate worker tier required. As a structural fact (not a benchmark claim), that means a roughly 8 MB Docker image running as 1 container (plus an optional sidecar). The default Docker Compose setup ships the lightweight Lightpanda renderer. Full Chrome is opt-in, and enabling it adds an image of roughly 500 MB and about 1 GB of resident memory. The baseline is tiny because the architecture does not demand more.
No Redis or Node sidecars required
In a session-based design, Redis is usually load-bearing: it is where sessions, queues, and rate-limit counters live. A stateless engine that processes each request in-process and returns has nowhere to put that state and no need for it. That removes an entire class of operational concerns — Redis memory pressure, eviction tuning, persistence configuration, and the failure mode where the session store goes down and takes your whole API with it.
Contrast with a 5-container stack
The reference architecture that fastCRW's API is compatible with runs as a multi-service stack. As a structural comparison, that is roughly a 5-container deployment totaling ~2–3 GB, versus fastCRW's 1 container and ~8 MB image. The gap is not an accident of optimization; it follows directly from the design choice. A system that holds sessions needs the supporting cast, and a stateless one does not. If you have ever self-hosted both, the difference is "one docker run" versus "a platform-team project," and the root cause is statelessness.
How fastCRW implements per-request execution
Each scrape is self-contained
A call to POST /v1/scrape takes a URL and options, renders the page (auto-selecting chrome → lightpanda → http, or a renderer you pin), extracts the content, and returns it. Nothing about that request is retained afterward. The same is true for /v1/map (discover a site's URLs, 1 credit) and /v1/search (query the web, 1 credit per query). Each is a pure function of its inputs from the caller's perspective.
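From the client's side, this means one request carries everything. Here is a minimal sketch using only the Python standard library that builds the call without sending it. `BASE_URL` and the `renderer` field name are assumptions for illustration, so check the API reference for the real option names.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical address of a self-hosted instance; adjust for your deployment.
BASE_URL = "http://localhost:3000"


def build_scrape_request(url: str, renderer: Optional[str] = None) -> urllib.request.Request:
    """Build a self-contained POST /v1/scrape call.

    Every input travels in the body, so the same inputs always produce the
    same request, and no earlier call needs to have happened first.
    """
    body = {"url": url}
    if renderer is not None:
        body["renderer"] = renderer  # assumed field name for pinning a renderer
    return urllib.request.Request(
        f"{BASE_URL}/v1/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the call, pass it to `urllib.request.urlopen(req, timeout=...)` and decode the JSON response. Because the server stores nothing, a failed send can simply be rebuilt and sent again.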
Cookies and headers passed in, not stored
The natural question is how you reach anything behind a login if the server keeps no session. The answer is that you pass the relevant cookies and headers in the request, per call, rather than asking the server to remember them. The state lives on your side — in your client, your secrets store — and travels with each request. This keeps requests idempotent and keeps the server stateless, at the cost of you owning credential lifecycle. For more on the auth-gated trade-offs specifically, see stateless vs stateful scraping.
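Here is a minimal sketch of what owning the credential lifecycle can look like on the client. Cookies live in a client-side `Credentials` object, a hypothetical type standing in for your secrets store, and are rendered into a header for each call. The `headers` payload field is an assumption for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


class ExpiredCredentials(Exception):
    """Raised when the client-held session has lapsed and must be refreshed."""


@dataclass(frozen=True)
class Credentials:
    """Cookies you obtained yourself and keep on your side, never on the server."""
    cookies: Dict[str, str]
    expires_at: float  # Unix timestamp; refreshing it is the client's job


def forwarded_headers(creds: Credentials, now: Optional[float] = None) -> Dict[str, str]:
    """Render the Cookie header for a single request."""
    now = time.time() if now is None else now
    if now >= creds.expires_at:
        raise ExpiredCredentials("refresh credentials client-side before scraping")
    cookie = "; ".join(f"{name}={value}" for name, value in creds.cookies.items())
    return {"Cookie": cookie}


def scrape_payload(url: str, creds: Credentials, now: Optional[float] = None) -> dict:
    """Build one self-contained request body; "headers" is an assumed field name."""
    return {"url": url, "headers": forwarded_headers(creds, now)}
```

Every call repeats the cookies, which keeps each request retry-safe. The cost is that rotating or refreshing them is your code's job, not the server's.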
Async crawl jobs: the one stateful exception
Honesty requires naming the exception. POST /v1/crawl kicks off an asynchronous breadth-first crawl and returns a job ID; you then poll GET /v1/crawl/:id for status and results. That job is genuinely stateful — it has to be, because a crawl runs longer than a single request and produces results over time. But notice how it is scoped: the statefulness is confined to a job record keyed by an opaque ID, not a per-connection session with affinity requirements. You can poll from any client, the job survives independently of your connection, and the request/response calls around it stay stateless. It is the minimum state the problem actually requires, not state leaking into every request.
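The polling side can stay stateless too. Here is a minimal sketch, assuming a caller-supplied `get_job(job_id)` that performs GET /v1/crawl/:id and returns the decoded JSON. The `status` field and its `"completed"`/`"failed"` values are assumptions, not the documented response schema. `sleep` and `clock` are injectable so the loop is easy to test.

```python
import time
from typing import Callable


def wait_for_crawl(
    job_id: str,
    get_job: Callable[[str], dict],
    poll_interval_s: float = 2.0,
    max_wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a crawl job until it reaches a terminal status or the deadline passes.

    The only state is the opaque job ID. Each poll is an independent request,
    so any client, process, or node can issue it.
    """
    deadline = clock() + max_wait_s
    while True:
        job = get_job(job_id)
        status = job.get("status")  # assumed field name
        if status in ("completed", "failed"):  # assumed terminal values
            return job
        if clock() >= deadline:
            raise TimeoutError(f"crawl {job_id} still {status!r} after {max_wait_s}s")
        sleep(poll_interval_s)
```

If the polling process dies, a new one can resume with nothing but the job ID. That is the practical difference between a keyed job record and a connection-bound session.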
The trade-off you accept
No persistent interactive session
The honest gap: fastCRW has no persistent session or interactive-session endpoint. You cannot open a browser, log in, click through three pages, and have the server hold that exact live context open across separate API calls. A tool that keeps a browser alive between calls (Firecrawl's /interact is the obvious example) genuinely wins for multi-step interactive flows — driving a wizard, maintaining an authenticated SPA session across many actions, or any workflow that depends on accumulated in-browser state. fastCRW also has no Fire-engine-style built-in anti-bot layer. If your workload centers on those, that is a real reason to choose a stateful tool, and we say so plainly.
When that matters and when it does not
For the common cases — fetch a known URL, extract structured JSON, crawl a site, run a search, build a RAG index — you never touch persistent interactive state, so the trade costs you nothing and you keep all the scaling and ops wins. For genuine multi-step interactive sessions, it is a hard limit. Most teams discover they are in the first bucket: they thought they needed sessions, but forwarding cookies per request covers their auth-gated reads, and the stateless model is simply less to operate. The decision comes down to one question: does your workload depend on accumulated server-held browser state between calls, or not?
Sources
- fastCRW canonical fact sheet (structural footprint, stateless-per-request gap, endpoint surface, credit costs): github.com/us/crw (crw-opencore/README.md)
- 3-way scrape benchmark of record (diagnose_3way.py, Firecrawl public dataset, 819 labeled URLs, 2026-05-08): bench/server-runs/RESULT_3WAY_1000_FULL.md
- Firecrawl /interact persistent-session reference: docs.firecrawl.dev (verified 2026-05-18)
Related: Stateless vs stateful scraping · Single-binary scraping infrastructure · fastCRW architecture · Low-memory scraping · Self-host fastCRW with Docker Compose
