Does CRW need a database or volume on Kubernetes?

No. Scrape, search, and map are stateless, so no PersistentVolume is required. You can scale replicas, drain nodes, and recycle pods freely. Only your API key Secret needs managing, ideally via Sealed Secrets or External Secrets.

How aggressively can the HPA scale CRW?

Very. The container requests ~64Mi and cold start is around 85 ms, so you can keep a low baseline and let the HorizontalPodAutoscaler add replicas quickly under bursty agent traffic. The example scales from 2 to 20 replicas at a 65% CPU utilization target.

Deploy CRW on Kubernetes (2026): Manifests, HPA, and Ingress

Why CRW Fits Kubernetes Well

CRW is a single stateless ~6MB Rust binary. That makes it close to ideal for Kubernetes: fast cold starts (~85 ms), tiny memory footprint, no sidecar database, and trivial horizontal scaling. This guide gives you production manifests — Deployment, Service, Secret, probes, HPA, and a TLS Ingress.

Prerequisites

A cluster and kubectl context configured
ingress-nginx and cert-manager installed (for TLS), and metrics-server (for the HPA)

Step 1: Namespace and API Key Secret

apiVersion: v1
kind: Namespace
metadata:
  name: crw
---
apiVersion: v1
kind: Secret
metadata:
  name: crw-secrets
  namespace: crw
type: Opaque
stringData:
  CRW_API_KEY: "fc-replace-with-a-long-random-string"
  OPENAI_API_KEY: "sk-...optional-for-extract"

Apply it, or better, manage the secret with Sealed Secrets / External Secrets so it never sits in plain git:

kubectl apply -f namespace-secret.yaml
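
If you take the External Secrets route, the Secret above can be generated rather than committed. A minimal sketch, assuming the External Secrets Operator is installed and a ClusterSecretStore already exists. The store name my-secret-store and the remote key crw/api-key are hypothetical placeholders, and the apiVersion depends on your operator release (older releases serve v1beta1, newer ones v1):

```yaml
apiVersion: external-secrets.io/v1beta1   # assumption: newer operator releases use external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: crw-secrets
  namespace: crw
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: my-secret-store        # hypothetical: your existing store
  target:
    name: crw-secrets            # the Secret the Deployment reads via envFrom
  data:
    - secretKey: CRW_API_KEY
      remoteRef:
        key: crw/api-key         # hypothetical path in your secret backend
```

With this in place, drop the Secret document from namespace-secret.yaml so only one source defines crw-secrets.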

Step 2: Deployment

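Three replicas to start, tight resource requests, and both liveness and readiness probes on /health. Once the HPA from Step 4 manages this Deployment it owns the replica count, so you may prefer to drop the replicas field; otherwise each kubectl apply resets the count to 3 until the autoscaler corrects it: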

apiVersion: apps/v1
kind: Deployment
metadata:
  name: crw
  namespace: crw
  labels: { app: crw }
spec:
  replicas: 3
  selector:
    matchLabels: { app: crw }
  template:
    metadata:
      labels: { app: crw }
    spec:
      containers:
        - name: crw
          image: ghcr.io/us/crw:latest   # pin a version tag in prod
          ports:
            - containerPort: 3000
          envFrom:
            - secretRef:
                name: crw-secrets
          resources:
            requests:
              cpu: "100m"
              memory: "64Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          readinessProbe:
            httpGet: { path: /health, port: 3000 }
            initialDelaySeconds: 3
            periodSeconds: 10
          livenessProbe:
            httpGet: { path: /health, port: 3000 }
            initialDelaySeconds: 10
            periodSeconds: 20
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true

The 64Mi request reflects CRW's low idle footprint, so the scheduler can pack many replicas onto each node.

Step 3: Service

apiVersion: v1
kind: Service
metadata:
  name: crw
  namespace: crw
spec:
  selector: { app: crw }
  ports:
    - name: http
      port: 80
      targetPort: 3000

Step 4: HorizontalPodAutoscaler

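Scale on CPU. Utilization is measured against each pod's CPU request (100m in Step 2), and the HPA reads it from metrics-server. Because cold start is ~85 ms, new replicas are ready within seconds of being scheduled, so you can run a low baseline: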

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: crw
  namespace: crw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: crw
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
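
The manifest above only tunes scale-down, so scale-up falls back to the Kubernetes defaults. For reference, here is a sketch of a behavior block that spells those defaults out next to the existing scale-down window. The scaleUp values mirror the documented upstream defaults rather than any CRW-specific tuning, so verify them against your cluster version before changing anything:

```yaml
# Fragment of the HorizontalPodAutoscaler spec from Step 4
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0    # act on a spike immediately
    selectPolicy: Max                # take whichever policy allows more pods
    policies:
      - type: Percent
        value: 100                   # at most double the replica count...
        periodSeconds: 15            # ...every 15 seconds
      - type: Pods
        value: 4                     # or add up to 4 pods every 15 seconds
        periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 120  # unchanged from the manifest above
```

Under these rules the replica count can grow at most 2 → 6 → 12 → 20, so reaching the ceiling from the baseline takes about three 15-second periods even if CPU stays far above target.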

Step 5: TLS Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: crw
  namespace: crw
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "20"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [scrape.yourdomain.com]
      secretName: crw-tls
  rules:
    - host: scrape.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: crw
                port: { number: 80 }
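
The cert-manager.io/cluster-issuer annotation assumes a ClusterIssuer named letsencrypt-prod already exists. If it does not, something like the following would create one. This is a sketch assuming cert-manager's ACME HTTP-01 solver through ingress-nginx; the email address and the private-key Secret name are placeholders to replace:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@yourdomain.com              # placeholder: expiry notices go here
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # placeholder: stores the ACME account key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx        # older cert-manager releases use `class: nginx`
```

While testing, point server at https://acme-staging-v02.api.letsencrypt.org/directory to stay clear of the production rate limits.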

Step 6: Deploy and Verify

kubectl apply -f .
kubectl -n crw rollout status deploy/crw
kubectl -n crw get pods,hpa,ingress

# Smoke test through the Ingress
curl -s -X POST https://scrape.yourdomain.com/v1/scrape \
  -H "Authorization: Bearer fc-replace-with-a-long-random-string" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}' | head

Step 7: Use It From Your App

It is a Firecrawl-compatible endpoint — point any SDK at the Ingress host:

from firecrawl import FirecrawlApp

app = FirecrawlApp(
    api_key="fc-YOUR-KEY",
    api_url="https://scrape.yourdomain.com",
)
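# Call style for firecrawl-py 1.x (params=..., dict result); newer SDK
# releases changed scrape_url, so check the version you have installed.
print(app.scrape_url("https://example.com",
                      params={"formats": ["markdown"]})["markdown"][:200])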

Step 8: Zero-Downtime Updates

# Pin and bump the tag for a controlled rollout
kubectl -n crw set image deploy/crw crw=ghcr.io/us/crw:v0.7.0
kubectl -n crw rollout status deploy/crw
# Roll back instantly if needed
kubectl -n crw rollout undo deploy/crw

PodDisruptionBudget and Graceful Rollouts

Stateless does not mean disruption-free. During a node drain or a cluster upgrade, Kubernetes will evict pods, and without a PodDisruptionBudget it can take every CRW replica down at once, dropping in-flight scrapes. Add a PDB so at least one pod always stays serving:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: crw
  namespace: crw
spec:
  minAvailable: 1
  selector:
    matchLabels: { app: crw }

Pair this with a sane rollout strategy. The default RollingUpdate is correct for CRW; set maxUnavailable: 0 and maxSurge: 1 so a new pod is healthy before an old one is removed, giving zero-downtime image bumps. Because readiness is gated on /health, traffic only reaches a pod once it can actually serve, so a slow image pull never causes a brief 502 window.
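
As a sketch, those two settings go under spec.strategy in the Deployment from Step 2, with the rest of the manifest unchanged:

```yaml
# Fragment of the Deployment spec
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove an old pod before its replacement is Ready
      maxSurge: 1         # bring up one extra pod at a time
```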

NetworkPolicy: Egress Is Required, Ingress Is Not

CRW is unusual among internal services in that it must reach the open internet — that is its whole job. Your NetworkPolicy therefore allows broad egress while tightly restricting who can call it. A common mistake is applying a default-deny egress policy cluster-wide and then wondering why every scrape times out. Be explicit:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: crw
  namespace: crw
spec:
  podSelector:
    matchLabels: { app: crw }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
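            matchLabels:
              # Set automatically on every namespace; a bare `name` label
              # only matches if you added it to the namespace yourself.
              kubernetes.io/metadata.name: ingress-nginx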
  egress:
    - {}   # allow all egress: CRW fetches arbitrary public URLs

Restricting ingress to the ingress-controller namespace means nothing inside the cluster can hit CRW directly and bypass auth and rate limiting at the edge. Egress stays open because you cannot enumerate the web in advance.

Observability on Kubernetes

The questions you will actually ask in an incident are: is CRW up, is it slow, and is it the bottleneck or is the target site? Wire three things and you can answer all three. First, a Prometheus ServiceMonitor (or a sidecar scrape) to track request rate and latency over time. Second, structured logs shipped to Loki or your stack via the node log agent — CRW's per-request logs let you separate "we are slow" from "this origin is slow." Third, an alert on the HPA pinning at maxReplicas while latency rises, which is the signal to raise the cap or investigate a pathological target rather than to keep scaling blindly. None of this is CRW-specific, which is the point: because the engine is a plain stateless HTTP service, your existing Kubernetes observability stack covers it with no special handling.
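
The third item can be sketched as a PrometheusRule. This assumes the Prometheus Operator and kube-state-metrics are running in the cluster, which the guide does not otherwise require; the rule name, the 10m window, and the severity label are illustrative choices. It covers only the "pinned at maxReplicas" half of the signal, since the latency half depends on whichever request-latency metric your stack records:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: crw-hpa            # illustrative name
  namespace: crw
spec:
  groups:
    - name: crw.autoscaling
      rules:
        - alert: CrwHpaPinnedAtMax
          # Both series are exported by kube-state-metrics
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas{namespace="crw", horizontalpodautoscaler="crw"}
              >= on (namespace, horizontalpodautoscaler)
            kube_horizontalpodautoscaler_spec_max_replicas{namespace="crw", horizontalpodautoscaler="crw"}
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: CRW HPA has been at maxReplicas for 10 minutes
```

Depending on how your Prometheus instance selects rules, the object may also need the labels its ruleSelector expects.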

Operational Notes

Stateless — scrape/search/map need no persistent volume; scale and recycle pods freely.
PodDisruptionBudget — add a PDB with minAvailable: 1 so node drains never take the service fully down.
Pin the image — never run :latest in production; a tag makes rollouts and rollbacks deterministic.
NetworkPolicy — egress is required (CRW fetches the open web); restrict ingress to the controller namespace.

Why CRW Fits the Kubernetes Cost Model

On Kubernetes, cost follows bin-packing efficiency: the more densely you can pack pods onto nodes without contention, the less you pay for the same throughput. This is where CRW's footprint is a structural advantage rather than a talking point. A pod that requests ~64Mi and a fraction of a CPU lets the scheduler place many replicas per node, so a burst of agent traffic is absorbed by adding lightweight pods rather than by adding expensive nodes. Contrast a browser-based scraper, where each pod carries a Chromium process and hundreds of megabytes of resident memory: the same traffic fits far fewer pods per node and needs a much larger, costlier cluster. The HPA configuration in this guide can scale from 2 to 20 replicas aggressively precisely because each replica is cheap to start (~85 ms) and cheap to hold, so you run a low baseline and let autoscaling handle spikes instead of over-provisioning for peak.
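
To make the density argument concrete, here is a back-of-the-envelope bound on replicas per node using the requests from Step 2. The node size (4 vCPU and 8 GiB allocatable) is an assumed example, 110 is the kubelet's default pod limit, and the estimate ignores system pods and other workloads:

```latex
\text{pods per node} \le \min\!\left(
  \left\lfloor \frac{4000\,\text{m}}{100\,\text{m}} \right\rfloor,\;
  \left\lfloor \frac{8192\,\text{Mi}}{64\,\text{Mi}} \right\rfloor,\;
  110
\right) = \min(40,\ 128,\ 110) = 40
```

On this assumed node the CPU request, not memory, is the limit that binds first.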

The operational corollary is that capacity planning becomes simple. You are not modeling memory pressure or browser pool exhaustion; the binding constraint is CPU spent fetching and cleaning pages, which the HPA tracks directly. Set a low minReplicas for resilience, a high maxReplicas for headroom, and a CPU target around 65%, and the cluster right-sizes itself against real traffic. The lean engine turns autoscaling from a delicate tuning exercise into a default that works, which is the property you want from infrastructure that sits in the hot path of every agent request.
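
For reference, the replica count the HPA aims for follows the formula given in the Kubernetes documentation. A worked case with the settings from this guide, assuming 2 replicas currently averaging 130% of their CPU request against the 65% target:

```latex
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 2 \times \frac{130}{65} \right\rceil = 4
```

Because utilization is relative to the 100m request, 130% here means roughly 130m of CPU per pod.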

Why CRW on Kubernetes

Dense packing: ~6MB binary, ~64Mi request; dozens of replicas per node.
Fast scale: ~85 ms cold start makes the HPA responsive under bursty agent traffic.
No lock-in: open-core Rust, low-latency and local-first, available self-hosted under AGPL-3.0 or as Managed Cloud, with an identical API either way.

Helm-Style Values for Multiple Environments

Hand-applied YAML is fine for one cluster but does not scale to dev, staging, and prod. Even without full Helm, parameterize the parts that differ per environment with Kustomize overlays so the base manifests stay identical and only a small patch changes:

# base/kustomization.yaml
resources:
  - namespace-secret.yaml
  - deployment.yaml
  - service.yaml
  - hpa.yaml
  - ingress.yaml
---
# overlays/prod/kustomization.yaml
resources:
  - ../../base
patches:
  - target: { kind: Deployment, name: crw }
    patch: |
      - op: replace
        path: /spec/replicas
        value: 5
      - op: replace
        path: /spec/template/spec/containers/0/image
        value: ghcr.io/us/crw:v0.7.0
  - target: { kind: HorizontalPodAutoscaler, name: crw }
    patch: |
      - op: replace
        path: /spec/maxReplicas
        value: 40

kubectl apply -k overlays/prod
kubectl apply -k overlays/staging   # same base, smaller numbers

A dev or staging overlay can run two replicas on :latest while prod runs five on a pinned tag with a higher HPA ceiling, all from one base. This keeps "what is different between environments" to a tiny, reviewable patch instead of divergent copies of the manifests that inevitably drift apart.
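
For comparison, a staging overlay might look like the following. This is a sketch: the replica count and HPA ceiling are illustrative values, and the file path simply mirrors the prod overlay above:

```yaml
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target: { kind: Deployment, name: crw }
    patch: |
      - op: replace
        path: /spec/replicas
        value: 2
  - target: { kind: HorizontalPodAutoscaler, name: crw }
    patch: |
      - op: replace
        path: /spec/maxReplicas
        value: 6
```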

Next Steps

See Self-Host CRW With Docker Compose for the simpler single-box path
Read The Single-Binary Infrastructure Advantage

Self-host CRW from GitHub for free, or use fastCRW for managed cloud scraping.