Why CRW Fits Kubernetes Well
CRW is a single stateless ~6MB Rust binary. That makes it close to ideal for Kubernetes: fast cold starts (~85 ms), tiny memory footprint, no sidecar database, and trivial horizontal scaling. This guide gives you production manifests — Deployment, Service, Secret, probes, HPA, and a TLS Ingress.
Prerequisites
- A cluster and kubectl context configured
- ingress-nginx and cert-manager installed (for TLS), and metrics-server (for the HPA)
Step 1: Namespace and API Key Secret
apiVersion: v1
kind: Namespace
metadata:
  name: crw
---
apiVersion: v1
kind: Secret
metadata:
  name: crw-secrets
  namespace: crw
type: Opaque
stringData:
  CRW_API_KEY: "fc-replace-with-a-long-random-string"
  OPENAI_API_KEY: "sk-...optional-for-extract"
Apply it, or better, manage the secret with Sealed Secrets / External Secrets so it never sits in plain git:
kubectl apply -f namespace-secret.yaml
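The placeholder above asks for a long random string. One way to produce it is Python's standard secrets module, as in the minimal sketch below. The helper name generate_api_key is hypothetical, and the fc- prefix is only copied from the placeholder; whether CRW requires that prefix is an assumption to verify.

```python
import secrets


def generate_api_key(prefix: str = "fc-", nbytes: int = 32) -> str:
    """Return prefix + a URL-safe token built from nbytes of OS-level randomness.

    The "fc-" default mirrors the placeholder in the Secret manifest; it is an
    assumption, not a documented CRW requirement.
    """
    return prefix + secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed value into CRW_API_KEY (or into your secret manager).
    print(generate_api_key())
```

Generating the key outside the manifest keeps the real value out of git even if the YAML template is committed.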
Step 2: Deployment
Three replicas, tight resource requests, and both liveness and readiness probes on /health:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crw
  namespace: crw
  labels: { app: crw }
spec:
  replicas: 3
  selector:
    matchLabels: { app: crw }
  template:
    metadata:
      labels: { app: crw }
    spec:
      containers:
        - name: crw
          image: ghcr.io/us/crw:latest # pin a version tag in prod
          ports:
            - containerPort: 3000
          envFrom:
            - secretRef:
                name: crw-secrets
          resources:
            requests:
              cpu: "100m"
              memory: "64Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          readinessProbe:
            httpGet: { path: /health, port: 3000 }
            initialDelaySeconds: 3
            periodSeconds: 10
          livenessProbe:
            httpGet: { path: /health, port: 3000 }
            initialDelaySeconds: 10
            periodSeconds: 20
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
The 64Mi request reflects reality — CRW has a low idle footprint. You can pack many replicas per node.
Step 3: Service
apiVersion: v1
kind: Service
metadata:
  name: crw
  namespace: crw
spec:
  selector: { app: crw }
  ports:
    - name: http
      port: 80
      targetPort: 3000
Step 4: HorizontalPodAutoscaler
Scale on CPU. Because cold start is ~85 ms, scale-up is responsive and you can run a low baseline:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: crw
  namespace: crw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: crw
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
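To see what the 65% target means in practice, recall that Utilization is measured against each pod's CPU request (100m here), and that the HPA computes desired replicas as ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. The sketch below reproduces that arithmetic. It assumes the controller's default 10% tolerance and ignores stabilization windows and not-yet-ready pods, so treat it as an approximation rather than the controller's exact behavior.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,   # average CPU, as % of the pod CPU request
    target_utilization: float = 65.0,
    min_replicas: int = 2,
    max_replicas: int = 20,
    tolerance: float = 0.1,       # assumed default HPA tolerance
) -> int:
    """Approximate the HPA replica calculation for a single CPU metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas            # close enough: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 130% of their 100m request: the HPA asks for 8.
    print(desired_replicas(4, 130))
    # 15 pods at 130%: the raw answer (30) is capped at maxReplicas.
    print(desired_replicas(15, 130))
```

Because the ratio is taken against the request rather than the limit, a tight 100m request makes the HPA react early, long before a pod approaches its 1000m limit.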
Step 5: TLS Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: crw
  namespace: crw
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "20"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [scrape.yourdomain.com]
      secretName: crw-tls
  rules:
    - host: scrape.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: crw
                port: { number: 80 }
Step 6: Deploy and Verify
kubectl apply -f .
kubectl -n crw rollout status deploy/crw
kubectl -n crw get pods,hpa,ingress
# Smoke test through the Ingress
curl -s -X POST https://scrape.yourdomain.com/v1/scrape \
  -H "Authorization: Bearer fc-replace-with-a-long-random-string" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}' | head
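If you would rather run the smoke test from a script or a CI job, the same request can be built with Python's standard library. This is a minimal sketch that mirrors the curl call above (same /v1/scrape path, headers, and JSON body). The function names are hypothetical, and the response handling assumes only that a healthy endpoint answers with a 2xx status.

```python
import json
import urllib.request


def build_scrape_request(base_url: str, api_key: str, target_url: str) -> urllib.request.Request:
    """Build the POST /v1/scrape request used in the curl smoke test."""
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def smoke_test(base_url: str, api_key: str,
               target_url: str = "https://example.com",
               timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status.

    urlopen raises urllib.error.HTTPError on 4xx/5xx, so a normal return
    already means the endpoint accepted the request.
    """
    request = build_scrape_request(base_url, api_key, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder host and key from this guide; replace with your own.
    print(smoke_test("https://scrape.yourdomain.com",
                     "fc-replace-with-a-long-random-string"))
```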
Step 7: Use It From Your App
It is a Firecrawl-compatible endpoint — point any SDK at the Ingress host:
from firecrawl import FirecrawlApp

app = FirecrawlApp(
    api_key="fc-YOUR-KEY",
    api_url="https://scrape.yourdomain.com",
)
print(app.scrape_url("https://example.com",
                     params={"formats": ["markdown"]})["markdown"][:200])
Step 8: Zero-Downtime Updates
# Pin and bump the tag for a controlled rollout
kubectl -n crw set image deploy/crw crw=ghcr.io/us/crw:v0.7.0
kubectl -n crw rollout status deploy/crw
# Roll back instantly if needed
kubectl -n crw rollout undo deploy/crw
PodDisruptionBudget and Graceful Rollouts
Stateless does not mean disruption-free. During a node drain or a cluster upgrade, Kubernetes evicts pods, and without a PodDisruptionBudget a drain can evict every CRW replica at once, dropping in-flight scrapes. A PDB limits these voluntary evictions (it does not protect against node crashes), so add one to keep at least one pod serving:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: crw
  namespace: crw
spec:
  minAvailable: 1
  selector:
    matchLabels: { app: crw }
Pair this with a sane rollout strategy. The default RollingUpdate is correct for CRW; set maxUnavailable: 0 and maxSurge: 1 under spec.strategy.rollingUpdate so a new pod is healthy before an old one is removed, giving zero-downtime image bumps. Because readiness is gated on /health, traffic only reaches a pod once it can actually serve, so a slow image pull never causes a brief 502 window.
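The guarantees these two settings give are simple arithmetic, sketched below in Python. The function names are hypothetical. The sketch assumes absolute counts (not percentages) for maxSurge and maxUnavailable, and a PDB expressed as minAvailable.

```python
def rollout_bounds(replicas: int, max_surge: int = 1, max_unavailable: int = 0) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a RollingUpdate."""
    return replicas - max_unavailable, replicas + max_surge


def pdb_allowed_disruptions(healthy_pods: int, min_available: int = 1) -> int:
    """How many pods a drain may evict right now without violating the PDB."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    # 3 replicas, maxSurge 1, maxUnavailable 0:
    # never fewer than 3 available pods, never more than 4 pods in total.
    print(rollout_bounds(3))
    # With 3 healthy pods and minAvailable 1, a drain can evict 2 at a time.
    print(pdb_allowed_disruptions(3))
    # With a single healthy pod the drain has to wait.
    print(pdb_allowed_disruptions(1))
```

Note that minAvailable: 1 still lets a drain remove all but one replica, so raise it if one pod cannot carry your baseline traffic.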
NetworkPolicy: Egress Is Required, Ingress Is Not
CRW is unusual among internal services in that it must reach the open internet — that is its whole job. Your NetworkPolicy therefore allows broad egress while tightly restricting who can call it. A common mistake is applying a default-deny egress policy cluster-wide and then wondering why every scrape times out. Be explicit:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: crw
  namespace: crw
spec:
  podSelector:
    matchLabels: { app: crw }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx # set automatically on every namespace
  egress:
    - {} # allow all egress: CRW fetches arbitrary public URLs
Restricting ingress to the ingress-controller namespace means nothing inside the cluster can hit CRW directly and bypass auth and rate limiting at the edge. Egress stays open because you cannot enumerate the web in advance.
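A namespaceSelector with matchLabels admits traffic only from namespaces that carry every listed label, so a selector on a label the ingress controller's namespace does not actually have silently blocks all callers. The Python sketch below models that matching rule. The label sets in it are illustrative assumptions, so check the real ones with kubectl get namespace ingress-nginx --show-labels.

```python
def match_labels(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True when every key/value pair in the selector is present in labels.

    An empty selector matches everything, as it does in Kubernetes.
    """
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(source_namespace_labels: dict[str, str],
                    allowed_selectors: list[dict[str, str]]) -> bool:
    """Traffic is admitted if the source namespace matches any 'from' selector."""
    return any(match_labels(sel, source_namespace_labels) for sel in allowed_selectors)


if __name__ == "__main__":
    # Assumed labels: Kubernetes adds kubernetes.io/metadata.name to every
    # namespace; a bare "name" label exists only if someone added it.
    controller_ns = {"kubernetes.io/metadata.name": "ingress-nginx"}
    other_ns = {"kubernetes.io/metadata.name": "default"}

    by_metadata_name = [{"kubernetes.io/metadata.name": "ingress-nginx"}]
    by_custom_label = [{"name": "ingress-nginx"}]

    print(ingress_allowed(controller_ns, by_metadata_name))  # controller admitted
    print(ingress_allowed(other_ns, by_metadata_name))       # everyone else blocked
    print(ingress_allowed(controller_ns, by_custom_label))   # blocked: label missing
```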
Observability on Kubernetes
The questions you will actually ask in an incident are: is CRW up, is it slow, and is it the bottleneck or is the target site? Wire three things and you can answer all three. First, a Prometheus ServiceMonitor (or a sidecar scrape) to track request rate and latency over time. Second, structured logs shipped to Loki or your stack via the node log agent — CRW's per-request logs let you separate "we are slow" from "this origin is slow." Third, an alert on the HPA pinning at maxReplicas while latency rises, which is the signal to raise the cap or investigate a pathological target rather than to keep scaling blindly. None of this is CRW-specific, which is the point: because the engine is a plain stateless HTTP service, your existing Kubernetes observability stack covers it with no special handling.
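The third signal combines two conditions, and the logic is easy to get backwards. The sketch below states it in Python as a plain predicate. In practice you would express it as an alert rule in your monitoring system, and the 1.5× latency factor and the function name are hypothetical choices, not CRW defaults.

```python
def hpa_saturation_alert(
    current_replicas: int,
    max_replicas: int,
    p95_latency_ms: float,
    baseline_p95_ms: float,
    latency_factor: float = 1.5,   # assumed threshold: 50% above baseline
) -> bool:
    """Fire only when the HPA is pinned at its ceiling AND latency is degraded.

    Pinned-but-fast means the cap is merely being used; slow-but-not-pinned
    means the HPA still has room to scale. Neither alone should page anyone.
    """
    pinned = current_replicas >= max_replicas
    latency_degraded = p95_latency_ms > baseline_p95_ms * latency_factor
    return pinned and latency_degraded


if __name__ == "__main__":
    print(hpa_saturation_alert(20, 20, p95_latency_ms=900, baseline_p95_ms=400))
    print(hpa_saturation_alert(12, 20, p95_latency_ms=900, baseline_p95_ms=400))
```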
Operational Notes
- Stateless: scrape/search/map need no persistent volume; scale and recycle pods freely.
- PodDisruptionBudget: add a PDB with minAvailable: 1 so node drains never take the service fully down.
- Pin the image: never run :latest in production; a tag makes rollouts and rollbacks deterministic.
- NetworkPolicy: egress is required (CRW fetches the open web); restrict ingress to the controller namespace.
Why CRW Fits the Kubernetes Cost Model
Kubernetes bills you in bin-packing efficiency: the denser you can pack pods onto nodes without contention, the less you pay for the same throughput. This is where CRW's footprint is a structural advantage rather than a talking point. A pod that requests ~64Mi and a fraction of a CPU lets the scheduler place many replicas per node, so a burst of agent traffic is absorbed by adding lightweight pods rather than scaling out expensive nodes. Contrast a browser-based scraper, where each pod drags a Chromium process and hundreds of megabytes of resident memory — the same traffic forces far fewer pods per node and a much larger, costlier cluster. The HPA configuration in this guide can scale 2→20 aggressively precisely because each replica is cheap to start (~85 ms) and cheap to hold, so you run a low baseline and let autoscaling handle spikes instead of over-provisioning for peak.
The operational corollary is that capacity planning becomes simple. You are not modeling memory pressure or browser pool exhaustion; the binding constraint is CPU spent fetching and cleaning pages, which the HPA tracks directly. Set a low minReplicas for resilience, a high maxReplicas for headroom, and a CPU target around 65%, and the cluster right-sizes itself against real traffic. The lean engine turns autoscaling from a delicate tuning exercise into a default that works, which is the property you want from infrastructure that sits in the hot path of every agent request.
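The bin-packing argument can be made concrete with a little arithmetic. The sketch below counts how many pods the scheduler can place on a node based on resource requests, then how many nodes a given replica count needs. The node size (4 CPU, 8Gi allocatable) and the 512Mi browser-pod request are illustrative assumptions; only the 100m / 64Mi CRW request comes from the Deployment in this guide.

```python
import math


def pods_per_node(node_cpu_m: int, node_mem_mi: int,
                  pod_cpu_m: int, pod_mem_mi: int,
                  max_pods: int = 110) -> int:
    """Pods that fit on one node by requests, capped by the kubelet pod limit
    (110 is the default)."""
    return min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi, max_pods)


def nodes_needed(pods: int, per_node: int) -> int:
    """Nodes required to schedule the given number of pods."""
    return math.ceil(pods / per_node)


if __name__ == "__main__":
    node_cpu_m, node_mem_mi = 4000, 8192          # assumed allocatable capacity

    crw = pods_per_node(node_cpu_m, node_mem_mi, pod_cpu_m=100, pod_mem_mi=64)
    # Hypothetical browser-based scraper: same CPU request, 512Mi of memory.
    browser = pods_per_node(node_cpu_m, node_mem_mi, pod_cpu_m=100, pod_mem_mi=512)

    print(crw, nodes_needed(20, crw))          # 40 pods per node, 1 node for 20 pods
    print(browser, nodes_needed(20, browser))  # 16 pods per node, 2 nodes for 20 pods
```

With these assumed numbers the CRW pods are CPU-bound at 40 per node, while the heavier pods run out of memory at 16, which is the difference the HPA ceiling of 20 turns into one node versus two.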
Why CRW on Kubernetes
- Dense packing: ~6MB binary, ~64Mi request; dozens of replicas per node.
- Fast scale: ~85 ms cold start makes HPA responsive under bursty agent traffic.
- No lock-in: open-core Rust that is lower-latency and local-first, available self-hosted under AGPL-3.0 or as Managed Cloud, with an identical API either way.
Helm-Style Values for Multiple Environments
Hand-applied YAML is fine for one cluster but does not scale to dev, staging, and prod. Even without full Helm, parameterize the parts that differ per environment with Kustomize overlays so the base manifests stay identical and only a small patch changes:
# base/kustomization.yaml
resources:
  - namespace-secret.yaml
  - deployment.yaml
  - service.yaml
  - hpa.yaml
  - ingress.yaml
---
# overlays/prod/kustomization.yaml
resources:
  - ../../base
patches:
  - target: { kind: Deployment, name: crw }
    patch: |
      - op: replace
        path: /spec/replicas
        value: 5
      - op: replace
        path: /spec/template/spec/containers/0/image
        value: ghcr.io/us/crw:v0.7.0
  - target: { kind: HorizontalPodAutoscaler, name: crw }
    patch: |
      - op: replace
        path: /spec/maxReplicas
        value: 40
kubectl apply -k overlays/prod
kubectl apply -k overlays/staging # same base, smaller numbers
A dev overlay can run two replicas on :latest while prod runs five on a pinned tag with a higher HPA ceiling, all from one base. This keeps "what is different between environments" to a tiny, reviewable patch instead of divergent copies of the manifests that inevitably drift apart.
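To make the overlay mechanics less abstract, here is a small Python sketch of what a JSON Patch replace operation does to a manifest. It is an illustration of the patch semantics only, not how Kustomize is implemented. The helper apply_replace is hypothetical and the manifest is a trimmed stand-in for the Deployment above.

```python
import copy


def _unescape(token: str) -> str:
    """Decode JSON Pointer escapes (~1 is '/', ~0 is '~')."""
    return token.replace("~1", "/").replace("~0", "~")


def apply_replace(doc: dict, path: str, value):
    """Return a copy of doc with the value at JSON Pointer `path` replaced.

    Like a JSON Patch 'replace', it fails if the target does not exist.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path}")
    result = copy.deepcopy(doc)
    tokens = [_unescape(t) for t in path.split("/")[1:]]
    node = result
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    last = tokens[-1]
    if isinstance(node, list):
        node[int(last)] = value          # IndexError if the index is missing
    elif last in node:
        node[last] = value
    else:
        raise KeyError(f"replace target does not exist: {path}")
    return result


if __name__ == "__main__":
    base = {"spec": {"replicas": 3, "template": {"spec": {"containers": [
        {"name": "crw", "image": "ghcr.io/us/crw:latest"}]}}}}

    prod = apply_replace(base, "/spec/replicas", 5)
    prod = apply_replace(prod, "/spec/template/spec/containers/0/image",
                         "ghcr.io/us/crw:v0.7.0")

    print(prod["spec"]["replicas"])   # the overlay value
    print(base["spec"]["replicas"])   # the base is left untouched
```

The base staying untouched is the property that matters: every environment is the same manifest plus a short list of replacements.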
Next Steps
- See Self-Host CRW With Docker Compose for the simpler single-box path
- Read The Single-Binary Infrastructure Advantage
Self-host CRW from GitHub for free, or use fastCRW for managed cloud scraping.
