
Self-Hosting Hardening

Operational hardening guidance for teams running fastCRW on their own infrastructure.

Published: March 11, 2026
Updated: March 11, 2026
Category: docs
TLS and firewall posture · Non-root runtime · Renderer isolation

Minimum hardening baseline

  • Terminate TLS in front of the API.
  • Run the service as a non-root user.
  • Restrict inbound access to required ports only.
  • Isolate renderer sidecars from unnecessary network paths.

That baseline is the starting point, not the finish line. A self-hosted scraper talks to untrusted public pages and can sit close to valuable internal systems, so it deserves the same discipline as any other internet-facing API.

Network and Access Control

  • Put a reverse proxy or gateway in front of the service.
  • Restrict who can reach the API by network, identity, or both.
  • Avoid exposing internal health or admin surfaces to the public internet.
  • If browser rendering is enabled, isolate the renderer from internal systems it does not need to reach.
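
As a complement to network-level isolation (not a replacement for it), the fetch path can also refuse targets that resolve to internal address space. The following is a minimal sketch using Python's standard `ipaddress` module; the `BLOCKED_NETWORKS` list and the `is_blocked_address` helper are illustrative assumptions, not part of fastCRW.

```python
import ipaddress

# Hypothetical policy: address ranges the renderer should never be asked to reach.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",       # private
        "172.16.0.0/12",    # private
        "192.168.0.0/16",   # private
        "127.0.0.0/8",      # loopback
        "169.254.0.0/16",   # link-local, includes cloud metadata endpoints
        "::1/128",          # IPv6 loopback
        "fc00::/7",         # IPv6 unique local
        "fe80::/10",        # IPv6 link-local
    )
]


def is_blocked_address(addr: str) -> bool:
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)
```

A check like this would need to run against the address the hostname actually resolves to at connect time; validating only the hostname string leaves room for DNS rebinding.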

Runtime Isolation

Treat page fetching and browser rendering as higher-risk components than your application logic.

  • Run them with the least privilege possible.
  • Keep filesystem access narrow.
  • Isolate sidecars so that a renderer problem does not automatically become a broader platform problem.
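
One way to enforce the least-privilege point is a startup guard that refuses to run as the superuser. This is a sketch for a hypothetical Python wrapper or entrypoint script; `running_as_root` and `refuse_root` are illustrative names, and the primary control should still be the container or service manager's user setting.

```python
import os
import sys


def running_as_root(euid: int) -> bool:
    """On POSIX systems, effective UID 0 is the superuser."""
    return euid == 0


def refuse_root() -> None:
    """Exit early if the process was started with root privileges."""
    # os.geteuid is only available on POSIX platforms.
    if hasattr(os, "geteuid") and running_as_root(os.geteuid()):
        sys.exit("refusing to start as root; run under a dedicated service user")
```

Calling `refuse_root()` at the top of the entrypoint turns a misconfigured deployment into an immediate, visible failure rather than a silently over-privileged process.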

Secrets and Keys

  • Keep API keys, proxy credentials, and LLM keys out of image builds.
  • Inject secrets at runtime through your platform's secret store.
  • Rotate keys during environment changes or incident response, not only on a fixed calendar.
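
Runtime injection usually means the process reads secrets from its environment (or a mounted file) at startup and fails fast when one is missing. A minimal sketch follows; the variable names in `REQUIRED_SECRETS` are hypothetical placeholders, not fastCRW's actual configuration keys.

```python
import os
from typing import Mapping

# Hypothetical names; substitute whatever your deployment actually uses.
REQUIRED_SECRETS = ("CRW_API_KEY", "PROXY_PASSWORD", "LLM_API_KEY")


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read required secrets from the environment, failing fast if any are absent."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        # Report only the names, never the values.
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Because nothing is baked into the image, rotating a key only requires updating the secret store and restarting the service.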

Operational guidance

  • Rotate API keys during deployment cutovers.
  • Keep browser-rendering dependencies on the smallest possible surface area.
  • Expose /health only where your load balancer or monitoring needs it.
  • Review targets that produce frequent warnings separately from the rest; a high warning rate often indicates anti-bot defenses rather than renderer bugs.
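
To separate warning-heavy targets for review, results can be grouped by host and filtered by warning rate. The sketch below assumes each result is available as a `(url, warning_count)` pair; that shape, the function name, and the default thresholds are all illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def warning_heavy_hosts(results, min_requests=5, threshold=0.5):
    """Return hosts where at least `threshold` of requests carried a warning.

    results: iterable of (url, warning_count) pairs (hypothetical shape).
    Hosts with fewer than `min_requests` samples are ignored as too noisy.
    """
    totals = defaultdict(int)
    warned = defaultdict(int)
    for url, warning_count in results:
        host = urlsplit(url).hostname or "unknown"
        totals[host] += 1
        if warning_count > 0:
            warned[host] += 1
    return sorted(
        host
        for host, n in totals.items()
        if n >= min_requests and warned[host] / n >= threshold
    )
```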

Monitoring and Auditability

At minimum, watch:

  • API error rate,
  • warning frequency,
  • crawl job duration,
  • renderer availability,
  • and resource spikes on the browser sidecar.
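
Several of these signals can be derived from per-request samples before they reach a dashboard. The following sketch assumes a hypothetical `RequestSample` record and treats HTTP 5xx responses as errors; a real deployment would more likely export counters to its existing metrics system.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    status: int        # HTTP status returned by the API
    warnings: int      # number of warnings attached to the response
    duration_s: float  # wall-clock duration of the request or job


def summarize(samples):
    """Compute error rate, warning rate, and longest duration for a window."""
    n = len(samples)
    if n == 0:
        return {"error_rate": 0.0, "warning_rate": 0.0, "max_duration_s": 0.0}
    errors = sum(1 for s in samples if s.status >= 500)
    warned = sum(1 for s in samples if s.warnings > 0)
    return {
        "error_rate": errors / n,
        "warning_rate": warned / n,
        "max_duration_s": max(s.duration_s for s in samples),
    }
```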

Keep enough logs to answer three questions after an incident:

  1. what URL or workload triggered the issue,
  2. whether it was an engine problem or a target-site problem,
  3. and what data, if any, was still returned.
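
A structured log line with one field per question makes those answers searchable after the fact. This is a minimal sketch; the field names and the `incident_record` helper are assumptions for illustration, not a fastCRW log format.

```python
import json


def incident_record(url, fault_origin, returned_bytes, detail=""):
    """Build a JSON log line that answers the three post-incident questions."""
    if fault_origin not in ("engine", "target"):
        raise ValueError("fault_origin must be 'engine' or 'target'")
    return json.dumps(
        {
            "url": url,                           # 1. what triggered the issue
            "fault_origin": fault_origin,         # 2. engine or target-site problem
            "data_returned": returned_bytes > 0,  # 3. whether any data came back
            "returned_bytes": returned_bytes,
            "detail": detail,
        },
        sort_keys=True,
    )
```

If logged URLs can contain credentials or tokens in the query string, redact them before the record is written.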