Minimum Hardening Baseline
- Terminate TLS in front of the API.
- Run the service as a non-root user.
- Restrict inbound access to required ports only.
- Isolate renderer sidecars from unnecessary network paths.
That baseline is the starting point, not the finish line. A self-hosted scraper talks to untrusted public pages and can sit close to valuable internal systems, so it deserves the same discipline as any other internet-facing API.
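The baseline above can be sketched as a deployment fragment. This is an illustrative docker-compose sketch only: the service name, image tag, UID, and port are assumptions, not the product's actual defaults.

```yaml
# Illustrative only: service name, image, UID/GID, and port are assumptions.
services:
  api:
    image: scraper-api:latest
    user: "10001:10001"        # run as a non-root user
    ports:
      - "127.0.0.1:3002:3002"  # bind to loopback; TLS terminates at the proxy in front
```

Binding to loopback keeps the API reachable only through the TLS-terminating proxy, which covers the first and third baseline items in one place.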
Network and Access Control
- Put a reverse proxy or gateway in front of the service.
- Restrict who can reach the API by network, identity, or both.
- Avoid exposing internal health or admin surfaces to the public internet.
- If browser rendering is enabled, isolate the renderer from internal systems it does not need to reach.
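One way to realize these rules is at the reverse proxy. The nginx sketch below is a hedged example, not a recommended production config: the upstream address, paths, and CIDR range are all assumptions you would replace with your own.

```nginx
# Illustrative sketch; upstream address, paths, and CIDR are assumptions.
upstream scraper_api { server 127.0.0.1:3002; }

server {
    listen 443 ssl;
    # ... ssl_certificate / ssl_certificate_key directives here ...

    location /health { allow 10.0.0.0/8; deny all;          # internal monitoring only
                       proxy_pass http://scraper_api; }
    location /admin  { deny all; }                          # never public
    location /       { proxy_pass http://scraper_api; }     # the one public surface
}
```

Identity-based restriction (mTLS or an auth gateway) layers on top of this; network allow-lists alone are the floor, not the ceiling.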
Runtime Isolation
Treat page fetching and browser rendering as higher-risk components than your application logic.
- run them with the least privilege possible,
- keep filesystem access narrow,
- and isolate sidecars so a renderer problem does not automatically become a broader platform problem.
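Those three points map naturally onto container settings. A minimal sketch, assuming a Docker Compose deployment with a renderer sidecar (the image and network names are hypothetical):

```yaml
# Illustrative sketch: drop capabilities, narrow the filesystem, and put the
# renderer on its own network so it cannot reach internal services.
services:
  renderer:
    image: renderer-sidecar:latest   # image name is an assumption
    cap_drop: [ALL]                  # least privilege
    read_only: true                  # narrow filesystem access
    tmpfs: [/tmp]                    # browsers still need scratch space
    networks: [scrape-net]           # egress-only network, no internal routes
networks:
  scrape-net:
    driver: bridge
```

With the renderer on a dedicated network, a compromised page can at worst reach the internet it was already fetching from, not your databases or internal APIs.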
Secrets and Keys
- Keep API keys, proxy credentials, and LLM keys out of image builds.
- Inject secrets at runtime through your platform's secret store.
- Rotate keys during environment changes or incident response, not only on a fixed calendar.
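A fail-fast startup check makes runtime injection enforceable: if the platform's secret store did not provide a key, the service refuses to boot rather than limping along. The variable names below are illustrative, not the product's actual configuration keys.

```python
import os

# Illustrative names; substitute the variables your deployment actually uses.
REQUIRED_SECRETS = ["SCRAPER_API_KEY", "PROXY_CREDENTIALS", "LLM_API_KEY"]


def load_secrets(env=os.environ):
    """Read secrets injected at runtime; fail fast if any are missing."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Because the values come only from the process environment, nothing ends up baked into the image and rotation is a redeploy, not a rebuild.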
Operational Guidance
- Rotate API keys during deployment cutovers.
- Keep browser-rendering dependencies on the smallest possible surface area.
- Expose /health only where your load balancer or monitoring needs it.
- Review warning-heavy targets separately; they often indicate anti-bot defenses rather than renderer bugs.
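Reviewing warning-heavy targets is easier if warnings are grouped by domain rather than read as a flat stream; a site that repeatedly trips warnings usually has anti-bot defenses. A small sketch of that triage (the threshold is an arbitrary example):

```python
from collections import Counter
from urllib.parse import urlparse


def warning_hotspots(warning_urls, threshold=3):
    """Group warnings by target domain so anti-bot-heavy sites stand out.

    Returns {domain: warning_count} for domains at or above the threshold.
    """
    counts = Counter(urlparse(url).netloc for url in warning_urls)
    return {domain: n for domain, n in counts.items() if n >= threshold}
```

Domains that surface here deserve a separate look before anyone files a renderer bug.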
Monitoring and Auditability
At minimum, watch:
- API error rate,
- warning frequency,
- crawl job duration,
- renderer availability,
- and resource spikes on the browser sidecar.
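The watch list above reduces to a threshold check over a handful of metrics. A minimal sketch, where the metric names and limits are illustrative placeholders for whatever your monitoring stack exports:

```python
def check_health(metrics, limits):
    """Return an alert string for each watched metric that exceeds its limit.

    metrics and limits are plain dicts, e.g. {"error_rate": 0.12};
    metrics without a configured limit are ignored.
    """
    return [
        f"{name}={value} exceeds limit {limits[name]}"
        for name, value in metrics.items()
        if name in limits and value > limits[name]
    ]
```

The same shape covers error rate, warning frequency, job duration, renderer availability, and sidecar memory; only the limit values differ.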
Keep enough logs to answer three questions after an incident:
- what URL or workload triggered the issue,
- whether it was an engine problem or a target-site problem,
- and what data, if any, was still returned.
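One structured log line per failed job is enough to answer all three questions. A sketch, assuming JSON logs (the field names are illustrative):

```python
import json
from datetime import datetime, timezone


def incident_record(url, error_class, partial_data_returned):
    """Emit one JSON log line answering the three post-incident questions."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "url": url,                                      # what triggered the issue
        "error_class": error_class,                      # e.g. "engine" vs "target_site"
        "partial_data_returned": partial_data_returned,  # was anything still returned?
    })
```

Keeping the error class explicit at log time is what lets you separate engine problems from target-site problems later without re-running the job.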