Self-Hosted Search API: A DevOps Guide (2026)
Self-host search for AI agents: data residency, hardening, threat model, and operational concerns. Compares fastCRW, raw SearXNG, OrioSearch, agent-search, Vane.
If you need a search API in your perimeter — for data residency, vendor risk, regulatory, or cost — fastCRW is the smallest credible OSS path. Raw SearXNG works if you bring the auth and rate-limiting yourself.
Why self-host search at all
Self-hosting a search API is rarely fun. The reasons people do it anyway:
- Data residency. User queries can be PII in regulated industries. Sending them to a US cloud can conflict with GDPR, HIPAA, or SOC 2 obligations in ways your security team won't sign off on.
- Vendor risk. Nebius's acquisition of Tavily was announced in February 2026; the deal hadn't closed by May. Healthy companies still get acquired, sunset products, and change pricing. Self-hosted means you control the timeline.
- Cost at scale. The break-even between API credits and a self-hosted server lands somewhere around 5K–10K req/mo. Below that, paid APIs win. Above that, the math compounds in your favor.
- Regulatory. Some workloads (gov, defense, finance) literally cannot ship outbound queries.
- Operational simplicity. One fewer API key, one fewer rate-limit page, one fewer external SLA in your dependency graph.
If none of these apply, don't self-host search. Use Tavily, Serper, or SerpAPI and move on. The ops cost is real; this page is for teams where the trade-off is favorable.
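The cost break-even mentioned above is simple arithmetic, and it is worth rerunning with your own numbers. The sketch below is illustrative only: the function name and the example prices are hypothetical, not quotes from any vendor, and the fixed cost should include whatever share of engineer time you attribute to the stack.

```python
def break_even_requests(fixed_monthly_cents: int, api_cents_per_1k: int) -> int:
    """Smallest monthly request count at which self-hosting costs no more than a paid API.

    fixed_monthly_cents: server plus amortized ops cost, in cents per month.
    api_cents_per_1k:    what the paid API charges per 1,000 requests, in cents.
    """
    if api_cents_per_1k <= 0:
        raise ValueError("api_cents_per_1k must be positive")
    # Ceiling division on integers avoids float rounding right at the boundary.
    return -(-fixed_monthly_cents * 1000 // api_cents_per_1k)


# Hypothetical inputs: $40/month of fixed cost, an API priced at $8 per 1,000 requests.
print(break_even_requests(4000, 800))  # prints 5000
```

Doubling the fixed cost to $80 with the same API price moves the break-even to 10,000 requests per month, which is how a range like 5K–10K can arise from modest changes in assumptions.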
OSS self-hosted search APIs — comparison matrix
| Project | License | Architecture | Auth/Limits | Content extraction | MCP | Hardening config |
|---|---|---|---|---|---|---|
| fastCRW | AGPL-3.0 | Rust + bundled SearXNG | Bearer token (optional self-host) | Yes (/v1/scrape) | Yes | Read-only rootfs, dropped caps, mem limits, pinned image |
| SearXNG (raw) | AGPL-3.0 | Python aggregator | None | No | No | You build it |
| OrioSearch | MIT | Python + SearXNG + Redis | Bearer token, timing-safe | Yes (trafilatura/readability) | No | Compose default |
| agent-search | MIT | FastAPI + SearXNG | Bearer token | Yes (9-strategy fallback chain) | Yes | Compose default + optional Tor |
| Vane (was Perplexica) | MIT | Chat UI + SearXNG | UI auth | Built-in | No | Chat product, not API |
The matrix highlights the gap: SearXNG-direct gives you search aggregation but ships none of the wrapper concerns. Each of the other projects is a different opinion on what wrapper to build.
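To make "wrapper concerns" concrete, here is a minimal sketch of the two pieces raw SearXNG leaves to you: a timing-safe bearer-token check and a rate limiter. It is not taken from any of the projects in the matrix; the names `token_is_valid` and `TokenBucket` are hypothetical, and a token bucket is just one common choice of limiter.

```python
import hmac
import time


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare bearer tokens without leaking how many leading characters matched."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a real wrapper you would keep one bucket per API token or client IP and return HTTP 401 or 429 when either check fails; that plumbing depends on your web framework and is left out here.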
Hardening — what fastCRW's compose stack actually does
# Excerpt from docker-compose.yml — full file in the repo
services:
  searxng:
    image: searxng/searxng:2026.4.27-...   # pinned tag, NOT latest
    read_only: true                        # rootfs read-only
    cap_drop:
      - ALL                                # drop all Linux caps
    security_opt:
      - no-new-privileges:true             # no privilege escalation
    mem_limit: 512m                        # memory cap
    pids_limit: 256                        # PID cap (fork bomb mitigation)
    tmpfs:
      - /tmp:size=64m                      # writable tmp on tmpfs
    volumes:
      - ./config/searxng/settings.yml:/etc/searxng/settings.yml:ro   # config RO
What this gets you:
- Compromised SearXNG can't write to its own filesystem (read-only rootfs).
- Compromised SearXNG can't gain privileges (no-new-privileges, dropped caps).
- Compromised SearXNG can't fork-bomb the host (pids_limit).
- Memory pressure is bounded (mem_limit).
- Image pinning prevents supply-chain drift: no auto-updating to a :latest tag.
What it does NOT cover:
- Application-level vulnerabilities in SearXNG or fastCRW (those need patching via image bumps).
- Prompt injection in scraped content — that's an application concern, not infrastructure.
- DDoS at the edge — put fastCRW behind a CDN or load balancer with rate limiting.
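The hardening properties listed above can be checked mechanically, for example in CI, so a later edit to the compose file doesn't silently drop one. The sketch below is an assumption-laden illustration: `audit_service` is a hypothetical helper, it expects a service definition already parsed into a dict by whatever YAML loader you use, and it only understands the flat keys shown in the excerpt.

```python
REQUIRED_LIMITS = ("mem_limit", "pids_limit")


def audit_service(svc: dict) -> list[str]:
    """Return human-readable findings; an empty list means every check passed."""
    findings = []
    # Look for the tag only in the last path segment so a registry port is not mistaken for one.
    last_segment = svc.get("image", "").rsplit("/", 1)[-1]
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
    if tag in ("", "latest"):
        findings.append("image is not pinned to a specific tag")
    if svc.get("read_only") is not True:
        findings.append("rootfs is not read-only")
    if "ALL" not in svc.get("cap_drop", []):
        findings.append("capabilities are not fully dropped")
    if "no-new-privileges:true" not in svc.get("security_opt", []):
        findings.append("no-new-privileges is not set")
    for key in REQUIRED_LIMITS:
        if key not in svc:
            findings.append(f"{key} is not set")
    return findings


# The searxng service from the excerpt, as a loader would hand it over.
hardened = {
    "image": "searxng/searxng:2026.4.27-...",
    "read_only": True,
    "cap_drop": ["ALL"],
    "security_opt": ["no-new-privileges:true"],
    "mem_limit": "512m",
    "pids_limit": 256,
}
print(audit_service(hardened))  # prints []
```

A check like this only confirms the settings are declared; it says nothing about the application-level issues in the list above.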
Threat model
User query → fastCRW HTTP layer → SearXNG sidecar → upstream engines
                     ↓                   ↓
             (auth, rate limit)   (no internet egress
             (input validation)    beyond search engines)
| Threat | Mitigation |
|---|---|
| Prompt injection in scraped content reaching LLM | Content sanitization on /v1/scrape (agent-search calls this "prompt-injection scrubbing"; fastCRW does its own version) |
| SSRF: /v1/scrape accepting http://localhost, http://169.254.169.254/, etc. | URL validation, reject loopback/link-local/private, configurable allowlist |
| Resource exhaustion: 10 GB page download | Response size cap (fastCRW caps at 10 MB by default) |
| Compromised sidecar attempts host escape | cap_drop, no-new-privileges, read-only rootfs |
| Vulnerable SearXNG version | Pinned image tag forces deliberate version bumps; subscribe to upstream security feed |
| API key leakage | Bearer auth optional in self-host (intentional — many self-hosters run inside a private network) |
If your environment requires more, layers above this stack (network policies, mTLS, egress firewalls) are where to add them — none are blocked by the compose default.
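The SSRF row in the table is the one most often implemented incorrectly, so a sketch of the idea may help. This is not fastCRW's code; `is_safe_url` and `is_public_ip` are hypothetical names, and the approach shown (resolve the hostname, then reject if any resolved address is loopback, link-local, private, or otherwise special) is one reasonable reading of "URL validation".

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(ip_text: str) -> bool:
    """True only for addresses outside loopback, link-local, private, and other special ranges."""
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])  # drop any IPv6 zone id
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Accept only http(s) URLs whose hostname resolves exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # malformed URL or port
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (OSError, UnicodeError):  # resolution failed: treat as unsafe
        return False
    # One private answer among several is enough to reject.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)
```

Two caveats apply to any check of this kind. The fetch should connect to the address that was validated rather than resolving the name a second time, otherwise DNS rebinding can slip past it. Redirect targets also need the same validation before they are followed.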
How to deploy fastCRW (the 2-minute path)
# 1. Clone
git clone https://github.com/us/crw && cd crw
# 2. Configure
cp .env.example .env
# Optional: set CRW_API_TOKEN if you want bearer auth on
vim .env
# 3. Boot
docker compose up --build
# Stack: fastCRW (:8080) + SearXNG sidecar + Redis
# 4. Smoke test
curl -X POST http://localhost:8080/v1/search \
-H "Content-Type: application/json" \
-d '{"query": "site:nist.gov password rotation guidance", "limit": 5}'
# 5. Verify health
curl http://localhost:8080/health
For production: put it behind your existing reverse proxy / WAF / CDN. The fastCRW binary speaks plain HTTP on :8080 by design — TLS termination is your edge layer's job.
Operational concerns
Upstream rate limits
SearXNG queries Google/Bing/DuckDuckGo/Brave directly from your server's IP. At meaningful QPS, two things happen:
- Captchas. Google in particular fingerprints repeat-querier traffic. Mitigation: enable engine rotation in settings.yml, configure Brave Search API for a paid lane.
- Soft bans. Some engines simply rate-limit known IPs. Mitigation: rotate egress IPs (residential proxies, multiple VPS), or accept lower throughput.
This is the operational tax. fastCRW Cloud absorbs it; self-host means you own it.
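The "rotate egress IPs" mitigation above amounts to a small piece of bookkeeping: cycle through a pool and rest any exit that has just been soft-banned. The sketch below shows only that bookkeeping. `EgressRotator` and the proxy names are hypothetical, and how the chosen proxy is wired into SearXNG or your HTTP client is outside its scope.

```python
import time


class EgressRotator:
    """Round-robin over egress proxies, skipping any that are cooling down after a soft ban."""

    def __init__(self, proxies, cooldown_seconds: float = 300.0, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self.proxies = list(proxies)
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the rotation can be tested
        self.blocked_until = {p: float("-inf") for p in self.proxies}
        self.index = 0

    def next_proxy(self):
        """Return the next usable proxy, or None if every proxy is cooling down."""
        now = self.clock()
        for _ in range(len(self.proxies)):
            proxy = self.proxies[self.index]
            self.index = (self.index + 1) % len(self.proxies)
            if self.blocked_until[proxy] <= now:
                return proxy
        return None

    def report_ban(self, proxy) -> None:
        """Take a proxy out of rotation for the cooldown period."""
        self.blocked_until[proxy] = self.clock() + self.cooldown_seconds
```

When `next_proxy()` returns None, the honest options are the ones already named: wait out the cooldown and accept lower throughput, or add more exits to the pool.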
Observability
The compose stack ships:
- /health endpoint (open, JSON)
- /tool-schema endpoint (open, JSON, lists MCP tool surface)
- structured tracing via OpenTelemetry env vars (set OTEL_EXPORTER_OTLP_ENDPOINT)
If you have a Grafana stack, set OTEL_EXPORTER_OTLP_ENDPOINT to your collector's OTLP endpoint and you get latency, error rate, and per-engine breakdown out of the box.
Backup and disaster recovery
The stack is stateless. fastCRW, SearXNG, and Redis hold nothing that has to survive a redeploy, so there is no database to back up. Configuration lives in your .env and config/searxng/settings.yml, both of which should live in your config-management repo. To recover from a disaster, redeploy the compose stack on a new host.
Where each option fails
- fastCRW: at high QPS, upstream rate limits hit (above). At very low resource budgets, the Rust runtime is overkill — but at ~8 MB image and ~6.6 MB idle RAM, it's not really a constraint anyone hits.
- SearXNG-direct: zero auth, zero rate limit, zero extraction. You build all of it. That's not a flaw — it's the explicit shape — but plan for it.
- OrioSearch: small project (~22 stars). Maintenance risk. If your team has Python expertise and the project's compatibility shape matches your needs, it's a real option; budget for forking later.
- agent-search: ~25 stars, similar maintenance concern. The Tor stack adds operational surface area.
- Vane: not an API. Chat UI. Different shape entirely.
When to pay for hosted instead
Self-hosting is the right call when one of the five reasons at the top of this page applies. If none does, paying for Tavily, Serper, SerpAPI, or fastCRW Cloud is cheaper than your team's time. The ops tax of self-hosting is real:
- ~30 minutes of attention per week (engine rotation, image bumps, captcha investigation),
- on-call rotation if it's user-facing,
- one engineer who actually understands the stack when it breaks.
Most teams underestimate this until they're three months in. Plan for it honestly.
Recommended next reads
- Tavily alternative hub — Tavily-shape API surface, hosted plan option.
- Open-source Tavily alternatives — narrower OSS-focused comparison.
- Tavily vs Serper — when paid APIs make sense.
Three calls to action
- Deploy now: docker compose up. Quickstart →
- Audit the hardening config: docker-compose.yml on GitHub →
- Skip the ops: hosted plan, same API surface →