
Self-Hosted Search API — A DevOps Guide (2026)

Self-host search for AI agents: data residency, hardening, threat model, and operational concerns. Compares fastCRW, raw SearXNG, OrioSearch, agent-search, Vane.

Published: May 9, 2026 · Updated: May 9, 2026 · Category: alternatives
Verdict

If you need a search API in your perimeter — for data residency, vendor risk, regulatory, or cost — fastCRW is the smallest credible OSS path. Raw SearXNG works if you bring the auth and rate-limiting yourself.

  • Honest comparison of OSS self-hosted search APIs — fastCRW, SearXNG-direct, OrioSearch, agent-search
  • Threat model + hardening checklist (read-only rootfs, dropped caps, mem limits, image pinning)
  • Operational concerns: upstream rate limits, captchas, scraping rate limits, observability

Why self-host search at all

Self-hosting a search API is rarely fun. The reasons people do it anyway:

  1. Data residency. User queries are PII in regulated industries. Shipping them to a third-party US cloud can violate GDPR or HIPAA obligations and complicate SOC 2 audits in ways your security team won't sign off on.
  2. Vendor risk. Tavily was acquired by Nebius in February 2026; the deal hadn't closed by May. Healthy companies still get acquired, sunset products, change pricing. Self-hosted means you control the timeline.
  3. Cost at scale. The break-even between API credits and a self-hosted server lands somewhere around 5K–10K req/mo. Below that, paid APIs win. Above that, the math compounds in your favor (see the illustrative arithmetic after this list).
  4. Regulatory. Some workloads (gov, defense, finance) literally cannot ship outbound queries.
  5. Operational simplicity. One fewer API key, one fewer rate-limit page, one fewer external SLA in your dependency graph.
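
A rough sanity check on that break-even, with illustrative numbers. The prices below are assumptions picked for the arithmetic, not quotes from any vendor:

# Assumed: hosted API ≈ $4 per 1K requests; small VPS ≈ $25/mo
echo $(( 25 * 1000 / 4 ))   # 6250 req/mo break-even, inside the 5K–10K band above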

If none of these apply, don't self-host search. Use Tavily, Serper, or SerpAPI and move on. The ops cost is real; this page is for teams where the trade-off is favorable.

OSS self-hosted search APIs — comparison matrix

| Project | License | Architecture | Auth/Limits | Content extraction | MCP | Hardening config |
| --- | --- | --- | --- | --- | --- | --- |
| fastCRW | AGPL-3.0 | Rust + bundled SearXNG | Bearer token (optional self-host) | Yes (/v1/scrape) | Yes | Read-only rootfs, dropped caps, mem limits, pinned image |
| SearXNG (raw) | AGPL-3.0 | Python aggregator | None | No | No | You build it |
| OrioSearch | MIT | Python + SearXNG + Redis | Bearer token, timing-safe | Yes (trafilatura/readability) | No | Compose default |
| agent-search | MIT | FastAPI + SearXNG | Bearer token | Yes (9-strategy fallback chain) | Yes | Compose default + optional Tor |
| Vane (was Perplexica) | MIT | Chat UI + SearXNG | UI auth | Built-in | No | Chat product, not API |

The matrix highlights the gap: SearXNG-direct gives you search aggregation but ships none of the wrapper concerns. Each of the other projects is a different opinion on what wrapper to build.

Hardening — what fastCRW's compose stack actually does

# Excerpt from docker-compose.yml — full file in the repo
services:
  searxng:
    image: searxng/searxng:2026.4.27-... # pinned tag, NOT latest
    read_only: true                      # rootfs read-only
    cap_drop:
      - ALL                              # drop all Linux caps
    security_opt:
      - no-new-privileges:true           # no privilege escalation
    mem_limit: 512m                      # memory cap
    pids_limit: 256                      # PID cap (fork bomb mitigation)
    tmpfs:
      - /tmp:size=64m                    # writable tmp on tmpfs
    volumes:
      - ./config/searxng/settings.yml:/etc/searxng/settings.yml:ro  # config RO

What this gets you:

  • Compromised SearXNG can't write to its own filesystem (read-only rootfs).
  • Compromised SearXNG can't gain privileges (no-new-privileges, dropped caps).
  • Compromised SearXNG can't fork-bomb the host (pids_limit).
  • Memory pressure is bounded (mem_limit).
  • Image pinning prevents supply-chain drift — no auto-updating to a :latest tag.
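
You can spot-check that these flags actually landed on the running container. A minimal sketch; the container name is a guess (compose usually names it <project>-searxng-1, so check docker ps first):

docker inspect -f 'rootfs_ro={{.HostConfig.ReadonlyRootfs}} caps={{.HostConfig.CapDrop}} mem={{.HostConfig.Memory}}' \
  crw-searxng-1   # expect rootfs_ro=true caps=[ALL] mem=536870912 (512m in bytes)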

What it does NOT cover:

  • Application-level vulnerabilities in SearXNG or fastCRW (those need patching via image bumps).
  • Prompt injection in scraped content — that's an application concern, not infrastructure.
  • DDoS at the edge — put fastCRW behind a CDN or load balancer with rate limiting.

Threat model

User query → fastCRW HTTP layer → SearXNG sidecar → upstream engines
                ↓                         ↓
       (auth, rate limit)         (no internet egress
       (input validation)          beyond search engines)

| Threat | Mitigation |
| --- | --- |
| Prompt injection in scraped content reaching the LLM | Content sanitization on /v1/scrape (agent-search calls this "prompt-injection scrubbing"; fastCRW does its own version) |
| SSRF — /v1/scrape accepting http://localhost, http://169.254.169.254/, etc. | URL validation: reject loopback/link-local/private ranges; configurable allowlist |
| Resource exhaustion — a 10 GB page download | Response size cap (fastCRW caps at 10 MB by default) |
| Compromised sidecar attempts host escape | cap_drop, no-new-privileges, read-only rootfs |
| Vulnerable SearXNG version | Pinned image tag forces deliberate version bumps; subscribe to the upstream security feed |
| API key leakage | Bearer auth optional in self-host (intentional — many self-hosters run inside a private network) |
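
A quick negative test for the SSRF row. The request shape assumes /v1/scrape takes a JSON url field (adjust to the actual schema), but the expected behavior is fixed: link-local metadata addresses should come back as a 4xx, never as a fetched body:

curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:8080/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "http://169.254.169.254/latest/meta-data/"}'
# Expect 400/403; a response containing cloud metadata is a failed deployment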

If your environment requires more, layers above this stack (network policies, mTLS, egress firewalls) are where to add them — none are blocked by the compose default.

How to deploy fastCRW (the 2-minute path)

# 1. Clone
git clone https://github.com/us/crw && cd crw

# 2. Configure
cp .env.example .env
# Optional: set CRW_API_TOKEN if you want bearer auth on
vim .env

# 3. Boot
docker compose up --build
# Stack: fastCRW (:8080) + SearXNG sidecar + Redis

# 4. Smoke test
curl -X POST http://localhost:8080/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "site:nist.gov password rotation guidance", "limit": 5}'

# 5. Verify health
curl http://localhost:8080/health
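
If you set CRW_API_TOKEN in step 2, requests need the token. A sketch assuming a standard Authorization: Bearer header; check the repo docs for the exact header fastCRW expects:

curl -X POST http://localhost:8080/v1/search \
  -H "Authorization: Bearer $CRW_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "test", "limit": 3}'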

For production: put it behind your existing reverse proxy / WAF / CDN. The fastCRW binary speaks plain HTTP on :8080 by design — TLS termination is your edge layer's job.

Operational concerns

Upstream rate limits

SearXNG queries Google/Bing/DuckDuckGo/Brave directly from your server's IP. At meaningful QPS, two things happen:

  1. Captchas. Google in particular fingerprints repeat-querier traffic. Mitigation: enable engine rotation in settings.yml, configure Brave Search API for a paid lane.
  2. Soft bans. Some engines simply rate-limit known IPs. Mitigation: rotate egress IPs (residential proxies, multiple VPS), or accept lower throughput.

This is the operational tax. fastCRW Cloud absorbs it; self-host means you own it.
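
What engine rotation looks like in practice: a minimal settings.yml sketch using stock SearXNG keys. Which engines to disable or weight is illustrative, not a recommendation:

# config/searxng/settings.yml
use_default_settings: true
engines:
  - name: google
    disabled: true     # drop the engine most prone to captchas
  - name: duckduckgo
    weight: 1
  - name: brave
    weight: 2          # prefer the engine with a paid lane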

Observability

The compose stack ships:

  • /health endpoint (open, JSON)
  • /tool-schema endpoint (open, JSON, lists MCP tool surface)
  • structured tracing via OpenTelemetry env vars (set OTEL_EXPORTER_OTLP_ENDPOINT)

If you have a Grafana stack, point it at the OTLP endpoint and you get latency, error rate, and per-engine breakdown out of the box.
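
Wiring that up is two lines in .env. OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry variable; the collector address and service name below are placeholders:

# .env
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=fastcrw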

Backup and disaster recovery

Stateless. fastCRW, SearXNG, and Redis hold no durable state within a deployment; there is no database to back up. Configuration lives in your .env and config/searxng/settings.yml, both of which should live in your config-management repo. Disaster recovery is a re-deploy of the compose stack on a new host.
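
The full DR runbook is four lines. A sketch assuming your config lives in a separate repo (the copy paths are hypothetical):

git clone https://github.com/us/crw && cd crw
cp /path/to/config-repo/crw/.env .env                      # hypothetical path
cp /path/to/config-repo/crw/settings.yml config/searxng/   # hypothetical path
docker compose up -d --build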

Where each option fails

  • fastCRW: at high QPS, the upstream rate limits described above kick in. At very low resource budgets, the Rust runtime is overkill — but at ~8 MB image and ~6.6 MB idle RAM, that's not a constraint anyone actually hits.
  • SearXNG-direct: zero auth, zero rate limit, zero extraction. You build all of it. That's not a flaw — it's the explicit shape — but plan for it.
  • OrioSearch: small project (~22 stars). Maintenance risk. If your team has Python expertise and the project's API shape matches your needs, it's a real option; budget for forking it later.
  • agent-search: ~25 stars, similar maintenance concern. The Tor stack adds operational surface area.
  • Vane: not an API. Chat UI. Different shape entirely.

When to pay for hosted instead

Self-hosting is the right call when one of the five reasons at the top of this page applies. If none does, paying for Tavily, Serper, SerpAPI, or fastCRW Cloud is cheaper than your team's time. The ops tax of self-hosting is real:

  • ~30 minutes of attention per week (engine rotation, image bumps, captcha investigation),
  • on-call rotation if it's user-facing,
  • one engineer who actually understands the stack when it breaks.

Most teams underestimate this until they're three months in. Plan for it honestly.
