Go Web Scraping API — fastCRW [Firecrawl-Compatible]
Scrape, crawl, and search from Go with fastCRW — a Firecrawl-compatible REST API backed by a single Rust binary. Worker pools, errgroup, rate limiting, per-request context timeouts. AGPL-3.0, self-host free.
Call fastCRW from Go with net/http and encoding/json — no SDK needed. Compose a bounded errgroup pool, per-host rate limiting, and per-request context timeouts around /v1/scrape calls. The Rust engine handles rendering and extraction; your Go service handles concurrency.
Verdict
fastCRW is a Firecrawl-compatible REST API: POST /v1/scrape and get back clean Markdown. There is no official Go SDK, and you do not need one: net/http plus encoding/json is the complete client. What makes Go a strong fit is the concurrency story. Compose errgroup's SetLimit(n), a per-host rate.Limiter, and per-request context.WithTimeout around /v1/scrape calls; the engine handles rendering and extraction while your Go service handles concurrency. Under the hood is a single ~8 MB Rust binary that deploys alongside your Go service as just another binary.
Who This Is For
- Go developers who want clean Markdown from the web — without parsing raw HTML or running a headless browser.
- Teams building high-throughput scrapers — you want bounded concurrency with proper error propagation and rate limiting.
- Backend services that need web enrichment — a Go API that calls fastCRW to enrich records on demand.
- Self-hosting shops — a static Rust binary is a natural fit alongside Go services; AGPL-3.0, $0 per 1,000 scrapes.
Setup
1. Start the engine
Local (Docker):
docker run -p 3000:3000 ghcr.io/us/crw:latest
curl -s http://localhost:3000/health
Or use the managed cloud:
export FASTCRW_BASE_URL="https://api.fastcrw.com"
export FASTCRW_API_KEY="fcrw_..."
2. Build the client
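Create a scraper package with a shared *http.Client. Go's default transport keeps only two idle connections per host (http.DefaultMaxIdleConnsPerHost), so a high-concurrency scraper that sends every request to the same fastCRW host ends up closing and re-opening most of its connections. Raising MaxIdleConnsPerHost removes that quiet bottleneck: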
// scraper/client.go
package scraper
import (
"net/http"
"time"
)
// NewHTTPClient returns an *http.Client tuned for concurrent scraping.
// A single instance should be shared across all workers.
func NewHTTPClient() *http.Client {
return &http.Client{
Timeout: 0, // we use per-request context timeouts instead
Transport: &http.Transport{
MaxIdleConnsPerHost: 32,
IdleConnTimeout: 90 * time.Second,
},
}
}
Quickstart: Scrape a Page
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"time"
)
const (
baseURL = "https://api.fastcrw.com"
timeoutSec = 25 // set above p90 (14157 ms on 2026-05-08 benchmark)
)
type scrapeRequest struct {
URL string `json:"url"`
Formats []string `json:"formats"`
OnlyMainContent bool `json:"onlyMainContent"`
}
type scrapeResponse struct {
Success bool `json:"success"`
Data struct {
Markdown string `json:"markdown"`
Metadata struct {
StatusCode int `json:"statusCode"`
Title string `json:"title"`
} `json:"metadata"`
} `json:"data"`
Error string `json:"error,omitempty"`
}
func scrapeURL(ctx context.Context, client *http.Client, apiKey, url string) (string, error) {
ctx, cancel := context.WithTimeout(ctx, timeoutSec*time.Second)
defer cancel()
body, _ := json.Marshal(scrapeRequest{
URL: url,
Formats: []string{"markdown"},
OnlyMainContent: true,
})
req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/v1/scrape", bytes.NewReader(body))
if err != nil {
return "", err
}
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
return "", err
}
defer resp.Body.Close()
var result scrapeResponse
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return "", err
}
if !result.Success {
return "", fmt.Errorf("scrape failed: %s", result.Error)
}
return result.Data.Markdown, nil
}
func main() {
client := &http.Client{
Transport: &http.Transport{MaxIdleConnsPerHost: 32},
}
apiKey := os.Getenv("FASTCRW_API_KEY")
md, err := scrapeURL(context.Background(), client, apiKey, "https://docs.fastcrw.com")
if err != nil {
panic(err)
}
fmt.Printf("scraped %d chars\n", len(md))
}
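The quickstart decodes the response body without looking at the HTTP status, so a 401 or 429 with a non-JSON body would surface as a confusing decode error. The following is a minimal sketch of checking the status first. The names decodeScrape and fetchFrom are hypothetical, the response fields are the ones the quickstart struct already reads, and the httptest server is only a local stand-in so the block runs without network access.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// scrapeResponse mirrors the fields the quickstart reads.
type scrapeResponse struct {
	Success bool `json:"success"`
	Data    struct {
		Markdown string `json:"markdown"`
	} `json:"data"`
	Error string `json:"error,omitempty"`
}

// decodeScrape (hypothetical helper) rejects non-2xx responses before
// decoding JSON and includes a bounded body snippet in the error.
func decodeScrape(resp *http.Response) (string, error) {
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		snippet, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		return "", fmt.Errorf("scrape: HTTP %d: %s", resp.StatusCode, snippet)
	}
	var result scrapeResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", fmt.Errorf("scrape: decode body: %w", err)
	}
	if !result.Success {
		return "", fmt.Errorf("scrape failed: %s", result.Error)
	}
	return result.Data.Markdown, nil
}

// fetchFrom starts a local stand-in server that answers with the given
// status and body, then runs decodeScrape against it. Demo scaffolding only.
func fetchFrom(status int, body string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		io.WriteString(w, body)
	}))
	defer srv.Close()
	resp, err := http.Post(srv.URL, "application/json", nil)
	if err != nil {
		return "", err
	}
	return decodeScrape(resp)
}

func main() {
	md, err := fetchFrom(http.StatusOK, `{"success":true,"data":{"markdown":"# Hello"}}`)
	fmt.Println(md, err)
	_, err = fetchFrom(http.StatusTooManyRequests, "slow down")
	fmt.Println(err)
}
```

In scrapeURL, the same check would sit between client.Do(req) and the JSON decode.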
Concurrent Batch Scraping with errgroup
errgroup.WithContext + SetLimit(n) is the idiomatic Go bounded fan-out: g.Go() blocks when all slots are occupied, so the limit IS your concurrency cap. The first non-nil error cancels the shared context so all siblings stop wasting work:
package main
import (
"context"
"fmt"
"net/http"
"os"
"sync"
"golang.org/x/sync/errgroup"
)
func batchScrape(ctx context.Context, client *http.Client, apiKey string, urls []string, concurrency int) ([]string, error) {
results := make([]string, len(urls))
var mu sync.Mutex
g, gctx := errgroup.WithContext(ctx)
g.SetLimit(concurrency) // blocks g.Go() until a slot is free
for i, url := range urls {
i, url := i, url // capture loop variables
g.Go(func() error {
md, err := scrapeURL(gctx, client, apiKey, url)
if err != nil {
return fmt.Errorf("url %s: %w", url, err)
}
mu.Lock()
results[i] = md
mu.Unlock()
return nil
})
}
if err := g.Wait(); err != nil {
return nil, err
}
return results, nil
}
func main() {
client := &http.Client{
Transport: &http.Transport{MaxIdleConnsPerHost: 32},
}
apiKey := os.Getenv("FASTCRW_API_KEY")
urls := []string{
"https://docs.fastcrw.com",
"https://fastcrw.com/pricing",
"https://fastcrw.com/alternatives/firecrawl",
}
markdowns, err := batchScrape(context.Background(), client, apiKey, urls, 4)
if err != nil {
panic(err)
}
for i, md := range markdowns {
fmt.Printf("%s: %d chars\n", urls[i], len(md))
}
}
Install errgroup:
go get golang.org/x/sync/errgroup
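The errgroup version is fail-fast: one bad URL cancels the whole batch. When partial results are acceptable, a different pattern may fit better. The sketch below swaps errgroup for a plain sync.WaitGroup plus a buffered-channel semaphore from the standard library and records an error per URL instead of cancelling siblings. The names scrapeFunc, result, batchScrapeAll, and fakeScrape are hypothetical, and the scrape function is injected so the block runs without network access.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// scrapeFunc stands in for scrapeURL from the quickstart (hypothetical type).
type scrapeFunc func(ctx context.Context, url string) (string, error)

// result pairs each input URL with its own outcome.
type result struct {
	URL      string
	Markdown string
	Err      error
}

// batchScrapeAll runs scrape over urls with at most `concurrency` calls in
// flight. Unlike the errgroup version, a failure is recorded per URL and does
// not cancel the remaining work.
func batchScrapeAll(ctx context.Context, scrape scrapeFunc, urls []string, concurrency int) []result {
	if concurrency < 1 {
		concurrency = 1 // a zero-capacity semaphore would block forever
	}
	results := make([]result, len(urls))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		sem <- struct{}{} // blocks while all slots are taken
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }()
			md, err := scrape(ctx, u)
			// Each goroutine writes only its own index, so no mutex is needed.
			results[i] = result{URL: u, Markdown: md, Err: err}
		}(i, u)
	}
	wg.Wait()
	return results
}

// fakeScrape is a demo stand-in that fails for one URL.
func fakeScrape(_ context.Context, url string) (string, error) {
	if url == "https://example.com/broken" {
		return "", errors.New("simulated failure")
	}
	return "# " + url, nil
}

// demoResults runs the batch against the fake scraper.
func demoResults() []result {
	urls := []string{"https://example.com/a", "https://example.com/broken", "https://example.com/b"}
	return batchScrapeAll(context.Background(), fakeScrape, urls, 2)
}

func main() {
	for _, r := range demoResults() {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", r.URL, r.Err)
			continue
		}
		fmt.Printf("%s: %d chars\n", r.URL, len(r.Markdown))
	}
}
```

The trade-off is that a systemic failure (a bad API key, for example) is retried for every URL rather than stopping the batch early, so fail-fast remains the better default when errors are likely to be shared.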
Latency note: fastCRW's p50 was 1914 ms and p90 14157 ms on the 2026-05-08 benchmark (819 labeled URLs, diagnose_3way.py). The chrome-stealth fallback that produces the slow tail is also what gives fastCRW the highest truth-recall of three tools tested (63.74%). Size your pool and timeoutSec from the p90, not the median. Full breakdown at /benchmarks/firecrawl-dataset.
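As a rough worked example of that sizing advice, the sketch below derives a timeout from the p90 and a worker count from a target throughput. The helper names, the 1.75x headroom factor, and the 2 pages per second target are illustrative assumptions, not fastCRW recommendations. The worker estimate uses Little's law (requests in flight = arrival rate x latency) with the median as a stand-in for the mean, which understates a long-tailed distribution.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Figures quoted in the latency note (2026-05-08 benchmark).
const (
	benchP50 = 1914 * time.Millisecond
	benchP90 = 14157 * time.Millisecond
)

// timeoutFromP90 scales the p90 latency by a headroom factor.
// The factor is a tuning choice, not a fastCRW recommendation.
func timeoutFromP90(p90 time.Duration, headroom float64) time.Duration {
	return time.Duration(float64(p90) * headroom)
}

// workersFor applies Little's law: in-flight requests = rate x latency.
// Passing the median understates a long-tailed mean, so treat the result
// as a lower bound.
func workersFor(targetPerSec float64, latency time.Duration) int {
	return int(math.Ceil(targetPerSec * latency.Seconds()))
}

func main() {
	fmt.Println("timeout:", timeoutFromP90(benchP90, 1.75))
	fmt.Println("workers for 2 pages/s:", workersFor(2, benchP50))
}
```

With these inputs the timeout comes out just under the 25 s used in the quickstart, and the worker estimate is 4, the same pool size passed to batchScrape above.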
Per-Host Rate Limiting
Bounded concurrency limits how many requests overlap; rate limiting limits how many start per second. You need both: a pool of N workers can still hit a single host with a burst of N requests whenever they all start at the same instant:
package main
import (
"context"
"net/url"
"sync"
"golang.org/x/time/rate"
)
// HostLimiter is a thread-safe per-host rate limiter.
type HostLimiter struct {
mu sync.RWMutex
limiters map[string]*rate.Limiter
r rate.Limit // requests per second per host
b int // burst size
}
func NewHostLimiter(rps float64, burst int) *HostLimiter {
return &HostLimiter{
limiters: make(map[string]*rate.Limiter),
r: rate.Limit(rps),
b: burst,
}
}
func (h *HostLimiter) Wait(ctx context.Context, rawURL string) error {
u, err := url.Parse(rawURL)
if err != nil {
return err
}
host := u.Hostname()
h.mu.RLock()
limiter, ok := h.limiters[host]
h.mu.RUnlock()
if !ok {
h.mu.Lock()
if limiter, ok = h.limiters[host]; !ok {
limiter = rate.NewLimiter(h.r, h.b)
h.limiters[host] = limiter
}
h.mu.Unlock()
}
return limiter.Wait(ctx) // respects context cancellation
}
Call hostLimiter.Wait(ctx, url) inside each worker before the scrape request. The Wait method respects context cancellation so a shutdown drains gracefully rather than hanging.
Install rate:
go get golang.org/x/time/rate
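The text says to call Wait inside each worker before the scrape request but does not show the wiring. A minimal sketch follows, written against a one-method interface so it runs with the standard library alone. The names waiter, scrapeFunc, limitedScrape, and recordingLimiter are hypothetical; in real code the HostLimiter above would satisfy waiter and scrapeURL would be wrapped in a closure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// waiter is the one method of HostLimiter that a worker needs.
type waiter interface {
	Wait(ctx context.Context, rawURL string) error
}

// scrapeFunc stands in for scrapeURL from the quickstart (hypothetical type).
type scrapeFunc func(ctx context.Context, url string) (string, error)

// limitedScrape waits for the target host's rate-limit token, then scrapes.
// If Wait fails (for example because ctx was cancelled), no request is sent.
func limitedScrape(ctx context.Context, lim waiter, scrape scrapeFunc, url string) (string, error) {
	if err := lim.Wait(ctx, url); err != nil {
		return "", fmt.Errorf("rate limit wait for %s: %w", url, err)
	}
	return scrape(ctx, url)
}

// recordingLimiter is a demo stand-in for *HostLimiter that logs call order.
type recordingLimiter struct{ log *[]string }

func (r recordingLimiter) Wait(ctx context.Context, rawURL string) error {
	*r.log = append(*r.log, "wait")
	return ctx.Err() // like rate.Limiter.Wait, fail on a cancelled context
}

// demo runs one limited scrape and returns the call order, the scraped
// text, and any error. When cancelled is true the context is already done.
func demo(cancelled bool) ([]string, string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel()
	}
	var log []string
	scrape := func(context.Context, string) (string, error) {
		log = append(log, "scrape")
		return "# page", nil
	}
	md, err := limitedScrape(ctx, recordingLimiter{&log}, scrape, "https://example.com/a")
	return log, md, err
}

func main() {
	order, md, err := demo(false)
	fmt.Println(order, md, err)
	order, _, err = demo(true)
	fmt.Println(order, errors.Is(err, context.Canceled))
}
```

Inside batchScrape, the call to scrapeURL in the g.Go closure would become a call to limitedScrape with gctx, so a cancelled batch also stops workers that are still waiting for a token.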
Crawl a Whole Site
For BFS traversal of an entire domain, use /v1/crawl — it returns a job ID to poll, handles deduplication and politeness, and is bounded by maxDepth (cap 10) and maxPages (cap 1000). Do not reimplement BFS with a goroutine pool and a visited-set when the engine already provides it:
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"time"
)
type crawlStartResponse struct {
ID string `json:"id"`
}
type crawlStatusResponse struct {
Status string `json:"status"`
Data []struct {
Markdown string `json:"markdown"`
Metadata struct {
SourceURL string `json:"sourceURL"`
} `json:"metadata"`
} `json:"data"`
}
func crawlSite(ctx context.Context, client *http.Client, apiKey, seedURL string, maxPages, maxDepth int) (*crawlStatusResponse, error) {
base := "https://api.fastcrw.com"
headers := map[string]string{
"Authorization": "Bearer " + apiKey,
"Content-Type": "application/json",
}
startBody, _ := json.Marshal(map[string]interface{}{
"url": seedURL,
"limit": maxPages,
"maxDepth": maxDepth,
"scrapeOptions": map[string]interface{}{
"formats": []string{"markdown"},
"onlyMainContent": true,
},
})
startReq, _ := http.NewRequestWithContext(ctx, http.MethodPost, base+"/v1/crawl", bytes.NewReader(startBody))
for k, v := range headers {
startReq.Header.Set(k, v)
}
startResp, err := client.Do(startReq)
if err != nil {
return nil, err
}
defer startResp.Body.Close()
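var start crawlStartResponse
if err := json.NewDecoder(startResp.Body).Decode(&start); err != nil {
return nil, fmt.Errorf("decode crawl start response: %w", err)
}
if start.ID == "" {
return nil, fmt.Errorf("crawl start returned no job ID (HTTP %d)", startResp.StatusCode)
}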
// Poll until complete
for {
pollReq, _ := http.NewRequestWithContext(ctx, http.MethodGet, base+"/v1/crawl/"+start.ID, nil)
for k, v := range headers {
pollReq.Header.Set(k, v)
}
pollResp, err := client.Do(pollReq)
if err != nil {
return nil, err
}
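var status crawlStatusResponse
decodeErr := json.NewDecoder(pollResp.Body).Decode(&status)
pollResp.Body.Close()
if decodeErr != nil {
return nil, fmt.Errorf("decode crawl status: %w", decodeErr)
}
if status.Status == "completed" {
return &status, nil
}
// Wait before the next poll, but return promptly if ctx is cancelled.
select {
case <-ctx.Done():
return nil, ctx.Err()
case <-time.After(2 * time.Second):
}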
}
}
func main() {
client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: 16}}
result, err := crawlSite(context.Background(), client, os.Getenv("FASTCRW_API_KEY"), "https://docs.fastcrw.com", 25, 3)
if err != nil {
panic(err)
}
for _, page := range result.Data {
fmt.Printf("%d chars %s\n", len(page.Markdown), page.Metadata.SourceURL)
}
}
Web Search
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
)
func search(ctx context.Context, client *http.Client, query string, limit int) ([]map[string]interface{}, error) {
body, _ := json.Marshal(map[string]interface{}{
"query": query,
"limit": limit,
})
req, _ := http.NewRequestWithContext(ctx, http.MethodPost, "https://api.fastcrw.com/v1/search", bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+os.Getenv("FASTCRW_API_KEY"))
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
var result struct {
Data []map[string]interface{} `json:"data"`
}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, err
}
return result.Data, nil
}
func main() {
client := &http.Client{}
results, err := search(context.Background(), client, "go web scraping api 2026", 5)
if err != nil {
panic(err)
}
for _, r := range results {
fmt.Println(r["title"], "→", r["url"])
}
}
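The search example decodes into map[string]interface{}, which pushes type errors to runtime. A typed struct is a small improvement. The sketch below covers only the two fields the example prints (title and url); any other response fields are not assumed here and are simply ignored by encoding/json. The names searchResult, decodeSearch, and decodeSearchString are hypothetical, and the sample body is invented for the demo.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// searchResult covers only the two fields the example above prints.
// Unknown fields in the response are ignored by encoding/json.
type searchResult struct {
	Title string `json:"title"`
	URL   string `json:"url"`
}

// decodeSearch reads a {"data": [...]} body into typed results.
func decodeSearch(r io.Reader) ([]searchResult, error) {
	var payload struct {
		Data []searchResult `json:"data"`
	}
	if err := json.NewDecoder(r).Decode(&payload); err != nil {
		return nil, fmt.Errorf("decode search response: %w", err)
	}
	return payload.Data, nil
}

// decodeSearchString is a convenience wrapper used by the demo.
func decodeSearchString(body string) ([]searchResult, error) {
	return decodeSearch(strings.NewReader(body))
}

func main() {
	// Sample body invented for the demo; "extra" is not a documented field.
	body := `{"data":[{"title":"Docs","url":"https://docs.fastcrw.com","extra":1}]}`
	results, err := decodeSearchString(body)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Println(r.Title, "->", r.URL)
	}
}
```

In the search function, decodeSearch(resp.Body) would replace the anonymous struct and the map-based return type.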
MCP Setup
fastCRW ships an MCP integration (crw-mcp on npm) for AI agents that need live web data from Go-based tools:
{
"mcpServers": {
"fastcrw": {
"command": "npx",
"args": ["-y", "crw-mcp@latest"],
"env": {
"FASTCRW_API_KEY": "fcrw_...",
"FASTCRW_API_URL": "https://api.fastcrw.com"
}
}
}
}
See /integrations/mcp for full configuration options.
Self-Hosting Next to Your Go Service
The engine is a single ~8 MB static Rust binary — no Redis, no Node.js, no multi-container stack. For Go teams that already deploy static binaries, "the scrape backend is just another binary next to ours" is an easy operational fit:
# docker-compose.yml (excerpt)
services:
crw:
image: ghcr.io/us/crw:latest
ports: ["3000:3000"]
go-scraper:
build: .
environment:
FASTCRW_BASE_URL: "http://crw:3000"
FASTCRW_API_KEY: "" # not required for self-hosted
depends_on: [crw]
Self-hosting the AGPL-3.0 engine is $0 per 1,000 scrapes — you pay only your server. See /pricing for managed cloud tiers.
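The compose file sets FASTCRW_BASE_URL, but the Go examples on this page hard-code the cloud URL. One way to let the same binary target either deployment is sketched below. The names config, loadConfig, and headers are hypothetical. Leaving out the Authorization header when the key is empty is an assumption based on the compose comment that a key is not required for self-hosted.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const cloudBaseURL = "https://api.fastcrw.com"

// config holds the two settings the examples read from the environment.
type config struct {
	BaseURL string
	APIKey  string
}

// loadConfig reads FASTCRW_BASE_URL and FASTCRW_API_KEY through getenv,
// falling back to the managed cloud URL when no base URL is set.
// Taking getenv as a parameter keeps the function easy to test.
func loadConfig(getenv func(string) string) config {
	base := strings.TrimRight(getenv("FASTCRW_BASE_URL"), "/")
	if base == "" {
		base = cloudBaseURL
	}
	return config{BaseURL: base, APIKey: getenv("FASTCRW_API_KEY")}
}

// headers returns the request headers, adding the bearer token only when a
// key is configured (assumption: self-hosted accepts requests without it).
func (c config) headers() map[string]string {
	h := map[string]string{"Content-Type": "application/json"}
	if c.APIKey != "" {
		h["Authorization"] = "Bearer " + c.APIKey
	}
	return h
}

func main() {
	cfg := loadConfig(os.Getenv)
	fmt.Println("scrape endpoint:", cfg.BaseURL+"/v1/scrape")
}
```

The returned map drops into the same header loop the crawl example already uses.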
Limits and Honest Gaps
- No official Go SDK — use net/http + encoding/json directly (shown above).
- No screenshot output — formats: ["screenshot"] returns HTTP 422.
- Stateless per request — no persistent session or cookie jar across calls.
- LLM extraction — supports OpenAI and Anthropic providers only.
- No /v1/batch/scrape — fan out g.Go() calls over /v1/scrape or use /v1/crawl.
