
Go Web Scraping API — fastCRW [Firecrawl-Compatible]

Scrape, crawl, and search from Go with fastCRW — a Firecrawl-compatible REST API backed by a single Rust binary. Worker pools, errgroup, rate limiting, per-request context timeouts. AGPL-3.0, self-host free.

Published: June 13, 2026
Updated: June 13, 2026
Category: integrations
Verdict

Call fastCRW from Go with net/http and encoding/json — no SDK needed. Compose a bounded errgroup pool, per-host rate limiting, and per-request context timeouts around /v1/scrape calls. The Rust engine handles rendering and extraction; your Go service handles concurrency.

  • Firecrawl-compatible REST: net/http + encoding/json, no external SDK required
  • errgroup.SetLimit(n) for bounded fan-out with automatic error propagation and context cancellation
  • Per-host rate.Limiter (golang.org/x/time/rate) for polite per-host throttling
  • 63.74% truth-recall on Firecrawl's public 1,000-URL dataset (diagnose_3way.py, 2026-05-08), the highest of three tools tested
  • Single ~8 MB Rust binary: deploy it alongside your Go service as just another binary

Verdict

fastCRW is Firecrawl-compatible REST: POST /v1/scrape, get back clean Markdown. There is no official Go SDK, and Go does not need one: net/http plus encoding/json is the complete client. What makes Go a strong fit is the concurrency story: compose errgroup.SetLimit(n), a per-host rate.Limiter, and per-request context.WithTimeout around /v1/scrape calls, and the engine handles rendering and extraction while your Go service handles concurrency. Under the hood: a single ~8 MB Rust binary that deploys alongside your Go service as just another binary.

Who This Is For

  • Go developers who want clean Markdown from the web — without parsing raw HTML or running a headless browser.
  • Teams building high-throughput scrapers — you want bounded concurrency with proper error propagation and rate limiting.
  • Backend services that need web enrichment — a Go API that calls fastCRW to enrich records on demand.
  • Self-hosting shops — a static Rust binary is a natural fit alongside Go services; AGPL-3.0, $0 per 1,000 scrapes.

Setup

1. Start the engine

Local (Docker):

docker run -p 3000:3000 ghcr.io/us/crw:latest
curl -s http://localhost:3000/health

Or use the managed cloud:

export FASTCRW_BASE_URL="https://api.fastcrw.com"
export FASTCRW_API_KEY="fcrw_..."
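
The two variables exported above can be read once at startup. A minimal sketch, assuming a hypothetical loadConfig helper (not part of any fastCRW package) and assuming that an unset FASTCRW_BASE_URL should fall back to the local Docker address:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// config holds the two settings exported above. The type and helper names
// are illustrative only.
type config struct {
	BaseURL string
	APIKey  string
}

// loadConfig reads FASTCRW_BASE_URL and FASTCRW_API_KEY through the supplied
// lookup function (os.Getenv in production, a stub in tests).
// Assumption: fall back to the local Docker address when no base URL is set.
func loadConfig(getenv func(string) string) config {
	base := strings.TrimRight(getenv("FASTCRW_BASE_URL"), "/")
	if base == "" {
		base = "http://localhost:3000"
	}
	return config{BaseURL: base, APIKey: getenv("FASTCRW_API_KEY")}
}

func main() {
	cfg := loadConfig(os.Getenv)
	fmt.Println("scrape endpoint:", cfg.BaseURL+"/v1/scrape")
}
```

Passing the lookup function in, rather than calling os.Getenv inside the helper, keeps the same code usable for the managed cloud, a local container, and unit tests.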

2. Build the client

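Create a scraper package with a shared *http.Client. Go's default transport keeps only two idle connections per host (http.DefaultMaxIdleConnsPerHost), so a high-concurrency scraper that sends everything to one API host keeps closing and re-opening connections, which quietly bottlenecks it: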

// scraper/client.go
package scraper

import (
    "net/http"
    "time"
)

// NewHTTPClient returns an *http.Client tuned for concurrent scraping.
// A single instance should be shared across all workers.
func NewHTTPClient() *http.Client {
    return &http.Client{
        Timeout: 0, // we use per-request context timeouts instead
        Transport: &http.Transport{
            MaxIdleConnsPerHost: 32,
            IdleConnTimeout:     90 * time.Second,
        },
    }
}

Quickstart: Scrape a Page

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "time"
)

const (
    baseURL    = "https://api.fastcrw.com"
    timeoutSec = 25 // set above p90 (14157 ms on 2026-05-08 benchmark)
)

type scrapeRequest struct {
    URL             string   `json:"url"`
    Formats         []string `json:"formats"`
    OnlyMainContent bool     `json:"onlyMainContent"`
}

type scrapeResponse struct {
    Success bool `json:"success"`
    Data    struct {
        Markdown string `json:"markdown"`
        Metadata struct {
            StatusCode int    `json:"statusCode"`
            Title      string `json:"title"`
        } `json:"metadata"`
    } `json:"data"`
    Error string `json:"error,omitempty"`
}

func scrapeURL(ctx context.Context, client *http.Client, apiKey, url string) (string, error) {
    ctx, cancel := context.WithTimeout(ctx, timeoutSec*time.Second)
    defer cancel()

    body, _ := json.Marshal(scrapeRequest{
        URL:             url,
        Formats:         []string{"markdown"},
        OnlyMainContent: true,
    })

    req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/v1/scrape", bytes.NewReader(body))
    if err != nil {
        return "", err
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    req.Header.Set("Content-Type", "application/json")

    resp, err := client.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

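    var result scrapeResponse
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        // A non-JSON body (for example from a proxy) lands here; include the status.
        return "", fmt.Errorf("decode response (HTTP %d): %w", resp.StatusCode, err)
    }
    if !result.Success {
        return "", fmt.Errorf("scrape failed (HTTP %d): %s", resp.StatusCode, result.Error)
    }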
    return result.Data.Markdown, nil
}

func main() {
    client := &http.Client{
        Transport: &http.Transport{MaxIdleConnsPerHost: 32},
    }
    apiKey := os.Getenv("FASTCRW_API_KEY")

    md, err := scrapeURL(context.Background(), client, apiKey, "https://docs.fastcrw.com")
    if err != nil {
        panic(err)
    }
    fmt.Printf("scraped %d chars\n", len(md))
}

Concurrent Batch Scraping with errgroup

errgroup.WithContext + SetLimit(n) is the idiomatic Go bounded fan-out: g.Go() blocks when all slots are occupied, so the limit IS your concurrency cap. The first non-nil error cancels the shared context so all siblings stop wasting work:

package main

import (
    "context"
    "fmt"
    "net/http"
    "os"
    "sync"

    "golang.org/x/sync/errgroup"
)

func batchScrape(ctx context.Context, client *http.Client, apiKey string, urls []string, concurrency int) ([]string, error) {
    results := make([]string, len(urls))
    var mu sync.Mutex

    g, gctx := errgroup.WithContext(ctx)
    g.SetLimit(concurrency) // blocks g.Go() until a slot is free

    for i, url := range urls {
        i, url := i, url // capture loop variables
        g.Go(func() error {
            md, err := scrapeURL(gctx, client, apiKey, url)
            if err != nil {
                return fmt.Errorf("url %s: %w", url, err)
            }
            mu.Lock()
            results[i] = md
            mu.Unlock()
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}

func main() {
    client := &http.Client{
        Transport: &http.Transport{MaxIdleConnsPerHost: 32},
    }
    apiKey := os.Getenv("FASTCRW_API_KEY")

    urls := []string{
        "https://docs.fastcrw.com",
        "https://fastcrw.com/pricing",
        "https://fastcrw.com/alternatives/firecrawl",
    }

    markdowns, err := batchScrape(context.Background(), client, apiKey, urls, 4)
    if err != nil {
        panic(err)
    }
    for i, md := range markdowns {
        fmt.Printf("%s: %d chars\n", urls[i], len(md))
    }
}

Install errgroup: go get golang.org/x/sync/errgroup

Latency note: fastCRW's p50 was 1914 ms and p90 14157 ms on the 2026-05-08 benchmark (819 labeled URLs, diagnose_3way.py). The chrome-stealth fallback that produces the slow tail is also what gives fastCRW the highest truth-recall of three tools tested (63.74%). Size your pool and timeoutSec from the p90, not the median. Full breakdown at /benchmarks/firecrawl-dataset.

Per-Host Rate Limiting

Bounded concurrency limits how many requests overlap; rate limiting limits how many start per second. You need both — a pool of N workers can still hammer a single host at N requests the instant they all finish:

package main

import (
    "context"
    "net/url"
    "sync"

    "golang.org/x/time/rate"
)

// HostLimiter is a thread-safe per-host rate limiter.
type HostLimiter struct {
    mu       sync.RWMutex
    limiters map[string]*rate.Limiter
    r        rate.Limit // requests per second per host
    b        int        // burst size
}

func NewHostLimiter(rps float64, burst int) *HostLimiter {
    return &HostLimiter{
        limiters: make(map[string]*rate.Limiter),
        r:        rate.Limit(rps),
        b:        burst,
    }
}

func (h *HostLimiter) Wait(ctx context.Context, rawURL string) error {
    u, err := url.Parse(rawURL)
    if err != nil {
        return err
    }
    host := u.Hostname()

    h.mu.RLock()
    limiter, ok := h.limiters[host]
    h.mu.RUnlock()

    if !ok {
        h.mu.Lock()
        if limiter, ok = h.limiters[host]; !ok {
            limiter = rate.NewLimiter(h.r, h.b)
            h.limiters[host] = limiter
        }
        h.mu.Unlock()
    }

    return limiter.Wait(ctx) // respects context cancellation
}

Call hostLimiter.Wait(ctx, url) inside each worker before the scrape request. The Wait method respects context cancellation so a shutdown drains gracefully rather than hanging.

Install rate: go get golang.org/x/time/rate

Crawl a Whole Site

For BFS traversal of an entire domain, use /v1/crawl — it returns a job ID to poll, handles deduplication and politeness, and is bounded by maxDepth (cap 10) and maxPages (cap 1000). Do not reimplement BFS with a goroutine pool and a visited-set when the engine already provides it:

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "time"
)

type crawlStartResponse struct {
    ID string `json:"id"`
}

type crawlStatusResponse struct {
    Status string `json:"status"`
    Data   []struct {
        Markdown string `json:"markdown"`
        Metadata struct {
            SourceURL string `json:"sourceURL"`
        } `json:"metadata"`
    } `json:"data"`
}

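func crawlSite(ctx context.Context, client *http.Client, apiKey, seedURL string, maxPages, maxDepth int) (*crawlStatusResponse, error) {
    // apiKey is used as passed in; the caller decides where it comes from.
    base := "https://api.fastcrw.com"
    headers := map[string]string{
        "Authorization": "Bearer " + apiKey,
        "Content-Type":  "application/json",
    }

    startBody, err := json.Marshal(map[string]interface{}{
        "url":      seedURL,
        "limit":    maxPages,
        "maxDepth": maxDepth,
        "scrapeOptions": map[string]interface{}{
            "formats":         []string{"markdown"},
            "onlyMainContent": true,
        },
    })
    if err != nil {
        return nil, err
    }

    startReq, err := http.NewRequestWithContext(ctx, http.MethodPost, base+"/v1/crawl", bytes.NewReader(startBody))
    if err != nil {
        return nil, err
    }
    for k, v := range headers {
        startReq.Header.Set(k, v)
    }

    startResp, err := client.Do(startReq)
    if err != nil {
        return nil, err
    }
    defer startResp.Body.Close()

    var start crawlStartResponse
    if err := json.NewDecoder(startResp.Body).Decode(&start); err != nil {
        return nil, fmt.Errorf("decode crawl start (HTTP %d): %w", startResp.StatusCode, err)
    }
    if start.ID == "" {
        return nil, fmt.Errorf("crawl start returned no job ID (HTTP %d)", startResp.StatusCode)
    }

    // Poll until the job completes or ctx is cancelled.
    for {
        pollReq, err := http.NewRequestWithContext(ctx, http.MethodGet, base+"/v1/crawl/"+start.ID, nil)
        if err != nil {
            return nil, err
        }
        for k, v := range headers {
            pollReq.Header.Set(k, v)
        }

        pollResp, err := client.Do(pollReq)
        if err != nil {
            return nil, err
        }

        var status crawlStatusResponse
        err = json.NewDecoder(pollResp.Body).Decode(&status)
        pollResp.Body.Close()
        if err != nil {
            return nil, fmt.Errorf("decode crawl status (HTTP %d): %w", pollResp.StatusCode, err)
        }

        switch status.Status {
        case "completed":
            return &status, nil
        case "failed": // assumption: Firecrawl-style failure status
            return nil, fmt.Errorf("crawl %s failed", start.ID)
        }

        // Wait between polls, but wake immediately on cancellation.
        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        case <-time.After(2 * time.Second):
        }
    }
}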

func main() {
    client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: 16}}
    result, err := crawlSite(context.Background(), client, os.Getenv("FASTCRW_API_KEY"), "https://docs.fastcrw.com", 25, 3)
    if err != nil {
        panic(err)
    }
    for _, page := range result.Data {
        fmt.Printf("%d chars  %s\n", len(page.Markdown), page.Metadata.SourceURL)
    }
}

Search the Web

POST /v1/search takes a query and a result limit and returns the hits under data:

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

func search(ctx context.Context, client *http.Client, query string, limit int) ([]map[string]interface{}, error) {
    body, _ := json.Marshal(map[string]interface{}{
        "query": query,
        "limit": limit,
    })

    req, err := http.NewRequestWithContext(ctx, http.MethodPost, "https://api.fastcrw.com/v1/search", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+os.Getenv("FASTCRW_API_KEY"))
    req.Header.Set("Content-Type", "application/json")

    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    var result struct {
        Data []map[string]interface{} `json:"data"`
    }
    // Surface decode failures instead of returning an empty result silently.
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return nil, fmt.Errorf("decode search response (HTTP %d): %w", resp.StatusCode, err)
    }
    return result.Data, nil
}

func main() {
    client := &http.Client{}
    results, err := search(context.Background(), client, "go web scraping api 2026", 5)
    if err != nil {
        panic(err)
    }
    for _, r := range results {
        fmt.Println(r["title"], "→", r["url"])
    }
}

MCP Setup

fastCRW ships an MCP integration (crw-mcp on npm) for AI agents that need live web data from Go-based tools:

{
  "mcpServers": {
    "fastcrw": {
      "command": "npx",
      "args": ["-y", "crw-mcp@latest"],
      "env": {
        "FASTCRW_API_KEY": "fcrw_...",
        "FASTCRW_API_URL": "https://api.fastcrw.com"
      }
    }
  }
}

See /integrations/mcp for full configuration options.

Self-Hosting Next to Your Go Service

The engine is a single ~8 MB static Rust binary — no Redis, no Node.js, no multi-container stack. For Go teams that already deploy static binaries, "the scrape backend is just another binary next to ours" is an easy operational fit:

# docker-compose.yml (excerpt)
services:
  crw:
    image: ghcr.io/us/crw:latest
    ports: ["3000:3000"]

  go-scraper:
    build: .
    environment:
      FASTCRW_BASE_URL: "http://crw:3000"
      FASTCRW_API_KEY: ""  # not required for self-hosted
    depends_on: [crw]

Self-hosting the AGPL-3.0 engine is $0 per 1,000 scrapes — you pay only your server. See /pricing for managed cloud tiers.

Limits and Honest Gaps

  • No official Go SDK — use net/http + encoding/json directly (shown above).
  • No screenshot output: formats: ["screenshot"] returns HTTP 422.
  • Stateless per request — no persistent session or cookie jar across calls.
  • LLM extraction — supports OpenAI and Anthropic providers only.
  • No /v1/batch/scrape — fan out g.Go() calls over /v1/scrape or use /v1/crawl.
