What We're Building
Automated web scraping workflows in n8n using CRW as the scraping backend. n8n is an open-source workflow automation platform — think Zapier but self-hosted and with full HTTP request support. We'll connect n8n's HTTP Request nodes to CRW's REST API to build: (1) a scheduled scraper that monitors pages for changes, (2) a data extraction pipeline that feeds into Google Sheets, and (3) a content aggregation workflow with Slack notifications.
No coding required — just n8n's visual workflow builder and CRW's API endpoints.
Prerequisites
- CRW running locally (`docker run -p 3000:3000 ghcr.io/us/crw:latest`) or a fastCRW API key
- n8n running locally (`docker run -p 5678:5678 n8nio/n8n`) or n8n cloud
- Basic familiarity with n8n's visual workflow editor
CRW API Endpoints for n8n
CRW exposes a Firecrawl-compatible REST API. Here are the endpoints you'll use in n8n:
| Endpoint | Method | Purpose |
|---|---|---|
| `/v1/scrape` | POST | Scrape a single page → markdown |
| `/v1/crawl` | POST | Start async crawl of a site |
| `/v1/crawl/{id}` | GET | Check crawl status / get results |
| `/v1/map` | POST | Discover URLs on a site |
| `/v1/extract` | POST | Extract structured data |
Base URL: `http://localhost:3000` (self-hosted) or `https://fastcrw.com/api` (fastCRW cloud).
Step 1: Create a CRW Credential in n8n
First, set up a reusable credential for CRW's API:
- In n8n, go to Credentials → Add Credential → Header Auth
- Set Name: `CRW API`
- Set Header Name: `Authorization`
- Set Header Value: `Bearer fc-YOUR-API-KEY`
This credential will be reused across all CRW nodes in your workflows.
Step 2: Basic Scrape Workflow
The simplest workflow: scrape a page and output the content.
Create a new workflow with these nodes:
- Manual Trigger — click to run
- HTTP Request — calls CRW's scrape endpoint
Configure the HTTP Request node:
```json
{
  "method": "POST",
  "url": "http://localhost:3000/v1/scrape",
  "authentication": "genericCredentialType",
  "genericAuthType": "httpHeaderAuth",
  "sendHeaders": true,
  "headerParameters": {
    "parameters": [
      { "name": "Content-Type", "value": "application/json" }
    ]
  },
  "sendBody": true,
  "bodyParameters": {
    "parameters": [
      { "name": "url", "value": "https://example.com" },
      { "name": "formats", "value": "={{ [\"markdown\"] }}" }
    ]
  }
}
```
The response will contain `data.markdown` with the clean page content.
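For reference, a successful scrape response follows the Firecrawl-compatible shape; the values below are illustrative:

```json
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com"
    }
  }
}
```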
Step 3: Scheduled Scraping Workflow
Monitor a page for changes on a schedule:
- Schedule Trigger — runs every hour (or any interval)
- HTTP Request — scrape the target page
- Code — compare with previous version
- IF — branch on whether content changed
- Slack / Email — notify on changes
n8n workflow JSON for the scrape + compare pattern:
```json
{
  "nodes": [
    {
      "parameters": {
        "rule": { "interval": [{ "field": "hours", "hoursInterval": 1 }] }
      },
      "name": "Every Hour",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [250, 300]
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://localhost:3000/v1/scrape",
        "authentication": "genericCredentialType",
        "genericAuthType": "httpHeaderAuth",
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "{ \"url\": \"https://competitor.com/pricing\", \"formats\": [\"markdown\"] }"
      },
      "name": "Scrape Page",
      "type": "n8n-nodes-base.httpRequest",
      "position": [450, 300],
      "credentials": { "httpHeaderAuth": { "id": "1", "name": "CRW API" } }
    },
    {
      "parameters": {
        "jsCode": "const currentContent = $input.first().json.data.markdown;\nconst staticData = $getWorkflowStaticData('global');\nconst previousContent = staticData.lastContent || '';\nstaticData.lastContent = currentContent;\nconst changed = currentContent !== previousContent;\nreturn [{ json: { changed, currentContent, previousContent } }];"
      },
      "name": "Compare",
      "type": "n8n-nodes-base.code",
      "position": [650, 300]
    },
    {
      "parameters": {
        "conditions": {
          "boolean": [{ "value1": "={{ $json.changed }}", "value2": true }]
        }
      },
      "name": "Changed?",
      "type": "n8n-nodes-base.if",
      "position": [850, 300]
    }
  ]
}
```
The Code node uses n8n's static data to persist the last scraped content between runs. When the content changes, the IF node routes to your notification node.
Step 4: Multi-Page Crawl Workflow
Crawl an entire site and process each page:
- Manual Trigger
- HTTP Request — start crawl via `/v1/crawl`
- Wait — pause for 5 seconds
- HTTP Request — check crawl status via `/v1/crawl/{id}`
- IF — is crawl completed?
- Split In Batches — process each page
Start the crawl:
```jsonc
// HTTP Request node: Start Crawl
{
  "method": "POST",
  "url": "http://localhost:3000/v1/crawl",
  "jsonBody": {
    "url": "https://docs.example.com",
    "limit": 50,
    "scrapeOptions": { "formats": ["markdown"] }
  }
}
// Returns: { "id": "crawl-abc123" }
```
Check status in a loop:
```jsonc
// HTTP Request node: Check Status
{
  "method": "GET",
  "url": "=http://localhost:3000/v1/crawl/{{ $json.id }}"
}
// Returns: { "status": "completed", "data": [...pages] }
```
Connect the IF node's "not completed" output back to the Wait node to create a polling loop. When completed, the data array contains all scraped pages.
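The branching decision can be sketched as plain JavaScript. `crawlNextStep` is a hypothetical helper, not an n8n built-in; the status values follow the Firecrawl-compatible API:

```javascript
// Given a /v1/crawl/{id} status response, decide whether the workflow
// should proceed (done) or route back to the Wait node (loop).
function crawlNextStep(response) {
  if (response.status === 'completed') {
    return { done: true, pages: response.data || [] };
  }
  if (response.status === 'failed') {
    // Route to an error branch instead of polling forever.
    throw new Error('Crawl failed: ' + (response.error || 'unknown'));
  }
  // Still scraping / queued: loop back to the Wait node.
  return { done: false, pages: [] };
}
```

In the actual workflow this is just the IF node's condition (`={{ $json.status === "completed" }}`), but handling a `failed` status explicitly saves you from an infinite polling loop.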
Step 5: Data Extraction to Google Sheets
Extract structured data from multiple pages and save to a spreadsheet:
- Schedule Trigger — daily at 9 AM
- HTTP Request — map the target site
- Code — filter URLs to product pages
- Split In Batches — process each URL
- HTTP Request — scrape each page with extract format
- Google Sheets — append extracted data
The extract request:
```jsonc
// HTTP Request: Extract Data
{
  "method": "POST",
  "url": "http://localhost:3000/v1/scrape",
  "jsonBody": {
    "url": "={{ $json.url }}",
    "formats": ["extract"],
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "product_name": { "type": "string" },
          "price": { "type": "string" },
          "description": { "type": "string" },
          "in_stock": { "type": "boolean" }
        }
      }
    }
  }
}
```
CRW returns structured JSON matching your schema — no regex or HTML parsing needed. Pipe the output directly to a Google Sheets Append Row node.
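Before the Google Sheets node, a Code node can flatten the nested response into one object per row. `toSheetRow` is a hypothetical helper; its field names simply mirror the schema above:

```javascript
// Flatten CRW's extract output into a flat object that the Google Sheets
// Append Row node can map column-by-column.
function toSheetRow(scrapeResponse) {
  const extract = (scrapeResponse.data && scrapeResponse.data.extract) || {};
  return {
    product_name: extract.product_name || '',
    price: extract.price || '',
    description: extract.description || '',
    // Sheets cells are text, so render the boolean as yes/no.
    in_stock: extract.in_stock === true ? 'yes' : 'no',
    scraped_at: new Date().toISOString(),
  };
}
```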
Step 6: Content Aggregation with Slack Alerts
Aggregate content from multiple sites and send a daily digest:
```javascript
// Workflow: Daily Content Digest
//
// Schedule (9 AM) → Map Site A → Scrape New Pages → Map Site B → Scrape New Pages
// → Code (combine + format) → Slack (post digest)

// Code node: Format Digest
const pages = $input.all().map(item => item.json);
const digest = pages
  .map(p => `*${p.data.metadata.title}*\n${p.data.metadata.sourceURL}\n${p.data.markdown.substring(0, 200)}...\n`)
  .join("\n---\n");
return [{ json: { digest, pageCount: pages.length } }];
```
Tips for n8n + CRW Workflows
- Use the Wait node for crawl polling. Set it to 3-5 seconds between status checks.
- Use Static Data (`$getWorkflowStaticData`) to persist state between workflow runs — like the last scraped content for change detection.
- Batch requests with Split In Batches to avoid overwhelming CRW with concurrent requests. A batch size of 5 works well.
- Error handling: add an Error Trigger node and connect it to a Slack/email notification so you know when scraping fails.
- Use expressions like `={{ $json.data.markdown }}` to reference scraped content in downstream nodes.
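What Split In Batches does can be pictured as simple chunking; a minimal sketch:

```javascript
// Chunk a URL list into groups so only a handful of scrape requests
// run per loop iteration (Split In Batches does this per execution pass).
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```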
Self-Hosted vs fastCRW for n8n
Both n8n and CRW can be self-hosted, making this a fully open-source stack. Run them together with Docker Compose:
```yaml
# docker-compose.yml
services:
  crw:
    image: ghcr.io/us/crw:latest
    ports:
      - "3000:3000"
  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"
    environment:
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=changeme
    volumes:
      - n8n_data:/home/node/.n8n

volumes:
  n8n_data:
```
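One networking detail when both services run in the same Compose project: containers reach each other by service name on the Compose network, not via localhost. In your n8n HTTP Request nodes, address the `crw` service directly:

```text
http://crw:3000/v1/scrape    # instead of http://localhost:3000/v1/scrape
```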
For production or when scraping diverse external sites, switch to fastCRW:
```text
// Change the URL in your HTTP Request nodes:
// From: http://localhost:3000/v1/scrape
// To:   https://fastcrw.com/api/v1/scrape
```
fastCRW handles scaling and reliability, which is important for workflows that scrape many different external sites.
Why CRW for n8n Workflows?
REST API fits n8n natively. CRW's Firecrawl-compatible REST API works directly with n8n's HTTP Request node — no custom integrations or community nodes needed. Any endpoint that Firecrawl supports, CRW supports at the same URLs.
Speed matters for scheduled workflows. If your workflow runs hourly and scrapes 20 pages, CRW finishes in ~17 seconds (20 × 833ms). A slower API taking 4.6s per page would need 92 seconds — eating into your schedule and potentially overlapping with the next run.
Lightweight self-hosting. CRW uses 6.6 MB idle RAM and ships as an 8 MB Docker image. It runs comfortably alongside n8n on a single small VPS without competing for resources.
Next Steps
- Build a RAG pipeline from your scraped data
- Use CRW's MCP server for AI agent integration
- Compare CRW vs Firecrawl for performance benchmarks
Get Started
Run CRW and n8n together:
```bash
docker run -p 3000:3000 ghcr.io/us/crw:latest
docker run -p 5678:5678 n8nio/n8n
```
Or use fastCRW as the scraping backend and skip the CRW container entirely — just point your n8n HTTP Request nodes at https://fastcrw.com/api.