
Dify Web Scraping Integration — fastCRW [Firecrawl-Compatible]

Integrate fastCRW into Dify workflows via the HTTP Request node or the native plugin. Call the scrape and search endpoints from Dify LLM apps, knowledge bases, and agents. fastCRW ships as a small single static binary, is local-first, and is free to self-host under AGPL-3.0.

Published: May 12, 2026
Updated: June 13, 2026
Category: integrations
Verdict

Install the official crw-dify-plugin for native fastCRW scrape, search, crawl, and map tools inside any Dify LLM app or AI agent — or fall back to a single HTTP Request node.

  • Official crw-dify-plugin for one-click native setup
  • HTTP Request node available as a fallback when plugins are disabled
  • Pull live web data into Dify knowledge bases and RAG pipelines
  • Pairs with Dify AI agents for autonomous web browsing

Why Dify + fastCRW

Dify is a no-code platform for building LLM applications, agents, and knowledge bases. It sits between the model and your data, orchestrating prompts, tools, and retrieval. The friction point is web scraping: Dify's knowledge base can ingest files and text, but pulling live web data requires hand-written Python or external tools. fastCRW fits inside Dify through the standard HTTP Request node or a native Dify plugin, so any Dify workflow can fetch live web pages. The fastCRW runtime is a small single static binary that is easy to run alongside a self-hosted Dify, and the Firecrawl-compatible API means workflows built on Firecrawl port to fastCRW by changing the URL and supplying a fastCRW API key.

Setup: HTTP Request Method

The simplest path is the built-in HTTP Request node. No plugin installation needed.

  1. Open your Dify workflow or agent.
  2. Add a Tool node and select HTTP Request.
  3. Set the method to POST and URL to https://api.fastcrw.com/v1/scrape.
  4. Create a secret variable for your fastCRW API key: go to Settings → Variables and add a secret named FASTCRW_API_KEY.
  5. Add a Header with key Authorization and value Bearer {{ env.FASTCRW_API_KEY }} (or Bearer {{ secret.FASTCRW_API_KEY }} depending on your Dify version).
  6. Set Body Type to JSON and configure the request payload.

You can now test the node by passing a URL and calling fastCRW.

Setup: Native Dify Plugin

For a visual, form-based experience:

  1. Open your Dify workspace.
  2. Go to Tools → Plugin Marketplace (or your self-hosted plugin directory).
  3. Search for fastCRW or navigate to the fastCRW plugin.
  4. Click Install and authenticate with your fastCRW API key.
  5. The plugin adds a fastCRW tool to your workflow builder.
  6. In any workflow, add the fastCRW tool and select your operation: Scrape, Search, Crawl, or Map.

The plugin abstracts away HTTP headers and JSON payloads, surfacing parameters as form fields.

Code Example: HTTP Request Node

Here's a Dify workflow configuration using the HTTP Request node:

{
  "method": "POST",
  "url": "https://api.fastcrw.com/v1/scrape",
  "headers": {
    "Authorization": "Bearer {{ env.FASTCRW_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "url": "{{ workflow_variable.target_url }}",
    "formats": ["markdown"],
    "timeout": 30
  }
}

In the Dify editor, connect this node after a Trigger or Code node that provides target_url. The HTTP node will return:

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Page Title\n\nContent here...",
    "metadata": {
      "title": "Page Title",
      "description": "..."
    }
  }
}

Pipe the data.markdown output into downstream nodes for text splitting, embeddings, or LLM processing.
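
To debug the node outside Dify, the same request can be reproduced in a few lines of Python. This is a minimal sketch built only from the request and response shapes shown above. The function names are hypothetical, and the 60-second socket timeout is an arbitrary choice, not a fastCRW requirement.

```python
import json
import urllib.request

FASTCRW_SCRAPE_URL = "https://api.fastcrw.com/v1/scrape"


def build_scrape_request(target_url: str, api_key: str, timeout: int = 30) -> urllib.request.Request:
    """Build the same POST that the HTTP Request node sends (hypothetical helper)."""
    payload = {"url": target_url, "formats": ["markdown"], "timeout": timeout}
    return urllib.request.Request(
        FASTCRW_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(response_body: str) -> str:
    """Return data.markdown, raising if the response does not report success."""
    parsed = json.loads(response_body)
    if not parsed.get("success"):
        raise ValueError("fastCRW response did not report success")
    return parsed["data"]["markdown"]


def scrape(target_url: str, api_key: str) -> str:
    """Send the request and return the page as markdown."""
    request = build_scrape_request(target_url, api_key)
    # The socket timeout here is an assumption; adjust it to your environment.
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_markdown(response.read().decode("utf-8"))
```

Calling scrape("https://example.com", api_key) with a valid key should yield the same string that Dify exposes as data.markdown, which makes it easier to tell a fastCRW problem from a Dify wiring problem.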

Example Workflow: Web Scraping RAG

A typical Dify workflow combining fastCRW, text processing, and knowledge base sync:

  1. Trigger node: Accept a user input (URL or search query).
  2. HTTP Request node (fastCRW Scrape): POST to https://api.fastcrw.com/v1/scrape with the user's URL. Fetch markdown.
  3. Text Splitter node: Break the markdown into chunks (512–1024 tokens).
  4. Knowledge Base Write node: Sync the chunks into a Dify knowledge base collection.
  5. End node: Return success message.

This workflow enables self-serve document ingestion — users submit any URL and Dify automatically scrapes, chunks, and indexes it for RAG.
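
If you replace the Text Splitter with a Dify Code node, or want to preview how a page will be chunked, the splitting step can be sketched as below. This is an illustration, not Dify's splitter: it approximates tokens with whitespace-separated words, which is an assumption, and the name chunk_markdown is hypothetical.

```python
def chunk_markdown(markdown: str, max_words: int = 512) -> list[str]:
    """Group paragraphs into chunks of at most max_words words.

    Words stand in for tokens here; real token counts depend on the model.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for paragraph in markdown.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = paragraph.split()
        if len(words) > max_words:
            # Flush what we have, then split the oversized paragraph on word boundaries.
            if current:
                chunks.append("\n\n".join(current))
                current, current_words = [], 0
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
            continue
        if current and current_words + len(words) > max_words:
            # Adding this paragraph would exceed the budget, so start a new chunk.
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on blank lines keeps markdown headings and paragraphs intact inside a chunk, which tends to give cleaner retrieval results than cutting at a fixed character offset.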

For a more advanced use case with Dify AI agents:

  1. Agent Start node: Define fastCRW Scrape and fastCRW Search as available tools.
  2. LLM node: The agent's reasoning loop.
  3. When the agent decides it needs web data, it calls the fastCRW tool automatically.
  4. Agent End node: Return the agent's final response.

The agent can reason over live web data without explicit workflow steps — it orchestrates fastCRW calls on the fly.

Dify Plugin Repository

The recommended path is the official crw-dify-plugin, which provides:

  • A Scrape tool that wraps https://api.fastcrw.com/v1/scrape with parameter dropdowns for format selection.
  • A Search tool for web search combined with scrape (query + limit).
  • A Crawl tool for spidering (set a page limit; the crawl runs async and the plugin polls the job for you).
  • A Map tool for discovering URLs on a domain (sitemap parsing + crawling).

Install crw-dify-plugin once, then use any of these tools in any Dify workflow without manually writing HTTP requests. The HTTP Request node remains a fallback when plugins are disabled.
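
If you use the HTTP Request fallback for crawls, you have to do the polling that the plugin does for you. The loop below is a generic sketch of that idea. The status values "completed" and "failed" are assumptions borrowed from Firecrawl-style job responses and are not confirmed for fastCRW, and fetch_status is a hypothetical callable you supply to query the job.

```python
import time
from typing import Any, Callable, Dict


def poll_crawl_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until the job reaches a terminal state or polls run out."""
    for poll in range(max_polls):
        job = fetch_status()
        # Assumed terminal states; check the fastCRW API reference for the real ones.
        if job.get("status") in ("completed", "failed"):
            return job
        if poll < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"crawl job not finished after {max_polls} polls")
```

Passing the status lookup in as a function keeps the loop independent of the exact endpoint and response shape, so only that one callable changes once you confirm the job format.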

When to Use This

  • No-code RAG ingestion: let non-technical users scrape any URL into a Dify knowledge base.
  • Dify agent web browsing: give your Dify AI agent a scraping tool so it can fetch live data during reasoning.
  • Knowledge base auto-sync: schedule a Dify workflow to periodically scrape and update a knowledge base.
  • Competitive monitoring: build a Dify agent that scrapes competitor pricing pages and summarizes changes.
  • Multi-source aggregation: scrape multiple sources and feed them into a Dify summarization workflow.

Troubleshooting

"Authorization header missing" Make sure your HTTP Request node includes the Authorization: Bearer fcrw_... header. If using an env variable, verify it's set in Dify's Settings → Variables and the syntax is correct for your Dify version.

"Request timeout" fastCRW's default timeout is 30 seconds. For large pages, increase the timeout in the HTTP node body: "timeout": 60. Or switch to async crawl mode with polling.

"Knowledge base write failed" Dify's knowledge base expects text chunks with metadata. Make sure the HTTP node output is piped to a Text Splitter before the knowledge base write node.

"Plugin not found in marketplace" If the fastCRW plugin is not in your Dify marketplace, you're on a self-hosted instance without plugin discovery enabled. Install crw-dify-plugin manually by placing it in your Dify plugins directory, then restart Dify.

"Rate limit exceeded" fastCRW applies rate limits per API key. If you hit a 429 error, slow down your request rate or upgrade to a higher-tier plan for more credits.

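For callers outside Dify, or inside a Code node, "slow down your request rate" after a 429 usually means retrying with exponential backoff. The following is a minimal sketch under that assumption. call_with_backoff and its parameters are hypothetical, and the delays are illustrative rather than values published by fastCRW.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    send() must return (http_status, response_body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait 1x, 2x, 4x, ... base_delay before each retry.
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

The function returns the last response even if it is still a 429, so the caller decides whether to fail the workflow or queue the URL for later.
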
When to Choose fastCRW

  • Local-first runtime: fastCRW is a small single static binary with no exit cost, and the lightweight HTTP interface fits naturally into Dify workflows.
  • Self-hosting: fastCRW's single-binary design runs on a VPS, Raspberry Pi, or inside your Dify container without Redis or PostgreSQL.
  • MCP compatibility: If you're also using Claude Code or other MCP-compatible tools, fastCRW provides a unified scraping endpoint.
  • Cost: fastCRW's consumption-based pricing and local deployment model mean lower TCO for heavy scraping workloads.
  • Firecrawl migration: Dify workflows built on Firecrawl's HTTP API port to fastCRW by changing the domain and adding a Bearer token.
