
Dify Web Scraping Integration — fastCRW [Firecrawl-Compatible]

Integrate fastCRW into Dify workflows via HTTP node or native plugin. Call scrape and search endpoints from Dify LLM apps, knowledge bases, and agents. 6.6 MB RAM runtime, 92% coverage on the 1,000-URL benchmark.

Published: May 12, 2026 · Updated: May 12, 2026 · Category: integrations
Verdict

Add fastCRW to Dify workflows with a single HTTP Request node or install the fastCRW Dify plugin for native integration. Scrape, search, and extract web data inside any Dify LLM app or AI agent.

  • Works with the standard HTTP Request node (no custom plugin required)
  • Native fastCRW Dify plugin available for one-click setup
  • Pull live web data into Dify knowledge bases and RAG pipelines
  • Pairs with Dify AI agents for autonomous web browsing

Why Dify + fastCRW

Dify is a no-code platform for building LLM applications, agents, and knowledge bases. It sits between your LLM and your data, orchestrating prompts, tools, and retrieval. The friction point is web scraping: Dify's knowledge base can ingest files and text, but pulling live web data requires hand-written Python or external tools. fastCRW fits inside Dify through the standard HTTP Request node or a native Dify plugin, turning any Dify workflow into a live web scraper. The 6.6 MB RAM fastCRW runtime is lightweight enough to run locally if you self-host Dify, and the Firecrawl-compatible API means workflows built on Firecrawl port to fastCRW with one URL change.

Setup: HTTP Request Method

The simplest path is the built-in HTTP Request node. No plugin installation needed.

  1. Open your Dify workflow or agent.
  2. Add a Tool node and select HTTP Request.
  3. Set the method to POST and URL to https://fastcrw.com/api/v1/scrape.
  4. Create a secret variable for your fastCRW API key: go to Settings → Variables and add a secret named FASTCRW_API_KEY.
  5. Add a Header with key Authorization and value Bearer {{ env.FASTCRW_API_KEY }} (or Bearer {{ secret.FASTCRW_API_KEY }} depending on your Dify version).
  6. Set Body Type to JSON and configure the request payload.

You can now test the node by passing a URL and calling fastCRW.
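
For a quick sanity check, a minimal body is enough. Assuming fastCRW mirrors Firecrawl's scrape API, url should be the only required field (the full payload appears in the code example below):

{
  "url": "https://example.com"
}

A response with "success": true confirms the API key and endpoint are wired up correctly.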

Setup: Native Dify Plugin

For a visual, form-based experience:

  1. Open your Dify workspace.
  2. Go to Tools → Plugin Marketplace (or your self-hosted plugin directory).
  3. Search for fastCRW or navigate to the fastCRW plugin.
  4. Click Install and authenticate with your fastCRW API key.
  5. The plugin adds a fastCRW tool to your workflow builder.
  6. In any workflow, add the fastCRW tool and select your operation: Scrape, Search, Crawl, or Map.

The plugin abstracts away HTTP headers and JSON payloads, surfacing parameters as form fields.

Code Example: HTTP Request Node

Here's a Dify workflow configuration using the HTTP Request node:

{
  "method": "POST",
  "url": "https://fastcrw.com/api/v1/scrape",
  "headers": {
    "Authorization": "Bearer {{ env.FASTCRW_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "url": "{{ workflow_variable.target_url }}",
    "formats": ["markdown"],
    "timeout": 30
  }
}

In the Dify editor, connect this node after a Trigger or Code node that provides target_url. The HTTP node will return:

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Page Title\n\nContent here...",
    "metadata": {
      "title": "Page Title",
      "description": "..."
    }
  }
}

Pipe the data.markdown output into downstream nodes for text splitting, embeddings, or LLM processing.
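
How you address that nested field depends on your Dify version: some versions let downstream nodes reference the parsed response directly, while others return body as a raw string. A sketch of a direct reference in a downstream node's input field (the node name http_request is hypothetical):

{{ http_request.body.data.markdown }}

If your version returns the body as a string, add a Code node between the HTTP node and the splitter to parse the JSON and output only the markdown field.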

Example Workflow: Web Scraping RAG

A typical Dify workflow combining fastCRW, text processing, and knowledge base sync:

  1. Trigger node: Accept a user input (URL or search query).
  2. HTTP Request node (fastCRW Scrape): POST to https://fastcrw.com/api/v1/scrape with the user's URL. Fetch markdown.
  3. Text Splitter node: Break the markdown into chunks (512–1024 tokens).
  4. Knowledge Base Write node: Sync the chunks into a Dify knowledge base collection.
  5. End node: Return success message.

This workflow enables self-serve document ingestion — users submit any URL and Dify automatically scrapes, chunks, and indexes it for RAG.
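
The step-2 request body mirrors the earlier code example, with the trigger's output substituted in (the variable name trigger.user_input is an assumption; match it to your trigger node's actual output):

{
  "url": "{{ trigger.user_input }}",
  "formats": ["markdown"]
}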

For a more advanced use case with Dify AI agents:

  1. Agent Start node: Define fastCRW Scrape and fastCRW Search as available tools.
  2. LLM node: The agent's reasoning loop.
  3. When the agent decides it needs web data, it calls the fastCRW tool automatically.
  4. Agent End node: Return the agent's final response.

The agent can reason over live web data without explicit workflow steps — it orchestrates fastCRW calls on the fly.
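
When the agent invokes the Search tool, the underlying call is a POST to the search endpoint. A sketch, assuming fastCRW mirrors Firecrawl's search API (the /api/v1/search path and the scrapeOptions field are assumptions based on that compatibility):

POST https://fastcrw.com/api/v1/search

{
  "query": "competitor pricing changes",
  "limit": 5,
  "scrapeOptions": { "formats": ["markdown"] }
}

With scrapeOptions set, each search result should come back with its scraped markdown attached, so the agent can reason over page content in a single tool call.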

Dify Plugin Repository

The fastCRW Dify plugin is maintained in the crw-saas monorepo under /dify-plugin-crw/. It provides:

  • A Scrape tool that wraps fastcrw.com/api/v1/scrape with parameter dropdowns for format selection.
  • A Search tool for web search combined with scrape (query + limit).
  • A Crawl tool for spidering (max depth, max pages).
  • A Map tool for discovering URLs on a domain (sitemap parsing + crawling).
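
For reference, here are sketches of the request bodies the Crawl and Map tools would send, assuming fastCRW follows Firecrawl's parameter names (maxDepth and limit for crawl depth and page count are assumptions based on that compatibility):

POST https://fastcrw.com/api/v1/crawl

{
  "url": "https://example.com",
  "maxDepth": 2,
  "limit": 50
}

POST https://fastcrw.com/api/v1/map

{
  "url": "https://example.com"
}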

Install the plugin once, then use any of these tools in any Dify workflow without manually writing HTTP requests.

When to Use This

  • No-code RAG ingestion — let non-technical users scrape any URL into a Dify knowledge base.
  • Dify agent web browsing — give your Dify AI agent a scraping tool so it can fetch live data during reasoning.
  • Knowledge base auto-sync — schedule a Dify workflow to periodically scrape and update a knowledge base.
  • Competitive monitoring — build a Dify agent that scrapes competitor pricing pages and summarizes changes.
  • Content aggregation — scrape multiple sources and feed them into a Dify summarization workflow.

Troubleshooting

"Authorization header missing" Make sure your HTTP Request node includes the Authorization: Bearer fcrw_... header. If using an env variable, verify it's set in Dify's Settings → Variables and the syntax is correct for your Dify version.

"Request timeout" fastCRW's default timeout is 30 seconds. For large pages, increase the timeout in the HTTP node body: "timeout": 60. Or switch to async crawl mode with polling.

"Knowledge base write failed" Dify's knowledge base expects text chunks with metadata. Make sure the HTTP node output is piped to a Text Splitter before the knowledge base write node.

"Plugin not found in marketplace" If the fastCRW plugin is not in your Dify marketplace, you're on a self-hosted instance without plugin discovery enabled. Install manually by cloning the plugin from /dify-plugin-crw/ and placing it in your Dify plugins directory, then restart Dify.

"Rate limit exceeded" fastCRW applies rate limits per API key. If you hit a 429 error, slow down your request rate or upgrade to a higher-tier plan for more credits.

When to Choose fastCRW

  • Speed: fastCRW is 5.5x faster than Firecrawl for most pages, and the lightweight HTTP interface fits naturally into Dify workflows.
  • Self-hosting: fastCRW's single-binary design runs on a VPS, Raspberry Pi, or inside your Dify container without Redis or PostgreSQL.
  • MCP compatibility: If you're also using Claude Code or other MCP-compatible tools, fastCRW provides a unified scraping endpoint.
  • Cost: fastCRW's consumption-based pricing and local deployment model mean lower TCO for heavy scraping workloads.
  • Firecrawl migration: Dify workflows built on Firecrawl's HTTP API port to fastCRW by changing the domain and adding a Bearer token.
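
A concrete sketch of that migration for the scrape endpoint, swapping Firecrawl's documented v1 URL for fastCRW's:

Before (Firecrawl):  https://api.firecrawl.dev/v1/scrape
After (fastCRW):     https://fastcrw.com/api/v1/scrape

Set the Authorization header to a Bearer token with your fastCRW key; since the request and response shapes are Firecrawl-compatible, downstream nodes need no changes.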
