
Dify Web Scraping Integration — fastCRW [Firecrawl-Compatible]

Integrate fastCRW into Dify workflows via HTTP node or native plugin. Call scrape and search endpoints from Dify LLM apps, knowledge bases, and agents. 6.6 MB RAM runtime, 92% coverage on the 1,000-URL benchmark.

Published: May 12, 2026
Updated: May 12, 2026

Why Dify + fastCRW

Dify is a no-code platform for building LLM applications, agents, and knowledge bases. It sits between the LLM and your data, orchestrating prompts, tools, and retrieval. The friction point is web scraping: Dify's knowledge base can ingest files and text, but pulling live web data requires hand-written Python or external tools. fastCRW fits inside Dify through the standard HTTP Request node or a native Dify plugin, so any Dify workflow can fetch live web pages. The fastCRW runtime uses 6.6 MB of RAM, which is light enough to run locally if you self-host Dify, and the Firecrawl-compatible API means workflows built on Firecrawl port to fastCRW by changing the endpoint URL and API key.

Setup: HTTP Request Method

The simplest path is the built-in HTTP Request node. No plugin installation needed.

1. Open your Dify workflow or agent.
2. Add a Tool node and select HTTP Request.
3. Set the method to POST and URL to https://fastcrw.com/api/v1/scrape.
4. Create a secret variable for your fastCRW API key: go to Settings → Variables and add a secret named FASTCRW_API_KEY.
5. Add a Header with key Authorization and value Bearer {{ env.FASTCRW_API_KEY }} (or Bearer {{ secret.FASTCRW_API_KEY }} depending on your Dify version).
6. Set Body Type to JSON and configure the request payload.

You can now test the node by passing a URL and calling fastCRW.
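To sanity-check the key and payload outside Dify, the same request can be sketched in Python. This is a minimal sketch using only the standard library. The function name build_scrape_request is illustrative, and the body fields (url, formats, timeout) simply mirror the example payload on this page rather than a full parameter list.

```python
import json
import urllib.request

SCRAPE_URL = "https://fastcrw.com/api/v1/scrape"


def build_scrape_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the HTTP Request node is set up to make."""
    payload = {"url": target_url, "formats": ["markdown"], "timeout": 30}
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen(req, timeout=30) with a real API key. If that call succeeds but the Dify node fails, the problem is most likely in the node's header or variable syntax.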

Setup: Native Dify Plugin

For a visual, form-based experience:

1. Open your Dify workspace.
2. Go to Tools → Plugin Marketplace (or your self-hosted plugin directory).
3. Search for fastCRW or navigate to the fastCRW plugin.
4. Click Install and authenticate with your fastCRW API key.
5. The plugin adds a fastCRW tool to your workflow builder.
6. In any workflow, add the fastCRW tool and select your operation: Scrape, Search, Crawl, or Map.

The plugin abstracts away HTTP headers and JSON payloads, surfacing parameters as form fields.

Code Example: HTTP Request Node

Here's a Dify workflow configuration using the HTTP Request node:

{
  "method": "POST",
  "url": "https://fastcrw.com/api/v1/scrape",
  "headers": {
    "Authorization": "Bearer {{ env.FASTCRW_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "url": "{{ workflow_variable.target_url }}",
    "formats": ["markdown"],
    "timeout": 30
  }
}

In the Dify editor, connect this node after a Trigger or Code node that provides target_url. The HTTP node will return:

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Page Title\n\nContent here...",
    "metadata": {
      "title": "Page Title",
      "description": "..."
    }
  }
}

Pipe the data.markdown output into downstream nodes for text splitting, embeddings, or LLM processing.
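If you would rather extract data.markdown in a Code node than reference it directly, the step might look like the sketch below. It assumes the HTTP Request node passes the raw response to the Code node as a string input named body and that the Code node's entry point is a function called main returning a dict. Check both against your Dify version. The helper name extract_markdown is illustrative.

```python
import json


def extract_markdown(body: str) -> dict:
    """Pull the Markdown and page title out of a scrape response body."""
    response = json.loads(body)
    if not response.get("success"):
        # Fail loudly so a failed scrape does not feed empty text downstream.
        raise ValueError("fastCRW scrape did not report success")
    data = response.get("data", {})
    return {
        "markdown": data.get("markdown", ""),
        "title": data.get("metadata", {}).get("title", ""),
    }


def main(body: str) -> dict:
    # Assumed Code node entry point; delegates to the helper above.
    return extract_markdown(body)
```

Raising on success: false stops the workflow at the scrape step, which is usually easier to debug than an empty chunk showing up later in the knowledge base.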

Example Workflow: Web Scraping RAG

A typical Dify workflow combining fastCRW, text processing, and knowledge base sync:

1. Trigger node: Accept a user input (URL or search query).
2. HTTP Request node (fastCRW Scrape): POST to https://fastcrw.com/api/v1/scrape with the user's URL. Fetch markdown.
3. Text Splitter node: Break the markdown into chunks (512–1024 tokens).
4. Knowledge Base Write node: Sync the chunks into a Dify knowledge base collection.
5. End node: Return success message.

This workflow enables self-serve document ingestion — users submit any URL and Dify automatically scrapes, chunks, and indexes it for RAG.
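The chunking step in this workflow can be approximated in a few lines of Python. The sketch below is not Dify's Text Splitter. It is a simple stand-in that packs paragraphs greedily and counts whitespace-separated words as a rough proxy for tokens, so real token counts will differ. The function name split_markdown is illustrative.

```python
def split_markdown(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens words."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # Slice oversized paragraphs so no single piece exceeds the budget.
        for start in range(0, len(words), max_tokens):
            piece = words[start:start + max_tokens]
            if count + len(piece) > max_tokens and current:
                # Adding this piece would overflow, so close the current chunk.
                chunks.append("\n\n".join(current))
                current, count = [], 0
            current.append(" ".join(piece))
            count += len(piece)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on blank lines keeps Markdown paragraphs and headings intact where possible, which tends to give cleaner retrieval results than cutting at a fixed character offset.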

For a more advanced use case with Dify AI agents:

1. Agent Start node: Define fastCRW Scrape and fastCRW Search as available tools.
2. LLM node: The agent's reasoning loop.
3. When the agent decides it needs web data, it calls the fastCRW tool automatically.
4. Agent End node: Return the agent's final response.

The agent can reason over live web data without explicit workflow steps — it orchestrates fastCRW calls on the fly.
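Under the hood, each tool call the agent makes becomes an HTTP request. The sketch below shows one way that mapping could work. The tool names fastcrw_scrape and fastcrw_search, the shape of the tool_call dict, and the /search path (assumed to sit next to /scrape) are all hypothetical and do not reflect Dify's internal format.

```python
import json

API_BASE = "https://fastcrw.com/api/v1"

# Hypothetical tool names mapped to endpoint paths.
TOOL_ENDPOINTS = {
    "fastcrw_scrape": "/scrape",
    "fastcrw_search": "/search",  # assumed path, not confirmed on this page
}


def route_tool_call(tool_call: dict) -> tuple[str, bytes]:
    """Translate an agent tool call into (endpoint URL, JSON request body)."""
    name = tool_call["name"]
    if name not in TOOL_ENDPOINTS:
        raise KeyError(f"unknown tool: {name}")
    body = json.dumps(tool_call.get("arguments", {})).encode("utf-8")
    return API_BASE + TOOL_ENDPOINTS[name], body
```

For example, route_tool_call({"name": "fastcrw_scrape", "arguments": {"url": "https://example.com"}}) yields the scrape endpoint and a JSON body ready to POST with the Bearer header.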

Dify Plugin Repository

The fastCRW Dify plugin is maintained in the crw-saas monorepo under /dify-plugin-crw/. It provides:

A Scrape tool that wraps fastcrw.com/api/v1/scrape with parameter dropdowns for format selection.
A Search tool for web search combined with scrape (query + limit).
A Crawl tool for spidering (max depth, max pages).
A Map tool for discovering URLs on a domain (sitemap parsing + crawling).

Install the plugin once, then use any of these tools in any Dify workflow without manually writing HTTP requests.

When to Use This

No-code RAG ingestion — let non-technical users scrape any URL into a Dify knowledge base.
Dify agent web browsing — give your Dify AI agent a scraping tool so it can fetch live data during reasoning.
Knowledge base auto-sync — schedule a Dify workflow to periodically scrape and update a knowledge base.
Competitive monitoring — build a Dify agent that scrapes competitor pricing pages and summarizes changes.
Content aggregation — scrape multiple sources and feed them into a Dify summarization workflow.

Troubleshooting

"Authorization header missing": Make sure your HTTP Request node includes the Authorization: Bearer fcrw_... header. If you use an env variable, verify that it is set in Dify's Settings → Variables and that the syntax is correct for your Dify version.

"Request timeout": fastCRW's default timeout is 30 seconds. For large pages, increase the timeout in the HTTP node body (for example, "timeout": 60), or switch to async crawl mode with polling.

"Knowledge base write failed": Dify's knowledge base expects text chunks with metadata. Make sure the HTTP node output is piped to a Text Splitter before the knowledge base write node.

"Plugin not found in marketplace": If the fastCRW plugin is not in your Dify marketplace, you are probably on a self-hosted instance without plugin discovery enabled. Install it manually by cloning the plugin from /dify-plugin-crw/ and placing it in your Dify plugins directory, then restart Dify.

"Rate limit exceeded": fastCRW applies rate limits per API key. If you hit a 429 error, slow down your request rate or upgrade to a higher-tier plan for more credits.

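One common way to slow down after a 429 is exponential backoff. The sketch below is a generic pattern, not fastCRW-specific behavior. The send callable is a placeholder for whatever function performs the request and returns a (status, body) pair, and the retry count and delays are arbitrary starting points.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns HTTP 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next try.
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after the final attempt; hand the 429 back to the caller.
    return status, body
```

The sleep parameter is injectable so the function can be tested without real waiting. In a scheduled workflow, spacing out the requests is usually a better first step than retrying harder.
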
When to Choose fastCRW

Speed: fastCRW is 5.5x faster than Firecrawl for most pages, and the lightweight HTTP interface fits naturally into Dify workflows.
Self-hosting: fastCRW's single-binary design runs on a VPS, Raspberry Pi, or inside your Dify container without Redis or PostgreSQL.
MCP compatibility: If you're also using Claude Code or other MCP-compatible tools, fastCRW provides a unified scraping endpoint.
Cost: fastCRW's consumption-based pricing and local deployment model mean lower TCO for heavy scraping workloads.
Firecrawl migration: Dify workflows built on Firecrawl's HTTP API port to fastCRW by changing the domain and adding a Bearer token.
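The self-hosting and migration points above both come down to swapping the base URL. A minimal sketch of keeping that base configurable follows. The environment variable name FASTCRW_BASE_URL and the function name endpoint are hypothetical. The sketch also assumes a self-hosted instance exposes the same operation paths as the hosted API, which you should confirm against the fastCRW docs.

```python
import os
from typing import Optional

HOSTED_BASE = "https://fastcrw.com/api/v1"


def endpoint(operation: str, base_url: Optional[str] = None) -> str:
    """Return the URL for an operation, preferring an explicit or env-configured base."""
    # FASTCRW_BASE_URL is a hypothetical variable name used for illustration.
    base = base_url or os.environ.get("FASTCRW_BASE_URL") or HOSTED_BASE
    return base.rstrip("/") + "/" + operation.lstrip("/")
```

With this in place, pointing a workflow at a self-hosted instance means changing one setting rather than editing every request URL.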

Sources

Dify HTTP Request node documentation: https://docs.dify.ai/advanced/tools/http-request

Dify Plugin ecosystem: https://docs.dify.ai/plugins/introduction

fastCRW scrape API docs: /docs/scrape

fastCRW search API docs: /docs/search

FAQ

Do I need to install a plugin to use fastCRW in Dify?

No. The built-in HTTP Request node works out of the box. POST to https://fastcrw.com/api/v1/scrape with your Bearer token. Alternatively, install the fastCRW Dify plugin from the plugin marketplace for a visual UI.

Can I use fastCRW to build a RAG knowledge base in Dify?

Yes. Use an HTTP Request node to scrape or search URLs, pipe the Markdown output into a Text Splitter node, then sync into Dify's Knowledge Base for embeddings and retrieval.

Does fastCRW work with Dify AI agents?

Yes. Define fastCRW scrape or search as a tool in your Dify agent workflow, and the agent can call it on its own whenever its reasoning needs live web data.

What's the difference between the HTTP node and the fastCRW plugin?

Both call the same fastCRW API. The plugin offers a visual form with dropdown menus for formats and parameters; the HTTP node requires you to write JSON. Use whichever feels more natural.

Can I self-host fastCRW inside Dify?

Yes. Self-hosted fastCRW runs as a standalone service, and you point the HTTP Request node or plugin at your internal URL instead of fastcrw.com.

Does Dify's knowledge base sync work with fastCRW?

Yes. Use an HTTP node to fetch and parse web content, then use Dify's file upload or text sync to populate your knowledge base with live web data.

