Dify Web Scraping Integration — fastCRW [Firecrawl-Compatible]
Integrate fastCRW into Dify workflows via HTTP node or native plugin. Call scrape and search endpoints from Dify LLM apps, knowledge bases, and agents. Small single static binary, local-first, self-host free under AGPL-3.0.
Install the official crw-dify-plugin for native fastCRW scrape, search, crawl, and map tools inside any Dify LLM app or AI agent — or fall back to a single HTTP Request node.
Why Dify + fastCRW
Dify is a no-code platform for building LLM applications, agents, and knowledge bases. It sits between the LLM and your data, orchestrating prompts, tools, and retrieval. The friction point is web scraping: Dify's knowledge base can ingest files and text, but pulling live web data requires hand-written Python or external tools. fastCRW fits inside Dify through the standard HTTP Request node or a native Dify plugin, so any Dify workflow can fetch live web pages. The fastCRW runtime is a small single static binary that is easy to run next to a self-hosted Dify, and because the API is Firecrawl-compatible, workflows built on Firecrawl port to fastCRW by changing the API domain and the Bearer token.
Setup: HTTP Request Method
The simplest path is the built-in HTTP Request node. No plugin installation needed.
- Open your Dify workflow or agent.
- Add a Tool node and select HTTP Request.
- Set the method to POST and URL to https://api.fastcrw.com/v1/scrape.
- Create a secret variable for your fastCRW API key: go to Settings → Variables and add a secret named FASTCRW_API_KEY.
- Add a Header with key Authorization and value Bearer {{ env.FASTCRW_API_KEY }} (or Bearer {{ secret.FASTCRW_API_KEY }} depending on your Dify version).
- Set Body Type to JSON and configure the request payload.
You can now test the node by passing a URL and calling fastCRW.
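Before wiring the node, it can help to reproduce the same request outside Dify to confirm the key and payload are accepted. The following is a minimal Python sketch using only the standard library. The function names build_scrape_request and scrape are hypothetical helpers, and the payload fields (url, formats, timeout) simply mirror the JSON example on this page; nothing here is taken from an official fastCRW client.

```python
import json
import urllib.request

SCRAPE_URL = "https://api.fastcrw.com/v1/scrape"


def build_scrape_request(target_url: str, api_key: str, timeout_s: int = 30) -> urllib.request.Request:
    """Build the same POST request the Dify HTTP Request node would send."""
    payload = {"url": target_url, "formats": ["markdown"], "timeout": timeout_s}
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(target_url: str, api_key: str) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_scrape_request(target_url, api_key)
    # Client-side socket timeout; an assumption, chosen to exceed the payload timeout.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling scrape("https://example.com", "<your key>") from a terminal should return the same JSON that the Dify node receives, which makes it easier to tell a Dify configuration problem from an API key problem.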
Setup: Native Dify Plugin
For a visual, form-based experience:
- Open your Dify workspace.
- Go to Tools → Plugin Marketplace (or your self-hosted plugin directory).
- Search for fastCRW or navigate to the fastCRW plugin.
- Click Install and authenticate with your fastCRW API key.
- The plugin adds a fastCRW tool to your workflow builder.
- In any workflow, add the fastCRW tool and select your operation: Scrape, Search, Crawl, or Map.
The plugin abstracts away HTTP headers and JSON payloads, surfacing parameters as form fields.
Code Example: HTTP Request Node
Here's a Dify workflow configuration using the HTTP Request node:
{
  "method": "POST",
  "url": "https://api.fastcrw.com/v1/scrape",
  "headers": {
    "Authorization": "Bearer {{ env.FASTCRW_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "url": "{{ workflow_variable.target_url }}",
    "formats": ["markdown"],
    "timeout": 30
  }
}
In the Dify editor, connect this node after a Trigger or Code node that provides target_url. The HTTP node will return:
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Page Title\n\nContent here...",
    "metadata": {
      "title": "Page Title",
      "description": "..."
    }
  }
}
Pipe the data.markdown output into downstream nodes for text splitting, embeddings, or LLM processing.
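If the downstream node needs the markdown as a plain variable, a Dify Code node can unpack the response. The sketch below assumes the HTTP node's response body is passed in as a string input named body (a hypothetical variable name) and follows the Code node convention of a main function that returns a dict. The output keys markdown, title, and status are illustrative choices, not part of fastCRW.

```python
import json


def main(body: str) -> dict:
    """Extract markdown and title from a fastCRW scrape response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        # Return empty strings so downstream nodes can branch on status.
        return {"markdown": "", "title": "", "status": "failed"}
    data = payload.get("data") or {}
    metadata = data.get("metadata") or {}
    return {
        "markdown": data.get("markdown", ""),
        "title": metadata.get("title", ""),
        "status": "ok",
    }
```

Returning a status string rather than raising keeps the workflow running, so an IF/ELSE node can decide what to do with a failed scrape.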
Example Workflow: Web Scraping RAG
A typical Dify workflow combining fastCRW, text processing, and knowledge base sync:
- Trigger node: Accept a user input (URL or search query).
- HTTP Request node (fastCRW Scrape): POST to https://api.fastcrw.com/v1/scrape with the user's URL. Fetch markdown.
- Text Splitter node: Break the markdown into chunks (512–1024 tokens).
- Knowledge Base Write node: Sync the chunks into a Dify knowledge base collection.
- End node: Return success message.
This workflow enables self-serve document ingestion — users submit any URL and Dify automatically scrapes, chunks, and indexes it for RAG.
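To make the chunking step concrete, here is a rough Python sketch of a paragraph-aware splitter. It is not Dify's Text Splitter implementation. It uses a whitespace word count as a stand-in for tokens, which is an assumption: real token counts depend on the embedding model's tokenizer. The function name split_markdown and the max_words parameter are hypothetical.

```python
def split_markdown(markdown: str, max_words: int = 400) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in markdown.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = len(paragraph.split())
        # Flush the running chunk before it would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps headings attached to the text that follows them, which tends to give retrieval more coherent chunks than a fixed character window.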
For a more advanced use case with Dify AI agents:
- Agent Start node: Define fastCRW Scrape and fastCRW Search as available tools.
- LLM node: The agent's reasoning loop.
- When the agent decides it needs web data, it calls the fastCRW tool automatically.
- Agent End node: Return the agent's final response.
The agent can reason over live web data without explicit workflow steps — it orchestrates fastCRW calls on the fly.
Dify Plugin Repository
The recommended path is the official crw-dify-plugin, which provides:
- A Scrape tool that wraps https://api.fastcrw.com/v1/scrape with parameter dropdowns for format selection.
- A Search tool for web search combined with scrape (query + limit).
- A Crawl tool for spidering (set a page limit; the crawl runs async and the plugin polls the job for you).
- A Map tool for discovering URLs on a domain (sitemap parsing + crawling).
Install crw-dify-plugin once, then use any of these tools in any Dify workflow without manually writing HTTP requests. The HTTP Request node remains a fallback when plugins are disabled.
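If you fall back to the HTTP Request node for crawls, you have to do the polling that the plugin otherwise handles. The sketch below shows the general shape of such a loop in Python. It is not the plugin's code: get_status is a hypothetical callable you supply to fetch the job, and the status values "completed" and "failed" are assumptions modeled on Firecrawl-style job responses, so check them against the actual fastCRW response.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status repeatedly until the job reports a terminal state."""
    for _ in range(max_attempts):
        job = get_status()
        status = job.get("status")
        if status == "completed":  # assumed terminal success value
            return job
        if status == "failed":  # assumed terminal failure value
            raise RuntimeError(f"crawl job failed: {job}")
        sleep(interval_s)
    raise TimeoutError(f"crawl job not finished after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop testable without real delays, and capping max_attempts prevents a stuck job from blocking a workflow indefinitely.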
When to Use This
- No-code RAG ingestion — let non-technical users scrape any URL into a Dify knowledge base.
- Dify agent web browsing — give your Dify AI agent a scraping tool so it can fetch live data during reasoning.
- Knowledge base auto-sync — schedule a Dify workflow to periodically scrape and update a knowledge base.
- Competitive monitoring — build a Dify agent that scrapes competitor pricing pages and summarizes changes.
- fastCRW for aggregation — scrape multiple sources and feed them into a Dify summarization workflow.
Troubleshooting
"Authorization header missing"
Make sure your HTTP Request node includes the Authorization: Bearer fcrw_... header. If using an env variable, verify it's set in Dify's Settings → Variables and the syntax is correct for your Dify version.
"Request timeout"
fastCRW's default timeout is 30 seconds. For large pages, increase the timeout in the HTTP node body (for example "timeout": 60), or switch to async crawl mode with polling.
"Knowledge base write failed"
Dify's knowledge base expects text chunks with metadata. Make sure the HTTP node output is piped to a Text Splitter before the knowledge base write node.
"Plugin not found in marketplace"
If the fastCRW plugin is not in your Dify marketplace, you're on a self-hosted instance without plugin discovery enabled. Install crw-dify-plugin manually by placing it in your Dify plugins directory, then restart Dify.
"Rate limit exceeded"
fastCRW applies rate limits per API key. If you hit a 429 error, slow down your request rate or upgrade to a higher-tier plan for more credits.
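One common way to slow down after a 429 is exponential backoff. The following Python sketch illustrates the idea for scripts that call fastCRW directly. The send callable is a hypothetical function you provide that performs one request and returns a (status_code, body) tuple, and the retry count and delays are illustrative defaults, not limits documented by fastCRW.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry send() on HTTP 429, doubling the wait after each attempt."""
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Waits base, 2*base, 4*base, ... seconds between attempts.
        sleep(base_delay_s * 2 ** attempt)
        status, body = send()
    return status, body
```

If the last attempt still returns 429, the function hands that response back so the caller can decide whether to fail the workflow or queue the URL for later.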
When to Choose fastCRW
- Local-first runtime: fastCRW is a small single static binary with no exit cost, and the lightweight HTTP interface fits naturally into Dify workflows.
- Self-hosting: fastCRW's single-binary design runs on a VPS, Raspberry Pi, or inside your Dify container without Redis or PostgreSQL.
- MCP compatibility: If you're also using Claude Code or other MCP-compatible tools, fastCRW provides a unified scraping endpoint.
- Cost: fastCRW's consumption-based pricing and local deployment model mean lower TCO for heavy scraping workloads.
- Firecrawl migration: Dify workflows built on Firecrawl's HTTP API port to fastCRW by changing the domain and adding a Bearer token.
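The migration point above can be sketched in Python as a small rewrite of the endpoint and headers. This assumes the request path stays the same after the host swap, which the Firecrawl-compatible API implies but which you should verify per endpoint. The helper names port_endpoint and port_headers are hypothetical, and api.firecrawl.dev appears only as an example source host.

```python
from urllib.parse import urlsplit, urlunsplit

FASTCRW_HOST = "api.fastcrw.com"


def port_endpoint(firecrawl_url: str) -> str:
    """Swap the host of an existing endpoint URL, keeping path and query."""
    parts = urlsplit(firecrawl_url)
    return urlunsplit(parts._replace(scheme="https", netloc=FASTCRW_HOST))


def port_headers(headers: dict, fastcrw_api_key: str) -> dict:
    """Replace any existing Authorization header with a fastCRW Bearer token."""
    ported = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    ported["Authorization"] = f"Bearer {fastcrw_api_key}"
    return ported
```

For example, port_endpoint("https://api.firecrawl.dev/v1/scrape") yields "https://api.fastcrw.com/v1/scrape", which is the URL used in the HTTP Request node earlier on this page.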
