Dify Web Scraping Integration — fastCRW [Firecrawl-Compatible]
Integrate fastCRW into Dify workflows via HTTP node or native plugin. Call scrape and search endpoints from Dify LLM apps, knowledge bases, and agents. 6.6 MB RAM runtime, 92% coverage on the 1,000-URL benchmark.
Add fastCRW to Dify workflows with a single HTTP Request node or install the fastCRW Dify plugin for native integration. Scrape, search, and extract web data inside any Dify LLM app or AI agent.
Why Dify + fastCRW
Dify is a no-code platform for building LLM applications, agents, and knowledge bases. It sits between your model and your data, orchestrating prompts, tools, and retrieval. The friction point is web scraping: Dify's knowledge base can ingest files and text, but pulling live web data requires hand-written Python or external tools. fastCRW fits inside Dify through the standard HTTP Request node or a native Dify plugin, turning any Dify workflow into a live web scraper. The 6.6 MB RAM fastCRW runtime is lightweight enough to run locally if you self-host Dify, and the Firecrawl-compatible API means workflows built on Firecrawl port to fastCRW with one URL change.
Setup: HTTP Request Method
The simplest path is the built-in HTTP Request node. No plugin installation needed.
- Open your Dify workflow or agent.
- Add a Tool node and select HTTP Request.
- Set the method to POST and the URL to https://fastcrw.com/api/v1/scrape.
- Create a secret variable for your fastCRW API key: go to Settings → Variables and add a secret named FASTCRW_API_KEY.
- Add a header with key Authorization and value Bearer {{ env.FASTCRW_API_KEY }} (or Bearer {{ secret.FASTCRW_API_KEY }}, depending on your Dify version).
- Set Body Type to JSON and configure the request payload.
You can now test the node by passing a URL and calling fastCRW.
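Under the hood, the HTTP Request node sends an ordinary POST that you can reproduce from any client. A minimal Python sketch of the request the node assembles (the endpoint and header names come from the setup above; `build_scrape_request` is an illustrative helper, not part of any fastCRW SDK):

```python
# Sketch of the request the Dify HTTP node sends to fastCRW.
# build_scrape_request is an illustrative helper, not a fastCRW SDK function.

def build_scrape_request(target_url: str, api_key: str) -> dict:
    """Assemble the method, headers, and JSON body for /api/v1/scrape."""
    return {
        "method": "POST",
        "url": "https://fastcrw.com/api/v1/scrape",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "url": target_url,
            "formats": ["markdown"],
            "timeout": 30,
        },
    }

req = build_scrape_request("https://example.com", "fcrw_test_key")
print(req["headers"]["Authorization"])  # → Bearer fcrw_test_key
```

If the same request works from a terminal or script but fails in Dify, the usual culprit is the variable syntax in the Authorization header.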
Setup: Native Dify Plugin
For a visual, form-based experience:
- Open your Dify workspace.
- Go to Tools → Plugin Marketplace (or your self-hosted plugin directory).
- Search for fastCRW or navigate to the fastCRW plugin.
- Click Install and authenticate with your fastCRW API key.
- The plugin adds a fastCRW tool to your workflow builder.
- In any workflow, add the fastCRW tool and select your operation: Scrape, Search, Crawl, or Map.
The plugin abstracts away HTTP headers and JSON payloads, surfacing parameters as form fields.
Code Example: HTTP Request Node
Here's a Dify workflow configuration using the HTTP Request node:
{
"method": "POST",
"url": "https://fastcrw.com/api/v1/scrape",
"headers": {
"Authorization": "Bearer {{ env.FASTCRW_API_KEY }}",
"Content-Type": "application/json"
},
"body": {
"url": "{{ workflow_variable.target_url }}",
"formats": ["markdown"],
"timeout": 30
}
}
In the Dify editor, connect this node after a Trigger or Code node that provides target_url. The HTTP node will return:
{
"success": true,
"data": {
"url": "https://example.com",
"markdown": "# Page Title\n\nContent here...",
"metadata": {
"title": "Page Title",
"description": "..."
}
}
}
Pipe the data.markdown output into downstream nodes for text splitting, embeddings, or LLM processing.
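If you route the response through a Dify Code node first, you can extract the markdown defensively. A sketch against the response shape shown above (`extract_markdown` is a name chosen here for illustration):

```python
# Sketch of a Dify Code node that pulls markdown out of the fastCRW response.
# extract_markdown is an illustrative helper name, not a Dify or fastCRW API.

def extract_markdown(response: dict) -> str:
    """Return data.markdown, or an empty string if the scrape failed."""
    if not response.get("success"):
        return ""
    return response.get("data", {}).get("markdown", "")

resp = {
    "success": True,
    "data": {
        "url": "https://example.com",
        "markdown": "# Page Title\n\nContent here...",
        "metadata": {"title": "Page Title"},
    },
}
print(extract_markdown(resp)[:12])  # → # Page Title
```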
Example Workflow: Web Scraping RAG
A typical Dify workflow combining fastCRW, text processing, and knowledge base sync:
- Trigger node: Accept a user input (URL or search query).
- HTTP Request node (fastCRW Scrape): POST to https://fastcrw.com/api/v1/scrape with the user's URL. Fetch markdown.
- Text Splitter node: Break the markdown into chunks (512–1024 tokens).
- Knowledge Base Write node: Sync the chunks into a Dify knowledge base collection.
- End node: Return success message.
This workflow enables self-serve document ingestion — users submit any URL and Dify automatically scrapes, chunks, and indexes it for RAG.
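The Text Splitter step can be approximated in plain Python. A sketch that chunks markdown by paragraph under an estimated token budget (the four-characters-per-token heuristic and `chunk_markdown` are assumptions for illustration, not Dify internals):

```python
# Rough stand-in for Dify's Text Splitter node: group markdown paragraphs
# under an estimated token budget (~4 characters per token is a heuristic).

def chunk_markdown(text: str, max_tokens: int = 512) -> list[str]:
    max_chars = max_tokens * 4
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

doc = "\n\n".join(f"Paragraph {i} " + "x" * 300 for i in range(10))
chunks = chunk_markdown(doc, max_tokens=512)
```

In a real pipeline you would split on token counts from your embedding model's tokenizer rather than a character estimate.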
For a more advanced use case with Dify AI agents:
- Agent Start node: Define fastCRW Scrape and fastCRW Search as available tools.
- LLM node: The agent's reasoning loop.
- When the agent decides it needs web data, it calls the fastCRW tool automatically.
- Agent End node: Return the agent's final response.
The agent can reason over live web data without explicit workflow steps — it orchestrates fastCRW calls on the fly.
Dify Plugin Repository
The fastCRW Dify plugin is maintained in the crw-saas monorepo under /dify-plugin-crw/. It provides:
- A Scrape tool that wraps fastcrw.com/api/v1/scrape with parameter dropdowns for format selection.
- A Search tool for web search combined with scrape (query + limit).
- A Crawl tool for spidering (max depth, max pages).
- A Map tool for discovering URLs on a domain (sitemap parsing + crawling).
Install the plugin once, then use any of these tools in any Dify workflow without manually writing HTTP requests.
When to Use This
- No-code RAG ingestion — let non-technical users scrape any URL into a Dify knowledge base.
- Dify agent web browsing — give your Dify AI agent a scraping tool so it can fetch live data during reasoning.
- Knowledge base auto-sync — schedule a Dify workflow to periodically scrape and update a knowledge base.
- Competitive monitoring — build a Dify agent that scrapes competitor pricing pages and summarizes changes.
- Content aggregation — scrape multiple sources and feed them into a Dify summarization workflow.
Troubleshooting
"Authorization header missing"
Make sure your HTTP Request node includes the Authorization: Bearer fcrw_... header. If using an env variable, verify it's set in Dify's Settings → Variables and the syntax is correct for your Dify version.
"Request timeout"
fastCRW's default timeout is 30 seconds. For large pages, increase the timeout in the HTTP node body: "timeout": 60. Or switch to async crawl mode with polling.
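For async crawl mode, the usual Firecrawl-style flow is to submit a crawl job and poll its status until completion. A sketch of the polling loop (the `/api/v1/crawl/{id}` status-endpoint shape is an assumption based on fastCRW's Firecrawl compatibility; `fetch` is injected so the loop stays generic and testable; wire in `requests` or a Dify Code node in practice):

```python
import time

# Sketch of Firecrawl-style async crawling: submit a job, then poll its status.
# The endpoint shape is assumed from the Firecrawl-compatible API; `fetch` is
# an injected callable (e.g. a requests.get wrapper returning parsed JSON).

def poll_crawl(fetch, job_id: str, interval: float = 2.0, max_polls: int = 30):
    """Poll the crawl job until its status is 'completed', then return data."""
    for _ in range(max_polls):
        status = fetch(f"https://fastcrw.com/api/v1/crawl/{job_id}")
        if status.get("status") == "completed":
            return status.get("data", [])
        time.sleep(interval)
    raise TimeoutError(f"crawl {job_id} did not complete")

# Fake fetcher standing in for real HTTP calls:
responses = iter([
    {"status": "scraping"},
    {"status": "completed", "data": [{"url": "https://example.com"}]},
])
pages = poll_crawl(lambda url: next(responses), "job-123", interval=0)
print(len(pages))  # → 1
```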
"Knowledge base write failed"
Dify's knowledge base expects text chunks with metadata. Make sure the HTTP node output is piped to a Text Splitter before the Knowledge Base Write node.
"Plugin not found in marketplace"
If the fastCRW plugin is not in your Dify marketplace, you're on a self-hosted instance without plugin discovery enabled. Install manually by cloning the plugin from /dify-plugin-crw/ and placing it in your Dify plugins directory, then restart Dify.
"Rate limit exceeded"
fastCRW applies rate limits per API key. If you hit a 429 error, slow down your request rate or upgrade to a higher-tier plan for more credits.
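A 429 can also be absorbed with exponential backoff in a Dify Code node or any client. A sketch (`call_with_backoff` is an illustrative helper; the caller is injected so no real requests are made):

```python
import time

# Sketch of exponential backoff for fastCRW 429 responses.
# `call` is injected (e.g. a requests.post wrapper) so this stays generic.

def call_with_backoff(call, max_retries: int = 4, base_delay: float = 1.0):
    """Retry `call` on 429 responses, doubling the delay each attempt."""
    for attempt in range(max_retries):
        resp = call()
        if resp.get("status_code") != 429:
            return resp
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit persisted after retries")

# Fake caller: 429 twice, then success.
responses = iter([{"status_code": 429}, {"status_code": 429},
                  {"status_code": 200, "success": True}])
result = call_with_backoff(lambda: next(responses), base_delay=0)
print(result["status_code"])  # → 200
```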
When to Choose fastCRW
- Speed: fastCRW is 5.5x faster than Firecrawl for most pages, and the lightweight HTTP interface fits naturally into Dify workflows.
- Self-hosting: fastCRW's single-binary design runs on a VPS, Raspberry Pi, or inside your Dify container without Redis or PostgreSQL.
- MCP compatibility: If you're also using Claude Code or other MCP-compatible tools, fastCRW provides a unified scraping endpoint.
- Cost: fastCRW's consumption-based pricing and local deployment model mean lower TCO for heavy scraping workloads.
- Firecrawl migration: Dify workflows built on Firecrawl's HTTP API port to fastCRW by changing the domain and adding a Bearer token.