What We're Building
A CrewAI crew with two specialized agents: a Researcher that scrapes websites using CRW, and an Analyst that processes and summarizes the scraped data. CrewAI handles the multi-agent orchestration — assigning tasks, passing context between agents, and managing the workflow — while CRW provides the fast scraping backend.
This pattern is useful for competitive intelligence, market research, content aggregation, and any workflow where scraping and analysis are distinct steps that benefit from specialization.
Prerequisites
- CRW running locally (docker run -p 3000:3000 ghcr.io/us/crw:latest) or a fastCRW API key
- Python 3.11+
- An LLM API key (OpenAI, Anthropic, or use Ollama for free local inference)
pip install crewai crewai-crw
That's it — two packages. crewai-crw is the CRW tool package for CrewAI, published on PyPI. No SDK wrappers, no firecrawl-py — just direct HTTP to your CRW instance.
How CrewAI Works
CrewAI organizes work into three concepts:
- Agents — autonomous units with a role, goal, and backstory that guide their behavior
- Tasks — specific assignments given to agents, with expected outputs
- Crew — the team that coordinates agents and tasks in a defined process (sequential or hierarchical)
Each agent can use tools — and that's where CRW comes in. The crewai-crw package provides three ready-to-use tools.
Step 1: Import CRW Tools
The crewai-crw package gives you three tools out of the box:
from crewai_crw import CrwScrapeWebsiteTool, CrwCrawlWebsiteTool, CrwMapWebsiteTool
# Self-hosted (default: localhost:3000)
scrape_tool = CrwScrapeWebsiteTool()
crawl_tool = CrwCrawlWebsiteTool()
map_tool = CrwMapWebsiteTool()
# Or use fastCRW cloud
scrape_tool = CrwScrapeWebsiteTool(
api_url="https://fastcrw.com/api",
api_key="your-api-key",
)
# Or set env vars and skip constructor args entirely
# export CRW_API_URL=https://fastcrw.com/api
# export CRW_API_KEY=your-api-key
scrape_tool = CrwScrapeWebsiteTool() # picks up from env
No custom tool classes to write. No BaseTool subclassing. No firecrawl-py dependency. The tools handle API calls, error handling, and response parsing internally.
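The configuration precedence described above (explicit constructor args win, then the CRW_API_URL / CRW_API_KEY environment variables, then the self-hosted default) can be sketched in plain Python. The helper below is illustrative only, not part of crewai-crw:

```python
import os

# Illustrative sketch of the precedence described above: explicit
# constructor args win, then CRW_API_URL / CRW_API_KEY env vars,
# then the self-hosted default. Not part of crewai-crw itself.
DEFAULT_API_URL = "http://localhost:3000"

def resolve_crw_config(api_url=None, api_key=None):
    """Resolve endpoint and key the way the tools are described to."""
    return {
        "api_url": api_url or os.environ.get("CRW_API_URL", DEFAULT_API_URL),
        "api_key": api_key or os.environ.get("CRW_API_KEY"),
    }

# Explicit args always take priority over the environment
print(resolve_crw_config(api_url="https://fastcrw.com/api", api_key="key"))
```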
Step 2: Define the Agents
Create two specialized agents with distinct roles:
from crewai import Agent
# Agent 1: The Researcher
researcher = Agent(
role="Web Research Specialist",
goal="Discover and scrape relevant web pages to gather comprehensive information on the given topic",
backstory="""You are an expert web researcher. You systematically discover
pages on target websites, identify the most relevant content, and extract
clean text for analysis. You prioritize thoroughness and accuracy.""",
tools=[scrape_tool, map_tool],
verbose=True,
max_iter=10,
)
# Agent 2: The Analyst
analyst = Agent(
role="Data Analyst",
goal="Analyze scraped web content and produce structured, actionable summaries",
backstory="""You are a senior data analyst who excels at finding patterns,
extracting key insights, and presenting information in clear, structured
formats. You work with raw scraped content and turn it into valuable reports.""",
verbose=True,
)
Step 3: Define the Tasks
Tasks specify what each agent should do and what output to produce:
from crewai import Task
# Task 1: Research
research_task = Task(
description="""Research the website {target_url}.
Steps:
1. Use the CRW website map tool to find all pages on the site
2. Identify the 5-10 most relevant pages based on their URLs
3. Scrape each relevant page using the CRW web scrape tool
4. Compile all scraped content with source URLs
Focus on pages that contain product information, pricing,
documentation, or key features.""",
expected_output="A comprehensive collection of scraped content from all relevant pages, with source URLs for each piece of content.",
agent=researcher,
)
# Task 2: Analysis
analysis_task = Task(
description="""Analyze the research data provided by the researcher.
Produce a structured report with:
1. Executive Summary (2-3 sentences)
2. Key Findings (bullet points)
3. Product/Service Overview
4. Pricing Information (if found)
5. Competitive Advantages
6. Potential Gaps or Concerns
Base your analysis ONLY on the scraped content — do not invent information.""",
expected_output="A structured analysis report with sections for summary, key findings, pricing, and competitive analysis.",
agent=analyst,
context=[research_task],
)
Step 4: Assemble and Run the Crew
from crewai import Crew, Process
crew = Crew(
agents=[researcher, analyst],
tasks=[research_task, analysis_task],
process=Process.sequential,
verbose=True,
)
result = crew.kickoff(inputs={"target_url": "https://docs.example.com"})
print(result)
The crew runs sequentially: the researcher discovers and scrapes pages, then the analyst receives all scraped content and produces the final report.
Step 5: Hierarchical Process for Complex Workflows
For more complex scraping workflows, use a hierarchical process where a manager agent delegates work:
from crewai import Agent, Crew, Process
manager = Agent(
role="Research Manager",
goal="Coordinate the research and analysis team to produce comprehensive reports",
backstory="You manage a team of researchers and analysts, delegating tasks and ensuring quality.",
allow_delegation=True,
)
hierarchical_crew = Crew(
agents=[researcher, analyst],
tasks=[research_task, analysis_task],
process=Process.hierarchical,
manager_agent=manager,
verbose=True,
)
result = hierarchical_crew.kickoff(inputs={"target_url": "https://docs.example.com"})
Real-World Example: Competitive Analysis Crew
Here's a complete, copy-paste example that compares multiple competitor websites:
from crewai import Agent, Task, Crew, Process
from crewai_crw import CrwScrapeWebsiteTool, CrwMapWebsiteTool
# Tools — one line each, no custom classes needed
scrape_tool = CrwScrapeWebsiteTool()
map_tool = CrwMapWebsiteTool()
# Specialized agents
scraper_agent = Agent(
role="Competitive Intelligence Researcher",
goal="Scrape competitor websites to gather product, pricing, and feature data",
backstory="Expert at navigating competitor sites and extracting key business information.",
tools=[scrape_tool, map_tool],
max_iter=15,
)
comparison_agent = Agent(
role="Competitive Analyst",
goal="Compare competitors across key dimensions and identify advantages/gaps",
backstory="Seasoned analyst who turns raw competitor data into strategic insights.",
)
# Tasks
scrape_competitors = Task(
description="""Scrape the following competitor websites:
{competitors}
For each competitor, gather:
- Product features and capabilities
- Pricing information
- Key messaging and positioning
- Any technical specifications""",
expected_output="Organized scraped content from each competitor website with labeled sections.",
agent=scraper_agent,
)
compare_competitors = Task(
description="""Create a competitive comparison matrix from the scraped data.
Format as a table comparing:
| Feature | Competitor A | Competitor B | Competitor C |
Include sections for: Features, Pricing, Strengths, Weaknesses""",
expected_output="A structured competitive comparison matrix with clear winner/loser annotations per feature.",
agent=comparison_agent,
context=[scrape_competitors],
)
crew = Crew(
agents=[scraper_agent, comparison_agent],
tasks=[scrape_competitors, compare_competitors],
process=Process.sequential,
)
result = crew.kickoff(inputs={
"competitors": "https://competitor1.com, https://competitor2.com, https://competitor3.com"
})
print(result)
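Note that {competitors} is interpolated into the task description as one comma-separated string, so the researcher sees a single blob of text. If you pre-process inputs yourself, a small helper (illustrative, not part of crewai-crw) splits it back into individual targets:

```python
# Illustrative helper: turn the comma-separated {competitors} string
# back into a list of individual URLs.
def split_competitors(raw: str) -> list[str]:
    return [url.strip() for url in raw.split(",") if url.strip()]

urls = split_competitors(
    "https://competitor1.com, https://competitor2.com, https://competitor3.com"
)
print(urls)  # three cleanly trimmed URLs
```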
Configuration Options
Each tool accepts configuration to fine-tune behavior:
# Scrape with custom settings
scrape_tool = CrwScrapeWebsiteTool(
config={
"formats": ["markdown", "links"],
"onlyMainContent": True,
"renderJs": True, # force JS rendering
"waitFor": 2000, # wait 2s after JS load
}
)
# Crawl with depth and page limits
crawl_tool = CrwCrawlWebsiteTool(
config={
"maxDepth": 3,
"maxPages": 50,
"formats": ["markdown"],
"onlyMainContent": True,
}
)
# Map with sitemap discovery
map_tool = CrwMapWebsiteTool(
config={
"maxDepth": 2,
"useSitemap": True,
}
)
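Per-tool config behaves like overrides layered on top of baseline settings. The sketch below illustrates that merge; the field names follow the options above, but the default values and the merge helper itself are assumptions, not documented CRW behavior:

```python
# Illustrative merge of a per-tool config dict over assumed defaults.
# Field names follow the options shown above; the default values here
# are a guess, not documented CRW behavior.
DEFAULTS = {"formats": ["markdown"], "onlyMainContent": True}

def merged_config(overrides=None):
    cfg = dict(DEFAULTS)
    cfg.update(overrides or {})
    return cfg

print(merged_config({"renderJs": True, "waitFor": 2000}))
```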
Using fastCRW Cloud
Switch to the managed fastCRW cloud service — no self-hosting required:
# Option 1: Constructor args
scrape_tool = CrwScrapeWebsiteTool(
api_url="https://fastcrw.com/api",
api_key="your-api-key",
)
# Option 2: Environment variables (recommended)
# export CRW_API_URL=https://fastcrw.com/api
# export CRW_API_KEY=your-api-key
scrape_tool = CrwScrapeWebsiteTool() # auto-picks from env
All tools and agents work identically. fastCRW is particularly useful for CrewAI crews that scrape many different sites — the managed infrastructure handles scaling and reliability for you.
Why CRW for CrewAI?
Zero boilerplate. Install crewai-crw, import the tools, and use them. No custom BaseTool subclasses, no SDK wrappers, no firecrawl-py dependency.
Fast tool responses keep agents on track. CrewAI agents have iteration limits. If each scrape call takes 4+ seconds, your agent burns through iterations waiting for responses. CRW's 833ms average means more useful iterations within the same budget.
Clean output reduces agent confusion. When an agent receives raw HTML or noisy content, it wastes tokens parsing irrelevant content and often makes mistakes. CRW returns clean markdown that agents can reason about directly.
Self-hosted or cloud — your choice. Run CRW locally for free during development, switch to fastCRW for production. Same tools, same code, different URL.
Next Steps
- crewai-crw on PyPI — full documentation and source code
- GitHub: us/crewai-crw — report issues, contribute
- Build a RAG pipeline to make your scraped data searchable
- Use CRW's MCP server for direct agent tool integration
- Compare CRW vs Firecrawl performance and features
Get Started
pip install crewai crewai-crw
Run CRW locally in one command:
docker run -p 3000:3000 ghcr.io/us/crw:latest
Or sign up for fastCRW to skip infrastructure setup and start building your CrewAI crew today.