What We're Building
A CrewAI crew with two specialized agents: a Researcher that scrapes websites using CRW, and an Analyst that processes and summarizes the scraped data. CrewAI handles the multi-agent orchestration — assigning tasks, passing context between agents, and managing the workflow — while CRW provides the fast scraping backend.
This pattern is useful for competitive intelligence, market research, content aggregation, and any workflow where scraping and analysis are distinct steps that benefit from specialization.
Prerequisites
- CRW running locally (`docker run -p 3000:3000 ghcr.io/us/crw:latest`) or a fastCRW API key
- Python 3.11+
- An OpenAI API key
```shell
pip install crewai crewai-tools firecrawl-py
```
How CrewAI Works
CrewAI organizes work into three concepts:
- Agents — autonomous units with a role, goal, and backstory that guide their behavior
- Tasks — specific assignments given to agents, with expected outputs
- Crew — the team that coordinates agents and tasks in a defined process (sequential or hierarchical)
Each agent can use tools — and that's where CRW comes in. We'll create custom CRW tools that agents use to discover, scrape, and extract data from websites.
Step 1: Create CRW Tools for CrewAI
CrewAI tools are Python classes that extend BaseTool. Let's create tools backed by CRW:
```python
from crewai.tools import BaseTool
from firecrawl import FirecrawlApp
from pydantic import BaseModel, Field

# Initialize CRW client
crw = FirecrawlApp(
    api_key="fc-YOUR-KEY",
    api_url="http://localhost:3000"  # or "https://fastcrw.com/api"
)

class ScrapeInput(BaseModel):
    url: str = Field(description="The URL to scrape")

class ScrapeTool(BaseTool):
    name: str = "Scrape Website"
    description: str = (
        "Scrape a single web page and return clean markdown content. "
        "Use this to get the full text content of any URL."
    )
    args_schema: type[BaseModel] = ScrapeInput

    def _run(self, url: str) -> str:
        result = crw.scrape_url(url, params={"formats": ["markdown"]})
        title = result.get("metadata", {}).get("title", "No title")
        markdown = result.get("markdown", "")
        return f"# {title}\nSource: {url}\n\n{markdown}"

class MapInput(BaseModel):
    url: str = Field(description="The base URL to discover pages from")

class MapTool(BaseTool):
    name: str = "Discover URLs"
    description: str = (
        "Discover all URLs on a website. Returns a list of page URLs found "
        "on the site. Use this before scraping to find relevant pages."
    )
    args_schema: type[BaseModel] = MapInput

    def _run(self, url: str) -> str:
        result = crw.map_url(url)
        links = result.get("links", [])
        return f"Found {len(links)} URLs:\n" + "\n".join(links[:30])
```
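In practice, scrape calls fail: timeouts, blocked pages, empty responses. A tool that raises an exception can abort the whole task, while a tool that returns a readable error string lets the agent retry or move on. Here is a minimal sketch of that pattern; `fetch` is a hypothetical stand-in for any CRW call, assumed to return a dict like the ones above:

```python
def safe_scrape(fetch, url: str) -> str:
    """Call `fetch(url)` and convert any failure into an error string.

    `fetch` is any callable that returns a dict with a "markdown" key
    (e.g. a lambda wrapping crw.scrape_url). Agents handle an ERROR
    string far more gracefully than an unhandled exception.
    """
    try:
        result = fetch(url)
    except Exception as exc:  # network errors, timeouts, bad status
        return f"ERROR scraping {url}: {exc}"
    if not result.get("markdown"):
        return f"ERROR scraping {url}: empty response"
    return result["markdown"]
```

Inside `ScrapeTool._run` this would look like `return safe_scrape(lambda u: crw.scrape_url(u, params={"formats": ["markdown"]}), url)`.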
Step 2: Define the Agents
Create two specialized agents with distinct roles:
```python
from crewai import Agent

# Agent 1: The Researcher
researcher = Agent(
    role="Web Research Specialist",
    goal="Discover and scrape relevant web pages to gather comprehensive information on the given topic",
    backstory="""You are an expert web researcher. You systematically discover
    pages on target websites, identify the most relevant content, and extract
    clean text for analysis. You prioritize thoroughness and accuracy.""",
    tools=[ScrapeTool(), MapTool()],
    verbose=True,
    max_iter=10,
)

# Agent 2: The Analyst
analyst = Agent(
    role="Data Analyst",
    goal="Analyze scraped web content and produce structured, actionable summaries",
    backstory="""You are a senior data analyst who excels at finding patterns,
    extracting key insights, and presenting information in clear, structured
    formats. You work with raw scraped content and turn it into valuable reports.""",
    verbose=True,
)
```
Step 3: Define the Tasks
Tasks specify what each agent should do and what output to produce:
```python
from crewai import Task

# Task 1: Research
research_task = Task(
    description="""Research the website {target_url}.

    Steps:
    1. Use the Discover URLs tool to find all pages on the site
    2. Identify the 5-10 most relevant pages based on their URLs
    3. Scrape each relevant page using the Scrape Website tool
    4. Compile all scraped content with source URLs

    Focus on pages that contain product information, pricing,
    documentation, or key features.""",
    expected_output="A comprehensive collection of scraped content from all relevant pages, with source URLs for each piece of content.",
    agent=researcher,
)

# Task 2: Analysis
analysis_task = Task(
    description="""Analyze the research data provided by the researcher.

    Produce a structured report with:
    1. Executive Summary (2-3 sentences)
    2. Key Findings (bullet points)
    3. Product/Service Overview
    4. Pricing Information (if found)
    5. Competitive Advantages
    6. Potential Gaps or Concerns

    Base your analysis ONLY on the scraped content — do not invent information.""",
    expected_output="A structured analysis report with sections for summary, key findings, pricing, and competitive analysis.",
    agent=analyst,
    context=[research_task],  # analyst receives researcher's output
)
```
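The research task asks the agent to "identify the 5-10 most relevant pages based on their URLs". That judgment can also be made deterministically before any LLM call, which saves iterations on large sites. A sketch using a keyword filter over the mapped URLs; the hint keywords are assumptions to be tuned per target site:

```python
# Hypothetical relevance hints -- adjust for the site being researched.
RELEVANT_HINTS = ("pricing", "product", "docs", "features")

def filter_relevant(links: list[str],
                    hints: tuple[str, ...] = RELEVANT_HINTS,
                    limit: int = 10) -> list[str]:
    """Keep URLs whose path mentions any hint keyword, capped at `limit`.

    Preserves the original order of `links`, so site-navigation order
    (usually importance order) carries through.
    """
    matched = [u for u in links if any(h in u.lower() for h in hints)]
    return matched[:limit]
```

This could be wired into `MapTool._run` so the agent only ever sees pre-filtered URLs.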
Step 4: Assemble and Run the Crew
```python
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analysis_task],
    process=Process.sequential,  # researcher runs first, then analyst
    verbose=True,
)

result = crew.kickoff(inputs={"target_url": "https://docs.example.com"})
print(result)
```
The crew runs sequentially: the researcher discovers and scrapes pages, then the analyst receives all scraped content and produces the final report.
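For anything beyond a quick experiment you will want the report on disk, not just in the console. A minimal sketch, assuming `str(result)` yields the final report text (the kickoff result stringifies to its output):

```python
from pathlib import Path

def save_report(result, path: str = "analysis_report.md") -> str:
    """Write the crew's final output to a markdown file and return the path."""
    Path(path).write_text(str(result), encoding="utf-8")
    return path
```

Usage: `save_report(result)` right after `crew.kickoff(...)`.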
Step 5: Hierarchical Process for Complex Workflows
For more complex scraping workflows, use a hierarchical process where a manager agent delegates work:
```python
from crewai import Agent, Crew, Process

# Add a manager agent
manager = Agent(
    role="Research Manager",
    goal="Coordinate the research and analysis team to produce comprehensive reports",
    backstory="You manage a team of researchers and analysts, delegating tasks and ensuring quality.",
    allow_delegation=True,
)

hierarchical_crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analysis_task],
    process=Process.hierarchical,
    manager_agent=manager,
    verbose=True,
)

result = hierarchical_crew.kickoff(inputs={"target_url": "https://docs.example.com"})
```
Real-World Example: Competitive Analysis Crew
Here's a complete example that compares multiple competitor websites:
```python
from crewai import Agent, Task, Crew, Process
from crewai.tools import BaseTool
from firecrawl import FirecrawlApp
from pydantic import BaseModel, Field

crw = FirecrawlApp(
    api_key="fc-YOUR-KEY",
    api_url="http://localhost:3000"
)

class ScrapeInput(BaseModel):
    url: str = Field(description="The URL to scrape")

class CompetitorScrapeTool(BaseTool):
    name: str = "Scrape Competitor Page"
    description: str = "Scrape a competitor's web page for analysis"
    args_schema: type[BaseModel] = ScrapeInput

    def _run(self, url: str) -> str:
        result = crw.scrape_url(url, params={"formats": ["markdown"]})
        return result.get("markdown", "Failed to scrape")

# Specialized agents for competitive analysis
# (MapTool is the discovery tool defined in Step 1)
scraper_agent = Agent(
    role="Competitive Intelligence Researcher",
    goal="Scrape competitor websites to gather product, pricing, and feature data",
    backstory="Expert at navigating competitor sites and extracting key business information.",
    tools=[CompetitorScrapeTool(), MapTool()],
    max_iter=15,
)

comparison_agent = Agent(
    role="Competitive Analyst",
    goal="Compare competitors across key dimensions and identify advantages/gaps",
    backstory="Seasoned analyst who turns raw competitor data into strategic insights.",
)

# Tasks
scrape_competitors = Task(
    description="""Scrape the following competitor websites:
    {competitors}

    For each competitor, gather:
    - Product features and capabilities
    - Pricing information
    - Key messaging and positioning
    - Any technical specifications""",
    expected_output="Organized scraped content from each competitor website with labeled sections.",
    agent=scraper_agent,
)

compare_competitors = Task(
    description="""Create a competitive comparison matrix from the scraped data.

    Format as a table comparing:
    | Feature | Competitor A | Competitor B | Competitor C |

    Include sections for: Features, Pricing, Strengths, Weaknesses""",
    expected_output="A structured competitive comparison matrix with clear winner/loser annotations per feature.",
    agent=comparison_agent,
    context=[scrape_competitors],
)

crew = Crew(
    agents=[scraper_agent, comparison_agent],
    tasks=[scrape_competitors, compare_competitors],
    process=Process.sequential,
)

result = crew.kickoff(inputs={
    "competitors": "https://competitor1.com, https://competitor2.com, https://competitor3.com"
})
print(result)
```
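Note that `{competitors}` is passed as one comma-separated string, and the scraper agent is left to split it. If you want to validate the list up front, or loop over competitors in your own code, a small normalizer helps (a sketch; the function name is just for illustration):

```python
def parse_competitors(raw: str) -> list[str]:
    """Split a comma-separated competitor string into a clean URL list.

    Strips surrounding whitespace and drops empty entries, so trailing
    commas and sloppy spacing are tolerated.
    """
    return [u.strip() for u in raw.split(",") if u.strip()]
```

For example, `parse_competitors("https://a.com, https://b.com ,")` yields `["https://a.com", "https://b.com"]`.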
Using fastCRW Instead of Self-Hosted
Switch to the managed fastCRW cloud service by changing one line:
```python
crw = FirecrawlApp(
    api_key="fc-YOUR-FASTCRW-KEY",
    api_url="https://fastcrw.com/api"
)
```
All tools and agents work identically. fastCRW is particularly useful for CrewAI crews that scrape many different sites — the managed infrastructure handles scaling and reliability for you.
Why CRW for CrewAI?
Fast tool responses keep agents on track. CrewAI agents have iteration limits. If each scrape call takes 4+ seconds, your agent burns through iterations waiting for responses. CRW's 833ms average means more useful iterations within the same budget.
Clean output reduces agent confusion. When an agent receives raw HTML or noisy content, it wastes tokens parsing irrelevant content and often makes mistakes. CRW returns clean markdown that agents can reason about directly.
Drop-in replacement. If your CrewAI crew already uses Firecrawl tools, switching to CRW requires changing only the api_url. Your agent definitions, tasks, and crew configuration stay the same.
Next Steps
- Build a RAG pipeline to make your scraped data searchable
- Use CRW's MCP server for direct agent tool integration
- Compare CRW vs Firecrawl performance and features
Get Started
Run CRW locally in one command:
```shell
docker run -p 3000:3000 ghcr.io/us/crw:latest
```
Or sign up for fastCRW to skip infrastructure setup and start building your CrewAI crew today.