By the fastCRW team · Credit costs and footprint verified against the canonical fact sheet 2026-05-18 · See /pricing for current rates · Verify independently before relying on any number.
Disclosure: we build fastCRW. This is a vendor-authored tutorial, so weight it accordingly — the architecture below works on any Firecrawl-compatible API, and we call out where Firecrawl genuinely wins.
Build a competitor monitoring tool that pays for itself
A competitor monitoring tool watches a handful of rival pages — pricing, features, changelog, blog — and tells you the moment something changes. The hard part is not the first scrape; it is the thousandth. You re-scrape the same pages forever, so per-page cost and infrastructure footprint dominate the economics, and a flaky diff that flaps on cosmetic HTML noise will train your team to ignore the alerts. This tutorial builds the whole loop in Python: collect pages, diff them, summarize what changed with an LLM, and surface it in a dashboard — with honest cost math at every step.
The design rides on one structural fact: fastCRW meters a scrape or crawl page at a flat 1 credit on the http or lightpanda renderer (2 credits when chrome is needed), with no ScrapingBee-style 5× JavaScript multiplier (fastCRW canonical credit table, verified 2026-05-18). Continuous monitoring is exactly the workload where flat per-page metering and a small footprint matter most.
What a competitor monitoring tool watches
Scope the tool before you write code. Most teams watch four page types, and each has a different change cadence:
- Pricing pages — low change frequency, high business impact. A tier price moving is a board-meeting event.
- Feature / product pages — medium frequency; new capabilities and positioning shifts.
- Changelog / release notes — high frequency, append-only; the richest signal for "what are they shipping?"
- Blog / announcements — high frequency, noisy; useful for narrative, bad for alerting unless filtered.
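The four page types above tend to line up with URL path segments. A minimal sketch of tagging each watched URL with its type, assuming the competitor's paths contain recognisable fragments; `PAGE_TYPES` and `page_type` are hypothetical names for illustration, not part of any SDK:

```python
from urllib.parse import urlparse

# Hypothetical mapping from a path fragment to one of the four page types.
# The first matching fragment wins, so order the list by priority.
PAGE_TYPES = [
    ("/pricing", "pricing"),
    ("/changelog", "changelog"),
    ("/release", "changelog"),
    ("/features", "features"),
    ("/product", "features"),
    ("/blog", "blog"),
]

def page_type(url: str) -> str:
    """Return the page type for a watched URL, or "other" if nothing matches."""
    path = urlparse(url).path.lower()
    for fragment, kind in PAGE_TYPES:
        if fragment in path:
            return kind
    return "other"
```

The returned label can later feed both the LLM prompt and the alert routing, so the cadence differences in the list are decided in one place.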
Page-change detection vs full re-scrape
You have two strategies. Full re-scrape pulls every watched URL on each run and diffs the result — simple, deterministic, and the right default for a watchlist of dozens of pages. Change detection tries to fetch only what moved (cheap HEAD checks, sitemap lastmod, ETags) before scraping. Start with full re-scrape; it is one mental model and the per-page cost is flat, so the savings from cleverness rarely justify the complexity until your watchlist is in the hundreds.
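If the watchlist does outgrow full re-scrape, the cheap pre-check might look like the sketch below. It assumes you stored the `ETag` and `Last-Modified` response headers from the previous fetch, and that the competitor's server sends them at all (many do not, in which case the function falls back to scraping). `should_rescrape` is a hypothetical helper:

```python
def should_rescrape(previous: dict, current: dict) -> bool:
    """Compare cache validators from two HEAD responses.

    Both arguments are plain dicts mapping lower-cased header names to values.
    Returns True when the page may have changed and deserves a paid scrape.
    """
    for header in ("etag", "last-modified"):
        old, new = previous.get(header), current.get(header)
        if old and new:
            # A validator exists on both sides: trust it (ETag is checked first).
            return old != new
    # No usable validator: be conservative and scrape.
    return True
```

The header dicts could come from something like `{k.lower(): v for k, v in requests.head(url, timeout=10).headers.items()}`. Treat a "no change" answer as a hint rather than proof, since some servers emit validators that do not track the rendered content.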
Step 1: Collect competitor pages
First discover the pages worth watching, then scrape them to a stable text format. Use /v1/map (1 credit) to enumerate a competitor's URLs, pick the ones that matter, then scrape each to markdown. Clean markdown — not raw HTML — is what makes the diff stable later.
from crw import CrwClient

client = CrwClient(api_url="https://api.fastcrw.com", api_key="YOUR_KEY")

# Discover URLs once, then hand-pick the watchlist (1 credit per /v1/map call).
site_map = client.map(url="https://competitor.example")
watchlist = [u for u in site_map["links"]
             if any(k in u for k in ("/pricing", "/changelog", "/features"))]

def snapshot(url: str) -> str:
    # 1 credit on http/lightpanda; 2 if the page forces the chrome renderer.
    res = client.scrape(url=url, formats=["markdown"])
    return res["markdown"]
Because fastCRW is Firecrawl-compatible, the same code runs against Firecrawl by changing api_url — the watchlist is portable. The auto renderer picks chrome → lightpanda → http, so JS-heavy pricing pages still render, and you only pay the 2-credit chrome rate on the pages that actually need it.
Step 2: Detect and store changes
fastCRW is stateless per request — it does not remember yesterday's scrape (fastCRW canonical honest-gaps list). That is by design, and it means you own the history store. This is the single most important architectural fact in the whole tool: the engine fetches, you persist. A tiny SQLite table is plenty to start.
import sqlite3, hashlib, datetime, difflib

db = sqlite3.connect("monitor.db")
db.execute("""CREATE TABLE IF NOT EXISTS snapshots(
    url TEXT, captured_at TEXT, content_hash TEXT, body TEXT)""")

def record(url: str, body: str):
    """Store a new snapshot; return a unified diff, or None if there is nothing to report."""
    h = hashlib.sha256(body.encode()).hexdigest()
    prev = db.execute(
        "SELECT content_hash, body FROM snapshots WHERE url=? "
        "ORDER BY captured_at DESC LIMIT 1", (url,)).fetchone()
    if prev and prev[0] == h:
        return None  # unchanged: no alert, no LLM call
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12).
    db.execute("INSERT INTO snapshots VALUES (?,?,?,?)",
               (url, datetime.datetime.now(datetime.timezone.utc).isoformat(), h, body))
    db.commit()
    if not prev:
        return None  # first capture is a baseline, not a "change"
    diff = "\n".join(difflib.unified_diff(
        prev[1].splitlines(), body.splitlines(), lineterm=""))
    return diff
Hashing the markdown gives you a free no-op fast path: if the hash matches the last snapshot, you skip the diff and — crucially — skip the LLM call in Step 3, so unchanged pages cost nothing beyond the scrape. Diffing markdown rather than HTML is what keeps the signal clean: a re-ordered <div> or a changed analytics tag does not move the rendered text, so your alerts fire on substance, not cosmetic churn.
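To make the diff step concrete, here is a small self-contained sketch of what `difflib.unified_diff` produces for a one-line price change. The two markdown snippets are invented sample data, and `markdown_diff` is a hypothetical wrapper around the same standard-library call used in `record()`:

```python
import difflib

def markdown_diff(old: str, new: str) -> str:
    """Return a unified diff between two markdown snapshots as one string."""
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))

# Invented sample snapshots of a pricing page.
old = "## Pro\n$49 / month\n- 10 seats"
new = "## Pro\n$59 / month\n- 10 seats"

diff = markdown_diff(old, new)
print(diff)
```

The printed diff marks the removed line with `-$49 / month` and the added line with `+$59 / month`, with the unchanged heading and seat count shown as context. Identical snapshots yield an empty string, which is the same "nothing to report" case the hash check short-circuits.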
Step 3: Summarize changes with an LLM
A raw unified diff is noise to a human reader. Hand the diff to an LLM and ask it to answer two questions: what changed and why it matters. fastCRW Cloud gives you two ways to pay for that model. With managed answer mode (paid plans, no key of your own) the default model is DeepSeek, metered in credits based on usage; with BYOK you pass llmApiKey + llmProvider on any plan, including Free, and pay only the flat infra fee (fastCRW canonical search/answer facts). Note that fastCRW's first-party LLM extraction supports OpenAI and Anthropic providers; the managed search default is DeepSeek.
SUMMARY_PROMPT = """You are a competitive-intelligence analyst.
Given this unified diff of a competitor's {page_type} page, output:
1. What changed (one line).
2. Why it matters to us (one line, or "low signal").
Diff:
{diff}"""

# `llm` is any client object that exposes complete(prompt) -> str.
def summarize(diff: str, page_type: str, llm) -> str:
    return llm.complete(SUMMARY_PROMPT.format(page_type=page_type, diff=diff))
Because Step 2 only calls this on real changes, the LLM bill scales with how often competitors actually move — not with how often you poll.
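The `llm` argument is left open in the snippet above. One way to exercise the summarizer without paying for model calls is a canned stand-in. The sketch below assumes only that the client exposes `complete(prompt) -> str`, as the call above implies; `CannedLLM` is a hypothetical test double, and the prompt is abbreviated:

```python
PROMPT = (
    "Given this unified diff of a competitor's {page_type} page, output:\n"
    "1. What changed (one line).\n"
    '2. Why it matters to us (one line, or "low signal").\n'
    "Diff:\n{diff}"
)

class CannedLLM:
    """Hypothetical stand-in: records each prompt and returns a fixed reply."""

    def __init__(self, reply: str):
        self.reply = reply
        self.prompts = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply

def summarize(diff: str, page_type: str, llm) -> str:
    return llm.complete(PROMPT.format(page_type=page_type, diff=diff))

fake = CannedLLM("1. Pro tier rose from $49 to $59.\n2. Narrows our price advantage.")
summary = summarize("-$49 / month\n+$59 / month", "pricing", fake)
```

Swapping the stand-in for a real client later should not require touching `summarize`, as long as the real client is wrapped to offer the same `complete` method.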
Step 4: Build the dashboard and alerts
The dashboard is a thin read layer over your snapshot store. A single-file Streamlit app is the fastest path: list watched URLs, show the most recent change summary per URL, and let you click through to the full diff.
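import sqlite3
import streamlit as st

db = sqlite3.connect("monitor.db")  # the same file the collector writes to
st.title("Competitor Monitor")
# One row per URL: its most recent stored snapshot (Step 2 stores only changed content).
rows = db.execute(
    "SELECT url, MAX(captured_at) FROM snapshots "
    "GROUP BY url ORDER BY 2 DESC").fetchall()
for url, ts in rows[:50]:
    st.write(f"**{url}**: last change {ts}")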
For alerting, post the LLM summary to a Slack incoming webhook (or any webhook) the moment record() returns a non-null diff. To avoid alert fatigue, gate notifications on the LLM's "why it matters" line — suppress anything it scores as "low signal," and batch blog-page changes into a daily digest while sending pricing-page changes immediately.
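import os
import requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # keep the webhook out of source control

def alert(url: str, summary: str):
    if "low signal" in summary.lower():
        return  # batch into the daily digest instead
    requests.post(SLACK_WEBHOOK_URL, timeout=10,
                  json={"text": f":eyes: {url}\n{summary}"})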
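The gating rule described in this step (hold back low-signal changes, batch blog changes, send everything else at once) can be sketched as a pure routing function that is easy to test without touching Slack. `route` and `format_digest` are hypothetical helpers, and matching on "/blog" in the URL is an assumption about how the competitor structures its paths:

```python
def route(url: str, summary: str) -> str:
    """Return "digest" or "immediate" for a detected change."""
    if "low signal" in summary.lower():
        return "digest"      # the LLM judged the change unimportant
    if "/blog" in url:
        return "digest"      # blog churn is noisy: batch it
    return "immediate"       # pricing, features, changelog: send now

def format_digest(pending: list) -> str:
    """Join queued (url, summary) pairs into one daily message."""
    return "\n\n".join(f"{url}\n{summary}" for url, summary in pending)
```

A caller would post immediately when `route` returns "immediate" and otherwise append the pair to a queue that a once-a-day job flushes through `format_digest`.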
Run it on a schedule
The whole loop is one short script whose only state lives in the SQLite file, so cron (or GitHub Actions, or a systemd timer) is all the scheduling you need. Each run skips the map call (the watchlist is cached), scrapes each watched URL, diffs it against the last snapshot, and summarizes only when something changed. See scheduled crawls with cron for the scheduling patterns in depth.
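One possible shape for the script that the scheduler invokes is sketched below. It takes the four steps as injected callables so it can be tested with fakes; `run_once` is a hypothetical name, and the `summarize` callable here accepts only the diff, so a caller would bind the page type and LLM client beforehand (for example with `functools.partial`):

```python
from typing import Callable, Iterable, Optional

def run_once(
    watchlist: Iterable[str],
    snapshot: Callable[[str], str],
    record: Callable[[str, str], Optional[str]],
    summarize: Callable[[str], str],
    alert: Callable[[str, str], None],
) -> int:
    """Scrape, diff, and alert for every watched URL; return the number of changes."""
    changes = 0
    for url in watchlist:
        try:
            body = snapshot(url)
        except Exception as exc:  # one failing page must not abort the whole run
            print(f"scrape failed for {url}: {exc}")
            continue
        diff = record(url, body)
        if diff is None:
            continue  # baseline capture or unchanged page: no LLM call
        alert(url, summarize(diff))
        changes += 1
    return changes

# Example crontab entry for runs at 06:00 and 18:00 (the script path is hypothetical):
# 0 6,18 * * * /usr/bin/python3 /opt/monitor/run.py
```

Catching scrape errors per URL is a design choice: a single competitor page timing out should cost you one data point, not the whole run.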
Cost and self-host vs managed
Here is the honest math for a watchlist of 50 pages polled twice a day, assuming the pages render on http/lightpanda (1 credit each):
| Item | Per run | Per day | Per 30 days |
|---|---|---|---|
| Scrape 50 pages (1 credit each) | 50 credits | 100 credits | 3,000 credits |
| LLM summary (only on real changes) | ~0–5 changes | variable | scales with churn |
| Map (cached watchlist) | 0 | 0 | 0 |
That ~3,000 scrape-credits/month sits inside the Hobby tier; pages that force the chrome renderer double to 2 credits each, so keep an eye on which competitors ship JS-only pricing pages. See live numbers on /pricing rather than trusting a table that can drift.
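The arithmetic behind the table is simple enough to keep as a function you can re-run when the watchlist changes. This sketch hard-codes the 1-credit and 2-credit rates quoted earlier in this post; `monthly_scrape_credits` is a hypothetical helper, and the rates should be checked against /pricing before you rely on the result:

```python
def monthly_scrape_credits(pages: int, runs_per_day: int,
                           chrome_pages: int = 0, days: int = 30) -> int:
    """Estimate scrape credits for a polling schedule.

    chrome_pages is the subset of pages that need the chrome renderer
    (2 credits each); the remainder are billed at 1 credit each.
    """
    per_run = (pages - chrome_pages) * 1 + chrome_pages * 2
    return per_run * runs_per_day * days

baseline = monthly_scrape_credits(pages=50, runs_per_day=2)
with_chrome = monthly_scrape_credits(pages=50, runs_per_day=2, chrome_pages=10)
```

With 50 pages polled twice a day, `baseline` is 3,000 credits, matching the table. If 10 of those pages need chrome, `with_chrome` rises to 3,600.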
The bigger lever is self-hosting. The fastCRW engine is a single ~8 MB Rust binary in one container (a structural fact, not a benchmark), licensed AGPL-3.0, so you can self-host it for $0 in software cost and pay only for the server (fastCRW canonical footprint + license facts). For a monitoring tool that re-scrapes the same pages indefinitely, self-hosting turns a recurring per-page bill into a fixed VPS line item, and it keeps your competitor watchlist private — your list of who you watch never leaves your infrastructure. For the managed path with no ops, point the same code at api.fastcrw.com; the only change is the base URL.
Where Firecrawl (and managed cloud) genuinely wins
An honest tutorial states the trade-offs. fastCRW has no built-in page-monitoring product — there is no hosted "observer" that stores snapshots and diffs for you, which is exactly why this post hands you the history store. Firecrawl's cloud also offers heavier anti-bot paths for the most hardened sites; fastCRW has no Fire-engine anti-bot, so a competitor behind aggressive bot protection may need a different tool. And fastCRW is stateless by design, so all persistence, scheduling, and alerting is yours to build and operate. If you want a turnkey hosted monitor and never want to run a cron job, that convenience is real and worth paying for.
What you get in return is a tool whose ongoing cost is flat and predictable, whose diffs are clean because they run on markdown, and whose watchlist stays on your own infrastructure when self-hosted.
Sources
- fastCRW canonical fact sheet — credit costs, structural footprint, stateless/honest gaps, managed answer mode, AGPL-3.0/license, verified 2026-05-18.
- fastCRW repo and live pricing: github.com/us/crw · /pricing
- ScrapingBee JavaScript-rendering multiplier (5× JS, 25–75× premium/stealth tiers): scrapingbee.com/pricing
