How do I build a competitor monitoring tool?

Pick a watchlist of competitor URLs (pricing, features, changelog, blog), scrape each to clean markdown on a schedule with a Firecrawl-compatible API, hash and diff each snapshot against the previous one, summarize real changes with an LLM, and surface them in a small dashboard with webhook alerts. fastCRW meters each scrape at a flat 1 credit (2 if the chrome renderer is needed), so re-scraping the same pages forever stays affordable.

How do I detect when a competitor's page changes?

Scrape the page to markdown, hash the markdown, and compare the hash to the last stored snapshot. If the hash matches, nothing changed and you skip the diff and the LLM call entirely. If it differs, run a unified diff on the markdown text. Diffing rendered markdown rather than raw HTML keeps alerts focused on substance, so re-ordered divs or changed tracking tags do not trigger false positives.

Does fastCRW store page snapshots for me?

No. fastCRW is stateless per request — it fetches a page and returns it, but does not remember previous scrapes. That is intentional: you own the history store. A small SQLite or Postgres table holding url, timestamp, content hash, and body is enough to power diffing, the dashboard, and alerting.

How much does continuous monitoring cost?

It scales with pages times polls. A 50-page watchlist polled twice daily on the http/lightpanda renderer is 100 scrape-credits per day, about 3,000 per month, which fits inside the Hobby tier; pages forcing the chrome renderer cost 2 credits each. The LLM summary only runs on pages that actually changed, so that cost tracks competitor churn, not poll frequency. Check live numbers on /pricing since pricing can drift.

Can I self-host to monitor unlimited competitor pages?

Yes. The fastCRW engine is a single ~8 MB Rust binary in one container, licensed AGPL-3.0, so you can self-host it for $0 in software cost and pay only for the server. For a tool that re-scrapes the same pages indefinitely, self-hosting converts a recurring per-page bill into a fixed VPS cost and keeps your competitor watchlist on your own infrastructure.

Build a Competitor Monitoring Tool (Dashboard)

By the fastCRW team · Credit costs and footprint verified against the canonical fact sheet 2026-05-18 · See /pricing for current rates · Verify independently before relying on any number.

Disclosure: we build fastCRW. This is a vendor-authored tutorial, so weight it accordingly — the architecture below works on any Firecrawl-compatible API, and we call out where Firecrawl genuinely wins.

Build a competitor monitoring tool that pays for itself

A competitor monitoring tool watches a handful of rival pages — pricing, features, changelog, blog — and tells you the moment something changes. The hard part is not the first scrape; it is the thousandth. You re-scrape the same pages forever, so per-page cost and infrastructure footprint dominate the economics, and a flaky diff that flaps on cosmetic HTML noise will train your team to ignore the alerts. This tutorial builds the whole loop in Python: collect pages, diff them, summarize what changed with an LLM, and surface it in a dashboard — with honest cost math at every step.

The design rides on one structural fact: fastCRW meters a scrape or crawl page at a flat 1 credit on the http or lightpanda renderer (2 credits when chrome is needed), with no ScrapingBee-style 5× JavaScript multiplier (fastCRW canonical credit table, verified 2026-05-18). Continuous monitoring is exactly the workload where flat per-page metering and a small footprint matter most.

What a competitor monitoring tool watches

Scope the tool before you write code. Most teams watch four page types, and each has a different change cadence:

Pricing pages — low change frequency, high business impact. A tier price moving is a board-meeting event.
Feature / product pages — medium frequency; new capabilities and positioning shifts.
Changelog / release notes — high frequency, append-only; the richest signal for "what are they shipping?"
Blog / announcements — high frequency, noisy; useful for narrative, bad for alerting unless filtered.
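One way to make that scoping concrete is to record a page type, a polling cadence, and an alert policy for each watched URL. The sketch below is illustrative only: the `Watched` record, the `Alert` enum, and the cadence numbers in `DEFAULTS` are assumptions, not fastCRW features or recommended values.

```python
from dataclasses import dataclass
from enum import Enum


class Alert(Enum):
    IMMEDIATE = "immediate"  # notify as soon as a change is detected
    DIGEST = "digest"        # roll changes up into a daily summary


@dataclass(frozen=True)
class Watched:
    url: str
    page_type: str      # "pricing", "features", "changelog", or "blog"
    polls_per_day: int  # hypothetical cadence; tune per competitor
    alert: Alert


# Hypothetical defaults per page type (assumed values, adjust freely).
DEFAULTS = {
    "pricing": (2, Alert.IMMEDIATE),
    "features": (1, Alert.IMMEDIATE),
    "changelog": (2, Alert.DIGEST),
    "blog": (1, Alert.DIGEST),
}


def watch(url: str, page_type: str) -> Watched:
    """Build a watchlist entry using the default cadence for its page type."""
    polls, alert = DEFAULTS[page_type]
    return Watched(url, page_type, polls, alert)


watchlist = [
    watch("https://competitor.example/pricing", "pricing"),
    watch("https://competitor.example/blog", "blog"),
]
```

Keeping the page type next to the URL pays off later, because both the LLM prompt and the alert routing can branch on it.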

Page-change detection vs full re-scrape

You have two strategies. Full re-scrape pulls every watched URL on each run and diffs the result — simple, deterministic, and the right default for a watchlist of dozens of pages. Change detection tries to fetch only what moved (cheap HEAD checks, sitemap lastmod, ETags) before scraping. Start with full re-scrape; it is one mental model and the per-page cost is flat, so the savings from cleverness rarely justify the complexity until your watchlist is in the hundreds.
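If the watchlist does grow into the hundreds, a conditional GET is one of the cheap pre-checks mentioned above. The following is a minimal standard-library sketch, assuming the competitor's server returns `ETag` or `Last-Modified` validators and honors them; many sites do not, so the function treats anything other than a 304 as "scrape it". The names `conditional_request` and `probably_changed` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_request(url: str, etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> urllib.request.Request:
    """Build a GET that lets the server answer 304 if nothing changed."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers, method="GET")


def probably_changed(url: str, etag: Optional[str] = None,
                     last_modified: Optional[str] = None,
                     opener=urllib.request.urlopen) -> bool:
    """Return False only when the server explicitly says 304 Not Modified."""
    req = conditional_request(url, etag, last_modified)
    try:
        with opener(req, timeout=10) as resp:
            return resp.status != 304
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError rather than a normal response.
        return err.code != 304
    except urllib.error.URLError:
        # Network trouble: fall back to a real scrape rather than skip the page.
        return True
```

A 304 only proves the origin server thinks the resource is unchanged; JS-rendered pricing pages can change without the HTML shell changing, which is another reason to keep full re-scrape as the default.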

Step 1: Collect competitor pages

First discover the pages worth watching, then scrape them to a stable text format. Use /v1/map (1 credit) to enumerate a competitor's URLs, pick the ones that matter, then scrape each to markdown. Clean markdown — not raw HTML — is what makes the diff stable later.

    from crw import CrwClient

    client = CrwClient(api_url="https://api.fastcrw.com", api_key="YOUR_KEY")

    # Discover URLs once, then hand-pick the watchlist (1 credit per /v1/map call).
    site_map = client.map(url="https://competitor.example")
    watchlist = [u for u in site_map["links"]
                 if any(k in u for k in ("/pricing", "/changelog", "/features"))]


    def snapshot(url: str) -> str:
        # 1 credit on http/lightpanda; 2 if the page forces the chrome renderer.
        res = client.scrape(url=url, formats=["markdown"])
        return res["markdown"]

Because fastCRW is Firecrawl-compatible, the same code runs against Firecrawl by changing api_url — the watchlist is portable. The auto renderer picks chrome → lightpanda → http, so JS-heavy pricing pages still render, and you only pay the 2-credit chrome rate on the pages that actually need it.
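To illustrate that portability without depending on any SDK, here is a sketch that builds the raw HTTP request and reads the base URL from the environment. It assumes the Firecrawl v1 conventions (a `POST /v1/scrape` endpoint, a bearer token, and markdown nested under `data` in the raw response) apply to whichever backend you point it at. The environment variable names `SCRAPE_API_URL` and `SCRAPE_API_KEY` are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional


def build_scrape_request(url: str, base_url: Optional[str] = None,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a Firecrawl-v1-style scrape request for any compatible backend."""
    # Hypothetical env vars: swap the base URL to change providers.
    base = base_url or os.environ.get("SCRAPE_API_URL", "https://api.fastcrw.com")
    key = api_key or os.environ.get("SCRAPE_API_KEY", "")
    payload = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base.rstrip('/')}/v1/scrape",
        data=payload,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(response_body: bytes) -> str:
    """Pull the markdown out of a raw response (assumed shape: data.markdown)."""
    return json.loads(response_body)["data"]["markdown"]
```

Sending the request is then a single `urllib.request.urlopen(build_scrape_request(url))` call. Verify the response shape against your provider's API reference before relying on `extract_markdown`.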

Step 2: Detect and store changes

fastCRW is stateless per request — it does not remember yesterday's scrape (fastCRW canonical honest-gaps list). That is by design, and it means you own the history store. This is the single most important architectural fact in the whole tool: the engine fetches, you persist. A tiny SQLite table is plenty to start.

    import datetime
    import difflib
    import hashlib
    import sqlite3

    db = sqlite3.connect("monitor.db")
    db.execute("""CREATE TABLE IF NOT EXISTS snapshots(
        url TEXT, captured_at TEXT, content_hash TEXT, body TEXT)""")


    def record(url: str, body: str):
        """Store a snapshot if the content changed; return a unified diff or None."""
        h = hashlib.sha256(body.encode()).hexdigest()
        prev = db.execute(
            "SELECT content_hash, body FROM snapshots WHERE url=? "
            "ORDER BY captured_at DESC LIMIT 1", (url,)).fetchone()
        if prev and prev[0] == h:
            return None  # unchanged: no alert, no LLM call
        # Timezone-aware UTC timestamp (utcnow() is deprecated since Python 3.12).
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        db.execute("INSERT INTO snapshots VALUES (?,?,?,?)", (url, now, h, body))
        db.commit()
        if not prev:
            return None  # first capture is a baseline, not a "change"
        diff = "\n".join(difflib.unified_diff(
            prev[1].splitlines(), body.splitlines(), lineterm=""))
        return diff

Hashing the markdown gives you a free no-op fast path: if the hash matches the last snapshot, you skip the diff and — crucially — skip the LLM call in Step 3, so unchanged pages cost nothing beyond the scrape. Diffing markdown rather than HTML is what keeps the signal clean: a re-ordered <div> or a changed analytics tag does not move the rendered text, so your alerts fire on substance, not cosmetic churn.
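A small worked example shows both halves of that argument: identical text hashes to the same fingerprint, and a real pricing change comes out as a short, readable diff. The helper names `fingerprint` and `changed_lines` are hypothetical, and stripping trailing whitespace before hashing is an extra assumption that the tutorial's `record()` does not make.

```python
import difflib
import hashlib


def fingerprint(markdown: str) -> str:
    """Hash the markdown after trimming trailing whitespace on each line.

    The trimming is an added assumption to reduce flapping; record() above
    hashes the body as-is.
    """
    normalized = "\n".join(line.rstrip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_lines(old: str, new: str) -> list:
    """Return only the added and removed lines of a unified diff."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm="", n=0))
    body = diff[2:]  # drop the '---' and '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


old = "# Pricing\n\nPro: $49/mo\n"
new = "# Pricing\n\nPro: $59/mo\n"

print(fingerprint(old) == fingerprint("# Pricing\n\nPro: $49/mo   \n"))  # True
print(changed_lines(old, new))  # ['-Pro: $49/mo', '+Pro: $59/mo']
```

Two lines of diff are also far cheaper to send to an LLM than two full pages, which keeps the summarization prompt small.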

Step 3: Summarize changes with an LLM

A raw unified diff is noise to a human reader. Hand the diff to an LLM and ask it to answer two questions: what changed and why it matters. fastCRW Cloud gives you two ways to pay for that model. With managed answer mode (paid plans, no key of your own) the default model is DeepSeek, metered in credits based on usage; with BYOK you pass llmApiKey + llmProvider on any plan, including Free, and pay only the flat infra fee (fastCRW canonical search/answer facts). Note that fastCRW's first-party LLM extraction supports OpenAI and Anthropic providers; the managed search default is DeepSeek.

    SUMMARY_PROMPT = """You are a competitive-intelligence analyst.
    Given this unified diff of a competitor's {page_type} page, output:
    1. What changed (one line).
    2. Why it matters to us (one line, or "low signal").

    Diff:
    {diff}"""


    def summarize(diff: str, page_type: str, llm) -> str:
        # `llm` is any client object that exposes a complete(prompt) method.
        return llm.complete(SUMMARY_PROMPT.format(page_type=page_type, diff=diff))

Because Step 2 only calls this on real changes, the LLM bill scales with how often competitors actually move — not with how often you poll.
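Since the prompt asks for two numbered lines, the reply can be parsed into fields before anything downstream depends on it. This is a sketch under the assumption that the model mostly follows the requested format; `Summary` and `parse_summary` are hypothetical names. When the format is not followed, the parser fails open and treats the change as worth alerting on.

```python
import re
from typing import NamedTuple


class Summary(NamedTuple):
    what: str         # line 1 of the model's answer
    why: str          # line 2 of the model's answer
    low_signal: bool  # True when the model marked the change as low signal


# Matches "1. text", "2) text", with optional leading whitespace.
_ITEM = re.compile(r"^\s*([12])[.)]\s*(.+)$")


def parse_summary(text: str) -> Summary:
    """Split the LLM reply into its 'what changed' and 'why it matters' lines."""
    found = {}
    for line in text.splitlines():
        match = _ITEM.match(line)
        if match and match.group(1) not in found:
            found[match.group(1)] = match.group(2).strip()
    # If the model ignored the format, keep the raw text and do not suppress it.
    what = found.get("1", text.strip())
    why = found.get("2", "")
    return Summary(what, why, "low signal" in why.lower())
```

Checking only the second line for "low signal" avoids suppressing an alert just because the phrase happens to appear in the description of what changed.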

Step 4: Build the dashboard and alerts

The dashboard is a thin read layer over your snapshot store. A single-file Streamlit app is the fastest path: list watched URLs, show the most recent change summary per URL, and let you click through to the full diff.

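    import sqlite3

    import streamlit as st

    db = sqlite3.connect("monitor.db")  # the same store that record() writes to

    st.title("Competitor Monitor")
    # One row per watched URL, most recently changed first.
    rows = db.execute(
        "SELECT url, MAX(captured_at) AS last_seen FROM snapshots "
        "GROUP BY url ORDER BY last_seen DESC LIMIT 50").fetchall()
    for url, ts in rows:
        st.write(f"**{url}**: last change {ts}")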

For alerting, post the LLM summary to a Slack incoming webhook (or any webhook) as soon as record() returns a diff rather than None. To avoid alert fatigue, gate notifications on the LLM's "why it matters" line: suppress anything it scores as "low signal," batch blog-page changes into a daily digest, and send pricing-page changes immediately.

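    import os

    import requests

    # Assumes the incoming-webhook URL is supplied via an environment variable.
    SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


    def alert(url: str, summary: str):
        if "low signal" in summary.lower():
            return  # batch into the daily digest instead
        requests.post(SLACK_WEBHOOK_URL,
                      json={"text": f":eyes: {url}\n{summary}"},
                      timeout=10)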
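The digest half of that policy needs somewhere to queue messages. Here is a minimal routing sketch, assuming blog pages and low-signal changes go to the digest and everything else is sent at once. `AlertRouter` is a hypothetical helper, and `send` is any callable that delivers text, for example a function that posts to a webhook.

```python
class AlertRouter:
    """Send important changes immediately and queue the rest for a digest."""

    DIGEST_TYPES = {"blog"}  # assumed policy; extend as your team prefers

    def __init__(self, send):
        self._send = send    # callable taking one text argument
        self._queued = []    # (url, summary) pairs awaiting the digest

    def route(self, url: str, page_type: str, summary: str) -> str:
        """Return "sent" or "digest" depending on where the alert went."""
        if page_type in self.DIGEST_TYPES or "low signal" in summary.lower():
            self._queued.append((url, summary))
            return "digest"
        self._send(f"{url}\n{summary}")
        return "sent"

    def flush_digest(self):
        """Send everything queued as one message; return it, or None if empty."""
        if not self._queued:
            return None
        text = "Daily digest\n" + "\n\n".join(
            f"{url}\n{summary}" for url, summary in self._queued)
        self._send(text)
        self._queued.clear()
        return text
```

The queue here lives in memory, so it only works inside a long-running process. If each poll is a separate cron run, the queued items would need to be stored alongside the snapshots and flushed by a once-a-day job.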

Run it on a schedule

The whole loop is a stateless script, so cron (or GitHub Actions, or a systemd timer) is all the scheduling you need. Each run skips the /v1/map call because the watchlist is cached, then scrapes each watched URL, diffs it against the last snapshot, and summarizes only when something changed. See scheduled crawls with cron for the scheduling patterns in depth.
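Put together, one scheduled run might look like the sketch below. It takes the four steps as injected callables so it stays independent of any particular client; `run_once` is a hypothetical name, and the callables are assumed to behave like the tutorial's `snapshot`, `record`, `summarize` (with the LLM client already bound), and `alert`. The crontab line in the comment uses a made-up script path.

```python
# Example crontab entry for two polls a day (hypothetical path):
#   0 6,18 * * * /usr/bin/python3 /opt/monitor/run.py


def run_once(watchlist, snapshot, record, summarize, alert) -> dict:
    """Scrape, diff, and alert for every (url, page_type) pair in the watchlist."""
    stats = {"scraped": 0, "changed": 0, "failed": 0}
    for url, page_type in watchlist:
        try:
            body = snapshot(url)
        except Exception as exc:
            # One failing page should not abort the rest of the run.
            stats["failed"] += 1
            print(f"scrape failed for {url}: {exc}")
            continue
        stats["scraped"] += 1
        diff = record(url, body)
        if not diff:
            continue  # unchanged page or first baseline capture
        stats["changed"] += 1
        alert(url, summarize(diff, page_type))
    return stats
```

Returning the counters makes it easy to log each run and to notice when the failure count starts climbing, which usually means a competitor changed their bot protection.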

Cost and self-host vs managed

Here is the honest math for a watchlist of 50 pages polled twice a day, assuming the pages render on http/lightpanda (1 credit each):

Item	Per run	Per day	Per 30 days
Scrape 50 pages (1 credit each)	50 credits	100 credits	3,000 credits
LLM summary (only on real changes)	~0–5 changes	variable	scales with churn
Map (cached watchlist)	0	0	0

That ~3,000 scrape-credits/month sits inside the Hobby tier; pages that force the chrome renderer double to 2 credits each, so keep an eye on which competitors ship JS-only pricing pages. See live numbers on /pricing rather than trusting a table that can drift.
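The arithmetic behind those figures is simple enough to script, so you can re-run it for your own watchlist. This sketch uses the credit rates stated in this post (1 credit per http/lightpanda page, 2 per chrome page); the function name is hypothetical, and the rates should be checked against /pricing before budgeting.

```python
def monthly_scrape_credits(pages: int, polls_per_day: int,
                           chrome_pages: int = 0, days: int = 30) -> int:
    """Estimate scrape credits for a watchlist over a billing period.

    Rates are taken from this post: 1 credit per page on http/lightpanda,
    2 credits per page that needs the chrome renderer.
    """
    if not 0 <= chrome_pages <= pages:
        raise ValueError("chrome_pages must be between 0 and pages")
    light_pages = pages - chrome_pages
    credits_per_run = light_pages * 1 + chrome_pages * 2
    return credits_per_run * polls_per_day * days


print(monthly_scrape_credits(50, 2))                   # 3000
print(monthly_scrape_credits(50, 2, chrome_pages=10))  # 3600
```

With 10 of the 50 pages on chrome, each run costs 40 + 20 = 60 credits, so the month comes to 60 × 2 × 30 = 3,600 credits. LLM summary costs are excluded because they depend on how often pages actually change.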

The bigger lever is self-hosting. The fastCRW engine is a single ~8 MB Rust binary in one container (a structural fact, not a benchmark), licensed AGPL-3.0, so you can self-host it for $0 in software cost and pay only for the server (fastCRW canonical footprint + license facts). For a monitoring tool that re-scrapes the same pages indefinitely, self-hosting turns a recurring per-page bill into a fixed VPS line item, and it keeps your competitor watchlist private — your list of who you watch never leaves your infrastructure. For the managed path with no ops, point the same code at api.fastcrw.com; the only change is the base URL.

Where Firecrawl (and managed cloud) genuinely wins

An honest tutorial states the trade-offs. fastCRW has no built-in page-monitoring product: there is no hosted "observer" that stores snapshots and diffs for you, which is exactly why this post hands you the history store. Firecrawl's cloud also offers heavier anti-bot paths for the most hardened sites; fastCRW has no Fire-engine anti-bot, so a competitor behind aggressive bot protection may need a different tool. And fastCRW is stateless by design, so all persistence, scheduling, and alerting are yours to build and operate. If you want a turnkey hosted monitor and never want to run a cron job, that convenience is real and worth paying for.

What you get in return is a tool whose ongoing cost has a floor, whose diffs are clean because they run on markdown, and whose watchlist stays on your own infrastructure when self-hosted.

Sources

fastCRW canonical fact sheet — credit costs, structural footprint, stateless/honest gaps, managed answer mode, AGPL-3.0/license, verified 2026-05-18.
fastCRW repo and live pricing: github.com/us/crw · /pricing
ScrapingBee JavaScript-rendering multiplier (5× JS, 25–75× premium/stealth tiers): scrapingbee.com/pricing
