Can PowerShell scrape websites without extra modules?

Yes. Invoke-WebRequest and Invoke-RestMethod ship with PowerShell and need no extra modules. Invoke-WebRequest fetches HTML and exposes .Content and .Links; Invoke-RestMethod deserializes JSON responses into objects. This works for static, server-rendered pages — but neither cmdlet executes JavaScript, so client-rendered content is out of reach without an external rendering step.

Why is Invoke-WebRequest HTML parsing limited in PowerShell 7?

PowerShell 7 is built on cross-platform .NET and dropped the Internet Explorer COM dependency that Windows PowerShell 5.1 used. As a result, Invoke-WebRequest in PS7 no longer populates the ParsedHtml DOM property — you get raw .Content and a flat .Links list. Tutorials that call $response.ParsedHtml.getElementsByTagName were written for 5.1 and fail under PS7, leaving regex or a third-party HTML library as the fallback.

How do I call a scraping REST API from PowerShell?

Build a JSON body with ConvertTo-Json, set an Authorization header, and POST with Invoke-RestMethod. Against fastCRW's Firecrawl-compatible /v1/scrape endpoint: $body = @{ url='...'; formats=@('markdown') } | ConvertTo-Json, then Invoke-RestMethod -Uri 'https://api.fastcrw.com/v1/scrape' -Method POST -Headers $headers -Body $body. The response is already a parsed object, with clean markdown at $result.data.markdown.

Can I run a PowerShell scraper on a schedule with a Scheduled Task?

Yes. Wrap the scrape in a .ps1 file and register it with Register-ScheduledTask, using New-ScheduledTaskAction to run pwsh.exe against your script and New-ScheduledTaskTrigger for the cadence (for example -Daily -At 3am). Log exit codes and write output to a dated file so failed runs are visible. Because the engine is stateless per request, the script is fully self-contained on each run.

Does fastCRW run on Windows / air-gapped networks?

Yes. The engine is a single static Rust binary — roughly an 8 MB Docker image needing 1 container, versus Firecrawl's multi-service stack at around 2-3 GB across 5 containers (README structural facts). It is AGPL-3.0, so you can self-host it on internal infrastructure with no cloud egress; your PowerShell scripts just point at the internal host. Scraped content and target URLs never leave your network.

PowerShell Web Scraping for Windows Teams

PowerShell web scraping with built-in cmdlets

PowerShell web scraping starts with two cmdlets that ship in the box: Invoke-WebRequest and Invoke-RestMethod. For a Windows or ops engineer, that's the appeal — no Python runtime to install, no pip, no extra modules. You can fetch a page, pull out links, and pipe the result into the rest of your automation in three lines. This guide shows the native PowerShell path honestly, where it hits a wall on modern sites, and how one Invoke-RestMethod call to a Firecrawl-compatible /v1/scrape endpoint returns clean markdown that drops straight into a Scheduled Task.

Invoke-WebRequest vs Invoke-RestMethod

The two cmdlets look similar but serve different jobs. Invoke-WebRequest returns a rich response object: status code, headers, raw content, and (on Windows PowerShell 5.1) a parsed DOM. Use it when you want the HTML itself. Invoke-RestMethod is the JSON/XML workhorse — it deserializes a JSON response body straight into a PowerShell object, so you never touch ConvertFrom-Json manually. For scraping an HTML page you reach for the former; for calling a JSON API you reach for the latter.

Invoke-WebRequest -Uri $url returns .Content, .StatusCode, .Headers, .Links, .Images.
Invoke-RestMethod -Uri $url returns the parsed object directly, which is ideal when the target is a JSON API.
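To make the difference concrete without a live request, the sketch below uses a hard-coded JSON string as a stand-in for a response body. The payload is invented for illustration; the point is only which step each cmdlet leaves to you.

```powershell
# Stand-in for what Invoke-WebRequest exposes as .Content when a server
# returns JSON. The payload is a made-up example, not real API output.
$raw = '{"title":"Example","tags":["a","b"],"meta":{"words":120}}'

# Invoke-WebRequest leaves you with that string, so you deserialize it yourself.
$parsed = $raw | ConvertFrom-Json

# Invoke-RestMethod performs the equivalent conversion for JSON responses,
# so these properties would be available directly on its return value.
$parsed.title        # Example
$parsed.meta.words   # 120
$parsed.tags.Count   # 2
```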

Parsing the ParsedHtml/Links collections

On Windows PowerShell 5.1, Invoke-WebRequest hands back a ParsedHtml property (an Internet Explorer COM document) plus convenience collections like .Links and .Forms. A common one-liner to grab every link:

(Invoke-WebRequest -Uri $url).Links | Select-Object -ExpandProperty href

That works for simple, server-rendered pages. The problem is that ParsedHtml leans on a legacy IE engine that no longer exists in modern PowerShell.
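In practice the href values come back as a mix of relative paths, fragments, and absolute URLs. A minimal sketch of normalizing them follows; Resolve-LinkHref is a hypothetical helper name, and the filter list (fragments, javascript:, mailto:) is an assumption you would adjust per site.

```powershell
function Resolve-LinkHref {
    param(
        [Parameter(Mandatory)][string]$BaseUrl,
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$Href
    )
    $base = [System.Uri]$BaseUrl
    $Href |
        # Skip empty values and links that do not point at a fetchable page.
        Where-Object { $_ -and $_ -notmatch '^(#|javascript:|mailto:)' } |
        # Uri(base, relative) resolves relative paths and passes absolute URLs through.
        ForEach-Object { [System.Uri]::new($base, $_).AbsoluteUri } |
        Sort-Object -Unique
}
```

You would feed it the collection from the one-liner above, for example Resolve-LinkHref -BaseUrl $url -Href (Invoke-WebRequest -Uri $url).Links.href. A malformed href can still make the Uri constructor throw, so wrap the call in try/catch if the target's markup is unreliable.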

Why the legacy IE-based parser is gone in PowerShell 7

PowerShell 7 (built on .NET, cross-platform) dropped the Internet Explorer COM dependency. Invoke-WebRequest in PS7 no longer populates ParsedHtml, and the basic-parsing behaviour means you get raw .Content and a flat .Links list rather than a navigable DOM. If a tutorial tells you to do $response.ParsedHtml.getElementsByTagName('div'), it was written for 5.1 and will fail under PS7. The practical takeaway: in modern PowerShell you are left parsing HTML by hand with regex or a third-party HTML library — which is exactly where things get brittle.
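Here is roughly what the regex fallback looks like on raw .Content, using the page title as the simplest possible target. Get-HtmlTitle is a hypothetical helper, and the pattern is a sketch that assumes a single, well-formed title element; it is not a general HTML parser.

```powershell
function Get-HtmlTitle {
    param([Parameter(Mandatory)][string]$Html)
    # Singleline lets '.' span line breaks; IgnoreCase covers <TITLE> as well.
    $options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Singleline'
    $match = [regex]::Match($Html, '<title[^>]*>(.*?)</title>', $options)
    if (-not $match.Success) { return $null }
    # Decode entities such as &amp; and collapse whitespace runs.
    $text = [System.Net.WebUtility]::HtmlDecode($match.Groups[1].Value)
    ($text -replace '\s+', ' ').Trim()
}
```

Called as Get-HtmlTitle -Html (Invoke-WebRequest -Uri $url).Content, it returns $null when nothing matches, which is exactly the silent-failure mode that makes this style of parsing fragile on anything more complex than a title.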

Authenticated and stateful requests in PowerShell

Before we get to the wall, it's worth covering the things PowerShell does well: headers, cookies, sessions, and polite throttling. A surprising amount of real scraping is just sending the right request.

WebSession, cookies, and headers

For anything that needs a login or a persistent cookie jar, use a WebRequestSession. Capture it on the first call with -SessionVariable, then reuse it with -WebSession so cookies carry across requests:

Invoke-WebRequest -Uri $login -SessionVariable s -Method POST -Body $creds
Invoke-WebRequest -Uri $page -WebSession $s

Custom headers go through -Headers @{ 'User-Agent' = '...' }, and both cmdlets also accept a dedicated -UserAgent parameter for that one header. The default PowerShell user-agent is an instant tell to many servers, so set a realistic one explicitly.
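If you already hold a cookie value (from a browser session, say), you can also build the session object yourself rather than capturing it from a login call. The sketch below assumes that scenario; New-ScrapeSession, the cookie name, and the domain are all hypothetical.

```powershell
function New-ScrapeSession {
    param(
        [Parameter(Mandatory)][string]$UserAgent,
        [hashtable]$Cookie = @{},
        [string]$CookieDomain
    )
    $session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
    $session.UserAgent = $UserAgent
    foreach ($name in $Cookie.Keys) {
        # Cookie(name, value, path, domain); the container requires a domain.
        $entry = [System.Net.Cookie]::new($name, [string]$Cookie[$name], '/', $CookieDomain)
        $session.Cookies.Add($entry)
    }
    $session
}
```

Pass the result with -WebSession $session on each request. The session's UserAgent should then be sent in place of the default, though you can still override it per call with -UserAgent.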

Handling redirects and TLS

Both cmdlets follow redirects automatically; cap them with -MaximumRedirection when you want to detect a redirect chain instead of silently following it. On older Windows PowerShell you may still need to force a modern TLS version — [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12 — before the request, or the connection fails with an opaque handshake error. PowerShell 7 defaults to the OS TLS settings and rarely needs this.
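A script that has to run under both editions can guard the TLS tweak with a version check. This is a minimal sketch with a hypothetical function name; it adds TLS 1.2 to whatever is already enabled instead of overwriting the setting.

```powershell
function Enable-LegacyTls12 {
    # PowerShell 6+ follows the OS defaults, so only touch Windows PowerShell.
    if ($PSVersionTable.PSVersion.Major -ge 6) { return $false }
    $current = [Net.ServicePointManager]::SecurityProtocol
    # -bor keeps existing protocols enabled and switches TLS 1.2 on as well.
    [Net.ServicePointManager]::SecurityProtocol = $current -bor [Net.SecurityProtocolType]::Tls12
    return $true
}
```

Call it once at the top of the script, before the first request. The return value only reports whether the setting was changed.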

Throttling and Start-Sleep backoff

Politeness keeps you off block lists. A simple loop with jittered Start-Sleep between requests goes a long way, and a try/catch with exponential backoff handles transient 429/503 responses:

foreach ($u in $urls) { ...; Start-Sleep -Seconds (Get-Random -Minimum 1 -Maximum 4) }
On a caught 429, double the delay and retry a bounded number of times.
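The doubling-and-retry idea can be sketched as two small functions. Both names are hypothetical, and the retryable status list and attempt count are assumptions to tune. The status lookup relies on the exception exposing .Response.StatusCode, which the web cmdlets' HTTP errors do in both Windows PowerShell 5.1 and PowerShell 7.

```powershell
function Get-BackoffDelay {
    param([int]$Attempt, [double]$BaseSeconds = 1, [double]$MaxSeconds = 60)
    # Attempt 1 -> base, 2 -> 2x base, 3 -> 4x base, capped at MaxSeconds.
    [math]::Min($MaxSeconds, $BaseSeconds * [math]::Pow(2, $Attempt - 1))
}

function Invoke-WithBackoff {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 4,
        [double]$BaseSeconds = 1,
        [int[]]$RetryStatus = @(429, 503)
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Errors without an HTTP response (DNS, script bugs) leave $status null.
            $status = $null
            if ($_.Exception.Response) { $status = [int]$_.Exception.Response.StatusCode }
            $retryable = $RetryStatus -contains $status
            # Give up on non-retryable errors or once the attempts are used up.
            if (-not $retryable -or $attempt -eq $MaxAttempts) { throw }
            Start-Sleep -Seconds (Get-BackoffDelay -Attempt $attempt -BaseSeconds $BaseSeconds)
        }
    }
}
```

Inside the foreach loop you would wrap the request, for example Invoke-WithBackoff -Action { Invoke-WebRequest -Uri $u }, and keep the jittered Start-Sleep between URLs as before.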

Where PowerShell scraping hits a wall

Everything above works on static, server-rendered HTML. The trouble starts the moment a site renders its content in the browser or actively fights bots.

No JavaScript execution in Invoke-WebRequest

Invoke-WebRequest is an HTTP client, not a browser. It fetches the initial HTML payload and stops. If a page builds its product grid, pricing table, or article body with client-side JavaScript (React, Vue, or any SPA), that content simply isn't in the response; you'll scrape an empty shell and a pile of <script> tags. There is no flag that turns on rendering: neither cmdlet includes a JavaScript engine, and in PowerShell 7 there is no parsed DOM either.
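You can often spot the empty-shell case before wasting time on selectors. The following is a rough heuristic, not a reliable detector: Test-LooksClientRendered is a hypothetical helper, and the 200-character threshold is an arbitrary assumption.

```powershell
function Test-LooksClientRendered {
    param(
        [Parameter(Mandatory)][string]$Html,
        [int]$MinTextLength = 200   # arbitrary threshold; tune per site
    )
    $options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Singleline'
    # Drop script/style blocks, then all remaining tags, to approximate visible text.
    $noScript = [regex]::Replace($Html, '<(script|style)\b.*?</\1\s*>', ' ', $options)
    $text = [regex]::Replace($noScript, '<[^>]+>', ' ')
    $text = ($text -replace '\s+', ' ').Trim()
    $scriptCount = [regex]::Matches($Html, '<script\b', $options).Count
    # Little visible text plus at least one script tag suggests an SPA shell.
    ($text.Length -lt $MinTextLength) -and ($scriptCount -gt 0)
}
```

Run it against (Invoke-WebRequest -Uri $url).Content. A result of $true is a hint that the real content arrives via JavaScript, while short static pages can trigger false positives.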

Brittle regex/DOM parsing on modern sites

With ParsedHtml gone in PS7, the fallback is regex against raw HTML or a NuGet HTML library like HtmlAgilityPack loaded via Add-Type. Regex over HTML is notoriously fragile — a class-name change, a reordered attribute, or a whitespace tweak breaks the selector, and the scraper fails silently by returning nothing rather than erroring loudly. Maintaining those selectors across a fleet of target sites becomes a recurring chore that quietly rots over time.
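One cheap mitigation for the silent-failure problem is to make every extraction assert that it found something. The helper name and the sample HTML below are hypothetical; the pattern is the point.

```powershell
function Assert-Extracted {
    param(
        [Parameter(Mandatory)][string]$Name,
        [string]$Value
    )
    # Turn an empty match into a loud, named error instead of a blank field.
    if ([string]::IsNullOrWhiteSpace($Value)) {
        throw "Extraction '$Name' returned nothing; the page layout may have changed."
    }
    $Value
}

# Made-up fragment standing in for a fetched page.
$html  = '<span class="price">19.99</span>'
$match = [regex]::Match($html, 'class="price">([^<]+)<')
$price = Assert-Extracted -Name 'price' -Value $match.Groups[1].Value
```

This does not make the regex any less brittle, but a renamed class now stops the run with a clear message rather than writing empty rows for weeks.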

Anti-bot and the limits of a single Windows IP

A scheduled scraper running from one Windows box hammers the target from a single IP. Modern anti-bot systems (rate fingerprinting, TLS/JA3 checks, JS challenges) flag that pattern quickly. PowerShell gives you no proxy rotation, no headless stealth, and no challenge-solving out of the box. For a handful of internal or friendly endpoints this is fine; for adversarial public sites it's a dead end.

Calling a Firecrawl-compatible scrape API from PowerShell

The clean fix for both the JS-rendering and parsing problems is to stop parsing HTML in PowerShell at all. Hand the URL to a scrape API and get back content that's already rendered and cleaned. fastCRW exposes a Firecrawl-compatible REST surface, so the call is a single Invoke-RestMethod POST — and because it's a drop-in for the Firecrawl API shape, any example you find for Firecrawl works after a base-URL swap.

A clean Invoke-RestMethod POST to /v1/scrape

One cmdlet, one JSON body, no module install:

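# JSON request body: the page to scrape and the output formats wanted.
$body = @{ url = 'https://example.com'; formats = @('markdown') } | ConvertTo-Json
# The API key comes from an environment variable rather than the script itself.
$headers = @{ Authorization = "Bearer $env:CRW_API_KEY"; 'Content-Type' = 'application/json' }
# POST the body; Invoke-RestMethod returns the deserialized response.
$result = Invoke-RestMethod -Uri 'https://api.fastcrw.com/v1/scrape' -Method POST -Headers $headers -Body $body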

Because Invoke-RestMethod deserializes the JSON response automatically, $result is already a navigable PowerShell object — no ConvertFrom-Json needed.
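If you make this call from more than one script, it helps to build the parameters once and splat them. The sketch below assumes the same endpoint and body shape as the three lines above; New-ScrapeRequest is a hypothetical helper, and it passes the content type through -ContentType instead of the header table.

```powershell
function New-ScrapeRequest {
    param(
        [Parameter(Mandatory)][string]$Url,
        [Parameter(Mandatory)][string]$ApiKey,
        [string]$Endpoint = 'https://api.fastcrw.com/v1/scrape',
        [string[]]$Formats = @('markdown')
    )
    # Keys match Invoke-RestMethod parameter names so the table can be splatted.
    @{
        Uri         = $Endpoint
        Method      = 'Post'
        Headers     = @{ Authorization = "Bearer $ApiKey" }
        ContentType = 'application/json'
        Body        = @{ url = $Url; formats = $Formats } | ConvertTo-Json
    }
}

# Usage (performs a real HTTP call, so it is left commented out here):
# $request = New-ScrapeRequest -Url 'https://example.com' -ApiKey $env:CRW_API_KEY
# $result  = Invoke-RestMethod @request
```

Keeping request construction separate from the network call also means you can inspect or log the exact body before anything is sent.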

Getting markdown back instead of raw HTML

With formats = @('markdown') the response carries clean, LLM-ready markdown at $result.data.markdown, with the boilerplate, nav chrome, and scripts stripped out. That's the difference that matters: instead of writing a regex to find the article body, you get the article body. Save it straight to disk with $result.data.markdown | Set-Content out.md -Encoding utf8 (the explicit encoding matters on Windows PowerShell 5.1, where Set-Content otherwise writes the system ANSI code page).

This is also why the approach holds up where native parsing breaks. Accuracy is the headline of fastCRW's benchmark: the highest truth-recall of three tools tested, 63.74% of 819 labeled URLs (diagnose_3way.py, Firecrawl public dataset, 2026-05-08), ahead of Crawl4AI (59.95%) and Firecrawl (56.04%).
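For anything beyond a one-off, a small wrapper that checks the field exists and writes a dated, UTF-8 file is worth having. This is a sketch under the response shape described above (data.markdown); the function name and file naming scheme are hypothetical.

```powershell
function Save-ScrapeMarkdown {
    param(
        [Parameter(Mandatory)]$Result,
        [Parameter(Mandatory)][string]$Directory,
        [string]$Name = 'page'
    )
    $markdown = $Result.data.markdown
    if ([string]::IsNullOrWhiteSpace($markdown)) {
        throw 'Response contained no markdown at data.markdown.'
    }
    if (-not (Test-Path -LiteralPath $Directory)) {
        New-Item -ItemType Directory -Path $Directory -Force | Out-Null
    }
    $stamp = Get-Date -Format 'yyyy-MM-dd'
    $path  = Join-Path $Directory "$Name-$stamp.md"
    # Explicit UTF-8: Windows PowerShell 5.1 otherwise writes the ANSI code page.
    Set-Content -LiteralPath $path -Value $markdown -Encoding UTF8
    $path
}
```

It returns the path it wrote, so a calling script can log it or hand it to the next step.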

Structured extraction with a JSON schema

If you want typed fields rather than prose, ask for formats = @('json') and pass a jsonSchema. The engine runs an LLM extraction pass and returns data matching your schema at $result.data.json — which Invoke-RestMethod hands you as a native object you can drop into a CSV or a database. See structured extraction with a JSON schema for the full pattern. Two honest notes: a request with formats: ["json"] is a 5-credit operation (vs 1 credit for a plain markdown scrape), and LLM extraction supports OpenAI and Anthropic providers only.
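A schema body might be built like this. Treat it as a sketch: placing jsonSchema as a top-level key follows the wording above but should be checked against the API reference, and the title and price fields are invented. The part that is certain is -Depth: ConvertTo-Json stops at two levels by default, which would flatten the nested schema into type names.

```powershell
# Illustrative schema; field names are hypothetical.
$schema = @{
    type       = 'object'
    properties = @{
        title = @{ type = 'string' }
        price = @{ type = 'number' }
    }
    required   = @('title')
}

# -Depth must exceed the nesting level or inner objects are stringified.
$body = @{
    url        = 'https://example.com/product'
    formats    = @('json')
    jsonSchema = $schema
} | ConvertTo-Json -Depth 10
```

Send $body exactly as in the markdown example; only the request body changes.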

Wiring scraping into Windows automation

The payoff for staying inside PowerShell is that the result slots into the Windows automation you already run. No new runtime, no cross-language glue.

Running it from a Scheduled Task

Wrap the scrape in a .ps1 script and register it with Task Scheduler so it runs nightly, hourly, or on whatever trigger you need:

$action = New-ScheduledTaskAction -Execute 'pwsh.exe' -Argument '-File C:\scripts\scrape.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'NightlyScrape' -Action $action -Trigger $trigger

Log exit codes and write the markdown/JSON output to a dated file so a failed run is visible the next morning. If you need a richer schedule with locking and retries, the scheduled crawls and cron pattern guide covers the same ideas with a cron-style scheduler.
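One way to get both the log line and the exit code is a thin wrapper around whatever the script does. The function name, log file pattern, and log directory below are assumptions; adapt them to your own layout.

```powershell
function Invoke-LoggedRun {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [Parameter(Mandatory)][string]$LogDirectory
    )
    # Promote non-terminating errors so the catch block sees them.
    $ErrorActionPreference = 'Stop'
    if (-not (Test-Path -LiteralPath $LogDirectory)) {
        New-Item -ItemType Directory -Path $LogDirectory -Force | Out-Null
    }
    $log  = Join-Path $LogDirectory ('scrape-{0:yyyy-MM-dd}.log' -f (Get-Date))
    $code = 0
    try {
        & $Action | Out-Null
        $status = 'OK'
    }
    catch {
        $status = "FAILED: $($_.Exception.Message)"
        $code = 1
    }
    # One timestamped line per run, appended to that day's file.
    Add-Content -LiteralPath $log -Value ('{0:o} {1}' -f (Get-Date), $status)
    $code
}

# At the end of scrape.ps1, hand the result to Task Scheduler
# ($scrape is a hypothetical scriptblock holding the actual work):
# exit (Invoke-LoggedRun -Action $scrape -LogDirectory 'C:\scripts\logs')
```

A non-zero exit code shows up as the task's last run result, so a failed night is visible both in Task Scheduler and in the dated log.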

Self-host vs managed for air-gapped Windows shops

Plenty of Windows teams run in locked-down or air-gapped environments where sending URLs to a cloud API is a non-starter. fastCRW's engine is a single static Rust binary — roughly an 8 MB Docker image needing 1 container, versus Firecrawl's multi-service stack at around 2–3 GB across 5 containers (README structural facts, not a benchmark claim). That footprint is the whole point for compliance-bound shops: it self-hosts cleanly on internal infrastructure, the engine is AGPL-3.0, and your PowerShell scripts just point $Uri at the internal host instead of the cloud. Scraped content and target URLs never leave your network. If you'd rather not run anything, the managed cloud handles it; the script is identical apart from the base URL.
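Keeping the script identical across cloud and self-hosted setups is easiest if the base URL comes from configuration. In this sketch the CRW_BASE_URL environment variable and the internal host name are hypothetical; only the /v1/scrape path and the cloud host come from the examples above.

```powershell
function Get-ScrapeEndpoint {
    # CRW_BASE_URL is an assumed variable name, not something the engine defines.
    param([string]$BaseUrl = $env:CRW_BASE_URL)
    # Fall back to the managed cloud when nothing is configured.
    if ([string]::IsNullOrWhiteSpace($BaseUrl)) { $BaseUrl = 'https://api.fastcrw.com' }
    # Trim a trailing slash so the joined URL never contains '//'.
    $BaseUrl.TrimEnd('/') + '/v1/scrape'
}
```

On an air-gapped box you would set the variable machine-wide to something like http://crw.internal:3000 and leave the scripts untouched.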

Honest gaps: stateless, no screenshot output

Two limits to plan around. First, the engine is stateless per request — there is no persistent server-side session, so multi-step authenticated flows are something you orchestrate in your PowerShell WebRequestSession, not on the API. Second, there is no screenshot output: a request for formats: ["screenshot"] returns HTTP 422. If your task specifically needs page images, this isn't the tool — a headless browser like Playwright is. For HTML-to-markdown and structured extraction inside Windows automation, though, a single Invoke-RestMethod call is the shortest honest path.
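Since an unsupported format only surfaces as an HTTP 422 at request time, a script can reject it earlier. This is a minimal sketch with a hypothetical function name; the unsupported list contains just the one format named above and would need updating if the API changes.

```powershell
function Assert-SupportedFormat {
    param([Parameter(Mandatory)][string[]]$Formats)
    # Only 'screenshot' is known to be rejected, per the limits described above.
    $unsupported = @('screenshot')
    $bad = @($Formats | Where-Object { $unsupported -contains $_ })
    if ($bad.Count -gt 0) {
        throw "Unsupported format(s): $($bad -join ', '). The API would answer HTTP 422."
    }
    $Formats
}
```

Call it on the formats array before building the request body, so a misconfigured job fails locally with a readable message.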

Sources

fastCRW canonical fact sheet — scrape benchmark (diagnose_3way.py, 819 labeled URLs, 2026-05-08), structural footprint, endpoint surface, honest gaps. github.com/us/crw
Microsoft PowerShell docs — Invoke-WebRequest and Invoke-RestMethod (basic parsing in PowerShell 7).
fastCRW plans and managed cloud: /pricing · fastcrw.com