
Port a TypeScript Scraper to Python: Skip the Rewrite

Port TypeScript browser automation to Python, or skip the rewrite with a Firecrawl-compatible API: map Playwright/Puppeteer scripts to Python, or call a single /v1/scrape endpoint from both languages.

fastcrw
June 12, 2026 · 8 min read · Last updated: June 2, 2026

Port a TypeScript scraper to Python without rewriting the browser glue

The usual reason to port a TypeScript scraper to Python is consolidation. Your scraper started life as a standalone Node service, and now it needs to live inside a Python ML or RAG codebase where the rest of the data pipeline already runs. Rewriting a Playwright or Puppeteer script line-for-line in Python works. However, it carries every fragile part of the original (the waits, the selector drift, the headless-detection workarounds) into a second language. During the transition, you maintain that fragile logic in both.

There are two honest paths. Path one is the manual port: rewrite the browser automation in Python's Playwright bindings, which have near-identical API parity. Path two is to stop maintaining browser glue at all and call the same HTTP scraping contract from both languages, so the "migration" becomes a base-URL change plus one JSON schema. This guide covers both, and is explicit about where each one breaks down.

Why port a TypeScript scraper to Python at all

The decision is rarely about the scraper itself. It's about where the scraped data goes next. If your retrieval, embedding, and evaluation code is Python, a Node scraper means a process boundary, a serialization step, and two CI pipelines for one logical job. Folding the scrape into the Python codebase removes that seam.

Before you commit to a rewrite, separate the two things a browser-automation scraper actually does:

  • Navigation and rendering — loading the page, waiting for JavaScript, getting final HTML. This is the part that's painful to port and painful to maintain.
  • Extraction — turning that HTML into the fields or Markdown you need. This is portable logic, not browser logic.

If your script only needs the rendered content (most read-only scrapers do), you can move the navigation/rendering burden behind an API and only port the extraction intent. If your script genuinely interacts — clicks, logins, multi-step forms — the manual port is the right call, and we'll say so plainly below.

Manual port: Playwright TypeScript to Playwright Python

Playwright is the one case where porting is genuinely low-friction, because the Python bindings mirror the Node API closely. The shapes line up almost one-to-one:

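TypeScript (Node) → Python:

  • const browser = await chromium.launch() → browser = await playwright.chromium.launch() (where playwright is the object yielded by async_playwright())
  • await page.goto(url) → await page.goto(url)
  • await page.waitForSelector(sel) → await page.wait_for_selector(sel)
  • await page.$eval(sel, fn) → await page.eval_on_selector(sel, expression)
  • await page.$$eval(sel, fn) → await page.eval_on_selector_all(sel, expression)
  • await page.content() → await page.content()

In Python, the page-side function you passed as fn becomes expression, a string of JavaScript source (for example "el => el.textContent").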

The naming convention flips from camelCase to snake_case. You also choose between the async API (async_playwright) and the sync API (sync_playwright). The async bindings map most directly onto an existing async/await TypeScript script, so prefer them if your original used promises throughout.
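As a sketch of what the mechanical port looks like, here is a typical goto/wait/extract/dump script rewritten with async_playwright. Each Playwright call carries a comment naming the TypeScript line it replaces. The function name scrape_page, its return shape, and the JavaScript expression are illustrative choices, not something from the original script. The import sits inside the function only so the module loads on machines where Playwright is not installed yet.

```python
async def scrape_page(url: str, selector: str) -> tuple[str, list[str]]:
    """Port of a typical Node Playwright scrape: goto, wait, extract, dump HTML."""
    # Lazy import: lets this module be imported without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        # TS: const browser = await chromium.launch()
        browser = await playwright.chromium.launch()
        try:
            page = await browser.new_page()
            # TS: await page.goto(url)
            await page.goto(url)
            # TS: await page.waitForSelector(sel)
            await page.wait_for_selector(selector)
            # TS: await page.$$eval(sel, els => ...)
            # In Python the page-side function is a JavaScript source string.
            texts = await page.eval_on_selector_all(
                selector, "els => els.map(e => e.textContent.trim())"
            )
            # TS: await page.content()
            html = await page.content()
            return html, texts
        finally:
            # Always release the browser, even if a wait times out.
            await browser.close()
```

To run it, call asyncio.run(scrape_page("https://example.com", "h1")) after `pip install playwright` and `playwright install chromium`. Note that the waits and selectors came across unchanged, which is exactly the fragility described in the next section.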

Where the manual port still hurts

The API parity is real, but it does not save you from the parts that made the original fragile in the first place:

  • Environment — you re-pin a browser binary, re-solve headless flags, and re-install system libraries in your Python image. The browser fleet does not get lighter by changing languages.
  • Timeouts and flakiness — every waitForSelector and network-idle heuristic ports across as-is, including the ones that flake. A rewrite is a chance to fix them, but it is not a fix by itself.
  • Selector drift — CSS/XPath selectors are the same brittle strings in either language; the site changes and both versions break together.

If you are porting Puppeteer rather than Playwright, there is no first-party Python Puppeteer; you are effectively rewriting onto Playwright Python anyway. At that point the rewrite is already most of the work — which is exactly when path two starts to look better.

Skip the rewrite: one API, both languages

If your scraper is read-only — navigate, render, extract content — you can avoid porting browser code entirely by calling a scraping API that returns clean content directly. fastCRW exposes a POST /v1/scrape endpoint that takes a URL and returns Markdown by default (1 credit), handling the rendering decision server-side. The same HTTP contract is called identically from TypeScript and Python, so "porting" becomes pointing both languages at one endpoint.

Because fastCRW implements a Firecrawl-compatible REST API, this is a drop-in after a base-URL swap. If your TypeScript code already uses the Firecrawl SDK, you change the API base URL and keep the rest. The Node side stays exactly as it was; the Python side calls the same endpoint with the same request body. There is no second browser stack to stand up in either runtime.
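Here is a minimal Python sketch of the API path, using only the standard library. It rests on two assumptions that follow the Firecrawl convention, since the API is described as Firecrawl-compatible: Bearer-token auth, and a JSON response that you parse yourself. The CRW_API_URL and CRW_API_KEY environment variables and the localhost default are hypothetical placeholders. The Node side would send the same JSON body with fetch.

```python
import json
import os
import urllib.request

# Hypothetical env vars: point these at your fastCRW (or other
# Firecrawl-compatible) deployment. Changing the base URL is the "migration".
BASE_URL = os.environ.get("CRW_API_URL", "http://localhost:3000")
API_KEY = os.environ.get("CRW_API_KEY", "")


def build_scrape_request(url: str, formats: tuple[str, ...] = ("markdown",)) -> dict:
    """Request body for POST /v1/scrape, the same shape the Node side sends."""
    return {"url": url, "formats": list(formats)}


def scrape(
    url: str,
    formats: tuple[str, ...] = ("markdown",),
    base_url: str = BASE_URL,
    api_key: str = API_KEY,
) -> dict:
    """POST one scrape request and return the decoded JSON response."""
    body = json.dumps(build_scrape_request(url, formats)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer auth, following the Firecrawl convention.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

If the response follows the Firecrawl v1 shape (an assumption worth checking against the API reference), the Markdown is at scrape("https://example.com")["data"]["markdown"].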

The Python path has one extra convenience worth knowing: the crw Python SDK on PyPI ships a CrwClient() that runs a self-contained local engine. You do not need to deploy a separate server first to start scraping from Python — the SDK runs the engine itself. That removes the "stand up infrastructure before I can test" friction that usually shows up mid-migration.

For background on each language's entry point, see the Python scraping quickstart and the Node.js scraping quickstart. If you're moving off a heavier browser-automation stack rather than a hand-rolled script, the dedicated Puppeteer/Playwright-to-API migration guide walks the function-by-function mapping.

Choosing the renderer

A browser-automation script implies you needed JavaScript execution. fastCRW's renderers are auto (default), http, lightpanda, and chrome, with auto falling back chrome → lightpanda → http. For pages that needed a full browser in your original script, request the chrome renderer (2 credits instead of 1). Many pages your original script rendered in a full browser only "just in case" work on the lighter renderers. Test them, because moving from chrome to a lighter renderer halves the per-page cost.
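To see what the renderer choice means for a batch, here is a rough estimator built only on the per-request figures quoted in this post: 1 credit for a Markdown scrape, 2 for chrome, and 5 for JSON extraction. Two points are assumptions:

  • auto is counted at the quoted default of 1 credit. The post does not say whether an auto request that falls back to chrome costs more.
  • JSON extraction is treated as a flat 5 credits on every renderer.

Confirm both against /pricing. The function name estimate_credits is illustrative.

```python
# Per-request credit costs as quoted in this post; confirm against /pricing.
MARKDOWN_CREDITS = {"auto": 1, "http": 1, "lightpanda": 1, "chrome": 2}
# Assumption: the quoted 5 credits per JSON-extraction request is treated as
# flat, because the post does not say whether the renderer changes it.
JSON_CREDITS = 5


def estimate_credits(pages: int, renderer: str = "auto", fmt: str = "markdown") -> int:
    """Rough credit estimate for scraping `pages` pages, one request each."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if renderer not in MARKDOWN_CREDITS:
        raise ValueError(f"unknown renderer: {renderer}")
    if fmt == "json":
        return pages * JSON_CREDITS
    if fmt == "markdown":
        return pages * MARKDOWN_CREDITS[renderer]
    raise ValueError(f"no quoted cost for format: {fmt}")
```

For 1,000 pages, estimate_credits(1000, "chrome") gives 2000 and estimate_credits(1000, "http") gives 1000. That is the halving described above.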

Field extraction that survives the move

The most portable part of a scraper is its extraction intent — "I want the title, price, and SKU from this page." Rather than re-port CSS selectors into Python, you can define that intent once as a JSON schema and reuse it across both languages. Call /v1/scrape with formats: ["json"] and a jsonSchema, and the engine fills the schema from the page content. The same schema string is sent from TypeScript and from Python — it is data, not code, so it ports for free.
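One way to keep the schema as shared data is to store it in a JSON file that both the Node and Python code load. The sketch below shows the Python half: an example product schema and a request builder. The filename, the schema's fields, and the function names are hypothetical. Placing jsonSchema as a top-level key next to formats: ["json"] follows this post's description; check the API reference for the exact key layout before relying on it.

```python
import json
from pathlib import Path

# Example extraction intent ("title, price, and SKU"). In practice this would
# live in a shared file such as a hypothetical product.schema.json.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "sku": {"type": "string"},
    },
    "required": ["title"],
}


def load_schema(path: Path) -> dict:
    """Read the shared schema file that the Node side also reads."""
    return json.loads(path.read_text(encoding="utf-8"))


def build_json_scrape_request(url: str, schema: dict) -> dict:
    # Assumption: jsonSchema sits at the top level of the /v1/scrape body,
    # alongside formats: ["json"], as this post describes it.
    return {"url": url, "formats": ["json"], "jsonSchema": schema}
```

You would pass build_json_scrape_request(url, load_schema(Path("product.schema.json"))) as the POST body. Because the schema is plain JSON, the TypeScript side can read the same file without any translation.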

Two honest specifics to plan around:

  • Cost: a JSON-extraction request is 5 credits, versus 1 credit for a plain Markdown scrape. If you only need clean text, skip JSON and take Markdown.
  • Providers: LLM-backed extraction supports OpenAI and Anthropic only. If your stack standardizes on a different extraction model, that's a constraint to weigh before you lean on schema extraction.

For the full schema design pattern — required vs optional fields, nesting, handling missing data — see structured extraction with JSON schema. Defining the schema once and consuming it from either language is the single biggest reason the TS-to-Python move stops being a rewrite.

What you give up vs a hand-rolled browser script

An API is not a full browser, and pretending otherwise would set you up for a failed migration. State these limits plainly before you cut over:

  • Stateless per request. fastCRW holds no session between calls. If your scraper logs in, clicks through a wizard, or carries state across pages, that interaction logic cannot move to a stateless scrape endpoint — it stays in a real browser.
  • No screenshot output. A request for formats: ["screenshot"] returns HTTP 422. If your TypeScript script captured screenshots, that responsibility stays in Playwright/Puppeteer.
  • No built-in anti-bot. There is no Fire-engine-style anti-bot layer. Heavily protected sites your stealth Playwright setup defeated may still need a real browser plus a proxy.
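A screenshot request fails with HTTP 422 rather than degrading gracefully. It can therefore help to reject it on the client, so jobs that need a real browser are caught before they hit the API. This is a small sketch. UNSUPPORTED_FORMATS, NeedsRealBrowser, and validate_formats are hypothetical names, and only the screenshot limit comes from this post. Session-dependent work cannot be detected from the format list, so it still needs the triage step.

```python
# Formats this post says /v1/scrape will not produce ("screenshot" -> HTTP 422).
UNSUPPORTED_FORMATS = {"screenshot"}


class NeedsRealBrowser(ValueError):
    """Raised when a job has to stay in Playwright/Puppeteer."""


def validate_formats(formats: list[str]) -> list[str]:
    """Fail fast locally instead of sending a request that will return 422."""
    bad = sorted(set(formats) & UNSUPPORTED_FORMATS)
    if bad:
        raise NeedsRealBrowser(f"keep these in a real browser: {', '.join(bad)}")
    return formats
```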

When the port should keep a real browser

Keep the browser automation — port it manually to Playwright Python or leave it in Node behind a thin interface — when the scraper does any of the following: authenticates and maintains a session, fills and submits forms, drives multi-step interactive flows, or fights aggressive bot protection. For everything that is fundamentally "load this page and give me the content," the API path removes the browser-glue maintenance from both languages at once. A pragmatic migration often does both: read-only scrapes move to /v1/scrape, the handful of genuinely interactive flows keep a browser.

The migration in three honest steps

  1. Triage your scrapers. Split them into read-only (navigate + extract) and interactive (click/login/form). Read-only is the candidate for the API path; interactive stays a browser.
  2. Move read-only to the API. Swap the base URL to point the Firecrawl-compatible SDK at fastCRW, or use the crw Python SDK's local engine. Take Markdown for content, or define a jsonSchema for fields and pay the 5-credit JSON cost.
  3. Manually port what's left. For genuinely interactive scrapers, port TypeScript Playwright to Playwright Python using the near-1:1 API, accepting that environment and flakiness work carries over.
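Step 1 can be partly automated. The sketch below greps TypeScript sources for Playwright/Puppeteer calls that imply interaction or stored session state. The call list, the triage/triage_dir names, and the scrapers/ directory are a hypothetical heuristic, not part of any tool. Treat a "read-only" result as a candidate to review, not a verdict: a login performed through page.evaluate, for instance, would slip past it.

```python
import re
from pathlib import Path

# Hypothetical heuristic: method calls that drive the page or persist session
# state, which a stateless /v1/scrape request cannot reproduce. screenshot is
# included because the API does not offer that output either.
INTERACTIVE_CALLS = re.compile(
    r"\.(click|dblclick|fill|type|press|check|selectOption|setInputFiles"
    r"|screenshot|storageState)\s*\("
)


def triage(source: str) -> tuple[str, list[str]]:
    """Classify one script as "interactive" or "read-only", with the evidence."""
    hits = sorted(set(INTERACTIVE_CALLS.findall(source)))
    return ("interactive" if hits else "read-only"), hits


def triage_dir(root: Path) -> dict[str, tuple[str, list[str]]]:
    """Triage every .ts file under root, keyed by file path."""
    return {
        str(path): triage(path.read_text(encoding="utf-8"))
        for path in sorted(root.rglob("*.ts"))
    }


if __name__ == "__main__":
    # "scrapers" is a placeholder for wherever your Node scraper sources live.
    for path, (label, hits) in triage_dir(Path("scrapers")).items():
        print(f"{label:12} {path} {hits}")
```

Scripts that come back "read-only" are the candidates for step 2. Anything listing click or fill goes to step 3.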

You can self-host the engine for $0 under AGPL-3.0 during the transition, or use the managed cloud and pay per credit — see /pricing for the current tiers.

Sources

  • fastCRW canonical fact sheet — endpoints, renderers, credits, honest gaps.
  • fastCRW open-core repo and SDKs: github.com/us/crw · Python SDK crw (PyPI) · managed cloud fastcrw.com.
  • Playwright Python vs Node API reference: playwright.dev/python (verified independently).

Related: Python scraping quickstart · Node.js scraping quickstart · Migrate Puppeteer/Playwright to an API · Structured extraction with JSON schema

FAQ

How do I port a TypeScript Playwright scraper to Python?
Playwright's Python bindings mirror the Node API closely, so most of the port is mechanical: chromium.launch() and page.goto() stay the same, while camelCase methods become snake_case (waitForSelector → wait_for_selector, $$eval → eval_on_selector_all). Prefer the async_playwright API to match an async/await Node script. The catch is that environment setup, timeout/flakiness handling, and brittle selectors all carry over unchanged — a rewrite is a chance to fix them, not a fix by itself.
Do I have to rewrite browser automation when moving from Node to Python?
Not for read-only scrapers. If your script just navigates, renders, and extracts content, you can call a scraping API from both languages instead of porting browser glue. fastCRW's /v1/scrape returns Markdown (1 credit) and is called identically from TypeScript and Python, so the migration becomes a base-URL change. You only need a manual browser port for genuinely interactive flows — logins, form fills, multi-step wizards — because those need real browser state.
Does fastCRW have both a Python and a TypeScript path?
Yes. fastCRW exposes a Firecrawl-compatible REST API, so any HTTP client in either language works, and the Firecrawl SDK works after a base-URL swap. Python additionally has the crw SDK on PyPI, whose CrwClient() runs a self-contained local engine. The same /v1/scrape request body is valid from both languages, which is what lets one scraper contract serve a Node service and a Python codebase at once.
Can I reuse one JSON schema across languages?
Yes — that is the main reason the TS-to-Python move stops being a rewrite. Define your extraction intent once as a jsonSchema and send it with formats: ["json"] to /v1/scrape. The schema is data, not code, so the exact same string is sent from TypeScript and Python. Note that JSON extraction costs 5 credits per request (versus 1 for plain Markdown) and LLM-backed extraction supports OpenAI and Anthropic providers only.
Does the Python SDK need a separate server to run?
No. The crw Python SDK's CrwClient() runs a self-contained local engine, so you can scrape from Python without deploying a separate API server first. That removes the usual mid-migration friction of standing up infrastructure before you can test. When you do want a shared service, you can self-host the AGPL-3.0 engine for $0 (you pay only your server) or use the managed cloud and pay per credit.

Get Started

Self-host for free (AGPL) or use fastCRW cloud with 500 free credits — no credit card required.
