Skip to main content
Benchmarks/Benchmark / ArXivQA · research recall

Results from the ArXivQA Research-Retrieval Benchmark

fastCRW's Research API vs Firecrawl's Research Index on the alphaXiv ArXivQA benchmark — 191 paper-retrieval queries scored on recall. fastCRW reaches 61.0% vs Firecrawl 53.3%.

Published
June 20, 2026
Updated
June 20, 2026
Category
benchmarks
Verdict

On the alphaXiv ArXivQA benchmark (191 paper-retrieval queries, scored on recall), fastCRW's Research API reaches 61.0% — ahead of Firecrawl's Research Index (53.3%) and every other provider on the board (Claude 45.4%, Parallel 44.3%, Exa 43.4%). The score comes from the deployed, Firecrawl-compatible /v2/search/research/* endpoints driven by the open research skill — live, with no self-hosted paper index.

61.0% recall — highest provider on the board, ahead of Firecrawl's 53.3%Scored on the deployed /v2/search/research/* endpoints, not a private harnessLive multi-source retrieval — our own SearXNG search + live open scholarly full-text sources — no self-hosted indexDrop-in Firecrawl-compatible: point the Firecrawl SDK at api.fastcrw.com

What this benchmark measures

ArXivQA is alphaXiv's public paper-retrieval benchmark: 191 natural-language research questions, each with a set of ground-truth arXiv papers that answer it. A retriever is scored on recall — the fraction of each question's ground-truth papers it surfaces, averaged over all 191 questions. Extra papers don't hurt the score, so the game is coverage: find every paper that answers the question.

The questions are hard on purpose. Many ask for a family of papers ("papers proposing overlong reward shaping", "LoRA variants that insert a new matrix"), some ask "what does paper X compare against" (the answer lives in X's bibliography), and some ask "which open model is best on benchmark Y" (you have to read a leaderboard). A title-and-abstract search alone misses most of them.

Results

ArXivQA recall — fastCRW 61.0%, Firecrawl 53.3%, Claude 45.4%, Parallel 44.3%, Exa 43.4% — run on each provider's live deployed endpoint

ProviderRecall
fastCRW Research API61.0%
Firecrawl Research Index53.3%
Claude45.4%
Parallel44.3%
Exa43.4%

fastCRW reaches 61.0% — the highest provider on the board, +7.7 points ahead of Firecrawl's Research Index and well clear of the rest. Scored on the full 191 questions, with the ground truth hidden from the agent.

How it works — live, no index

fastCRW's Research API is not a pre-built paper index. Each query is answered live by merging multiple retrieval sources at request time:

  • Our own search (self-hosted SearXNG) — web (Google/Bing) + a research mode that routes to arXiv, Crossref, and scholarly engines. This is the primary recall driver: the agent rewrites each question into 8–12 exact-name queries (specific method/model/dataset/benchmark names), and these specific queries surface the niche papers a single broad query misses.
  • Open scholarly sources — paper metadata and the citation graph, for "what does X reference / who cites X" expansion.
  • Full-text paper search — searches paper bodies, not just titles and abstracts. In our source-by-source analysis this was the single highest-recall source and contributed ~19% of ground-truth papers that nothing else found: the papers that mention the topic in the text but not the title.

The endpoints (/v2/search/research/papers, /papers/{id}, /papers/{id}/similar, /search/research/github) are stateless primitives. The intelligence — intent routing, exact-name reframing, reading a leaderboard, pulling a paper's self-references for "compare-against" questions — lives in the open research skill, exactly the way Firecrawl splits its Research Index endpoints from its research skill.

Honest scope

  • The score is the agent + skill over the live endpoints, the same shape Firecrawl's number is measured in. It is not the endpoints called blind.
  • It is live retrieval: latency is seconds, not the milliseconds of a hot in-memory index. We trade a little speed for current coverage and zero index maintenance. The deployed endpoints answered all 191 questions; the run is the product, not a private harness.
  • The numbers for other providers are the published ArXivQA leaderboard figures.

Reproduce it

The endpoints are a drop-in target for the Firecrawl research SDK — point it at https://api.fastcrw.com and the /v2/search/research/* surface matches. The Research API docs cover every endpoint, and the open skill that drives the cascade is what reproduces the 61.0%.

Continue exploring

More from Benchmarks

View all benchmarks

Related hubs

Keep the crawl path moving