Cloudflare's New Crawl API: Scrape an Entire Website with One API Call
Cloudflare launched a /crawl endpoint for its Browser Rendering service on March 10, 2026 — enabling developers to crawl entire websites asynchronously and get back HTML, Markdown, or JSON. Here's what it means for AI pipelines and web development.
Web scraping has always been tedious. Spin up Playwright or Puppeteer, manage browser instances, deal with pagination, handle JavaScript-rendered content, hope the target site doesn't block you. Cloudflare just made a lot of that setup unnecessary — at least when it comes to their own network.
On March 10, 2026, Cloudflare launched the /crawl endpoint for its Browser Rendering service in open beta. The premise is straightforward: submit a starting URL, get a job ID, come back for results.
How It Works
The /crawl endpoint operates asynchronously. You POST a URL to Cloudflare's API, receive a job ID, and poll for status as pages are discovered and rendered. Results come back in your choice of HTML, Markdown, or structured JSON.
# Initiate a crawl
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com"}'
Under the hood, the endpoint spins up headless browsers, renders JavaScript-heavy pages, and recursively follows links from the seed URL. It respects robots.txt and Cloudflare's own AI Crawl Control by default — meaning crawl jobs are signed agents that comply with webmaster guidance automatically.
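A typical client wraps this in a submit-then-poll loop. The sketch below is a minimal client under stated assumptions: the status route (`GET …/crawl/{job_id}`), the `result.id` and `result.status` response fields, and the terminal status values are all guesses at the API shape, not confirmed details; check the official docs for the real contract. The HTTP layer is injected so the loop itself can be exercised without network access.

```typescript
// Minimal async crawl client: submit a job, then poll until it reaches a
// terminal state. Endpoint paths and response field names are assumptions.
type Fetcher = (url: string, init?: { method?: string; body?: string }) => Promise<any>;

const TERMINAL = new Set(["completed", "failed"]); // assumed status values

async function runCrawl(
  fetchJson: Fetcher, // injected HTTP layer (e.g. wraps fetch + res.json())
  accountId: string,
  seedUrl: string,
  pollMs = 2000,
): Promise<any> {
  const base = `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering`;
  // 1. Submit the seed URL; the API returns a job ID.
  const job = await fetchJson(`${base}/crawl`, {
    method: "POST",
    body: JSON.stringify({ url: seedUrl }),
  });
  // 2. Poll the (assumed) status route until the job is terminal.
  let status = await fetchJson(`${base}/crawl/${job.result.id}`);
  while (!TERMINAL.has(status.result.status)) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    status = await fetchJson(`${base}/crawl/${job.result.id}`);
  }
  return status.result; // carries the rendered pages on success
}
```

Injecting the fetcher also makes it easy to add per-request auth headers or retry logic in one place rather than at every call site.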
Pricing: $0.09 per browser-hour, plus $2 per additional concurrent browser beyond the free tier.
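At those rates, estimating a crawl's cost is simple arithmetic. The sketch below encodes only the two stated prices; the size of the free concurrency tier isn't given here, so it's left as a parameter rather than guessed.

```typescript
// Estimate crawl cost from the stated Browser Rendering pricing:
// $0.09 per browser-hour, plus $2 per concurrent browser beyond the free tier.
// `freeConcurrent` is a placeholder: the actual free-tier size is in the docs.
function estimateCrawlCost(
  browserHours: number,
  concurrentBrowsers: number,
  freeConcurrent: number,
): number {
  const hourly = browserHours * 0.09;
  const extraBrowsers = Math.max(0, concurrentBrowsers - freeConcurrent);
  return hourly + extraBrowsers * 2;
}

// e.g. 10 browser-hours at 5 concurrent browsers with a free tier of 3:
// 10 * 0.09 + (5 - 3) * 2 = 0.90 + 4.00 = 4.90
```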
Why This Matters for Developers
For developers building AI-powered applications, this is a significant quality-of-life improvement. The two dominant use cases Cloudflare highlights are:
- RAG pipelines — Retrieve entire documentation sites, knowledge bases, or product pages and feed them into retrieval-augmented generation systems without custom scraping infrastructure.
- AI training data — Efficiently collect structured Markdown output for model fine-tuning or evaluation datasets.
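Wiring crawl output into a RAG pipeline mostly means turning per-page Markdown into embeddable chunks. Here is a minimal sketch, assuming each result arrives as a `{ url, markdown }` record; that shape is an assumption about the response format, and the exact schema lives in Cloudflare's docs.

```typescript
// Split crawled Markdown pages into size-bounded chunks for embedding.
// The { url, markdown } record shape is an assumption about the API output.
interface CrawledPage { url: string; markdown: string; }
interface Chunk { sourceUrl: string; text: string; }

function chunkPages(pages: CrawledPage[], maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  for (const page of pages) {
    // Split on blank lines so chunks follow paragraph boundaries.
    const paragraphs = page.markdown.split(/\n\s*\n/);
    let buf = "";
    for (const p of paragraphs) {
      // Flush the buffer when adding this paragraph would exceed the limit.
      if (buf && buf.length + p.length + 2 > maxChars) {
        chunks.push({ sourceUrl: page.url, text: buf });
        buf = "";
      }
      buf = buf ? `${buf}\n\n${p}` : p;
    }
    if (buf) chunks.push({ sourceUrl: page.url, text: buf });
  }
  return chunks;
}
```

Keeping the source URL on every chunk matters downstream: it lets a RAG system cite which page an answer came from.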
Beyond AI, there's utility in content monitoring (detecting changes across a site) and competitive analysis — anything where you need a full snapshot of a website's public content.
The Markdown output format is particularly convenient for developers already using tools like Cloudflare's markdown-for-agents endpoint — the crawl API acts as a natural upstream step, turning whole sites into LLM-ready text.
The Controversy
The launch generated immediate discussion on Hacker News (thread: #47329557) — and not entirely positive.
Cloudflare controls roughly 20% of global internet traffic and has built a substantial business selling bot protection to website owners — blocking scrapers by default. Critics pointed out an obvious tension: Cloudflare now sells both the lock and the key. The /crawl endpoint works partly because Cloudflare's own infrastructure doesn't block itself.
One commenter captured the sentiment directly: "It's hard to see how this isn't extorting folks by offering a working solution that, oh, Cloudflare doesn't block."
Cloudflare's counter-argument is that the endpoint respects robots.txt and their AI Crawl Control settings — so website owners who have explicitly opted out of AI crawling will still be protected. The signed-agent designation means the crawler identifies itself honestly, rather than masquerading as a regular browser.
Whether you find that reassuring or not depends on how much trust you place in Cloudflare's gatekeeping role on the web.
Practical Implications
For developers on a Cloudflare stack — particularly those using Workers, D1, or NuxtHub with Cloudflare bindings — this endpoint fits cleanly into existing workflows. A Workers script could trigger a crawl job on a schedule, store the resulting Markdown in R2, and serve it as a knowledge base for an AI assistant without ever leaving the Cloudflare ecosystem.
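That scheduled-crawl-to-R2 workflow can be sketched as a Worker. Everything project-specific here is hypothetical: the `CRAWL_BUCKET` binding name, the env var names, and the seed URL are placeholders, and a production version would poll for job completion (or use a queue) before writing results.

```typescript
// Sketch of a scheduled Worker that kicks off a crawl and stores the
// output in R2. Binding and variable names are hypothetical placeholders.

// Pure helper: a stable, date-stamped R2 key for a site snapshot.
function r2KeyFor(seedUrl: string, date: Date): string {
  const host = new URL(seedUrl).hostname;
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `crawls/${host}/${day}.md`;
}

// Structural types standing in for the Workers runtime bindings.
interface Env {
  CRAWL_BUCKET: { put(key: string, value: string): Promise<unknown> };
  CF_ACCOUNT_ID: string;
  CF_API_TOKEN: string;
}

const worker = {
  async scheduled(_event: unknown, env: Env): Promise<void> {
    const seed = "https://docs.example.com"; // placeholder seed URL
    const res = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${env.CF_ACCOUNT_ID}/browser-rendering/crawl`,
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${env.CF_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ url: seed }),
      },
    );
    // Simplification: a real Worker would poll until the job completes,
    // then write the combined Markdown under a date-stamped key.
    await env.CRAWL_BUCKET.put(r2KeyFor(seed, new Date()), await res.text());
  },
};
```

Date-stamped keys give you free versioning in R2: each nightly run lands beside the previous snapshots instead of overwriting them, which is exactly what content-change monitoring needs.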
It's also worth noting the output quality advantage: because pages are rendered in a full headless browser, JavaScript-generated content is captured correctly — something plain HTTP scrapers consistently fail at.
Getting Started
The endpoint is available to all Cloudflare accounts in open beta. Official documentation lives at developers.cloudflare.com/browser-rendering/rest-api/crawl-endpoint/.
For RAG use cases, the combination of /crawl (full-site content retrieval) and /markdown (single-page Markdown rendering) covers most ingestion workflows without additional tooling. Whether you're building internal docs search or a customer-facing AI assistant, the barrier to getting clean, structured web content just dropped considerably.
Sources: Cloudflare Changelog — March 10, 2026, Hacker News discussion