Cloudflare's New Crawl API: Scrape an Entire Website with One API Call
Cloudflare launched a /crawl endpoint for its Browser Rendering service on March 10, 2026 — enabling developers to crawl entire websites asynchronously and get back HTML, Markdown, or JSON. Here's what it means for AI pipelines and web development.
Web scraping has always been tedious. Spin up Playwright or Puppeteer, manage browser instances, deal with pagination, handle JavaScript-rendered content, hope the target site doesn't block you. Cloudflare just made a lot of that setup unnecessary — at least when it comes to their own network.
On March 10, 2026, Cloudflare launched the /crawl endpoint for its Browser Rendering service in open beta. The premise is straightforward: submit a starting URL, get a job ID, come back for results.
How It Works
The /crawl endpoint operates asynchronously. You POST a URL to Cloudflare's API, receive a job ID, and poll for status as pages are discovered and rendered. Results come back in your choice of HTML, Markdown, or structured JSON.
# Initiate a crawl
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com"}'
Under the hood, the endpoint spins up headless browsers, renders JavaScript-heavy pages, and recursively follows links from the seed URL. It respects robots.txt and Cloudflare's own AI Crawl Control by default — meaning crawl jobs are signed agents that comply with webmaster guidance automatically.
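A typical client wraps this in a submit-then-poll loop. The sketch below is a minimal client under stated assumptions: the status route (`GET …/crawl/{job_id}`), the `result.id` and `result.status` response fields, and the terminal status values are all guesses at the API shape, not confirmed details; check the official docs for the real contract. The HTTP layer is injected so the loop itself can be exercised without network access.

```typescript
// Minimal async crawl client: submit a job, then poll until it reaches a
// terminal state. Endpoint paths and response field names are assumptions.
type Fetcher = (url: string, init?: { method?: string; body?: string }) => Promise<any>;

const TERMINAL = new Set(["completed", "failed"]); // assumed status values

async function runCrawl(
  fetchJson: Fetcher, // injected HTTP layer (e.g. wraps fetch + res.json())
  accountId: string,
  seedUrl: string,
  pollMs = 2000,
): Promise<any> {
  const base = `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering`;
  // 1. Submit the seed URL; the API returns a job ID.
  const job = await fetchJson(`${base}/crawl`, {
    method: "POST",
    body: JSON.stringify({ url: seedUrl }),
  });
  // 2. Poll the (assumed) status route until the job is terminal.
  let status = await fetchJson(`${base}/crawl/${job.result.id}`);
  while (!TERMINAL.has(status.result.status)) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    status = await fetchJson(`${base}/crawl/${job.result.id}`);
  }
  return status.result; // carries the rendered pages on success
}
```

Injecting the fetcher also makes it easy to add per-request auth headers or retry logic in one place rather than at every call site.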
Pricing: $0.09 per browser-hour, plus $2 per additional concurrent browser beyond the free tier.
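At those rates, estimating a crawl's cost is simple arithmetic. The sketch below encodes only the two stated prices; the size of the free concurrency tier isn't given here, so it's left as a parameter rather than guessed.

```typescript
// Estimate crawl cost from the stated Browser Rendering pricing:
// $0.09 per browser-hour, plus $2 per concurrent browser beyond the free tier.
// `freeConcurrent` is a placeholder: the actual free-tier size is in the docs.
function estimateCrawlCost(
  browserHours: number,
  concurrentBrowsers: number,
  freeConcurrent: number,
): number {
  const hourly = browserHours * 0.09;
  const extraBrowsers = Math.max(0, concurrentBrowsers - freeConcurrent);
  return hourly + extraBrowsers * 2;
}

// e.g. 10 browser-hours at 5 concurrent browsers with a free tier of 3:
// 10 * 0.09 + (5 - 3) * 2 = 0.90 + 4.00 = 4.90
```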
Why This Matters for Developers
For developers building AI-powered applications, this is a significant quality-of-life improvement. The two dominant use cases Cloudflare highlights are:
- RAG pipelines — Retrieve entire documentation sites, knowledge bases, or product pages and feed them into retrieval-augmented generation systems without custom scraping infrastructure.
- AI training data — Efficiently collect structured Markdown output for model fine-tuning or evaluation datasets.
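Wiring crawl output into a RAG pipeline mostly means turning per-page Markdown into embeddable chunks. Here is a minimal sketch, assuming each result arrives as a `{ url, markdown }` record; that shape is an assumption about the response format, and the exact schema lives in Cloudflare's docs.

```typescript
// Split crawled Markdown pages into size-bounded chunks for embedding.
// The { url, markdown } record shape is an assumption about the API output.
interface CrawledPage { url: string; markdown: string; }
interface Chunk { sourceUrl: string; text: string; }

function chunkPages(pages: CrawledPage[], maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  for (const page of pages) {
    // Split on blank lines so chunks follow paragraph boundaries.
    const paragraphs = page.markdown.split(/\n\s*\n/);
    let buf = "";
    for (const p of paragraphs) {
      // Flush the buffer when adding this paragraph would exceed the limit.
      if (buf && buf.length + p.length + 2 > maxChars) {
        chunks.push({ sourceUrl: page.url, text: buf });
        buf = "";
      }
      buf = buf ? `${buf}\n\n${p}` : p;
    }
    if (buf) chunks.push({ sourceUrl: page.url, text: buf });
  }
  return chunks;
}
```

Keeping the source URL on every chunk matters downstream: it lets a RAG system cite which page an answer came from.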
Beyond AI, there's utility in content monitoring (detecting changes across a site) and competitive analysis — anything where you need a full snapshot of a website's public content.
The Markdown output format is particularly convenient for developers already using tools like Cloudflare's markdown-for-agents endpoint — the crawl API acts as a natural upstream step, turning whole sites into LLM-ready text.
The Controversy
The launch generated immediate discussion on Hacker News (thread: #47329557) — and not entirely positive.
Cloudflare controls roughly 20% of global internet traffic and has built a substantial business selling bot protection to website owners — blocking scrapers by default. Critics pointed out an obvious tension: Cloudflare now sells both the lock and the key. The /crawl endpoint works partly because Cloudflare's own infrastructure doesn't block itself.
One commenter captured the sentiment directly: "It's hard to see how this isn't extorting folks by offering a working solution that, oh, Cloudflare doesn't block."
Cloudflare's counter-argument is that the endpoint respects robots.txt and their AI Crawl Control settings — so website owners who have explicitly opted out of AI crawling will still be protected. The signed-agent designation means the crawler identifies itself honestly, rather than masquerading as a regular browser.
Whether you find that reassuring or not depends on how much trust you place in Cloudflare's gatekeeping role on the web.
Practical Implications
For developers on a Cloudflare stack — particularly those using Workers, D1, or NuxtHub with Cloudflare bindings — this endpoint fits cleanly into existing workflows. A Workers script could trigger a crawl job on a schedule, store the resulting Markdown in R2, and serve it as a knowledge base for an AI assistant without ever leaving the Cloudflare ecosystem.
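That scheduled-crawl-to-R2 workflow can be sketched as a Worker. Everything project-specific here is hypothetical: the `CRAWL_BUCKET` binding name, the env var names, and the seed URL are placeholders, and a production version would poll for job completion (or use a queue) before writing results.

```typescript
// Sketch of a scheduled Worker that kicks off a crawl and stores the
// output in R2. Binding and variable names are hypothetical placeholders.

// Pure helper: a stable, date-stamped R2 key for a site snapshot.
function r2KeyFor(seedUrl: string, date: Date): string {
  const host = new URL(seedUrl).hostname;
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `crawls/${host}/${day}.md`;
}

// Structural types standing in for the Workers runtime bindings.
interface Env {
  CRAWL_BUCKET: { put(key: string, value: string): Promise<unknown> };
  CF_ACCOUNT_ID: string;
  CF_API_TOKEN: string;
}

const worker = {
  async scheduled(_event: unknown, env: Env): Promise<void> {
    const seed = "https://docs.example.com"; // placeholder seed URL
    const res = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${env.CF_ACCOUNT_ID}/browser-rendering/crawl`,
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${env.CF_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ url: seed }),
      },
    );
    // Simplification: a real Worker would poll until the job completes,
    // then write the combined Markdown under a date-stamped key.
    await env.CRAWL_BUCKET.put(r2KeyFor(seed, new Date()), await res.text());
  },
};
```

Date-stamped keys give you free versioning in R2: each nightly run lands beside the previous snapshots instead of overwriting them, which is exactly what content-change monitoring needs.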
It's also worth noting the output quality advantage: because pages are rendered in a full headless browser, JavaScript-generated content is captured correctly — something plain HTTP scrapers consistently fail at.
Getting Started
The endpoint is available to all Cloudflare accounts in open beta. Official documentation lives at developers.cloudflare.com/browser-rendering/rest-api/crawl-endpoint/.
For RAG use cases, the combination of /crawl (full-site content retrieval) and /markdown (single-page Markdown rendering) covers most ingestion workflows without additional tooling. Whether you're building internal docs search or a customer-facing AI assistant, the barrier to getting clean, structured web content just dropped considerably.
Sources: Cloudflare Changelog — March 10, 2026, Hacker News discussion