A PHP Package for Concurrent Website Crawling

spatie/crawler is a PHP package by Freek Van der Herten for crawling websites concurrently using Guzzle promises. It was recently updated to version 9, introducing a new CrawlResponse object, improved scope controls, testing utilities, and more.

Key features include:

  • Handling crawl events via closure callbacks and observer classes
  • CrawlResponse object with typed accessors
  • Collecting URLs and controlling crawl scope
  • Testing with fake()
  • And more...

Handling Crawl Events

The crawler supports two approaches for handling crawl events: closure callbacks and observer classes. The closure approach looks like this:

use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;
 
Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        echo "{$url}: {$response->status()}\n";
    })
    ->start();

The onFailed() and onFinished() handlers follow the same pattern for handling errors and post-crawl logic. There is also an onWillCrawl() handler, which is called before a URL is crawled.

CrawlResponse

Each crawled URL delivers a CrawlResponse object with typed accessors for common inspection needs:

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        if ($response->wasRedirected()) {
            echo "Redirected from: " . implode(' → ', $response->redirectHistory()) . "\n";
        }

        $dom = $response->dom(); // Symfony DomCrawler instance
    })
    ->start();

The object also exposes body() and header(), plus transferStats() for timing data.

Collecting URLs and Controlling Scope

The crawler can also limit its scope and return the URLs it finds, without you handling each page in a callback. This is useful when you only want the links from a site, optionally restricted to internal links, and don't need to process the responses:

$urls = Crawler::create('https://example.com')
    ->internalOnly()
    ->depth(3)
    ->foundUrls();
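
To make the two scope controls concrete, here is a minimal plain-PHP sketch of what "internal only" and a depth limit mean. It is not spatie/crawler's implementation: isInternalUrl() and collectUrls() are hypothetical helpers, the link graph is an in-memory array standing in for real pages, and the assumptions that depth counts link hops from the start URL and that the start URL itself is not returned are mine.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: a URL counts as internal when its host matches
 * the host of the start URL (case-insensitive).
 */
function isInternalUrl(string $url, string $startUrl): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    $startHost = parse_url($startUrl, PHP_URL_HOST);

    return is_string($host)
        && is_string($startHost)
        && strcasecmp($host, $startHost) === 0;
}

/**
 * Breadth-first walk over an in-memory link graph (URL => linked URLs),
 * standing in for real HTTP requests.
 *
 * @param array<string, list<string>> $links
 * @return list<string> URLs found within $maxDepth hops, in discovery order
 */
function collectUrls(array $links, string $startUrl, int $maxDepth, bool $internalOnly): array
{
    $seen = [$startUrl => true];
    $queue = [[$startUrl, 0]];
    $found = [];

    while ($queue !== []) {
        [$url, $depth] = array_shift($queue);

        if ($depth >= $maxDepth) {
            continue; // do not follow links beyond the depth limit
        }

        foreach ($links[$url] ?? [] as $next) {
            if (isset($seen[$next])) {
                continue; // already discovered
            }
            if ($internalOnly && !isInternalUrl($next, $startUrl)) {
                continue; // out of scope
            }

            $seen[$next] = true;
            $found[] = $next;
            $queue[] = [$next, $depth + 1];
        }
    }

    return $found;
}

// Made-up link graph for illustration only.
$links = [
    'https://example.com' => ['https://example.com/about', 'https://other.test/'],
    'https://example.com/about' => ['https://example.com/team'],
    'https://example.com/team' => ['https://example.com/jobs'],
];

$found = collectUrls($links, 'https://example.com', 2, true);
print_r($found);
```

With a depth of 2 and internal-only scope, the sketch returns the /about and /team URLs: the external link is filtered out, and /jobs sits three hops away.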

Testing with fake()

Spatie always delivers excellent test helpers with their packages, and the crawler is no different. Its fake() method lets you test crawl logic without making real HTTP requests. Pass a map of URLs to HTML strings and the crawler uses those as responses:

Crawler::create('https://example.com')
    ->fake([
        'https://example.com' => '<html><a href="/about">About</a></html>',
        'https://example.com/about' => '<html>About page</html>',
    ])
    ->foundUrls();
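
Note that the first page links to "/about" while the map key is the absolute URL. The plain-PHP sketch below shows why that lines up, assuming the crawler resolves relative links against the page URL as crawlers generally do. extractLinks() is a hypothetical helper built on PHP's DOM extension, not part of the package, and it only handles root-relative and absolute links (ports and path-relative links are ignored for brevity).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: extract links from an HTML string and turn
 * root-relative hrefs ("/about") into absolute URLs.
 *
 * @return list<string> absolute URLs found in the HTML
 */
function extractLinks(string $html, string $pageUrl): array
{
    // Silence libxml warnings about incomplete HTML fragments.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $parts = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');

        if (str_starts_with($href, '/') && !str_starts_with($href, '//')) {
            $href = $origin . $href; // root-relative => absolute
        }

        // Keep only links that ended up with a scheme (absolute URLs).
        if (is_string(parse_url($href, PHP_URL_SCHEME))) {
            $links[] = $href;
        }
    }

    return $links;
}

// Same map as in the fake() example above.
$fakes = [
    'https://example.com' => '<html><a href="/about">About</a></html>',
    'https://example.com/about' => '<html>About page</html>',
];

foreach ($fakes as $url => $html) {
    foreach (extractLinks($html, $url) as $link) {
        $status = isset($fakes[$link]) ? 'has a fake response' : 'has no fake response';
        echo "{$url} links to {$link}, which {$status}\n";
    }
}
```

Under that assumption, keying the map by absolute URL is what lets the link found on the first page match the second entry.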

Other Highlights

  • Throttling: FixedDelayThrottle for a fixed delay between requests, AdaptiveThrottle to back off based on server response times
  • retry(): automatic retries on connection errors and 5xx responses
  • stream(): opt-in streaming to reduce memory usage on large crawls
  • FinishReason enum: start() returns Completed, CrawlLimitReached, TimeLimitReached, or Interrupted
  • JavaScript rendering: a JavaScriptRenderer interface with a CloudflareRenderer included and spatie/browsershot as a suggested driver
  • And more
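
As an illustration of the adaptive throttling idea in the list above, here is a small plain-PHP sketch of a delay that grows with server response time. It is not the package's AdaptiveThrottle: the AdaptiveDelay class, its factor, and its minimum and maximum bounds are all hypothetical values chosen for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical adaptive delay: wait a multiple of the last response time,
 * clamped between a minimum and a maximum.
 */
final class AdaptiveDelay
{
    private float $delayMs;

    public function __construct(
        private float $minMs = 50.0,
        private float $maxMs = 5000.0,
        private float $factor = 2.0,
    ) {
        $this->delayMs = $minMs;
    }

    /** Record how long the last response took and update the delay. */
    public function record(float $responseTimeMs): void
    {
        $this->delayMs = max(
            $this->minMs,
            min($this->maxMs, $responseTimeMs * $this->factor)
        );
    }

    /** Delay to apply before the next request, in milliseconds. */
    public function delayMs(): float
    {
        return $this->delayMs;
    }
}

$throttle = new AdaptiveDelay();

// Simulated response times: fast, slower, very slow.
foreach ([20.0, 400.0, 9000.0] as $responseTimeMs) {
    $throttle->record($responseTimeMs);
    printf("response %.0f ms, next wait %.0f ms\n", $responseTimeMs, $throttle->delayMs());
}
```

A fixed-delay throttle would return the same value every time. The adaptive version slows down when the server does, and the clamp keeps one slow response from stalling the crawl indefinitely.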

You can find the full source at spatie/crawler on GitHub.

Paul Redmond

Staff writer at Laravel News. Full stack web developer and author.