Complete Web Scraping Toolkit for PHP
Published by Paul Redmond
Roach PHP is a complete web scraping toolkit for PHP. Not only does it handle the crawling of web content, but it also provides an entire pipeline to process scraped data, making it an all-in-one resource for scraping web pages with PHP.
Among the many awesome web scraping features this package provides, the main ones include:
- Spiders (classes) that define how to crawl web pages
- Data pipelines to process and collect the data that spiders scrape
- Easily extract data from HTML and XML documents
- Interactive shell
- Spider middleware
- Extensions to hook into and extend Roach PHP features
- Built-in Logging extension
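To make the spider-plus-pipeline idea concrete, here is a rough, dependency-free sketch of that flow. It deliberately does not use Roach PHP's API: the functions `parseArticles()` and `runPipeline()`, the sample HTML, and the `https://laravel-news.com` base URL are hypothetical, and the extraction step uses PHP's built-in DOM extension (`DOMDocument`/`DOMXPath`) instead of Roach's own tooling. A "spider" turns a page into items, and a "pipeline" passes each item through a list of processors that can modify or drop it.

```php
<?php

// Hypothetical, dependency-free sketch of a "spider -> item pipeline" flow.
// This is NOT Roach PHP's API; it only mirrors the idea using ext-dom.

/**
 * "Spider" step: parse one HTML page and yield one item (an array) per <article>.
 */
function parseArticles(string $html): Generator
{
    $doc = new DOMDocument();
    // libxml does not know HTML5 tags like <article>; collect its warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//article') as $article) {
        $title = $xpath->query('.//h2', $article)->item(0);
        $href = $xpath->query('.//a/@href', $article)->item(0);

        // Missing elements become empty strings so later processors can decide what to do.
        yield [
            'title' => $title?->textContent ?? '',
            'url' => $href?->nodeValue ?? '',
        ];
    }
}

/**
 * "Item pipeline" step: pass every item through each processor in order.
 * A processor returns the (possibly modified) item, or null to drop it.
 */
function runPipeline(iterable $items, array $processors): array
{
    $results = [];
    foreach ($items as $item) {
        foreach ($processors as $process) {
            $item = $process($item);
            if ($item === null) {
                continue 2; // dropped: skip the remaining processors for this item
            }
        }
        $results[] = $item;
    }
    return $results;
}

// Example input: a hard-coded page instead of a real HTTP request.
$html = <<<'HTML'
<html><body>
  <article><h2>  Roach PHP  </h2><a href="/posts/roach-php">Read</a></article>
  <article><a href="/posts/untitled">Read</a></article>
  <article><h2>Second post</h2><a href="/posts/second-post">Read</a></article>
</body></html>
HTML;

$items = runPipeline(parseArticles($html), [
    // Normalize whitespace around titles.
    fn (array $item): array => array_merge($item, ['title' => trim($item['title'])]),
    // Drop items without a title.
    fn (array $item): ?array => $item['title'] === '' ? null : $item,
    // Turn relative links into absolute URLs (the base URL is an assumption).
    fn (array $item): array => array_merge($item, ['url' => 'https://laravel-news.com' . $item['url']]),
]);

print_r($items);
```

Running it keeps two of the three articles: the untitled one is dropped by the second processor. In Roach itself these responsibilities live in spider and processor classes described in its documentation; plain closures are used here only to keep the sketch self-contained.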
While Roach PHP is framework-agnostic and works with any PHP project, there is a first-party roach-php/laravel package that makes it easy to start using Roach in Laravel projects. The Laravel package defines convenient services for Roach PHP, plus Artisan commands to create spiders and run an interactive shell:
```bash
# Create a spider class
php artisan roach:spider LaravelDocsSpider

# Start a REPL with a given URL
php artisan roach:shell https://laravel-news.com
```
Learn More
The Roach PHP documentation has full installation instructions and a guide with everything you need to get started. Also, be sure to check out roach-php/laravel to begin using Roach PHP in Laravel projects.