phpSpider is a lightweight, practical PHP crawler framework. By writing simple rules, you can quickly scrape information from the web and save it to a local database. It suits developers who need to extract data in a specific format, and it removes much of the repetitive work from crawler development.
Before using phpSpider, you need to install the framework. phpSpider manages dependencies via Composer, so make sure Composer is installed on your system.
composer create-project phpspider/phpspider
After installation, you can verify the framework by running a test script:
cd phpspider
php tests/simple_test.php
phpSpider can scrape any site for which you write rules. The walkthrough below shows how to scrape data from a simple website.
First, create a new project by executing the command below. phpSpider will generate the corresponding project folder automatically:
php phpspider startproject myproject
Once the project is created, define scraping rules that instruct phpSpider how to extract data from the target website. Inside the myproject/rules directory, create a rule.php file with content similar to the example below:
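<?php
return [
    // URLs where the crawl starts
    'start_urls' => [
        'http://www.example.com',
    ],
    'rules' => [
        [
            'type' => 'regex',
            // Matches an anchor that is immediately followed by </div>;
            // capture group 2 holds the text between <a ...> and </a>
            'pattern' => '/(<a.*?>(.*?)<\/a><\/div>)/',
            'id' => 1,
            'fields' => [
                [
                    'name' => 'title',
                    // text(): the text content of the anchor
                    'selector' => 'text()',
                ],
                [
                    'name' => 'link',
                    // @href: the anchor's href attribute
                    'selector' => '@href',
                ],
            ],
        ],
    ],
];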
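Before running a crawl, it can help to check what the pattern and the two selectors actually pick out. The sketch below does that with plain PHP (preg_match_all, DOMDocument and DOMXPath) rather than with phpSpider itself, so it only approximates how the framework may apply the rule. The sample HTML and the extractItems() helper are hypothetical, and the dom extension is assumed to be enabled.

```php
<?php
// Hypothetical sample markup, not taken from any real site.
$html = '<div class="item"><a href="/post/1">First post</a></div>'
      . '<div class="item"><a href="/post/2">Second post</a></div>';

// Same pattern as in rule.php.
$pattern = '/(<a.*?>(.*?)<\/a><\/div>)/';

/**
 * Hypothetical helper (not part of phpSpider): finds every fragment the
 * pattern matches, then reads text() and @href from the anchor inside it.
 */
function extractItems(string $html, string $pattern): array
{
    $items = [];
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        // $match[1] is the whole matched fragment, including the trailing </div>.
        $doc = new DOMDocument();

        // The stray </div> makes libxml report a parse error; collect it quietly.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($match[1]);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        $items[] = [
            'title' => trim($xpath->evaluate('string(//a/text())')),
            'link'  => $xpath->evaluate('string(//a/@href)'),
        ];
    }

    return $items;
}

$items = extractItems($html, $pattern);
var_export($items);
```

Note that the pattern only matches anchors directly followed by a closing div tag, and that "." does not match newlines, so markup that spreads an anchor over several lines would need the s modifier or a different pattern.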
After configuring the rules, you can run phpSpider to start scraping data:
php phpspider run myproject
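This article does not show how phpSpider writes results to the database. As a rough illustration of that last step, the sketch below stores records shaped like the title and link fields in an SQLite database through PDO. The pages table, the saveRecords() helper and the sample records are all assumptions made for this example, and it needs the pdo_sqlite extension.

```php
<?php
// Hypothetical records shaped like the 'title' and 'link' fields in rule.php.
$records = [
    ['title' => 'First post',       'link' => 'http://www.example.com/post/1'],
    ['title' => 'Second post',      'link' => 'http://www.example.com/post/2'],
    ['title' => 'First post again', 'link' => 'http://www.example.com/post/1'],
];

/**
 * Hypothetical helper (not part of phpSpider): stores the records and
 * returns the number of rows in the table afterwards.
 */
function saveRecords(PDO $pdo, array $records): int
{
    // Using the link as primary key keeps each page to a single row.
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS pages (
            link  TEXT PRIMARY KEY,
            title TEXT NOT NULL
        )'
    );

    // INSERT OR IGNORE skips a record whose link is already stored.
    $stmt = $pdo->prepare(
        'INSERT OR IGNORE INTO pages (link, title) VALUES (:link, :title)'
    );

    // One transaction for the whole batch avoids a commit per row.
    $pdo->beginTransaction();
    foreach ($records as $record) {
        $stmt->execute([
            ':link'  => $record['link'],
            ':title' => $record['title'],
        ]);
    }
    $pdo->commit();

    return (int) $pdo->query('SELECT COUNT(*) FROM pages')->fetchColumn();
}

// An in-memory database keeps the example self-contained;
// a file path such as 'sqlite:pages.db' would persist the data.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$saved = saveRecords($pdo, $records);
echo "Rows stored: {$saved}\n";
```

Keying the table on the link means that re-running a crawl over the same pages does not create duplicate rows.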
This article has walked through the full phpSpider workflow: installation, project creation, rule configuration, and running the crawler. Because a crawl is defined by a short rule file, phpSpider is well suited to quickly building targeted scraping projects that collect the web data you need.