The internet hosts countless high-quality images, and we often need to batch-download them from a specific website, such as a collection of landscape photos or artwork. This article shows how to write a PHP script that automatically collects images from a web page.
Before scraping, it's essential to analyze the target website's structure. Usually, images are embedded via the img tag in the HTML. By examining the source code, we can find the pattern for image URLs and extract the desired image links accordingly.
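One way to inspect how a page embeds its images is to list the src attribute of every img tag. The sketch below uses PHP's DOMDocument rather than a regular expression, purely as an exploration aid. The sample markup and the function name listImageSources are assumptions made for illustration, not part of any real site or library.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$sampleHtml = '<html><body>'
    . '<img src="https://example.com/a.jpg" alt="A">'
    . '<img class="thumb" src="/img/b.png">'
    . '<p>No image here</p>'
    . '</body></html>';

// Return the src attribute of every <img> element in the given HTML.
function listImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

print_r(listImageSources($sampleHtml));
```

Printing the list makes it easy to see whether the links are absolute, root-relative, or hosted on a separate domain, which in turn determines how the image URLs need to be matched and downloaded.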
Once the scraping approach is clear, we use PHP’s cURL functions to fetch the page's HTML, apply a regular expression to match the image URLs, and finally loop through the URLs to download and save each image.
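<?php
// Target page (placeholder; replace with the page you want to scrape)
$url = 'https://example.com/';
// Initialize curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
// Follow redirects, e.g. from http to https
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Get HTML content
$html = curl_exec($ch);
if ($html === false) {
    exit('Failed to fetch page: ' . curl_error($ch));
}
curl_close($ch);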
// Define regex pattern
$pattern = '/<img[^>]+src=["\']?([^"\'>]+)["\']?[^>]*>/is';
// Extract image URLs
preg_match_all($pattern, $html, $matches);
$matches = $matches[1];
// Remove duplicates
$matches = array_unique($matches);
// Set image save path
$path = "./images/";
if (!is_dir($path)) {
    // Create the directory, including any missing parent directories
    mkdir($path, 0755, true);
}
// Download images
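foreach ($matches as $value) {
    // Skip anything that is not an absolute http(s) URL (relative paths, data: URIs)
    if (!preg_match('#^https?://#i', $value)) {
        continue;
    }
    // Get image filename from the URL path, ignoring any query string
    $imgname = basename((string) parse_url($value, PHP_URL_PATH));
    if ($imgname === '') {
        continue;
    }
    // Open file
    $fp = fopen($path . $imgname, 'w');
    if ($fp === false) {
        continue;
    }
    // Setup curl
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $value);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Treat HTTP status codes >= 400 as failures
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    // Execute download
    $ok = curl_exec($ch);
    curl_close($ch);
    // Close file
    fclose($fp);
    // Remove the empty or partial file if the download failed
    if ($ok === false) {
        unlink($path . $imgname);
    }
}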
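The src values captured by the regex are not always absolute URLs; many pages use paths such as /img/a.jpg or photos/a.jpg, which cURL cannot download as-is. The following is a minimal sketch of a helper that resolves such values against the page URL. The name resolveUrl is hypothetical, and the helper deliberately does not normalize "../" segments.

```php
<?php
// Hypothetical helper: turn an <img> src value into an absolute URL.
// Handles absolute, protocol-relative, root-relative, and simple
// relative paths; "../" segments are left as they are.
function resolveUrl(string $base, string $src): string
{
    // Already absolute (http:// or https://).
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Protocol-relative: //cdn.example.com/a.jpg
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }
    // Root-relative: /img/a.jpg
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }
    // Relative to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}
```

Assuming $url holds the page address, the extracted list could be converted before the download loop with something like $matches = array_map(fn($src) => resolveUrl($url, $src), $matches);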
This approach gives you a simple web image scraper that you can adjust and optimize for your own needs. If downloads fail, check the network connection, confirm that the extracted image URLs are absolute, and make sure PHP has write permission on the save directory. Hopefully, this guide helps you get started with PHP image scraping.