Practical Guide to Efficiently Scraping Tmall and Taobao Product Data Using PHP

gitbox 2025-07-31

Introduction

With the rapid growth of e-commerce, online shopping has become an essential part of daily life. As two of China's largest e-commerce platforms, Tmall and Taobao host vast amounts of product information. This article shows how to scrape product data from these platforms with PHP: fetching a product detail page and extracting its title, price, image URL, and description.

Preparation

Installing Dependencies

Before starting, install the two PHP libraries the crawler depends on. The first is Guzzle, an HTTP client used to send web requests. Install it via Composer:

composer require guzzlehttp/guzzle

Next, install the DiDom library for parsing HTML documents, which helps extract the needed information from pages:

composer require imangazaliev/didom

Obtaining Cookies

Because some product data on Tmall and Taobao requires a login, you must first obtain a valid login cookie. Log in through your browser, open the developer tools, and copy the Cookie request header from any request to the site. Sending that value with your own requests simulates a logged-in state.
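A cookie string pasted from the developer tools often carries a leading "Cookie:" label, stray line breaks, or trailing semicolons. The sketch below shows one way to tidy it before use. The helper name normalizeCookieHeader and the environment variable TAOBAO_COOKIE are assumptions made for illustration, not part of Guzzle or DiDom.

```php
<?php
declare(strict_types=1);

/**
 * Normalize a cookie string copied from the browser into a value
 * suitable for a Cookie request header. Hypothetical helper.
 */
function normalizeCookieHeader(string $raw): string
{
    // Drop a leading "Cookie:" label if the whole header line was copied.
    $raw = preg_replace('/^\s*cookie:\s*/i', '', $raw) ?? '';

    $pairs = [];
    foreach (explode(';', $raw) as $part) {
        $part = trim($part);
        // Keep only name=value pairs that have a non-empty name.
        if ($part !== '' && strpos($part, '=') > 0) {
            $pairs[] = $part;
        }
    }

    return implode('; ', $pairs);
}

// TAOBAO_COOKIE is an assumed variable name; reading the cookie from the
// environment keeps the login credential out of the source file.
$cookie = normalizeCookieHeader(getenv('TAOBAO_COOKIE') ?: '');
```

Login cookies expire, so an empty or stale value here is a likely cause when later requests return a login page instead of product data.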

Scraping Product Data

Sending HTTP Requests

Use Guzzle to send a request to the product detail page to get the page's HTML source code. Set the User-Agent and Cookie headers to simulate a browser and maintain login status:

require 'vendor/autoload.php'; // Composer autoloader

use GuzzleHttp\Client;

$client = new Client();
// Request the product detail page with browser-like headers
$response = $client->get('https://detail.tmall.com/item.htm?id=123456789', [
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
        'Cookie' => 'your_cookie_value_here',
    ],
]);
// Read the entire response body as a string
$html = (string) $response->getBody();

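Replace the id parameter in the URL with the ID of the target product, and replace the Cookie value with the one copied from your browser. Before parsing, confirm that the request succeeded and returned the product page rather than a login page.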
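The request above can be wrapped so that failures are handled instead of crashing the script. This is a minimal sketch, assuming Composer's autoloader has already been loaded by the calling script. The function names buildDetailUrl and fetchProductHtml are hypothetical, and the 10-second timeout is an arbitrary choice. The timeout and http_errors request options and the GuzzleException interface come from Guzzle itself.

```php
<?php
declare(strict_types=1);

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

// Assumes vendor/autoload.php has already been required by the caller.

/** Build a detail-page URL from a numeric item ID (hypothetical helper). */
function buildDetailUrl(string $itemId): string
{
    if (!ctype_digit($itemId)) {
        throw new InvalidArgumentException("Item ID must be numeric: {$itemId}");
    }

    return 'https://detail.tmall.com/item.htm?' . http_build_query(['id' => $itemId]);
}

/** Fetch the page HTML; returns null on a transport error or non-200 status. */
function fetchProductHtml(Client $client, string $itemId, array $headers): ?string
{
    try {
        $response = $client->get(buildDetailUrl($itemId), [
            'headers'     => $headers,
            'timeout'     => 10,    // seconds; an arbitrary choice
            'http_errors' => false, // check the status code instead of throwing on 4xx/5xx
        ]);
    } catch (GuzzleException $e) {
        // Connection failures, timeouts, and similar transport problems.
        error_log('Request failed: ' . $e->getMessage());
        return null;
    }

    return $response->getStatusCode() === 200 ? (string) $response->getBody() : null;
}
```

A 200 status does not guarantee product data, since the site may respond with a login or verification page. Checking that the expected elements are present in the HTML is a reasonable extra safeguard.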

Parsing HTML Source Code

After obtaining the HTML, use DiDom to parse the document and extract key product data such as the title, price, image URL, and product description:

use DiDom\Document;

$document = new Document($html);
// first() returns null when nothing matches; ?-> (PHP 8+) then yields null instead of a fatal error
// Get product title
$title = $document->first('.tb-detail-hd h1')?->text();
// Get product price
$price = $document->first('.tm-price')?->text();
// Get product image URL
$imageUrl = $document->first('.tm-goldbox img')?->attr('src');
// Get product description
$description = $document->first('.tb-detail-content')?->text();

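The CSS selectors above are examples. Adjust them to match the actual page structure, which may differ between Tmall and Taobao and can change over time. If a selector matches nothing, the corresponding value will be missing, so check each result before using it.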
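The extracted strings usually need some cleanup before they are stored. The sketch below shows two possible helpers: one that pulls a numeric price out of text such as "¥199.00", and one that converts a protocol-relative image URL (starting with "//") into an https URL. Both function names are hypothetical, and the assumption that prices use a dot as the decimal separator and a comma as the thousands separator should be checked against the real pages.

```php
<?php
declare(strict_types=1);

/** Extract the first decimal number from price text; null if there is none. */
function parsePrice(?string $text): ?float
{
    if ($text === null) {
        return null;
    }
    // Remove thousands separators, then take the first number found.
    $clean = str_replace(',', '', $text);

    return preg_match('/\d+(?:\.\d+)?/', $clean, $m) === 1 ? (float) $m[0] : null;
}

/** Turn a protocol-relative URL ("//host/path") into an https URL. */
function absoluteImageUrl(?string $src): ?string
{
    if ($src === null || trim($src) === '') {
        return null;
    }
    $src = trim($src);

    return strncmp($src, '//', 2) === 0 ? 'https:' . $src : $src;
}
```

For a price range such as "199.00 - 299.00", parsePrice returns only the first number. Capturing both ends of the range would need a different pattern.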

Conclusion

By combining PHP, Guzzle, and DiDom, you can scrape product data from Tmall and Taobao. Guzzle sends the request with browser-like headers and a login cookie, and DiDom extracts the product title, price, image URL, and description from the returned HTML. This approach is useful for market analysis and competitive intelligence gathering.

We hope this article helps you quickly get started with e-commerce data scraping and improve your data processing efficiency.