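In web development, cURL is a widely used tool for talking to servers over protocols such as HTTP and FTP. In PHP, the cURL extension is commonly used to fetch remote pages, submit form data, and perform similar tasks. By default, PHP's cURL extension sends no User-Agent header at all unless you configure one, which makes its requests easy to tell apart from browser traffic. When a request needs to look like it came from a browser, the most common first step is to set a custom User-Agent.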
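User-Agent is an HTTP request header that a client (usually a browser) sends to the server. It identifies the client software making the request and typically includes the browser name and version, the rendering engine, and the operating system. A server can read this header and return different content for different devices and browsers.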
For example, when a browser requests a page, the HTTP request headers include a User-Agent field similar to this:
```
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
```
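To make the server side of this concrete, here is a minimal sketch of how a PHP script might branch on the header. The function name `detectBrowser` and the simple substring checks are assumptions made for illustration; real-world detection is considerably more involved.

```php
<?php
// Hypothetical helper: guess the browser family from a User-Agent string.
// Order matters: Chrome's string also contains "Safari/", so Chrome is checked first.
function detectBrowser(string $userAgent): string
{
    if (stripos($userAgent, 'Firefox/') !== false) {
        return 'Firefox';
    }
    if (stripos($userAgent, 'Chrome/') !== false) {
        return 'Chrome';
    }
    if (stripos($userAgent, 'Safari/') !== false) {
        return 'Safari';
    }
    return 'Unknown';
}

// PHP exposes the request header as $_SERVER['HTTP_USER_AGENT'];
// it is absent when the client sent no User-Agent (or when run from the CLI).
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
echo detectBrowser($userAgent), PHP_EOL;
```

A request that arrives without the header falls into the `Unknown` branch, which is one simple way a server can notice non-browser clients.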
When simulating a browser request with the cURL extension, you set a custom User-Agent through the CURLOPT_USERAGENT option of curl_setopt().
The following line uses curl_setopt() to set a User-Agent that imitates Chrome:
```php
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36");
```
The User-Agent string in this line imitates a request made by Chrome running on Windows.
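The one-line call assumes that a handle `$ch` already exists. As a small sketch, the snippet below creates the handle first and checks the boolean that curl_setopt() returns. The variable names `$chromeUa` and `$ok` are chosen for this example only, and the cURL extension is assumed to be loaded.

```php
<?php
// Illustrative variable name; the value is the Chrome-on-Windows string used in this article.
$chromeUa = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    . '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';

// The handle must exist before any option can be set on it.
$ch = curl_init();

// curl_setopt() returns true on success and false on failure.
$ok = curl_setopt($ch, CURLOPT_USERAGENT, $chromeUa);

if (!$ok) {
    throw new RuntimeException('Could not set CURLOPT_USERAGENT');
}
```

No request is sent at this point; the option only takes effect once curl_exec() runs on the same handle.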
The following PHP code shows a complete cURL request that sets the User-Agent to simulate a browser:
```php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.example.com"); // target URL
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"); // browser-like User-Agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body instead of printing it directly
$response = curl_exec($ch); // execute the request; returns false on failure
if ($response === false) {
    echo "cURL error: " . curl_error($ch); // report why the request failed
} else {
    echo $response; // print the response body
}
curl_close($ch); // close the cURL session
```
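If the same request logic is needed in several places, it can be wrapped in a function. The sketch below is one possible shape, not a definitive implementation: the name `fetchWithUserAgent`, the 10-second default timeout, and the choice to follow redirects are all assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper: fetch a URL while sending the given User-Agent.
 * Throws RuntimeException when the transfer fails.
 */
function fetchWithUserAgent(string $url, string $userAgent, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects (assumed desirable here)
        CURLOPT_TIMEOUT        => $timeoutSeconds, // cap the total transfer time
    ]);

    $body  = curl_exec($ch);
    $error = curl_error($ch); // read the error text before closing the handle
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }

    return $body;
}
```

Calling `fetchWithUserAgent('https://www.example.com', $userAgent)` inside a try/catch block then keeps failure handling in one place instead of repeating the `=== false` check at every call site.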
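To make a crawler look more convincing, you may need a more specific User-Agent, or even random switching between several of them. Here are sample User-Agent strings for a few common browsers: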
Google Chrome:
```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
```
Mozilla Firefox:
```
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0
```
Safari (Mac):
```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15
```
You can change the User-Agent string as needed so that the request appears to come from a different device or browser.
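The random switching mentioned earlier can be sketched as follows. The pool contents and the function name `pickUserAgent` are illustrative assumptions; in practice the pool would be kept up to date with current browser versions.

```php
<?php
// Illustrative pool of browser-like User-Agent strings; extend or refresh as needed.
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15',
];

// Hypothetical helper: return one entry chosen at random from a non-empty pool.
function pickUserAgent(array $pool): string
{
    if ($pool === []) {
        throw new InvalidArgumentException('User-Agent pool must not be empty');
    }
    return $pool[array_rand($pool)];
}

// Each new handle gets a randomly chosen User-Agent.
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, pickUserAgent($userAgents));
```

Picking a string once per handle, rather than once per script run, means consecutive requests do not all carry the same identity.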
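With PHP's cURL extension, it is easy to simulate a browser request by setting a custom User-Agent that disguises where the request comes from. This is useful for web crawlers, API requests, and similar tasks: it gets past some simple anti-scraping checks and avoids being flagged as a bot because the User-Agent header is missing or obviously non-browser.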
Using the approach above, you can set and adjust the User-Agent flexibly and imitate a range of browsers or devices when fetching web content.