
Common Problems and Solutions When Using get_meta_tags to Fetch Page Titles and Keywords

gitbox 2025-09-16

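get_meta_tags() is a convenient built-in PHP function that extracts the content of <meta name="..."> tags from a remote or local HTML file. It is commonly used to grab a page's keywords or description. In practice, though, developers run into a range of problems: the title cannot be extracted, keywords come back empty, character encodings get mangled, remote requests fail, or meta tags are written in non-standard ways. This article summarizes the common problems and their causes, and offers fixes along with a more robust alternative (with copyable PHP sample code).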


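1. How get_meta_tags() works and where it falls short (understand first, then debug)

  • get_meta_tags(string $filename, bool $use_include_path = false): it reads the file and tries to parse <meta name="xxx" content="yyy">, returning an associative array of name => content (with every name lowercased).

  • It does not return the content of the <title> tag (the page title), and it does not parse meta tags that lack a name attribute, such as <meta property="og:..."> or <meta charset="...">.

  • It relies on a simple built-in tokenizer rather than a full HTML parser, and it stops parsing at </head>. It expects tags of the form name="..." content="..."; irregular markup (odd quoting, stray line breaks) can sometimes affect the result.

Conclusion: if you need the page <title>, or the page's meta tags use property (as Open Graph does), get_meta_tags() on its own is not enough.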
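
To make the limitation concrete, here is a minimal sketch that runs get_meta_tags() on a small local file. The sample HTML and the temporary file are made up for illustration; the point is only that the name-based tags come back while <title> and the property-based tag do not.

```php
<?php
// Hypothetical sample page: one <title>, two name-based metas, one Open Graph meta.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
<title>Sample page</title>
<meta name="Keywords" content="php, meta">
<meta name="description" content="A short description">
<meta property="og:title" content="Sample OG title">
</head>
<body></body>
</html>
HTML;

// get_meta_tags() takes a file name or URL, so write the sample to a temp file.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$tags = get_meta_tags($file);
unlink($file);

// Only 'keywords' and 'description' are expected here; the title and
// the og:title tag are not part of the result.
var_export($tags);
```

Note that the key for "Keywords" comes back lowercased, which matches the behavior described above.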


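2. Common problems and solutions at a glance

Problem A: Cannot get <title> (the page title)

Cause: get_meta_tags() does not parse <title>.
Solution: parse <title> with DOMDocument or with a regular expression (not recommended). Example (DOMDocument is the recommended route):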

```php
function fetch_title($html) {
    // loadHTML() rejects an empty string on PHP 8, so bail out early
    if ($html === '') {
        return null;
    }
    $dom = new DOMDocument();
    // suppress warnings for malformed HTML
    @$dom->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);
    $nodes = $dom->getElementsByTagName('title');
    return $nodes->length ? trim($nodes->item(0)->textContent) : null;
}
```

To read a remote page, first download the HTML with file_get_contents or curl, then pass it to fetch_title().


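Problem B: get_meta_tags() returns an empty array or misses some meta tags

Possible causes

  1. The meta tag is not written as name="..." + content="..." (for example it uses property="og:..." or http-equiv).

  2. The meta tag sits outside <head> (or the page structure is otherwise irregular).

  3. The character encoding or a BOM makes parsing fail.

  4. allow_url_fopen is disabled, so URLs cannot be opened.

Solutions

  • Check which attribute the meta tag uses; where needed, use DOMDocument and inspect $meta->getAttribute('name') and $meta->getAttribute('property').

  • For remote URLs, prefer curl to fetch the page content (it is more flexible), then parse it with DOM.

  • If allow_url_fopen is disabled, switch to curl.

Example: extracting common meta tags (both name and property) and the title with curl + DOM: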

```php
function fetch_html($url, $timeout = 10) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 5,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT => $timeout,
        CURLOPT_USERAGENT => 'Mozilla/5.0 (compatible; PHP script)',
    ]);
    $html = curl_exec($ch);
    $err = curl_error($ch);
    curl_close($ch);
    if ($html === false) {
        throw new RuntimeException("Failed to fetch URL: $err");
    }
    return $html;
}

function parse_meta_and_title($html) {
    $result = ['title' => null, 'meta' => []];
    // loadHTML() rejects an empty string on PHP 8, so bail out early
    if ($html === '') {
        return $result;
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);

    // title
    $titles = $dom->getElementsByTagName('title');
    if ($titles->length) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    // metas
    $metas = $dom->getElementsByTagName('meta');
    foreach ($metas as $meta) {
        $name = $meta->getAttribute('name');
        $prop = $meta->getAttribute('property'); // og: and similar
        $http_equiv = $meta->getAttribute('http-equiv');
        $content = $meta->getAttribute('content');

        // getAttribute() returns '' when the attribute is absent
        if ($name !== '') {
            $result['meta'][strtolower($name)] = $content;
        } elseif ($prop !== '') {
            $result['meta'][strtolower($prop)] = $content;
        } elseif ($http_equiv !== '') {
            $result['meta'][strtolower($http_equiv)] = $content;
        }
    }
    return $result;
}
```

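Problem C: Garbled characters (Chinese and other multibyte text)

Causes

  • The encoding used by the page (UTF-8, GBK, and so on) does not match the default behavior of DOMDocument::loadHTML, which falls back to ISO-8859-1 when the document declares no charset.

  • The charset given in the HTTP header disagrees with the one in the page's meta tag.

Solutions

  • Before calling loadHTML(), convert the HTML to UTF-8 (if it is not already) and inject <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> into the head, so that DOMDocument recognizes the encoding more reliably.

  • Use mb_detect_encoding() to guess the encoding and convert to UTF-8. Detection is a heuristic and can guess wrong, so prefer a declared charset when one is available.

Example: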

```php
function normalize_to_utf8($html) {
    // Prefer the charset declared in a <meta> tag; otherwise fall back to mb_detect_encoding
    $encoding = null;
    if (preg_match('/<meta[^>]+charset=["\']?\s*([a-zA-Z0-9\-_]+)/i', $html, $m)) {
        $encoding = strtoupper($m[1]);
    }
    if (!$encoding) {
        $encoding = mb_detect_encoding($html, ['UTF-8', 'GB2312', 'GBK', 'ISO-8859-1'], true);
    }
    if ($encoding && strtoupper($encoding) !== 'UTF-8') {
        // Note: a charset name that mbstring does not recognize makes this call fail
        $html = mb_convert_encoding($html, 'UTF-8', $encoding);
        // The markup still declares the old charset; rewrite it so the parser is not misled
        $html = preg_replace('/(<meta[^>]+charset=["\']?\s*)[a-zA-Z0-9\-_]+/i', '${1}utf-8', $html);
    }
    // Make sure loadHTML treats the document as UTF-8
    if (stripos($html, '<meta http-equiv="Content-Type"') === false) {
        $html = preg_replace('/<head([^>]*)>/i', '<head$1><meta http-equiv="Content-Type" content="text/html; charset=utf-8">', $html, 1);
    }
    return $html;
}
```
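
When the HTTP header and the meta tag disagree, it helps to settle on one precedence rule. The sketch below is one possible approach, not part of the original code: pick_charset() is a hypothetical helper, and the order it uses (BOM, then the Content-Type header, then the meta tag) is an assumption modeled on how browsers sniff encodings.

```php
<?php
// Hypothetical helper: decide which charset label to trust for a fetched page.
// $contentType is the raw Content-Type header value, or null if unknown.
function pick_charset(?string $contentType, string $html): ?string
{
    // 1. A UTF-8 byte order mark wins over any declaration.
    if (strncmp($html, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    // 2. Then the charset parameter of the HTTP Content-Type header.
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w\-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 3. Then a charset declared in a <meta> tag.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?\s*([\w\-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 4. Nothing declared: the caller can fall back to mb_detect_encoding().
    return null;
}
```

With curl, the header value can be read via curl_getinfo($ch, CURLINFO_CONTENT_TYPE) before the handle is closed, and the returned label can then be passed to mb_convert_encoding().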

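Problem D: get_meta_tags() is sensitive to HTML comments and irregular formatting

Cause: the function is built on a simple internal parser and may fail on line breaks, comments, or unusual characters nested inside the content attribute.
Solution: use DOMDocument, which tolerates errors better; or preprocess the head of the HTML (strip comments, flatten attributes onto a single line) before calling get_meta_tags(). The latter is not elegant, but it can serve as a short-term fix.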
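
The preprocessing workaround could look roughly like the sketch below. get_meta_tags_from_html() is a hypothetical helper name, and the regular expressions are deliberately simple; treat it as a stopgap rather than a replacement for DOM parsing.

```php
<?php
// Hypothetical helper: clean up the <head> and hand it to get_meta_tags().
function get_meta_tags_from_html(string $html): array
{
    // Keep only the <head> section, if one can be found.
    if (preg_match('/<head\b[^>]*>.*?<\/head>/is', $html, $m)) {
        $html = $m[0];
    }
    // Drop HTML comments, then collapse every run of whitespace (including
    // line breaks inside tags) into one space. This also squeezes whitespace
    // inside content values, which is usually harmless for keywords.
    $html = preg_replace('/<!--.*?-->/s', '', $html) ?? $html;
    $html = preg_replace('/\s+/', ' ', $html) ?? $html;

    // get_meta_tags() reads from a file name or URL, so go through a temp file.
    $file = tempnam(sys_get_temp_dir(), 'head');
    file_put_contents($file, $html);
    $tags = get_meta_tags($file);
    unlink($file);

    return $tags === false ? [] : $tags;
}
```

Because the comments are removed before parsing, a meta tag that has been commented out in the page no longer shows up in the result.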


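Problem E: Remote fetches time out, get blocked by anti-scraping measures, or return 403/429

Countermeasures

  • Set a common browser User-Agent with CURLOPT_USERAGENT.

  • Set reasonable values for CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT.

  • Follow redirects with CURLOPT_FOLLOWLOCATION (note that it has to be enabled explicitly, and some environments restrict it).

  • If the site has anti-scraping defenses (CAPTCHAs, JS rendering, bot detection), consider:

    • Simple request-header adjustments (while still complying with the law and the site's robots rules).

    • A scraping tool that executes JS (such as a headless browser), although this goes beyond what native PHP offers.

  • Check HTTP status codes and retry on failure (with exponential backoff), but avoid sending excessive requests.

Example: curl with headers: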

```php
// $url is assumed to hold the address of the page to fetch
$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_MAXREDIRS => 5,
    CURLOPT_CONNECTTIMEOUT => 10,
    CURLOPT_TIMEOUT => 15,
    CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    CURLOPT_HTTPHEADER => [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
    ],
]);
```
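
The retry-with-exponential-backoff idea from the list above can be sketched as a small generic wrapper. retry_with_backoff() and its parameters are hypothetical names chosen for this illustration; the sleep function is injectable only so the delays can be observed without actually waiting.

```php
<?php
// Hypothetical helper: call $attempt until it succeeds or $maxTries is reached,
// doubling the wait after every failure (500 ms, 1000 ms, 2000 ms, ...).
function retry_with_backoff(
    callable $attempt,
    int $maxTries = 3,
    int $baseDelayMs = 500,
    ?callable $sleep = null
) {
    if ($maxTries < 1) {
        throw new InvalidArgumentException('maxTries must be at least 1');
    }
    // Default sleeper; usleep() takes microseconds.
    $sleep = $sleep ?? static function (int $ms): void {
        usleep($ms * 1000);
    };

    $lastError = null;
    for ($try = 0; $try < $maxTries; $try++) {
        try {
            return $attempt($try);
        } catch (RuntimeException $e) {
            $lastError = $e;
            // Wait only if another attempt will follow.
            if ($try < $maxTries - 1) {
                $sleep($baseDelayMs * (2 ** $try));
            }
        }
    }
    throw $lastError;
}
```

A fetch function that throws RuntimeException on failure can be wrapped as retry_with_backoff(fn () => fetch_html($url)). Keeping $maxTries small is one way to honor the advice about not sending excessive requests.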

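Problem F: get_meta_tags() returns only lowercase key names

This is by design: key names are converted to lowercase (and special characters in the name are replaced with underscores). If your application depends on case-sensitive field names, remember to normalize the keys on your side as well.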
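
A quick way to see the key normalization is to feed get_meta_tags() a few awkward names. The sample tags below are made up for illustration; the sketch also shows that when two tags share a name, only one value survives in the returned array.

```php
<?php
// Hypothetical sample: a mixed-case name with a dot, plus a duplicated name.
$html = <<<'HTML'
<html><head>
<meta name="Geo.Position" content="49.33;-86.59">
<meta name="KEYWORDS" content="first">
<meta name="keywords" content="second">
</head></html>
HTML;

$file = tempnam(sys_get_temp_dir(), 'keys');
file_put_contents($file, $html);
$tags = get_meta_tags($file);
unlink($file);

// "Geo.Position" is stored under the key 'geo_position';
// both keywords tags collapse into the single key 'keywords'.
var_export($tags);
```

If the original spelling of a name matters, or every duplicate must be kept, DOM parsing is the safer route because it leaves attribute values untouched.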


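3. Recommended robust implementation (fetching title, keywords, description, and og tags in one place)

The combined function below first fetches the HTML with curl, then normalizes the encoding, and finally parses the page with DOM, returning the common fields plus the full list of meta tags.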

```php
function fetch_page_info($url) {
    $html = fetch_html($url);
    $html = normalize_to_utf8($html);
    $data = parse_meta_and_title($html);

    // Normalize the common fields: title, keywords, description
    $info = [];
    $info['title'] = $data['title'] ?? null;
    $meta = $data['meta'] ?? [];

    // og:site_name is only a loose stand-in when no keywords meta exists
    $info['keywords'] = $meta['keywords'] ?? ($meta['og:site_name'] ?? null);
    $info['description'] = $meta['description'] ?? ($meta['og:description'] ?? null);

    // Return every meta tag for further use
    $info['meta_all'] = $meta;

    return $info;
}

// Usage example:
try {
    $url = 'https://example.com';
    $info = fetch_page_info($url);
    var_export($info);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
```

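4. Performance and caching recommendations

  • When fetching pages in bulk, do not hit the same URL live every time. Use a cache (Redis, Memcached, or a file cache) with a suitable expiry policy, for example 1 hour or 24 hours depending on how often the page changes.

  • Limit concurrency when fetching in parallel, so that the target site does not ban you and your own host is not overloaded.

  • On large sites, fetch the home page and other important pages first instead of blindly crawling every link.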
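
As a rough illustration of the caching advice, here is a minimal file-cache sketch. cached_fetch() is a hypothetical helper; it takes the actual fetcher as a callable so it does not depend on any particular HTTP code, and the cache directory and file naming are assumptions made for this example.

```php
<?php
// Hypothetical helper: cache the result of $fetch($url) on disk for $ttl seconds.
function cached_fetch(string $url, callable $fetch, int $ttl = 3600, ?string $dir = null): string
{
    $dir = $dir ?? sys_get_temp_dir();
    $file = $dir . DIRECTORY_SEPARATOR . 'page_' . sha1($url) . '.html';

    // Serve from the cache while the file is younger than the TTL.
    clearstatcache(true, $file);
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache miss or expired entry: fetch again and store the fresh copy.
    $html = (string) $fetch($url);
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}
```

A call such as cached_fetch($url, 'fetch_html', 3600) would reuse a stored copy for an hour. For shared or high-traffic setups, Redis or Memcached would follow the same pattern with the file operations swapped out.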


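5. Additional tips and caveats

  • Inconsistent meta tag styles: many modern sites use og:title and twitter:title. Tags declared with property (such as og:title) are outside the scope of get_meta_tags(); twitter:title is usually declared with name, so it may be picked up, but DOM parsing captures every type in one pass.

  • Duplicate meta tags: if a page contains several meta tags with the same name (perhaps for multiple languages or versions), your parsing logic has to decide whether to take the first one, merge them, or keep them all.

  • HTML entities in meta content: remember to decode entities such as &amp; and &#123; (html_entity_decode()).

  • robots/meta-refresh: if you need to handle meta refresh (redirects) or robots noindex, check http-equiv and the related attributes specifically.

  • Respect robots.txt and the law: before scraping, check the target site's robots.txt and terms of service, respect privacy and copyright, and do not fetch restricted content.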
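
For the duplicate-tag case, one option is to keep every value and decide later. The sketch below assumes that policy; collect_meta_values() is a hypothetical helper that groups content values by the tag's name or property.

```php
<?php
// Hypothetical helper: keep every value of a repeated meta tag instead of only one.
function collect_meta_values(string $html): array
{
    $result = [];
    if ($html === '') {
        return $result; // loadHTML() rejects an empty string on PHP 8
    }
    $dom = new DOMDocument();
    // Suppress warnings caused by malformed markup.
    @$dom->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);

    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Use whichever identifying attribute is present.
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key === '') {
            continue; // e.g. <meta charset="..."> or http-equiv tags
        }
        // Append rather than overwrite, so duplicates are preserved in order.
        $result[strtolower($key)][] = $meta->getAttribute('content');
    }
    return $result;
}
```

Taking the first value is then $values['keywords'][0], and merging is an implode() away. DOM already decodes entities in attribute values, so html_entity_decode() is mainly needed for raw strings from get_meta_tags() or a regular expression.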


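6. Practical checklist (quick troubleshooting steps)

  1. Confirm whether you are after <meta name="keywords"> or <title> (they need different tools).

  2. For remote pages: first fetch the raw HTML with curl and print it, then look at how the meta tags are actually written and which encoding is used.

  3. Check the charset; if it is not UTF-8, convert before parsing.

  4. If get_meta_tags() cannot extract the data, switch to DOMDocument and capture name, property, and http-equiv together.

  5. Handle HTTP errors, redirects, and anti-scraping mechanisms (set a suitable UA, timeouts, and a retry policy).

  6. Cache important pages to avoid repeated requests.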


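7. Summary

  • get_meta_tags() is simple to use, but it only fits standard, simple meta name="..." cases. It does not return <title> or property-based meta tags.

  • For complex, non-standard, or non-UTF-8 pages, the combination of curl + DOMDocument is recommended: it is more flexible and more robust.

  • Encoding, failed remote requests, anti-scraping measures, and irregular meta markup are the usual failure points; following the troubleshooting order above locates and fixes most of them.

  • Pages that need JS rendering (SPAs, dynamically loaded meta tags) call for a headless browser or a server-side rendering solution, which is beyond the scope of native PHP.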