Combining get_meta_tags and DOMDocument: How to Parse a Web Page and Extract Meta Information

gitbox 2025-06-30

1. Extracting meta information with get_meta_tags

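PHP's built-in get_meta_tags function is a simple and convenient way to extract meta information from an HTML page. It returns an associative array whose keys are the lowercased values of each meta tag's name attribute and whose values are the corresponding content attributes. It needs no extra libraries or dependencies, which makes it a good fit for quickly pulling simple meta data.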

Example code:

<?php
// URL of the page to parse
$url = 'https://www.example.com';

// Fetch the meta information with get_meta_tags (remote URLs require allow_url_fopen)
$metaTags = get_meta_tags($url);

// Print all of the meta information
echo '<pre>';
print_r($metaTags);
echo '</pre>';
?>

Output (illustrative; the actual result depends on the meta tags the page defines):

Array
(
    [description] => Example Domain
    [keywords] => example, domain
    [author] => Example Author
)

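get_meta_tags reads <meta> tags that carry a name attribute and returns the content of each one. It has clear limits, though: it only recognizes the name/content pair, so tags that use property instead (such as Open Graph tags) are not returned, and it stops parsing at the closing </head> tag. It is not meant for complex HTML structures or non-standard attributes.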
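To make the limitation concrete, here is a minimal sketch that writes a small HTML page to a temporary file and runs get_meta_tags on it. The page content and the og:title tag are made up for illustration; the geo.position tag mirrors the example in the PHP manual.

```php
<?php
// Hypothetical page: two name-based tags and one Open Graph property tag.
$html = <<<HTML
<html>
<head>
<meta name="Description" content="Example Domain">
<meta name="geo.position" content="49.33;-86.59">
<meta property="og:title" content="Example Title">
</head>
<body></body>
</html>
HTML;

// get_meta_tags reads from a file or URL, so write the page to a temp file.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$tags = get_meta_tags($file);
unlink($file);

// Keys are lowercased and "." becomes "_"; the property-based tag is absent.
print_r($tags);
```

Here the array should contain the keys description and geo_position only, which is why a page that relies on Open Graph tags needs a different approach.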


2. Extracting meta information with DOMDocument

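Compared with get_meta_tags, DOMDocument is more powerful and flexible. It can handle more complex and varied HTML structures and lets you work with each tag precisely. DOMDocument is a built-in PHP class that parses HTML and XML content through the Document Object Model (DOM).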

Example code:

<?php
// URL of the page to parse
$url = 'https://www.example.com';

// Fetch the page content
$htmlContent = file_get_contents($url);
if ($htmlContent === false) {
    exit('Failed to fetch the page.');
}

// Create a DOMDocument object
$dom = new DOMDocument();

// Suppress warning output (the HTML may not be standards-compliant)
libxml_use_internal_errors(true);

// Load the HTML content
$dom->loadHTML($htmlContent);

// Get all <meta> tags
$metaTags = $dom->getElementsByTagName('meta');

// Iterate over the meta tags and extract their content
foreach ($metaTags as $meta) {
    // Read the name, property, and content attributes of the meta tag
    $name = $meta->getAttribute('name');
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');

    if ($name || $property) {
        echo "Name/Property: " . ($name ?: $property) . "<br>";
        echo "Content: " . $content . "<br><br>";
    }
}
?>

Output (illustrative; the actual result depends on the meta tags the page defines):

Name/Property: description
Content: Example Domain

Name/Property: keywords
Content: example, domain

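With DOMDocument we can work with each element of the page in much finer detail. In the example above, we first load the HTML page, then call getElementsByTagName('meta') to get all <meta> tags, and finally read the name, property, and content attributes of each one. Both standard meta name tags and the meta property tags used by the Open Graph (OG) protocol are picked up in the same pass.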
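The loop above prints each tag as it goes. When the values are needed later, it can help to collect them into an array keyed by name or property. The following is one possible sketch; the function name extractMetaTags and the inline sample HTML are illustrative and not part of the original example.

```php
<?php
/**
 * Collect <meta> tags that have a name or property attribute into an array.
 * extractMetaTags is a hypothetical helper name used for illustration.
 */
function extractMetaTags(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Prefer name; fall back to property (used by Open Graph tags).
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key !== '') {
            $result[$key] = $meta->getAttribute('content');
        }
    }
    return $result;
}

// Made-up sample page; the charset tag has neither name nor property.
$html = '<html><head>'
    . '<meta name="description" content="Example Domain">'
    . '<meta property="og:title" content="Example Title">'
    . '<meta charset="utf-8">'
    . '</head><body></body></html>';

print_r(extractMetaTags($html));
```

Tags with neither attribute, such as the charset declaration, are skipped, so the result holds only the description and og:title entries.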


3. get_meta_tags vs. DOMDocument

Although both get_meta_tags and DOMDocument can extract a page's meta information, they differ in capabilities and in the situations they suit.

Pros and cons:

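| Aspect | get_meta_tags | DOMDocument |
| --- | --- | --- |
| Simplicity | Very simple, little code | More involved; you iterate over the elements yourself |
| Flexibility | Only handles meta tags with name and content attributes | Supports every HTML element; far more capable |
| Performance | Good; suited to quick extraction | Slower on complex pages |
| Error handling | Cannot cope with irregular HTML | Handles errors and provides detailed debugging information |
| Extensibility | Limited functionality, hard to extend | Can process more complex HTML and XML formats |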
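For the error-handling point, here is a minimal sketch of how parse diagnostics can be collected with libxml_get_errors. The malformed sample markup is made up, and which messages libxml reports (if any) depends on the installed libxml2 version, so no output is shown.

```php
<?php
// Deliberately sloppy markup (hypothetical sample): an unknown element and unclosed tags.
$html = '<html><head><meta name="description" content="Example Domain">'
    . '</head><body><foo><p>Unclosed paragraph<div></body></html>';

$dom = new DOMDocument();

// Buffer parse problems instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$dom->loadHTML($html);

// Each entry is a LibXMLError object with level, line, column and message fields.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();

// The meta tag is still reachable despite the broken body.
$meta = $dom->getElementsByTagName('meta')->item(0);
echo $meta->getAttribute('content'), "\n";
```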

When to use each:

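  • get_meta_tags: suited to simple pages, especially when only basic meta information (such as description or keywords) is needed; in that case it is the more efficient choice.

  • DOMDocument: suited to structurally complex HTML pages, particularly when you need to pull information from different kinds of tags or deal with pages that contain nested tags, scripts, styles, and so on.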
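Because the two approaches cover different tags, they can also be combined: get_meta_tags for the name-based tags and DOMDocument for the property-based ones. The sketch below is one way to do that under stated assumptions; collectMeta and the sample page are hypothetical, and a temporary local file stands in for a real URL.

```php
<?php
/**
 * Merge name-based tags from get_meta_tags with property-based tags
 * (e.g. Open Graph) read through DOMDocument.
 * collectMeta is a hypothetical helper name used for illustration.
 */
function collectMeta(string $source): array
{
    // Name-based tags; get_meta_tags returns false on failure.
    $named = get_meta_tags($source) ?: [];

    $html = file_get_contents($source);
    if ($html === false) {
        return $named;
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Property-based tags, which get_meta_tags does not return.
    $props = [];
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        if ($property !== '') {
            $props[$property] = $meta->getAttribute('content');
        }
    }

    // Array union: keeps the name-based entries and adds the property-based ones.
    return $named + $props;
}

// Usage with a temporary local file standing in for a real page.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, '<html><head>'
    . '<meta name="description" content="Example Domain">'
    . '<meta property="og:title" content="Example Title">'
    . '</head><body></body></html>');

$all = collectMeta($file);
unlink($file);
print_r($all);
```

Note that this version reads the source twice, once per approach. For a remote URL it would likely be cheaper to download the page once and parse the saved copy.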