当前位置: 首页> 最新文章列表> get_meta_tags 和 DOMDocument 的结合使用:如何解析网页并提取 Meta 信息?

get_meta_tags 和 DOMDocument 的结合使用:如何解析网页并提取 Meta 信息?

gitbox 2025-06-30

一、使用 get_meta_tags 提取 Meta 信息

PHP 内置的 get_meta_tags 函数是一个非常简单且方便的方法,用来从 HTML 页面中提取所有的 meta 信息。它直接返回一个关联数组,数组的键是 meta 标签的属性,值是相应的内容。该方法不需要额外的库或依赖,适合快速提取简单的 meta 数据。

示例代码:

<div class="contain-inline-size rounded-2xl relative bg-token-sidebar-surface-primary"><div class="flex items-center text-token-text-secondary px-4 py-2 text-xs font-sans justify-between h-9 bg-token-sidebar-surface-primary select-none rounded-t-2xl">php</div><div class="sticky top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs"><button class="flex gap-1 items-center select-none py-1" aria-label="复制"><svg width="20" height="20" viewBox="0 0 20 20" fill="currentColor" xmlns="http://www.w3.org/2000/svg" class="icon-xs"><path d="M12.668 10.667C12.668 9.95614 12.668 9.46258 12.6367 9.0791C12.6137 8.79732 12.5758 8.60761 12.5244 8.46387L12.4688 8.33399C12.3148 8.03193 12.0803 7.77885 11.793 7.60254L11.666 7.53125C11.508 7.45087 11.2963 7.39395 10.9209 7.36328C10.5374 7.33197 10.0439 7.33203 9.33301 7.33203H6.5C5.78896 7.33203 5.29563 7.33195 4.91211 7.36328C4.63016 7.38632 4.44065 7.42413 4.29688 7.47559L4.16699 7.53125C3.86488 7.68518 3.61186 7.9196 3.43555 8.20703L3.36524 8.33399C3.28478 8.49198 3.22795 8.70352 3.19727 9.0791C3.16595 9.46259 3.16504 9.95611 3.16504 10.667V13.5C3.16504 14.211 3.16593 14.7044 3.19727 15.0879C3.22797 15.4636 3.28473 15.675 3.36524 15.833L3.43555 15.959C3.61186 16.2466 3.86474 16.4807 4.16699 16.6348L4.29688 16.6914C4.44063 16.7428 4.63025 16.7797 4.91211 16.8027C5.29563 16.8341 5.78896 16.835 6.5 16.835H9.33301C10.0439 16.835 10.5374 16.8341 10.9209 16.8027C11.2965 16.772 11.508 16.7152 11.666 16.6348L11.793 16.5645C12.0804 16.3881 12.3148 16.1351 12.4688 15.833L12.5244 15.7031C12.5759 15.5594 12.6137 15.3698 12.6367 15.0879C12.6681 14.7044 12.668 14.211 12.668 13.5V10.667ZM13.998 12.665C14.4528 12.6634 14.8011 12.6602 15.0879 12.6367C15.4635 12.606 15.675 12.5492 15.833 12.4688L15.959 12.3975C16.2466 12.2211 16.4808 11.9682 16.6348 11.666L16.6914 11.5361C16.7428 11.3924 16.7797 11.2026 16.8027 10.9209C16.8341 10.5374 16.835 10.0439 16.835 9.33301V6.5C16.835 5.78896 16.8341 5.29563 16.8027 4.91211C16.7797 4.63025 16.7428 4.44063 16.6914 4.29688L16.6348 4.16699C16.4807 3.86474 16.2466 3.61186 15.959 3.43555L15.833 3.36524C15.675 3.28473 15.4636 3.22797 15.0879 3.19727C14.7044 3.16593 14.211 3.16504 13.5 3.16504H10.667C9.9561 3.16504 9.46259 3.16595 9.0791 3.19727C8.79739 3.22028 8.6076 3.2572 8.46387 3.30859L8.33399 3.36524C8.03176 3.51923 7.77886 3.75343 7.60254 4.04102L7.53125 4.16699C7.4508 4.32498 7.39397 4.53655 7.36328 4.91211C7.33985 5.19893 7.33562 5.54719 7.33399 6.00195H9.33301C10.022 6.00195 10.5791 6.00131 11.0293 6.03809C11.4873 6.07551 11.8937 6.15471 12.2705 6.34668L12.4883 6.46875C12.984 6.7728 13.3878 7.20854 13.6533 7.72949L13.7197 7.87207C13.8642 8.20859 13.9292 8.56974 13.9619 8.9707C13.9987 9.42092 13.998 9.97799 13.998 10.667V12.665ZM18.165 9.33301C18.165 10.022 18.1657 10.5791 18.1289 11.0293C18.0961 11.4302 18.0311 11.7914 17.8867 12.1279L17.8203 12.2705C17.5549 12.7914 17.1509 13.2272 16.6553 13.5313L16.4365 13.6533C16.0599 13.8452 15.6541 13.9245 15.1963 13.9619C14.8593 13.9895 14.4624 13.9935 13.9951 13.9951C13.9935 14.4624 13.9895 14.8593 13.9619 15.1963C13.9292 15.597 13.864 15.9576 13.7197 16.2939L13.6533 16.4365C13.3878 16.9576 12.9841 17.3941 12.4883 17.6982L12.2705 17.8203C11.8937 18.0123 11.4873 18.0915 11.0293 18.1289C10.5791 18.1657 10.022 18.165 9.33301 18.165H6.5C5.81091 18.165 5.25395 18.1657 4.80371 18.1289C4.40306 18.0962 4.04235 18.031 3.70606 17.8867L3.56348 17.8203C3.04244 17.5548 2.60585 17.151 2.30176 16.6553L2.17969 16.4365C1.98788 16.0599 1.90851 15.6541 1.87109 15.1963C1.83431 14.746 1.83496 14.1891 1.83496 13.5V10.667C1.83496 9.978 1.83432 9.42091 1.87109 8.9707C1.90851 8.5127 1.98772 8.10625 2.17969 7.72949L2.30176 7.51172C2.60586 7.0159 3.04236 6.6122 3.56348 6.34668L3.70606 6.28027C4.04237 6.136 4.40303 6.07083 4.80371 6.03809C5.14051 6.01057 5.53708 6.00551 6.00391 6.00391C6.00551 5.53708 6.01057 5.14051 6.03809 4.80371C6.0755 4.34588 6.15483 3.94012 6.34668 3.56348L6.46875 3.34473C6.77282 2.84912 7.20856 2.44514 7.72949 2.17969L7.87207 2.11328C8.20855 1.96886 8.56979 1.90385 8.9707 1.87109C9.42091 1.83432 9.978 1.83496 10.667 1.83496H13.5C14.1891 1.83496 14.746 1.83431 15.1963 1.87109C15.6541 1.90851 16.0599 1.98788 16.4365 2.17969L16.6553 2.30176C17.151 2.60585 17.5548 3.04244 17.8203 3.56348L17.8867 3.70606C18.031 4.04235 18.0962 4.40306 18.1289 4.80371C18.1657 5.25395 18.165 5.81091 18.165 6.5V9.33301Z"></path></svg>复制</button><span class="" data-state="closed"><button class="flex items-center gap-1 py-1 select-none"><svg width="20" height="20" viewBox="0 0 20 20" fill="currentColor" xmlns="http://www.w3.org/2000/svg" class="icon-xs"><path d="M12.0303 4.11328C13.4406 2.70317 15.7275 2.70305 17.1377 4.11328C18.5474 5.52355 18.5476 7.81057 17.1377 9.2207L10.8457 15.5117C10.522 15.8354 10.2868 16.0723 10.0547 16.2627L9.82031 16.4395C9.61539 16.5794 9.39783 16.7003 9.1709 16.7998L8.94141 16.8916C8.75976 16.9582 8.57206 17.0072 8.35547 17.0518L7.59082 17.1865L5.19727 17.5859C5.05455 17.6097 4.90286 17.6358 4.77441 17.6455C4.67576 17.653 4.54196 17.6555 4.39648 17.6201L4.24707 17.5703C4.02415 17.4746 3.84119 17.3068 3.72559 17.0957L3.67969 17.0029C3.59322 16.8013 3.59553 16.6073 3.60547 16.4756C3.61519 16.3473 3.6403 16.1963 3.66406 16.0537L4.06348 13.6602C4.1638 13.0582 4.22517 12.6732 4.3584 12.3096L4.45117 12.0791C4.55073 11.8521 4.67152 11.6346 4.81152 11.4297L4.9873 11.1953C5.17772 10.9632 5.4146 10.728 5.73828 10.4043L12.0303 4.11328ZM6.67871 11.3447C6.32926 11.6942 6.14542 11.8803 6.01953 12.0332L5.90918 12.1797C5.81574 12.3165 5.73539 12.4618 5.66895 12.6133L5.60742 12.7666C5.52668 12.9869 5.48332 13.229 5.375 13.8789L4.97656 16.2725L4.97559 16.2744H4.97852L7.37207 15.875L8.08887 15.749C8.25765 15.7147 8.37336 15.6839 8.4834 15.6436L8.63672 15.5811C8.78817 15.5146 8.93356 15.4342 9.07031 15.3408L9.2168 15.2305C9.36965 15.1046 9.55583 14.9207 9.90527 14.5713L14.8926 9.58301L11.666 6.35742L6.67871 11.3447ZM16.1963 5.05371C15.3054 4.16304 13.8616 4.16305 12.9707 5.05371L12.6074 5.41602L15.833 8.64258L16.1963 8.2793C17.0869 7.38845 17.0869 5.94456 16.1963 5.05371Z"></path><path d="M4.58301 1.7832C4.72589 1.7832 4.84877 1.88437 4.87695 2.02441C4.99384 2.60873 5.22432 3.11642 5.58398 3.50391C5.94115 3.88854 6.44253 4.172 7.13281 4.28711C7.27713 4.3114 7.38267 4.43665 7.38281 4.58301C7.38281 4.7295 7.27723 4.8546 7.13281 4.87891C6.44249 4.99401 5.94116 5.27746 5.58398 5.66211C5.26908 6.00126 5.05404 6.43267 4.92676 6.92676L4.87695 7.1416C4.84891 7.28183 4.72601 7.38281 4.58301 7.38281C4.44013 7.38267 4.31709 7.28173 4.28906 7.1416C4.17212 6.55728 3.94179 6.04956 3.58203 5.66211C3.22483 5.27757 2.72347 4.99395 2.0332 4.87891C1.88897 4.85446 1.7832 4.72938 1.7832 4.58301C1.78335 4.43673 1.88902 4.3115 2.0332 4.28711C2.72366 4.17203 3.22481 3.88861 3.58203 3.50391C3.94186 3.11638 4.17214 2.60888 4.28906 2.02441L4.30371 1.97363C4.34801 1.86052 4.45804 1.78333 4.58301 1.7832Z"></path></svg>编辑</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><?php
// 指定要解析的网页 URL
$url = 'https://www.example.com';

// 使用 get_meta_tags 获取 meta 信息
$metaTags = get_meta_tags($url);

// 输出所有的 meta 信息
echo '<pre>';
print_r($metaTags);
echo '
'; ?>

输出结果:

<span><span><span class="hljs-title function_ invoke__">Array</span></span><span>
(
    [description] => Example Domain
    [keywords] => example, domain
    [author] => Example Author
)
</span></span>

get_meta_tags 函数能够获取 标签中的 nameproperty 属性,如果网页中存在 这样的标签,它将会提取并返回相应的内容。不过,get_meta_tags 也有一定的限制,它仅能解析常见的 meta 标签,且不支持复杂的 HTML 结构或特殊属性。


二、使用 DOMDocument 提取 Meta 信息

相比 get_meta_tagsDOMDocument 提供了更强大和灵活的功能。它能够处理更加复杂和多样化的 HTML 结构,并且能够针对每一个标签进行更精确的操作。DOMDocument 是 PHP 内建的一个类,允许你通过文档对象模型(DOM)来解析 HTML 和 XML 内容。

示例代码:

<span><span><span class="hljs-meta"><?php</span></span><span>
</span><span><span class="hljs-comment">// 指定要解析的网页 URL</span></span><span>
</span><span><span class="hljs-variable">$url</span></span><span> = </span><span><span class="hljs-string">'https://www.example.com'</span></span><span>;

</span><span><span class="hljs-comment">// 获取网页内容</span></span><span>
</span><span><span class="hljs-variable">$htmlContent</span></span><span> = </span><span><span class="hljs-title function_ invoke__">file_get_contents</span></span><span>(</span><span><span class="hljs-variable">$url</span></span><span>);

</span><span><span class="hljs-comment">// 创建 DOMDocument 对象</span></span><span>
</span><span><span class="hljs-variable">$dom</span></span><span> = </span><span><span class="hljs-keyword">new</span></span><span> </span><span><span class="hljs-title class_">DOMDocument</span></span><span>();

</span><span><span class="hljs-comment">// 关闭警告输出(HTML 可能不符合标准)</span></span><span>
</span><span><span class="hljs-title function_ invoke__">libxml_use_internal_errors</span></span><span>(</span><span><span class="hljs-literal">true</span></span><span>);

</span><span><span class="hljs-comment">// 加载 HTML 内容</span></span><span>
</span><span><span class="hljs-variable">$dom</span></span><span>-></span><span><span class="hljs-title function_ invoke__">loadHTML</span></span><span>(</span><span><span class="hljs-variable">$htmlContent</span></span><span>);

</span><span><span class="hljs-comment">// 获取所有的 <meta> 标签</span></span><span>
</span><span><span class="hljs-variable">$metaTags</span></span><span> = </span><span><span class="hljs-variable">$dom</span></span><span>-></span><span><span class="hljs-title function_ invoke__">getElementsByTagName</span></span><span>(</span><span><span class="hljs-string">'meta'</span></span><span>);

</span><span><span class="hljs-comment">// 遍历所有的 meta 标签并提取内容</span></span><span>
</span><span><span class="hljs-keyword">foreach</span></span><span> (</span><span><span class="hljs-variable">$metaTags</span></span><span> </span><span><span class="hljs-keyword">as</span></span><span> </span><span><span class="hljs-variable">$meta</span></span><span>) {
    </span><span><span class="hljs-comment">// 获取 meta 标签的 name 和 content 属性</span></span><span>
    </span><span><span class="hljs-variable">$name</span></span><span> = </span><span><span class="hljs-variable">$meta</span></span><span>-></span><span><span class="hljs-title function_ invoke__">getAttribute</span></span><span>(</span><span><span class="hljs-string">'name'</span></span><span>);
    </span><span><span class="hljs-variable">$property</span></span><span> = </span><span><span class="hljs-variable">$meta</span></span><span>-></span><span><span class="hljs-title function_ invoke__">getAttribute</span></span><span>(</span><span><span class="hljs-string">'property'</span></span><span>);
    </span><span><span class="hljs-variable">$content</span></span><span> = </span><span><span class="hljs-variable">$meta</span></span><span>-></span><span><span class="hljs-title function_ invoke__">getAttribute</span></span><span>(</span><span><span class="hljs-string">'content'</span></span><span>);

    </span><span><span class="hljs-keyword">if</span></span><span> (</span><span><span class="hljs-variable">$name</span></span><span> || </span><span><span class="hljs-variable">$property</span></span><span>) {
        </span><span><span class="hljs-keyword">echo</span></span><span> </span><span><span class="hljs-string">"Name/Property: "</span></span><span> . (</span><span><span class="hljs-variable">$name</span></span><span> ?: </span><span><span class="hljs-variable">$property</span></span><span>) . </span><span><span class="hljs-string">"<br>"</span></span><span>;
        </span><span><span class="hljs-keyword">echo</span></span><span> </span><span><span class="hljs-string">"Content: "</span></span><span> . </span><span><span class="hljs-variable">$content</span></span><span> . </span><span><span class="hljs-string">"<br><br>"</span></span><span>;
    }
}
</span><span><span class="hljs-meta">?></span></span><span>
</span></span>

输出结果:

<span><span>Name/Property: description
Content: Example Domain

Name/Property: keywords
Content: example, domain
</span></span>

通过使用 DOMDocument,我们能够更加细致地操作网页中的每一个元素。在上面的例子中,我们首先加载了 HTML 页面,然后通过 getElementsByTagName('meta') 获取所有的 标签,接着提取出标签中的 namepropertycontent 属性。无论是标准的 meta name 还是 Open Graph(OG)协议中的 meta property,都能够一并获取。


三、get_meta_tagsDOMDocument 的对比

虽然 get_meta_tagsDOMDocument 都能帮助我们提取网页的 meta 信息,但它们在使用场景和功能上各有不同。

优缺点对比:

特性get_meta_tagsDOMDocument
简单性非常简单,代码量少相对复杂,需要手动遍历元素
灵活性只支持常见的 meta 标签支持所有 HTML 元素,功能更强大
性能性能较好,适合快速提取处理复杂页面时性能较差
错误处理无法处理不规则的 HTML支持处理错误并提供详细的调试信息
扩展性功能有限,扩展性差可以处理更复杂的 HTML 和 XML 格式

适用场景:

  • get_meta_tags:适合简单的页面,尤其是当我们只需要提取基础的 meta 信息(如 description、keywords 等)时,它更加高效。

  • DOMDocument:适合处理结构复杂的 HTML 页面,特别是需要灵活提取不同标签信息,或者处理含有嵌套标签、脚本、样式等的页面时。