The strip_tags() function removes HTML and PHP tags from a string; HTML comments and NUL bytes are stripped as well. Its basic syntax is:
```php
strip_tags(string $string, array|string|null $allowed_tags = null): string
```
$string: The input string to process.
$allowed_tags: Optional. The tags to keep, given as a string such as '<b><i>' or, as of PHP 7.4, as an array such as ['b', 'i']. If omitted, all tags are removed.
Example:
```php
$html = '<p>Hello <b>world</b>!</p>';
$clean_text = strip_tags($html);
echo $clean_text; // Output: Hello world!
```
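As shown above, strip_tags() removes every HTML tag by default. But what happens when the string contains nested tags, and how do you make sure they are removed correctly?
strip_tags() needs extra care with complex HTML. It does not parse or validate the markup; it only scans the string for tag-like sequences. Well-formed nesting, however deep, is therefore not a problem: every tag is removed and the text between the tags is kept. The trouble comes from malformed input. An unclosed tag or a stray < can cause more text to be removed than expected, and the text inside elements such as <script> is left behind because only the tags themselves are stripped.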
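To make these edge cases concrete, the following sketch runs strip_tags() on a few inputs. The sample strings are invented for illustration, and the comments show the results expected from the behavior documented in the PHP manual.

```php
<?php
// Well-formed nesting: every tag is removed and the text is kept.
$nested = strip_tags('<div><p><b>Hello <i>world</i></b>!</p></div>');
echo $nested, PHP_EOL; // Hello world!

// Only the tags are stripped, so the script body stays in the output.
$script = strip_tags('<script>alert(1)</script>Hi');
echo $script, PHP_EOL; // alert(1)Hi

// HTML comments are always removed.
$comment = strip_tags('a<!-- note -->b');
echo $comment, PHP_EOL; // ab

// An unclosed tag swallows the rest of the string.
$unclosed = strip_tags('Hello <b world');
echo $unclosed, PHP_EOL; // "Hello " (the text after "<b" is lost)
```

The last case is the one to watch for when the input comes from users: a single stray < followed by a letter is enough to drop everything after it.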
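Nested HTML in real-world input is often malformed, which can make strip_tags() produce unexpected results. One way to address this is to normalize the markup first: PHP's DOMDocument class can load the HTML and repair its structure.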
```php
$html = '<div><b>Hello <i>world</i></b>!</div>';
$dom = new DOMDocument();
libxml_use_internal_errors(true); // Collect libxml parse warnings internally instead of emitting them
$dom->loadHTML($html);
$clean_html = $dom->saveHTML(); // Includes the DOCTYPE and <html><body> wrappers that loadHTML() adds
$clean_text = trim(strip_tags($clean_html)); // Strip the tags, then trim the line breaks left by saveHTML()
echo $clean_text; // Output: Hello world!
```
With DOMDocument, we can load and repair the HTML first and then clean up the tags with strip_tags().
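As an alternative sketch, the text can be read straight from the repaired DOM instead of serializing it and stripping tags a second time. The helper name html_to_text() is hypothetical, and the code assumes the input is plain ASCII or otherwise in an encoding that loadHTML() interprets correctly.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML and return only its text.
function html_to_text(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string in PHP 8
    }

    $previous = libxml_use_internal_errors(true); // keep parse warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    // textContent concatenates every text node below the root element.
    $root = $dom->documentElement;
    return $root === null ? '' : trim($root->textContent);
}

echo html_to_text('<div><b>Hello <i>world</i></b>!</div>'), PHP_EOL; // Hello world!

// The <div>, <b>, and <i> tags are never closed here; the parser closes them on load.
echo html_to_text('<div><b>Hello <i>world!'), PHP_EOL;
```

Unlike strip_tags(), this route also returns the text inside elements that the parser treats as content, so whether it fits depends on what the input may contain.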
If you only need to keep certain tags, list them in the second parameter. For example, to keep only the <b> and <i> tags and remove everything else:
```php
$html = '<p><b>Hello <i>world</i>!</b></p>';
$clean_text = strip_tags($html, '<b><i>');
echo $clean_text; // Output: <b>Hello <i>world</i>!</b>
```
This way, strip_tags() removes every tag that is not in the allowed list and keeps only <b> and <i>, so other tags do not interfere.
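One caveat is worth illustrating: strip_tags() leaves the attributes of allowed tags untouched, so an allow list alone does not make user-supplied HTML safe. The input string below is invented for the demonstration, and the array form of the second argument assumes PHP 7.4 or later.

```php
<?php
$html = '<p><b onclick="alert(1)">Hello</b> <i>world</i></p>';

// The string form and the array form (PHP 7.4+) of the allow list are equivalent.
$kept_string = strip_tags($html, '<b><i>');
$kept_array  = strip_tags($html, ['b', 'i']);

// <p> is removed, but the onclick attribute on <b> survives.
echo $kept_string, PHP_EOL; // <b onclick="alert(1)">Hello</b> <i>world</i>
var_dump($kept_string === $kept_array); // bool(true)
```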
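Sometimes strip_tags() alone is not fine-grained enough, especially with complex HTML. In that case you can add a regular expression pass to clean the string further, removing leftover tags or other unwanted fragments.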
```php
$html = '<div><b>Hello <i>world</i></b>!</div>';
$clean_text = strip_tags($html, '<b><i>'); // First pass: remove every tag except <b> and <i>
$clean_text = preg_replace('/<[^>]+>/', '', $clean_text); // Second pass: the regex removes all remaining tags, including <b> and <i>
echo $clean_text; // Output: Hello world!
```
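This approach gives you finer control over the cleanup. Keep in mind that a pattern like /<[^>]+>/ is not an HTML parser: it handles simple tags only and breaks on input such as a > inside an attribute value. In the example above it also removes the <b> and <i> tags that the first pass kept, so the final result is plain text.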
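A regular expression is more useful for a targeted second pass than for removing all tags again. The sketch below assumes that only simple <b> and <i> tags need to be kept, and strips the attributes from those tags after strip_tags() has removed everything else. The pattern is illustrative and shares the limits of any regex-based HTML handling.

```php
<?php
$html = '<div><b class="x">Hello</b> <i style="color:red">world</i>!</div>';

// First pass: keep only <b> and <i>.
$kept = strip_tags($html, '<b><i>');

// Second pass: rewrite "<b ...>" or "<i ...>" to the bare tag, dropping attributes.
// Closing tags and tags without attributes do not match and stay as they are.
$clean = preg_replace('/<(b|i)\s[^>]*>/i', '<$1>', $kept);

echo $clean, PHP_EOL; // <b>Hello</b> <i>world</i>!
```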