The strip_tags() function removes HTML and PHP tags from a string (HTML comments and NUL bytes are stripped as well). Its basic syntax is:
```php
strip_tags(string $string, array|string|null $allowed_tags = null): string
```
$string: the input string to process.
$allowed_tags: an optional parameter listing the tags to keep, given as a string such as '<b><i>' or, as of PHP 7.4, as an array such as ['b', 'i']. If omitted, all tags are removed.
Example:
```php
$html = '<p>Hello <b>world</b>!</p>';
$clean_text = strip_tags($html);
echo $clean_text; // Output: Hello world!
```
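As shown above, strip_tags() removes every HTML tag by default. But when the string contains nested tags, how do you make sure they are removed correctly and without side effects?
strip_tags() deserves extra care with complex HTML. The function does not parse or validate markup; it scans the string for tag-like sequences, so well-formed nesting is handled at any depth. The risk comes from malformed input: an unclosed tag or a stray < character can cause more text to be removed than intended, or leave the result formatted differently than expected.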
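To make the difference between well-formed and malformed input concrete, here is a minimal sketch. The sample strings are assumptions chosen for illustration; only strip_tags() itself comes from PHP.

```php
<?php
// Well-formed nesting: every tag is removed and the text stays intact.
$nested = '<div><b>Hello <i>world</i></b>!</div>';
$nested_result = strip_tags($nested);
echo $nested_result, PHP_EOL; // Hello world!

// Malformed input (assumed sample): the "<" before "10" is read as the start
// of a tag that never closes, so the text after it is dropped as well.
$broken = 'Price <10 dollars';
$broken_result = strip_tags($broken);
var_dump($broken_result);
```

The second call is the case to watch for: nothing in the input is really a tag, yet part of the text disappears.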
Markup from real-world sources is often malformed (unclosed or mismatched tags), and that is where strip_tags() can misbehave. To deal with this, normalize the HTML first. PHP's DOMDocument class can load the markup and repair its structure.
```php
$html = '<div><b>Hello <i>world</i></b>!</div>';
$dom = new DOMDocument();
libxml_use_internal_errors(true); // Suppress warnings about malformed HTML
$dom->loadHTML($html);
$clean_html = $dom->saveHTML(); // Normalized markup, with doctype, <html> and <body> added
$clean_text = trim(strip_tags($clean_html)); // Strip the tags, then trim the newlines left by saveHTML()
echo $clean_text; // Output: Hello world!
```
With DOMDocument, we can load and repair the HTML first, then clean the tags with strip_tags().
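As a variation on the same idea, the text can also be read straight from the parsed DOM instead of serializing it and calling strip_tags() afterwards. The sketch below assumes a helper named html_to_text(), which is hypothetical and not part of PHP. It also assumes ASCII input, since loadHTML() needs an explicit encoding hint for UTF-8 text.

```php
<?php
// Hypothetical helper (not part of PHP): extract visible text through the DOM.
function html_to_text(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    // loadHTML() wraps fragments in <html><body>; read the text from <body>.
    $body = $dom->getElementsByTagName('body')->item(0);
    return $body === null ? '' : trim($body->textContent);
}

echo html_to_text('<div><b>Hello <i>world</i></b>!</div>'), PHP_EOL; // Hello world!
echo html_to_text('<div><b>Hello <i>world!'), PHP_EOL;               // Hello world!
```

The second call passes markup with unclosed tags; the parser closes them before the text is read, so nothing is lost.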
If you only need to keep certain tags, pass them as the second argument. For example, to keep only the <b> and <i> tags and remove everything else:
```php
$html = '<p><b>Hello <i>world</i>!</b></p>';
$clean_text = strip_tags($html, '<b><i>');
echo $clean_text; // Output: <b>Hello <i>world</i>!</b>
```
This way, strip_tags() removes every tag that is not on the allow-list and keeps only <b> and <i>, so other tags do not interfere.
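Two details of the allow-list are worth seeing in code: since PHP 7.4 it can be written as an array of tag names, and attributes on allowed tags are left untouched. The sample inputs below are assumptions made for illustration.

```php
<?php
$html = '<p><b>Hello <i>world</i>!</b></p>';

// String form and array form (PHP 7.4+) of the allow-list give the same result.
$from_string = strip_tags($html, '<b><i>');
$from_array  = strip_tags($html, ['b', 'i']);
echo $from_array, PHP_EOL; // <b>Hello <i>world</i>!</b>

// Attributes on allowed tags pass through unchanged (assumed sample input).
$with_attr = strip_tags('<p><b onclick="alert(1)">Hi</b></p>', '<b>');
echo $with_attr, PHP_EOL; // <b onclick="alert(1)">Hi</b>
```

Because the onclick attribute survives, an allow-list alone should not be treated as protection against script injection in untrusted input.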
Sometimes strip_tags() alone is not fine-grained enough, especially with complex HTML. In that case, you can follow it with a regular expression to clean the string further, removing leftover tags or other unwanted parts.
```php
$html = '<div><b>Hello <i>world</i></b>!</div>';
$clean_text = strip_tags($html, '<b><i>'); // First, remove the tags that are not on the allow-list
$clean_text = preg_replace('/<[^>]+>/', '', $clean_text); // Then remove every remaining tag with a regex, including <b> and <i>
echo $clean_text; // Output: Hello world!
```
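This approach gives you finer control over the cleanup. Note that the pattern /<[^>]+>/ removes every remaining tag, including any that strip_tags() was told to keep, so narrow it to specific tag names when some tags should survive.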
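One way to narrow the regular expression is to match only a given set of tag names. The following is a rough sketch, assuming a helper named remove_tags() that is hypothetical and not part of PHP. Like any regex over HTML, it is a simple text match and not a real parser.

```php
<?php
// Hypothetical helper: remove only the named tags, keeping their inner text.
function remove_tags(string $html, array $tag_names): string
{
    if ($tag_names === []) {
        return $html; // nothing to remove
    }

    // Escape each name, then match opening and closing forms,
    // e.g. <div class="x"> and </div>. \b stops "i" from matching "<img>".
    $names = implode('|', array_map(
        fn (string $name): string => preg_quote($name, '#'),
        $tag_names
    ));
    $pattern = '#</?(?:' . $names . ')\b[^>]*>#i';

    return preg_replace($pattern, '', $html) ?? $html;
}

echo remove_tags('<div><b>Hello <i>world</i></b>!</div>', ['div', 'i']), PHP_EOL;
// <b>Hello world</b>!
```

This is the inverse of the allow-list in strip_tags(): the caller names the tags to drop, and everything else is kept.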