When handling Chinese or other multibyte text, byte-oriented string and regex functions can mishandle characters that span several bytes, which is a particular problem in sensitive word filtering. PHP's mb_eregi_replace function is a multibyte-aware regex replacement function that ignores case, which makes it well suited to replacing sensitive words that contain multibyte characters such as Chinese.
Below is an example demonstrating how to use mb_eregi_replace to replace sensitive words in text.
<?php
// Set the internal and regex encodings to UTF-8 so the mb_ereg* functions treat the text as multibyte characters
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// Original text containing Chinese sensitive words
$text = "This is a test text containing sensitive words: 敏感词 and 不良内容.";

// Sensitive word list (each entry is interpreted as a regex pattern)
$sensitiveWords = [
    "敏感词",
    "不良内容",
];

// Replace sensitive words with ***
foreach ($sensitiveWords as $word) {
    // Use mb_eregi_replace for case-insensitive replacement
    $text = mb_eregi_replace($word, "***", $text);
}

echo $text;
?>
Output:
This is a test text containing sensitive words: *** and ***.
Multibyte Safety
mb_eregi_replace is the case-insensitive version of mb_ereg_replace, designed to handle multibyte encoded strings, avoiding issues where regular expressions fail to recognize Chinese, Japanese, and other characters.
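To make the difference between the two functions concrete, here is a minimal sketch that runs the same pattern through both. It assumes the mbstring extension is loaded with regex support; the sample text and variable names are illustrative only.

```php
<?php
// Assumption: mbstring is available and its regex functions are enabled.
mb_regex_encoding("UTF-8");

// Illustrative input mixing ASCII case variants with a Chinese word.
$text = "BadWord, BADWORD and badword next to 敏感词";

// mb_ereg_replace is case-sensitive: only the lowercase occurrence matches.
$caseSensitive = mb_ereg_replace("badword", "***", $text);

// mb_eregi_replace ignores case: all three variants match.
$caseInsensitive = mb_eregi_replace("badword", "***", $text);

echo $caseSensitive, PHP_EOL;   // BadWord, BADWORD and *** next to 敏感词
echo $caseInsensitive, PHP_EOL; // ***, *** and *** next to 敏感词
```

In both cases the Chinese text passes through untouched, since neither pattern refers to it.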
Character Encoding Setup
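Call mb_internal_encoding("UTF-8") first, or make sure the script’s default encoding is UTF-8, so that the multibyte string functions interpret the text correctly. The mb_ereg family also has its own setting, mb_regex_encoding(), so setting that to UTF-8 explicitly is a safe precaution.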
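The effect of the encoding setting can be seen with a small sketch that masks every character of a word. The comparison against preg_replace without the `u` modifier is added here for illustration and is not part of the original example; variable names are hypothetical.

```php
<?php
// Set both encodings explicitly rather than relying on php.ini defaults.
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

$word = "敏感词"; // three characters, nine bytes in UTF-8

// Byte-oriented matching: without the "u" modifier, "." matches single bytes.
$byteMasked = preg_replace('/./', '*', $word);

// Multibyte-aware matching: "." matches one whole character.
$charMasked = mb_ereg_replace('.', '*', $word);

echo strlen($byteMasked), PHP_EOL; // 9
echo $charMasked, PHP_EOL;         // ***
```

The byte-oriented version produces one asterisk per byte, while the multibyte version produces one per character.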
Sensitive Word Matching
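The pattern argument is a regular expression, so sensitive word rules can be defined flexibly, for example with fuzzy matching or stem matching. For the same reason, a word that should be matched literally must have any regex metacharacters escaped.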
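As a rough illustration of fuzzy and stem matching, the sketch below uses two hand-written patterns. The separator set (whitespace, `*`, `_`, `-`) and the English stem `spam` are assumptions chosen for the demo, not rules from the original article.

```php
<?php
mb_regex_encoding("UTF-8");

// Fuzzy matching: tolerate spaces, asterisks, underscores or hyphens
// inserted between the characters of the sensitive word.
$fuzzyPattern = '敏[\s*_-]*感[\s*_-]*词';
$fuzzyText = "Spam: 敏 感 词, 敏*感*词 and 敏感词.";
$fuzzyResult = mb_eregi_replace($fuzzyPattern, "***", $fuzzyText);

// Stem matching: the stem followed by any run of word characters.
$stemPattern = 'spam\w*';
$stemText = "Spammer sends SPAMMING mail";
$stemResult = mb_eregi_replace($stemPattern, "***", $stemText);

echo $fuzzyResult, PHP_EOL; // Spam: ***, *** and ***.
echo $stemResult, PHP_EOL;  // *** sends *** mail
```

Note that the first result keeps the leading "Spam:" because the fuzzy pattern targets only the Chinese word. Single-quoted strings are used for the patterns so that PHP leaves the backslash sequences for the regex engine.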
If there are many sensitive words, you can load the list from a database or file and then loop through for replacements. It can also be integrated with user input filtering to perform real-time sensitive word replacement to ensure content security.
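Loading the list from a file might look like the following sketch. The helper name `filterSensitiveWords`, the one-word-per-line file format, and the temporary file used for the demo are all assumptions made for illustration; a database-backed list would follow the same loop.

```php
<?php
mb_regex_encoding("UTF-8");

/**
 * Hypothetical helper: replaces every pattern listed in $listPath
 * (one per line) with $mask. Each line is treated as a regex pattern.
 */
function filterSensitiveWords(string $text, string $listPath, string $mask = "***"): string
{
    $words = file($listPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($words === false) {
        return $text; // list could not be read; leave the text unchanged
    }

    foreach ($words as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip lines that contain only whitespace
        }
        $result = mb_eregi_replace($word, $mask, $text);
        if (is_string($result)) {
            $text = $result; // keep the previous text if the pattern was invalid
        }
    }

    return $text;
}

// Demo with a temporary word list standing in for a real file.
$listPath = tempnam(sys_get_temp_dir(), "words");
file_put_contents($listPath, "敏感词\n不良内容\n");

$filtered = filterSensitiveWords("User input: 敏感词 and 不良内容.", $listPath);
echo $filtered, PHP_EOL; // User input: *** and ***.

unlink($listPath);
```

For user input filtering, the same helper could be called on each submitted value before it is stored or displayed.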