How to Use mb_encode_numericentity with Regular Expressions to Handle Specific Characters or Text?

gitbox 2025-09-11

When working with multibyte characters (such as Chinese, Japanese, or Korean), PHP provides mb_encode_numericentity, which converts characters in specified code-point ranges into HTML numeric entities.
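As a minimal sketch of the function on its own, the example below encodes only characters in the U+4E00..U+9FFF range and then decodes them again. The sample string is an assumption chosen for illustration; the conversion map uses the documented four-integer layout of start, end, offset, and mask.

```php
<?php
// Conversion map: [start, end, offset, mask]. This one targets U+4E00..U+9FFF.
$convmap = [0x4e00, 0x9fff, 0, 0xffff];

// Only characters inside the mapped range are converted; ASCII is left alone.
$encoded = mb_encode_numericentity('PHP 中文', $convmap, 'UTF-8');
echo $encoded, PHP_EOL; // PHP &#20013;&#25991;

// mb_decode_numericentity reverses the conversion when given the same map.
echo mb_decode_numericentity($encoded, $convmap, 'UTF-8'), PHP_EOL; // PHP 中文
```

Each additional range adds another group of four integers to the same array.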

2. Using Regular Expressions to Match Specific Characters

With regular expressions, we can filter the text we care about. For example, to match only Chinese characters:

$str = 'Hello 测试 World 中文';
preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $str, $matches);
print_r($matches[0]); // Array ( [0] => 测试 [1] => 中文 )
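An explicit code-point range is one way to select Chinese characters. As an alternative sketch, PCRE's Unicode script property \p{Han} can be used with the u modifier. This is a swapped-in technique rather than the pattern used in this article, and it also matches Han ideographs outside U+4E00..U+9FFF, so a conversion map limited to that range would leave those characters unencoded.

```php
<?php
$str = 'Hello 测试 World 中文';

// \p{Han} matches any character in the Han script; the u modifier makes
// the pattern and subject be treated as UTF-8.
preg_match_all('/\p{Han}+/u', $str, $matches);

print_r($matches[0]); // Array ( [0] => 测试 [1] => 中文 )
```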

3. Combining mb_encode_numericentity with Regular Expressions

If we only want to convert matched Chinese characters to numeric entities:

// Conversion map: [start, end, offset, mask] for U+4E00..U+9FFF
$convmap = [0x4e00, 0x9fff, 0, 0xFFFF];
$str = 'Hello 测试 World 中文';

// Use regex to match runs of Chinese characters
preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $str, $matches);

// Iterate over the matches and replace each with its numeric entities
foreach ($matches[0] as $match) {
    $encoded = mb_encode_numericentity($match, $convmap, 'UTF-8');
    $str = str_replace($match, $encoded, $str);
}

echo $str; // Hello &#27979;&#35797; World &#20013;&#25991;
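The match-then-str_replace loop can misbehave when one match is a substring of another (for example "中" followed later by "中文"), because the earlier replacement alters the longer run before it is processed. One possible alternative, sketched here, is to encode each match in place with preg_replace_callback. The helper name encodeHanToEntities is hypothetical and not part of PHP or of the original article.

```php
<?php
// Hypothetical helper: encodes runs of U+4E00..U+9FFF characters in place.
function encodeHanToEntities(string $text): string
{
    $convmap = [0x4e00, 0x9fff, 0, 0xffff];

    // Each matched run is encoded where it stands, so matches never
    // interfere with one another. On a regex error (such as invalid
    // UTF-8 input) preg_replace_callback returns null, and the original
    // text is returned unchanged.
    return preg_replace_callback(
        '/[\x{4e00}-\x{9fff}]+/u',
        static fn (array $m): string => mb_encode_numericentity($m[0], $convmap, 'UTF-8'),
        $text
    ) ?? $text;
}

echo encodeHanToEntities('Hello 测试 World 中文'), PHP_EOL;
// Hello &#27979;&#35797; World &#20013;&#25991;
```

The arrow function requires PHP 7.4 or later; on older versions a regular closure with use ($convmap) would play the same role.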

4. Practical Applications

The combination of mb_encode_numericentity and regular expressions is well suited to:

Escaping specific characters in HTML output to prevent garbled text. This only helps against XSS if the encoded characters include HTML-special ones such as <, >, &, and quotes; htmlspecialchars is the usual tool for that.

Converting only certain language characters when handling multilingual content.

Converting specific characters to a standardized format for text analysis or storage.
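To illustrate the multilingual case, the sketch below encodes only Japanese kana and leaves Han characters and ASCII untouched. It assumes a deliberately broad conversion map so that the regular expression, not the map, decides what gets encoded. The helper name encodeKanaOnly and the choice of scripts are hypothetical.

```php
<?php
// Hypothetical helper: encodes Hiragana and Katakana runs only.
function encodeKanaOnly(string $text): string
{
    // Broad map: any non-ASCII character in the Basic Multilingual Plane.
    // Selection is left entirely to the regular expression below.
    $convmap = [0x80, 0xffff, 0, 0xffff];

    return preg_replace_callback(
        '/[\p{Hiragana}\p{Katakana}]+/u',
        static fn (array $m): string => mb_encode_numericentity($m[0], $convmap, 'UTF-8'),
        $text
    ) ?? $text; // fall back to the input if the regex fails
}

echo encodeKanaOnly('日本語のテキスト and 中文'), PHP_EOL;
// 日本語&#12398;&#12486;&#12461;&#12473;&#12488; and 中文
```

Keeping the map broad and the pattern narrow means the selection rule can be changed by editing only the regular expression.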

Conclusion

By filtering specific characters with regular expressions and then converting them using mb_encode_numericentity, we can precisely control which characters are encoded, enabling safer and more reliable text handling in multibyte environments.
