mb_decode_numericentity() is a multibyte string function provided by the mbstring extension. It decodes numeric HTML entities in a string (such as &#1234;) into their corresponding characters. Unlike html_entity_decode(), it handles only numeric entities, lets you choose which code point ranges are decoded through a conversion map, and works with any character encoding that mbstring supports.
The function prototype is as follows:
```php
mb_decode_numericentity(string $string, array $map, ?string $encoding = null): string
```
$string: The string to be decoded.
$map: An array of integers, in groups of four, that defines which ranges of code points are decoded.
$encoding: The character encoding of the string (e.g., UTF-8 or ISO-8859-1). If it is omitted or null, the internal encoding is used.
Suppose you have a string containing numeric HTML entities like this:
```php
$str = "Hello &#20320;&#22909;!";
```
This string contains the numeric entities for the Chinese characters "你好" (U+4F60 and U+597D, which are 20320 and 22909 in decimal). Now we want to decode them back to the original characters.
```php
// Decode HTML numeric entities
$decoded_str = mb_decode_numericentity($str, array(0x80, 0x10FFFF, 0, 0xFFFF), 'UTF-8');

echo $decoded_str; // Output: Hello 你好!
```
In this example, we call mb_decode_numericentity() with the map [0x80, 0x10FFFF, 0, 0xFFFF], whose range runs from U+0080 to U+10FFFF and therefore covers every non-ASCII code point. Numeric entities for ASCII characters (values below 0x80) are left untouched. The decoded string is "Hello 你好!".
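To make the effect of the lower bound concrete, here is a minimal sketch that decodes the same input with two maps. The variable names ($nonAscii, $all, $input) are illustrative and not part of the original example; the only difference between the maps is the first element, the start of the range.

```php
<?php
// Range starts at 0x80: only entities for non-ASCII code points are decoded.
$nonAscii = [0x80, 0x10FFFF, 0, 0xFFFF];
// Range starts at 0x0: entities for ASCII characters are decoded as well.
$all = [0x0, 0x10FFFF, 0, 0xFFFF];

// &#60; and &#62; are "<" and ">"; &#20320;&#22909; is "你好".
$input = "&#60;b&#62; &#20320;&#22909;";

$a = mb_decode_numericentity($input, $nonAscii, 'UTF-8');
$b = mb_decode_numericentity($input, $all, 'UTF-8');

echo $a, PHP_EOL; // &#60;b&#62; 你好
echo $b, PHP_EOL; // <b> 你好
```

Starting the range at 0x80 is a common choice when the decoded text is later embedded in HTML, because entities that stand for markup characters such as "<" stay escaped.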
The $map parameter defines which numeric entities are decoded. It is an array of integers whose length is a multiple of four, and each group of four has the following format:
```php
array($start_code, $end_code, $offset, $mask);
```
$start_code and $end_code: The first and last code points of the range to decode.
$offset and $mask: An offset added to the code point and a bit mask applied to the result when entities are generated with mb_encode_numericentity(). For plain decoding, $offset is normally 0 and $mask is a value such as 0xFFFF.
To decode several separate ranges, append further groups of four to the same array.
In practice, it's common to decode all numeric entities by using a single range that spans the entire Unicode code space.
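When only some scripts should be decoded, the map can list several ranges. The following is a small sketch under that assumption: it passes two groups of four integers, one for the Hiragana block and one for the CJK Unified Ideographs block. The block boundaries come from the Unicode charts, and the variable names are illustrative.

```php
<?php
// Each group of four integers describes one range.
$map = [
    0x3040, 0x309F, 0, 0xFFFF, // Hiragana
    0x4E00, 0x9FFF, 0, 0xFFFF, // CJK Unified Ideographs
];

// &#233; (é) is outside both ranges, so it should stay encoded.
$input = "&#233; &#12371; &#20320;";

$out = mb_decode_numericentity($input, $map, 'UTF-8');

echo $out, PHP_EOL; // &#233; こ 你
```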
mb_decode_numericentity() supports multiple character encodings, so you can choose the one that fits your needs. If your application targets a multilingual environment, UTF-8 is recommended because it can represent every Unicode character.
$str">
```php
$str = "&#20320;&#22909; &#12371;&#12435;&#12395;&#12385;&#12399;"; // 你好 こんにちは
$decoded_str = mb_decode_numericentity($str, array(0x80, 0x10FFFF, 0, 0xFFFF), 'UTF-8');
echo $decoded_str; // Output: 你好 こんにちは
```
Besides numeric entities (like &#123;), HTML entities can also be written with character names (like &lt;). mb_decode_numericentity() decodes only numeric entities, so if your string also contains named entities, use html_entity_decode() for those.
$str">
```php
$str = "Hello &lt;b&gt;World&lt;/b&gt;!";
$decoded_str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
echo $decoded_str; // Output: Hello <b>World</b>!
```
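For a string that mixes both kinds of entity, the two functions can be chained. This is only a sketch of one possible order, with illustrative variable names. Note that html_entity_decode() also decodes numeric entities by itself, so the separate first step is mainly useful when you want the range control that the map provides.

```php
<?php
$input = "&lt;b&gt;&#20320;&#22909;&lt;/b&gt;";
$map = [0x80, 0x10FFFF, 0, 0xFFFF];

// Step 1: decode numeric entities in the non-ASCII range only.
$step1 = mb_decode_numericentity($input, $map, 'UTF-8');

// Step 2: decode the remaining named entities.
$step2 = html_entity_decode($step1, ENT_QUOTES, 'UTF-8');

echo $step1, PHP_EOL; // &lt;b&gt;你好&lt;/b&gt;
echo $step2, PHP_EOL; // <b>你好</b>
```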
mb_decode_numericentity() requires the mbstring extension, so make sure it is installed and enabled in your PHP environment before using it.
This function decodes only numeric entities; named entities require another function such as html_entity_decode().
The character encoding must match the encoding actually used in the string, otherwise garbled characters may appear.
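The notes above can be illustrated with a short sketch. It first checks that the function is available, then decodes the same entity into two different encodings and prints the resulting bytes in hexadecimal. The exception message and variable names are illustrative choices, not part of any API.

```php
<?php
// Guard against a PHP build without the mbstring extension.
if (!function_exists('mb_decode_numericentity')) {
    throw new RuntimeException('The mbstring extension is not available.');
}

$map = [0x80, 0x10FFFF, 0, 0xFFFF];
$input = "caf&#233;"; // &#233; is U+00E9 (é)

$utf8 = mb_decode_numericentity($input, $map, 'UTF-8');
$latin1 = mb_decode_numericentity($input, $map, 'ISO-8859-1');

echo bin2hex($utf8), PHP_EOL;   // 636166c3a9 (é as two bytes)
echo bin2hex($latin1), PHP_EOL; // 636166e9   (é as one byte)
```

The two results are different byte sequences for the same text, which is why output decoded as ISO-8859-1 appears garbled on a page that is served as UTF-8, and vice versa.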
mb_decode_numericentity() decodes numeric HTML entities back into the original characters, with control over which code point ranges are affected and which encoding the result uses. That makes it a good fit for multilingual text and for data that arrives with its non-ASCII characters entity-encoded.
By properly using this function, we can better process and display data containing special characters in PHP applications, improving user experience and system stability.