How to Decode Strings with Special Characters? Let mb_decode_numericentity Help

gitbox 2025-07-09

1. What is the mb_decode_numericentity() function?

mb_decode_numericentity() is a multibyte string function provided by the mbstring extension. It decodes numeric HTML entities in a string (such as &#1234;) into their corresponding characters. Unlike html_entity_decode(), it handles only numeric entities, lets you restrict decoding to chosen code point ranges through a map, and works with the encodings that mbstring supports.

The function prototype is as follows:

    mb_decode_numericentity(string $string, array $map, ?string $encoding = null): string

  • $string: The string to be decoded.

  • $map: An array defining the range of numeric entities to decode.

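  • $encoding: Optional. The character encoding of the string (e.g., UTF-8 or ISO-8859-1). If it is omitted or null, the internal encoding (see mb_internal_encoding()) is used.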

2. How to Use mb_decode_numericentity() to Decode Strings?

Basic Usage Example:

Suppose you have a string containing HTML entity encoding like this:

    $str = "Hello &#20320;&#22909;!";

This string contains the HTML entity encoding for the Chinese characters "你好". Now we want to decode it back to the original characters.

    // Decode HTML numeric entities
    $decoded_str = mb_decode_numericentity($str, array(0x80, 0x10FFFF, 0, 0xFFFF), 'UTF-8');
    echo $decoded_str; // Output: Hello 你好!

In this example, we pass the map [0x80, 0x10FFFF, 0, 0xFFFF] to mb_decode_numericentity(). Its first two values select every code point from U+0080 to U+10FFFF, so all non-ASCII numeric entities are decoded, while entities for ASCII characters are left as they are. The decoded string is "Hello 你好!".
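To make the effect of the 0x80 lower bound concrete, here is a minimal sketch. The input string is an assumed sample, not taken from the article; it mixes ASCII-range entities with CJK ones and reuses the map from the example above.

```php
<?php
// Assumed sample input: &#60; and &#62; are ASCII-range entities ("<" and ">"),
// &#20320;&#22909; are the entities for "你好".
$input = '&#60;b&#62;&#20320;&#22909;';

// Same map as in the article: decode U+0080..U+10FFFF only.
$map = [0x80, 0x10FFFF, 0, 0xFFFF];

$decoded = mb_decode_numericentity($input, $map, 'UTF-8');

echo $decoded, PHP_EOL; // &#60;b&#62;你好
```

Because code points below 0x80 fall outside the range, the angle brackets stay encoded. This can be convenient when the result is later written into an HTML page.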

3. Explanation of the $map Parameter

The $map parameter defines which numeric entities are decoded. It is an array whose length is a multiple of four; each group of four integers describes one range in the following format:

    array($start, $end, $offset, $mask);

  • $start and $end: The first and last code points of the range to decode.

  • $offset: A value added to the code point when encoding and subtracted again when decoding; it is usually 0.

  • $mask: A bit mask applied to the value when encoding with mb_encode_numericentity(), for example 0xFFFF or 0x1FFFFF.

To cover several ranges, append further groups of four values to the same array.

In practice, it’s common to decode all non-ASCII numeric entities by using one large range, such as 0x80 to 0x10FFFF.
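A narrower map can be useful when only some entities should be decoded. The sketch below is illustrative: the input string and the variable names are assumptions. The ranges are the Unicode blocks for CJK Unified Ideographs (U+4E00 to U+9FFF) and Hiragana (U+3040 to U+309F).

```php
<?php
// Assumed sample input with three kinds of numeric entities:
// CJK ideographs, a Latin letter (é), and a Hiragana character (こ).
$input = '&#20320;&#22909; caf&#233; &#12371;';

// One group of four values: decode CJK ideographs only.
$cjkOnly = [0x4E00, 0x9FFF, 0, 0xFFFF];
$first = mb_decode_numericentity($input, $cjkOnly, 'UTF-8');
echo $first, PHP_EOL; // 你好 caf&#233; &#12371;

// Two groups of four values: CJK ideographs plus Hiragana.
$cjkAndKana = [
    0x4E00, 0x9FFF, 0, 0xFFFF,
    0x3040, 0x309F, 0, 0xFFFF,
];
$second = mb_decode_numericentity($input, $cjkAndKana, 'UTF-8');
echo $second, PHP_EOL; // 你好 caf&#233; こ
```

Entities whose code points fall outside every listed range are passed through unchanged.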

4. Decoding Specific Character Sets

mb_decode_numericentity() supports multiple character encodings, so you can choose the one that matches your data. If your application targets a multilingual environment, UTF-8 is the recommended choice because it can represent every Unicode character.

    $str = "&#20320;&#22909; &#12371;&#12435;&#12395;&#12385;&#12399;"; // 你好 こんにちは
    $decoded_str = mb_decode_numericentity($str, array(0x80, 0x10FFFF, 0, 0xFFFF), 'UTF-8');
    echo $decoded_str; // Output: 你好 こんにちは
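The choice of encoding changes the bytes that come out, not only how the input is read. As a rough illustration, the following sketch decodes the same assumed input once as UTF-8 and once as ISO-8859-1 and prints the resulting bytes in hexadecimal. The Latin-1 map is an assumption chosen to match the code points that ISO-8859-1 can represent.

```php
<?php
// Assumed sample input: "café" with é written as a numeric entity.
$input = 'caf&#233;';

// UTF-8 output: é becomes the two bytes C3 A9.
$asUtf8 = mb_decode_numericentity($input, [0x80, 0x10FFFF, 0, 0xFFFF], 'UTF-8');

// ISO-8859-1 output: é becomes the single byte E9.
// The range is limited to U+0080..U+00FF, which is what Latin-1 covers.
$asLatin1 = mb_decode_numericentity($input, [0x80, 0xFF, 0, 0xFF], 'ISO-8859-1');

echo bin2hex($asUtf8), PHP_EOL;   // 636166c3a9
echo bin2hex($asLatin1), PHP_EOL; // 636166e9
```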

5. Handling Different Types of HTML Entities

Besides numeric entities (like &#123;), HTML entities can also be written with character names (like &lt;). mb_decode_numericentity() decodes only numeric entities, so if your string also contains named entities, use html_entity_decode() as well.

    $str = "Hello &lt;b&gt;World&lt;/b&gt;!";
    $decoded_str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
    echo $decoded_str; // Output: Hello <b>World</b>!
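When a string mixes both kinds of entities, the two functions can be applied one after the other. This is a minimal sketch with an assumed input string; the variable names are illustrative.

```php
<?php
// Assumed sample input: named entities (&lt; &gt;) around numeric ones.
$input = '&lt;b&gt;&#20320;&#22909;&lt;/b&gt;';

// Step 1: decode numeric entities only; named entities are untouched.
$numericDecoded = mb_decode_numericentity($input, [0x80, 0x10FFFF, 0, 0xFFFF], 'UTF-8');
echo $numericDecoded, PHP_EOL; // &lt;b&gt;你好&lt;/b&gt;

// Step 2: decode the remaining named entities.
$fullyDecoded = html_entity_decode($numericDecoded, ENT_QUOTES, 'UTF-8');
echo $fullyDecoded, PHP_EOL; // <b>你好</b>
```

Note that html_entity_decode() also decodes numeric entities by itself, so the two-step form is mainly useful when you want to inspect or store the intermediate result, or to limit which numeric entities are decoded.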

6. Precautions

  • mb_decode_numericentity() requires the mbstring extension, so make sure it is installed and enabled in your PHP environment before using it.

  • This function decodes only numeric entities; named character entities must be decoded separately, for example with html_entity_decode().

  • The character encoding must match the encoding actually used in the string, otherwise garbled characters may appear.
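The precautions above can be folded into a small wrapper. The function decode_numeric_entities() below is a hypothetical helper, not part of PHP or the article. It is a sketch that assumes the article's map and checks that mbstring is loaded and that the input is valid in the declared encoding before decoding.

```php
<?php
// Hypothetical helper (not part of the article): decodes non-ASCII numeric
// entities after checking the preconditions listed above.
function decode_numeric_entities(string $text, string $encoding = 'UTF-8'): string
{
    // Precondition 1: the mbstring extension must be loaded.
    if (!extension_loaded('mbstring')) {
        throw new RuntimeException('The mbstring extension is not enabled.');
    }

    // Precondition 2: the input must be valid in the declared encoding.
    if (!mb_check_encoding($text, $encoding)) {
        throw new InvalidArgumentException("Input is not valid $encoding.");
    }

    // Same map as in the article's examples.
    return mb_decode_numericentity($text, [0x80, 0x10FFFF, 0, 0xFFFF], $encoding);
}

echo decode_numeric_entities('Hello &#20320;&#22909;!'), PHP_EOL; // Hello 你好!
```

Throwing an exception is only one possible design; returning the input unchanged or logging a warning may fit some applications better.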

7. Summary

mb_decode_numericentity() is a very useful tool, especially when handling strings with special characters. It allows us to easily decode HTML numeric entities and restore the original characters. Whether for multilingual support or converting HTML entity encodings, mb_decode_numericentity() helps us manage character data effectively.

Used with a suitable map and the correct encoding, this function lets PHP applications process and display data containing special characters reliably.