How to Properly Configure the $convmap Parameter in the mb_encode_numericentity Function? Detailed Guide with Tips and Precautions

gitbox 2025-08-21

1. Overview of the mb_encode_numericentity Function

The basic syntax of the mb_encode_numericentity function is as follows:

mb_encode_numericentity(string $str, array $convmap, ?string $encoding = null, bool $hex = false): string

  • $str: The string to be converted.

  • $convmap: The conversion map, an array of integers specifying which code point ranges should be converted into numeric entities.

  • $encoding: The character encoding of $str (e.g., UTF-8, ISO-8859-1, etc.). If it is omitted or null, the internal encoding (mb_internal_encoding()) is used. The result is returned in the same encoding; there is no separate target encoding parameter.

  • $hex: When true, entities are written in hexadecimal notation (&#x...;) instead of decimal (&#...;).

The PHP 8 manual names the first two parameters $string and $map; this only matters if you pass named arguments.

This function converts characters in the string into HTML or XML numeric entities according to the rules specified in the $convmap parameter. The configuration of the $convmap array is critical for the conversion results.
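
As a quick illustration of the call itself, here is a minimal sketch. It assumes the mbstring extension is loaded and uses the flat, four-integers-per-range map layout documented in the PHP manual; the sample string and variable names are made up for the example.

```php
<?php
// Sample input: "Café", with é written as a Unicode escape (U+00E9).
$text = "Caf\u{E9}";

// One range: start 0x80, end 0x10FFFF, offset 0, mask 0x1FFFFF.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Decimal entities (the default).
$decimal = mb_encode_numericentity($text, $map, 'UTF-8');

// Hexadecimal entities, requested with the fourth argument.
$hex = mb_encode_numericentity($text, $map, 'UTF-8', true);

echo $decimal, "\n"; // Caf&#233;
echo $hex, "\n";     // the same entity in hexadecimal notation
```

ASCII letters are outside the range, so only the é is rewritten.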


2. Detailed Explanation of the $convmap Parameter

The $convmap parameter is a flat array of integers whose length must be a multiple of 4. Each group of four consecutive values defines one conversion range; nested subarrays are not part of the format. Its structure looks like this:

$convmap = [
    start_code, end_code, offset, mask,
    // more groups of four
];

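2.1 Explanation of the Four Values

  • start_code: The first Unicode code point of the range to convert.

  • end_code: The last code point of the range (inclusive).

  • offset: An integer added to the code point before the entity number is written. Use 0 to emit the code point itself.

  • mask: A bit mask applied with bitwise AND after the offset has been added. It should keep every bit that can occur in the range, for example 0xFF for ASCII and Latin-1 ranges, 0xFFFF for the Basic Multilingual Plane, or 0x1FFFFF for the full Unicode range. Note that 0x10FFFF is a valid end_code but a poor mask, because its bits 16 to 19 are zero and would be cleared from supplementary-plane code points.

Characters whose code points fall outside every range are left unchanged, so the ranges alone decide which characters are converted into numeric entities.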
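
To make the roles of the four values concrete, the sketch below assumes the layout documented in the PHP manual (start, end, offset, mask). The non-zero offset is chosen purely for illustration; in typical HTML work the offset stays at 0.

```php
<?php
// Range A..Z (0x41..0x5A). Offset 0x20 is an illustrative choice; mask 0xFF keeps the low byte.
$shifted = mb_encode_numericentity('A1', [0x41, 0x5A, 0x20, 0xFF], 'UTF-8');

// Same range with offset 0: the entity number is the code point itself.
$plain = mb_encode_numericentity('A1', [0x41, 0x5A, 0, 0xFF], 'UTF-8');

echo $shifted, "\n"; // &#97;1  (0x41 + 0x20 = 0x61 = 97; "1" is outside the range)
echo $plain, "\n";   // &#65;1
```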


3. Tips for Configuring the $convmap Parameter

Properly configuring $convmap allows you to precisely control conversion rules. Below are some useful tips:

3.1 Conversion for Specific Characters

If you only want certain characters to be converted into numeric entities, you can define specific character code ranges. For instance, if you want to convert all non-ASCII characters, configure $convmap to cover characters outside the ASCII range.

$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

This configuration means every character whose code point is 0x80 or higher (that is, every non-ASCII character) is converted into a numeric entity, while ASCII text is left as it is.
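
A minimal sketch of this non-ASCII conversion, including the reverse step with mb_decode_numericentity. The helper names are hypothetical and exist only for this example; the map is written in the flat four-integer layout the PHP manual documents.

```php
<?php
// Hypothetical helper names, introduced only for this example.
function encodeNonAscii(string $text): string
{
    return mb_encode_numericentity($text, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

function decodeNonAscii(string $text): string
{
    // The same map reverses the conversion for entities in the covered range.
    return mb_decode_numericentity($text, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

$original = "na\u{EF}ve r\u{E9}sum\u{E9}"; // "naïve résumé"
$encoded  = encodeNonAscii($original);
$decoded  = decodeNonAscii($encoded);

echo $encoded, "\n"; // na&#239;ve r&#233;sum&#233;
```

Keeping the encode and decode maps identical is what makes the round trip reliable.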

3.2 Conversion of HTML Special Characters

If you’re working with HTML content and want to convert certain special symbols (such as <, >, &), you can set the appropriate ranges in $convmap.

$convmap = [
    0x20, 0x2F, 0, 0xFF,  // Space through slash, which includes & (0x26)
    0x3A, 0x40, 0, 0xFF,  // Colon through @, which includes < (0x3C) and > (0x3E)
];

With this setup, every character in these two ranges is converted into its numeric entity equivalent. That includes the space character (0x20); start the first range at 0x21 if spaces should stay as they are.
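
The following sketch runs the two ranges from this section on a short sample string. It assumes the flat map layout documented in the PHP manual (four integers per range); the input text is invented for the example.

```php
<?php
// Two ranges in one flat map: 0x20..0x2F and 0x3A..0x40, offset 0, mask 0xFF.
$map = [
    0x20, 0x2F, 0, 0xFF,
    0x3A, 0x40, 0, 0xFF,
];

$encoded = mb_encode_numericentity('a<b & c>d', $map, 'UTF-8');

// Letters are untouched; <, >, & and the spaces are rewritten.
echo $encoded, "\n"; // a&#60;b&#32;&#38;&#32;c&#62;d
```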

3.3 Using Unicode Ranges

For Unicode characters, you can define broader ranges to ensure multilingual characters and special symbols are properly converted. This is especially useful when dealing with multilingual text.

$convmap = [
    0x3000, 0x303F, 0, 0xFFFF,  // Convert CJK symbols and punctuation
];

This configuration converts all characters from U+3000 through U+303F (the CJK Symbols and Punctuation block) into numeric entities, covering punctuation used in Chinese, Japanese, and Korean text.
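
As a small check of what this range does and does not touch, the sketch below encodes a short Chinese phrase. It assumes the flat map layout from the PHP manual; the sample text is written with Unicode escapes so the example does not depend on the file's own encoding.

```php
<?php
// "你好。" written with Unicode escapes: U+4F60, U+597D, U+3002.
$text = "\u{4F60}\u{597D}\u{3002}";

// Only the CJK Symbols and Punctuation block (U+3000..U+303F) is covered.
$map = [0x3000, 0x303F, 0, 0xFFFF];

$encoded = mb_encode_numericentity($text, $map, 'UTF-8');

// The two ideographs are outside the range and stay as they are;
// the ideographic full stop U+3002 becomes &#12290;.
echo $encoded, "\n";
```

To convert the ideographs as well, a further group of four integers for their block would have to be added to the map.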


4. Important Notes When Using mb_encode_numericentity

Although mb_encode_numericentity is powerful, there are several important considerations:

4.1 Choosing the Right Encoding

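Ensure the $encoding argument matches the actual encoding of the input string. The function takes a single encoding, which applies to both the input and the returned string. For instance, if the source string is in ISO-8859-1, pass 'ISO-8859-1' explicitly instead of relying on the internal encoding, to prevent garbled text or incorrect conversions. If the result must be in a different character set, convert it in a separate step with mb_convert_encoding.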
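
A minimal sketch of handling a non-UTF-8 input, assuming the third argument of mb_encode_numericentity names the encoding of the input string (as the PHP manual describes). The sample bytes and variable names are illustrative.

```php
<?php
// "café" as ISO-8859-1 bytes: é is the single byte 0xE9.
$latin1 = "caf\xE9";

// Tell the function how the bytes are encoded; the result uses the same encoding.
$encoded = mb_encode_numericentity($latin1, [0x80, 0xFF, 0, 0xFF], 'ISO-8859-1');

echo $encoded, "\n"; // caf&#233;

// Separate, optional step if UTF-8 output is needed afterwards.
$utf8 = mb_convert_encoding($encoded, 'UTF-8', 'ISO-8859-1');
```

After encoding, the string here contains only ASCII characters, so the final conversion leaves the bytes unchanged.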

4.2 Coverage of Conversion Ranges

When defining $convmap, ensure the character ranges provide adequate coverage. Too narrow a range may leave some characters unconverted, while overly broad ranges may unnecessarily affect other characters. It’s best to tailor ranges according to actual requirements.

4.3 Performance Concerns

For very large strings or maps with many ranges, mb_encode_numericentity may impact performance, because each character has to be checked against the map. Consider processing the text in smaller units and limiting the map to the ranges you actually need to avoid unnecessary processing.

4.4 Compatibility Issues

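mb_encode_numericentity is provided by the mbstring extension, which is not enabled by default, so ensure that it is installed and loaded. Behavior can also vary across PHP versions; for example, $encoding only accepts null from PHP 8.0 onward, so passing the encoding explicitly is the most portable choice.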
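
One way to guard against a missing extension is sketched below. The wrapper name and its default encoding are assumptions made for this example, not part of mbstring.

```php
<?php
// Hypothetical wrapper, shown only to illustrate the availability check.
function safeEncodeEntities(string $text, array $map, string $encoding = 'UTF-8'): string
{
    if (!function_exists('mb_encode_numericentity')) {
        throw new RuntimeException('The mbstring extension is required.');
    }
    // Passing the encoding explicitly avoids depending on the internal encoding.
    return mb_encode_numericentity($text, $map, $encoding);
}

$out = safeEncodeEntities("\u{A9} 2025", [0x80, 0x10FFFF, 0, 0x1FFFFF]);
echo $out, "\n"; // &#169; 2025
```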


5. Practical Example

Here’s an example demonstrating how to use mb_encode_numericentity to process text containing special characters:

$str = "This is a test string containing <, >, and & symbols.";
$convmap = [
    0x26, 0x26, 0, 0xFF,          // &
    0x3C, 0x3C, 0, 0xFF,          // <
    0x3E, 0x3E, 0, 0xFF,          // >
    0x80, 0x10FFFF, 0, 0x1FFFFF,  // All non-ASCII characters
];
$result = mb_encode_numericentity($str, $convmap, 'UTF-8');
echo $result;

The <, >, and & characters (and any non-ASCII characters) are converted into numeric entities, so the output is:

This is a test string containing &#60;, &#62;, and &#38; symbols.

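Encoding these characters helps avoid HTML rendering errors and can reduce the risk of XSS when the result is placed in HTML text. It only protects the characters covered by $convmap, though, so for general HTML escaping a dedicated function such as htmlspecialchars is usually the safer choice.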