In PHP, the mb_decode_numericentity function converts numeric entities in a string (for example &#20320;) into the corresponding characters. This is useful when working with multibyte character encodings, for example when converting text that contains HTML numeric entities. Setting its range array (the conversion map) correctly is the key to getting the expected result.
mb_decode_numericentity(string $string, array $map, ?string $encoding = null): string
$string : The string to be converted.
$map : The range array, which defines the conversion ranges and offsets for numeric entities.
$encoding : The character encoding of the string. If omitted, the internal encoding (mb_internal_encoding()) is used.
A range array is a one-dimensional array whose length must be a multiple of 4. Each group of four elements defines one conversion range:
[
start_codepoint, end_codepoint, offset, mask,
start_codepoint, end_codepoint, offset, mask,
...
]
start_codepoint : Start Unicode code point (decimal).
end_codepoint : End Unicode code point (decimal).
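offset : The shift between entity numbers and code points (usually 0). When decoding, it is subtracted from the entity number before the range check.
mask : A bit mask used for bitwise operations, usually 0xFFFF for ranges within the BMP, or a wider value such as 0x1FFFFF or 0xFFFFFFF for ranges above 0xFFFF.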
Note: mb_decode_numericentity only converts numeric entities whose code points fall within the start_codepoint to end_codepoint range.
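As a small illustration of the range check, the sketch below uses a map with two ranges and shows that an entity outside both ranges is left untouched. The choice of blocks (CJK ideographs and kana) and the sample string are assumptions made only for this example.

```php
<?php
// Two conversion ranges, four elements each (illustrative choice of blocks).
$map = [
    0x4E00, 0x9FFF, 0, 0xFFFF, // CJK Unified Ideographs
    0x3040, 0x30FF, 0, 0xFFFF, // Hiragana and Katakana
];

// &#65; is "A" and lies outside both ranges; the other entities fall inside one.
$input  = "&#65; &#20320;&#22909; &#12354;";
$output = mb_decode_numericentity($input, $map, 'UTF-8');

echo $output, PHP_EOL; // &#65; 你好 あ
```

Adding a third group of four elements (for example 0x0, 0x7F, 0, 0xFFFF) would make the map cover ASCII as well.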
Suppose the numeric entities we want to convert cover the Basic Multilingual Plane (BMP), i.e. code points from 0x0 to 0xFFFF:
$map = [
0x0, 0xFFFF, 0, 0xFFFF
];
Here's the explanation:
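Numeric entities whose code points are between 0 and 65535 are converted.
An offset of 0 means that the entity number is used as the code point without adjustment.
The mask 0xFFFF matches the 16-bit width of this range. The mask matters mainly when encoding, where it is ANDed with the output value; when decoding, the start and end code points are what limit the conversion.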
<?php
// The string to convert; it contains decimal numeric entities
$input = "Hello &#20320;&#22909;!"; // the entities stand for "你好"
// Range array covering entities in the BMP
$map = [0x0, 0xFFFF, 0, 0xFFFF];
// Decode with mb_decode_numericentity
$output = mb_decode_numericentity($input, $map, 'UTF-8');
echo $output; // Output: Hello 你好!
?>
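For comparison, here is a sketch of a non-zero offset. It assumes the behaviour of current mbstring implementations, where decoding subtracts the offset from the entity number before checking the range. The map and the inputs are made up for illustration.

```php
<?php
// Hypothetical map: entity numbers are shifted by 1 relative to the code points A-Z.
$map = [0x41, 0x5A, 1, 0xFFFF];

// 66 - 1 = 65 (U+0041, "A"), which lies in 0x41..0x5A, so the entity is decoded.
$shifted = mb_decode_numericentity("&#66;", $map, 'UTF-8');

// 65 - 1 = 64 falls outside the range, so the entity is left as it is.
$untouched = mb_decode_numericentity("&#65;", $map, 'UTF-8');

echo $shifted, ' ', $untouched, PHP_EOL;
```

In practice a non-zero offset is rare; it only makes sense when the entities were produced with the same shifted map.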
The length of the range array must be a multiple of 4; otherwise the call fails (PHP 8 and later throw a ValueError).
The offset is generally 0 unless a shift between entity numbers and code points is required.
The mask is usually 0xFFFF (16 bits), or a wider value such as 0x1FFFFF (21 bits) or 0xFFFFFFF (28 bits) for ranges that extend above 0xFFFF.
If a numeric entity falls outside every range in the array, the function leaves it unconverted.
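The notes above can be sketched as a small wrapper. The helper name decodeEntities is hypothetical (it is not part of mbstring), and the full-range map [0x0, 0x10FFFF, 0, 0x1FFFFF] is just one possible choice for covering characters outside the BMP, such as emoji.

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): checks the map length
 * before delegating to mb_decode_numericentity().
 */
function decodeEntities(string $text, array $map, string $encoding = 'UTF-8'): string
{
    if (count($map) % 4 !== 0) {
        throw new InvalidArgumentException('The map length must be a multiple of 4.');
    }
    return mb_decode_numericentity($text, $map, $encoding);
}

$bmpMap  = [0x0, 0xFFFF, 0, 0xFFFF];     // BMP only
$fullMap = [0x0, 0x10FFFF, 0, 0x1FFFFF]; // all Unicode code points

// "你好" followed by U+1F600, which lies outside the BMP.
$input = "&#20320;&#22909; &#128512;";

$bmpOnly = decodeEntities($input, $bmpMap);  // the emoji entity stays as &#128512;
$full    = decodeEntities($input, $fullMap); // every entity is decoded

echo $bmpOnly, PHP_EOL;
echo $full, PHP_EOL;
```

Validating the length up front gives the same error on every PHP version, rather than relying on version-specific behaviour.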
The range array of mb_decode_numericentity determines which numeric entities are converted.
Set reasonable start and end code points and masks so that the target characters are decoded correctly.
For most everyday cases, [0x0, 0xFFFF, 0, 0xFFFF] is enough to convert characters in the BMP.
Once you understand how the range array works, you can handle numeric entities for multibyte characters flexibly and avoid garbled text and parsing errors.