
How to Use mb_encode_numericentity and mb_decode_numericentity for Data Encoding and Decoding?

gitbox 2025-06-10

1. Introduction to the mb_encode_numericentity Function

mb_encode_numericentity encodes the characters of a string that fall within specified code point ranges as numeric entities. A numeric entity is an HTML numeric character reference of the form &#NNNN; (for example, &#20320; for 你), where NNNN is the character's Unicode code point.

Function prototype:

string mb_encode_numericentity(string $str, array $convmap, string $encoding = mb_internal_encoding(), bool $is_hex = false)
  • $str: The string to be encoded.

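  • $convmap: Conversion map array specifying which character ranges to encode. It consists of one or more groups of four integers: start code point, end code point, offset, and mask. For each character inside a range, the offset is added to its code point and the result is bitwise-ANDed with the mask to produce the entity number.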

  • $encoding: String encoding, defaults to the internal encoding.

  • $is_hex: If set to true, the numeric entities will be represented in hexadecimal.
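The parameters above can be sketched as follows. This is a minimal illustration, not taken from the original article: the sample strings and both conversion maps are assumptions chosen to show how the range selects characters and how $is_hex switches the output to hexadecimal.

```php
<?php
// Sketch only: sample strings and conversion maps are illustrative assumptions.

// One group of four integers: [start, end, offset, mask].
// Every code point from U+0080 to U+10FFFF, no offset, mask wide enough for all of Unicode.
$nonAscii = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// A narrower map that only covers the CJK Unified Ideographs block.
$cjkOnly = [0x4E00, 0x9FFF, 0, 0xFFFF];

// Decimal entities (default): ASCII characters are outside the range and stay as-is.
$decimal = mb_encode_numericentity("é€ A", $nonAscii, "UTF-8");
echo $decimal, "\n"; // &#233;&#8364; A

// Hexadecimal entities: pass true as the fourth argument.
$hex = mb_encode_numericentity("é€ A", $nonAscii, "UTF-8", true);
echo $hex, "\n";

// With the narrower map, "é" is outside the range and is left unencoded.
$cjk = mb_encode_numericentity("é中", $cjkOnly, "UTF-8");
echo $cjk, "\n"; // é&#20013;
```

Choosing the range is the main design decision: a wide map such as [0x80, 0x10FFFF, ...] makes the whole string ASCII-only, while a narrow map leaves everything outside it untouched.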


2. Introduction to the mb_decode_numericentity Function

mb_decode_numericentity converts strings containing numeric entities back into their original multibyte character strings.

Function prototype:

string mb_decode_numericentity(string $str, array $convmap, string $encoding = mb_internal_encoding())
  • $str: The string to be decoded.

  • $convmap: Conversion map array, same as in mb_encode_numericentity.

  • $encoding: String encoding, defaults to the internal encoding.
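A short sketch of decoding, under the assumption of the same kind of conversion maps as used for encoding (the input strings here are made up for illustration). It also shows that an entity whose code point falls outside the map is left as text rather than converted.

```php
<?php
// Sketch only: input strings and conversion maps are illustrative assumptions.
$nonAscii = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Entities inside the mapped range are turned back into characters.
$decoded = mb_decode_numericentity("Caf&#233; costs 3&#8364;", $nonAscii, "UTF-8");
echo $decoded, "\n"; // Café costs 3€

// With a CJK-only map, &#233; (é) is outside the range and stays an entity.
$cjkOnly = [0x4E00, 0x9FFF, 0, 0xFFFF];
$partial = mb_decode_numericentity("&#233;&#20013;", $cjkOnly, "UTF-8");
echo $partial, "\n"; // &#233;中
```

For a lossless round trip, the map passed to mb_decode_numericentity should cover at least the ranges that were used when encoding.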


3. Practical Example: Encoding and Decoding Chinese Characters

Suppose we want to encode a string containing Chinese characters, converting all Chinese characters into numeric entities, and then decode it back to the original string.

<?php
// String to encode
$str = "你好,世界!Hello World!";
// Conversion map (assumed value): every non-ASCII code point
// from U+0080 to U+10FFFF, no offset, full mask
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
// Encode: convert Chinese characters to numeric entities
$encoded = mb_encode_numericentity($str, $convmap, "UTF-8");
echo "Encoded:\n" . $encoded . "\n";
// Decode: convert numeric entities back to Chinese characters
$decoded = mb_decode_numericentity($encoded, $convmap, "UTF-8");
echo "Decoded:\n" . $decoded . "\n";
?>

Output:

Encoded:
&#20320;&#22909;,&#19990;&#30028;!Hello World!
Decoded:
你好,世界!Hello World!

4. Application in URL Scenarios

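Multibyte characters in a URL sometimes need an ASCII-only representation, for example when the URL is embedded in an HTML page or stored in an ASCII-only field. Here is a simple example that encodes the path part of a URL into numeric entities and then decodes it back. Keep in mind that numeric entities are an HTML mechanism: the &, # and ; they contain have their own meaning inside a URL, so the encoded form is meant for embedding in HTML or for storage rather than for issuing a request.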

<?php
$url = "https://gitbox.net/路径/测试";
// Conversion map (assumed value): every non-ASCII code point, no offset, full mask
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
// Encode only the path part
$parsed = parse_url($url);
// rawurldecode() first, in case the path arrives percent-encoded
$encoded_path = mb_encode_numericentity(rawurldecode($parsed['path']), $convmap, "UTF-8");
// Construct the encoded URL
$encoded_url = $parsed['scheme'] . "://" . $parsed['host'] . $encoded_path;
echo "Encoded URL:\n" . $encoded_url . "\n";
// Decode the path
$decoded_path = mb_decode_numericentity($encoded_path, $convmap, "UTF-8");
// Restore full URL
$decoded_url = $parsed['scheme'] . "://" . $parsed['host'] . $decoded_path;
echo "Decoded URL:\n" . $decoded_url . "\n";
?>

Output:

Encoded URL:
https://gitbox.net/&#36335;&#24452;/&#27979;&#35797;
Decoded URL:
https://gitbox.net/路径/测试
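For comparison, when the URL has to be sent in an actual HTTP request, percent-encoding each path segment is the usual approach, since the # in a numeric entity would be read as the start of a fragment. The following is a minimal sketch: encodePath() is a hypothetical helper introduced here for illustration and is not part of mbstring, while rawurlencode() and rawurldecode() are standard PHP functions.

```php
<?php
// Sketch only: encodePath() is a hypothetical helper, not part of mbstring.
function encodePath(string $path): string
{
    // Encode each segment separately so the "/" separators are preserved.
    return implode("/", array_map("rawurlencode", explode("/", $path)));
}

$path = "/路径/测试";
$encodedPath = encodePath($path);
echo "https://gitbox.net" . $encodedPath . "\n";

// rawurldecode() reverses the percent-encoding.
$restoredPath = rawurldecode($encodedPath);
echo "https://gitbox.net" . $restoredPath . "\n";
```

The two representations serve different layers: numeric entities keep the text readable by an HTML parser, while percent-encoding keeps it valid as a URL.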

5. Summary

  • mb_encode_numericentity converts characters within the ranges given by the conversion map into numeric entities, so that multibyte text can pass through HTML or ASCII-only storage and transmission without being corrupted.

  • mb_decode_numericentity restores numeric entities to their corresponding multibyte characters, provided the conversion map covers their code points.

  • Used together, these functions provide a convenient way to handle string encoding issues in multilingual environments, for example when text such as a URL path with non-ASCII characters has to be embedded in HTML or stored as ASCII.

Understanding how the conversion map selects characters makes it possible to control exactly which parts of a multibyte string are encoded and to restore them reliably afterwards.