When processing multibyte strings (such as UTF-8 encoded Chinese, Japanese, or Korean text), ordinary string functions often fail to identify character boundaries and produce incorrect results. Reversing a string is a typical case. This article looks at how to read the current multibyte configuration with mb_get_info() and how to combine it with other mb_* functions to reverse a string correctly.
PHP's built-in strrev() operates on bytes, so it is only safe for single-byte text such as ASCII. Multibyte characters are reversed byte by byte (for example, the Chinese character "你" occupies 3 bytes in UTF-8), which produces garbled output. Example:
$str = "你好,世界";
echo strrev($str); // Outputs garbled bytes that are no longer valid UTF-8
The reason is that strrev() reverses individual bytes and has no notion of how many bytes make up a "character".
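To make the byte-versus-character mismatch concrete, here is a minimal sketch. It assumes the source file itself is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// "你" is one character but three bytes in UTF-8 (E4 BD A0).
$char  = "你";
$bytes = strlen($char);              // counts bytes
$chars = mb_strlen($char, 'UTF-8');  // counts characters

// Reversing the bytes of "你好" breaks the UTF-8 sequences.
$broken  = strrev("你好");
$isValid = mb_check_encoding($broken, 'UTF-8');

echo $bytes, " ", $chars, " ", bin2hex($broken), " ", var_export($isValid, true), "\n";
// 3 1 bda5e5a0bde4 false
```

The reversed string starts with a continuation byte (BD), which is why it can no longer be decoded as UTF-8.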
PHP's mbstring extension provides a collection of functions that handle multibyte strings. We can use mb_get_info() to confirm the current multibyte configuration, and combine mb_strlen() and mb_substr() to achieve safe string inversion.
$info = mb_get_info();
print_r($info);
This returns an array that includes the internal encoding (internal_encoding), the HTTP input and output encodings, and other settings:
Array
(
    [internal_encoding] => UTF-8
    [http_input] => pass
    [http_output] => pass
    ...
)
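If only one setting is needed, it can be read from the array by key, and mb_get_info() also accepts a type name and then returns just that value. The short sketch below shows both forms side by side; the variable names are illustrative.

```php
<?php
// Read the internal encoding from the full info array...
$info      = mb_get_info();
$fromArray = $info['internal_encoding'];

// ...or ask mb_get_info() for that single setting directly.
$fromType = mb_get_info('internal_encoding');

echo $fromArray, "\n";
echo $fromType, "\n";
```

Both values should name the same encoding that mb_internal_encoding() reports.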
With the encoding known, we can reverse a string safely, one character at a time:
function mb_strrev($str, $encoding = null) {
    // Fall back to the configured internal encoding when none is given.
    if ($encoding === null) {
        $encoding = mb_internal_encoding();
    }
    $length = mb_strlen($str, $encoding);
    $reversed = '';
    // Walk the string from the last character to the first.
    for ($i = $length - 1; $i >= 0; $i--) {
        $reversed .= mb_substr($str, $i, 1, $encoding);
    }
    return $reversed;
}
$str = "你好,世界";
echo mb_strrev($str); // Output: 界世,好你 (when the internal encoding is UTF-8)
In this example, mb_internal_encoding() supplies the default encoding. It returns the same value that mb_get_info() reports in its internal_encoding field, so the string is measured and sliced with the encoding PHP is actually configured to use.
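The loop-based helper calls mb_substr() once per character, which may become slow on long strings in variable-width encodings. As a hedged alternative, the sketch below uses mb_str_split(), which is available from PHP 7.4. The function name mb_strrev_split is hypothetical and chosen only to avoid clashing with the helper above.

```php
<?php
// Hypothetical alternative to the loop-based helper; needs PHP 7.4+ for mb_str_split().
function mb_strrev_split(string $str, ?string $encoding = null): string
{
    $encoding = $encoding ?? mb_internal_encoding();
    // Split into one-character chunks, reverse the list, and join it back.
    $chars = mb_str_split($str, 1, $encoding);
    return implode('', array_reverse($chars));
}

echo mb_strrev_split("你好,世界", 'UTF-8'), "\n"; // 界世,好你
```

Passing the encoding explicitly, as in the usage line, keeps the result independent of the server's mbstring configuration.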
If you are dealing with strings from different sources (such as user uploads), their encodings may differ. You can use mb_detect_encoding() together with mb_convert_encoding() to normalize them to a single encoding first:
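$str = file_get_contents('https://gitbox.net/data.txt');
// Strict mode returns false when no candidate encoding matches.
$encoding = mb_detect_encoding($str, mb_detect_order(), true);
if ($encoding === false) {
    exit('Could not detect the encoding.');
}
if ($encoding !== 'UTF-8') {
    $str = mb_convert_encoding($str, 'UTF-8', $encoding);
}
// The string is UTF-8 at this point, so pass the encoding explicitly.
echo mb_strrev($str, 'UTF-8');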
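In this way, text uploaded as GB2312, BIG5, or UTF-8 can be converted to a single encoding and then reversed, provided the detection order includes those encodings. mb_detect_encoding() only considers the candidates it is given, and the default order returned by mb_detect_order() is typically just ASCII and UTF-8, so register the legacy encodings you expect with mb_detect_order() or pass them directly.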
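One way to make the candidate list explicit is sketched below. The helper name to_utf8 and its default candidates are assumptions made for illustration, not part of mbstring, and detection between legacy encodings is heuristic, so the list should be adjusted to the inputs you actually expect.

```php
<?php
// Hypothetical helper: normalize input to UTF-8 using an explicit candidate list.
function to_utf8(string $str, array $candidates = ['GB18030', 'BIG5']): string
{
    // Already valid UTF-8 (this also covers plain ASCII): nothing to do.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Strict detection over the caller's candidates; false means no match.
    $encoding = mb_detect_encoding($str, $candidates, true);
    if ($encoding === false) {
        throw new InvalidArgumentException('Unable to detect the source encoding.');
    }
    return mb_convert_encoding($str, 'UTF-8', $encoding);
}

// Simulate legacy input by converting a UTF-8 literal to GB18030 first.
$gb = mb_convert_encoding("你好", 'GB18030', 'UTF-8');
echo to_utf8($gb, ['GB18030']), "\n";
```

Checking for valid UTF-8 before detecting anything else avoids misreading UTF-8 input as a legacy encoding.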
mb_get_info() does not take part in the reversal itself, but it reports the encoding configuration that tells us which encoding argument to pass to the mb_* functions. With the encoding obtained and set correctly, mb_strlen() and mb_substr() can reverse multibyte strings safely and reliably.
This matters most in internationalized applications, when processing user input, or when building systems for Asian markets. If you are working on such a project, make sure the mbstring extension is enabled and always be explicit about which encoding your strings use.