How to provide encoding support in string reversal operations?

gitbox 2025-05-11

When processing multibyte strings (such as UTF-8 encoded Chinese, Japanese, or Korean text), byte-oriented string functions do not recognize character boundaries and can produce incorrect results. This is especially true when reversing a string. This article explores how to read the current multibyte configuration with mb_get_info() and combine it with other mb_* functions to reverse a string correctly.

The challenge of multi-byte strings

PHP's built-in strrev() reverses a string byte by byte, so it is only safe for single-byte encodings such as ASCII. The bytes of a multibyte character are reversed as well (for example, "你" occupies 3 bytes in UTF-8), which produces garbled output. Example:

 $str = "你好，世界";
echo strrev($str); // Outputs garbled bytes (no longer valid UTF-8)

The reason is that strrev() operates on bytes and has no notion of how many bytes make up one character.
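To make the byte-level behavior concrete, here is a minimal sketch that inspects a single character. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8, so "你" is the 3-byte sequence E4 BD A0
$char = "你";

$byteCount = strlen($char);               // counts bytes
$charCount = mb_strlen($char, 'UTF-8');   // counts characters
$reversedBytes = strrev($char);           // reverses the three bytes

echo $byteCount, "\n";                    // 3
echo $charCount, "\n";                    // 1
echo bin2hex($char), "\n";                // e4bda0
echo bin2hex($reversedBytes), "\n";       // a0bde4

// The reversed byte sequence is no longer a valid UTF-8 string
var_dump(mb_check_encoding($reversedBytes, 'UTF-8')); // bool(false)
```

The reversed sequence starts with a continuation byte, which is why it is displayed as garbled text.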

Solution: Use the mbstring functions

PHP's mbstring extension provides a collection of functions that handle multibyte strings. We can use mb_get_info() to confirm the current multibyte configuration, and combine mb_strlen() and mb_substr() to achieve safe string inversion.

1. Get the current multibyte environment

 $info = mb_get_info();
print_r($info);

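mb_get_info() returns an array that includes the internal encoding (internal_encoding), the HTTP input/output encodings, and other settings; the exact values depend on your PHP configuration. print_r() displays it in a form like this: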

 Array
(
    [internal_encoding] => UTF-8
    [http_input] => pass
    [http_output] => pass
    ...
)
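When only one setting is of interest, mb_get_info() also accepts a key name. The sketch below, which assumes the mbstring extension is loaded, reads just the internal encoding and compares it with the value returned by mb_internal_encoding(); the variable names are illustrative.

```php
<?php
// Ask for a single setting instead of the whole array
$fromInfo = mb_get_info('internal_encoding');

// The dedicated getter is expected to report the same encoding name
$direct = mb_internal_encoding();

var_dump($fromInfo === $direct);
```

In practice mb_internal_encoding() is the shorter way to get this one value, while mb_get_info() is handy for inspecting the whole configuration at once.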

2. String reversal function

With the encoding known, we can reverse a string one character at a time instead of one byte at a time:

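 function mb_strrev(string $str, ?string $encoding = null): string {
    // Fall back to the internal encoding when none is given
    if ($encoding === null) {
        $encoding = mb_internal_encoding();
    }

    $length = mb_strlen($str, $encoding);
    $reversed = '';

    // Walk from the last character to the first, one character at a time
    for ($i = $length - 1; $i >= 0; $i--) {
        $reversed .= mb_substr($str, $i, 1, $encoding);
    }

    return $reversed;
}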

$str = "你好，世界";
echo mb_strrev($str); // Output: 界世，好你

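In this example, the function falls back to mb_internal_encoding(), which returns the same value that mb_get_info() reports in its internal_encoding field. The result is correct as long as the string is actually encoded in that internal encoding; otherwise, pass the encoding explicitly as the second argument.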
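As a possible alternative to the character-by-character loop, the string can be split into an array of characters and the array reversed. This is only a sketch: it assumes PHP 7.4 or later (where mb_str_split() is available), and the function name mb_strrev_split is hypothetical, chosen so it does not clash with the function above.

```php
<?php
// Hypothetical helper: reverse a multibyte string via an array of characters
function mb_strrev_split(string $str, ?string $encoding = null): string
{
    // Use the internal encoding when the caller does not name one
    $encoding = $encoding ?? mb_internal_encoding();

    // Split into single characters, reverse their order, and join them again
    $chars = mb_str_split($str, 1, $encoding);

    return implode('', array_reverse($chars));
}

echo mb_strrev_split("你好，世界", 'UTF-8'), "\n"; // 界世，好你
```

This version decodes the string once instead of calling mb_substr() for every position, which may matter for long strings.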

Dynamic encoding support

If you are dealing with strings from different sources (such as user uploads), the encoding may vary. You can use mb_detect_encoding() together with mb_convert_encoding() to normalize it before reversing:

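 $str = file_get_contents('https://gitbox.net/data.txt');
if ($str === false) {
    exit('Could not read the source data');
}

// List likely encodings explicitly; EUC-CN is mbstring's name for GB2312
$encoding = mb_detect_encoding($str, ['UTF-8', 'EUC-CN', 'BIG-5'], true);
if ($encoding === false) {
    exit('Could not detect the encoding');
}

if ($encoding !== 'UTF-8') {
    $str = mb_convert_encoding($str, 'UTF-8', $encoding);
}

echo mb_strrev($str, 'UTF-8');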

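In this way, text uploaded as GB2312, BIG5, or UTF-8 can be converted to a single encoding before it is reversed. Keep in mind that mb_detect_encoding() only considers the candidate encodings it is given and returns a best guess, so telling similar legacy encodings apart is not always reliable.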
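Because detection can fail, it may help to wrap the conversion in a small function that reports failure instead of passing a bad value along. The following is a sketch under stated assumptions: the function name to_utf8_or_null and its candidate-list parameter are hypothetical, and strict detection is used so that invalid input is rejected rather than guessed.

```php
<?php
// Hypothetical helper: return the input as UTF-8, or null if no candidate fits
function to_utf8_or_null(string $bytes, array $candidates): ?string
{
    // Strict mode: only accept an encoding in which the bytes are fully valid
    $encoding = mb_detect_encoding($bytes, $candidates, true);
    if ($encoding === false) {
        return null;
    }

    $utf8 = mb_convert_encoding($bytes, 'UTF-8', $encoding);

    // Double-check the result before handing it to other mb_* functions
    return mb_check_encoding($utf8, 'UTF-8') ? $utf8 : null;
}

var_dump(to_utf8_or_null("你好，世界", ['UTF-8']));  // the same string, unchanged
var_dump(to_utf8_or_null("\xFF\xFE", ['UTF-8']));    // NULL: not valid UTF-8
```

When the source encoding is already known (for example from an HTTP header or a file format specification), passing it directly to mb_convert_encoding() is more dependable than detection.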

Summary

mb_get_info() does not take part in the reversal itself, but it exposes the encoding configuration that tells us which encoding to pass to the mb_* functions. By obtaining and setting the encoding correctly and combining mb_strlen() with mb_substr(), we can reverse multibyte strings safely and reliably.

This is especially important when building internationalized applications, processing user input, or developing systems for the Asian market. If you are working on such a project, make sure the mbstring extension is enabled and always be explicit about which encoding your strings use.