How to fix garbled text in character encoding conversion using the mb_get_info function

gitbox 2025-05-11

When you develop multilingual websites or process external data sources, mismatched character encodings are a common headache. At best, incorrect encoding settings produce garbled text; at worst, they cause data loss and broken functionality.
In PHP, the mbstring extension provides many functions for handling multibyte characters. Among them, mb_get_info() is a useful tool for locating encoding problems.

What is mb_get_info()

mb_get_info() is a function in the PHP mbstring extension that returns the current configuration of multibyte string processing.
It reports the internal encoding, the HTTP input and output encodings, the language setting, and more, so you can quickly see which encoding environment the script is running in.

The function signature is simple:

 mb_get_info(string $type = "all")

If no argument is passed, the function returns an array containing all configuration information. If you pass a type, such as "internal_encoding", it returns only the value of that setting rather than an array.
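As a minimal sketch of the two calling styles described above, the snippet below reads the full array and then a single setting. It assumes the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// Fetch every mbstring setting as an associative array.
$all = mb_get_info();

// Fetch one setting only; for "internal_encoding" this is a string.
$internal = mb_get_info('internal_encoding');

// The single value matches the corresponding entry of the full array.
var_dump($internal === $all['internal_encoding']);
echo 'Internal encoding: ' . $internal . PHP_EOL;
```

Requesting a single setting is handy in log messages or assertions where dumping the whole array would be noisy.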

Why do character encodings cause garbled text?

PHP strings are plain byte sequences that carry no encoding information, so PHP cannot tell which encoding a string is in. With multibyte encodings such as UTF-8, GBK, and Shift-JIS, any step that interprets the bytes under the wrong encoding produces garbled text.
Common causes are:

The input encoding is inconsistent with the script processing
The encoding settings of the output are incorrect
Encoding errors during database connection
Server default locale setting issues

If you don't know which encodings the current environment uses, it is hard to apply the right fix. mb_get_info() helps you find out.
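To make the first cause concrete, here is a small sketch of what happens when UTF-8 bytes are misread as ISO-8859-1. The sample character and the variable names are chosen for illustration and do not come from a real application.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
var_dump(strlen($utf8));             // int(2)
var_dump(mb_strlen($utf8, 'UTF-8')); // int(1)

// Treating the UTF-8 bytes as ISO-8859-1 and converting them to UTF-8
// turns each byte into its own character, giving the mojibake "Ã©".
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
var_dump($garbled === "\xC3\x83\xC2\xA9"); // bool(true)
```

The bytes themselves are never "wrong"; the garbling comes from declaring a source encoding that does not match how the data was actually written.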

How to troubleshoot garbled text with mb_get_info()?

Here is a simple example that shows how to use mb_get_info() to locate problems:

 <?php
// View all configurations for current multibyte processing
$info = mb_get_info();
print_r($info);

// Focus on the internal encoding
echo "Internal Encoding: " . mb_internal_encoding() . PHP_EOL;

// Set the internal encoding to UTF-8 so mbstring functions default to it
mb_internal_encoding("UTF-8");

// Check it again
echo "New Internal Encoding: " . mb_internal_encoding() . PHP_EOL;
?>

The output may be similar to:

 Array
(
    [internal_encoding] => UTF-8
    [http_output] => UTF-8
    [http_input] => UTF-8
    [language] => neutral
    ...
)
Internal Encoding: UTF-8
New Internal Encoding: UTF-8

If internal_encoding is not UTF-8 (for example, ISO-8859-1) while your data is UTF-8, that mismatch is a likely source of garbled text.
Calling mb_internal_encoding("UTF-8") makes mbstring functions use UTF-8 as their default encoding. It does not convert existing data, so the strings you process must actually be UTF-8.
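The check described above can be automated. The following is a rough sketch, assuming you expect UTF-8 everywhere; find_encoding_mismatches() is a hypothetical helper written for this article, not part of mbstring.

```php
<?php
// Hypothetical helper: list mbstring settings that differ from the expected encoding.
function find_encoding_mismatches(array $info, string $expected = 'UTF-8'): array
{
    $keys = ['internal_encoding', 'http_input', 'http_output'];
    $mismatches = [];
    foreach ($keys as $key) {
        // Skip settings that are absent or not reported as a plain string.
        if (!isset($info[$key]) || !is_string($info[$key])) {
            continue;
        }
        // Encoding names are compared case-insensitively.
        if (strcasecmp($info[$key], $expected) !== 0) {
            $mismatches[$key] = $info[$key];
        }
    }
    return $mismatches;
}

// An empty array means the checked settings all match the expected encoding.
print_r(find_encoding_mismatches(mb_get_info()));
```

Running a check like this at application startup, or in a diagnostics page, surfaces a misconfigured environment before it shows up as garbled output.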

Practical case: Prevent garbled text in web page output

Suppose you have a simple interface that returns the user input to the front end, like this:

 <?php
header('Content-Type: text/html; charset=UTF-8');

// Check the current internal encoding
if (mb_internal_encoding() !== 'UTF-8') {
    mb_internal_encoding('UTF-8');
}

// Assume this is user input (for example, from a form or an API)
$user_input = "Hello，world！";

// Output
echo htmlspecialchars($user_input, ENT_QUOTES, 'UTF-8');
?>

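Here, the response header, the internal encoding, and htmlspecialchars() all agree on UTF-8, which greatly reduces the chance of garbled text. Note that this code does not convert anything: if a user submits data in another encoding, such as GB2312, it must first be converted to UTF-8, for example with mb_convert_encoding().
When the source encoding is unknown, mb_detect_encoding() can be combined with the conversion to guess the encoding first. Detection is heuristic, so restrict it to the encodings you actually expect.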
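A possible way to combine detection and conversion is sketched below. to_utf8() is a hypothetical helper, and the candidate list is an assumption you should adapt to the encodings your users really send; CP936 is the mbstring name for GBK.

```php
<?php
// Hypothetical helper: normalize a byte string to UTF-8.
function to_utf8(string $input, array $candidates): string
{
    // Already valid UTF-8: return it unchanged.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Strict detection among the expected encodings only.
    $from = mb_detect_encoding($input, $candidates, true);
    if ($from === false) {
        throw new InvalidArgumentException('Unable to detect the input encoding');
    }

    return mb_convert_encoding($input, 'UTF-8', $from);
}

// "你好" encoded as GBK is the four bytes C4 E3 BA C3.
$gbk = "\xC4\xE3\xBA\xC3";
echo to_utf8($gbk, ['CP936']) . PHP_EOL;
```

Checking for valid UTF-8 first avoids re-encoding data that is already correct, and throwing on a failed detection keeps undetectable input from being passed along silently.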

Further reading: documentation

To learn more about mb_get_info() and the mbstring extension, see the manual page:
https://gitbox.net/php/manual/zh/function.mb-get-info.php