How to detect and convert character encoding using the mb_get_info function in PHP?

gitbox 2025-05-11

When developing in PHP, handling multibyte characters (Chinese, Japanese, Korean, and so on) is a common but error-prone task. To support multilingual character sets, PHP provides the mbstring extension. Its mb_get_info() function reports the current multibyte string settings, which helps you perform character encoding conversion more safely.

What is mb_get_info()?

mb_get_info() is a function provided by the mbstring extension that returns the extension's current internal configuration. It only reads settings and does not change them. This information lets developers confirm the encoding settings in the current environment, such as the default internal encoding, the language setting, and the HTTP input/output encodings.

Basic usage

 <?php
// Get all mbstring-related configuration information
$info = mb_get_info();

echo "<pre>";
print_r($info);
echo "</pre>";
?>

The output content is roughly as follows (may vary by environment):

 Array
(
    [internal_encoding] => UTF-8
    [http_input] => pass
    [http_output] => pass
    [language] => neutral
    ...
)
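
As a rough sketch of how the returned array might be used, the snippet below reads a few individual settings instead of dumping everything. The variable names are illustrative, and the ?? fallbacks are only a precaution in case a key is missing in a given PHP version. The actual values depend on your php.ini.

```php
<?php
// Sketch: read selected mbstring settings (requires the mbstring extension).
$info = mb_get_info();

// Fall back to a placeholder if a key is unexpectedly absent.
$internal    = $info['internal_encoding'] ?? 'unknown';
$language    = $info['language'] ?? 'unknown';
// detect_order is the list of encodings mbstring tries by default when detecting.
$detectOrder = (array) ($info['detect_order'] ?? []);

echo "Internal encoding: $internal\n";
echo "Language: $language\n";
echo "Detect order: " . implode(', ', $detectOrder) . "\n";
```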

Detect the current character encoding settings

You can check the current internal encoding with mb_get_info('internal_encoding'):

 <?php
$currentEncoding = mb_get_info('internal_encoding');
echo "The current internal encoding is: $currentEncoding";
?>

This check matters most when you process user input or content read from a database: if the data's encoding and the internal encoding do not match, multibyte functions can produce garbled text.
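
One possible way to act on this check is sketched below: compare the reported internal encoding with the encoding the application expects and, if they differ, switch it with mb_internal_encoding(). The $expected value of UTF-8 is an assumption for illustration, not something mbstring requires.

```php
<?php
// Assumption for this sketch: the application expects UTF-8 throughout.
$expected = 'UTF-8';

$current = mb_get_info('internal_encoding');

// Encoding names are case-insensitive, so compare without regard to case.
if (strcasecmp((string) $current, $expected) !== 0) {
    // mb_internal_encoding() with an argument sets the internal encoding.
    mb_internal_encoding($expected);
}

// Called without arguments, mb_internal_encoding() returns the current value.
$nowInternal = mb_internal_encoding();
echo "Internal encoding is now: $nowInternal\n";
```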

The correct way to convert character encoding

Once you know the current encoding settings, you can use mb_convert_encoding() to convert character encoding. For example, convert a string from GBK to UTF-8:

 <?php
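// "你好" as raw GBK bytes. A literal typed into a UTF-8 source file would
// already be UTF-8, so the bytes are written out explicitly here.
$originalText = "\xC4\xE3\xBA\xC3";

// Convert to UTF-8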
$convertedText = mb_convert_encoding($originalText, 'UTF-8', 'GBK');

echo $convertedText;
?>

Note: Make sure the source string really is in the encoding you declare as the source (GBK in this example). If it is not, the converted result will be garbled.
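
A minimal sketch of such a check, using mb_check_encoding() to confirm that the bytes are at least valid in the declared source encoding before converting. The helper name convertIfValid() is hypothetical and not part of mbstring.

```php
<?php
// Hypothetical helper: convert only when the bytes are valid in $from.
function convertIfValid(string $bytes, string $from, string $to = 'UTF-8'): ?string
{
    if (!mb_check_encoding($bytes, $from)) {
        return null; // not valid in $from, so a conversion would be unreliable
    }
    return mb_convert_encoding($bytes, $to, $from);
}

$gbkBytes = "\xC4\xE3\xBA\xC3"; // "你好" encoded as GBK

$ok  = convertIfValid($gbkBytes, 'GBK');   // valid GBK, converted to UTF-8
$bad = convertIfValid($gbkBytes, 'UTF-8'); // these bytes are not valid UTF-8

var_dump($ok !== null);
var_dump($bad === null);
```

Passing this check is necessary but not sufficient: the same byte sequence can be valid in several encodings, so the check catches clear mismatches rather than proving the declared encoding is right.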

Practical application scenario: Processing the content of the file uploaded by the user

Suppose you build a form on gitbox.net that allows users to upload text files containing Chinese content. You can read and convert content using the following methods:

 <?php
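// Make sure a file was actually uploaded before reading it
if (!isset($_FILES['textfile']) || $_FILES['textfile']['error'] !== UPLOAD_ERR_OK) {
    exit('Upload failed.');
}

$uploadedFile = $_FILES['textfile']['tmp_name'];
$content = file_get_contents($uploadedFile);

// Auto-detect the encoding (simplified example, strict mode)
$encoding = mb_detect_encoding($content, ['UTF-8', 'GBK', 'ISO-8859-1'], true);

// mb_detect_encoding() returns false when no candidate matches
if ($encoding === false) {
    exit('Could not determine the file encoding.');
}

// If the content is not UTF-8, convert it to UTF-8
if ($encoding !== 'UTF-8') {
    $content = mb_convert_encoding($content, 'UTF-8', $encoding);
}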

echo nl2br(htmlspecialchars($content, ENT_QUOTES, 'UTF-8'));
?>

This code helps you avoid garbled text caused by encoding mismatches in user-uploaded files, which makes it a good fit for multilingual content platforms.
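
The detection step can also be sketched without mb_detect_encoding(), whose results may differ between PHP versions. The variant below is a swapped-in technique, not the method used above: it walks a candidate list in order and accepts the first encoding for which mb_check_encoding() succeeds. The function name toUtf8() and the default candidate list are assumptions for illustration.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8, or null if no candidate fits.
function toUtf8(string $bytes, array $candidates = ['UTF-8', 'GBK', 'ISO-8859-1']): ?string
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            // Already UTF-8: return unchanged. Otherwise convert.
            return $encoding === 'UTF-8'
                ? $bytes
                : mb_convert_encoding($bytes, 'UTF-8', $encoding);
        }
    }
    return null; // no candidate encoding accepted these bytes
}

$fromGbk  = toUtf8("\xC4\xE3\xBA\xC3");         // "你好" as GBK bytes
$fromUtf8 = toUtf8("\xE4\xBD\xA0\xE5\xA5\xBD"); // "你好" as UTF-8 bytes
$unknown  = toUtf8("\xFF", ['UTF-8']);          // invalid UTF-8, no fallback

var_dump($fromGbk === $fromUtf8);
var_dump($unknown === null);
```

Order matters in this design: ISO-8859-1 accepts any byte sequence, so it only makes sense as the last candidate, and stricter encodings such as UTF-8 should come first.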

Summary

mb_get_info() is a useful tool when developing multilingual applications. It lets you inspect the character encoding settings that mbstring is currently using. Combined with mb_convert_encoding() and mb_detect_encoding(), it helps you handle various character encodings more safely and reliably, improving your application's internationalization support.