How to determine whether a string is UTF-8 encoded using PHP's mb_get_info function?

gitbox 2025-05-29

When working with strings in PHP, it is important to make sure they are in the expected encoding, especially when handling multilingual content. UTF-8 is one of the most widely used character encodings and can represent text in virtually every language. This article shows how to determine whether a string is UTF-8 encoded by using the mb_get_info function together with other functions from the mbstring extension.

Understand the mb_get_info function

First of all, mb_get_info() returns the configuration of the mbstring extension, such as the current internal encoding and the detection order. It does not detect the encoding of a string itself, but it does show whether UTF-8 is part of the configured detection order.
The detection itself is usually done with the mb_detect_encoding() function.

To view the current mbstring configuration, you can write it like this:

 <?php
// Print the current mbstring configuration
print_r(mb_get_info());
?>

The output information includes:

  • internal_encoding

  • http_input

  • http_output

  • language

  • encoding_translation

  • detect_order

  • substitute_character and so on.

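If detect_order contains UTF-8, a call to mb_detect_encoding() that relies on the default order will consider UTF-8 as a candidate. When the candidate encodings are passed explicitly as the second argument, detect_order is not consulted.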
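As a minimal sketch of that check, the snippet below reads the detect_order entry from mb_get_info() and tests whether UTF-8 is listed. The helper name detect_order_has_utf8() is hypothetical and not part of mbstring; the sketch assumes detect_order is reported as an array of encoding names and falls back to an empty list if the key is missing.

```php
<?php
// Hypothetical helper: reports whether UTF-8 is in the configured detection order.
function detect_order_has_utf8(): bool
{
    $info = mb_get_info();

    // detect_order is expected to be an array of encoding names;
    // fall back to an empty list if the key is absent.
    $order = (array) ($info['detect_order'] ?? []);

    // Compare case-insensitively in case the names are reported in another case.
    return in_array('UTF-8', array_map('strtoupper', $order), true);
}

echo detect_order_has_utf8()
    ? "UTF-8 is in the detection order\n"
    : "UTF-8 is not in the detection order\n";
```

Reading the value this way does not change any configuration, so it is safe to call at any point in a request.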

Determine whether the string is UTF-8

Combined with the mb_detect_encoding() function, the check can be written like this:

 <?php
function is_utf8($string) {
    // Optional: make sure the detection order includes UTF-8.
    // mb_get_info() reports detect_order as an array of encoding names.
    $info = mb_get_info();
    $order = (array) ($info['detect_order'] ?? []);
    if (!in_array('UTF-8', $order, true)) {
        // Set the detection order manually (stays in effect for the rest of the request)
        mb_detect_order(['UTF-8', 'ISO-8859-1', 'ASCII']);
    }

    // Strict detection against UTF-8 only; returns false for invalid byte sequences
    return mb_detect_encoding($string, 'UTF-8', true) === 'UTF-8';
}

// Example
$text = "This is a test";

if (is_utf8($text)) {
    echo "The string is UTF-8 encoded";
} else {
    echo "The string is not UTF-8 encoded";
}
?>

How this code works:

  • First, read the current detection order through mb_get_info();

  • If UTF-8 is not in it, call mb_detect_order() to change the order, which then applies for the rest of the request;

  • Call mb_detect_encoding() with true as the third parameter to enable strict detection;

  • Finally, check whether the returned result is 'UTF-8'.
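The effect of strict detection is easiest to see with inputs that differ only in their bytes. The sketch below is illustrative: check_utf8_strict() is a hypothetical wrapper and the sample strings are made up for this example. They are written as byte escapes so that the encoding of the source file itself does not matter.

```php
<?php
// Hypothetical wrapper around the strict check described above.
function check_utf8_strict(string $bytes): bool
{
    return mb_detect_encoding($bytes, 'UTF-8', true) === 'UTF-8';
}

$samples = [
    'ascii'  => 'This is a test',
    'utf8'   => "caf\xC3\xA9", // "café" encoded as UTF-8
    'latin1' => "caf\xE9",     // "café" encoded as ISO-8859-1; a lone 0xE9 byte is not valid UTF-8
];

foreach ($samples as $label => $bytes) {
    echo $label, ': ', check_utf8_strict($bytes) ? 'UTF-8' : 'not UTF-8', PHP_EOL;
}
```

The ASCII sample passes because every ASCII byte sequence is also a valid UTF-8 sequence.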

Points to note

  • Make sure the mbstring extension is installed and enabled in PHP.

  • mb_detect_encoding() is a heuristic and is not 100% accurate, especially for short strings. A pure ASCII string (plain English text, for example) is also valid UTF-8, so it is reported as UTF-8. For general applications the check is reliable enough.

  • If your application deals with encoding problems a lot, it is recommended to use one encoding for all input and output, and to set the internal encoding explicitly through mb_internal_encoding('UTF-8').
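The notes above could be combined into a small setup step, sketched below under a few assumptions. The article itself uses mb_detect_encoding(); this sketch swaps in mb_check_encoding(), another mbstring function, which validates a string against one named encoding instead of guessing among candidates. The helper name is_valid_utf8() is hypothetical.

```php
<?php
// Fail early if the mbstring extension is not available.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring extension is required.');
}

// Set the internal encoding once, early in the request.
mb_internal_encoding('UTF-8');

// Hypothetical helper: validate the bytes against UTF-8 rather than detect an encoding.
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // well-formed UTF-8 bytes
var_dump(is_valid_utf8("caf\xE9"));     // lone 0xE9 byte, as in ISO-8859-1
```

Validation answers a narrower question than detection: it only says whether the bytes are well-formed UTF-8, which is usually what an input check needs.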

Sample project hosting address

If you want to see the full example, you can visit: https://gitbox.net/php/utf8-check-demo