When working with strings in PHP, it is important to make sure they are in the expected encoding, especially when handling multilingual content. UTF-8 is currently one of the most widely used character encodings and covers most of the world's languages. This article shows how to determine whether a string is UTF-8 encoded by using the mb_get_info function together with other functions from the mbstring extension.
First of all, mb_get_info() itself only returns the configuration of the mbstring extension, such as the current internal encoding and the detection order. It does not detect the encoding of a string, but it can tell us whether UTF-8 is part of the configured detection order.
The detection itself is usually done with the mb_detect_encoding() function.
To view the current mbstring configuration, you can write it like this:
<?php
// Print the current mbstring configuration
print_r(mb_get_info());
?>
The returned array includes, among others, the following keys:
internal_encoding
http_input
http_output
language
encoding_translation
detect_order
substitute_character
If detect_order (returned as an array of encoding names) contains UTF-8, mb_detect_encoding() will consider UTF-8 as a candidate when it is called without an explicit encoding list.
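As a minimal sketch of that check, the snippet below reads the detection order and tests whether UTF-8 is in it. The helper name detect_order_has_utf8() is hypothetical and only used for illustration; mb_detect_order() and mb_get_info() are the mbstring calls doing the work.

```php
<?php
// Hypothetical helper: reports whether UTF-8 is in the configured detect order.
function detect_order_has_utf8(): bool
{
    // Called without arguments, mb_detect_order() returns the current order as an array.
    $order = mb_detect_order();
    return in_array('UTF-8', $order, true);
}

// mb_get_info() also accepts a single key and returns only that entry.
$order = mb_get_info('detect_order');
echo 'detect_order: ', implode(', ', (array) $order), PHP_EOL;

echo detect_order_has_utf8()
    ? 'UTF-8 is in the detect order'
    : 'UTF-8 is not in the detect order', PHP_EOL;
```

The output depends on the mbstring settings of the environment the script runs in.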
Combined with mb_detect_encoding(), the check can be written like this:
<?php
function is_utf8($string) {
    // Optionally make sure UTF-8 is part of detect_order.
    // mb_get_info() returns detect_order as an array of encoding names.
    $info = mb_get_info();
    if (!in_array('UTF-8', (array) $info['detect_order'], true)) {
        // Set the detection order manually
        mb_detect_order(['UTF-8', 'ISO-8859-1', 'ASCII']);
    }
    // Strict detection with UTF-8 as the only candidate
    return mb_detect_encoding($string, 'UTF-8', true) === 'UTF-8';
}

// Example
$text = "This is a test";
if (is_utf8($text)) {
    echo "The string is UTF-8 encoded";
} else {
    echo "The string is not UTF-8 encoded";
}
?>
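How this code works:
First, read the current detection order from mb_get_info();
If UTF-8 is missing, add it with mb_detect_order(). Note that this changes the order for the rest of the request, not only for this call;
Call mb_detect_encoding() with 'UTF-8' as the candidate encoding and true as the third parameter to enable strict detection. Because the candidate list is passed explicitly, the result of this call does not depend on detect_order;
Finally, check whether the return value is 'UTF-8'.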
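To see how strict detection behaves, the sketch below runs it against a few sample inputs. The helper name looks_like_utf8() and the sample strings are assumptions made for illustration. It also cross-checks each result with mb_check_encoding(), another mbstring function that validates a string against a given encoding.

```php
<?php
// Hypothetical helper: strict UTF-8 check that leaves global settings untouched.
function looks_like_utf8(string $bytes): bool
{
    // UTF-8 is the only candidate; true enables strict detection.
    return mb_detect_encoding($bytes, 'UTF-8', true) === 'UTF-8';
}

$samples = [
    'ascii'        => 'This is a test',
    'multibyte'    => "Gr\u{00FC}\u{00DF}e, \u{4E16}\u{754C}", // valid UTF-8 text
    'latin1 bytes' => "Gr\xFC\xDFe",                            // ISO-8859-1 bytes, invalid as UTF-8
];

foreach ($samples as $label => $bytes) {
    printf(
        "%-12s detect: %-5s check: %s\n",
        $label,
        looks_like_utf8($bytes) ? 'yes' : 'no',
        mb_check_encoding($bytes, 'UTF-8') ? 'yes' : 'no'
    );
}
```

The pure ASCII sample passes because every ASCII string is also valid UTF-8, while the ISO-8859-1 bytes are rejected because they do not form valid UTF-8 sequences.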
Make sure the mbstring extension is installed and enabled in PHP.
mb_detect_encoding() is not 100% accurate. Short strings and pure ASCII (plain English) strings are valid in many encodings at once, so a pure ASCII string is reported as UTF-8 even if it was produced in another ASCII-compatible encoding. For general applications the check is reliable enough.
If your application deals with encoding problems frequently, it is recommended to use a single encoding for input and output, and to set the internal encoding explicitly with mb_internal_encoding('UTF-8').
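The notes above could be combined roughly as follows. This is only a sketch: the helper name to_utf8() is hypothetical, and ISO-8859-1 as the fallback source encoding is an assumption that has to match where your data actually comes from.

```php
<?php
// Fail early if mbstring is not available.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring extension is not available');
}

// Make UTF-8 the encoding that mb_* functions assume by default.
mb_internal_encoding('UTF-8');

// Hypothetical helper: returns the input unchanged if it is valid UTF-8,
// otherwise converts it from an assumed fallback encoding.
function to_utf8(string $input, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

// ISO-8859-1 bytes for "Grüße" are converted to UTF-8 before output.
echo to_utf8("Gr\xFC\xDFe"), PHP_EOL;
```

Converting at the input boundary means the rest of the application only has to deal with UTF-8.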
If you want to see the full example, you can visit: https://gitbox.net/php/utf8-check-demo