PHP's mbstring extension provides many practical functions for working with multibyte strings. Two of them, mb_get_info() and mb_detect_encoding(), are often confused by beginners. Both relate to multibyte character encodings, but they do completely different jobs. This article walks through how they differ and when to use each.
mb_get_info() returns the configuration of the current mbstring environment. It reports encoding settings such as the default internal encoding, the HTTP input and output encodings, and the current language setting.
mb_get_info(string $type = "all"): array|string|int|false|null
<?php
// Get all configuration information
$info = mb_get_info();
print_r($info);
// Get only the current internal encoding
$internalEncoding = mb_get_info("internal_encoding");
echo "The internal encoding is: " . $internalEncoding;
?>
The output might look like this (partial):
Array
(
[internal_encoding] => UTF-8
[http_output] => UTF-8
[http_input] => pass
[language] => neutral
...
)
If you only care about a single setting, for example to confirm that the internal encoding is UTF-8, pass its name as the argument instead of reading the whole array.
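As a minimal sketch of that single-setting check, the snippet below compares the value reported by mb_get_info() with "UTF-8". The variable names are illustrative only, and the comparison is case-insensitive on the assumption that an environment might report the name in a different case.

```php
<?php
// Ask mbstring for one setting instead of the whole configuration array.
$fromInfo = mb_get_info("internal_encoding");

// mb_internal_encoding() with no argument reports the same setting.
$fromGetter = mb_internal_encoding();

if (strcasecmp($fromInfo, "UTF-8") === 0) {
    echo "Internal encoding is UTF-8\n";
} else {
    echo "Internal encoding is {$fromInfo}, not UTF-8\n";
}
```

Either call answers the question; mb_get_info() is the more convenient one when you also want to look at neighboring settings.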
mb_detect_encoding() does something different: it guesses the encoding of a string. This is useful when you receive text from an unknown source, such as a file uploaded by a user, form data, or web content fetched by a crawler.
mb_detect_encoding(string $string, array|string|null $encodings = null, bool $strict = false): string|false
<?php
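// Suppose you fetched a web page
$content = file_get_contents("https://gitbox.net/page.html");
if ($content === false) {
    exit("Failed to fetch the page");
}
// Try to detect its encoding from a list of candidates
$encoding = mb_detect_encoding($content, ["UTF-8", "GBK", "ISO-8859-1"]);
// mb_detect_encoding() returns false when no candidate matches
echo "The detected encoding is: " . ($encoding === false ? "unknown" : $encoding);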
?>
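You can also enable strict mode by passing true as the third argument. In strict mode the function returns false when the string is not valid in any of the listed encodings, instead of returning the closest match: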
$encoding = mb_detect_encoding($content, ["UTF-8", "GBK"], true);
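To make the effect of strict mode concrete, here is a small sketch using two hand-built byte strings (both are made-up test data, not taken from a real page). The byte 0xFF can never occur in valid UTF-8, so the first string cannot match when UTF-8 is the only candidate.

```php
<?php
// "\xFF" is never valid in UTF-8, so this string is not valid UTF-8.
$broken = "abc\xFFdef";

// "héllo" encoded as UTF-8 (é is the two bytes C3 A9).
$valid = "h\xC3\xA9llo";

// Strict mode with a single candidate: either it matches or we get false.
$strictBroken = mb_detect_encoding($broken, ["UTF-8"], true);
$strictValid  = mb_detect_encoding($valid, ["UTF-8"], true);

var_dump($strictBroken); // bool(false)
var_dump($strictValid);  // string(5) "UTF-8"
```

Because the result can be false, compare it with === false before using it as an encoding name.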
Aspect | mb_get_info() | mb_detect_encoding() |
---|---|---|
Purpose | Reports mbstring's encoding configuration | Detects the likely encoding of a string |
Parameters | Optional setting name that selects what is returned | The string to inspect, plus an optional list of candidate encodings and a strict flag |
Return type | Array (all settings) or a single setting's value | Encoding name as a string, or false |
Use cases | Checking encoding settings, debugging | Identifying the encoding of text from an unknown source |
Depends on input content | No, it takes no string to inspect | Yes, a string must be provided |
Put more simply, each function answers a different question:
mb_get_info(): "PHP, how are you configured right now?"
mb_detect_encoding(): "What encoding is this string in?"
Mistake: assuming mb_get_info() can tell you a string's encoding:
It only reports PHP's current encoding settings. It never inspects any string you have on hand.
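The following sketch illustrates the point with a made-up ISO-8859-1 byte string. The setting reported by mb_get_info() is the same before and after the string is examined, while mb_detect_encoding() looks at the bytes themselves. The variable names are illustrative only.

```php
<?php
// "café" encoded as ISO-8859-1: the lone byte E9 is not valid UTF-8.
$latin1 = "caf\xE9";

// The configuration is independent of any data you are holding.
$before = mb_get_info("internal_encoding");

// Detection works on the actual bytes of the string.
$detected = mb_detect_encoding($latin1, ["UTF-8", "ISO-8859-1"], true);

$after = mb_get_info("internal_encoding");

echo "Configured internal encoding: {$before}\n";
echo "Detected encoding of the string: {$detected}\n"; // ISO-8859-1
```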
Mistake: detecting without specifying an encoding list:
When no list is passed, mb_detect_encoding() falls back to the current detection order (the one reported by mb_detect_order()), so only those encodings are considered. If you know the plausible candidates, pass them explicitly; this improves both accuracy and performance.
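A short sketch of the difference, again with a made-up ISO-8859-1 string: mb_detect_order() shows which candidates are used when no list is given, and the explicit call names the candidates directly. What the default call returns depends on how the environment is configured, so no expected value is shown for it.

```php
<?php
// The candidate list used when the $encodings argument is omitted.
$defaultOrder = mb_detect_order();
print_r($defaultOrder);

// "café" encoded as ISO-8859-1, which is invalid as UTF-8.
$bytes = "caf\xE9";

// Relies on the environment's detection order; the result may vary.
$byDefault = mb_detect_encoding($bytes);

// Names the plausible candidates explicitly and uses strict mode.
$explicit = mb_detect_encoding($bytes, ["UTF-8", "ISO-8859-1"], true);

var_dump($byDefault);
var_dump($explicit); // string(10) "ISO-8859-1"
```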
If you only want to know how PHP's encoding settings are currently configured, for example whether the internal encoding is UTF-8, use mb_get_info().
If you have text in an unknown encoding, such as an HTML fragment fetched from gitbox.net, use mb_detect_encoding() to determine whether it is UTF-8, GBK, or something else.
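Once the encoding is known, a common next step is to normalize the text. The helper below is a hypothetical sketch (toUtf8() is not part of mbstring) that detects in strict mode and then converts with mb_convert_encoding(). The default candidate list is an assumption chosen for the example; use the encodings you actually expect.

```php
<?php
/**
 * Hypothetical helper: convert $text to UTF-8 using the given candidates.
 * Returns null when no candidate matches in strict mode.
 */
function toUtf8(string $text, array $candidates = ["UTF-8", "ISO-8859-1"]): ?string
{
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        return null; // none of the candidates fits the bytes
    }

    $converted = mb_convert_encoding($text, "UTF-8", $detected);
    return $converted === false ? null : $converted;
}

// "café" in ISO-8859-1 becomes "café" in UTF-8 (é turns into C3 A9).
$converted = toUtf8("caf\xE9");
echo bin2hex($converted), "\n"; // 636166c3a9
```

Note that ISO-8859-1 accepts every byte sequence, so with the default candidates this sketch never returns null; a narrower list such as ["UTF-8"] does reject invalid input.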
The two functions complement each other, each with its own job. Once the distinction is clear, both are easy to use correctly.