In development, we often need to count the number of Chinese characters in a string. PHP's mbstring extension provides mb_strlen, which counts characters rather than bytes and therefore returns the correct length for strings containing Chinese text. Here is a simple example:
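A minimal sketch of such an example follows. It assumes the mbstring extension is loaded and the script file is saved as UTF-8. The sample text in `$string` is illustrative.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$string = '统计中文字数';

// mb_strlen counts characters in the given encoding, not bytes.
$length = mb_strlen($string, 'utf-8');

echo $length, PHP_EOL; // 6
```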
In the code above, we define a string $string containing Chinese characters and use mb_strlen to get its length in characters. Note that the encoding argument is set to 'utf-8' so that each multibyte Chinese character is counted as a single character.
Counting with strlen and with mb_strlen often gives different results. For example, the string “PHP实时统计中文字数” contains 3 English letters and 8 Chinese characters (11 characters in total), but when it is UTF-8 encoded, strlen returns 27 for it.
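The following sketch compares the two functions on that string. It assumes the script is saved as UTF-8 and mbstring is available.

```php
<?php
// Assumes a UTF-8 source file and the mbstring extension.
$text = 'PHP实时统计中文字数';

$bytes = strlen($text);             // 3 ASCII bytes + 8 Chinese chars x 3 bytes
$chars = mb_strlen($text, 'UTF-8'); // 3 letters + 8 Chinese characters

echo "strlen: $bytes", PHP_EOL;     // strlen: 27
echo "mb_strlen: $chars", PHP_EOL;  // mb_strlen: 11
```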
As shown in the code above, strlen counts bytes rather than characters. In UTF-8, each English letter takes one byte while each Chinese character takes three, so the result reflects how the characters are encoded, not how many there are.
The key difference between strlen and mb_strlen is what they measure: strlen returns the length in bytes, while mb_strlen returns the number of characters.
As shown above, the result returned by strlen is 68, which represents the byte length of the string, not the number of characters.
Keep in mind that strlen does not distinguish Chinese from English characters; it simply counts bytes. In UTF-8, a Chinese character typically occupies three bytes while an English letter occupies one, so the character count and the byte count differ. You can verify this with the following code:
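The sketch below assumes a UTF-8 source file and the mbstring extension. Counting Chinese characters separately with `preg_match_all('/\p{Han}/u', ...)` is our own addition rather than part of the original text. The `/u` modifier makes PCRE treat the subject as UTF-8, and `\p{Han}` matches characters in the Han script.

```php
<?php
// Assumes a UTF-8 source file and the mbstring extension.
$text = 'PHP实时统计中文字数';

// Byte width of one ASCII letter vs. one Chinese character in UTF-8.
echo strlen('P'), PHP_EOL;  // 1
echo strlen('中'), PHP_EOL; // 3

// Count only Han (Chinese) characters; preg_match_all returns the match count.
$chinese = preg_match_all('/\p{Han}/u', $text);
$total   = mb_strlen($text, 'UTF-8');
$other   = $total - $chinese;

echo "Chinese: $chinese, other: $other, bytes: ", strlen($text), PHP_EOL;
// Chinese: 8, other: 3, bytes: 27
```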
This code makes it clear that, for the same string, the byte count differs from the character count.
In summary, strlen counts bytes and gives a correct character count only for single-byte text such as plain English (ASCII). mb_strlen counts characters in a given encoding and gives an accurate count for Chinese text. Understanding the difference between these two functions is crucial when working with strings that mix languages.