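mb_strcut() differs from mb_substr(): its start and length arguments are counted in bytes, not characters. In UTF-8, a Chinese character usually takes 3 bytes, so a byte position can easily fall in the middle of a character. When the encoding argument matches the data, mb_strcut() moves such a cut back to the start of that character, so you may get fewer bytes than you asked for. When the encoding is wrong, or the cut is made with plain substr(), the result can contain an incomplete character that shows up as garbled or illegal output.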
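For example:
$str = '你好,世界';
$cut = mb_strcut($str, 0, 4, 'UTF-8');
echo $cut; // Output: 你
Each Chinese character here occupies 3 bytes in UTF-8, so a 4-byte limit ends one byte into the second character "好". With the correct encoding, mb_strcut() drops that partial character and returns only "你", which may be shorter than you expect. The same cut made with substr(), or with a mismatched encoding, leaves a broken byte sequence that displays as garbled text.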
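To make the byte/character distinction concrete, the sketch below compares a plain substr() cut with mb_strcut() and mb_substr() on a short Chinese string. The sample string (written with a full-width comma) and the variable names are illustrative assumptions, and the file is assumed to be saved as UTF-8.

```php
<?php
$str = '你好,世界'; // 5 characters, 15 bytes in UTF-8

$byteLen = strlen($str);             // counts bytes
$charLen = mb_strlen($str, 'UTF-8'); // counts characters

$raw   = substr($str, 0, 4);             // plain byte cut: splits the second character
$bytes = mb_strcut($str, 0, 4, 'UTF-8'); // byte limit, aligned to a character boundary
$chars = mb_substr($str, 0, 4, 'UTF-8'); // limit counted in characters

var_dump($byteLen, $charLen);               // 15 and 5
var_dump(mb_check_encoding($raw, 'UTF-8')); // false: $raw ends in a partial character
var_dump($bytes);                           // "你"
var_dump($chars);                           // "你好,世"
```

The same numeric argument (4) therefore means three different things depending on the function, which is usually the root cause of unexpected truncation results.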
If you don't specifically need byte-level control, use mb_substr() instead. It counts characters rather than bytes, which makes it better suited to multibyte strings:
$str = '你好,世界';
$cut = mb_substr($str, 0, 2, 'UTF-8');
echo $cut; // Output: 你好
If you do need a byte limit (for example, to fit a storage field whose length is measured in bytes), one way to keep full control is to walk the string character by character: take each character with mb_substr(), measure it with strlen(), and stop before the running byte total would exceed the limit.
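function safe_mb_strcut($string, $start, $length, $encoding = 'UTF-8') {
    $substr = '';
    $byteCount = 0;
    // Count the characters once instead of on every loop iteration.
    $total = mb_strlen($string, $encoding);
    // $start is treated here as a character offset; $length is a byte limit.
    for ($i = max(0, (int) $start); $i < $total; $i++) {
        $char = mb_substr($string, $i, 1, $encoding);
        $charLen = strlen($char); // bytes used by this character
        if ($byteCount + $charLen > $length) {
            break; // adding this character would exceed the byte limit
        }
        $substr .= $char;
        $byteCount += $charLen;
    }
    return $substr;
}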
$str = '你好,世界';
$cut = safe_mb_strcut($str, 0, 6); // Byte limit is 6
echo $cut; // Output: 你好
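For the storage-limit case, a shorter variant can lean on mb_strcut() itself, since it already aligns the cut to a character boundary when the encoding is correct. The following is only a sketch: the helper name fit_bytes() and the '...' suffix are hypothetical choices, not part of any library.

```php
<?php
// Hypothetical helper: fit $text into $maxBytes, appending $suffix if it was shortened.
function fit_bytes(string $text, int $maxBytes, string $suffix = '...', string $encoding = 'UTF-8'): string
{
    if (strlen($text) <= $maxBytes) {
        return $text; // already within the byte limit
    }
    if ($maxBytes <= strlen($suffix)) {
        // No room for the suffix: return a plain character-aligned cut.
        return mb_strcut($text, 0, max(0, $maxBytes), $encoding);
    }
    // Reserve room for the suffix, then let mb_strcut() align the cut.
    $room = $maxBytes - strlen($suffix);
    return mb_strcut($text, 0, $room, $encoding) . $suffix;
}

echo fit_bytes('你好,世界', 10); // 你好...
```

The result can be shorter than the limit (9 bytes here), because a character that does not fit entirely is dropped rather than split.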
When using the mb_* functions, make sure the internal character encoding is the one you expect (such as UTF-8), because it is used whenever you omit the encoding argument. You can set it globally as follows:
mb_internal_encoding('UTF-8');
While debugging, you can also check which encoding a string appears to use:
echo mb_detect_encoding($str); // Best-guess encoding of the string
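Because mb_detect_encoding() only makes a best guess, it can help to validate a string against the encoding you expect with mb_check_encoding(). The sketch below assumes UTF-8 is the target encoding; the candidate list passed to mb_detect_encoding() is an illustrative assumption.

```php
<?php
mb_internal_encoding('UTF-8');

$valid  = '你好';
$broken = substr($valid, 0, 4); // deliberately cut in the middle of a character

var_dump(mb_check_encoding($valid, 'UTF-8'));  // true
var_dump(mb_check_encoding($broken, 'UTF-8')); // false

// Strict detection against an explicit list of candidate encodings.
$detected = mb_detect_encoding($valid, ['ASCII', 'UTF-8'], true);
var_dump($detected); // "UTF-8"
```

A failed mb_check_encoding() call before truncation is a strong hint that the input was already damaged or is in a different encoding than assumed.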
Suppose you want to cut part of a description that contains a URL. The function above guarantees that the result never ends in a partial character:
$str = 'For more information, please visit:https://gitbox.net/docs/php-guide.html';
$cut = safe_mb_strcut($str, 0, 40);
echo $cut; // Output: For more information, please visit:https
The output contains no garbled characters, which suits social platform summaries, email previews, and similar scenarios. Note, however, that a byte limit knows nothing about URLs: here the 40-byte limit falls inside the link, so if the URL must stay usable, raise the limit or cut before the link starts.
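A character-safe cut does not by itself know where a URL begins or ends. One possible approach, sketched here, is to cut on the byte limit first and then back up to the start of any URL that the cut would split. The helper name cut_keep_urls() and the deliberately simple URL pattern are assumptions for illustration only.

```php
<?php
// Hypothetical helper: truncate to $maxBytes without splitting a character or a URL.
function cut_keep_urls(string $text, int $maxBytes, string $encoding = 'UTF-8'): string
{
    // mb_strcut() aligns the cut to a character boundary for the given encoding.
    $cut = mb_strcut($text, 0, $maxBytes, $encoding);
    $end = strlen($cut);

    // Locate URLs with their byte offsets; the pattern is intentionally simplistic.
    if (preg_match_all('~https?://\S+~', $text, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$url, $offset]) {
            // The URL starts inside the kept part but ends beyond it: drop it entirely.
            if ($offset < $end && $offset + strlen($url) > $end) {
                return rtrim(substr($text, 0, $offset));
            }
        }
    }
    return $cut;
}

$str = 'For more information, please visit:https://gitbox.net/docs/php-guide.html';
echo cut_keep_urls($str, 40), "\n";  // For more information, please visit:
echo cut_keep_urls($str, 100), "\n"; // the full string, URL intact
```

Dropping the whole link is one design choice among several; depending on the use case, replacing it with a placeholder or raising the limit may be preferable.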