Error phenomenon: After cutting a string with mb_strcut, the output is garbled, or the result ends earlier than expected and characters are missing at the end.
Cause: mb_strcut cuts by bytes, not by characters. When the requested byte range ends in the middle of a multibyte character, mb_strcut drops the partial character if the encoding is correct, so the result is shorter than requested. If the encoding passed does not match the data, characters can be split and displayed as garbage.
<?php
$str = "你好,世界";
echo mb_strcut($str, 0, 5, "UTF-8");
// "你" and "好" are 3 bytes each in UTF-8, so a 5-byte limit ends inside "好";
// only "你" is returned, not the two characters you might expect
?>
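To make the byte-versus-character distinction concrete, the sketch below compares strlen, mb_strlen, substr, and mb_strcut on the same string. It assumes the source file is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8, so every character below is 3 bytes.
$str = "你好,世界";

// strlen() counts bytes, mb_strlen() counts characters.
$bytes = strlen($str);
$chars = mb_strlen($str, "UTF-8");

// substr() cuts at the raw byte offset and can leave a partial character.
$raw = substr($str, 0, 5);

// mb_strcut() is encoding-aware: it keeps only whole characters that fit in 5 bytes.
$cut = mb_strcut($str, 0, 5, "UTF-8");

var_dump($bytes, $chars);                   // int(15), int(5)
var_dump(mb_check_encoding($raw, "UTF-8")); // bool(false): broken byte sequence
var_dump(mb_check_encoding($cut, "UTF-8")); // bool(true)
echo $cut, PHP_EOL;                         // 你
```

If the output of a cut looks garbled, checking the result with mb_check_encoding as above is a quick way to tell a broken byte sequence from a display problem.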
Error phenomenon: The cut result is wrong, or the output looks abnormal.
Cause: When no encoding is passed, mb_strcut falls back to the internal encoding (the value returned by mb_internal_encoding()). If that does not match the actual encoding of the string, the cut positions are computed for the wrong encoding.
<?php
$str = "こんにちは";
echo mb_strcut($str, 0, 4); // No encoding specified; if the internal encoding is not UTF-8, the result may be wrong
?>
Error phenomenon: The function raises an error or behaves unexpectedly.
Cause: mb_strcut expects a string, an integer start position, and an optional integer length. A non-numeric start such as "a" raises a TypeError on PHP 8 and a warning on PHP 7. Negative values are not rejected; they are counted from the end of the string, which is rarely what unvalidated input intends.
<?php
$str = "Hello World";
echo mb_strcut($str, "a", 5); // The start position must be an integer; a non-numeric string fails
?>
Since mb_strcut cuts by bytes, choose a length that ends on a character boundary. Either work out the byte length of the characters you want and cut that many bytes, or use mb_substr to cut by character count instead.
<?php
$str = "你好,世界";
// Use mb_substr to cut by character, so no character is ever split
echo mb_substr($str, 0, 2, "UTF-8"); // Output: 你好
?>
If you have to use mb_strcut, make sure the number of bytes you cut ends exactly on a character boundary:
<?php
$str = "你好,世界";
$length = 6; // 3 bytes * 2 characters
echo mb_strcut($str, 0, $length, "UTF-8"); // Output: 你好
?>
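Hard-coding the byte count only works when every character has the same width. One way to derive it is sketched below, assuming a hypothetical helper byte_length_of_chars (not part of mbstring) that measures the first N characters in bytes.

```php
<?php
// Hypothetical helper (not part of mbstring): byte length of the first $chars characters.
function byte_length_of_chars(string $str, int $chars, string $encoding = "UTF-8"): int
{
    // mb_substr() cuts by character; strlen() then measures that prefix in bytes.
    return strlen(mb_substr($str, 0, max(0, $chars), $encoding));
}

$str = "你好,世界";
$length = byte_length_of_chars($str, 2);            // 6 for two 3-byte characters
echo mb_strcut($str, 0, $length, "UTF-8"), PHP_EOL; // 你好
```

If the character count is all you need, calling mb_substr directly is simpler; a helper like this is mainly useful when the byte count itself matters, for example to respect a storage limit.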
To avoid problems caused by a default encoding mismatch, always pass the encoding explicitly when calling mb_strcut. In most applications this is "UTF-8".
<?php
$str = "こんにちは";
echo mb_strcut($str, 0, 6, "UTF-8");
?>
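Passing an encoding only helps if the data really is in that encoding. A minimal sketch of a guard follows, assuming a hypothetical wrapper strcut_checked (not part of mbstring) that refuses to cut input that fails mb_check_encoding.

```php
<?php
// Hypothetical wrapper: cut only when the input is valid in the stated encoding.
function strcut_checked(string $str, int $start, int $length, string $encoding = "UTF-8"): ?string
{
    if (!mb_check_encoding($str, $encoding)) {
        return null; // the caller decides how to handle mis-encoded input
    }
    return mb_strcut($str, $start, $length, $encoding);
}

var_dump(strcut_checked("こんにちは", 0, 6)); // two 3-byte characters: こん
var_dump(strcut_checked("\xE3\x81", 0, 6));   // truncated UTF-8 sequence: NULL
```

Returning null rather than an empty string is a design choice in this sketch; it lets the caller distinguish invalid input from a legitimately empty result.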
Before calling mb_strcut, make sure the start position and length are non-negative integers. Convert and validate them with functions such as intval() or filter_var(), and clamp negative values, to avoid errors.
<?php
// intval() alone still lets negative numbers through, so clamp to zero with max()
$start = max(0, intval($_GET['start'] ?? 0));
$length = max(0, intval($_GET['length'] ?? 10));
$str = "Hello, world";
echo mb_strcut($str, $start, $length, "UTF-8");
?>
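The filter_var() route mentioned above can be sketched as follows. The helper name non_negative_int and the $input array (standing in for $_GET) are assumptions made for illustration; FILTER_VALIDATE_INT with a min_range option is the standard filter for this check.

```php
<?php
// Hypothetical helper: return a non-negative integer, or $default when validation fails.
function non_negative_int($value, int $default): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT, [
        "options" => ["min_range" => 0],
    ]);
    // filter_var() returns false when the value is not an integer within range.
    return $result === false ? $default : $result;
}

$input = ["start" => "abc", "length" => "6"]; // stands in for $_GET
$start = non_negative_int($input["start"] ?? null, 0);    // invalid, falls back to 0
$length = non_negative_int($input["length"] ?? null, 10); // valid, 6
echo mb_strcut("你好,世界", $start, $length, "UTF-8"), PHP_EOL; // 你好
```

Unlike intval(), which silently turns "abc" into 0 and keeps negative numbers, this approach rejects anything that is not a well-formed integer in range and makes the fallback explicit.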
<?php
function safe_mb_strcut(string $string, int $start, ?int $length = null, string $encoding = 'UTF-8'): string {
    // Clamp the start position and length to non-negative values
    $start = max(0, $start);
    if ($length !== null) {
        $length = max(0, $length);
    }
    // Byte length of the string as stored; mb_strcut offsets are measured in these bytes
    $byteLength = strlen($string);
    if ($start > $byteLength) {
        return '';
    }
    // Limit the length to the bytes remaining after $start
    if ($length === null || $start + $length > $byteLength) {
        $length = $byteLength - $start;
    }
    return mb_strcut($string, $start, $length, $encoding);
}
// Usage example
$str = "你好,GitBox用户!";
echo safe_mb_strcut($str, 0, 9, "UTF-8"); // Cuts the first 9 bytes, i.e. the first three 3-byte characters
?>
To summarize the analysis and examples above, the keys to using mb_strcut correctly are:
Explicitly specify the character encoding;
Ensure that the parameters have the correct type and valid values;
Choose byte lengths that end on character boundaries, or use mb_substr to cut by character instead.
With these points in mind, mb_strcut handles multibyte strings more reliably and the common mistakes above can be avoided.