When working with multibyte strings in PHP, mb_strcut is the usual choice for cutting a string by byte count, which makes it a good fit for multibyte encodings such as UTF-8. Even so, many developers run into encoding problems with it: the result comes out garbled, or appears to end in half a character. This article explains how to use mb_strcut correctly to avoid these errors and shares some practical tips.
mb_strcut is one of PHP's multibyte string functions; it extracts a substring by byte length. Unlike mb_substr, which counts characters, mb_strcut counts bytes. This gives precise control over the byte size of the result: if a cut position falls inside a multibyte character, mb_strcut moves it back to the start of that character, whereas plain substr would split the character and produce a malformed byte sequence.
Function prototype:
mb_strcut(string $string, int $start, ?int $length = null, ?string $encoding = null): string
$string : The input string.
$start : Start position, in bytes.
$length : Maximum number of bytes to extract (optional; null means up to the end of the string).
$encoding : Character encoding of the string; the internal encoding is used when omitted.
Garbled output usually means that the cut did not land on a character boundary. mb_strcut adjusts $start and $length to whole characters only according to the encoding it is told to use; if that encoding does not match the actual bytes of the string, or if plain substr is used instead, the cut can fall in the middle of a multibyte character and leave incomplete bytes behind. In UTF-8, most Chinese characters occupy 3 bytes, so a byte offset such as 4 falls inside the second character. Both the start and the end of the cut must sit on character boundaries.
When calling mb_strcut, specify the string's encoding explicitly. This is the first step toward avoiding problems caused by a default encoding that differs from the data.
$encoding = 'UTF-8';
$result = mb_strcut($str, $start, $length, $encoding);
When the limit is expressed in characters rather than bytes, check the character length with mb_strlen first so that the limit stays in range, then cut with mb_substr, which never splits a character.
$length = 10;
if (mb_strlen($str, $encoding) > $length) {
    $result = mb_substr($str, 0, $length, $encoding);
} else {
    $result = $str;
}
If you must cut by byte count, wrap mb_strcut in a helper that verifies the result with mb_check_encoding and shortens the byte length until the substring is valid.
function safe_mb_strcut(string $str, int $start, int $length, string $encoding = 'UTF-8'): string {
    $substr = mb_strcut($str, $start, $length, $encoding);
    // Check that the cut produced a valid string in the target encoding.
    if (mb_check_encoding($substr, $encoding)) {
        return $substr;
    }
    // If it is incomplete, reduce the byte length one step at a time until it is valid.
    while ($length > 0 && !mb_check_encoding($substr, $encoding)) {
        $length--;
        $substr = mb_strcut($str, $start, $length, $encoding);
    }
    return $substr;
}
// Sample string mixing Chinese and English (byte 15 falls inside a Chinese character).
$str = "PHP 测试字符串, includes Chinese and English";
$start = 0;
$length = 15; // Cut by bytes: the result is at most 15 bytes long
$result = safe_mb_strcut($str, $start, $length, 'UTF-8');
echo $result;
Because the result is validated before it is returned, the output is never garbled by a cut that lands inside a character.
mb_strcut cuts multibyte strings by bytes. Given the correct encoding, it keeps cuts on character boundaries, so the result may be shorter than the requested length.
Specify the encoding parameter explicitly so that the function behaves consistently.
Use mb_check_encoding to verify that the extracted result is valid in the target encoding.
When the limit is a number of characters, combining mb_strlen and mb_substr is the safer choice.
With these techniques, encoding errors when cutting multibyte strings in PHP can be avoided, which keeps text processing accurate and the displayed output readable.
To learn more about PHP string processing, see the following resource:
$url = "https://gitbox.net/php/manual/zh/function.mb-strcut.php";