
How to avoid garbled output when truncating strings with mb_strcut

gitbox 2025-05-27

1. Why can mb_strcut output appear garbled?

mb_strcut() differs from mb_substr(): it cuts in units of bytes, not characters. In a UTF-8 encoded string, a Chinese character usually takes 3 bytes. If a string is cut at a byte position that falls in the middle of a character, the result contains an incomplete character, which shows up as garbled or illegal characters in the output.

For example:

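 $str = '你好，世界';
$cut = mb_strcut($str, 0, 4, 'UTF-8');
echo $cut;

The character "你" occupies 3 bytes in UTF-8, so a 4-byte limit falls inside the second character "好". When the encoding argument matches the string's real encoding, mb_strcut() moves the cut back to the nearest character boundary and returns only "你". Garbled output appears when the encoding passed to the function (or the internal encoding it falls back to) does not match the data, or when a purely byte-based function such as substr() is used, because the fourth byte is then kept as a fragment of "好".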
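To see how the three functions treat the same limit, the following sketch compares them side by side. It assumes PHP 7 or later with the mbstring extension enabled and a source file saved as UTF-8; the variable names are illustrative only.

```php
<?php
$str = '你好，世界'; // 5 characters, 15 bytes in UTF-8

$raw     = substr($str, 0, 4);              // plain byte cut, ignores character boundaries
$aligned = mb_strcut($str, 0, 4, 'UTF-8');  // byte limit, aligned to whole characters
$chars   = mb_substr($str, 0, 4, 'UTF-8');  // limit counted in characters

var_dump(mb_check_encoding($raw, 'UTF-8')); // bool(false): the 4th byte is a fragment of 好
var_dump($aligned);                         // string(3) "你"
var_dump(strlen($chars));                   // int(12)
```

Only the substr() result fails the UTF-8 validity check, which is the kind of string that displays as garbled text.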


2. How to avoid garbled output?

1. Use mb_substr instead

If you don't specifically need byte-level control, use mb_substr(), which counts in characters rather than bytes and is better suited to multibyte strings:

 $str = '你好，世界';
$cut = mb_substr($str, 0, 2, 'UTF-8');
echo $cut; // Output: 你好

2. Cut on character boundaries

If you need a byte limit (for example, to fit a storage field with a maximum byte length), make sure the cut always lands on a character boundary. One way is to take characters one at a time with mb_substr(), measure each one with strlen(), and stop before the running total exceeds the byte limit.

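 function safe_mb_strcut($string, $start, $length, $encoding = 'UTF-8') {
    // $start is treated as a character offset; $length is the maximum number of bytes to return.
    $substr = '';
    $i = $start;
    $byteCount = 0;
    $charCount = mb_strlen($string, $encoding); // computed once instead of on every iteration

    while ($i < $charCount) {
        $char = mb_substr($string, $i, 1, $encoding);
        $charLen = strlen($char); // size of this character in bytes
        if ($byteCount + $charLen > $length) {
            break; // adding this character would exceed the byte limit
        }
        $substr .= $char;
        $byteCount += $charLen;
        $i++;
    }
    return $substr;
}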

$str = '你好，世界';
$cut = safe_mb_strcut($str, 0, 6); // Limit the result to 6 bytes
echo $cut; // Output: 你好
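The same idea can be written more compactly by splitting the string into characters once and accumulating them until the byte budget runs out. This is a minimal sketch, assuming PHP 7.4 or later (for mb_str_split()) and the mbstring extension; the helper name truncate_to_bytes is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: keep whole characters until the byte budget is used up.
function truncate_to_bytes(string $string, int $maxBytes, string $encoding = 'UTF-8'): string
{
    $result = '';
    $bytes = 0;
    // Split into single characters once, instead of calling mb_substr() per character.
    foreach (mb_str_split($string, 1, $encoding) as $char) {
        $charBytes = strlen($char); // byte size of this character
        if ($bytes + $charBytes > $maxBytes) {
            break; // the next character would not fit
        }
        $result .= $char;
        $bytes += $charBytes;
    }
    return $result;
}

echo truncate_to_bytes('你好，世界', 7), "\n"; // 你好
```

With a 7-byte budget, two 3-byte characters fit and the third would overflow, so the result stops after 6 bytes rather than splitting a character.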

3. Set the correct internal encoding

When using the mb_* functions, always set or confirm that the internal character encoding is the one you expect (such as UTF-8), because it is the default whenever no encoding argument is passed. You can set it globally as follows:

 mb_internal_encoding('UTF-8');

While debugging, you can also check which encoding a string appears to use (the result is a best guess, not a guarantee):

 echo mb_detect_encoding($str); // Check string encoding
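The effect of a mismatched encoding can be shown directly. The sketch below, which assumes the mbstring extension and a UTF-8 source file, cuts the same UTF-8 string twice: once with the correct encoding and once with ISO-8859-1, a single-byte encoding chosen here only for illustration.

```php
<?php
$str = '你好，世界';

// Correct encoding: the cut backs up to a character boundary.
$ok = mb_strcut($str, 0, 4, 'UTF-8');
// Wrong encoding: every byte counts as one character, so the cut lands mid-character.
$bad = mb_strcut($str, 0, 4, 'ISO-8859-1');

var_dump(mb_strlen($str, 'UTF-8'));         // int(5)
var_dump(mb_strlen($str, 'ISO-8859-1'));    // int(15)
var_dump(mb_check_encoding($ok, 'UTF-8'));  // bool(true)
var_dump(mb_check_encoding($bad, 'UTF-8')); // bool(false)
```

mb_check_encoding() is a convenient way to confirm that a truncated string is still valid in the encoding you intend to output.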

4. Practical scenario example: Truncating a description that contains a URL

Suppose you want to truncate a description that contains a URL. The function above cuts only on character boundaries, so the output never contains a broken character. For example:

 $str = 'For more information, please visit:https://gitbox.net/docs/php-guide.html';
$cut = safe_mb_strcut($str, 0, 40);
echo $cut;

Note that a byte limit can still fall inside the URL itself: here the 40-byte limit cuts the link off right after "https". If the URL must stay intact, find where it starts and cut before it. Clean truncation of this kind is especially useful for social platform summaries, email previews, and similar scenarios.
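Character-boundary truncation alone does not know where a URL begins or ends. If the link must survive whole or not at all, one option is to locate URLs first and shorten the byte limit whenever it would fall inside one. The following is a minimal sketch under that assumption; the helper name truncate_keep_url and the simple URL pattern are hypothetical, and real-world URL matching may need a stricter expression.

```php
<?php
// Hypothetical helper: truncate to $maxBytes, but never cut through a URL.
function truncate_keep_url(string $text, int $maxBytes): string
{
    if (strlen($text) <= $maxBytes) {
        return $text; // already short enough
    }
    // PREG_OFFSET_CAPTURE reports byte offsets, which matches the byte budget.
    if (preg_match_all('~https?://\S+~u', $text, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$url, $offset]) {
            $end = $offset + strlen($url);
            if ($offset < $maxBytes && $maxBytes < $end) {
                $maxBytes = $offset; // drop the URL rather than cut it
                break;
            }
        }
    }
    // mb_strcut() keeps the result on a character boundary.
    return mb_strcut($text, 0, $maxBytes, 'UTF-8');
}

$str = 'For more information, please visit:https://gitbox.net/docs/php-guide.html';
echo truncate_keep_url($str, 40), "\n"; // For more information, please visit:
```

Dropping the URL entirely is only one possible policy; depending on the use case, it may be preferable to keep the URL and shorten the text before it.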