What errors does mb_strcut commonly run into when processing strings from a database, and how can you avoid them?

gitbox 2025-05-29

mb_strcut is a practical function for handling multibyte strings in PHP. It cuts a string by byte length without splitting a multibyte character, which avoids the garbled output that plain substr can produce. However, when the strings come from a database, especially with multilingual content and encoding conversion involved, mb_strcut is still easy to misuse. This article looks at the common errors and how to avoid them.

1. Introduction to mb_strcut

mb_strcut extracts a substring measured in bytes, not characters. It is designed for multibyte encodings: the cut positions are adjusted to character boundaries, so the result never contains part of a character.

The function prototype is as follows:

 mb_strcut(string $string, int $start, ?int $length = null, ?string $encoding = null): string

$string : The input string.
$start : The starting position, in bytes.
$length : The number of bytes to extract (optional; when omitted or null, everything up to the end of the string is returned).
$encoding : The character encoding; defaults to the internal encoding.
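To make the byte-versus-character distinction concrete, here is a small sketch comparing substr, mb_strcut, and mb_substr on the same input. The sample string is an assumption chosen for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// "日本語" is 3 characters and 9 bytes in UTF-8 (3 bytes per character).
$s = "日本語";

$bytes = substr($s, 0, 4);              // raw byte cut: 4 bytes, ends mid-character
$safe  = mb_strcut($s, 0, 4, 'UTF-8');  // byte cut, rounded down to a character boundary
$chars = mb_substr($s, 0, 4, 'UTF-8');  // character cut: up to 4 characters

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false): invalid UTF-8
var_dump($safe);                              // string(3) "日"
var_dump($chars);                             // string(9) "日本語"
```

Only substr returns broken UTF-8 here; mb_strcut returns fewer bytes than requested rather than a partial character.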

2. Common errors and causes

1. Mismatched encodings cause truncation errors

If the encoding of the string stored in the database differs from the encoding mb_strcut uses, the cut lands in the wrong place. For example, the database column is UTF-8, but the program relies on the internal encoding and that has been set to something else (such as ISO-8859-1 on a legacy or customized configuration; since PHP 5.6 the default follows default_charset, which is UTF-8). mb_strcut then computes character boundaries for the wrong encoding.

Error manifestations:
The result is garbled or contains incomplete characters. In PHP 8, passing an encoding name that mbstring does not recognize throws a ValueError.

How to avoid it:

Pass $encoding explicitly, for example:

 mb_strcut($string, 0, 10, 'UTF-8');

Make sure the encoding of the database connection and of the query results matches the encoding used in the program. In MySQL you can run:

 SET NAMES 'utf8mb4';

Or set it when creating the PDO connection:

 new PDO('mysql:host=...;dbname=...', $user, $pass, [
    PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8mb4"
]);
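The effect of a mismatch can be reproduced without a database. The sketch below assumes a UTF-8 source file and a sample string chosen for illustration; it cuts the same UTF-8 bytes once with the wrong encoding and once with the right one, and uses mb_check_encoding as a guard.

```php
<?php
$s = "你好世界"; // 4 characters, 12 bytes in UTF-8

// Wrong: ISO-8859-1 is a single-byte encoding, so every byte counts as
// a character and the cut lands inside the second character.
$wrong = mb_strcut($s, 0, 4, 'ISO-8859-1');

// Right: the encoding matches the data, so the cut stops after "你".
$right = mb_strcut($s, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($wrong, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($right, 'UTF-8')); // bool(true)
var_dump($right);                             // string(3) "你"
```

Checking values with mb_check_encoding before cutting is one way to catch data that is not in the encoding you expect.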

2. Miscalculated starting byte position

The $start parameter of mb_strcut is a byte offset, not a character offset. Passing a character offset where a byte offset is expected shifts the cut position.

Error manifestations:
The result starts somewhere other than expected, so characters may be missing or unexpected ones may appear. If $start lands in the middle of a multibyte character, mb_strcut moves it back to the first byte of that character.

How to avoid it:

Remember that mb_strpos returns a character offset. To get a byte offset, use strpos, or convert the character offset as shown in the example.
If you start from a character offset, convert it to a byte offset first, using the same encoding throughout.

Example:

 $pos_char = 3; // character offset: skip the first 3 characters
$pos_byte = strlen(mb_substr($string, 0, $pos_char, 'UTF-8'));
$result = mb_strcut($string, $pos_byte, 10, 'UTF-8');
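The difference between the two kinds of offset is easy to see with a search. In this sketch (sample string assumed for illustration, source file assumed to be UTF-8), strpos yields a byte offset that mb_strcut can use directly, while the character offset from mb_strpos sends the cut to the wrong place.

```php
<?php
$s = "数据库 mb_strcut"; // three 3-byte characters, a space, then ASCII

$charPos = mb_strpos($s, 'mb_', 0, 'UTF-8'); // character offset
$bytePos = strpos($s, 'mb_');                // byte offset

var_dump($charPos); // int(4)
var_dump($bytePos); // int(10)

// Correct: a byte offset goes into mb_strcut.
$ok = mb_strcut($s, $bytePos, 9, 'UTF-8');
var_dump($ok); // string(9) "mb_strcut"

// Wrong: byte 4 is inside "据", so the start is moved back to the
// beginning of that character and the result starts with "据".
$shifted = mb_strcut($s, $charPos, 9, 'UTF-8');
```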

3. Badly chosen length gives unexpected results

$length is a byte length. If the cut would end in the middle of a multibyte character, mb_strcut stops at the preceding character boundary, so the result can be shorter than $length bytes but never contains a partial character. A length chosen with characters in mind therefore returns fewer characters than intended.

How to avoid it:

Work out the byte length from the actual requirement, such as a byte-limited field or buffer.
To cut a fixed number of characters, use mb_substr instead.
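The two approaches can be put side by side. In the sketch below, cut_to_bytes is a hypothetical helper name introduced only for this illustration, and the sample string and limits are assumptions; the source file is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: return at most $maxBytes bytes of valid UTF-8.
function cut_to_bytes(string $text, int $maxBytes): string
{
    return mb_strcut($text, 0, $maxBytes, 'UTF-8');
}

$title = "多语言内容"; // 5 characters, 15 bytes in UTF-8

$byBytes = cut_to_bytes($title, 8);          // byte budget of 8
$byChars = mb_substr($title, 0, 4, 'UTF-8'); // exactly 4 characters

var_dump(strlen($byBytes));             // int(6): two whole characters fit in 8 bytes
var_dump(mb_strlen($byChars, 'UTF-8')); // int(4)
var_dump(strlen($byChars));             // int(12)
```

Use the byte-based cut when the limit is expressed in bytes, and the character-based cut when it is expressed in characters.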

3. Practical Examples

Assuming that a Chinese string is stored in the database, we want to intercept the first 10 bytes.

 <?php
// Get strings from database
$string = "Hello，Welcomemb_strcutfunction！";

// Specify the encoding
$encoding = 'UTF-8';

// Before intercept10Bytes
$result = mb_strcut($string, 0, 10, $encoding);

echo $result;
?>

In this example, mb_strcut will ensure that half of the Chinese character will not be truncated and that the output string will not be garbled.

4. Summary

When using mb_strcut, always pass the encoding explicitly and keep it consistent with the database encoding.
Remember that $start and $length are both measured in bytes, not characters, so calculate them carefully.
Configure the database connection character set to match, so that encoding mismatches do not cause errors.
To cut by characters, use mb_substr; mb_strcut is the better fit when the limit is expressed in bytes.

Applying these points avoids the common mb_strcut errors in database string processing and keeps the output a valid multibyte string.

 <?php
// Example: safely cut a multibyte string fetched from the database

// Assumes $pdo is an open PDO connection whose character set is utf8mb4

// Read the string from the database
$query = "SELECT content FROM articles WHERE id = 1";
$result = $pdo->query($query);
$row = $result->fetch(PDO::FETCH_ASSOC);

// fetch() returns false when no row matches
$content = $row !== false ? (string) $row['content'] : '';
$encoding = 'UTF-8';

// Cut the first 50 bytes without splitting a multibyte character
$snippet = mb_strcut($content, 0, 50, $encoding);

echo $snippet;
?>

If you want to learn more about multibyte string processing, you can visit:
https://gitbox.net/php/manual/zh/function.mb-strcut.php