In PHP, when you work with multibyte strings (such as Chinese, Japanese, or Korean text), ordinary byte-oriented string functions such as substr can cut a character in half and produce garbled output. To avoid this, the mbstring extension provides mb_strcut, a function for cutting multibyte strings safely. This article explains the basic usage of mb_strcut and walks through examples so you can cut multibyte strings with confidence.
The mb_strcut function is part of PHP's multibyte string (mbstring) extension. It extracts a portion of a string measured in bytes: you give it a starting byte offset and a length in bytes. Although it works in bytes, it never splits a multibyte character. If the start offset falls inside a character, the cut begins at the first byte of that character; if the end of the range falls inside a character, that character is left out, so the result can be shorter than the requested length.
The function signature is as follows:
mb_strcut(string $string, int $start, ?int $length = null, ?string $encoding = null): string
$string: The string to cut.
$start: The starting position, in bytes.
$length: The number of bytes to extract. If omitted or null, everything up to the end of the string is returned.
$encoding: The character encoding of the string. If omitted or null, the internal encoding is used (see mb_internal_encoding(); UTF-8 by default).
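To make the byte-level rules concrete, here is a minimal sketch (assuming the mbstring extension is enabled and the source file is saved as UTF-8) that probes mb_strcut with offsets and lengths that do and do not fall on character boundaries. The variable names are illustrative only.

```php
<?php
// Each of these four Chinese characters takes 3 bytes in UTF-8 (12 bytes total).
$s = "你好世界";

// Offset 0, 7 bytes: byte 7 falls inside 世, so 世 is dropped and 6 bytes remain.
$a = mb_strcut($s, 0, 7, 'UTF-8'); // "你好"

// Offset 1 is inside 你, so the cut starts at the first byte of 你 instead.
$b = mb_strcut($s, 1, 6, 'UTF-8'); // "你好"

// Length null: everything from byte 6 (the start of 世) to the end.
$c = mb_strcut($s, 6, null, 'UTF-8'); // "世界"

var_dump($a, $b, $c);
```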
Both mb_strcut and mb_substr can extract part of a multibyte string, but they measure positions differently:
mb_substr counts characters (for example, 5 characters starting from the third character).
mb_strcut counts bytes, while still never splitting a multibyte character.
For example, in UTF-8 each Chinese character occupies 3 bytes. mb_strcut lets you specify the range at this finer, byte-level granularity, yet characters are never broken apart: any character that does not fit completely inside the range is dropped.
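The following sketch (again assuming mbstring and a UTF-8 source file) passes the same numbers to both functions to show how differently they interpret them.

```php
<?php
$s = "你好世界"; // 4 characters, 12 bytes in UTF-8

// mb_substr counts characters: 2 characters starting at character 0.
$byChars = mb_substr($s, 0, 2, 'UTF-8'); // "你好"

// mb_strcut counts bytes: 2 bytes cannot hold a whole 3-byte character,
// so it returns an empty string rather than half a character.
$byBytes = mb_strcut($s, 0, 2, 'UTF-8'); // ""

// To get the same two characters with mb_strcut, ask for 6 bytes.
$byBytes6 = mb_strcut($s, 0, 6, 'UTF-8'); // "你好"

var_dump($byChars, $byBytes, $byBytes6);
```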
Here is a simple example that uses mb_strcut to cut a Chinese string.
<?php
$text = "你好,世界!"; // A Chinese sentence ("Hello, world!") containing multibyte characters
// Cut by bytes: start at byte 0, take 6 bytes
$result = mb_strcut($text, 0, 6, 'UTF-8');
echo $result; // Output: "你好"
?>
Explanation:
The Chinese characters 你 ("you") and 好 ("good") each occupy 3 bytes, so the first 6 bytes are exactly 2 complete characters.
With substr, a byte count that does not land on a character boundary (5 bytes, for example) would split 好 and leave an invalid byte sequence that displays as garbled text.
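A short comparison, assuming a UTF-8 source file, shows the difference; mb_check_encoding is used here only to detect whether each result is valid UTF-8.

```php
<?php
$s = "你好世界";

// substr counts raw bytes and ignores character boundaries:
// 5 bytes = all of 你 plus the first 2 bytes of 好.
$raw = substr($s, 0, 5);

// mb_strcut drops the incomplete 好 and keeps only whole characters.
$safe = mb_strcut($s, 0, 5, 'UTF-8');

var_dump(mb_check_encoding($raw, 'UTF-8'));  // bool(false): invalid UTF-8
var_dump(mb_check_encoding($safe, 'UTF-8')); // bool(true)
var_dump($safe);                             // string(3) "你"
```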
Avoid garbled text: when cutting strings that contain multibyte characters, prefer mb_strcut so the result never breaks a character apart.
Specify the encoding: always pass the encoding argument explicitly (usually UTF-8) to avoid surprises when the default internal encoding differs between environments.
Combine with strlen: strlen returns a string's length in bytes, which is the unit mb_strcut works in. To take roughly the first half of a string, pass intdiv(strlen($str), 2) as the length; mb_strcut then backs off to the nearest character boundary. (mb_strlen, by contrast, counts characters and pairs naturally with mb_substr.)
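The "combine with strlen" tip can be sketched as a small helper. firstHalfByBytes is a hypothetical name chosen for this illustration, not part of PHP, and the sketch assumes UTF-8 input.

```php
<?php
// Hypothetical helper: return roughly the first half of a string by bytes,
// without splitting a multibyte character.
function firstHalfByBytes(string $text, string $encoding = 'UTF-8'): string
{
    $half = intdiv(strlen($text), 2); // strlen counts bytes, not characters
    return mb_strcut($text, 0, $half, $encoding);
}

echo firstHalfByBytes("你好世界"), "\n"; // 12 bytes -> first 6 bytes: 你好
echo firstHalfByBytes("你好世"), "\n";   // 9 bytes -> 4 bytes -> only 你 fits
```

Because the split point is computed in bytes, odd lengths simply round down to the last complete character.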
Suppose you want to shorten a multibyte label and then append a URL to it. You can write:
<?php
$text = "访问我们的官网:"; // "Visit our official website:"
$url = "https://gitbox.net/path/to/resource";
$result = mb_strcut($text, 0, 12, 'UTF-8'); // First 12 bytes = 4 Chinese characters in UTF-8
echo $result . $url;
?>
Output (the first 4 characters, 访问我们, followed by the URL):
访问我们https://gitbox.net/path/to/resource
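In real code you often want to shorten a label to fit a byte budget (for example, a field limited in bytes) and mark the cut with an ellipsis. The sketch below does that before appending the URL; shortenToBytes and its '...' suffix are hypothetical choices for illustration, and the budget includes the suffix's bytes.

```php
<?php
// Hypothetical helper: cut $text so that the result, including $suffix,
// is at most $maxBytes bytes long (assuming $maxBytes >= strlen($suffix)),
// without splitting a UTF-8 character.
function shortenToBytes(string $text, int $maxBytes, string $suffix = '...'): string
{
    if (strlen($text) <= $maxBytes) {
        return $text; // already fits, leave it untouched
    }
    // Reserve room for the suffix; never ask for a negative length.
    $room = max(0, $maxBytes - strlen($suffix));
    return mb_strcut($text, 0, $room, 'UTF-8') . $suffix;
}

$url = "https://gitbox.net/path/to/resource";

// 21 bytes do not fit in 12: 9 bytes of text (3 characters) + "..."
echo shortenToBytes("访问我们的官网", 12) . ' ' . $url, "\n";
// 6 bytes already fit, so nothing is cut
echo shortenToBytes("官网", 12) . ' ' . $url, "\n";
```

Only adding the suffix when a cut actually happens keeps short labels unchanged.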
mb_strcut is well suited to cutting multibyte strings: it works in bytes but never splits a character.
It handles UTF-8 Chinese, Japanese, and other multibyte text without producing garbled output.
Specify the encoding explicitly when calling it to get consistent behavior across environments.
In practice, it makes it easy to shorten strings and then append URLs or other content.
Once you are comfortable with mb_strcut, you can handle multibyte strings more reliably and make your PHP programs more robust and user-friendly.