PHP is a widely used server-side language, and substring operations come up constantly. In UTF-8, each Chinese character occupies several bytes (typically three), so byte-oriented functions such as substr() and strlen() can cut a character in half or report the wrong length. This article shares practical methods for extracting Chinese strings correctly.
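To make the byte-versus-character problem concrete, here is a minimal sketch. It assumes the script file and the string are UTF-8 encoded and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is enabled.
$str = "PHP中截取中文字符串";

$byteLength = strlen($str);             // counts bytes
$charLength = mb_strlen($str, 'UTF-8'); // counts characters

// substr() cuts by bytes: 4 bytes = "PHP" plus the first byte of "中".
$byteCut = substr($str, 0, 4);
// mb_substr() cuts by characters: 4 characters = "PHP中".
$charCut = mb_substr($str, 0, 4, 'UTF-8');

echo $byteLength, "\n";                          // 27
echo $charLength, "\n";                          // 11
var_dump(mb_check_encoding($byteCut, 'UTF-8'));  // bool(false): broken character
echo $charCut, "\n";                             // PHP中
```

The byte-based cut leaves an incomplete character at the end, which usually shows up as a garbled symbol in the browser.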
The mb_substr() function, provided by PHP's mbstring extension, is designed for multibyte characters and can accurately extract Chinese substrings. It counts characters rather than bytes, so both the start position and the length are expressed in characters.
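<span class="fun">mb_substr(string $string, int $start, ?int $length = null, ?string $encoding = null): string</span>
Parameter explanation:
$string: the string to extract from.
$start: the position of the first character to return, counted from 0; a negative value counts back from the end of the string.
$length: the maximum number of characters to return; if omitted or null, everything up to the end of the string is returned.
$encoding: the character encoding of the string; if omitted, the internal encoding (see mb_internal_encoding()) is used, so passing 'UTF-8' explicitly is the safer choice.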
Example code:
$str = "PHP中截取中文字符串";
$substr = mb_substr($str, 0, 6, 'UTF-8');
echo $substr; // Outputs: "PHP中截取"
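The start and length arguments can also be used to take a slice from the middle or the end of the string. The following is a small sketch under the same UTF-8 assumption; the variable names are illustrative only.

```php
<?php
// Assumption: the string is UTF-8 encoded and mbstring is enabled.
$str = "PHP中截取中文字符串";

// Two characters starting at character offset 3.
$middle = mb_substr($str, 3, 2, 'UTF-8');

// Everything from character offset 3 to the end (length is null).
$rest = mb_substr($str, 3, null, 'UTF-8');

// A negative start counts back from the end: the last three characters.
$tail = mb_substr($str, -3, null, 'UTF-8');

echo $middle, "\n"; // 中截
echo $rest, "\n";   // 中截取中文字符串
echo $tail, "\n";   // 字符串
```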
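Besides mb_substr(), regular expressions combined with preg_match() can extract Chinese characters. With the /u modifier, the pattern and the subject are treated as UTF-8, so a character class such as [\x{4e00}-\x{9fa5}] can match the range of commonly used Chinese characters.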
Example code:
$str = "PHP中截取中文字符串";
// No ^ anchor: the string starts with "PHP", so an anchored pattern would never match.
preg_match('/([\x{4e00}-\x{9fa5}]+)/u', $str, $matches);
echo $matches[1]; // Outputs: "中截取中文字符串"
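When a string mixes several runs of Chinese and non-Chinese text, preg_match_all() can collect every run. The sketch below uses the Unicode script property \p{Han} instead of an explicit code point range; this is a substitute for the range shown above, and it assumes PCRE's Unicode property support is available, as it is in typical PHP builds. The sample string is made up for illustration.

```php
<?php
// Assumption: the subject is valid UTF-8 (required by the /u modifier).
$str = "PHP中截取abc中文字符串";

// \p{Han} matches any character in the Han script; + groups consecutive ones.
$count = preg_match_all('/\p{Han}+/u', $str, $matches);

// $matches[0] holds every full match, in order of appearance.
$runs  = $matches[0];
$first = $count > 0 ? $runs[0] : '';

echo $count, "\n";           // 2
echo implode(',', $runs), "\n"; // 中截取,中文字符串
echo $first, "\n";           // 中截取
```

Checking the return value before reading $matches avoids an undefined index notice when the string contains no Chinese characters at all.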
For more flexible logic, you can write a custom function that combines mb_strlen() and mb_substr() to truncate a string and append an ellipsis only when something was actually cut off.
function substr_cn($str, $length) {
    $len = mb_strlen($str, 'UTF-8');
    if ($len > $length) {
        $str = mb_substr($str, 0, $length, 'UTF-8') . '...';
    }
    return $str;
}
$str = "PHP中截取中文字符串";
$substr = substr_cn($str, 6);
echo $substr; // Outputs: "PHP中截取..."
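The same idea can be made a little more configurable. The sketch below is a hypothetical variant (the name truncate_cn and its extra parameters are not part of the original function) that lets the caller choose the suffix and encoding and guards against non-positive lengths. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: truncate to $length characters and append $suffix
// only when the string was actually shortened.
function truncate_cn(
    string $str,
    int $length,
    string $suffix = '...',
    string $encoding = 'UTF-8'
): string {
    if ($length <= 0) {
        return ''; // nothing sensible to keep
    }
    if (mb_strlen($str, $encoding) <= $length) {
        return $str; // short enough: return unchanged, no suffix
    }
    return mb_substr($str, 0, $length, $encoding) . $suffix;
}

$short  = truncate_cn("PHP中截取", 6);                // exactly 6 characters
$long   = truncate_cn("PHP中截取中文字符串", 6);
$custom = truncate_cn("PHP中截取中文字符串", 4, '…');

echo $short, "\n";  // PHP中截取
echo $long, "\n";   // PHP中截取...
echo $custom, "\n"; // PHP中…
```

Note that the suffix is added on top of the requested length, so the result can be longer than $length characters; whether that is acceptable depends on the layout the text has to fit into.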
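When using these methods, make sure the encoding you pass to the mb_* functions matches the actual encoding of the string, and remember that positions and lengths are counted in characters, not bytes.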
Extracting Chinese strings is a common and important task in PHP development. Using the built-in mb_substr() function, regular expressions, or custom functions, you can flexibly and efficiently extract Chinese substrings. Proper handling of character encoding and length is key to accurate results. We hope this article offers practical guidance for your projects.