
How Beginners Can Use mb_substr to Extract Part of a String: A Complete Basic Tutorial

gitbox 2025-06-11

When dealing with PHP strings, particularly those that include Chinese or other multibyte characters, mb_substr() is a very practical function. It is part of the mbstring (Multibyte String) extension, specially designed to handle multibyte encoded strings such as UTF-8. For beginners, learning to use mb_substr() effectively prevents garbled text issues and ensures accurate substring extraction.

1. What is mb_substr()?

mb_substr() is used to extract a substring from a multibyte string. Its basic syntax is as follows:

mb_substr(string $string, int $start, ?int $length = null, ?string $encoding = null): string

Parameter explanation:

  • $string: The original string to operate on;

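  • $start: Starting position (zero-based index); a negative value counts back from the end of the string;

  • $length (optional): The maximum number of characters to extract; if omitted or null, extraction continues to the end of the string;

  • $encoding (optional): Character encoding; defaults to the internal encoding (UTF-8 in a default PHP configuration).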
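As a quick illustration of how these parameters interact, here is a minimal sketch. The sample string is made up for this illustration, and the encoding is passed explicitly so the result does not depend on local configuration.

```php
<?php
// Hypothetical sample string: 2 Chinese characters followed by 3 ASCII letters.
$string = "你好PHP";

// $start = 0, $length = 2: the first two characters.
$firstTwo = mb_substr($string, 0, 2, "UTF-8");    // 你好

// $length is null: everything from $start to the end of the string.
$fromTwo = mb_substr($string, 2, null, "UTF-8");  // PHP

// $length larger than what remains: the result simply stops at the end.
$tooLong = mb_substr($string, 3, 10, "UTF-8");    // HP

echo $firstTwo, "\n", $fromTwo, "\n", $tooLong, "\n";
```

Note that positions and lengths are counted in characters, not bytes, which is why "你好" counts as two positions even though it takes six bytes in UTF-8.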

2. Why not use substr()?

If your string is purely English, using substr() usually works fine. However, if it contains Chinese, Japanese, or other non-ASCII characters, substr() can easily cause garbled output or incorrect substring extraction. For example:

$str = "你好,世界!";
echo substr($str, 0, 2);  // Outputs garbled text

The output is garbled because substr() counts bytes rather than characters. In UTF-8, a Chinese character typically occupies 3 bytes, so the first 2 bytes are only a fragment of "你".

Using mb_substr() handles this correctly:

$str = "你好,世界!";
echo mb_substr($str, 0, 2, "UTF-8");  // Outputs: 你好
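The byte-versus-character difference can be checked directly. The following is a small sketch, assuming the source file is saved as UTF-8; the sample string is chosen for illustration and contains only Chinese characters so the byte count is easy to follow.

```php
<?php
// Illustrative string: 4 Chinese characters, 3 bytes each in UTF-8.
$str = "你好世界";

// strlen() counts bytes; mb_strlen() counts characters.
$bytes = strlen($str);               // 12
$chars = mb_strlen($str, "UTF-8");   // 4

// substr() cuts after 2 bytes, in the middle of the first character.
$cut = substr($str, 0, 2);
$cutIsValid = mb_check_encoding($cut, "UTF-8");   // false: broken byte sequence

// mb_substr() cuts after 2 characters, on a character boundary.
$ok = mb_substr($str, 0, 2, "UTF-8");
$okIsValid = mb_check_encoding($ok, "UTF-8");     // true

echo "bytes: $bytes, characters: $chars\n";
var_dump($cutIsValid, $okIsValid);
```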

3. Example explanations

Let's get familiar with mb_substr() through a few examples.

Example 1: Extract the first few characters from a string

$str = "PHP教程:从零开始学习";
echo mb_substr($str, 0, 5, "UTF-8");  // Outputs: PHP教程

Example 2: Extract a middle section of a string

$str = "欢迎来到gitbox.net的PHP教学专区";
echo mb_substr($str, 4, 6, "UTF-8");  // Outputs: gitbox

Example 3: Specify only the start position to extract to the end

$str = "学习PHP很有趣";
echo mb_substr($str, 2, null, "UTF-8");  // Outputs: PHP很有趣

Example 4: Use a negative index to start extraction from the end

$str = "程序员的日常生活";
echo mb_substr($str, -4, 2, "UTF-8");  // Outputs: 日常
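To make the negative index easier to reason about, the sketch below compares it with the equivalent positive index. It reuses the string from Example 4; the variable names are only illustrative.

```php
<?php
$str = "程序员的日常生活"; // 8 characters

// A negative $start counts back from the end: -2 is the second-to-last character.
$lastTwo = mb_substr($str, -2, null, "UTF-8");   // 生活

// Combined with $length, it selects a window near the end of the string.
$window = mb_substr($str, -4, 2, "UTF-8");       // 日常

// The same window expressed with a positive index: 8 - 4 = 4.
$positiveStart = mb_strlen($str, "UTF-8") - 4;
$same = mb_substr($str, $positiveStart, 2, "UTF-8");

echo $lastTwo, "\n", $window, "\n", $same, "\n";
```

The negative form is convenient when the string length is not known in advance, since it avoids the extra mb_strlen() call.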

4. How to set the default encoding

You can use mb_internal_encoding() to set the default encoding, so you don't have to specify "UTF-8" each time:

mb_internal_encoding("UTF-8");
$str = "深入浅出PHP开发";
echo mb_substr($str, 2, 5);  // Outputs: 浅出PHP
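Because mb_internal_encoding() changes a setting for the whole script, it can be worth reading the current value first and restoring it afterwards. The following is one possible sketch of that pattern, not the only way to do it; the variable names are illustrative.

```php
<?php
// Called without arguments, mb_internal_encoding() returns the current setting.
$previous = mb_internal_encoding();

mb_internal_encoding("UTF-8");
$str = "深入浅出PHP开发";

// No encoding argument: the internal encoding set above is used.
$implicit = mb_substr($str, 0, 4);            // 深入浅出
// Explicit encoding argument, for comparison.
$explicit = mb_substr($str, 0, 4, "UTF-8");   // 深入浅出

var_dump($implicit === $explicit);            // bool(true)

// Restore the earlier setting so other code is not affected.
mb_internal_encoding($previous);
```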

5. Summary

mb_substr() is PHP's standard function for extracting substrings from multibyte text such as Chinese. Because it counts characters rather than bytes, it avoids the broken characters that substr() can produce. Mastering it is a fundamental skill for internationalization projects and Chinese website development. Remember: when a string may contain non-ASCII characters, prefer mb_substr() over substr().
