Practical Guide to Extracting Complex String Segments Using mb_strstr and mb_strpos

gitbox 2025-09-29

In PHP, string manipulation is one of the most common operations. Extracting one segment from a longer string that packs several fields together can be awkward, especially when the text contains multibyte characters. This article shows how to combine the multibyte string functions mb_strstr and mb_strpos (with mb_substr for the final cut) to extract a specific segment from such a string.

1. Introduction to mb_strstr and mb_strpos

Before diving into practical examples, let's understand the basic usage of these two functions.

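  • mb_strstr: This function finds the first occurrence of a string within another string and returns the portion of the target string that starts at that occurrence (or, optionally, the portion before it). If the substring is not found, it returns false. Unlike the regular strstr function, mb_strstr is designed to support multibyte encodings (such as UTF-8 and Shift-JIS) and handles them correctly.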

    Syntax:

    mb_strstr(string $haystack, string $needle, bool $before_needle = false, ?string $encoding = null): string|false

    Parameters:

    • $haystack: The target string to search.

    • $needle: The substring to find.

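    • $before_needle: If set to true, returns the portion before the first occurrence of $needle. If false (the default), returns the portion from the first occurrence of $needle to the end of the string, including $needle itself.

    • $encoding: Specifies the character encoding. If omitted or null, the internal encoding (see mb_internal_encoding()) is used.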

  • mb_strpos: This function finds the first occurrence of a substring within another string and returns its position in the target string. The position is counted in characters (not bytes), starting at 0. If the substring is not found, it returns false.

    Syntax:

    mb_strpos(string $haystack, string $needle, int $offset = 0, ?string $encoding = null): int|false

    Parameters:

    • $haystack: The target string to search.

    • $needle: The substring to find.

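    • $offset: Specifies the position, counted in characters, at which the search starts. A negative offset counts from the end of the string.

    • $encoding: Specifies the character encoding. If omitted or null, the internal encoding is used.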
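To make the two descriptions above concrete, here is a minimal sketch of both functions applied to the same string. The sample string and variable names are assumptions chosen for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Illustrative input (an assumption for this sketch); the file must be saved as UTF-8.
$text = "姓名: 张三, Email: zhangsan@example.com";

// mb_strpos counts characters, while strpos counts bytes.
$char_index = mb_strpos($text, "Email", 0, "UTF-8"); // 8
$byte_index = strpos($text, "Email");                // 16 (each Chinese character is 3 bytes in UTF-8)

// mb_strstr: from the needle to the end, or everything before the needle.
$from_needle   = mb_strstr($text, "Email", false, "UTF-8"); // "Email: zhangsan@example.com"
$before_needle = mb_strstr($text, "Email", true, "UTF-8");  // "姓名: 张三, "

// Both functions return false when the needle does not occur.
$missing_pos  = mb_strpos($text, "Phone", 0, "UTF-8");
$missing_part = mb_strstr($text, "Phone", false, "UTF-8");

var_dump($char_index, $byte_index, $from_needle, $before_needle, $missing_pos, $missing_part);
```

The difference between 8 and 16 is why a position obtained from mb_strpos should only be passed to other mb_* functions such as mb_substr, never to their byte-based counterparts.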

2. Use Case: Extracting Complex String Segments

Suppose we have a string containing extensive information, such as a user's profile, with multiple fields like name, email, address, etc. Our task is to extract the value of a specific field.

For example, given the following string:

$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";

We want to extract the email segment. To achieve this, we can use mb_strpos and mb_strstr together for searching and extracting.

3. Practical Implementation

3.1 Locate the Email Position

First, we need to find the position of the "Email" field within the string using mb_strpos:

$email_position = mb_strpos($user_info, "Email: ");

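This function returns the character position of "Email: " in the string (17 here), or false if the label does not occur. Compare the result with === false rather than testing it loosely, because a match at position 0 is also falsy.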
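The return value deserves a strict check before it is used. The sketch below runs the lookup on the article's sample string; label_found is a hypothetical helper introduced only for this illustration, and "Phone: " is an assumed label that is deliberately absent.

```php
<?php
// Hypothetical helper for this sketch: true only when mb_strpos actually found the label.
function label_found($position): bool
{
    return $position !== false;
}

$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";

$email_position = mb_strpos($user_info, "Email: ");  // 17
$name_position  = mb_strpos($user_info, "Name: ");   // 0: found at the very start
$phone_position = mb_strpos($user_info, "Phone: ");  // false: label not present

var_dump(label_found($email_position)); // bool(true)
var_dump(label_found($name_position));  // bool(true), although (bool) 0 would be false
var_dump(label_found($phone_position)); // bool(false)
```

A loose test such as `if (!$name_position)` would wrongly treat the match at position 0 as "not found", which is why the strict comparison is used.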

3.2 Extract Email Using mb_strstr

Next, we use mb_strstr to extract the email starting from "Email: ". We set the $before_needle parameter to false to extract from "Email: " to the end of the string:

$email_info = mb_strstr($user_info, "Email: ", false);

At this point, $email_info contains:

Email: zhangsan@example.com, Address: Chaoyang District, Beijing

However, we only need the email, so further processing is required.

3.3 Extract the Pure Email Address

To get the pure email address, we can use mb_strpos to find the end position of the email and then extract it using mb_substr.

First, locate the start position of the email (after "Email: "), then find the first comma, which marks the end of the email:

$email_start = mb_strpos($email_info, "Email: ") + mb_strlen("Email: "); // skip the 7-character label "Email: " (including its trailing space)
$email_end = mb_strpos($email_info, ",", $email_start); // Find the first comma at or after the email start position

Then, use mb_substr to extract the email:

$email = mb_substr($email_info, $email_start, $email_end - $email_start);

Now, $email contains:

zhangsan@example.com
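The same result can also be reached without computing positions by hand, by calling mb_strstr a second time with $before_needle set to true. This is an alternative sketch rather than the article's method, and the variable names $label, $rest, and $before_comma are assumptions for illustration.

```php
<?php
$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";
$label = "Email: ";

// Step 1: keep everything from the label to the end of the string.
$email_info = mb_strstr($user_info, $label);

$email = null;
if ($email_info !== false) {
    // Step 2: drop the label itself.
    $rest = mb_substr($email_info, mb_strlen($label));

    // Step 3: keep the part before the first comma. If there is no comma,
    // mb_strstr returns false and the whole remainder is the value.
    $before_comma = mb_strstr($rest, ",", true);
    $email = ($before_comma === false) ? $rest : $before_comma;
}

echo $email, PHP_EOL; // zhangsan@example.com
```

This variant also copes with a field that is last in the string and therefore has no trailing comma.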

4. Summary

By combining mb_strstr and mb_strpos, we can flexibly extract desired segments from complex strings. The key points are:

  • Use mb_strpos to find the index position of a substring.

  • Use mb_strstr to extract the portion of the string that starts at the first occurrence of a substring.

  • Use mb_substr to further refine the extraction to the precise segment needed.

This method is especially suitable for strings that pack several labeled fields into one line, as long as the labels and the delimiter between fields are predictable.
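As a closing illustration, the steps can be folded into one reusable function. extract_field is a hypothetical helper written for this sketch (it is not part of the mbstring extension), and it assumes that fields are introduced by a known label and separated by a single delimiter.

```php
<?php
/**
 * Hypothetical helper (not part of the mbstring extension).
 * Returns the value that follows $label, up to the next $delimiter or,
 * when no delimiter follows, up to the end of the string.
 * Returns null if the label does not occur.
 */
function extract_field(string $text, string $label, string $delimiter = ","): ?string
{
    $label_pos = mb_strpos($text, $label);
    if ($label_pos === false) {
        return null;
    }

    // The value starts right after the label.
    $start = $label_pos + mb_strlen($label);
    $end = mb_strpos($text, $delimiter, $start);

    $value = ($end === false)
        ? mb_substr($text, $start)
        : mb_substr($text, $start, $end - $start);

    return trim($value);
}

$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";

echo extract_field($user_info, "Email: "), PHP_EOL;   // zhangsan@example.com
echo extract_field($user_info, "Name: "), PHP_EOL;    // Zhang San
echo extract_field($user_info, "Address: "), PHP_EOL; // Chaoyang District
var_dump(extract_field($user_info, "Phone: "));       // NULL
```

Note the Address result: because the value itself contains a comma, a comma delimiter cuts it short. Input of that kind needs a delimiter that cannot appear inside values, or a structured format instead.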