In PHP, string manipulation is one of the most common operations. Extracting one segment from a complex string efficiently can be challenging, especially when the string packs several fields or formats together, so choosing the right functions matters. This article focuses on combining the multibyte string functions mb_strstr and mb_strpos (with mb_substr for the final cut) to extract specific segments from complex strings.
Before diving into practical examples, let's understand the basic usage of these two functions.
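mb_strstr: This function finds the first occurrence of a string within another string and returns the portion of the target string from that occurrence to the end (or, optionally, the portion before it). Unlike the regular strstr function, which operates on bytes, mb_strstr is designed for multibyte encodings (such as UTF-8 and Shift-JIS) and works on whole characters. It returns false if the substring is not found.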
Syntax:
```php
mb_strstr(string $haystack, string $needle, bool $before_needle = false, ?string $encoding = null): string|false
```
Parameters:
$haystack: The target string to search.
$needle: The substring to find.
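$before_needle: If set to true, returns the portion before the first occurrence of $needle (excluding $needle). If false (the default), returns the portion from $needle to the end of the string (including $needle).
$encoding: Specifies the character encoding. If omitted or null, the internal encoding (the value of mb_internal_encoding()) is used.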
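To make the effect of $before_needle concrete, here is a minimal sketch. The sample address and variable names are illustrative only, and the mbstring extension is assumed to be enabled.

```php
<?php
// Requires the mbstring extension.
$haystack = 'user@example.com';

// Default ($before_needle = false): from the first '@' to the end, needle included.
$domain_part = mb_strstr($haystack, '@');       // "@example.com"

// $before_needle = true: everything before the first '@', needle excluded.
$local_part = mb_strstr($haystack, '@', true);  // "user"

// When the needle is absent, the function returns false, not an empty string.
$missing = mb_strstr($haystack, '#');           // false

var_dump($domain_part, $local_part, $missing);
```

Because a failed search yields false rather than an empty string, the result is best checked with === false before it is passed to other string functions.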
mb_strpos: This function finds the first occurrence of a substring within another string and returns its zero-based position, counted in characters rather than bytes, or false if the substring is not found.
Syntax:
```php
mb_strpos(string $haystack, string $needle, int $offset = 0, ?string $encoding = null): int|false
```
Parameters:
$haystack: The target string to search.
$needle: The substring to find.
$offset: The character position at which the search starts. A negative value counts from the end of the string.
$encoding: Specifies the character encoding. If omitted or null, the internal encoding is used.
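The difference between character positions and byte positions is easiest to see on a non-ASCII string. The following sketch uses an illustrative Japanese sample string, assumes the source file is saved as UTF-8, and passes the encoding explicitly so it does not depend on the configured internal encoding.

```php
<?php
// Requires the mbstring extension; this file is assumed to be saved as UTF-8.
$text = '日本語のテキスト'; // 8 characters, 24 bytes in UTF-8

// mb_strpos counts characters: 'テキスト' starts at character index 4.
$char_index = mb_strpos($text, 'テキスト', 0, 'UTF-8');   // 4

// strpos counts bytes: each of the four preceding characters takes 3 bytes.
$byte_index = strpos($text, 'テキスト');                  // 12

// $offset moves the search start: look for ',' from character index 2 onward.
$second_comma = mb_strpos('a,b,c', ',', 2, 'UTF-8');      // 3

// A missing needle yields false.
$not_found = mb_strpos($text, 'xyz', 0, 'UTF-8');         // false

var_dump($char_index, $byte_index, $second_comma, $not_found);
```

Positions from mb_strpos should be paired with other mb_* functions such as mb_substr. Feeding a byte offset from strpos into mb_substr would cut the string in the wrong place.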
Suppose we have a string containing extensive information, such as a user's profile, with multiple fields like name, email, address, etc. Our task is to extract the value of a specific field.
For example, given the following string:
```php
$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";
```
We want to extract the email segment. To achieve this, we can use mb_strpos and mb_strstr together for searching and extracting.
First, we need to find the position of the "Email" field within the string using mb_strpos:
```php
$email_position = mb_strpos($user_info, "Email: ");
```
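This call returns 17, the zero-based character index at which "Email: " begins. If the label were absent, mb_strpos would return false, so compare the result with === false rather than treating it as a boolean.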
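A match at the very start of the string returns 0, which PHP treats as falsy, so a loose comparison can mistake a valid match for a failure. The sketch below illustrates the pitfall; the "Phone: " label is a hypothetical field that does not exist in the sample string.

```php
<?php
// Requires the mbstring extension.
$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";

$email_position = mb_strpos($user_info, "Email: ");  // 17
$name_position  = mb_strpos($user_info, "Name: ");   // 0: a valid match at the very start
$phone_position = mb_strpos($user_info, "Phone: ");  // false: hypothetical label, not present

// Strict comparison distinguishes "found at index 0" from "not found".
$name_found  = ($name_position !== false);   // true
$phone_found = ($phone_position !== false);  // false

// Loose comparison misfires: 0 == false evaluates to true.
$loose_check_misfires = ($name_position == false);  // true

var_dump($email_position, $name_found, $phone_found, $loose_check_misfires);
```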
Next, we use mb_strstr to extract the email starting from "Email: ". We set the $before_needle parameter to false to extract from "Email: " to the end of the string:
```php
$email_info = mb_strstr($user_info, "Email: ", false);
```
At this point, $email_info contains:
```
Email: zhangsan@example.com, Address: Chaoyang District, Beijing
```
However, we only need the email, so further processing is required.
To get the pure email address, we can use mb_strpos to find the end position of the email and then extract it using mb_substr.
First, locate the start position of the email (after "Email: "), then find the first comma, which marks the end of the email:
```php
// "Email: " is 7 characters long (including the trailing space), so the address starts at 0 + 7.
$email_start = mb_strpos($email_info, "Email: ") + mb_strlen("Email: ");
// Find the first comma at or after the start of the address.
$email_end = mb_strpos($email_info, ",", $email_start);
```
Then, use mb_substr to extract the email:
```php
$email = mb_substr($email_info, $email_start, $email_end - $email_start);
```
Now, $email contains:
```
zhangsan@example.com
```
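The individual steps can be gathered into one reusable function. The following is a minimal sketch rather than a definitive implementation: extract_field is a hypothetical helper name, and the explicit UTF-8 default, the trimming of the result, and the null return for a missing label are assumptions made for illustration.

```php
<?php
// Requires the mbstring extension.

/**
 * Hypothetical helper: returns the value that follows $label, up to the next
 * $delimiter (or to the end of the string), or null if $label is not found.
 */
function extract_field(string $text, string $label, string $delimiter = ',', string $encoding = 'UTF-8'): ?string
{
    // Step 1: keep everything from the label onward.
    $tail = mb_strstr($text, $label, false, $encoding);
    if ($tail === false) {
        return null; // label not present
    }

    // Step 2: the value starts right after the label.
    $start = mb_strlen($label, $encoding);

    // Step 3: the value ends at the next delimiter; if there is none,
    // a null length makes mb_substr read to the end of the string.
    $end = mb_strpos($tail, $delimiter, $start, $encoding);
    $length = ($end === false) ? null : $end - $start;

    return trim(mb_substr($tail, $start, $length, $encoding));
}

$user_info = "Name: Zhang San, Email: zhangsan@example.com, Address: Chaoyang District, Beijing";

$email   = extract_field($user_info, "Email: ");    // "zhangsan@example.com"
$address = extract_field($user_info, "Address: ");  // "Chaoyang District"
$phone   = extract_field($user_info, "Phone: ");    // null

var_dump($email, $address, $phone);
```

Note the address result: the value itself contains a comma, so cutting at the first comma truncates it to "Chaoyang District". When field values can contain the delimiter, a different end marker (such as the next field label) is needed.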
By combining mb_strstr and mb_strpos, we can flexibly extract desired segments from complex strings. The key points are:
Use mb_strpos to find the index position of a substring.
Use mb_strstr to extract the portion of the string that starts at the first occurrence of a marker substring.
Use mb_substr to further refine the extraction to the precise segment needed.
This method works well for delimited strings that pack several labeled fields into one line, provided the field values do not themselves contain the delimiter.
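The walkthrough above uses an ASCII-only string, where byte and character positions happen to coincide. The sketch below applies the same three steps to a UTF-8 string with full-width punctuation, where they differ. The sample data and variable names are illustrative, and the file is assumed to be saved as UTF-8.

```php
<?php
// Requires the mbstring extension; this file is assumed to be saved as UTF-8.
// Illustrative data: name, email, and address fields written with
// full-width colons and commas.
$profile   = '姓名：张三，邮箱：zhangsan@example.com，地址：北京市朝阳区';
$label     = '邮箱：'; // "Email" followed by a full-width colon
$delimiter = '，';     // full-width comma

$email = null;
$tail = mb_strstr($profile, $label, false, 'UTF-8');
if ($tail !== false) {
    $start  = mb_strlen($label, 'UTF-8');                   // 3 characters
    $end    = mb_strpos($tail, $delimiter, $start, 'UTF-8');
    $length = ($end === false) ? null : $end - $start;
    $email  = mb_substr($tail, $start, $length, 'UTF-8');   // "zhangsan@example.com"
}

// The label's byte length differs from its character length,
// which is why strlen must not be mixed with mb_substr here.
$label_chars = mb_strlen($label, 'UTF-8'); // 3
$label_bytes = strlen($label);             // 9

var_dump($email, $label_chars, $label_bytes);
```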