mb_internal_encoding("UTF-8"); // Set internal character encoding
mb_regex_encoding("UTF-8");    // Encoding used by mb_split()/mb_ereg() patterns
$text = "Apple,Banana，Orange，Grape,Watermelon"; // Mixed ASCII and Chinese full-width commas
$pattern = "[,，]+"; // Match one or more ASCII or Chinese full-width commas
$result = mb_split($pattern, $text);
print_r($result);
Output:
Array
(
    [0] => Apple
    [1] => Banana
    [2] => Orange
    [3] => Grape
    [4] => Watermelon
)
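The pattern lists both the ASCII comma and the Chinese full-width comma. mb_split reads the pattern and the string as UTF-8 characters rather than raw bytes, so it splits the string correctly at either kind of comma.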
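This quick check shows why the encoding matters. The full-width comma "，" (U+FF0C) is one character but three bytes in UTF-8. mb_split therefore has to read `[,，]` as a class of two characters, not four separate bytes. The sketch assumes the PHP source file is saved as UTF-8.

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8"); // mb_split() and mb_ereg() read this setting

$comma = "，"; // Chinese full-width comma, U+FF0C

// strlen() counts bytes; mb_strlen() counts characters in the internal encoding
printf("bytes: %d, characters: %d\n", strlen($comma), mb_strlen($comma)); // bytes: 3, characters: 1

// The underlying UTF-8 byte sequence
echo bin2hex($comma), "\n"; // efbc8c
```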
In more complex scenarios, you may want to check whether a pattern occurs in the string before splitting it. mb_ereg can do this check:
if (mb_ereg("[,，]", $text)) {
    $result = mb_split("[,，]+", $text);
} else {
    $result = [$text]; // No delimiter found, so no splitting needed
}
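This approach skips the mb_split call when the text contains no delimiter. In that case mb_split would return the whole string as a single-element array anyway, so the result is the same. The pre-check mainly saves work when many inputs have no delimiter, or when the code needs to branch on whether a delimiter is present.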
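The sketch below compares the two paths to show that the pre-check changes only how much work is done, not the result. It wraps the pre-check in a function so the two can be compared side by side. `split_with_precheck` is a hypothetical name used only for illustration.

```php
<?php
mb_regex_encoding("UTF-8"); // mb_split() and mb_ereg() use the regex encoding

// Hypothetical wrapper around the pre-check pattern shown above
function split_with_precheck(string $text): array
{
    if (mb_ereg("[,，]", $text)) {
        return mb_split("[,，]+", $text);
    }
    return [$text]; // No delimiter found: skip the split call
}

foreach (["Apple,Banana，Orange", "Watermelon"] as $sample) {
    $direct = mb_split("[,，]+", $sample);
    // Compare a plain mb_split() call with the pre-check version
    var_dump($direct === split_with_precheck($sample));
}
```

Both comparisons print `bool(true)`. For "Watermelon", mb_split alone already returns `["Watermelon"]`.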
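Unified character encoding: Call mb_internal_encoding("UTF-8") so that mbstring functions consistently treat strings as UTF-8, which avoids garbled text. The regex functions mb_ereg and mb_split use the encoding set by mb_regex_encoding(), so set that to "UTF-8" as well.
Regex pattern writing: mbstring's regex functions use the Oniguruma engine, and their syntax differs from the preg_* functions. Patterns are written without surrounding delimiters such as /.../, so pay close attention to brackets and escape characters.
Performance considerations: The mb_ereg pre-check only saves work when many inputs contain no delimiter. When a delimiter is present, the string is scanned twice, once by mb_ereg and once by mb_split. For large texts, benchmark both versions on your own data before relying on the pre-check.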
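To show the difference in pattern syntax, here is the same split written once with mb_split and once with preg_split. The preg_* functions need delimiters around the pattern and the `u` modifier, so that "，" is matched as one UTF-8 character. The mbstring pattern is written bare. The sample text is illustrative.

```php
<?php
mb_regex_encoding("UTF-8");
$text = "Apple,Banana，Orange";

// mbstring regex: bare pattern, no surrounding delimiters
$mb = mb_split("[,，]+", $text);

// PCRE equivalent: needs delimiters, plus the u modifier for UTF-8 matching
$pcre = preg_split('/[,，]+/u', $text);

print_r($mb);
var_dump($mb === $pcre); // bool(true)
```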
By properly combining mb_ereg and mb_split, you can:
Accurately handle multibyte string splitting
Flexibly support multiple delimiters
Skip the split call entirely when the input contains no delimiter
Mastering these techniques allows you to handle multilingual text smoothly and avoid common encoding and splitting errors.
Complete Example Code:
<?php
mb_internal_encoding("UTF-8"); // Set internal character encoding
mb_regex_encoding("UTF-8");    // Encoding used by mb_ereg()/mb_split()

$text = "Apple,Banana，Orange，Grape,Watermelon"; // Mixed ASCII and full-width commas

if (mb_ereg("[,，]", $text)) {
    $result = mb_split("[,，]+", $text);
} else {
    $result = [$text]; // No delimiter found, so no splitting needed
}

print_r($result);
Running this prints the same five-element array as the earlier output: Apple, Banana, Orange, Grape, Watermelon.
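In real input, the text may start or end with a comma, contain doubled commas, or have spaces after a comma. mb_split can then return empty or space-padded pieces. The sketch below wraps the complete example in a reusable function that cleans these up. `split_commas` is a hypothetical helper name, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper: split on ASCII "," and full-width "，" commas,
// trim surrounding ASCII whitespace, and drop empty pieces.
function split_commas(string $text): array
{
    mb_regex_encoding("UTF-8"); // mb_split() reads the regex encoding
    $parts = mb_split("[,，]+", $text);
    if ($parts === false) {
        return []; // mb_split() returns false on failure
    }
    $parts = array_map('trim', $parts);
    // array_filter() keeps the original keys, so reindex with array_values()
    return array_values(array_filter($parts, fn (string $p): bool => $p !== ''));
}

print_r(split_commas("，Apple, Banana，，Orange,"));
```

Here the leading and trailing commas would otherwise produce empty strings, and the space after "," would stay attached to "Banana".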