<pre><code>mb_internal_encoding("UTF-8"); // Set internal character encoding
mb_regex_encoding("UTF-8");    // Encoding used by mb_ereg and mb_split

$pattern = "[,,]+"; // Match one or more English (,) or Chinese (,) commas
$result = mb_split($pattern, $text);

print_r($result);
</code></pre>
Output:
<pre><code>Array
(
    [0] => Apple
    [1] => Banana
    [2] => Orange
    [3] => Grape
    [4] => Watermelon
)
</code></pre>
Because the pattern is interpreted as UTF-8, the full-width Chinese comma (,) is matched as a single character, so mb_split splits the string correctly at both kinds of comma.
In some complex scenarios, we may want to check if a string contains a certain pattern before splitting. You can use mb_ereg for this:
<pre><code>if (mb_ereg("[,,]", $text)) {
    $result = mb_split("[,,]+", $text);
} else {
    $result = [$text]; // No splitting needed
}
</code></pre>
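This approach makes the no-delimiter case explicit. Note that mb_split already returns a one-element array containing the whole string when the pattern does not match, so the pre-check pays off mainly when the two branches need different handling; any speed gain should be measured rather than assumed.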
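To see what the pre-check changes in practice, the following minimal sketch runs the same pattern against one input with delimiters and one without. The sample strings and variable names are assumptions made for illustration, and the `\u{FF0C}` escape (PHP 7.0 or later) is used for the full-width comma so the example does not depend on the file's encoding.

```php
<?php
// Sketch only: sample inputs and variable names are hypothetical.
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8"); // encoding used by the mb_ereg family

// ASCII comma or full-width comma (U+FF0C), one or more times
$pattern = "[,\u{FF0C}]+";

$withDelimiters    = "Apple,Banana\u{FF0C}Orange";
$withoutDelimiters = "Apple";

// mb_ereg is truthy on a match and false otherwise
var_dump((bool) mb_ereg($pattern, $withDelimiters));    // bool(true)
var_dump((bool) mb_ereg($pattern, $withoutDelimiters)); // bool(false)

// mb_split handles both inputs; with no match it returns
// the whole string as the only element.
print_r(mb_split($pattern, $withDelimiters));
print_r(mb_split($pattern, $withoutDelimiters));
```

Both branches of the pre-check therefore end up with an array, which is why the check is best read as a way to branch on the two cases rather than as a requirement for correct splitting.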
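Unified character encoding: Call mb_internal_encoding("UTF-8") so that mbstring functions default to UTF-8 and text is not garbled. The regex functions (mb_ereg, mb_split) take their encoding from mb_regex_encoding(), so set that to "UTF-8" as well, and save the PHP source file as UTF-8 so the literal Chinese comma in the pattern is encoded correctly.
Regex syntax: mb_ereg and mb_split use mbstring's own regex engine (Oniguruma), not PCRE. Patterns are written without delimiters or modifiers ("[,,]+", not "/[,,]+/u"), so pay attention to brackets and escape characters when porting a pattern from preg_* functions.
Performance: mb_split already returns the whole string as a single element when nothing matches, and a pre-check with mb_ereg scans the text a second time when something does match. For large texts, measure before assuming the pre-check improves efficiency.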
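The syntax difference in the notes above can be sketched as follows, assuming the comparison of interest is with PCRE's preg_split. The sample string is hypothetical, and the full-width comma is written as `\u{FF0C}` (PHP 7.0 or later).

```php
<?php
// Sketch only: the sample text is hypothetical.
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8"); // set the regex encoding explicitly

$text = "Apple,Banana\u{FF0C}Orange";

// mbstring regex: the pattern has no delimiters and no modifiers.
$viaMbSplit = mb_split("[,\u{FF0C}]+", $text);

// PCRE: the pattern needs delimiters and the u (UTF-8) modifier.
$viaPregSplit = preg_split("/[,\u{FF0C}]+/u", $text);

var_dump($viaMbSplit === $viaPregSplit); // bool(true)
```

Passing a delimited PCRE-style pattern to mb_split would make the slashes part of the pattern itself, which is the most common mistake when moving between the two families.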
By combining mb_ereg and mb_split appropriately, you can:
- Accurately split multibyte-encoded strings
- Support multiple delimiters flexibly
- Handle the no-delimiter case explicitly with a pre-match check
Mastering these techniques helps you handle multilingual text with ease, avoiding common encoding and splitting issues.
Complete example code:
<pre><code>mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8"); // Encoding used by mb_ereg and mb_split

$text = "Apple,Banana,Orange,Grape,Watermelon";

if (mb_ereg("[,,]", $text)) {
    $result = mb_split("[,,]+", $text);
} else {
    $result = [$text];
}

print_r($result);
</code></pre>
Running this code prints a five-element array: Apple, Banana, Orange, Grape, Watermelon.
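If the same splitting logic is needed in several places, it could be wrapped in a small function. The sketch below is one possible shape, not part of the original example: the name `splitOnCommas`, the trimming, and the removal of empty pieces are all assumptions added for illustration. Empty pieces can appear when the input starts or ends with a delimiter.

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

/**
 * Hypothetical helper: split on ASCII or full-width commas,
 * trim ASCII whitespace from each piece, and drop empty pieces.
 */
function splitOnCommas(string $text): array
{
    // \u{FF0C} is the full-width comma (requires PHP 7.0+)
    $parts = mb_split("[,\u{FF0C}]+", $text);
    if ($parts === false) {
        return [$text]; // fall back to the unsplit input on error
    }

    $parts = array_map('trim', $parts);

    // Remove empty strings and reindex from 0
    return array_values(array_filter($parts, function ($part) {
        return $part !== '';
    }));
}

// Expected pieces: Apple, Banana, Orange
print_r(splitOnCommas(",Apple, Banana\u{FF0C}\u{FF0C}Orange,"));
```

Filtering after the split keeps the pattern itself simple, at the cost of one extra pass over the resulting array.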