In PHP, the mb_substr_count function counts how many times a substring appears within another string. It is part of the multibyte string extension (mbstring), which is designed for strings in multibyte encodings such as UTF-8. Unlike substr_count, which compares raw bytes, mb_substr_count interprets both strings as characters in a given encoding, so a byte sequence that merely sits inside a multibyte character is not counted as a match.
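The byte-versus-character difference is easiest to see with Shift_JIS, where some two-byte characters, such as ソ (0x83 0x5C) and 表 (0x95 0x5C), end in 0x5C, the same byte as an ASCII backslash. The sketch below assumes the script file is saved as UTF-8 and the mbstring extension is loaded; the choice of characters is only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is enabled.
// In Shift_JIS, "ソ" is 0x83 0x5C and "表" is 0x95 0x5C: both end with 0x5C,
// which is also the byte value of an ASCII backslash.
$sjis = mb_convert_encoding("ソ表", "SJIS", "UTF-8");

echo bin2hex($sjis), PHP_EOL;                       // 835c955c
echo substr_count($sjis, "\\"), PHP_EOL;            // 2: raw byte matches
echo mb_substr_count($sjis, "\\", "SJIS"), PHP_EOL; // 0: no backslash character
```

Because substr_count only sees bytes, it reports the two trailing 0x5C bytes as backslashes; mb_substr_count decodes the string as Shift_JIS and finds no backslash character.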
However, what happens when we pass an empty string as a parameter to mb_substr_count? This article explores in detail how mb_substr_count behaves when encountering empty strings and highlights some important considerations.
The function prototype for mb_substr_count is as follows:
```php
// Signature as of PHP 8.0
mb_substr_count(string $haystack, string $needle, ?string $encoding = null): int
```
$haystack: The target string to be searched.
$needle: The substring to count. It must not be empty.
$encoding: Optional character encoding. If it is omitted or null, the internal character encoding (as returned by mb_internal_encoding()) is used.
The function returns the number of times $needle appears in $haystack.
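A minimal usage sketch of that signature, counting substrings in a Japanese UTF-8 string. The sample text is arbitrary, and the call to mb_internal_encoding() is an assumption added so the second count does not depend on the local php.ini.

```php
<?php
$text = "こんにちは、こんばんは";

// Explicit encoding: "こん" occurs twice.
$withEncoding = mb_substr_count($text, "こん", "UTF-8");

// Omitted encoding: mb_substr_count() falls back to the internal encoding.
mb_internal_encoding("UTF-8");
$withDefault = mb_substr_count($text, "は");

echo $withEncoding, PHP_EOL; // 2
echo $withDefault, PHP_EOL;  // 2
```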
If we pass an empty string as the substring ($needle), mb_substr_count does not return a count at all. As of PHP 8.0 it throws a ValueError, because $needle must not be empty; PHP 7.x instead emitted a warning and returned false.

```php
$haystack = "Hello, world!";
$needle = "";

try {
    echo mb_substr_count($haystack, $needle);
} catch (ValueError $e) {
    // PHP 8.0+: an empty needle is rejected instead of being counted
    echo get_class($e), ": ", $e->getMessage();
}
```

In this code, $needle is an empty string. Even though $haystack is not empty, the call never produces a number on PHP 8.0 and later: the ValueError is caught, and its class name and message are printed. An empty string could be said to occur at every position, so there is no single meaningful count, and PHP reports the call as invalid rather than returning 0.
If the target string $haystack is empty ($haystack = ""), the return value is 0 for any non-empty $needle, because an empty string contains no characters to match. An empty $needle still throws, even when $haystack is empty too.
<span><span class="hljs-variable">$haystack</span></span> = </span><span><span class="hljs-string">""</span></span>;
<span><span class="hljs-variable">$needle</span></span> = </span><span><span class="hljs-string">"Hello"</span></span>;
<span><span class="hljs-variable">$count</span></span> = </span><span class="hljs-title function_ invoke__">mb_substr_count</span></span>(</span><span class="hljs-variable">$haystack</span></span>, </span><span class="hljs-variable">$needle</span></span>);
<span><span class="hljs-keyword">echo</span></span> </span><span class="hljs-variable">$count</span></span>; </span><span class="hljs-comment">// Output: 0</span></span>
</span></span>
Here, $needle is "Hello", but the empty $haystack contains nothing to match, so the function returns 0.
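The same rule holds for other non-empty needles, ASCII or multibyte. A short loop over an arbitrary list of needles makes this concrete (the list itself is only illustrative):

```php
<?php
// Arbitrary non-empty needles, ASCII and multibyte.
$needles = ["Hello", "a", " ", "こんにちは"];

$counts = [];
foreach ($needles as $needle) {
    // An empty haystack has nothing to match, so each count is 0.
    $counts[$needle] = mb_substr_count("", $needle, "UTF-8");
}

var_dump($counts);
```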
The two examples above give two rules for empty strings:
If the substring is empty, mb_substr_count does not return 0: it throws a ValueError on PHP 8.0+ (and returned false with a warning on older versions).
If the target string is empty and the substring is not, the return value is 0, because an empty string contains no non-empty substring.
These are common “edge cases” for mb_substr_count and should be kept in mind when using it.
Because an empty $needle makes mb_substr_count throw (or, before PHP 8.0, return false), it is worth checking $needle before the call and deciding explicitly what an empty needle should mean for your application, for example:
```php
if ($needle !== "") {
    $count = mb_substr_count($haystack, $needle);
} else {
    $count = 0;
}
```
This prevents an uncaught ValueError from aborting the script and makes the decision to treat an empty needle as zero matches explicit.
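If the same guard is needed in several places, it can be wrapped in a helper. The name count_substr_or_zero below is hypothetical, and returning 0 for an empty needle is one policy among several (throwing your own exception is another). The sketch assumes PHP 8.0+, where $encoding may be passed as null.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: counts $needle in $haystack, but treats an empty
 * needle as zero matches instead of letting mb_substr_count() throw.
 * Assumes PHP 8.0+, where a null $encoding means "use the internal encoding".
 */
function count_substr_or_zero(string $haystack, string $needle, ?string $encoding = null): int
{
    if ($needle === '') {
        return 0; // policy choice: no occurrences for an empty needle
    }
    return mb_substr_count($haystack, $needle, $encoding);
}

echo count_substr_or_zero("Hello, world!", "o"), PHP_EOL; // 2
echo count_substr_or_zero("Hello, world!", ""), PHP_EOL;  // 0
echo count_substr_or_zero("", "Hello"), PHP_EOL;          // 0
```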
Empty string as needle: mb_substr_count throws a ValueError on PHP 8.0+ (older versions emit a warning and return false); it never returns 0.
Empty string as haystack: for any non-empty needle, mb_substr_count returns 0.
Guarding the needle: check $needle before calling mb_substr_count and decide explicitly how an empty needle should be handled.
Understanding these details helps developers use mb_substr_count correctly in PHP and avoid unexpected errors around empty strings.