In PHP, substr_count() is an efficient function for counting how many times a substring appears in a string. Note, however, that substr_count() is case-sensitive by default. This is both a convenience and a pitfall for developers: it catches out anyone expecting a case-insensitive count, but it is exactly what we need when case carries meaning, such as when distinguishing the frequency of "Word" from that of "word".
This article explains through examples how to use substr_count() for case-sensitive substring counting, and analyzes its behavior and the points to watch out for.
The function signature of substr_count() is as follows:
int substr_count ( string $haystack , string $needle [, int $offset = 0 [, int $length ]] )
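$haystack is the string to search in.
$needle is the substring we want to count.
$offset and $length are optional parameters that restrict the search to a portion of the string: $offset is the position at which counting starts, and $length is the maximum number of bytes to examine from that position.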
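The effect of the optional parameters is easiest to see on a short string. The following is a minimal sketch; the sample text and variable names are made up for illustration:

```php
<?php
// Byte positions: "Word" 0-3, "word" 5-8, "WORD" 10-13, "word" 15-18.
$text = "Word word WORD word";

// No offset or length: the whole string is searched.
$all = substr_count($text, "word");

// Start counting at byte 10, which skips "Word word ".
$fromOffset = substr_count($text, "word", 10);

// Examine only the first 4 bytes ("Word"); the capital W prevents a match.
$firstFour = substr_count($text, "word", 0, 4);

echo "{$all} {$fromOffset} {$firstFour}\n"; // prints "2 1 0"
```

Both values are measured in bytes, so positions in multibyte text need extra care.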
Example:
$text = "GitBox.net is an example subdomain of gitbox.net";
// Only the all-lowercase form is counted.
$count = substr_count($text, "gitbox.net");
echo "gitbox.net appeared {$count} time(s)";
The output will be:
gitbox.net appeared 1 time(s)
As you can see, substr_count() is case-sensitive, so it does not count GitBox.net.
Since substr_count() is already case-sensitive, no special handling is needed: just pass the needle in exactly the case we want to match. For example, to count "gitbox.net", "GitBox.net" and "GITBOX.NET" separately, we can do this:
$text = "GitBox.net is a subdomain of gitbox.net, and GITBOX.NET is another form";
// Each call counts exactly one case variant.
$lowercaseCount = substr_count($text, "gitbox.net");
$capitalizedCount = substr_count($text, "GitBox.net");
$uppercaseCount = substr_count($text, "GITBOX.NET");
echo "gitbox.net appeared {$lowercaseCount} time(s)\n";
echo "GitBox.net appeared {$capitalizedCount} time(s)\n";
echo "GITBOX.NET appeared {$uppercaseCount} time(s)\n";
Output:
gitbox.net appeared 1 time(s)
GitBox.net appeared 1 time(s)
GITBOX.NET appeared 1 time(s)
This shows that the function matches characters exactly, including their case, so each form of the string is counted separately.
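When several case variants need to be tallied, the repeated calls can be wrapped in a loop. The sketch below assumes a hypothetical helper named countVariants(), which is not part of PHP; it simply calls substr_count() once per needle:

```php
<?php
// Hypothetical helper: returns a map of needle => case-sensitive count.
function countVariants(string $haystack, array $needles): array
{
    $counts = [];
    foreach ($needles as $needle) {
        // Each needle is matched exactly as written, case included.
        $counts[$needle] = substr_count($haystack, $needle);
    }
    return $counts;
}

$sample = "GitBox.net is a subdomain of gitbox.net, and GITBOX.NET is another form";
$counts = countVariants($sample, ["gitbox.net", "GitBox.net", "GITBOX.NET"]);

foreach ($counts as $needle => $count) {
    echo "{$needle} appeared {$count} time(s)\n";
}
```

With this sample text, each of the three variants is reported once.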
Although this article focuses on case-sensitive counting, if you instead want a case-insensitive count, you can first convert the string to a single case (and write the needle in that same case), then pass it to substr_count():
$text = "GitBox.net is a subdomain of gitbox.net, and GITBOX.NET is another form";
// Lowercase the haystack so every variant matches the lowercase needle.
$textLower = strtolower($text);
$count = substr_count($textLower, "gitbox.net");
echo "Ignoring case, gitbox.net appeared {$count} time(s)";
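One detail that is easy to miss is that the needle must be normalized as well as the haystack, otherwise a needle such as "GitBox.net" would never match the lowercased text. A minimal sketch, assuming a hypothetical helper named countIgnoringCase():

```php
<?php
// Hypothetical helper: lowercases both arguments before counting.
function countIgnoringCase(string $haystack, string $needle): int
{
    return substr_count(strtolower($haystack), strtolower($needle));
}

$sample = "GitBox.net is a subdomain of gitbox.net, and GITBOX.NET is another form";
$total = countIgnoringCase($sample, "GitBox.NET");

echo "Ignoring case, the needle appeared {$total} time(s)\n"; // 3 for this sample
```

strtolower() is sufficient for ASCII text like this domain name. For text containing non-ASCII letters, mb_strtolower() could be swapped in, assuming the mbstring extension is available.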
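substr_count() does not count overlapping substrings. For example, searching for aa in aaaa returns 2, not 3.
The arguments should be real, non-empty strings. Passing null as the haystack is deprecated as of PHP 8.1, and false is silently converted to an empty string (or rejected with a TypeError under strict_types). An empty $needle throws a ValueError in PHP 8, while earlier versions emit a warning.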
Case-sensitive counting is very important when processing user input or security-related content.
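If overlapping matches do need to be counted, substr_count() alone is not enough. One possible workaround, sketched here with a hypothetical helper named countOverlapping(), is to call strpos() repeatedly and advance by a single byte after each hit. Returning 0 for an empty needle is a choice made for this sketch, not the behavior of substr_count():

```php
<?php
// Hypothetical helper: case-sensitive count that includes overlapping matches.
function countOverlapping(string $haystack, string $needle): int
{
    if ($needle === '') {
        return 0; // Sketch-specific choice; substr_count() rejects an empty needle.
    }

    $count = 0;
    $pos = 0;
    // strpos() is case-sensitive, matching the behavior of substr_count().
    while (($pos = strpos($haystack, $needle, $pos)) !== false) {
        $count++;
        $pos++; // Move one byte forward so the next match may overlap this one.
    }
    return $count;
}

echo substr_count("aaaa", "aa") . "\n";     // 2 (non-overlapping)
echo countOverlapping("aaaa", "aa") . "\n"; // 3 (overlapping)
```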
substr_count() is an ideal choice when the count must respect the exact case of a substring. Because it is case-sensitive, developers only need to pass the needle in the exact form they want to count. Mastering this behavior helps in writing more rigorous PHP code for sensitive data processing, text analysis, or data validation.