How to Use mb_convert_kana with preg_replace for Input Normalization

gitbox 2025-08-18

1. mb_convert_kana Function Overview

mb_convert_kana is a PHP multibyte string function, provided by the mbstring extension, that converts between the full-width and half-width forms of characters common in Japanese text: alphanumerics, spaces, and kana. It can also convert between hiragana and katakana. This makes it useful for bringing mixed-width user input into one consistent form.

The common syntax for the mb_convert_kana function is as follows:

```php
mb_convert_kana($str, $option, $encoding);
```
  • $str is the string to be converted.

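  • $option specifies the conversion mode as a string of one or more flags (default 'KV'). The flags include:

    • 'r' converts full-width alphabetic characters to half-width; 'R' does the reverse.

    • 'n' converts full-width numbers to half-width; 'N' does the reverse.

    • 'a' converts full-width alphanumeric characters (and most full-width ASCII symbols) to half-width; 'A' does the reverse.

    • 's' converts the full-width space to a half-width space; 'S' does the reverse.

    • 'k' converts full-width katakana to half-width katakana.

    • 'K' converts half-width katakana to full-width katakana.

    • 'h' converts full-width hiragana to half-width katakana.

    • 'H' converts half-width katakana to full-width hiragana.

    • 'c' converts full-width katakana to full-width hiragana; 'C' does the reverse.

    • 'V' merges a voiced sound mark into the preceding kana; use it together with 'K' or 'H'.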

  • $encoding is the character encoding (e.g., UTF-8). If omitted, the internal encoding (see mb_internal_encoding()) is used, so passing it explicitly is safer.

For example, to convert a string containing full-width alphanumeric characters to half-width:

```php
$str = "ＡＢＣ１２３"; // full-width letters and digits
$converted = mb_convert_kana($str, 'a', 'UTF-8'); // Result: 'ABC123'
```
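
The kana-related flags are easier to follow with concrete input. The following is a minimal sketch, assuming the mbstring extension is loaded and the source file is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Assumes the mbstring extension is enabled and this file is UTF-8 encoded.

// 'K' converts half-width katakana to full-width; adding 'V' merges a
// trailing voiced sound mark into the preceding character.
$merged = mb_convert_kana("ｶﾞｷﾞ", 'KV', 'UTF-8');   // 'ガギ' (2 characters)
$split  = mb_convert_kana("ｶﾞｷﾞ", 'K', 'UTF-8');    // 4 characters: the marks stay separate

// 'c' converts full-width katakana to hiragana; 'C' does the reverse.
$hiragana = mb_convert_kana("カタカナ", 'c', 'UTF-8'); // 'かたかな'
$katakana = mb_convert_kana("ひらがな", 'C', 'UTF-8'); // 'ヒラガナ'

// 'n' converts only full-width digits and leaves full-width letters untouched.
$digits = mb_convert_kana("Ａ１Ｂ２", 'n', 'UTF-8');   // 'Ａ1Ｂ2'
```

Combining 'K' with 'V' is usually what you want when accepting half-width katakana, since it avoids leaving stand-alone voiced marks in the stored text.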

2. preg_replace Function Overview

preg_replace is a PHP regular expression function used to replace content in a string based on a regex pattern. It allows for complex pattern matching and replacement operations, which is particularly useful for removing special characters or formatting input data.

The basic usage of preg_replace is as follows:

```php
preg_replace($pattern, $replacement, $subject);
```
  • $pattern is the regex pattern.

  • $replacement is the string to replace with.

  • $subject is the string to process.

For example, to replace all digits in a string with asterisks:

```php
$str = "abc123xyz";
$result = preg_replace("/\d/", "*", $str); // Result: 'abc***xyz'
```
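
When the subject may contain multibyte text, the u modifier makes preg_replace treat both pattern and subject as UTF-8. The sketch below, with illustrative variable names and sample text, collapses runs of whitespace, including the full-width space (U+3000), into a single half-width space.

```php
<?php
// The /u modifier makes PCRE treat the pattern and subject as UTF-8,
// so \x{3000} (the full-width space) is matched as one character.
$str = "東京　　 タワー\t 333m";
$collapsed = preg_replace('/[\s\x{3000}]+/u', ' ', $str);

// preg_replace returns null on failure (for example, invalid UTF-8 with /u).
if ($collapsed === null) {
    $collapsed = $str; // fall back to the original input
}

echo $collapsed; // 東京 タワー 333m
```

Checking for null matters here because a subject that is not valid UTF-8 makes the call fail rather than return the input unchanged.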

3. Combining mb_convert_kana and preg_replace

By combining mb_convert_kana and preg_replace, we can achieve more precise control over input. For example, when processing user input, we may need to convert full-width characters to half-width, remove extra spaces, or strip other non-alphanumeric characters. The following example demonstrates how to use both functions together for input normalization.

Suppose we have a form where users may enter strings with full-width characters, spaces, or special symbols. We want to normalize the input before saving to ensure data consistency.

Example: Input String Normalization

```php
// Original user input: full-width letters, digits, and symbols, plus a full-width space
$user_input = "  ＡＢＣ　１２３ ！＠＃";

// Use mb_convert_kana to convert full-width alphanumerics ('a') and spaces ('s') to half-width
$normalized_input = mb_convert_kana($user_input, 'as', 'UTF-8');

// Use preg_replace to remove spaces and special characters
$normalized_input = preg_replace("/[^a-zA-Z0-9]/", "", $normalized_input);

// Output the result
echo $normalized_input;  // Result: 'ABC123'
```

Explanation:

  1. mb_convert_kana($user_input, 'as', 'UTF-8'): Converts full-width alphanumeric characters, full-width ASCII symbols, and full-width spaces to their half-width equivalents. Kana are left unchanged because no kana flag is given.

  2. preg_replace("/[^a-zA-Z0-9]/", "", $normalized_input): Removes every character that is not an ASCII letter or digit, including spaces and punctuation. Note that this pattern also removes any Japanese text, so it only suits fields that should contain alphanumerics alone.
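
The same two-step pattern can be wrapped in a small helper for a specific field. The sketch below assumes a phone number field that should end up as ASCII digits only; normalize_phone_digits is a hypothetical name, not a PHP built-in, and the sample numbers are made up.

```php
<?php
// Hypothetical helper, not part of PHP: reduces a phone number typed in
// any mix of full-width and half-width characters to ASCII digits only.
function normalize_phone_digits(string $input): string
{
    // 'n' converts full-width digits to half-width digits.
    $half = mb_convert_kana($input, 'n', 'UTF-8');

    // Remove everything that is not an ASCII digit. An explicit [^0-9]
    // class is used so only ASCII digits survive, whatever the PCRE mode.
    // If the input is not valid UTF-8, preg_replace returns null and we
    // return an empty string instead.
    return preg_replace('/[^0-9]/u', '', $half) ?? '';
}

$full_width = normalize_phone_digits("０３－１２３４－５６７８"); // '0312345678'
$half_width = normalize_phone_digits(" 03 (1234) 5678 ");        // '0312345678'
```

Both spellings reduce to the same string, so the stored value no longer depends on how the user's input method was set.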

4. Use Cases

This combination is particularly useful in the following scenarios:

  • Form Submission: Users often enter data with inconsistent formatting, such as a mix of full-width and half-width characters, spaces, or punctuation. Using these functions ensures uniform formatting.

  • Database Storage: Ensuring data consistency during storage is critical for later processing. Normalizing input prevents errors caused by inconsistent formats.

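  • Search Functionality: Normalizing both the stored text and the query to the same width and kana form improves keyword matching, because full-width and half-width spellings of the same word then compare as equal.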
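
For the search scenario, one possible approach is to derive a comparison key from both the stored text and the query. The sketch below is an assumption about how such a key could be built; normalize_search_key is a hypothetical helper, and the choice of flags ('KVas') is one reasonable option rather than a fixed rule.

```php
<?php
// Hypothetical helper: builds a comparison key for keyword matching.
function normalize_search_key(string $text): string
{
    // K + V: half-width katakana -> full-width, merging voiced marks.
    // a: full-width alphanumerics -> half-width.
    // s: full-width space -> half-width space.
    $text = mb_convert_kana($text, 'KVas', 'UTF-8');

    // Collapse whitespace runs and trim the ends; keep the text as-is
    // if preg_replace fails and returns null.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    // Lower-case so that "PC" and "pc" compare as equal.
    return mb_strtolower($text, 'UTF-8');
}

$a = normalize_search_key("ﾊﾟｿｺﾝ　ＰＣ");
$b = normalize_search_key("パソコン pc");
var_dump($a === $b); // bool(true)
```

Applying the same function at write time and at query time is what makes the comparison reliable; normalizing only one side leaves the mismatch in place.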

5. Conclusion

By combining mb_convert_kana and preg_replace, developers can normalize input flexibly and precisely: the first function standardizes character width and kana form, and the second removes unwanted symbols and spaces. The technique applies directly to form input processing, database storage, and search.