Tips for Using mb_substitute_character and mb_convert_encoding Together

gitbox 2025-09-28

1. The Role of mb_substitute_character

mb_substitute_character is a function in PHP's mbstring extension that gets or sets the substitution character used when a character cannot be converted during an encoding conversion. By default, PHP substitutes ? (U+003F) for each unconvertible character.

For example, when converting from UTF-8 to GBK, any character that GBK cannot represent is replaced with a question mark ? by default.

```php
mb_substitute_character('none'); // Output nothing for unconvertible characters
mb_substitute_character(0x3F);   // Use "?" (U+003F) as the substitution character
```

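By calling mb_substitute_character, you control what is emitted in place of an unconvertible character. Pass a Unicode code point as an integer (for example 0x3F for ?), or one of the strings 'none' (output nothing), 'long' (output the character code, such as U+3000), or 'entity' (output a numeric character entity). A literal character passed as a string, such as '?', is not accepted; PHP 8 throws a ValueError for it.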
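The following is a minimal sketch of the four substitution modes applied to the same input. The sample string and variable names are illustrative assumptions, not part of the original article; the previous setting is saved and restored so the example does not leak state.

```php
<?php
// Remember the current setting so it can be restored afterwards.
$previous = mb_substitute_character();

$text = "Price: 5€"; // UTF-8 source; "€" (U+20AC) does not exist in ASCII

mb_substitute_character(0x3F);     // code point of "?"
$asQuestion = mb_convert_encoding($text, 'ASCII', 'UTF-8');

mb_substitute_character('none');   // output nothing for unconvertible characters
$asNone = mb_convert_encoding($text, 'ASCII', 'UTF-8');

mb_substitute_character('long');   // output the character code value
$asLong = mb_convert_encoding($text, 'ASCII', 'UTF-8');

mb_substitute_character('entity'); // output a numeric character entity
$asEntity = mb_convert_encoding($text, 'ASCII', 'UTF-8');

// Restore whatever was configured before this example ran.
mb_substitute_character($previous);

echo $asQuestion, "\n"; // Price: 5?
echo $asNone, "\n";     // Price: 5
echo $asLong, "\n";
echo $asEntity, "\n";
```

Because the setting is global for the request, saving and restoring it as shown is a reasonable habit when only one conversion needs a special substitution mode.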

2. The Role of mb_convert_encoding

mb_convert_encoding converts a string from one character encoding to another. It supports many encodings, such as UTF-8, GBK, and ISO-8859-1. Its arguments are the string, the target encoding, and then the source encoding, in that order.

```php
$str = "中文字符串"; // UTF-8 source text ("Chinese string")
$converted_str = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
echo $converted_str; // With the default substitution character: "?????"
```

This converts $str from UTF-8 to ISO-8859-1. During conversion, characters in the source string that cannot be represented in the target encoding may be replaced according to the mb_substitute_character setting.

3. Tips for Using Them Together

When converting a string from one encoding to another and handling unconvertible characters, coordinating mb_substitute_character and mb_convert_encoding is essential. Here are some practical tips:

3.1 Set an Appropriate Substitution Character

If characters in the source string cannot be represented in the target encoding, you can use mb_substitute_character to make the loss visible and predictable. For example, replace each unconvertible character with a single character of your choice, given as a code point.

```php
mb_substitute_character(0x21); // Code point of "!"
$str = mb_convert_encoding('你好, World', 'ASCII', 'UTF-8');
echo $str; // Output: "!!, World"
```

In this example, the two Chinese characters in the string cannot be represented in ASCII, so each one is replaced with the substitution character !, while the ASCII characters pass through unchanged.
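Since the substitution character is a global setting, it can help to scope it to a single conversion. Below is a minimal sketch of that idea; convert_with_substitute is a hypothetical helper name introduced here for illustration, and the union type in its signature assumes PHP 8.0 or later.

```php
<?php
/**
 * Hypothetical helper: convert $text while temporarily using $substitute,
 * then restore whatever substitution setting was active before.
 */
function convert_with_substitute(
    string $text,
    string $to,
    string $from,
    int|string $substitute
): string {
    $previous = mb_substitute_character();
    mb_substitute_character($substitute);
    try {
        $result = mb_convert_encoding($text, $to, $from);
        if ($result === false) {
            throw new RuntimeException('Encoding conversion failed');
        }
        return $result;
    } finally {
        // Runs on both success and failure, so the setting never leaks.
        mb_substitute_character($previous);
    }
}

// "€" and "✓" are not ASCII, so each becomes "_".
echo convert_with_substitute("Total: 10€ ✓", 'ASCII', 'UTF-8', ord('_')), "\n";
// Output: Total: 10_ _
```

The finally block is the design choice that matters here: later conversions elsewhere in the request keep the substitution behavior they expect.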

3.2 Avoid Unnecessary Substitutions

If you want to avoid character substitution entirely, set mb_substitute_character to 'none'. Unconvertible characters are then silently removed from the output. PHP raises no warning or error and mb_convert_encoding still returns a string, so use this mode only when dropping characters is acceptable.

```php
mb_substitute_character('none');
$str = mb_convert_encoding('你好, World', 'ASCII', 'UTF-8');
echo $str; // Output: ", World" (the two Chinese characters are dropped, no warning)
```

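Because 'none' drops characters without any signal, one way to notice the loss is to compare character counts before and after the conversion. This is a sketch under that assumption; the variable names and sample text are illustrative only.

```php
<?php
$previous = mb_substitute_character();
mb_substitute_character('none');

$source = "日本語 text"; // UTF-8 input: 3 non-ASCII characters, a space, 4 ASCII letters
$ascii  = mb_convert_encoding($source, 'ASCII', 'UTF-8');

// Unconvertible characters were removed, so the character counts differ
// even though no warning was raised.
$dropped = mb_strlen($source, 'UTF-8') - mb_strlen($ascii, 'ASCII');

mb_substitute_character($previous);

echo $ascii, "\n";   // " text"
echo $dropped, "\n"; // 3
```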
3.3 Use an Appropriate Target Encoding

Ensure the target encoding can represent all characters that occur in the source string. UTF-8 can encode every Unicode character, so converting into UTF-8 from a narrower encoding such as GBK needs no substitution as long as the input bytes are valid in the source encoding. Substitution becomes a concern mainly when converting in the other direction, from UTF-8 into a narrower encoding.

```php
$str = "\xD6\xD0\xCE\xC4"; // "中文" as GBK-encoded bytes, e.g. read from a legacy file
$converted_str = mb_convert_encoding($str, 'UTF-8', 'GBK');
echo $converted_str; // Output: 中文
```

If you are unsure whether the target encoding supports all characters, it’s best to validate the conversion results in advance or use mb_substitute_character to handle characters that may fail to convert.
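One way to validate a conversion in advance is a round trip: convert to the target encoding, convert back, and compare with the original. The sketch below assumes that approach; is_lossless is a hypothetical helper name, not an mbstring function.

```php
<?php
/**
 * Hypothetical helper: report whether $text (encoded in $from) survives a
 * round trip through $to unchanged, i.e. whether $to can represent it fully.
 */
function is_lossless(string $text, string $to, string $from): bool
{
    $converted = mb_convert_encoding($text, $to, $from);
    $restored  = mb_convert_encoding($converted, $from, $to);

    // Any substituted or dropped character makes the round trip differ.
    return $restored === $text;
}

var_dump(is_lossless("中文", 'GBK', 'UTF-8'));        // bool(true)
var_dump(is_lossless("中文", 'ISO-8859-1', 'UTF-8')); // bool(false)
```

The check works regardless of the current mb_substitute_character setting, since a substituted, expanded, or dropped character never converts back to the original one.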

3.4 Use Error Handling When Needed with mb_convert_encoding

mb_convert_encoding does not report unconvertible characters: it raises no warning and does not return false for them, so suppressing errors with @ and checking for false will not detect data loss. A more reliable approach is to validate the input with mb_check_encoding before converting, and to handle invalid input according to your requirements.

```php
$input = "caf\xC3\xA9 \xFF"; // UTF-8 text followed by a byte that is invalid in UTF-8
if (!mb_check_encoding($input, 'UTF-8')) {
    echo "Character conversion failed: input is not valid UTF-8";
} else {
    echo mb_convert_encoding($input, 'ISO-8859-1', 'UTF-8');
}
```
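A separate failure mode is an unknown encoding name. The sketch below assumes PHP 8.0 or later, where mb_convert_encoding raises a ValueError in that case; safe_convert is a hypothetical wrapper name used only for illustration.

```php
<?php
/**
 * Hypothetical wrapper: return the converted string, or null when the
 * conversion cannot be performed (assumes PHP 8.0+ ValueError behavior).
 */
function safe_convert(string $text, string $to, string $from): ?string
{
    try {
        $result = mb_convert_encoding($text, $to, $from);
    } catch (ValueError $e) {
        return null; // one of the encoding names is not recognized
    }

    return $result === false ? null : $result;
}

var_dump(safe_convert("abc", 'UTF-8', 'NOT-AN-ENCODING')); // NULL
var_dump(safe_convert("abc", 'UTF-8', 'ASCII'));           // string(3) "abc"
```

Note that this only covers invalid encoding names. Unconvertible characters still follow the mb_substitute_character setting and do not trigger the catch branch.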

4. Conclusion

mb_substitute_character and mb_convert_encoding are two powerful PHP functions that help handle complex scenarios in character encoding conversion. Proper use of these functions can prevent garbled text and provide a better user experience.

  • Setting an appropriate substitution character (via mb_substitute_character) helps avoid unnecessary garbled text when conversion fails.

  • When using mb_convert_encoding, selecting a suitable target encoding and ensuring compatibility with the source string reduces the risk of character loss.

By skillfully applying these two functions, PHP developers can handle character encoding more effectively, improving cross-platform and internationalization support.