How to Combine mb_scrub and htmlspecialchars to Prevent XSS Attacks?

gitbox 2025-07-17

In web development, cross-site scripting (XSS) is a common and dangerous security threat. Attackers inject malicious script code that causes the browser to perform unintended actions, such as stealing user information, hijacking sessions, or even taking control of the browser. To prevent XSS, developers typically validate user input strictly and encode it before output. In PHP, htmlspecialchars() is one of the most commonly used defense mechanisms. However, if user-submitted content contains invalid byte sequences for the expected encoding, htmlspecialchars() alone may not handle it reliably. In such cases, combine it with mb_scrub() for more secure handling.
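As a rough sketch of that combination, the two calls can be wrapped in a small helper. The function name safe_html is hypothetical (it does not come from PHP or from this article), and the sketch assumes the mbstring extension is loaded and that the page is served as UTF-8.

```php
<?php
// Hypothetical helper: the name safe_html is an assumption made for illustration.
function safe_html(string $input): string
{
    // Step 1: replace byte sequences that are invalid in UTF-8
    // with the mbstring substitute character.
    $clean = mb_scrub($input, 'UTF-8');

    // Step 2: escape HTML metacharacters, including both quote styles.
    return htmlspecialchars($clean, ENT_QUOTES, 'UTF-8');
}

// Usage: the stray \xC0 byte is not valid UTF-8.
echo safe_html("\xC0<script>alert('XSS');</script>"), PHP_EOL;
```

Scrubbing before escaping means htmlspecialchars() always receives well-formed UTF-8, so its result does not depend on how it treats malformed input.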

What is mb_scrub?

mb_scrub(), available since PHP 7.2, "cleanses" a multi-byte string by replacing byte sequences that are invalid in the given encoding, so that the result is a valid string. Multi-byte characters that are truncated during transmission or processing can leave invalid byte sequences behind. If these sequences are passed directly to htmlspecialchars(), under certain conditions they may bypass the expected escaping mechanism.
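The truncation case can be illustrated with a minimal sketch. It assumes the mbstring extension is loaded; the variable names are made up for illustration. It also shows what htmlspecialchars() is documented to do with malformed input when neither ENT_SUBSTITUTE nor ENT_IGNORE is passed: it returns an empty string.

```php
<?php
// "é" is the two bytes \xC3\xA9 in UTF-8. A byte-based cut with substr()
// can split the character and leave a dangling lead byte.
$original  = "caf\xC3\xA9";              // "café" encoded as UTF-8
$truncated = substr($original, 0, 4);    // "caf\xC3": no longer valid UTF-8

var_dump(mb_check_encoding($truncated, 'UTF-8')); // bool(false)

// With these flags, htmlspecialchars() returns an empty string
// for input that is not valid in the given encoding.
var_dump(htmlspecialchars($truncated, ENT_QUOTES, 'UTF-8')); // string(0) ""

// mb_scrub() replaces the dangling byte, so the result is valid UTF-8 again.
$scrubbed = mb_scrub($truncated, 'UTF-8');
var_dump(mb_check_encoding($scrubbed, 'UTF-8')); // bool(true)
```

The replacement character depends on the mbstring substitute-character setting, so the sketch checks validity rather than a specific output string.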

For example, a browser may parse an illegal UTF-8 byte sequence differently than the server expects, which can lead to script injection.

```php
// Example: Input with illegal bytes
$input = "\xC0<script>alert('XSS');</script>";

// Direct use of htmlspecialchars (unsafe)
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
```

In the example above, if $input contains an illegal UTF-8 byte sequence, the browser may ignore those bytes and execute the subsequent