
Example Code and Detailed Explanation for Using mb_scrub to Filter Filename Strings

gitbox 2025-07-01

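When handling file uploads or other file operations, the security and validity of filenames are crucial. Invalid filenames can cause system errors, open security holes, or even let an upload overwrite an existing file. This article introduces mb_scrub, a custom PHP function that filters and normalizes filename strings, particularly when they contain multibyte characters. Note that PHP's mbstring extension already ships a built-in mb_scrub() (since PHP 7.2), which only replaces ill-formed byte sequences. Because a global function with the same name cannot be redeclared while mbstring is loaded, a custom version in real code needs a different name or its own namespace.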

Design Philosophy of the mb_scrub Function

  1. Consistent Encoding
    To correctly handle multibyte characters (such as Chinese, Japanese, Korean, etc.), the input string is first converted to UTF-8 encoding using mb_convert_encoding, avoiding character truncation or garbled text.

  2. Character Filtering
    A regular expression then removes every character that is not on an allowlist. The allowed characters are:

    • ASCII letters (A-Z and a-z)

    • Digits (0-9)

    • Underscore _

    • Period .

    • Hyphen -

    These characters meet the needs of most filenames while avoiding the risks posed by special characters.

  3. Avoid Special Prefixes
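    Filenames beginning with a dot are hidden on Unix-like systems, and some, such as .htaccess, have special meaning to web servers. ltrim strips every leading dot, so the function never creates a hidden file by accident; names such as . and .. are reduced to an empty string as well.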

  4. Protection Against Empty Strings
    If the filtered filename is an empty string, a default filename (e.g., default_filename) is automatically assigned to prevent errors during subsequent file operations.
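A minimal sketch of these four steps might look like the following; it is an illustration, not the author's original code. It assumes PHP 8 with the mbstring extension, assumes the caller knows the source encoding (UTF-8 by default), and uses the hypothetical name scrub_filename() because a global function called mb_scrub would clash with mbstring's built-in mb_scrub().

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the four steps described above. scrub_filename() is a hypothetical
 * name: mbstring already defines a built-in mb_scrub() (PHP 7.2+), so a global
 * function called mb_scrub cannot be declared alongside it.
 * Assumption: $fromEncoding is the known source encoding (UTF-8 by default).
 */
function scrub_filename(
    string $name,
    string $fromEncoding = 'UTF-8',
    string $default = 'default_filename'
): string {
    // 1. Consistent encoding: convert to UTF-8. Ill-formed byte sequences are
    //    replaced with "?", which step 2 then removes.
    $name = mb_convert_encoding($name, 'UTF-8', $fromEncoding);

    // 2. Character filtering: keep only ASCII letters, digits, "_", "." and "-".
    $name = preg_replace('/[^A-Za-z0-9_.-]/u', '', $name) ?? '';

    // 3. Avoid special prefixes: strip every leading dot ("." and ".." become "").
    $name = ltrim($name, '.');

    // 4. Protection against empty strings: fall back to a default name.
    return $name === '' ? $default : $name;
}

$input = '非法文件名<>:"/\|?*.txt';
echo 'Original Filename: ', $input, PHP_EOL;
echo 'Filtered Filename: ', scrub_filename($input), PHP_EOL;
// Original Filename: 非法文件名<>:"/\|?*.txt
// Filtered Filename: txt
```

Passing 'UTF-8' as both the target and the source encoding is deliberate: mb_convert_encoding() still validates the string and substitutes "?" for broken bytes, so preg_replace() with the u modifier never receives invalid UTF-8.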

Explanation of the Example Code

In the example, the input filename is 非法文件名<>:"/\|?*.txt (非法文件名 is Chinese for "invalid filename"), which contains many characters that are not allowed. mb_scrub removes every character outside the allowed set, which leaves .txt, and then strips the leading dot. The output is:

Original Filename: 非法文件名<>:"/\|?*.txt
Filtered Filename: txt

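The example is deliberately simple. In real projects, the allowed character set can be adjusted to the application's requirements, and additional rules such as length limits or file extension checks can be applied. Such rules matter: in the output above, the result txt no longer contains a dot, so the original extension is gone.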
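As a sketch of such extra rules, the two helpers below check the extension against an allowlist and cap the length in bytes. has_allowed_extension() and limit_filename_length() are hypothetical names, not part of the original article; the code assumes PHP 8 with mbstring and a name that has already been filtered.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true if $name has an extension from $allowed
 * (compared case-insensitively; list the allowed extensions in lowercase).
 */
function has_allowed_extension(string $name, array $allowed): bool
{
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $allowed, true);
}

/**
 * Hypothetical helper: caps $name at $maxBytes bytes (255 is a common
 * filesystem limit) while keeping the extension. mb_strcut() counts bytes
 * but never splits a UTF-8 character.
 */
function limit_filename_length(string $name, int $maxBytes = 255): string
{
    if (strlen($name) <= $maxBytes) {
        return $name;
    }
    $ext  = pathinfo($name, PATHINFO_EXTENSION);
    $tail = $ext === '' ? '' : '.' . $ext;
    if (strlen($tail) >= $maxBytes) {
        // The extension alone is too long: cut the whole name instead.
        return mb_strcut($name, 0, $maxBytes, 'UTF-8');
    }
    $stem = substr($name, 0, strlen($name) - strlen($tail));
    return mb_strcut($stem, 0, $maxBytes - strlen($tail), 'UTF-8') . $tail;
}

var_dump(has_allowed_extension('photo.JPG', ['jpg', 'png'])); // bool(true)
var_dump(has_allowed_extension('shell.php', ['jpg', 'png'])); // bool(false)
var_dump(has_allowed_extension('txt', ['txt']));              // bool(false): no dot, no extension
echo strlen(limit_filename_length(str_repeat('a', 300) . '.pdf')), PHP_EOL; // 255
```

With these helpers, the example result txt would be rejected, since a name without a dot has no extension to match against the allowlist.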

Usage Recommendations

  • Combine with File Extension Checks: In addition to filtering the filename, check its extension against an allowlist of expected types (for example jpg, png, pdf) so that executable scripts such as .php files cannot be uploaded.

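  • Limit Filename Length: Most common filesystems cap a single filename at 255 bytes or characters, so overly long names should be truncated or rejected. Because multibyte characters occupy several bytes, measure the limit in bytes and never cut in the middle of a character.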

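  • Path Safety: Filename filtering does not replace checks on the final path. Build the destination from a trusted upload directory (for example, resolved with realpath) and confirm the result stays inside it, to prevent vulnerabilities such as directory traversal.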

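  • Multilingual Support: mb_scrub normalizes multibyte input to UTF-8 before filtering, so such input is handled safely. With the character set above, however, non-ASCII characters are removed rather than kept (非法文件名 disappears entirely in the example). Applications that need to preserve them can extend the pattern with Unicode classes such as \p{L} and \p{N} together with the u modifier.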
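The path-safety and multilingual recommendations can be sketched as follows. safe_upload_path() and scrub_filename_unicode() are hypothetical names, not from the original article. The first resolves the upload directory with realpath() and refuses names that contain a path separator or NUL byte or that equal . or ..; the second is a variant of the filter that keeps letters and digits from any script via \p{L} and \p{N}. Both assume PHP 8 (for str_contains()) with mbstring and PCRE Unicode support.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the absolute destination path for $name inside
 * $uploadDir, or null if the directory does not exist or $name could escape it.
 */
function safe_upload_path(string $uploadDir, string $name): ?string
{
    $base = realpath($uploadDir); // resolves symlinks and ".." in the directory
    if ($base === false || !is_dir($base)) {
        return null;
    }
    // Reject empty names, "." and "..", and anything with a separator or NUL byte.
    if (in_array($name, ['', '.', '..'], true)
        || str_contains($name, '/')
        || str_contains($name, '\\')
        || str_contains($name, "\0")) {
        return null;
    }
    // With no separators left, the file can only live directly inside $base.
    return $base . DIRECTORY_SEPARATOR . $name;
}

/**
 * Hypothetical Unicode-aware variant of the filter: keeps letters and digits
 * from any script (\p{L}, \p{N}) plus "_", "." and "-".
 */
function scrub_filename_unicode(string $name, string $default = 'default_filename'): string
{
    $name = mb_convert_encoding($name, 'UTF-8', 'UTF-8'); // replace ill-formed bytes
    $name = preg_replace('/[^\p{L}\p{N}_.-]/u', '', $name) ?? '';
    $name = ltrim($name, '.');
    return $name === '' ? $default : $name;
}

$input = '非法文件名<>:"/\|?*.txt';
echo scrub_filename_unicode($input), PHP_EOL;                    // 非法文件名.txt
var_dump(safe_upload_path(sys_get_temp_dir(), '../etc/passwd')); // NULL
```

The path check is defense in depth: a name that has already passed the filter cannot contain separators, but the upload code should not rely on every caller having filtered it.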

Used together with these checks, a function like mb_scrub can noticeably improve the security and stability of filename handling in PHP file uploads and other file operations.