
Things to note when dealing with special characters

gitbox 2025-05-26

parse_url splits a URL string into its components and returns them as an associative array. Basic usage is simple:

 $url = "https://gitbox.net/path/to/page?name=Zhang San&age=25";
$parts = parse_url($url);
print_r($parts);

Output result:

 Array
(
    [scheme] => https
    [host] => gitbox.net
    [path] => /path/to/page
    [query] => name=Zhang San&age=25
)

As the example shows, parse_url splits the URL into its scheme, host, path, and query string.
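parse_url also accepts an optional second argument, one of the PHP_URL_* constants, to fetch a single component. The sketch below is only illustrative: the URL is a variant of the one above with the space already encoded, and the default port of 443 is an assumption chosen for https. When a component is absent, parse_url returns null, so a null-coalescing default is convenient.

```php
<?php
$url = "https://gitbox.net/path/to/page?name=Zhang%20San&age=25";

// Passing a PHP_URL_* constant returns just that component (a string, or an int for the port).
$host  = parse_url($url, PHP_URL_HOST);   // "gitbox.net"
$query = parse_url($url, PHP_URL_QUERY);  // "name=Zhang%20San&age=25"

// Components missing from the URL come back as null, so supply a default.
// 443 is assumed here because the scheme is https.
$port     = parse_url($url, PHP_URL_PORT) ?? 443;
$fragment = parse_url($url, PHP_URL_FRAGMENT) ?? '';

echo "$host $query $port '$fragment'\n";
```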

2. Parsing problems caused by special characters

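Special characters include Chinese (and other non-ASCII) characters, spaces, the # sign, and the percent sign (%). Some of them, such as # and %, have a reserved meaning in a URL, while others, such as spaces and non-ASCII characters, are not allowed to appear unencoded at all. If such characters are placed in a URL without encoding, parse_url may split it in the wrong place, or it may return a result that other code then rejects.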

2.1 Chinese and non-ASCII characters

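When a URL contains Chinese or other non-ASCII characters, those characters should be percent-encoded first. parse_url works on the raw bytes and does not encode anything itself, so an unencoded URL is passed through as-is and the result is not a valid URL.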

Example:

 $url = "https://gitbox.net/search?query=天气预报"; // "weather forecast" in Chinese
$parts = parse_url($url);
echo $parts['query'];  // The non-ASCII bytes are returned unencoded, so this is not a valid URL

The safer approach is to encode the query parameter value with urlencode first:

 $query = urlencode("天气预报"); // "weather forecast" in Chinese
$url = "https://gitbox.net/search?query=$query";
$parts = parse_url($url);
echo $parts['query'];  // query=%E5%A4%A9%E6%B0%94%E9%A2%84%E6%8A%A5
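Instead of encoding each value by hand, the query string can also be built with PHP's http_build_query, which applies the same urlencode-style encoding to every key and value. This is a minimal sketch; the page parameter is a hypothetical addition for illustration. parse_str then reverses the encoding.

```php
<?php
// http_build_query() encodes every key and value, so no raw text is concatenated into the URL.
$params = ['query' => '天气预报', 'page' => 2]; // "page" is an illustrative extra parameter
$url = "https://gitbox.net/search?" . http_build_query($params);

$parts = parse_url($url);
echo $parts['query'], "\n";  // query=%E5%A4%A9%E6%B0%94%E9%A2%84%E6%8A%A5&page=2

// parse_str() decodes the query string back into the original values (as strings).
parse_str($parts['query'], $decoded);
echo $decoded['query'], "\n";  // 天气预报
```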

2.2 Spaces and special symbols

Spaces must be encoded in a URL, as %20 or, in a query string, as +. A URL that contains a literal space is not valid.

Example:

 $url = "https://gitbox.net/search?keyword=hello world";
$parts = parse_url($url);
print_r($parts);

PHP's parse_url does not actually stop at the space here: it returns [query] => keyword=hello world, just as the first example kept "Zhang San" intact. The problem shows up later, because other code, such as filter_var with FILTER_VALIDATE_URL or an HTTP client, may reject or truncate such a URL. Encode the space instead:

 $url = "https://gitbox.net/search?keyword=hello%20world";
$parts = parse_url($url);
print_r($parts);
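PHP offers two encoders that differ only in how they treat spaces: urlencode follows the form-encoding convention (space becomes +), while rawurlencode follows RFC 3986 (space becomes %20). A small sketch comparing them; the keyword value is just the example above. Either form decodes back to the same value with parse_str.

```php
<?php
$keyword = "hello world";

// urlencode(): form-encoding convention, space becomes "+".
$form = urlencode($keyword);     // "hello+world"
// rawurlencode(): RFC 3986 convention, space becomes "%20".
$raw  = rawurlencode($keyword);  // "hello%20world"

$parts = parse_url("https://gitbox.net/search?keyword=$raw");
echo $parts['query'], "\n";  // keyword=hello%20world

// parse_str() decodes both "+" and "%20" to a space.
parse_str("keyword=$form", $a);
parse_str("keyword=$raw", $b);
var_dump($a['keyword'] === $b['keyword']);  // bool(true)
```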

2.3 Handling the pound sign (#)

In a URL, # introduces the fragment identifier, which parse_url returns as a separate fragment component. An unencoded # therefore ends the query string: everything after it is treated as the fragment, which changes the parsing result.

Example:

 $url = "https://gitbox.net/page?name=abc#section2";
$parts = parse_url($url);
print_r($parts);

Output:

 Array
(
    [scheme] => https
    [host] => gitbox.net
    [path] => /page
    [query] => name=abc
    [fragment] => section2
)

If # is part of a parameter value, it must be encoded as %23.
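The difference is easy to see by parsing the same value with and without encoding. A short sketch, assuming the value "abc#section2" is meant to be the whole name parameter rather than a page anchor:

```php
<?php
$value = "abc#section2"; // assumed to be a literal parameter value

// Unencoded: everything after "#" becomes the fragment.
$bad = parse_url("https://gitbox.net/page?name=$value");

// Encoded: "#" becomes "%23" and stays inside the query string.
$good = parse_url("https://gitbox.net/page?name=" . urlencode($value));

print_r($bad);   // [query] => name=abc, [fragment] => section2
print_r($good);  // [query] => name=abc%23section2, no fragment

parse_str($good['query'], $params);
echo $params['name'], "\n";  // abc#section2
```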

2.4 The percent sign (%) and double encoding

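parse_url does not decode percent-encoded sequences; every component is returned exactly as it appears in the URL. Nor does it validate them: an incomplete sequence such as %2 (missing its second hex digit) is passed through unchanged rather than reported as an error, so the problem only shows up later, when the value is decoded or used. A related pitfall is double encoding: running urlencode on a value that is already encoded turns each % into %25, so %23 becomes %2523.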

The solution is to make sure every percent-encoded sequence is complete and valid, to encode each value exactly once, and to check (and, if necessary, fix) the URL before parsing it.
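One possible way to check a URL before parsing is a regular expression that looks for a % not followed by two hex digits. In the sketch below, hasValidPercentEncoding is a hypothetical helper, not part of PHP; the second half shows what double encoding does to a value.

```php
<?php
// Hypothetical helper: true if every "%" in $s starts a complete %XX sequence.
function hasValidPercentEncoding(string $s): bool
{
    // Matches a "%" that is NOT followed by two hexadecimal digits.
    return preg_match('/%(?![0-9A-Fa-f]{2})/', $s) === 0;
}

var_dump(hasValidPercentEncoding("q=%E5%A4%A9"));  // bool(true)
var_dump(hasValidPercentEncoding("q=%2"));         // bool(false)

// Double encoding: encoding an already-encoded value turns "%" into "%25".
$once  = urlencode("#");    // "%23"
$twice = urlencode($once);  // "%2523"
echo urldecode($twice), "\n";  // "%23", not "#": one decode is no longer enough
```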

3. Frequently Asked Questions and Debugging Suggestions

  • Question 1: parse_url returns false or incomplete results. The URL may be malformed or contain illegal characters. parse_url does little validation of its own and returns false only for seriously malformed URLs, so it is recommended to check the URL with filter_var($url, FILTER_VALIDATE_URL) first.

  • Question 2: query parameters are parsed inconsistently. This usually comes from inconsistent encoding. Make sure every special character is encoded correctly, especially in the query string and the path.

  • Question 3: the query component is not split into key-value pairs
    parse_url only splits the URL into its structural components; it does not turn the query string into an array. Combine it with the parse_str function:

 parse_str($parts['query'], $queryParams);
print_r($queryParams);
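The three checks above can be combined into one debugging step: validate, then split, then decode the query. This is a sketch only; inspectUrl is a hypothetical helper name, and returning null for a rejected URL is a design choice made here for illustration. Note that FILTER_VALIDATE_URL only accepts ASCII characters, so URLs with unencoded spaces or non-ASCII characters are rejected.

```php
<?php
// Hypothetical debugging helper: validate first, then split, then decode the query.
function inspectUrl(string $url): ?array
{
    // Reject invalid URLs (e.g. literal spaces) before trusting parse_url's output.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }
    // Decode the query string into an array, if there is one.
    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }
    $parts['params'] = $params;
    return $parts;
}

print_r(inspectUrl("https://gitbox.net/search?keyword=hello%20world&page=2"));
var_dump(inspectUrl("https://gitbox.net/search?keyword=hello world"));  // NULL
```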

4. Summary

  • Before calling parse_url, make sure the URL string is valid and that special characters are correctly encoded.

  • Chinese and other non-ASCII characters must be encoded with urlencode.

  • Pay particular attention to encoding spaces, #, and %.

  • For query parameters, parse_str can be used to further parse into an array.

  • When encountering a parsing exception, verify the URL format first and then debug the encoding problem.

Keeping these points in mind makes it much easier to use parse_url on URLs that contain complex or special characters.

Sample code summary

 $url = "https://gitbox.net/search?query=" . urlencode("天气预报#1"); // "weather forecast #1"
$parts = parse_url($url);
print_r($parts);

if (isset($parts['query'])) {
    parse_str($parts['query'], $queryParams);
    print_r($queryParams);
}

This code shows how Chinese characters and # are encoded in a query value, and how the query string is then parsed back into parameters.