Basic usage of parse_url is simple: pass it a URL string and it returns an associative array of the URL's components:
$url = "https://gitbox.net/path/to/page?name=Zhang San&age=25";
$parts = parse_url($url);
print_r($parts);
Output:
Array
(
    [scheme] => https
    [host] => gitbox.net
    [path] => /path/to/page
    [query] => name=Zhang San&age=25
)
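As the example shows, parse_url splits the URL into its scheme (protocol), host, path, and query string. Note that the raw space in the query is returned unchanged: parse_url only splits the string and does not validate or encode it.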
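When only one part of the URL is needed, parse_url also accepts a component constant as its second argument. The following is a minimal sketch; the URL is an illustrative value, not one taken from a real application.

```php
<?php
$url = "https://gitbox.net/path/to/page?name=abc&age=25";

// Ask for a single component instead of the whole array.
$host  = parse_url($url, PHP_URL_HOST);
$query = parse_url($url, PHP_URL_QUERY);

// A component that is absent from the URL comes back as null.
$port = parse_url($url, PHP_URL_PORT);

var_dump($host, $query, $port);
```

This avoids an isset check on the array when a single value is all that is required.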
Special characters here include Chinese and other non-ASCII characters, spaces, the # sign, and the percent sign (%). Each has a specific meaning or restriction in a URL, so passing a URL that contains them unencoded to parse_url can produce unexpected results.
When a URL contains Chinese or other non-ASCII characters, encode them first. parse_url does not encode anything itself, so the raw bytes may not be handled correctly by parse_url or by the code that consumes its result.
Example:
$url = "https://gitbox.net/search?query=天气预报"; // "weather forecast", not encoded
$parts = parse_url($url);
echo $parts['query']; // Raw, unencoded bytes: may display garbled and is not a valid URL query
The correct approach is to encode the parameter value with urlencode first:
$query = urlencode("天气预报"); // "weather forecast"
$url = "https://gitbox.net/search?query=$query";
$parts = parse_url($url);
echo $parts['query']; // query=%E5%A4%A9%E6%B0%94%E9%A2%84%E6%8A%A5
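When there are several parameters, encoding each value by hand is easy to get wrong. As a sketch of one alternative, http_build_query encodes every key and value in one call. The parameter names and values below are hypothetical, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical parameters, for illustration only.
$params = ['query' => '天气预报', 'page' => 2];

// http_build_query() percent-encodes each key and value.
// The separator is passed explicitly so the result does not depend on php.ini.
$url = "https://gitbox.net/search?" . http_build_query($params, '', '&');

$parts = parse_url($url);
echo $parts['query'], PHP_EOL;
```

The query component now contains only ASCII characters, so parse_url returns it in a form that is safe to pass on.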
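Spaces are not allowed in a URL and must be encoded as %20 (or as + inside a query string). parse_url does not reject or encode a raw space; it passes it through unchanged, but the resulting URL is invalid, and HTTP clients, servers, or validators may truncate or reject it.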
Example:
$url = "https://gitbox.net/search?keyword=hello world";
$parts = parse_url($url);
print_r($parts);
Here parse_url returns keyword=hello world as the query with the space intact, but the URL is not valid, and other components that handle it may cut it off at keyword=hello. Encode the space instead:
$url = "https://gitbox.net/search?keyword=hello%20world";
$parts = parse_url($url);
print_r($parts);
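The two space encodings come from two different PHP functions. The sketch below compares urlencode, which produces +, with rawurlencode, which produces %20, and checks that parse_str decodes both back to a space. The parameter names a and b and the file name are illustrative assumptions.

```php
<?php
$value = "hello world";

$plus    = urlencode($value);     // form-style encoding: space becomes "+"
$percent = rawurlencode($value);  // RFC 3986 style: space becomes "%20"

// Inside a query string, both forms decode back to a space.
$url = "https://gitbox.net/search?a=" . $plus . "&b=" . $percent;
parse_str(parse_url($url, PHP_URL_QUERY), $decoded);

// In a path segment, "+" is a literal plus sign, so use rawurlencode there.
$path = "/files/" . rawurlencode("my report.pdf");

print_r([$plus, $percent, $decoded, $path]);
```

As a rule of thumb, rawurlencode is the safer choice for path segments, while either function works for query values.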
# marks the start of the fragment identifier in a URL, and parse_url returns it as a separate fragment component. Everything after an unencoded # is treated as the fragment, so a # that was meant to be part of a parameter value will cut the query short.
Example:
$url = "https://gitbox.net/page?name=abc#section2";
$parts = parse_url($url);
print_r($parts);
Output:
Array
(
    [scheme] => https
    [host] => gitbox.net
    [path] => /page
    [query] => name=abc
    [fragment] => section2
)
If # is part of a parameter value, it must be encoded as %23.
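To make the difference concrete, the sketch below parses the same parameter value with and without encoding the #. The parameter name tag and its value are hypothetical.

```php
<?php
// Unencoded "#": everything after it becomes the fragment.
$rawParts = parse_url("https://gitbox.net/page?tag=c#sharp");

// Encoded "#" (%23): the value stays inside the query.
$encParts = parse_url("https://gitbox.net/page?tag=" . urlencode("c#sharp"));

// parse_str() decodes %23 back to "#".
parse_str($encParts['query'], $params);

print_r($rawParts);
print_r($encParts);
print_r($params);
```

In the first case the query is cut off at tag=c; in the second the full value survives and is recovered after decoding.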
parse_url does not decode percent-encoded characters, and it does not validate them either. An incomplete sequence, such as %2 with its second hex digit missing, is passed through unchanged; the problem only surfaces later, when the value is decoded (urldecode and rawurldecode leave an invalid sequence as-is) or when the URL is sent to a server.
The solution is to make sure every percent sequence is complete and valid (% followed by two hexadecimal digits), or to detect and correct malformed sequences before parsing.
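One way to detect incomplete sequences is a regular expression that looks for a % not followed by two hexadecimal digits. The helper name hasMalformedPercent is hypothetical and this is only a sketch of the idea, not a full URL validator.

```php
<?php
// Hypothetical helper: true if any "%" is not followed by two hex digits.
function hasMalformedPercent(string $s): bool
{
    return preg_match('/%(?![0-9A-Fa-f]{2})/', $s) === 1;
}

// parse_url() passes the broken sequence through without complaint.
$query = parse_url("https://gitbox.net/page?code=%2", PHP_URL_QUERY);

var_dump($query);
var_dump(hasMalformedPercent($query));
var_dump(hasMalformedPercent("code=%20"));
```

Running the check on the query and path components before decoding them makes the failure visible at the point where it can still be reported or corrected.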
Question 1: parse_url returns false or incomplete results
The likely cause is a malformed URL or one that contains illegal characters. Validate the URL with filter_var($url, FILTER_VALIDATE_URL) first, and check the return value of parse_url for false before using it.
Question 2: Inconsistent encoding garbles the parsed query parameters
Make sure all special characters are encoded correctly, especially in the query string and path.
Question 3: The query part in the parsing result is not split into key-value pairs
parse_url is only responsible for splitting the URL structure and will not parse query into an array. It can be combined with the parse_str function:
parse_str($parts['query'], $queryParams);
print_r($queryParams);
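The three answers above can be combined into one defensive wrapper. The function name parseUrlChecked and the extra params key are assumptions made for this sketch. Note that FILTER_VALIDATE_URL rejects URLs containing raw spaces or non-ASCII characters, so the input must already be encoded.

```php
<?php
// Hypothetical helper: validate, split, then decode the query parameters.
function parseUrlChecked(string $url): ?array
{
    // Reject URLs that are not syntactically valid (raw spaces, non-ASCII, ...).
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }

    // parse_str() splits the query into key-value pairs and URL-decodes them.
    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }
    $parts['params'] = $params;

    return $parts;
}

$ok  = parseUrlChecked("https://gitbox.net/search?keyword=hello%20world&age=25");
$bad = parseUrlChecked("https://gitbox.net/search?keyword=hello world");

print_r($ok);
var_dump($bad);
```

Returning null for both failure modes keeps the calling code to a single check; a stricter design could throw an exception that names the reason.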
Before calling parse_url, make sure the URL string is well-formed and its special characters are correctly encoded.
Chinese and other non-ASCII characters must be percent-encoded, for example with urlencode.
Pay particular attention to how spaces, #, and % are encoded.
Use parse_str to turn the query string into an array of parameters.
When parsing goes wrong, verify the URL format first, then debug the encoding.
With these precautions in mind, parse_url can be used reliably on URLs that contain complex or special characters.
$url = "https://gitbox.net/search?query=" . urlencode("天气预报#1"); // "weather forecast#1"
$parts = parse_url($url);
print_r($parts);
if (isset($parts['query'])) {
    parse_str($parts['query'], $queryParams);
    print_r($queryParams);
}
This code demonstrates handling Chinese characters and a # inside a parameter value, followed by parsing the query string into parameters.