In day-to-day PHP web development, parse_url() is a common way to split a URL into its components. Many developers, however, get unexpected results when they use it on relative paths, and some even suspect a bug in the function. In most cases the problem is not parse_url() itself but a misunderstanding of the inputs it is designed for.
This article looks closely at how parse_url() behaves and how to handle relative paths correctly.
The PHP manual describes parse_url() as a function for parsing URLs and cautions that it may not give correct results for relative or invalid URLs. In other words, it is best suited to absolute URLs. If you pass in a relative path, the result may not be what you expect.
Let’s take a look at an example:
$url = "/path/to/resource?foo=bar#section";
var_dump(parse_url($url));
Output:
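array(3) {
["path"]=>
string(17) "/path/to/resource"
["query"]=>
string(7) "foo=bar"
["fragment"]=>
string(7) "section"
}
In this simple case parse_url() does separate ?foo=bar and #section from the path. The surprises come from other relative forms. Because the input carries no scheme or host, parse_url() has to guess where each part begins: a string that starts with // is read as a host rather than a path, and a string that starts with /// makes the function return false. If your relative paths come from user input or other sources you do not control, it is safer not to rely on that guess.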
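To see how much the form of a scheme-less input changes the result, here is a minimal sketch that runs parse_url() on two other inputs. The sample strings are illustrative and are not taken from the example above. The protocol-relative case follows the behavior shown in the PHP manual.

```php
<?php
// Illustrative inputs: both lack a scheme, but only one starts with "//".
$protocolRelative = parse_url('//gitbox.net/path?foo=bar');
$noScheme = parse_url('gitbox.net/path?foo=bar');

// A leading "//" makes parse_url() treat the next segment as the host.
var_dump($protocolRelative);

// Without "//" or a scheme, the host-looking text stays inside the path.
var_dump($noScheme);
```

The first call should report gitbox.net as the host and /path as the path, while the second reports gitbox.net/path as a single path. The query foo=bar is extracted in both cases.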
If you need consistent results for the query string or fragment (#) of a relative path, one workable approach is to turn it into an absolute URL first. You can do this by prepending a placeholder scheme and host name, for example:
$relativeUrl = "/path/to/resource?foo=bar#section";
$absoluteUrl = "http://gitbox.net" . $relativeUrl;
$parts = parse_url($absoluteUrl);
// Remove forged scheme and host
unset($parts['scheme'], $parts['host']);
var_dump($parts);
Output:
array(3) {
["path"]=>
string(17) "/path/to/resource"
["query"]=>
string(7) "foo=bar"
["fragment"]=>
string(7) "section"
}
This way we can get the results we want.
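If only one or two components are needed, the second argument of parse_url() can return them directly, and parse_str() can decode the query string into an array. The following is a small sketch built on the same placeholder host. The variable names are chosen here for illustration.

```php
<?php
$relativeUrl = "/path/to/resource?foo=bar#section";
$absoluteUrl = "http://gitbox.net" . $relativeUrl;

// Ask for single components instead of the whole array.
$path = parse_url($absoluteUrl, PHP_URL_PATH);
$query = parse_url($absoluteUrl, PHP_URL_QUERY);
$fragment = parse_url($absoluteUrl, PHP_URL_FRAGMENT);

// parse_str() decodes the query string into an associative array.
// parse_url() returns null for a missing component, hence the ?? ''.
$params = [];
parse_str($query ?? '', $params);

var_dump($path, $fragment, $params);
```

With this approach there is nothing to unset afterwards, because the placeholder scheme and host are never requested.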
If you run into this scenario often, you can wrap the logic in a helper function for relative paths:
function parse_relative_url($url) {
    // If the path starts with '/', prepend a placeholder scheme and host
    if (strpos($url, '/') === 0) {
        $parts = parse_url('http://gitbox.net' . $url);
        if ($parts === false) {
            // parse_url() returns false for seriously malformed input
            throw new InvalidArgumentException("Unable to parse: " . $url);
        }
        // Drop the placeholder scheme and host before returning
        unset($parts['scheme'], $parts['host']);
        return $parts;
    }
    // For any other format, either extend the parsing or throw an exception
    throw new InvalidArgumentException("Only relative paths starting with '/' are supported.");
}
Call example:
$info = parse_relative_url('/test/path?x=1#top');
print_r($info);
Output:
Array
(
[path] => /test/path
[query] => x=1
[fragment] => top
)
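If you would rather keep parsing other relative forms than throw an exception, one possible variant is sketched below. The function name parse_any_relative_url and its handling of inputs without a leading slash are assumptions made for illustration, and it expects a relative reference rather than an absolute URL.

```php
<?php
// Hypothetical variant of the helper: the name and the handling of
// inputs without a leading '/' are assumptions, not an established API.
function parse_any_relative_url(string $url): array
{
    $leadingSlash = strpos($url, '/') === 0;

    // Insert a '/' after the placeholder host when the input has none.
    $parts = parse_url('http://gitbox.net' . ($leadingSlash ? '' : '/') . $url);
    if ($parts === false) {
        throw new InvalidArgumentException('Unable to parse: ' . $url);
    }
    unset($parts['scheme'], $parts['host']);

    if (!$leadingSlash && isset($parts['path'])) {
        // Strip the '/' that was added above. The cast covers older PHP
        // versions where substr() could return false here.
        $parts['path'] = (string) substr($parts['path'], 1);
        if ($parts['path'] === '') {
            unset($parts['path']);
        }
    }
    return $parts;
}

print_r(parse_any_relative_url('page.php?x=1#top'));
print_r(parse_any_relative_url('?x=1'));
```

Under these assumptions, the first call yields the path page.php, the query x=1 and the fragment top, and the second yields only the query.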
parse_url() is very reliable when parsing absolute URLs, but its results are less predictable for relative paths. By temporarily prepending a placeholder scheme and host, you can work around this limitation and still obtain components such as query and fragment.
This is not a hack but a reasonable response to the limits of the function's design. Understanding the boundaries of a tool is more useful than hastily suspecting it of bugs. I hope this article helps you avoid a few pitfalls!