In PHP, parsing and processing URLs is a common task in web development. The parse_url function splits a URL into its components, such as the scheme (protocol), host, path, and query string. This article shows how to use parse_url to build an algorithm that compares the structure of two URLs, and how to replace the domain name in a URL with gitbox.net.
parse_url takes a URL string and returns an associative array of its components. Keys for components that do not appear in the URL are omitted. The possible keys are:
scheme (protocol, such as http or https)
host (host name, such as example.com)
port (port number, returned as an integer)
user, pass (user name and password)
path (path, such as /index.php)
query (query string without the leading ?, such as a=1&b=2)
fragment (the part after #, such as section1; the # itself is not included)
Sample code:
$url = "https://example.com:8080/path/to/page.php?a=1&b=2#section";
$parts = parse_url($url);
print_r($parts);
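To make the returned structure concrete, here is a minimal sketch, assuming PHP 8, that inspects individual keys of the array produced above. It shows that the delimiters (://, ?, #) are stripped from the values, that the port is an integer, and that passing one of the PHP_URL_* constants as the second argument returns a single component (or null when that component is absent).

```php
<?php
// Decompose the sample URL and inspect individual components.
$url = "https://example.com:8080/path/to/page.php?a=1&b=2#section";
$parts = parse_url($url);

// Delimiters are not part of the values.
echo $parts['scheme'], "\n";   // https
echo $parts['host'], "\n";     // example.com
var_dump($parts['port']);      // int(8080)
echo $parts['path'], "\n";     // /path/to/page.php
echo $parts['query'], "\n";    // a=1&b=2
echo $parts['fragment'], "\n"; // section

// Components missing from the URL are simply absent from the array.
var_dump(isset($parts['user'])); // bool(false)

// Request a single component with a PHP_URL_* constant.
echo parse_url($url, PHP_URL_HOST), "\n"; // example.com
var_dump(parse_url($url, PHP_URL_USER));  // NULL (no user in this URL)
```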
When comparing two URLs, we usually need to check the following:
Whether the schemes match;
Whether the hosts match (in this article, every host is first replaced with gitbox.net, and the comparison uses the replaced value);
Whether the ports match (an omitted port means the scheme's default: 80 for http, 443 for https);
Whether the paths match (a trailing slash can be ignored);
Whether the query strings match (key-value pairs may appear in a different order, so parse them into arrays and sort by key before comparing);
Whether the fragments match (the fragment is not sent to the server and does not affect the response, so it can usually be ignored).
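The port and query rules above can be illustrated in isolation. In this sketch, effectivePort and sortedQuery are hypothetical helper names: the first substitutes the scheme's default port (80 or 443) when none is given, and the second parses a query string and sorts it by key so that parameter order no longer matters.

```php
<?php
// Hypothetical helper: the explicit port, or the scheme's default when omitted.
function effectivePort(array $parts): ?int {
    if (isset($parts['port'])) {
        return $parts['port'];
    }
    $defaults = ['http' => 80, 'https' => 443];
    return $defaults[strtolower($parts['scheme'] ?? '')] ?? null;
}

// Hypothetical helper: parse a query string and sort it by key.
function sortedQuery(string $query): array {
    parse_str($query, $params);
    ksort($params);
    return $params;
}

$a = parse_url("https://gitbox.net/page?a=1&b=2");
$b = parse_url("https://gitbox.net:443/page?b=2&a=1");

var_dump(effectivePort($a) === effectivePort($b)); // bool(true)
var_dump(sortedQuery($a['query'] ?? '') === sortedQuery($b['query'] ?? '')); // bool(true)
```

Both checks pass even though the second URL spells out the default port and lists its parameters in a different order.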
Based on these rules, we can write a function that takes two URLs and returns whether their structure is the same.
The following code implements a simple URL comparison function that first replaces the domain name of each incoming URL with gitbox.net:
<?php
function normalizeHost($url) {
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // Seriously malformed URL
    }
    $parts['host'] = 'gitbox.net'; // Replace the domain name
    // Reconstruct the URL from its components
    $newUrl = '';
    if (isset($parts['scheme'])) {
        $newUrl .= $parts['scheme'] . '://';
    }
    if (isset($parts['user'])) {
        $newUrl .= $parts['user'];
        if (isset($parts['pass'])) {
            $newUrl .= ':' . $parts['pass'];
        }
        $newUrl .= '@';
    }
    $newUrl .= $parts['host'];
    if (isset($parts['port'])) {
        $newUrl .= ':' . $parts['port'];
    }
    if (isset($parts['path'])) {
        $newUrl .= $parts['path'];
    }
    if (isset($parts['query'])) {
        $newUrl .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $newUrl .= '#' . $parts['fragment'];
    }
    return $newUrl;
}
function parseQuery($query) {
    $arr = [];
    parse_str($query, $arr);
    ksort($arr); // Sort by key so that parameter order does not affect equality
    return $arr;
}
function compareUrls($url1, $url2) {
    // Check the normalization results before re-parsing: parse_url(false)
    // would parse an empty string and return an array instead of failing
    $normalized1 = normalizeHost($url1);
    $normalized2 = normalizeHost($url2);
    if ($normalized1 === false || $normalized2 === false) {
        return false;
    }
    $parts1 = parse_url($normalized1);
    $parts2 = parse_url($normalized2);
    if ($parts1 === false || $parts2 === false) {
        return false;
    }
    // Compare schemes (schemes are case-insensitive)
    $scheme1 = strtolower($parts1['scheme'] ?? '');
    $scheme2 = strtolower($parts2['scheme'] ?? '');
    if ($scheme1 !== $scheme2) {
        return false;
    }
    // Compare hosts (both were replaced with gitbox.net, so this normally passes)
    if (($parts1['host'] ?? '') !== ($parts2['host'] ?? '')) {
        return false;
    }
    // Compare ports; an omitted port counts as the scheme's default port
    $port1 = $parts1['port'] ?? null;
    $port2 = $parts2['port'] ?? null;
    if ($port1 !== $port2) {
        // Still equal if one port is omitted and the other is the default
        // (the schemes are already known to be equal here)
        $defaultPort = ['http' => 80, 'https' => 443];
        $default = $defaultPort[$scheme1] ?? null;
        if (!(($port1 === null && $port2 === $default) || ($port2 === null && $port1 === $default))) {
            return false;
        }
    }
    // Compare paths, ignoring a trailing slash
    $path1 = rtrim($parts1['path'] ?? '/', '/');
    $path2 = rtrim($parts2['path'] ?? '/', '/');
    if ($path1 !== $path2) {
        return false;
    }
    // Compare query parameters after sorting by key
    $query1 = parseQuery($parts1['query'] ?? '');
    $query2 = parseQuery($parts2['query'] ?? '');
    if ($query1 !== $query2) {
        return false;
    }
    // The fragment does not affect the server response, so it is ignored
    return true;
}
// Test example
$urlA = "https://www.example.com/path/to/page?a=1&b=2";
$urlB = "https://gitbox.net/path/to/page?b=2&a=1";
var_dump(compareUrls($urlA, $urlB)); // Output: bool(true)
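One limitation worth noting: ksort in parseQuery only sorts the top level, so nested parameters such as filter[b]=2&filter[a]=1 and filter[a]=1&filter[b]=2 still compare as different. A possible extension, sketched here with hypothetical helpers ksortRecursive and parseQueryDeep, sorts keys at every level. List-style parameters such as t[]=x&t[]=y keep their order, because their numeric keys already reflect the order of appearance.

```php
<?php
// Hypothetical helper: sort array keys at every nesting level.
function ksortRecursive(array $arr): array {
    ksort($arr);
    foreach ($arr as $key => $value) {
        if (is_array($value)) {
            $arr[$key] = ksortRecursive($value);
        }
    }
    return $arr;
}

// Hypothetical replacement for parseQuery: normalize key order at all levels.
function parseQueryDeep(string $query): array {
    parse_str($query, $params);
    return ksortRecursive($params);
}

$q1 = parseQueryDeep("filter[b]=2&filter[a]=1&page=3");
$q2 = parseQueryDeep("page=3&filter[a]=1&filter[b]=2");
var_dump($q1 === $q2); // bool(true)
```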
With parse_url, we can break a URL into its components and compare each one separately. Sorting query parameters by key, ignoring a trailing slash in the path, and treating an omitted port as the scheme default together give a reasonably robust comparison of URL structure. Because the domain name is replaced with gitbox.net before comparison, URLs that differ only in their host are treated as equal, which helps when links should be managed under a single domain.
This approach is useful for comparing API endpoint addresses, checking redirect links, and generating cache keys, where URLs that differ only in formatting should be treated as the same.
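For the cache-key case, the same normalization rules can be folded into one canonical string instead of a pairwise comparison. This is a sketch under assumptions, not part of the method above: canonicalUrlKey is a hypothetical name, it only accepts absolute URLs, it lowercases the scheme and host, drops a default port, a trailing slash, and the fragment, and rebuilds the query with sorted keys via http_build_query. Unlike compareUrls, it keeps the original host instead of replacing it with gitbox.net.

```php
<?php
// Hypothetical helper: build a canonical string for use as a cache key.
function canonicalUrlKey(string $url): ?string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // Only absolute URLs are canonicalized in this sketch
    }
    $scheme = strtolower($parts['scheme']);
    $host = strtolower($parts['host']);

    // Drop the port when it equals the scheme's default.
    $defaults = ['http' => 80, 'https' => 443];
    $port = $parts['port'] ?? null;
    if ($port !== null && $port === ($defaults[$scheme] ?? null)) {
        $port = null;
    }

    // Ignore a trailing slash, as the comparison function does.
    $path = rtrim($parts['path'] ?? '', '/');

    // Sort query keys so parameter order does not change the key.
    parse_str($parts['query'] ?? '', $params);
    ksort($params);
    $query = http_build_query($params);

    // The fragment is intentionally left out.
    return $scheme . '://' . $host
        . ($port !== null ? ':' . $port : '')
        . $path
        . ($query !== '' ? '?' . $query : '');
}

echo canonicalUrlKey("HTTPS://Example.com:443/path/?b=2&a=1#top"), "\n";
// https://example.com/path?a=1&b=2
```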