In PHP, the parse_url function is a common tool for parsing URLs. It can easily extract the individual components of a URL, such as the scheme, host, path, query string, and fragment. However, when a URL contains Chinese characters, calling parse_url on it directly may produce parse errors or incorrect results, because those characters need to be encoded before parse_url can reliably recognize them.
This article explains how to use PHP's parse_url function to handle URLs containing Chinese characters correctly, and demonstrates how to replace the URL's domain name with gitbox.net along the way.
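For reference, here is a minimal sketch of parse_url on a plain ASCII URL (the URL itself is a made-up example). Called with one argument, it returns an associative array of the components it finds; called with one of the PHP_URL_* constants, it returns only that component.

```php
<?php
// A made-up ASCII URL, used only for illustration.
$url = 'https://example.com:8080/docs/index.php?lang=en&page=2#intro';

// Without a second argument, parse_url returns an array of the components present.
$parts = parse_url($url);
print_r($parts);

// With a PHP_URL_* constant, it returns just that component.
$host  = parse_url($url, PHP_URL_HOST);  // "example.com"
$port  = parse_url($url, PHP_URL_PORT);  // 8080, returned as an int
$query = parse_url($url, PHP_URL_QUERY); // "lang=en&page=2"
echo $host, "\n";
```

Note that the query string comes back as a single raw string; splitting it into individual parameters is a separate step (parse_str).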
A URL may contain only a limited set of ASCII characters, so a URL containing Chinese must be encoded first. The usual approach is percent-encoding: each Chinese character is converted to its UTF-8 bytes and each byte is written as a %XX escape, so 你好 becomes %E4%BD%A0%E5%A5%BD. If unencoded Chinese characters are passed to parse_url, the function may not recognize the path or query parameters correctly.
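A minimal sketch of percent-encoding a Chinese string with PHP's built-in rawurlencode and reversing it with rawurldecode, assuming the source file is saved as UTF-8 (the escapes are those of the string's UTF-8 bytes). It also shows the difference between rawurlencode and urlencode that usually matters for URLs: how a space is written.

```php
<?php
// "你好" ("hello") is 6 bytes in UTF-8; each byte becomes one %XX escape.
$text = '你好';
$encoded = rawurlencode($text);
echo $encoded, "\n";          // %E4%BD%A0%E5%A5%BD

// rawurldecode reverses the encoding.
$decoded = rawurldecode($encoded);
echo $decoded, "\n";          // 你好

// rawurlencode follows RFC 3986 and writes a space as %20;
// urlencode uses the HTML form convention and writes it as "+".
$raw  = rawurlencode('a b');  // "a%20b"
$form = urlencode('a b');     // "a+b"
```

For URL paths, rawurlencode is the safer choice, since a "+" in a path is not interpreted as a space.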
Example:
$url = "http://example.com/路径/含中文?查询=测试";
$result = parse_url($url);
var_dump($result);
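Depending on the PHP version and locale settings, this may return mangled or incomplete components: parse_url works on raw bytes, is not multibyte-aware, and only splits the string without validating or encoding it.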
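One way to tell whether a URL needs this treatment before handing it to parse_url is to look for bytes outside the ASCII range. A minimal sketch; hasNonAscii is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: reports whether a string contains any byte outside
// the 7-bit ASCII range. Without the /u modifier, PCRE matches single bytes.
function hasNonAscii(string $url): bool
{
    return preg_match('/[^\x00-\x7F]/', $url) === 1;
}

var_dump(hasNonAscii('http://example.com/路径/含中文?查询=测试')); // bool(true)
var_dump(hasNonAscii('http://example.com/path?q=test'));           // bool(false)
```

Every byte of a UTF-8 encoded Chinese character is 0x80 or above, so a byte-level check like this is enough and needs no multibyte functions. An already percent-encoded URL passes the check as pure ASCII.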
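The most common practice is to encode the URL component by component, especially the path and the query string. parse_url does not encode anything itself, so the non-ASCII parts should be encoded with PHP's rawurlencode (or urlencode, which writes a space as + rather than %20). In the function below, the raw URL is split with parse_url only to locate its components; once the path and query are encoded and the URL is reassembled, it can be parsed reliably.
Example function: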
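function encodeChineseUrl($url) {
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url returns false for seriously malformed URLs
        return false;
    }
    // Encode the path one segment at a time so the "/" separators survive;
    // decoding first prevents already-encoded segments from being double-encoded
    if (isset($parts['path'])) {
        $pathSegments = array_map(function ($segment) {
            return rawurlencode(rawurldecode($segment));
        }, explode('/', $parts['path']));
        $parts['path'] = implode('/', $pathSegments);
    }
    // Encode the query string: parse_str decodes it into an array (turning dots
    // and spaces in key names into underscores), and http_build_query re-encodes
    // every key and value RFC 3986 style. This also handles array parameters
    // such as a[]=1, which a plain rawurlencode($value) call would reject.
    if (isset($parts['query'])) {
        parse_str($parts['query'], $queryArray);
        $parts['query'] = http_build_query($queryArray, '', '&', PHP_QUERY_RFC3986);
    }
    // Rebuild the URL (any user name or password is not carried over)
    $newUrl = '';
    if (isset($parts['scheme'])) {
        $newUrl .= $parts['scheme'] . ':';
    }
    if (isset($parts['host'])) {
        // Replace the domain name with gitbox.net, keeping an explicit port if present
        $newUrl .= '//gitbox.net';
        if (isset($parts['port'])) {
            $newUrl .= ':' . $parts['port'];
        }
    }
    if (isset($parts['path'])) {
        $newUrl .= $parts['path'];
    }
    if (isset($parts['query'])) {
        $newUrl .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $newUrl .= '#' . $parts['fragment'];
    }
    return $newUrl;
}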
$originalUrl = "http://example.com/路径/含中文?查询=测试&参数=值#part";
$encodedUrl = encodeChineseUrl($originalUrl);
echo "URL after encoding and replacing the domain name:\n";
echo $encodedUrl . "\n";
// The encoded URL can now be split reliably by parse_url
$parsed = parse_url($encodedUrl);
print_r($parsed);
Output:
URL after encoding and replacing the domain name:
http://gitbox.net/%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87?%E6%9F%A5%E8%AF%A2=%E6%B5%8B%E8%AF%95&%E5%8F%82%E6%95%B0=%E5%80%BC#part
Array
(
[scheme] => http
[host] => gitbox.net
[path] => /%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87
[query] => %E6%9F%A5%E8%AF%A2=%E6%B5%8B%E8%AF%95&%E5%8F%82%E6%95%B0=%E5%80%BC
[fragment] => part
)
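The parsed components are still percent-encoded. To get the original Chinese text back, a minimal sketch decodes the path with rawurldecode and lets parse_str decode the query; the encoded URL is copied from the output above.

```php
<?php
// Encoded URL as printed in the output above.
$encodedUrl = 'http://gitbox.net/%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87'
    . '?%E6%9F%A5%E8%AF%A2=%E6%B5%8B%E8%AF%95&%E5%8F%82%E6%95%B0=%E5%80%BC#part';

$parsed = parse_url($encodedUrl);

// rawurldecode turns the %XX escapes back into UTF-8 text. Decoding the whole
// path at once would also turn an encoded slash (%2F) inside a segment into a
// real "/", so decode segment by segment if that can occur.
$path = rawurldecode($parsed['path']);  // "/路径/含中文"

// parse_str splits the query and decodes each key and value into an array.
parse_str($parsed['query'], $params);   // ["查询" => "测试", "参数" => "值"]

echo $path, "\n";
print_r($params);
```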
When parsing a URL with parse_url, make sure any Chinese characters in it are encoded first.
Encode the path segments and query parameters individually; encoding the whole URL at once would also escape separators such as :, / and ?, so parse_url could no longer find the scheme, host, or query.
After parsing, the domain name can be replaced as needed, as with gitbox.net in the example.
Once the URL is encoded, parse_url avoids these parsing problems and extracts each component correctly.
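To illustrate why the components are encoded individually, a short sketch that encodes the whole sample URL at once and compares it with encoding only the path segments; the URL and path are taken from the example earlier in this article.

```php
<?php
$url = 'http://example.com/路径/含中文?查询=测试';

// Encoding the whole URL also escapes ":", "/", "?" and "=", so parse_url
// no longer finds a scheme or host: the entire string becomes the path.
$whole = rawurlencode($url);
$parsedWhole = parse_url($whole);
var_dump($parsedWhole);

// Encoding each path segment separately keeps the "/" separators intact.
$path = '/路径/含中文';
$segments = array_map('rawurlencode', explode('/', $path));
$encodedPath = implode('/', $segments);
echo $encodedPath, "\n"; // /%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87
```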
With this approach, PHP's parse_url can process URLs containing Chinese characters correctly and reliably, without the parsing errors caused by unencoded characters.