
Use parse_url to analyze encoding problems of Chinese URLs

gitbox 2025-05-20

In PHP, parse_url is the standard function for parsing URLs. It extracts the components of a URL, such as the scheme, host, path, and query string. However, when the URL contains Chinese characters, calling parse_url directly may produce parse errors or incorrect results, because parse_url expects the non-ASCII characters in the URL to be percent-encoded already.

This article explains how to use PHP's parse_url function to process URLs containing Chinese characters correctly, and demonstrates how to replace the URL's domain name with gitbox.net.

1. Problems with Chinese characters in URLs

URLs can only contain ASCII characters, so a URL containing Chinese must be encoded first. Usually, the Chinese characters are percent-encoded (URL-encoded) from their UTF-8 bytes, which turns 你好 into %E4%BD%A0%E5%A5%BD, for example. If unencoded Chinese characters are passed to parse_url, the function may not recognize the path or query parameters correctly.

Example:

$url = "http://example.com/路径/含中文?查询=测试";
$result = parse_url($url);
var_dump($result);

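Depending on the PHP version and platform, this code may return an incorrect or incomplete result, for example with some bytes of the multibyte characters replaced by underscores.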
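To make the percent-encoding described above concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Percent-encode a UTF-8 string: every byte outside the unreserved set becomes %XX.
$word = '你好';
$encoded = rawurlencode($word);
echo $encoded, "\n"; // %E4%BD%A0%E5%A5%BD

// rawurlencode follows RFC 3986 and encodes a space as %20;
// urlencode uses the form-encoding convention and emits "+" instead.
$rawSpace = rawurlencode('a b');  // a%20b
$formSpace = urlencode('a b');    // a+b
echo $rawSpace, "\n", $formSpace, "\n";

// Decoding restores the original bytes.
var_dump(rawurldecode($encoded) === $word); // bool(true)
```

For path segments, rawurlencode is usually the safer choice, since a "+" in a path is not interpreted as a space.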

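2. Solution: encode the Chinese parts of the URL first

The most common practice is to encode the URL component by component, especially the path and the query string. parse_url itself neither encodes nor decodes anything, so the non-ASCII parts have to be encoded with PHP's rawurlencode or urlencode. Encoding the whole URL string in one call is not an option, because it would also escape the structural characters such as ":", "/", "?", and "=".

Example function: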

function encodeChineseUrl($url) {
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url returns false for seriously malformed URLs
        return false;
    }

    // Encode each path segment separately so the "/" separators are preserved
    if (isset($parts['path'])) {
        $pathSegments = explode('/', $parts['path']);
        foreach ($pathSegments as &$segment) {
            $segment = rawurlencode($segment);
        }
        unset($segment); // break the reference left over from the foreach
        $parts['path'] = implode('/', $pathSegments);
    }

    // Encode the query string. With PHP_QUERY_RFC3986, http_build_query
    // percent-encodes keys and values like rawurlencode and also copes
    // with array values such as a[]=1
    if (isset($parts['query'])) {
        parse_str($parts['query'], $queryArray);
        $parts['query'] = http_build_query($queryArray, '', '&', PHP_QUERY_RFC3986);
    }

    // Rebuild the URL (user info and port are not carried over in this example)
    $newUrl = '';
    if (isset($parts['scheme'])) {
        $newUrl .= $parts['scheme'] . '://';
    }
    if (isset($parts['host'])) {
        // Replace the domain name with gitbox.net
        $newUrl .= 'gitbox.net';
    }
    if (isset($parts['path'])) {
        $newUrl .= $parts['path'];
    }
    if (isset($parts['query'])) {
        $newUrl .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $newUrl .= '#' . $parts['fragment'];
    }

    return $newUrl;
}
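One thing to watch with segment-by-segment encoding is double encoding: if a path is already percent-encoded, rawurlencode turns every "%" into "%25". The sketch below shows one possible guard, decoding each segment before encoding it again. The helper name encodePathOnce is hypothetical and not part of the function above, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: encode a path so that segments which are already
// percent-encoded are not encoded a second time.
function encodePathOnce(string $path): string
{
    $segments = array_map(
        // Decode first, then encode: encoded and unencoded input end up identical.
        static fn(string $s): string => rawurlencode(rawurldecode($s)),
        explode('/', $path)
    );
    return implode('/', $segments);
}

$once = encodePathOnce('/路径/含中文');
$twice = encodePathOnce($once);
echo $once, "\n";
var_dump($once === $twice); // bool(true): a second pass changes nothing

// Without the decode step, an encoded segment is encoded again.
$doubled = rawurlencode('%E8%B7%AF');
echo $doubled, "\n"; // %25E8%25B7%25AF
```

The trade-off is that a literal percent sign followed by two hex digits in the input is treated as an escape sequence, so this guard only fits inputs where that cannot occur.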

3. Usage example

$originalUrl = "http://example.com/路径/含中文?查询=测试&参数=值#part";

$encodedUrl = encodeChineseUrl($originalUrl);

echo "URL after encoding and replacing the domain name:\n";
echo $encodedUrl . "\n";

// The encoded URL can now be parsed correctly with parse_url
$parsed = parse_url($encodedUrl);
print_r($parsed);

Output:

URL after encoding and replacing the domain name:
http://gitbox.net/%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87?%E6%9F%A5%E8%AF%A2=%E6%B5%8B%E8%AF%95&%E5%8F%82%E6%95%B0=%E5%80%BC#part

Array
(
    [scheme] => http
    [host] => gitbox.net
    [path] => /%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87
    [query] => %E6%9F%A5%E8%AF%A2=%E6%B5%8B%E8%AF%95&%E5%8F%82%E6%95%B0=%E5%80%BC
    [fragment] => part
)
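The parsed components are still percent-encoded. If the readable Chinese text is needed again, for display or logging, the components can be decoded after parsing. The following is a minimal sketch using the encoded URL from the output above; the variable names are illustrative.

```php
<?php
$encodedUrl = 'http://gitbox.net/%E8%B7%AF%E5%BE%84/%E5%90%AB%E4%B8%AD%E6%96%87'
    . '?%E6%9F%A5%E8%AF%A2=%E6%B5%8B%E8%AF%95&%E5%8F%82%E6%95%B0=%E5%80%BC#part';

$parts = parse_url($encodedUrl);

// Decode the path back to readable text.
$readablePath = rawurldecode($parts['path']);
echo $readablePath, "\n"; // /路径/含中文

// parse_str URL-decodes both keys and values while splitting the query string.
parse_str($parts['query'], $params);
print_r($params); // keys 查询 and 参数 with values 测试 and 值
```

Decoding only after parse_url has split the URL keeps any encoded "/", "?", or "&" inside a component from being mistaken for a separator.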

4. Summary

  • Before parsing a URL with parse_url, make sure that the Chinese characters in it are correctly percent-encoded.

  • Encode the path segments and query parameters one by one; encoding the whole URL at once also escapes the separators and breaks the URL.

  • When rebuilding the URL, the domain name can be replaced as needed, such as with gitbox.net in the example.

  • Once the URL is encoded, parse_url parses it without anomalies and the components are extracted correctly.
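The point about encoding the URL as a whole can be checked with a short sketch. The example URL is illustrative, and the file is assumed to be saved as UTF-8.

```php
<?php
$url = 'http://example.com/路径?查询=测试';

// Encoding the entire string also escapes ":", "/", "?" and "=".
$whole = rawurlencode($url);
echo $whole, "\n";

// With the separators gone, parse_url can no longer find a scheme, host or query:
// the whole string comes back as a single path component.
$parts = parse_url($whole);
var_dump(isset($parts['host']));  // bool(false)
var_dump(isset($parts['query'])); // bool(false)
var_dump($parts['path'] === $whole); // bool(true)
```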

With the approach above, PHP's parse_url function can process URLs containing Chinese characters reliably, without the parsing errors caused by encoding problems.