In day-to-day development, you often need to handle URLs whose query parameters contain Chinese text. Non-ASCII characters such as Chinese are not valid in a URL as-is and must be percent-encoded; otherwise the URL may be parsed incorrectly or the request may fail. PHP provides several URL-related functions: parse_url splits a URL into its components, and urlencode percent-encodes a string. The key to handling Chinese parameters is combining these functions correctly.
This article walks through an example of using parse_url and urlencode to correctly process a URL containing Chinese parameters.
Suppose we have a URL:
https://gitbox.net/search?q=测试&lang=zh
The q parameter in this URL contains the Chinese word "测试" ("test"). If we pass this URL as-is to some interfaces, they may fail to recognize it because it is not encoded. On the other hand, running urlencode over the entire URL breaks its structure, because the colon, slashes, question mark, equals signs, and ampersand are encoded as well.
So we need to encode only the parameter values, not the entire URL.
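To see concretely why encoding the whole URL fails, the following minimal sketch compares urlencode applied to the full URL with urlencode applied to the value alone. The variable names are illustrative only, and the script is assumed to be saved as UTF-8.

```php
<?php
// Illustrative comparison; variable names are not from the original article.
$url = 'https://gitbox.net/search?q=测试&lang=zh';

// Encoding the whole URL also escapes ':', '/', '?', '=' and '&',
// so the result is no longer usable as a URL.
$wholeUrlEncoded = urlencode($url);
echo $wholeUrlEncoded, PHP_EOL;
// https%3A%2F%2Fgitbox.net%2Fsearch%3Fq%3D%E6%B5%8B%E8%AF%95%26lang%3Dzh

// Encoding only the value leaves the URL structure intact.
$valueOnly = 'https://gitbox.net/search?q=' . urlencode('测试') . '&lang=zh';
echo $valueOnly, PHP_EOL;
// https://gitbox.net/search?q=%E6%B5%8B%E8%AF%95&lang=zh
```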
First, we use parse_url to decompose the various components of the URL:
$url = 'https://gitbox.net/search?q=测试&lang=zh';
$parsed = parse_url($url);
print_r($parsed);
The output is:
Array
(
    [scheme] => https
    [host] => gitbox.net
    [path] => /search
    [query] => q=测试&lang=zh
)
From this result we can read the query string. Note that parse_url returns it exactly as it appeared in the URL: it neither encodes nor decodes anything.
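When only the query string is needed, parse_url also accepts a component constant as its second argument. A small sketch (the second URL is a made-up example with no query string):

```php
<?php
$url = 'https://gitbox.net/search?q=测试&lang=zh';

// Ask parse_url for the query component only; it is returned unchanged.
$query = parse_url($url, PHP_URL_QUERY);
var_dump($query); // the string "q=测试&lang=zh"

// If the requested component is absent, parse_url returns null.
$noQuery = parse_url('https://gitbox.net/search', PHP_URL_QUERY);
var_dump($noQuery); // NULL
```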
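We can use parse_str to convert the query string into an associative array. parse_str also URL-decodes each value, so the array holds the raw text:
parse_str($parsed['query'], $queryParams);
// $queryParams is now ['q' => '测试', 'lang' => 'zh']
There is no need to call urlencode on each value at this point. http_build_query, used in the next step, applies urlencode-style encoding to every key and value itself, so encoding the values first would encode them twice (for example, %E6 would become %25E6). Calling urlencode on each value is only needed if you join the key=value pairs by hand.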
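The interaction between urlencode and http_build_query is easy to get wrong, so it is worth checking directly. The sketch below, using illustrative variable names, compares passing raw values to http_build_query with passing values that were already run through urlencode.

```php
<?php
// parse_str URL-decodes values, so raw and already-encoded input
// produce the same array.
parse_str('q=测试&lang=zh', $raw);
parse_str('q=%E6%B5%8B%E8%AF%95&lang=zh', $alreadyEncoded);
var_dump($raw === $alreadyEncoded); // bool(true)

// Raw values: http_build_query encodes them once.
$once = http_build_query($raw);
echo $once, PHP_EOL; // q=%E6%B5%8B%E8%AF%95&lang=zh

// Values passed through urlencode first: the '%' signs are encoded again.
$preEncoded = array_map('urlencode', $raw);
$twice = http_build_query($preEncoded);
echo $twice, PHP_EOL; // q=%25E6%25B5%258B%25E8%25AF%2595&lang=zh
```

The double-encoded form would be received by the server as the literal text %E6%B5%8B%E8%AF%95 rather than 测试.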
Next, we rebuild the query string from these parameters; http_build_query percent-encodes each key and value as it goes:
$encodedQuery = http_build_query($queryParams);
This generates the following string:
q=%E6%B5%8B%E8%AF%95&lang=zh
Note: By default, http_build_query encodes a space as a plus sign (+). If you want spaces encoded as %20 instead, pass PHP_QUERY_RFC3986 as the fourth argument:
$encodedQuery = http_build_query($queryParams, '', '&', PHP_QUERY_RFC3986);
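The difference between the two encoding modes only shows up when a value contains a space. A minimal sketch, using a hypothetical value "测试 demo" that does not appear in the example URL:

```php
<?php
// Hypothetical parameter value containing a space.
$params = ['q' => '测试 demo'];

// Default mode (PHP_QUERY_RFC1738): space becomes '+'.
$plus = http_build_query($params);
echo $plus, PHP_EOL; // q=%E6%B5%8B%E8%AF%95+demo

// PHP_QUERY_RFC3986: space becomes '%20'.
$percent = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
echo $percent, PHP_EOL; // q=%E6%B5%8B%E8%AF%95%20demo
```

Which form to prefer depends on the receiving side; %20 is the safer choice when the consumer does not treat + as a space.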
Finally, we reassemble the parts into a complete URL:
$finalUrl = $parsed['scheme'] . '://' . $parsed['host'] . $parsed['path'] . '?' . $encodedQuery;
echo $finalUrl;
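The output is:
https://gitbox.net/search?q=%E6%B5%8B%E8%AF%95&lang=zh
The Chinese parameter in this URL is now percent-encoded, so browsers and HTTP client libraries can handle it reliably. Note that this simple concatenation covers only the scheme, host, and path; if the URL has a port, user info, or fragment, those components from parse_url need to be appended as well.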
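If the input URL may carry more than a scheme, host, and path, the reassembly step needs to put the other components back too. The following is a sketch only: buildUrlFromParts is a hypothetical helper, the URL with a port and fragment is an assumed example, and the code assumes an absolute URL that has both a scheme and a host.

```php
<?php
// Hypothetical helper (not from the original article): rebuilds a URL from
// parse_url() output, keeping optional components when they are present.
function buildUrlFromParts(array $parts, string $encodedQuery): string
{
    $url = $parts['scheme'] . '://';
    if (isset($parts['user'])) {
        $url .= $parts['user'];
        if (isset($parts['pass'])) {
            $url .= ':' . $parts['pass'];
        }
        $url .= '@';
    }
    $url .= $parts['host'];
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    $url .= $parts['path'] ?? '';
    if ($encodedQuery !== '') {
        $url .= '?' . $encodedQuery;
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

// Assumed example URL with a port and a fragment.
$parts = parse_url('https://gitbox.net:8443/search?q=测试&lang=zh#results');
parse_str($parts['query'], $params);
$rebuilt = buildUrlFromParts($parts, http_build_query($params));
echo $rebuilt, PHP_EOL;
// https://gitbox.net:8443/search?q=%E6%B5%8B%E8%AF%95&lang=zh#results
```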
For easier reuse, the logic above can be wrapped in a function:
function encodeUrlQuery($url) {
    $parsed = parse_url($url);
    // Nothing to encode if parsing failed or there is no query string
    if ($parsed === false || !isset($parsed['query'])) {
        return $url;
    }
    // parse_str URL-decodes the values into an associative array
    parse_str($parsed['query'], $queryParams);
    // http_build_query percent-encodes keys and values itself, so calling
    // urlencode on the values first would double-encode them
    $encodedQuery = http_build_query($queryParams, '', '&', PHP_QUERY_RFC3986);
    $result = $parsed['scheme'] . '://' . $parsed['host'];
    if (isset($parsed['path'])) {
        $result .= $parsed['path'];
    }
    $result .= '?' . $encodedQuery;
    return $result;
}
Usage:
$url = 'https://gitbox.net/search?q=测试&lang=zh';
echo encodeUrlQuery($url);
Output:
https://gitbox.net/search?q=%E6%B5%8B%E8%AF%95&lang=zh
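A useful property to check is what happens when the input is already encoded. Because parse_str decodes values before http_build_query encodes them, a rebuild of this kind should leave an already-encoded query unchanged rather than encoding it a second time. The sketch below checks this with a hypothetical helper, normalizeQuery, that works on the query string alone.

```php
<?php
// Hypothetical helper: decode with parse_str, then encode exactly once.
function normalizeQuery(string $query): string
{
    parse_str($query, $params);
    return http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$fromRaw     = normalizeQuery('q=测试&lang=zh');
$fromEncoded = normalizeQuery('q=%E6%B5%8B%E8%AF%95&lang=zh');
$secondPass  = normalizeQuery($fromRaw);

echo $fromRaw, PHP_EOL;              // q=%E6%B5%8B%E8%AF%95&lang=zh
var_dump($fromRaw === $fromEncoded); // bool(true)
var_dump($fromRaw === $secondPass);  // bool(true)
```

One caveat with this approach: parse_str converts dots and spaces in parameter names to underscores, so a query with names such as user.name will not round-trip unchanged.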
When processing URLs that contain Chinese parameters in PHP, do not apply urlencode to the entire URL. Instead:
Use parse_url to split the URL into its components;
Use parse_str to turn the query string into an array of decoded parameters;
Make sure each parameter value is encoded exactly once (call urlencode yourself only if you join the query string by hand);
Rebuild the query with http_build_query, which percent-encodes keys and values for you;
Reassemble the components into a complete URL.
This approach preserves the URL structure while ensuring that the parameters are encoded correctly, avoiding problems caused by Chinese characters.