Practical Tips for Using the convert_cyr_string Function to Convert the Encoding of URL Parameters and Avoid Garbled Text

gitbox 2025-09-17

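I. Problem scenario and overview of the approach

A typical flow: the browser or client URL-encodes a parameter before sending it (for example ?name=%E4%F0%E0%E2%E5%F2 ), so the server receives a percent-escaped byte sequence. Recovering correct UTF-8 text usually takes two steps:

  1. Decode the URL encoding ( rawurldecode / urldecode ) to get the original byte sequence. (php.net)

  2. Convert that byte sequence from the correct single-byte encoding (for example windows-1251, koi8-r, or cp866) to UTF-8. For the common Cyrillic encodings, convert_cyr_string can convert between these single-byte character sets where the server's PHP version still supports it. It does not produce UTF-8 itself, so a final step with mb_convert_encoding or iconv is still needed. (php.net)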

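Note: convert_cyr_string was deprecated in PHP 7.4 and removed in PHP 8.0; in new environments prefer mb_convert_encoding / iconv or a third-party UTF-8 library. Both the legacy-compatible approach and the alternatives are given below. (php.net)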
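As a minimal sketch of the two steps above, the snippet below decodes the example value and converts it to UTF-8. It assumes the bytes really are Windows-1251 and that either mbstring or iconv is loaded; the helper name `cyr_bytes_to_utf8` is illustrative and not part of PHP.

```php
<?php
// Hypothetical helper: convert bytes in a known single-byte encoding to UTF-8.
// Uses mbstring when it is loaded and falls back to iconv otherwise.
function cyr_bytes_to_utf8(string $bytes, string $fromEncoding): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($bytes, 'UTF-8', $fromEncoding);
    }
    if (function_exists('iconv')) {
        $converted = iconv($fromEncoding, 'UTF-8', $bytes);
        if ($converted !== false) {
            return $converted;
        }
    }
    throw new RuntimeException("Cannot convert from {$fromEncoding}");
}

// Step 1: percent-decode to get the raw bytes (six bytes, one per letter).
$bytes = rawurldecode('%E4%F0%E0%E2%E5%F2');

// Step 2: interpret those bytes as Windows-1251 (an assumption) and emit UTF-8.
$text = cyr_bytes_to_utf8($bytes, 'Windows-1251');

echo bin2hex($bytes), "\n"; // e4f0e0e2e5f2
echo $text, "\n";           // six Cyrillic letters, now encoded as UTF-8
```

If the source encoding is guessed wrong, step 2 still succeeds but yields different letters, which is why the encoding has to be known or agreed on in advance.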


II. Encoding codes supported by convert_cyr_string (brief)

convert_cyr_string(string $str, string $from, string $to) identifies each encoding by a single character. The codes are:

  • k - KOI8-R

  • w - Windows-1251

  • i - ISO-8859-5

  • a / d - x-cp866 (DOS CP866)

  • m - x-mac-cyrillic (php.net)
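When moving away from convert_cyr_string, these single-letter codes have to be translated into the encoding names that mbstring or iconv expect. The table below is a sketch of such a mapping; the constant and function names are hypothetical, and `m` is deliberately left out because support for Mac Cyrillic depends on the extension and build.

```php
<?php
// Assumed mapping from convert_cyr_string() codes to encoding names accepted
// by mbstring. 'm' (x-mac-cyrillic) is omitted on purpose: check whether your
// mbstring/iconv build supports it before adding an entry.
const CYR_CODE_TO_ENCODING = [
    'k' => 'KOI8-R',
    'w' => 'Windows-1251',
    'i' => 'ISO-8859-5',
    'a' => 'CP866',
    'd' => 'CP866',
];

// Hypothetical helper: returns the encoding name, or null for an unmapped code.
function cyr_code_to_encoding(string $code): ?string
{
    return CYR_CODE_TO_ENCODING[$code] ?? null;
}

var_dump(cyr_code_to_encoding('k')); // string(6) "KOI8-R"
var_dump(cyr_code_to_encoding('m')); // NULL
```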


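III. Practical code template (based on convert_cyr_string)

Below is a practical PHP function: it takes a string that may be URL-encoded and in some Cyrillic encoding (from the query string or a path segment), decodes it, and converts it to UTF-8. Note: before using it, make sure your PHP version still provides convert_cyr_string (PHP < 8.0; on PHP 7.4 it still works but raises a deprecation notice). If you run PHP 8+, skip to the next section for the alternatives.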

```php
<?php
/**
 * Normalize a URL parameter (possibly in a single-byte Cyrillic encoding and
 * percent-escaped) to a UTF-8 string.
 *
 * $rawUrlPart: the raw URL part (for example $_GET['name'], or a segment taken from PATH_INFO / the router)
 * $sourceCode: source encoding, as a convert_cyr_string single-letter code ('w','k','i','a','d','m')
 *
 * Returns a UTF-8 string; if no conversion is possible, returns the rawurldecode()d string unchanged.
 */
function normalize_cyrillic_url_param(string $rawUrlPart, string $sourceCode = 'w'): string {
    // First turn the percent escapes back into the original bytes.
    // rawurldecode() keeps the bytes and does not turn + into a space (right for path segments);
    // for a query-string value where + means a space, use urldecode() instead.
    $decoded = rawurldecode($rawUrlPart);

    // Only if convert_cyr_string exists (note: removed in PHP 8+)
    if (function_exists('convert_cyr_string')) {
        // Convert the single-byte source encoding to windows-1251 ('w') first,
        // then convert the windows-1251 bytes to UTF-8.
        $asWin1251 = convert_cyr_string($decoded, $sourceCode, 'w');
        if (function_exists('mb_convert_encoding')) {
            return mb_convert_encoding($asWin1251, 'UTF-8', 'Windows-1251');
        }
        // Fallback: try iconv if it is available
        if (function_exists('iconv')) {
            $utf8 = @iconv('CP1251', 'UTF-8//IGNORE', $asWin1251);
            return $utf8 !== false ? $utf8 : $asWin1251;
        }
        // Neither is available: return the windows-1251 bytes (not UTF-8)
        return $asWin1251;
    }

    // No convert_cyr_string (e.g. PHP 8+): return the decoded bytes unchanged
    // so that the caller can apply an alternative
    return $decoded;
}
```

Example usage:

```php
// Assume the URL is /?name=%CC%E8%F0 (three letters in Windows-1251).
// Caveat: PHP has already percent-decoded $_GET values, so the rawurldecode()
// inside the helper is redundant here and would alter a value that contains a literal "%xx".
$raw = $_GET['name'] ?? '';
$name = normalize_cyrillic_url_param($raw, 'w'); // assumes the client sent Windows-1251
echo $name; // prints the name as UTF-8 (only where convert_cyr_string is available)
```

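Which percent-decoding function to use depends on where the parameter comes from: rawurldecode() (better suited to path segments) or urldecode() (for query strings, where + stands for a space). Keep in mind that PHP has already decoded the values in $_GET and $_REQUEST, so decoding them a second time is only safe when the value cannot contain a literal %. See the official documentation for the differences and recommended usage. (php.net, guides.codepath.com)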
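The difference between the two decoders is easiest to see on a value that contains both a plus sign and a percent escape. A small illustration (the sample strings are made up for this example):

```php
<?php
// The same encoded value decoded both ways.
$encoded = 'a+b%20c';

$asPath  = rawurldecode($encoded); // "+" stays a literal plus sign
$asQuery = urldecode($encoded);    // "+" becomes a space

var_dump($asPath);  // string(5) "a+b c"
var_dump($asQuery); // string(5) "a b c"

// For %XX escapes alone the two functions agree, so the choice only matters
// when the value may contain "+".
var_dump(rawurldecode('%CC%E8%F0') === urldecode('%CC%E8%F0')); // bool(true)
```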


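IV. PHP 8+ (or when you want to avoid the deprecated function): recommended alternatives

For new projects or PHP 8+ environments, use mb_detect_encoding + mb_convert_encoding / iconv, or have every client send UTF-8 (the best practice). Example: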

```php
<?php
function normalize_cyrillic_url_param_modern(string $rawUrlPart, array $tryEncodings = ['Windows-1251', 'KOI8-R', 'CP866']): string {
    $decoded = rawurldecode($rawUrlPart);
    if (!function_exists('mb_convert_encoding')) {
        return $decoded; // mbstring is missing: nothing more can be done here
    }
    // Already valid UTF-8 (this includes plain ASCII): converting again would corrupt it
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }
    // Detection is unreliable, so convert with each candidate and verify the result instead
    foreach ($tryEncodings as $enc) {
        $maybe = @mb_convert_encoding($decoded, 'UTF-8', $enc);
        if ($maybe !== false) {
            // Simple check: converting back must reproduce the original bytes.
            // Not 100% reliable: almost any byte string round-trips in a single-byte code page,
            // so in practice the order of $tryEncodings decides. Put the most likely encoding first.
            $back = @mb_convert_encoding($maybe, $enc, 'UTF-8');
            if ($back === $decoded) {
                return $maybe;
            }
        }
    }
    // Fall back to the decoded string as-is
    return $decoded;
}
```
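Because almost any byte string is valid in every single-byte Cyrillic code page, a conversion that succeeds says little about which encoding was really used. One possible refinement is to score each candidate by how plausible the result looks. The sketch below is a heuristic of my own choosing, not a standard algorithm: it assumes natural-language text is mostly lowercase and picks the candidate that yields the most lowercase Cyrillic letters. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: convert with mbstring when loaded, else with iconv.
function to_utf8_or_null(string $bytes, string $enc): ?string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($bytes, 'UTF-8', $enc);
    }
    $out = function_exists('iconv') ? @iconv($enc, 'UTF-8', $bytes) : false;
    return $out === false ? null : $out;
}

// Heuristic (an assumption): the candidate producing the most lowercase
// Cyrillic letters (U+0430..U+044F plus U+0451) is the most plausible source.
function guess_cyrillic_encoding(string $bytes, array $candidates): ?string
{
    $best = null;
    $bestScore = 0;
    foreach ($candidates as $enc) {
        $utf8 = to_utf8_or_null($bytes, $enc);
        if ($utf8 === null) {
            continue;
        }
        $score = (int) preg_match_all('/[\x{0430}-\x{044F}\x{0451}]/u', $utf8);
        if ($score > $bestScore) {
            $best = $enc;
            $bestScore = $score;
        }
    }
    return $best; // null when no candidate produced any lowercase Cyrillic
}

// Six bytes that spell a lowercase word in Windows-1251.
echo guess_cyrillic_encoding("\xEF\xF0\xE8\xE2\xE5\xF2", ['Windows-1251', 'KOI8-R', 'CP866']), "\n"; // Windows-1251
```

Short values, all-uppercase text, or mixed content can easily defeat this scoring, so treat the result as a hint and log the cases where the scores are close.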

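Alternatively, a mature third-party library (for example voku/portable-utf8) can handle complex encoding and normalization problems; this is strongly recommended for production systems that need high robustness. (GitHub)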


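V. Practical advice and checklist

  1. Priority: the best option is to have every client use UTF-8 (state it explicitly in HTML <meta charset="utf-8">, in HTTP headers, and in the API documentation). This avoids garbled text at the root.

  2. Choosing the decoding function: for URL-encoded input from the query part ( ?a=b+c ), urldecode() turns + into a space; for a path segment, prefer rawurldecode(). (php.net)

  3. Server-side conversion: when you have to handle legacy data or third-party systems, use the conversion chain shown above ( rawurldecode → convert_cyr_string (if available) or mb_convert_encoding / iconv ) to convert the byte sequence to UTF-8. (php.net)

  4. Detection and fallback: automatic encoding detection is not 100% accurate. For critical paths, add a confidence check (for example, convert back after detection and compare with the original), and log failures so that they can be handled manually or covered by specific rules.

  5. Deprecation note: convert_cyr_string was marked deprecated in PHP 7.4 and removed in PHP 8.0; if your code must keep running on modern PHP, implement a compatible alternative ( mb_convert_encoding / iconv / a third-party library). (php.net)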
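One way to implement the compatible alternative mentioned in the last point is a thin wrapper that keeps the single-letter codes but delegates to mbstring or iconv. This is a sketch under stated assumptions: the name `cyr_convert` and its code table are illustrative, `m` (x-mac-cyrillic) is not covered, and characters that do not exist in the target encoding may be handled differently than the original function handled them.

```php
<?php
// Sketch of a stand-in for convert_cyr_string() on PHP 8+ (hypothetical name).
function cyr_convert(string $str, string $from, string $to): string
{
    // Assumed mapping of the single-letter codes to encoding names.
    static $names = [
        'k' => 'KOI8-R',
        'w' => 'Windows-1251',
        'i' => 'ISO-8859-5',
        'a' => 'CP866',
        'd' => 'CP866',
    ];
    if (!isset($names[$from], $names[$to])) {
        throw new InvalidArgumentException("Unsupported code: {$from} or {$to}");
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($str, $names[$to], $names[$from]);
    }
    $out = function_exists('iconv') ? iconv($names[$from], $names[$to], $str) : false;
    if ($out === false) {
        throw new RuntimeException('Conversion failed');
    }
    return $out;
}

// A six-letter lowercase word as KOI8-R bytes, converted to Windows-1251 bytes.
echo bin2hex(cyr_convert("\xD0\xD2\xC9\xD7\xC5\xD4", 'k', 'w')), "\n"; // eff0e8e2e5f2
```

Throwing on an unknown code is a design choice for this sketch; legacy callers that relied on the old function's warnings may need a softer failure mode.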


VI. Quick side-by-side examples (legacy and modern)

Scenario A: a legacy client sends the parameter (query) in KOI8-R, and the server wants UTF-8:

```php
$raw = $_GET['q'] ?? '';        // note: PHP has already percent-decoded $_GET values
$bytes = rawurldecode($raw);    // only needed for still-encoded input; a no-op here unless the value contains a literal "%xx"
// Only where convert_cyr_string is available (PHP < 8.0):
$win = convert_cyr_string($bytes, 'k', 'w'); // koi8-r -> windows-1251
$utf8 = mb_convert_encoding($win, 'UTF-8', 'Windows-1251');
echo $utf8;
```

Scenario B: PHP 8+ environment, trying an automatic conversion with the modern approach:

```php
$raw = $_GET['q'] ?? '';
// The helper calls rawurldecode() itself, so pass the value straight in
$utf8 = normalize_cyrillic_url_param_modern($raw, ['Windows-1251', 'KOI8-R', 'CP866']);
echo $utf8;
```
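Both scenarios read from $_GET, which PHP has already percent-decoded. When the still-encoded value is needed, one option is to split the raw query string yourself. A minimal sketch, assuming simple `key=value` pairs separated by `&` (no array syntax such as `q[]`); the function name `raw_query_value` is hypothetical.

```php
<?php
// Hypothetical helper: return the still percent-encoded value of $name from a
// raw query string such as $_SERVER['QUERY_STRING'], or null when it is absent.
function raw_query_value(string $queryString, string $name): ?string
{
    foreach (explode('&', $queryString) as $pair) {
        if ($pair === '') {
            continue;
        }
        // Split on the first "=" only; a key without "=" gets an empty value.
        $parts = explode('=', $pair, 2);
        if (urldecode($parts[0]) === $name) {
            return $parts[1] ?? '';
        }
    }
    return null;
}

$raw = raw_query_value('page=2&q=%D0%D2%C9%D7%C5%D4', 'q');
var_dump($raw);                     // string(18) "%D0%D2%C9%D7%C5%D4"
var_dump(bin2hex(urldecode($raw))); // string(12) "d0d2c9d7c5d4"
```

In a request handler the first argument would typically be `$_SERVER['QUERY_STRING'] ?? ''`, and the returned value can then be decoded exactly once before the charset conversion.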

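VII. Conclusion (key points)

  • convert_cyr_string used to be a convenient function for converting between single-byte Cyrillic encodings; its codes are k, w, i, a, d, m. It was deprecated in PHP 7.4 and removed in PHP 8.0, so new projects should use mb_convert_encoding / iconv or a third-party library instead. (php.net)

  • For garbled URL parameters, the key is to decode the percent escapes correctly first ( rawurldecode / urldecode ), then convert the byte sequence to UTF-8 according to its actual source encoding. When choosing the decoding function for path versus query, mind how each one treats + as a space. (php.net)

  • The most reliable long-term strategy is to use UTF-8 everywhere and state the encoding convention explicitly in the API documentation and client implementations; when legacy or third-party data must be handled, use the conversion chain above and add detection and fallback mechanisms for robustness. (GitHub, php.net)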
