
Character encoding mismatch causes iconv_strrpos to return the wrong position

gitbox 2025-06-03

When processing multilingual text in PHP, iconv_strrpos is commonly used to find the last occurrence of a character or substring in a string. In practice, however, if the actual encoding of the string does not match the charset you pass in, iconv_strrpos may return a wrong position or even return false outright. The problem is easy to miss, especially in systems that mix encodings or do not use one encoding consistently.

This article will analyze why this problem occurs and provide a reliable solution.

Basic usage of iconv_strrpos

The syntax of iconv_strrpos is as follows:

 iconv_strrpos(string $haystack, string $needle, ?string $encoding = null): int|false

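It returns the position of the last occurrence of needle in haystack, counted in characters of the given charset and starting from 0; if no charset is passed, the iconv.internal_encoding setting is used. If needle is not found, it returns false. Note: this is a character position, not a byte offset.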

For example:

 $str = "你好，世界！"; // "Hello, world!" in Chinese, UTF-8 encoded
$pos = iconv_strrpos($str, "界", "UTF-8");
echo $pos; // outputs 4
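Because iconv_strrpos counts characters, its result differs from the byte offset returned by strrpos whenever the text contains multibyte characters. A small comparison, assuming the script file itself is saved as UTF-8:

```php
<?php
// Same needle, two different units of measurement.
// Assumes this script file is saved as UTF-8.
$str = "你好，世界！"; // 6 characters; each takes 3 bytes in UTF-8 (18 bytes total)

$charPos = iconv_strrpos($str, "界", "UTF-8"); // position counted in characters
$bytePos = strrpos($str, "界");                // offset counted in bytes

echo $charPos, PHP_EOL; // 4
echo $bytePos, PHP_EOL; // 12
```

Mixing the two kinds of positions (for example, passing an iconv_strrpos result to substr, which works in bytes) is another common source of "wrong position" bugs.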

The problem with inconsistent encodings

Suppose $str actually holds GBK-encoded text, but you pass "UTF-8" as the charset. iconv_strrpos will then try to decode the GBK bytes as UTF-8, which can lead to one of two outcomes:

  1. Decoding fails and the function returns false;

  2. Decoding succeeds but the position is wrong, because UTF-8 uses 1 to 4 bytes per character, while GBK uses 1 byte for ASCII characters and 2 bytes for Chinese characters, so the same bytes are split into different characters.

For example:

 $str = file_get_contents("http://gitbox.net/data/sample-gbk.txt"); // actually GBK-encoded
$pos = iconv_strrpos($str, "boundary", "UTF-8");
var_dump($pos); // may be false or a wrong position
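The same failure can be reproduced locally without the remote file by producing GBK bytes with iconv() and then searching them with the wrong charset. This sketch assumes the iconv implementation PHP was built against supports GBK (glibc and GNU libiconv do):

```php
<?php
// Reproduce the mismatch: build GBK bytes, then search them as if they were UTF-8.
// Assumes the system iconv supports GBK and this file is saved as UTF-8.
$gbk       = iconv("UTF-8", "GBK", "你好，世界！"); // same text, GBK-encoded
$needleGbk = iconv("UTF-8", "GBK", "界");

// Wrong charset: these GBK bytes are not valid UTF-8, so the call fails.
// The @ only hides the notice iconv raises for the illegal sequence.
$wrong = @iconv_strrpos($gbk, $needleGbk, "UTF-8");
var_dump($wrong); // bool(false)

// Matching charset: the character position is correct again.
$right = iconv_strrpos($gbk, $needleGbk, "GBK");
var_dump($right); // int(4)
```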

Why does this happen?

The iconv functions delegate the actual decoding to the system's iconv implementation (for example glibc's iconv or GNU libiconv, depending on how PHP was built). When the declared charset does not match the data:

  • iconv_strrpos decodes the haystack and the needle according to the declared charset and counts the resulting characters;

  • If it hits an illegal byte sequence (for example, GBK bytes that are not valid UTF-8), it raises a notice and returns false;

  • If the bytes happen to form legal sequences in the declared charset, decoding succeeds, but the characters being counted are not the characters actually stored, so the returned position is off.
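The second case is easiest to see with a charset in which every byte is legal. The sketch below swaps GBK for ISO-8859-1 purely for illustration: UTF-8 text declared as ISO-8859-1 always decodes successfully, but each byte is counted as one character, so the result is silently wrong rather than false.

```php
<?php
// Illustration only: ISO-8859-1 stands in for "a charset that accepts any byte".
// Assumes this script file is saved as UTF-8.
$str = "你好，世界！"; // UTF-8 bytes, 3 per character

$correct = iconv_strrpos($str, "界", "UTF-8");      // decoded with the real encoding
$skewed  = iconv_strrpos($str, "界", "ISO-8859-1"); // every byte read as one character

var_dump($correct); // int(4)
var_dump($skewed);  // int(12): no error, just a wrong position
```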

How can these errors be avoided?

1. Make sure the string's encoding matches the specified charset

This is the most fundamental fix. Before calling iconv_strrpos, make sure the string really is in the specified encoding, converting it if necessary:

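 function ensure_encoding(string $str, string $from, string $to = 'UTF-8'): string {
    // Already valid in the target encoding: return it unchanged
    if (mb_check_encoding($str, $to)) {
        return $str;
    }
    // Convert from the source encoding; //IGNORE asks iconv to drop characters it cannot convert
    $converted = iconv($from, $to . "//IGNORE", $str);
    if ($converted === false) {
        // iconv() returns false on failure, which would violate the string return type
        throw new RuntimeException("Cannot convert string from $from to $to");
    }
    return $converted;
}

$str = file_get_contents("http://gitbox.net/data/sample-gbk.txt");
$str = ensure_encoding($str, "GBK", "UTF-8");
$pos = iconv_strrpos($str, "boundary", "UTF-8");
echo $pos;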
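If silently converting input is not acceptable, another option is to reject mismatched input outright. The helper below, checked_strrpos, is a hypothetical name used only for illustration; it validates both strings with mb_check_encoding (so it assumes the mbstring extension is loaded and that the charset name is one mbstring recognizes) and throws instead of returning a misleading result.

```php
<?php
// Hypothetical helper (not a PHP built-in): reject input that is not valid
// in the given charset instead of returning false or a skewed position.
// Requires the mbstring extension for mb_check_encoding().
function checked_strrpos(string $haystack, string $needle, string $charset = 'UTF-8'): ?int
{
    if (!mb_check_encoding($haystack, $charset) || !mb_check_encoding($needle, $charset)) {
        throw new InvalidArgumentException("Input is not valid $charset");
    }
    $pos = iconv_strrpos($haystack, $needle, $charset);
    // With validated input, false normally just means "needle not found".
    return $pos === false ? null : $pos;
}

echo checked_strrpos("你好，世界！", "界"), PHP_EOL; // 4

try {
    checked_strrpos(iconv("UTF-8", "GBK", "你好，世界！"), "界"); // GBK bytes, UTF-8 charset
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Input is not valid UTF-8
}
```

Returning null for "not found" keeps that case distinct from a mismatch, which now surfaces as an exception rather than as false.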

2. Use mb_strrpos instead

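For multibyte text, mb_strrpos is often the safer choice: it is provided by the mbstring extension, so its behavior does not depend on the system's iconv implementation, and it does not fail on invalid byte sequences the way iconv_strrpos does. It still needs correctly encoded input to return a meaningful position, though: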

 mb_internal_encoding("UTF-8");
$pos = mb_strrpos($str, "boundary");

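When no encoding is given, mb_strrpos decodes the string using mb_internal_encoding(); you can also pass the encoding explicitly as the fourth argument, for example mb_strrpos($str, $needle, 0, "UTF-8"). This explicit, predictable behavior is usually easier to reason about than iconv's.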
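On correctly encoded input, both functions count characters and therefore agree. A quick check, assuming mbstring is loaded and the script is saved as UTF-8:

```php
<?php
// Both functions count characters, so on valid UTF-8 input they return the same value.
// Assumes mbstring is loaded and this file is saved as UTF-8.
$str = "你好，世界！";

$viaIconv = iconv_strrpos($str, "界", "UTF-8");
$viaMb    = mb_strrpos($str, "界", 0, "UTF-8"); // encoding passed explicitly

var_dump($viaIconv === $viaMb); // bool(true)
var_dump($viaMb);               // int(4)
```

Passing the encoding explicitly, as here, avoids depending on a global setting that other code may change.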

3. Use a single encoding for all content sources

Making sure that all content sources (databases, APIs, files, and so on) use UTF-8 is key to building a stable system. For example, you can convert a file to UTF-8 right after reading it:

 $str = file_get_contents("http://gitbox.net/data/sample-utf8.txt");
// If the file actually came from a GBK system, convert it to UTF-8 manually
if (!mb_check_encoding($str, "UTF-8")) {
    $str = iconv("GBK", "UTF-8//IGNORE", $str);
}

Summary

iconv_strrpos behaves unreliably when the string's actual encoding does not match the charset argument: it may return a wrong position or fail outright. To avoid this:

  • Make sure the actual encoding of the string matches the charset you pass in;

  • Prefer mb_strrpos for character-position lookups;

  • Use one internal encoding throughout the system (UTF-8 recommended).

Once encoding consistency is guaranteed, iconv_strrpos works reliably as well, but only if you have enough control over, and understanding of, your data sources. Otherwise, the mb_* functions are the safer choice.