In PHP, substr cuts a string by byte length, but Chinese characters in a multibyte encoding such as UTF-8 usually occupy several bytes each, so a character can be split in the middle and show up as garbled output. mb_substr cuts by character count instead, but when a page must be limited by byte count (for example, when the output is capped at a fixed number of bytes), mb_strcut is the function to use.
The advantages of mb_strcut are:
It cuts by byte count without splitting a character;
It keeps multibyte characters intact;
It works with the multibyte encodings that mbstring supports (such as UTF-8 and GB2312).
mb_strcut(string $string, int $start, ?int $length = null, ?string $encoding = null): string
$string : The original string to be processed.
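$start : The starting byte offset, counted from 0. If the offset falls inside a multibyte character, the cut starts at that character's first byte.
$length : The maximum number of bytes to return. If null (the default), everything up to the end of the string is returned.
$encoding : The string encoding, usually "UTF-8". If omitted, the internal encoding is used.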
Example:
$text = "这是一个用于测试的中文字符串"; // "This is a Chinese string for testing"
$cut = mb_strcut($text, 0, 12, 'UTF-8');
echo $cut;
Output: 这是一个 (each of these Chinese characters takes up 3 bytes in UTF-8, so 12 bytes cover exactly the first 4 characters).
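To make the difference between the three functions concrete, here is a small comparison on one sample string. It is only a sketch: the sample text and the 4-byte limit are chosen for illustration, and it assumes the mbstring extension is loaded.

```php
<?php
// Sample string chosen for this sketch; every character is 3 bytes in UTF-8.
$text = "这是一个用于测试的中文字符串";

// substr counts bytes and may stop inside a 3-byte character.
$bytes = substr($text, 0, 4);
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false): the second character is cut

// mb_substr counts characters, so the byte length depends on the content.
$chars = mb_substr($text, 0, 4, 'UTF-8');
var_dump(strlen($chars)); // int(12)

// mb_strcut counts bytes but backs off to the previous character boundary.
$safe = mb_strcut($text, 0, 4, 'UTF-8');
var_dump($safe); // string(3) "这"
```

With a 4-byte limit, only mb_strcut returns a result that both respects the byte budget and is still valid UTF-8.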
Suppose each page should display no more than 60 bytes of Chinese content. A pagination function can be written as follows:
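function getPageContent(string $content, int $page = 1, int $bytesPerPage = 60): string {
    // Treat page numbers below 1 as page 1; a negative offset would count from the end of the string.
    $page = max(1, $page);
    $start = ($page - 1) * $bytesPerPage;
    return mb_strcut($content, $start, $bytesPerPage, 'UTF-8');
}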
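$content = file_get_contents('https://gitbox.net/content.txt');
if ($content === false) {
    exit('Failed to load content.');
}
$page = isset($_GET['page']) ? (int)$_GET['page'] : 1;
$display = getPageContent($content, $page);
// Escape the text before embedding it in HTML.
echo '<div>' . htmlspecialchars($display, ENT_QUOTES, 'UTF-8') . '</div>';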
In this example, the program outputs at most 60 bytes of Chinese content for the current page number, and mb_strcut keeps every character intact.
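One caveat is worth checking. Because mb_strcut rounds to character boundaries, a page may return fewer bytes than the nominal page size, so offsets computed as fixed multiples of the page size may not line up with where the previous page actually ended, especially in text that mixes 1-byte and 3-byte characters. A more conservative sketch advances by the number of bytes each page really returned. The helper name splitIntoPages is hypothetical and not part of the original example.

```php
<?php
/**
 * Split $content into pages of at most $bytesPerPage bytes each.
 * Hypothetical helper for illustration; assumes $content is valid in $encoding.
 */
function splitIntoPages(string $content, int $bytesPerPage = 60, string $encoding = 'UTF-8'): array
{
    $pages = [];
    $offset = 0;
    $total = strlen($content);

    while ($offset < $total) {
        $chunk = mb_strcut($content, $offset, $bytesPerPage, $encoding);
        if ($chunk === '') {
            // The page size is smaller than the next character; stop rather than loop forever.
            break;
        }
        $pages[] = $chunk;
        // Advance by the bytes actually returned, not by the nominal page size.
        $offset += strlen($chunk);
    }

    return $pages;
}

$pages = splitIntoPages("这是一个用于测试的中文字符串", 10);
echo count($pages), "\n";       // number of pages actually produced
echo implode('', $pages), "\n"; // concatenating the pages restores the original text
```

With this approach the page count comes from count($pages) rather than from dividing the byte length by the page size, and the shorter last page needs no special handling.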
So that users can click through to the next page, a simple set of pagination links can be generated at the bottom of the page:
$totalBytes = strlen($content);
// ceil() returns a float, so cast the page count to int.
$totalPages = (int) ceil($totalBytes / 60);
for ($i = 1; $i <= $totalPages; $i++) {
    echo "<a href='https://gitbox.net/pagination.php?page=$i'>Page {$i}</a> ";
}
This loop generates one link per page. Each click passes the page number through $_GET['page'], and the matching slice of content is displayed.
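The link loop can also be wrapped in a small function that escapes the URL and marks the current page. This is only a sketch: buildPageLinks and its parameters are hypothetical names introduced here, and the "Page N" label format is an assumption.

```php
<?php
/**
 * Build pagination links as an HTML string.
 * Hypothetical helper; the base URL is passed in rather than hard-coded.
 */
function buildPageLinks(int $totalPages, int $currentPage, string $baseUrl): string
{
    $links = [];
    for ($i = 1; $i <= $totalPages; $i++) {
        if ($i === $currentPage) {
            // The current page is shown as plain text instead of a link.
            $links[] = "<strong>Page {$i}</strong>";
            continue;
        }
        $url = $baseUrl . '?' . http_build_query(['page' => $i]);
        $links[] = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Page ' . $i . '</a>';
    }
    return implode(' ', $links);
}

$html = buildPageLinks(3, 2, 'https://gitbox.net/pagination.php');
echo $html;
```

Returning a string instead of echoing inside the loop keeps the helper easy to test and lets the caller decide where the links appear.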
Encoding consistency : Make sure the string's actual encoding matches the encoding passed to mb_strcut ; otherwise garbled characters can still appear.
Last page handling : The last page may contain fewer bytes than the configured page size, so the display logic should cope with a shorter (or empty) result.
Caching : If the text does not change often, consider caching the paginated fragments to improve performance.
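For the encoding consistency point, one option is to normalize the input to UTF-8 before paginating. The following is a minimal sketch under stated assumptions: ensureUtf8 is a hypothetical helper, and the fallback encoding EUC-CN (the usual byte encoding of GB2312) is only a guess about where legacy data might come from; the caller has to know the real source encoding.

```php
<?php
/**
 * Return $raw as UTF-8.
 * Hypothetical helper; $fallbackEncoding is an assumption about the data source.
 */
function ensureUtf8(string $raw, string $fallbackEncoding = 'EUC-CN'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8
    }
    return mb_convert_encoding($raw, 'UTF-8', $fallbackEncoding);
}

// Simulate legacy input by converting a UTF-8 sample to EUC-CN first.
$legacy = mb_convert_encoding("中文内容", 'EUC-CN', 'UTF-8');
$utf8 = ensureUtf8($legacy);
var_dump($utf8 === "中文内容"); // bool(true)
```

After normalization, the same 'UTF-8' argument can be passed to mb_strcut everywhere, which avoids mixing encodings between pages.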