When processing large texts that contain multilingual characters, PHP string functions that are not multi-byte aware can easily split a character in the middle and produce garbled output. This matters most when a long piece of text has to be displayed in parts, where cutting the string safely becomes the key issue. This article shows how to use the mb_strcut() function to paginate a string so that characters stay intact and the result is easy to render page by page in the front end.
substr() is the usual string truncation function in PHP, but it operates on bytes. If the text contains multi-byte characters such as Chinese, Japanese, or Korean, substr() can cut a character in half, leaving an invalid byte sequence that shows up as garbled text. In contrast, mb_strcut() is designed for multi-byte character sets: it also cuts by bytes, but adjusts the cut points so that no character is split.
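mb_strcut(string $string, int $start, ?int $length = null, ?string $encoding = null): string
$string: The string to be processed.
$start: The starting byte offset. If it falls inside a multi-byte character, the cut begins at the first byte of that character.
$length: The maximum number of bytes to return. The result may be shorter so that no character is split.
$encoding: The character encoding, usually UTF-8. If omitted, the internal character encoding is used.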
Note: mb_strcut() differs from mb_substr(). mb_substr() measures $start and $length in characters, whereas mb_strcut() measures them in bytes while still guaranteeing that no character is split.
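To make the difference concrete, here is a small sketch that runs the three functions on the same input. The sample string and variable names are illustrative assumptions, and the code presumes the mbstring extension is enabled.

```php
<?php
// Illustrative sample: four Chinese characters, 3 bytes each in UTF-8 (12 bytes total).
$text = '你好世界';

// substr() counts bytes and can stop in the middle of a character.
$bySubstr = substr($text, 0, 4);

// mb_strcut() also counts bytes, but backs off to the previous character boundary.
$byStrcut = mb_strcut($text, 0, 4, 'UTF-8');

// mb_substr() counts characters, so a length of 4 means four whole characters.
$byMbSubstr = mb_substr($text, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($bySubstr, 'UTF-8')); // bool(false): a character was split
var_dump($byStrcut);                             // string(3) "你"
var_dump($byMbSubstr);                           // string(12) "你好世界"
```

With a 4-byte budget, mb_strcut() returns only the first character (3 bytes) because the second character would not fit completely.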
Set the maximum number of bytes displayed per page, for example 1000 bytes.
Use mb_strcut() to cut out the part of the text that corresponds to the current page number.
Divide the total byte length of the text by the number of bytes per page, rounding up, to determine the total number of pages.
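function paginateText(string $text, int $page = 1, int $bytesPerPage = 1000, string $encoding = 'UTF-8'): array
{
    // Guard against a zero or negative page size
    $bytesPerPage = max(1, $bytesPerPage);
    // strlen() counts bytes, the same unit mb_strcut() works in
    $totalBytes = strlen($text);
    $totalPages = max(1, (int) ceil($totalBytes / $bytesPerPage));
    // Clamp the requested page into the valid range
    $page = max(1, min($page, $totalPages));
    $start = ($page - 1) * $bytesPerPage;
    // Cut by bytes without splitting a multi-byte character
    $paginated = mb_strcut($text, $start, $bytesPerPage, $encoding);
    // Build the paging data
    return [
        'content'     => $paginated,
        'page'        => $page,
        'total_pages' => $totalPages,
    ];
}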
Suppose you have a long article and want users to read it page by page in the front end. In this example the text is loaded from a URL; it could just as well come from a database:
$fullText = file_get_contents('https://gitbox.net/static/long_article.txt');
if ($fullText === false) {
    exit('Failed to load the article.');
}
// Treat missing or invalid page numbers as page 1
$page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$result = paginateText($fullText, $page);
// Output the current page content, escaping HTML and keeping line breaks
echo nl2br(htmlspecialchars($result['content']));
// Pagination navigation
for ($i = 1; $i <= $result['total_pages']; $i++) {
    echo "<a href=\"?page=$i\">Page $i</a> ";
}
When using mb_strcut(), make sure the actual encoding of the text matches the $encoding argument passed to the function.
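One way to guard against an encoding mismatch is to validate the text before paginating and convert it when validation fails. The sketch below is only an illustration: the helper name ensureUtf8 and the SJIS fallback are hypothetical, so substitute whatever legacy encoding your data actually uses.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from $fallbackEncoding
// when the bytes are not already valid UTF-8. The fallback is an assumption.
function ensureUtf8(string $text, string $fallbackEncoding = 'SJIS'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

// Simulate legacy input by converting a UTF-8 sample to Shift_JIS first.
$legacy = mb_convert_encoding('こんにちは', 'SJIS', 'UTF-8');
$clean  = ensureUtf8($legacy);

var_dump($clean === 'こんにちは');
```

Validation of this kind is a heuristic: some legacy byte sequences can happen to be valid UTF-8, so knowing the source encoding up front is more reliable than detecting it.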
If you want to paginate by number of characters instead of bytes, use mb_substr() together with mb_strlen().
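A character-based variant might look like the following minimal sketch. The function name paginateTextByChars and the $charsPerPage parameter are assumptions introduced for illustration; the return shape mirrors paginateText() above.

```php
<?php
// Hypothetical character-based counterpart; $charsPerPage is an assumed parameter.
function paginateTextByChars(string $text, int $page = 1, int $charsPerPage = 500, string $encoding = 'UTF-8'): array
{
    $charsPerPage = max(1, $charsPerPage);
    // mb_strlen() counts characters, not bytes
    $totalChars = mb_strlen($text, $encoding);
    $totalPages = max(1, (int) ceil($totalChars / $charsPerPage));
    $page = max(1, min($page, $totalPages));

    // mb_substr() takes a character offset and a character count
    $content = mb_substr($text, ($page - 1) * $charsPerPage, $charsPerPage, $encoding);

    return ['content' => $content, 'page' => $page, 'total_pages' => $totalPages];
}

// Seven characters at three per page: the third page holds the last character.
$charResult = paginateTextByChars('一二三四五六七', 3, 3);
var_dump($charResult['content']); // string(3) "七"
```

Pages produced this way contain the same number of characters but can differ in byte size, which matters if the page size is meant to bound payload length.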
In practice, you may also need a caching strategy to keep performance acceptable, especially when the text is large.
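One thing worth caching is a table of page start offsets. mb_strcut() may move a mid-character start back to the beginning of that character and may return fewer bytes than requested, so pages taken at fixed multiples of the page size do not necessarily line up end to start. The sketch below, with the hypothetical helpers computePageOffsets and readPage, advances by the byte length actually returned so that consecutive pages are contiguous. Where the resulting array is stored (file, memory cache, database) is left open.

```php
<?php
// Hypothetical helper: byte offset at which each page starts.
function computePageOffsets(string $text, int $bytesPerPage = 1000, string $encoding = 'UTF-8'): array
{
    $offsets = [];
    $offset = 0;
    $totalBytes = strlen($text);
    while ($offset < $totalBytes) {
        $chunk = mb_strcut($text, $offset, $bytesPerPage, $encoding);
        if ($chunk === '') {
            // The page size is smaller than the next character; stop instead of looping forever.
            break;
        }
        $offsets[] = $offset;
        // Advance by what was actually returned, not by the nominal page size.
        $offset += strlen($chunk);
    }
    return $offsets;
}

// Hypothetical helper: read one page (1-based) using a precomputed offset table.
function readPage(string $text, array $offsets, int $page, int $bytesPerPage = 1000, string $encoding = 'UTF-8'): string
{
    if (!isset($offsets[$page - 1])) {
        return '';
    }
    return mb_strcut($text, $offsets[$page - 1], $bytesPerPage, $encoding);
}

// Four 2-byte characters with a 3-byte page size: one character fits per page.
$sample  = str_repeat('é', 4);
$offsets = computePageOffsets($sample, 3);
print_r($offsets);
```

With this approach the total number of pages is count($offsets) rather than a simple division, since each page may hold slightly fewer bytes than the nominal size.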
Using mb_strcut() for string pagination balances performance and encoding safety when handling large text content. It prevents the garbled output caused by split characters and gives multilingual websites a better reading experience. Hopefully this article helps you handle large-text pagination more efficiently in your own projects.