Quoted-printable is a content transfer encoding used for email. It represents bytes outside the printable ASCII range, and the = character itself, as =XX, where XX is the hexadecimal value of the byte, so that content is not corrupted when sent over protocols such as SMTP. PHP provides quoted_printable_decode to reverse this encoding.
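As a small illustration of the =XX format, the sketch below round-trips a short string through PHP's built-in quoted_printable_encode and quoted_printable_decode. The sample text is an assumption chosen for the demo, written as explicit UTF-8 bytes so the result does not depend on the encoding of the source file.

```php
<?php
// "Café" written as explicit UTF-8 bytes: the "é" is the two bytes C3 A9.
$original = "Caf\xC3\xA9";

// Each byte outside printable ASCII becomes its own =XX escape.
$encoded = quoted_printable_encode($original);
$decoded = quoted_printable_decode($encoded);

echo $encoded, PHP_EOL;           // Caf=C3=A9
var_dump($decoded === $original); // bool(true)
```

Note that the decoder returns the same bytes that went in; it knows nothing about which charset those bytes belong to.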
Many developers find that strings decoded with quoted_printable_decode contain unrecognized symbols or garbled text. The main causes are:
Character encoding mismatch
quoted_printable_decode only restores the original bytes. The result is still a byte string, and it must be interpreted in the charset the sender actually used (UTF-8, ISO-8859-1, and so on); reading it in the wrong charset produces garbled text.
Escape sequences are not fully processed
Some email content uses several encoding methods at once, or contains sequences that do not strictly follow the quoted-printable specification, and these can decode incorrectly.
Multi-byte characters are split across escapes
Each byte of a multi-byte character (Chinese or Japanese text, for example) is encoded as a separate =XX escape, and a soft line break can fall between those bytes. All of the bytes must be present and rejoined after decoding for the character to display correctly.
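The first and third causes can be reproduced in a few lines. This is a minimal sketch that assumes the mbstring extension is loaded; the input strings are made up for the demonstration.

```php
<?php
// "é" in UTF-8 is the two bytes C3 A9, sent as two separate escapes.
$decoded = quoted_printable_decode("=C3=A9");
echo bin2hex($decoded), PHP_EOL; // c3a9

// Charset mismatch: treating those UTF-8 bytes as ISO-8859-1
// turns one character into two unrelated ones.
$wrong = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
echo $wrong, PHP_EOL; // Ã©

// Split character: a soft line break between the two bytes is dropped
// by the decoder, so the character is rejoined.
$split = quoted_printable_decode("=C3=\r\n=A9");
var_dump($split === $decoded); // bool(true)
```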
Mail content usually declares its character set (charset) in the headers, for example UTF-8 or GBK. After decoding, use PHP's mb_convert_encoding function to convert the string from the declared charset to the one your application uses.
<?php
// Assume $encoded is a quoted-printable encoded string
$decoded = quoted_printable_decode($encoded);
// Convert from ISO-8859-1 to UTF-8
$corrected = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
echo $corrected;
?>
If the email declares a different charset, change the third argument (the source encoding) to match it. If the email is already UTF-8, no conversion is needed.
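One way to keep the source charset from being hard-coded is to pass it in as a parameter. The helper below is a hypothetical sketch (the name qp_to_utf8 is not a PHP built-in) and assumes the caller already knows the charset and that mbstring is available.

```php
<?php
// Hypothetical helper: decode quoted-printable text and return it as UTF-8.
function qp_to_utf8(string $encoded, string $sourceCharset): string
{
    $decoded = quoted_printable_decode($encoded);

    // Already UTF-8: return the bytes untouched instead of converting.
    if (strcasecmp($sourceCharset, 'UTF-8') === 0) {
        return $decoded;
    }

    return mb_convert_encoding($decoded, 'UTF-8', $sourceCharset);
}

// Both calls yield the same UTF-8 bytes for "é".
var_dump(qp_to_utf8("=E9", 'ISO-8859-1') === qp_to_utf8("=C3=A9", 'utf-8')); // bool(true)
```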
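In quoted-printable encoding, a soft line break (= followed by a line ending) marks a line that was folded for transport, and quoted_printable_decode removes it. If line breaks or spaces still remain where lines were folded, the input may have been altered before decoding. In that case, strip the soft line breaks from the encoded input before decoding. Running the same pattern on the decoded output is risky, because it would also delete a literal = that legitimately ends a line:
<?php
// Remove soft line breaks (and any trailing spaces or tabs) before decoding
$unfolded = preg_replace('/=[ \t]*\r?\n/', '', $encoded);
$decoded = quoted_printable_decode($unfolded);
echo $decoded;
?>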
Before transcoding, make sure the decoded string consists of complete multi-byte sequences. mb_check_encoding reports whether a string is valid in a given encoding, which helps catch garbled text caused by missing bytes.
<?php
$decoded = quoted_printable_decode($encoded);
if (!mb_check_encoding($decoded, 'UTF-8')) {
    // Not valid UTF-8: try converting from a different encoding
    $decoded = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}
echo $decoded;
?>
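To see what the validity check catches, the sketch below compares a complete three-byte UTF-8 character with a copy whose last byte is missing. The inputs are illustrative, and mbstring is assumed to be loaded.

```php
<?php
// "中" (U+4E2D) is the three UTF-8 bytes E4 B8 AD.
$complete  = quoted_printable_decode("=E4=B8=AD");
// Same character with the final byte lost, for example by truncation.
$truncated = quoted_printable_decode("=E4=B8");

var_dump(mb_check_encoding($complete, 'UTF-8'));  // bool(true)
var_dump(mb_check_encoding($truncated, 'UTF-8')); // bool(false)
```

A failed check only says the bytes are not valid UTF-8. It cannot tell a truncated UTF-8 string from text in another charset, so the declared charset remains the more reliable signal.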
When processing email content, read the Content-Type header first, take the charset from it, and use that charset for the conversion that follows quoted-printable decoding.
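<?php
// Example value; in practice it is parsed from the email headers
$content_type = 'text/plain; charset=ISO-8859-1';
// The charset value may be quoted, so skip an optional opening quote
// and stop at a quote, semicolon, or whitespace
preg_match('/charset\s*=\s*"?([^";\s]+)/i', $content_type, $matches);
$charset = $matches[1] ?? 'UTF-8'; // fall back to UTF-8 if none is declared
$decoded = quoted_printable_decode($encoded);
$corrected = mb_convert_encoding($decoded, 'UTF-8', $charset);
echo $corrected;
?>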
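Headers in real mail can name a charset that mbstring does not know, and on PHP 8 mb_convert_encoding throws a ValueError in that case. The sketch below is one possible way to guard against that. The helper name charset_from_content_type and the UTF-8 fallback are assumptions for illustration, not part of PHP.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type value and
// fall back to $default when it is missing or unknown to mbstring.
function charset_from_content_type(string $contentType, string $default = 'UTF-8'): string
{
    // Accept quoted and unquoted values, e.g. charset="utf-8" or charset=GBK
    if (!preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return $default;
    }

    try {
        // PHP 8 throws ValueError for an unknown encoding name;
        // PHP 7 emits a warning (suppressed here) and returns false.
        $known = @mb_check_encoding('', $m[1]);
    } catch (\ValueError $e) {
        $known = false;
    }

    return $known ? $m[1] : $default;
}

$charset = charset_from_content_type('text/plain; charset="iso-8859-1"');
$text = mb_convert_encoding(quoted_printable_decode("Caf=E9"), 'UTF-8', $charset);
echo $text, PHP_EOL; // Café
```

Falling back silently is a design choice. An application that must not guess could log or reject the message instead.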
Avoid calling quoted_printable_decode more than once on the same data. Once a string has been decoded, any literal = followed by two hexadecimal digits in the plain text would be treated as an escape on a second pass, which corrupts the data.
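The corruption is easy to reproduce. In this sketch the sample string is made up; its plain text contains a literal = that the sender correctly encoded as =3D.

```php
<?php
// The plain text is "Price=20 EUR"; the literal "=" was encoded as =3D.
$encoded = "Price=3D20 EUR";

$once  = quoted_printable_decode($encoded);
$twice = quoted_printable_decode($once); // second pass reads "=20" as a space

echo $once, PHP_EOL;                // Price=20 EUR
var_dump($twice === "Price  EUR");  // bool(true): the "=20" is gone
```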
When using quoted_printable_decode, the key point is that it only reverses the quoted-printable encoding. The character set conversion and cleanup that follow are what make the string display correctly. Keep these points in mind:
Read and respect the charset declared in the email headers
Use mb_convert_encoding for encoding conversion
Clean up soft line breaks and redundant characters
Check the integrity of multibyte encoding
Following these steps avoids most stray characters and garbled text after decoding and improves the quality of email content processing.
<?php
// Complete example
$encoded = "Hello=20World=21=0D=0A=E9"; // quoted-printable sample; =E9 is "é" in ISO-8859-1
$decoded = quoted_printable_decode($encoded);
// Assume the email declares its charset as ISO-8859-1
$corrected = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
echo $corrected; // Output: "Hello World!", a line break, then "é"
?>