Solve encoding issues in serialize: How to deal with UTF-8 and other character sets?

gitbox 2025-05-27

In PHP, the serialize() function converts PHP values into a string that can be stored or transferred. serialize() itself performs no character-set conversion, but when the data comes from different character sets, or the serialized string passes through a system that re-encodes it, you can end up with garbled text or a string that unserialize() refuses to read. This article explains how to avoid these encoding problems, especially when dealing with UTF-8 and other character sets.

1. Basic usage of the serialize() function

The serialize() function converts a PHP variable into a string, which can be stored in a database or transmitted over the network. Here is a simple example:

$data = ['name' => 'Zhang San', 'age' => 25];
$serializedData = serialize($data);
echo $serializedData;
// Output: a:2:{s:4:"name";s:9:"Zhang San";s:3:"age";i:25;}

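This example contains only ASCII text, so it serializes to the same bytes in any ASCII-compatible encoding. Multibyte text such as Chinese characters is different: serialize() prefixes each string with its length in bytes (s:9 for "Zhang San" here), and that byte count depends on the encoding. If the serialized string is later converted to another character set, the text may appear garbled and the recorded lengths no longer match the data, which makes unserialize() fail.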
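To make the byte-length prefix concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and writes the two Chinese characters as Unicode escapes (PHP 7.0+), so the result does not depend on how the source file is saved. The variable name is illustrative only.

```php
<?php
// "\u{...}" escapes are always emitted as UTF-8 bytes by PHP.
$name = "\u{5F20}\u{4E09}";            // the two characters 张三

echo strlen($name), "\n";              // 6  (bytes in UTF-8)
echo mb_strlen($name, 'UTF-8'), "\n";  // 2  (characters)
echo serialize($name), "\n";           // s:6:"张三";
```

The prefix is 6, not 2, because serialize() counts bytes. The same two characters in a double-byte encoding would occupy 4 bytes and need a different prefix.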

2. Why do encoding problems occur?

PHP's serialize() function performs no encoding conversion. It copies each string's bytes exactly as they are and records the byte length, and the output carries no record of which encoding a string uses. If the input mixes strings from different character sets, the serialized result mixes them too, and nothing downstream can tell which encoding applies to which string.

For example, suppose you serialize a UTF-8 string and the target environment (the database, the transport layer, or the system that reads the data) uses another character set. If the bytes are interpreted in that character set, the text comes out garbled; if they are converted along the way, the recorded lengths no longer match and unserialize() returns false.
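The failure case can be reproduced in a few lines. This is a sketch under stated assumptions: mbstring is available, and the call to mb_convert_encoding() stands in for whatever layer (database, message queue, file export) silently re-encodes the stored string.

```php
<?php
// "José" with é written as a Unicode escape: 5 bytes in UTF-8.
$original = serialize(['name' => "Jos\u{00E9}"]);

// Simulate a channel that converts the whole payload to ISO-8859-1.
// é shrinks from 2 bytes to 1, but the s:5 prefix is left unchanged.
$reencoded = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// @ hides the notice/warning PHP raises for the malformed payload.
$result = @unserialize($reencoded);
var_dump($result);   // bool(false)
```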

3. How to solve the encoding problem during serialization?

To solve this problem, first make sure that all data uses one encoding before it is serialized. In practice, this means converting every string to UTF-8 before calling serialize(), so that everything is serialized in UTF-8.

3.1 Ensure that the data is UTF-8 encoded

Use PHP's mb_convert_encoding() function to ensure that the data is converted to UTF-8 encoding:

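$data = ['name' => 'Zhang San', 'age' => 25];

// Assumption for illustration: the input is known to arrive as GBK.
// Naming the source encoding is more reliable than 'auto' detection.
$sourceEncoding = 'GBK';

// Convert all string fields to UTF-8 encoding
$data = array_map(function ($item) use ($sourceEncoding) {
    return is_string($item) ? mb_convert_encoding($item, 'UTF-8', $sourceEncoding) : $item;
}, $data);

$serializedData = serialize($data);
echo $serializedData;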

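In the above code, array_map() visits each top-level element of the array and converts every string to UTF-8, so everything passed to serialize() shares one encoding. Two caveats apply. Encoding detection is a heuristic, so pass the actual source encoding as the third argument of mb_convert_encoding() whenever it is known. And array_map() does not descend into nested arrays; nested data needs a recursive approach such as array_walk_recursive().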
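For nested arrays, the same conversion can be applied recursively. The sketch below is one possible approach, not part of the original code: the helper name toUtf8Recursive() and the ISO-8859-1 source encoding are assumptions chosen for illustration, and mbstring must be loaded. Array keys are left untouched.

```php
<?php
// Hypothetical helper: converts every string value in a (possibly nested)
// array to UTF-8, assuming non-UTF-8 strings are in $sourceEncoding.
function toUtf8Recursive(array $data, string $sourceEncoding): array
{
    array_walk_recursive($data, function (&$item) use ($sourceEncoding) {
        // Skip non-strings and strings that are already valid UTF-8.
        if (is_string($item) && !mb_check_encoding($item, 'UTF-8')) {
            $item = mb_convert_encoding($item, 'UTF-8', $sourceEncoding);
        }
    });
    return $data;
}

$data = [
    'name' => "Jos\xE9",               // "José" as ISO-8859-1 bytes
    'tags' => ["caf\xE9", 'plain'],    // nested array, one Latin-1 string
    'age'  => 25,
];

$clean = toUtf8Recursive($data, 'ISO-8859-1');
echo serialize($clean), "\n";
```

The mb_check_encoding() guard avoids converting strings twice, but it is a heuristic: a byte sequence in another encoding can occasionally also be valid UTF-8. Where the source encoding of every field is known, converting unconditionally is the safer choice.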

3.2 Ensure that the environment supports UTF-8

If your application interacts with a database or another system, make sure the database and every transport channel handle UTF-8. For a database, this usually means setting both the stored character set and the connection character set to UTF-8, so that no implicit conversion alters the serialized bytes when data is written or read.

In MySQL, set the connection character set to utf8mb4, which covers the full UTF-8 range (MySQL's older utf8 character set stores at most three bytes per character):

// Set the database connection encoding to UTF-8
mysqli_set_charset($connection, 'utf8mb4');
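When you cannot control how a column or channel treats bytes, one common workaround is to make the payload pure ASCII before storing it. This is an optional sketch that goes beyond the steps described above; the variable names are illustrative.

```php
<?php
$data = ['name' => "Jos\u{00E9}", 'age' => 25];

// base64 output contains only ASCII characters, so a charset conversion
// between ASCII-compatible encodings cannot change it.
$payload = base64_encode(serialize($data));

// ... store or transmit $payload ...

// Strict mode makes base64_decode() return false on invalid input.
$decoded  = base64_decode($payload, true);
$restored = $decoded === false
    ? false
    : unserialize($decoded, ['allowed_classes' => false]);

var_dump($restored === $data);   // bool(true)
```

The trade-off is size (base64 adds roughly one third) and the loss of human-readable stored values.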

4. Handling encoding during deserialization

When deserializing with unserialize(), the serialized string must arrive byte for byte as it was written. If your application works in another character set, convert the individual strings after deserialization, not the serialized string itself.

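$unserializedData = unserialize($serializedData);

// If needed, convert the data back to a specific encoding.
// 'GBK' is only a placeholder: the target must be a concrete encoding name,
// because 'auto' is not accepted as a target encoding.
$targetEncoding = 'GBK';
$unserializedData = array_map(function ($item) use ($targetEncoding) {
    return is_string($item) ? mb_convert_encoding($item, $targetEncoding, 'UTF-8') : $item;
}, $unserializedData);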
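Because a re-encoded payload makes unserialize() return false, it helps to detect that case explicitly instead of passing false on to array_map(). The following is a sketch only; decodePayload() is a hypothetical helper name, and the exception type is an arbitrary choice.

```php
<?php
// Hypothetical helper: unserialize a payload or throw if it is unreadable.
function decodePayload(string $payload)
{
    // false is also the legitimate value of a serialized boolean ("b:0;"),
    // so handle that case before treating false as an error.
    if ($payload === 'b:0;') {
        return false;
    }
    $value = @unserialize($payload, ['allowed_classes' => false]);
    if ($value === false) {
        throw new UnexpectedValueException('Payload is corrupted or was re-encoded.');
    }
    return $value;
}

$good = serialize(['city' => "K\u{00F6}ln"]);
// Simulated damage: converting the payload changes the byte lengths.
$bad = mb_convert_encoding($good, 'ISO-8859-1', 'UTF-8');

var_dump(decodePayload($good));
try {
    decodePayload($bad);
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n";
}
```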

5. Security issues of serialization and deserialization

In addition to encoding issues, pay attention to security when using serialize() and unserialize(). The unserialize() function can be exploited for PHP object injection attacks, so never pass untrusted input to it without restricting which classes it is allowed to instantiate.

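Since PHP 7.0, unserialize() accepts an options array whose allowed_classes key limits the classes that can be created during deserialization:

$unserializedData = unserialize($serializedData, ['allowed_classes' => false]);

With allowed_classes set to false, no objects of any class are instantiated; serialized objects are restored as __PHP_Incomplete_Class instead. To permit specific classes, pass an array of class names.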
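The effect of allowed_classes can be checked directly. In this minimal sketch the Profile class is a made-up example, not something from the article.

```php
<?php
// Hypothetical class used only to demonstrate the option.
class Profile
{
    public $name = 'Zhang San';
}

$serialized = serialize(new Profile());

// No classes allowed: the object is not instantiated as Profile.
$blocked = unserialize($serialized, ['allowed_classes' => false]);
var_dump($blocked instanceof Profile);   // bool(false)
echo get_class($blocked), "\n";          // __PHP_Incomplete_Class

// Allow-list: only Profile may be instantiated.
$allowed = unserialize($serialized, ['allowed_classes' => [Profile::class]]);
var_dump($allowed instanceof Profile);   // bool(true)
```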

6. Summary

PHP's serialize() and unserialize() functions are powerful tools for storing and transmitting data, but their encoding behavior needs to be taken seriously. Keeping the data encoding consistent, especially when UTF-8 and other character sets are involved, is the most effective way to avoid garbled text and deserialization errors.

The key to dealing with character set problems is:

  1. Ensure that all data uses a single character encoding (such as UTF-8).

  2. Configure the database and transport channels to use UTF-8 (utf8mb4 in MySQL).

  3. Maintain consistent encoding processing during serialization and deserialization.

With these steps, you can use serialize() and unserialize() more reliably across platforms and environments.