xml_parse_into_struct() belongs to PHP's XML Parser extension, which exposes an Expat-style API (many PHP builds implement it on top of libxml2 through a compatibility layer). Its prototype is as follows:
int xml_parse_into_struct ( resource $parser , string $data , array &$values [, array &$index ] )
$parser : A parser created by xml_parser_create() (a resource before PHP 8.0, an XMLParser object from PHP 8.0 onward).
$data : XML string to be parsed.
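$values : Receives the parsed elements in document order. Each entry is an array with keys such as tag, type and level, plus attributes and value where present.
$index : Optional. Receives a map from each tag name to the positions of its entries in $values.
The function returns 1 on success and 0 on failure. These are integers rather than true and false, so test the result with a loose check such as if (!...) instead of ===.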
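To make the shape of $values and $index concrete, here is a minimal sketch. The sample document and its element names are made up for illustration, and the script relies on the parser's default case folding, which upper-cases tag and attribute names.

```php
<?php
// Illustrative input; the element names are made up for this example.
$data = '<root><item id="1">Test</item></root>';

$parser = xml_parser_create();
$ok = xml_parse_into_struct($parser, $data, $values, $index);
unset($parser); // Drop the reference so the parser can be released.

if ($ok) {
    // Tag names are upper-cased because case folding is enabled by default.
    foreach ($values as $i => $entry) {
        printf("%d: %s (%s, level %d)\n", $i, $entry['tag'], $entry['type'], $entry['level']);
    }
    // $index maps each tag name to the positions of its entries in $values.
    print_r($index['ITEM']);
}
```

The ROOT element appears twice in $values (once as "open", once as "close"), while ITEM appears once as "complete" because it has no child elements.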
XML is a markup language with strict well-formedness rules. Common errors include:
Unclosed tags
Unquoted attribute values
Illegal characters (such as control characters)
Incorrectly nested tags
Sample code:
$data = '<root><item>Test</root>'; // Missing </item> closing tag
$parser = xml_parser_create();
if (!xml_parse_into_struct($parser, $data, $values)) {
    echo "XML Error: " . xml_error_string(xml_get_error_code($parser));
}
xml_parser_free($parser);
Output (the exact wording and capitalization depend on the underlying parser library):
XML Error: mismatched tag
If the encoding named in the XML declaration (such as <?xml version="1.0" encoding="UTF-8"?> ) does not match the actual encoding of the content, parsing fails.
For example, a file that is declared as UTF-8 but is actually GBK-encoded triggers an invalid-character error.
Solution:
Make sure the file encoding and XML declaration are consistent.
Use mb_convert_encoding() to convert to UTF-8.
$data = mb_convert_encoding($data, 'UTF-8', 'GBK');
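Converting unconditionally would corrupt input that is already UTF-8, so it can help to check first. The sketch below assumes the mbstring extension is available; the helper name ensure_utf8() and the GBK fallback are illustrative assumptions, not part of any library.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not valid UTF-8.
// $assumedSource is a guess about the real encoding and must be chosen per data source.
function ensure_utf8(string $data, string $assumedSource = 'GBK'): string
{
    if (mb_check_encoding($data, 'UTF-8')) {
        return $data; // Already valid UTF-8; leave it untouched.
    }
    return mb_convert_encoding($data, 'UTF-8', $assumedSource);
}

// "\xD6\xD0" is the GBK byte sequence for the character U+4E2D.
$utf8 = ensure_utf8("<note>\xD6\xD0</note>");
echo bin2hex($utf8), "\n";
```

Note that a passing mb_check_encoding() only shows the bytes form valid UTF-8; it cannot prove which encoding the sender intended.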
< , > , & , " and ' are special characters in XML and need to be escaped when they appear in text or attribute values:
< → &lt;
> → &gt;
& → &amp;
" → &quot;
' → &apos;
For example:
$data = '<note>Tom & Jerry</note>'; // Wrong: the & is not escaped
Should be changed to:
$data = '<note>Tom &amp; Jerry</note>';
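When the text comes from a variable rather than a literal, escaping by hand is error-prone. One possible approach, sketched here with an illustrative value, is to let htmlspecialchars() do it with the ENT_XML1 flag:

```php
<?php
// Illustrative value that would break the document if inserted as-is.
$text = 'Tom & Jerry';

// ENT_XML1 selects XML escaping rules; ENT_QUOTES also escapes both quote characters.
$escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
$data = '<note>' . $escaped . '</note>';

echo $data, "\n";
```

This covers text and attribute values only; it does not validate tag names or the overall structure.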
Tag names cannot start with a digit or with most punctuation characters, and they cannot contain spaces. Attribute values must always be quoted. For example:
<123tag>value</123tag> <!-- illegal: the name starts with a digit -->
<tag name=a b>value</tag> <!-- illegal: the attribute value is not quoted -->
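If tag names are generated from data (for example from array keys), a quick check before building the XML can catch these cases early. The helper below is a deliberately simplified, ASCII-only sketch; the name is_simple_xml_name() is hypothetical, and the real XML naming rules also permit colons and many non-ASCII characters.

```php
<?php
// Hypothetical helper: a simplified, ASCII-only check for tag names.
// It is stricter than the XML specification, which allows more characters.
function is_simple_xml_name(string $name): bool
{
    // First character: letter or underscore; then letters, digits, '_', '.' or '-'.
    return preg_match('/\A[A-Za-z_][A-Za-z0-9_.-]*\z/', $name) === 1;
}

var_dump(is_simple_xml_name('tag'));    // bool(true)
var_dump(is_simple_xml_name('123tag')); // bool(false)
var_dump(is_simple_xml_name('my tag')); // bool(false)
```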
xml_get_error_code() and xml_error_string() help you locate the problem quickly, and xml_get_current_line_number() reports the line at which parsing stopped.
if (!xml_parse_into_struct($parser, $data, $values)) {
    echo "Error: " . xml_error_string(xml_get_error_code($parser)) .
        " at line " . xml_get_current_line_number($parser);
}
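For long single-line payloads the line number alone is rarely enough. The following sketch wraps the same calls in a helper and adds the column and byte offset; the function name parse_with_diagnostics() and the shape of its return value are assumptions made for this example.

```php
<?php
// Hypothetical helper: parse $data and describe where parsing stopped on failure.
function parse_with_diagnostics(string $data): array
{
    $parser = xml_parser_create();
    $values = [];
    $ok = xml_parse_into_struct($parser, $data, $values);

    $result = ['ok' => (bool) $ok, 'values' => $values, 'error' => null];
    if (!$ok) {
        // Position functions must be called before the parser goes out of scope.
        $result['error'] = sprintf(
            '%s at line %d, column %d (byte %d)',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser),
            xml_get_current_column_number($parser),
            xml_get_current_byte_index($parser)
        );
    }
    return $result;
}

$result = parse_with_diagnostics('<root><item>Test</root>');
if (!$result['ok']) {
    echo $result['error'], "\n";
}
```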
While troubleshooting, you can paste the XML into an online validator such as https://gitbox.net/tools/xml-validator/ to spot syntax errors quickly.
If the XML comes from a remote interface or an external file, it is recommended to log the raw data before parsing so that the exact input can be inspected later:
file_put_contents('/tmp/raw_xml.log', $data);
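A plain file_put_contents() call overwrites the log on every request, so only the most recent payload survives. As a sketch, an appending variant might look like this; the helper name log_raw_xml() and the log format are illustrative choices.

```php
<?php
// Hypothetical helper: append each raw payload to a log file instead of overwriting it.
function log_raw_xml(string $data, string $logFile): bool
{
    $entry = sprintf("[%s] %d bytes\n%s\n", date('c'), strlen($data), $data);
    // FILE_APPEND keeps earlier payloads; LOCK_EX avoids interleaved writes.
    return file_put_contents($logFile, $entry, FILE_APPEND | LOCK_EX) !== false;
}

$logFile = sys_get_temp_dir() . '/raw_xml.log';
log_raw_xml('<root><item>Test</root>', $logFile);
```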
It is also worth passing the data through a cleaning function such as the following before parsing:
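function clean_xml($data) {
    // Remove a UTF-8 BOM, then surrounding whitespace
    $data = preg_replace('/^\xEF\xBB\xBF/', '', $data);
    $data = trim($data);
    // Delete characters that XML 1.0 does not allow. The /u flag makes the
    // pattern match whole UTF-8 characters rather than single bytes, so
    // multi-byte characters are not damaged.
    $cleaned = preg_replace('/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u', '', $data);
    // preg_replace() returns null when $data is not valid UTF-8; keep the input in that case
    return $cleaned ?? $data;
}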
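libxml_use_internal_errors() does not change how xml_parse_into_struct() reports errors, but if the same code also uses libxml-based extensions such as SimpleXML or DOM, enabling it before reading XML suppresses their warnings and lets you collect the errors with libxml_get_errors(), which helps overall debugging: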
libxml_use_internal_errors(true);
For well-formed XML with a clear structure, SimpleXML is often the more convenient choice:
$xml = simplexml_load_string($data);
It provides a friendlier object interface. Note that it still requires well-formed input: simplexml_load_string() returns false when the XML cannot be parsed.
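As a sketch of how SimpleXML and libxml's internal error handling can be combined, the helper below loads a string and returns either the document or the collected error messages. The name load_simplexml() and the return format are assumptions made for this example.

```php
<?php
// Hypothetical helper: load XML with SimpleXML and collect libxml errors
// instead of letting them surface as PHP warnings.
function load_simplexml(string $data): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($data);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // Restore the caller's setting.

    return ['xml' => $xml === false ? null : $xml, 'errors' => $errors];
}

$result = load_simplexml('<root><item>Test</item></root>');
if ($result['xml'] !== null) {
    echo (string) $result['xml']->item, "\n"; // Test
}
```

Restoring the previous setting keeps the helper from changing error behavior elsewhere in the application.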