Practical Guide to Parsing and Generating HTML DOM with PHP

gitbox 2025-08-08

Parsing HTML DOM in PHP

Parsing the HTML DOM is a common task in PHP development. PHP's built-in DOMDocument class makes it easy to load and manipulate HTML content. The example below demonstrates how to use DOMDocument to parse an HTML document and access its elements:

php
<?php
$html = '<h1>This is a sample title</h1><p>This is a sample paragraph.</p>';
// Create a DOMDocument instance
$dom = new DOMDocument();
// Load the HTML content
$dom->loadHTML($html);
// Get the first h1 element (item() returns null when nothing matches)
$h1 = $dom->getElementsByTagName('h1')->item(0);
$title = $h1 ? $h1->textContent : '';
// Get the first p element
$paragraph = $dom->getElementsByTagName('p')->item(0);
$content = $paragraph ? $paragraph->textContent : '';
// Output the results, escaping the extracted text before echoing it into HTML
echo '<h3>Title:</h3>';
echo htmlspecialchars($title);
echo '<h3>Paragraph content:</h3>';
echo htmlspecialchars($content);
?>

This example creates a DOMDocument object and loads the HTML string with loadHTML. The getElementsByTagName method returns a list of matching elements, item(0) selects the first match (or null if there is none), and textContent returns that element's text.
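
Real-world markup is rarely as tidy as the sample above, and loadHTML reports parser complaints as PHP warnings. The following is a minimal sketch of one way to handle that, assuming you want the text of every p element rather than only the first. The function name extractParagraphs is hypothetical and not part of DOMDocument; libxml_use_internal_errors and libxml_clear_errors are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of DOMDocument): returns the trimmed
// text of every <p> element found in the given HTML string.
function extractParagraphs(string $html): array
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    // DOMNodeList is traversable, so foreach visits every match in document order.
    foreach ($dom->getElementsByTagName('p') as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    return $paragraphs;
}

$paragraphs = extractParagraphs(
    '<section><p>First</p><p>Second <b>bold</b></p></section>'
);
print_r($paragraphs);
```

Because textContent concatenates the text of all descendants, the second entry includes the text inside the nested b element.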

Generating HTML DOM with PHP

Besides parsing, PHP can dynamically create and build HTML structures using DOMDocument. The following example shows how to generate a complete HTML document:

php
<?php
// Create a new DOMDocument object
$dom = new DOMDocument();
// Create the html root element
$html = $dom->createElement('html');
// Create the body element
$body = $dom->createElement('body');
// Create h1 element and set text
$h1 = $dom->createElement('h1', 'This is a sample title');
$body->appendChild($h1);
// Create p element and set text
$p = $dom->createElement('p');
$p->textContent = 'This is a sample paragraph.';
$body->appendChild($p);
// Append body to html root
$html->appendChild($body);
// Append html root to DOMDocument
$dom->appendChild($html);
// Output the generated HTML code
echo $dom->saveHTML();
?>

This example sequentially creates the HTML root, body, h1, and p nodes. After setting text content, it uses appendChild to establish parent-child relationships and finally calls saveHTML to generate the HTML string.
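
The same building blocks extend to attributes and to text that contains special characters. The sketch below is an illustration under assumptions, not the only approach: the class name "note" and the sample text are made up for the example. It relies on createTextNode, which treats its argument as literal text, whereas the PHP manual notes that the value passed as the second argument of createElement is not escaped.

```php
<?php
$dom = new DOMDocument();

// Create a paragraph and give it an attribute.
$p = $dom->createElement('p');
$p->setAttribute('class', 'note'); // "note" is an illustrative class name

// createTextNode stores the string as literal text, so characters such as
// & and < are escaped when the document is serialized.
$p->appendChild($dom->createTextNode('Tom & Jerry <live>'));

$dom->appendChild($p);

// Passing a node to saveHTML serializes only that node and its children.
$markup = $dom->saveHTML($p);
echo $markup;
```

In the serialized markup the text appears as Tom &amp;amp; Jerry &amp;lt;live&amp;gt;, so a browser displays the original characters instead of interpreting them as markup.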

Summary and Important Tips

With DOMDocument, PHP developers can parse and generate HTML DOM structures, which is useful in many web content processing scenarios. When using it, pay attention to character encoding and whitespace handling. Setting the encoding property, or properties such as preserveWhiteSpace and formatOutput, may be necessary to ensure the generated HTML meets expectations.
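
To illustrate the encoding point, here is a hedged sketch of a commonly used workaround for parsing UTF-8 fragments: declaring the charset in a meta element ahead of the content, so the parser does not have to guess how to decode the bytes. The helper name loadUtf8Fragment is hypothetical, and the sketch covers encoding only, not whitespace handling.

```php
<?php
// Hypothetical helper: parses a UTF-8 HTML fragment by declaring the
// charset up front, so the parser decodes non-ASCII characters correctly.
function loadUtf8Fragment(string $fragment): DOMDocument
{
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // The meta declaration tells the parser which encoding the bytes use.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . $fragment
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

$dom = loadUtf8Fragment('<p>Café à la carte</p>');
// DOM methods and properties return strings encoded as UTF-8.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

The added meta element becomes part of the parsed document, so remove it or serialize only the nodes you need if the result is written back out.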

Mastering these techniques makes PHP-based HTML content handling more efficient and more reliable.