
How to Use the preg_match Function to Extract Specific Tag Content from HTML Source Code? Step-by-Step Guide

gitbox 2025-08-21
<span><span><span class="hljs-meta">&lt;?php</span></span><span>
</span><span><span class="hljs-comment">// This article explains how to use the preg_match function in PHP to extract specific tag content from HTML source code.</span></span><span>
</span><span><span class="hljs-comment">// preg_match is a powerful regular expression matching tool, suitable for simple pattern matching.</span></span><span>
</span><span><span class="hljs-comment">// However, for complex HTML structures, it is recommended to use more stable methods such as DOMDocument.</span></span><span>
</span><span><span class="hljs-comment">// For learning purposes, this article will demonstrate the basic usage of preg_match for extracting tags.</span></span><span>
</span><span><span class="hljs-meta">?&gt;</span></span><span>
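In web development, we often need to extract specific tag content from HTML source code, such as page titles, image URLs, or paragraph text. While a DOM parser is generally the recommended way to parse HTML, in some simple cases the preg_match function is a quick and convenient option.

1. Basic Syntax of preg_match

The basic call is preg_match($pattern, $subject, $matches), where:
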
  • $pattern: Regular expression

  • $subject: The string to search (HTML source code)

  • $matches: Filled in by reference with the results; $matches[0] holds the full match and $matches[1] the text captured by the first group

preg_match returns 1 if the pattern matches, 0 if it does not, and false if an error occurs (for example, an invalid pattern).
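To make the three return values concrete, here is a minimal sketch. The describeMatch() helper and the sample strings are hypothetical, written only for this illustration. Comparing with === matters here, because 0 (no match) and false (error) are both falsy and a plain if would treat them the same.

```php
<?php
// Sketch: telling apart the three possible return values of preg_match().
// describeMatch() is a hypothetical helper written for this illustration.

function describeMatch(string $pattern, string $subject): string
{
    // @ suppresses the warning PHP emits for a malformed pattern;
    // the false return value is checked explicitly below instead.
    $result = @preg_match($pattern, $subject, $matches);

    if ($result === false) {
        return 'error'; // the regex engine failed, e.g. the pattern did not compile
    }
    if ($result === 0) {
        return 'no match';
    }
    // $result === 1: $matches[0] is the whole match, $matches[1] the first capture group.
    return 'matched: ' . $matches[1];
}

echo describeMatch('/<title>(.*?)<\/title>/i', '<title>Home</title>'), "\n";    // matched: Home
echo describeMatch('/<title>(.*?)<\/title>/i', '<h1>No title here</h1>'), "\n"; // no match
echo describeMatch('/<title>(.*?<\/title>/i', '<title>Home</title>'), "\n";     // error (unclosed group)
```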

2. Example: Extracting the Content of the <title> Tag from HTML

Here is a simple example that shows how to extract the content inside the <title> tag:

```php
$html = '<html><head><title>This is the webpage title</title></head><body>Content</body></html>';
$pattern = '/<title>(.*?)<\/title>/i';

// Only read $matches[1] if preg_match actually found a match.
if (preg_match($pattern, $html, $matches) === 1) {
    echo $matches[1]; // Output: This is the webpage title
}
```

Explanation of the Regular Expression:

  • <title> and <\/title>: Match the opening and closing tags exactly (the slash is escaped because / is also the pattern delimiter)

  • (.*?): Non-greedy match for the content in between

  • /i: Case-insensitive
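The short sketch below, using made-up sample strings, contrasts a greedy (.*) with the non-greedy (.*?) described above. It also shows one related pitfall: by default . does not match a newline, so content that spans several lines is only captured when the s modifier is added.

```php
<?php
// Sketch: why (.*?) is used instead of (.*), and what the s modifier changes.
// The sample HTML is made up for illustration.

$html = '<b>first</b> and <b>second</b>';

// Greedy: .* runs on to the LAST </b>, swallowing everything in between.
preg_match('/<b>(.*)<\/b>/i', $html, $greedy);
echo $greedy[1], "\n"; // first</b> and <b>second

// Non-greedy: .*? stops at the FIRST </b> it reaches.
preg_match('/<b>(.*?)<\/b>/i', $html, $lazy);
echo $lazy[1], "\n"; // first

// By default . does not match a newline, so a title split across lines is missed...
$multiline = "<title>Line one\nLine two</title>";
var_dump(preg_match('/<title>(.*?)<\/title>/i', $multiline)); // int(0)

// ...unless the s (dotall) modifier is added.
preg_match('/<title>(.*?)<\/title>/is', $multiline, $m);
echo $m[1], "\n"; // prints both lines of the title
```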

3. What if You Need to Extract Multiple Tags?

preg_match only returns the first match. To extract every occurrence of the same tag, such as multiple <p> paragraphs, use the preg_match_all function instead.

Example:

```php
$html = '<p>First paragraph</p><p>Second paragraph</p>';
$pattern = '/<p>(.*?)<\/p>/i';
preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // Output: Array ( [0] => First paragraph [1] => Second paragraph )
```
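preg_match_all also returns the number of full matches it found, and an optional flag controls how $matches is laid out. The minimal sketch below reuses the paragraph example to show the default PREG_PATTERN_ORDER layout next to PREG_SET_ORDER.

```php
<?php
// Sketch: using the return value of preg_match_all() and the PREG_SET_ORDER flag.
// The sample HTML is illustrative.

$html = '<p>First paragraph</p><p>Second paragraph</p>';
$pattern = '/<p>(.*?)<\/p>/i';

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all($pattern, $html, $matches);
echo $count, "\n"; // 2

// Default PREG_PATTERN_ORDER: $matches[0] = full matches, $matches[1] = group 1 captures.
echo $matches[0][1], "\n"; // <p>Second paragraph</p>

// PREG_SET_ORDER groups results per match instead: $sets[i][0] full match, $sets[i][1] group 1.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
foreach ($sets as $i => $set) {
    echo $i, ': ', $set[1], "\n";
}
// 0: First paragraph
// 1: Second paragraph
```

PREG_SET_ORDER tends to be more convenient when a pattern has several capture groups that belong together, since each match arrives as one self-contained array.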

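4. Important Notes

  1. HTML Nesting Issues: Regular expressions cannot reliably handle nested tags. For example, extracting the content inside nested tags such as <div><div>Content</div></div> may fail, because a non-greedy pattern stops at the first closing </div> it finds.

  2. Security: When handling user-submitted HTML, always sanitize it properly to prevent XSS attacks. A regex match is not a sanitizer, so escape extracted text (for example with htmlspecialchars()) before writing it back into a page.

  3. Performance: Regular expressions are less efficient for large-scale HTML parsing, and complex patterns applied to large inputs can exceed PCRE's backtracking limit, in which case preg_match returns false. For complex structures, it is better to use DOMDocument.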
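To illustrate notes 1 and 3, the following sketch runs a non-greedy regex and DOMDocument against the same nested markup. The markup and its id="outer" attribute are made up for the example, DOMXPath is used to locate the element, and the sketch assumes PHP's dom extension is available.

```php
<?php
// Sketch: a non-greedy regex vs. DOMDocument on nested tags.
// Assumes the dom extension is available (it is enabled in most PHP builds).
// The sample markup is illustrative.

$html = '<div id="outer"><div>Inner</div> tail</div>';

// The non-greedy regex stops at the FIRST </div>, cutting the outer element short.
preg_match('/<div id="outer">(.*?)<\/div>/i', $html, $m);
echo $m[1], "\n"; // <div>Inner

// DOMDocument parses the markup into a tree, so nesting is resolved correctly.
libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$outer = $xpath->query('//div[@id="outer"]')->item(0);
echo $outer->textContent, "\n"; // Inner tail
```

The same approach covers the earlier examples as well; for instance, $doc->getElementsByTagName('title')->item(0)->textContent reads the page title once a full page has been loaded.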

5. Conclusion

Using preg_match to extract HTML tag content is effective for simple HTML structures. With fixed, well-formatted content, it completes the task quickly and efficiently. For complex or nested HTML, however, a dedicated parser such as DOMDocument should be considered. Mastering preg_match not only sharpens your regular expression skills but also lets you handle text data with ease in specific scenarios.
