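In PHP, preg_replace() is the Swiss Army knife of text processing. Built on PCRE (Perl Compatible Regular Expressions), it handles simple find-and-replace as well as structured rewriting, data cleaning, batch renaming, and similar tasks. This article starts from scratch and walks through the function signature, regex syntax, common scenarios, and pitfalls, so you can pick up the core techniques of regex replacement quickly.
Function signature: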
```php
preg_replace(
    string|array $pattern,
    string|array $replacement,
    string|array $subject,
    int $limit = -1,
    int &$count = null
): string|array|null
```
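$pattern: the regular expression (may be an array of several patterns)
$replacement: the replacement text (may be an array whose entries pair up with the patterns by position)
$subject: the string to process (or an array of strings)
$limit: the maximum number of replacements per pattern in each subject string (the default -1 means no limit)
$count: output parameter that receives the number of replacements actually made
Minimal example: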
```php
<?php
$text = "Color or Colour? I like the color blue.";
$result = preg_replace('/colou?r/i', 'color', $text, -1, $count);
echo $result; // color or color? I like the color blue.
echo PHP_EOL . "Replaced: $count"; // Replaced: 3
```
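/colou?r/i: the ? makes the preceding u optional, and the i modifier ignores case. Because of i, the capitalized Color also matches and is rewritten to lowercase color.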
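The $limit and $count parameters, and the array form of $subject, are easiest to see side by side. The following is a minimal sketch with made-up sample strings, not part of the original example:

```php
<?php
// $limit caps the replacements per subject string; $count reports how many were made.
$text = 'one two three four';
$firstOnly = preg_replace('/\w+/', 'X', $text, 1, $count);
echo $firstOnly, PHP_EOL; // X two three four
echo $count, PHP_EOL;     // 1

// An array subject returns an array. The limit applies to each element separately,
// while the count is the total across all elements.
$lines = ['a-b-c', 'd-e-f'];
$result = preg_replace('/-/', '+', $lines, 1, $total);
echo implode(' | ', $result), PHP_EOL; // a+b-c | d+e-f
echo $total, PHP_EOL;                  // 2
```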
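Common delimiters include / # ~ %, as well as bracket pairs such as { } and ( ). Picking a delimiter that does not occur in the pattern is the least trouble: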
```php
preg_replace('#https?://[^\s]+#', '[link]', $text);
```
When a pattern contains many / characters, switching to # avoids a lot of escaping.
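Delimiters also matter when part of the pattern comes from a variable. A small sketch, assuming a literal search string held in a hypothetical $needle: preg_quote() escapes regex metacharacters, and its optional second argument additionally escapes the chosen delimiter.

```php
<?php
// $needle is illustrative input that contains both "/" and regex metacharacters.
$needle = 'a/b (v1.0)';

// With "#" as the delimiter, "/" needs no escaping; preg_quote() takes care of ( ) and .
$pattern = '#' . preg_quote($needle, '#') . '#';

$out = preg_replace($pattern, '[x]', 'path a/b (v1.0) ok');
echo $out, PHP_EOL; // path [x] ok
```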
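i: case-insensitive matching
m: multiline mode (^ and $ match at the start and end of every line)
s: single-line (dotall) mode (. also matches newlines)
u: treat pattern and subject as UTF-8 (strongly recommended for Chinese text or emoji)
x: extended mode, ignores whitespace and # comments inside the pattern (better readability)
U: inverts greediness (quantifiers become lazy by default, and a trailing ? makes them greedy again)
Example (multiline + single-line):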
```php
$log = "ID:42\nPayload:\n{\n \"a\":1\n}\nEnd";
echo preg_replace('/^Payload:(.*)End$/ims', '[DATA HIDDEN]', $log);
// ID:42
// [DATA HIDDEN]
```
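To see what m and s each contribute on their own, the sketch below runs the same tiny subject with and without each modifier. The sample string is invented for illustration; json_encode() is used only to make the newline visible in the output.

```php
<?php
$text = "a1\nb2";

// Without m, ^ only matches at the very start of the subject.
$noM = preg_replace('/^\w/', '#', $text);
// With m, ^ also matches right after each newline.
$withM = preg_replace('/^\w/m', '#', $text);

// Without s, the dot refuses to cross the newline, so nothing is replaced.
$noS = preg_replace('/1.b/', '-', $text);
// With s, the dot matches the newline as well.
$withS = preg_replace('/1.b/s', '-', $text);

echo json_encode($noM), PHP_EOL;   // "#1\nb2"
echo json_encode($withM), PHP_EOL; // "#1\n#2"
echo json_encode($noS), PHP_EOL;   // "a1\nb2"
echo json_encode($withS), PHP_EOL; // "a-2"
```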
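Character classes: [abc], [^abc], \d digit, \w word character, \s whitespace
Anchors: ^ start of the subject (or of each line with m), $ end of the subject (or of each line with m), \b word boundary
Quantifiers: * (0 or more), + (1 or more), ? (0 or 1), {m,n} (between m and n times)
Greedy vs. lazy: + is greedy, +? is lazy (matches as little as possible)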
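The greedy versus lazy distinction is easiest to see on a subject with two candidate end points. A minimal sketch with an invented HTML snippet; the third call also shows the U modifier flipping the default:

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy: .+ runs to the last </b>, so both tags are swallowed by one match.
$greedy = preg_replace('#<b>.+</b>#', '[B]', $html);
// Lazy: .+? stops at the first </b>, giving two separate matches.
$lazy = preg_replace('#<b>.+?</b>#', '[B]', $html);
// With U, the plain .+ behaves lazily.
$ungreedy = preg_replace('#<b>.+</b>#U', '[B]', $html);

echo $greedy, PHP_EOL;   // [B]
echo $lazy, PHP_EOL;     // [B] and [B]
echo $ungreedy, PHP_EOL; // [B] and [B]
```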
Example (masking an email address):
```php
$email = 'alice@example.com'; // sample address
echo preg_replace('/(?<=.).+?(?=@)/', '***', $email);
// a***@example.com
```
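Lookbehind (?<=...) and lookahead (?=...) pin down exactly what gets replaced without consuming the surrounding characters.
Capturing groups: (...) saves the matched text as $1, $2, ... for use in the replacement string
Non-capturing groups: (?:...) groups without saving, which is slightly cheaper and keeps the group numbers stable
Example (name formatting: 张三-李四 → 张三 & 李四):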
```php
$name = '张三-李四';
echo preg_replace('/^(\S+)\s*-\s*(\S+)$/u', '$1 & $2', $name);
// 张三 & 李四
```
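Two details of group references are worth a quick sketch: a reference directly followed by a digit needs braces, and non-capturing groups do not take up a number. The sample strings below are invented for illustration.

```php
<?php
// Goal: append a literal 0 to the captured digit.
// '$10' is read as group 10, which does not exist here, so it expands to nothing.
$ambiguous = preg_replace('/(\d)/', '$10', 'a1');
// Braces end the group number explicitly.
$braced = preg_replace('/(\d)/', '${1}0', 'a1');

// (?:...) groups without capturing, so the surname is still group 1.
$surname = preg_replace('/(?:Mr|Ms)\. (\w+)/', '$1', 'Mr. Smith');

echo $ambiguous, PHP_EOL; // a
echo $braced, PHP_EOL;    // a10
echo $surname, PHP_EOL;   // Smith
```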
Example (URL normalization: HTTP://EXAMPLE.COM/Path → lowercase host), written the legacy way:
```php
// Legacy code only: the /e modifier no longer works on PHP 7 and later
$url = 'HTTP://EXAMPLE.COM/Path';
echo preg_replace('/^(https?):\/\/([^\/]+)/ie', "'$1://'.strtolower('$2')", $url);
```
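Warning: legacy code may still use the /e modifier. It was deprecated in PHP 5.5 and removed in PHP 7.0, so do not use it; use a callback instead, as described next.
When the replacement value has to be computed (case conversion, dynamic numbering, conditional logic), a callback is the safer choice: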
```php
$input = "HTTP://EXAMPLE.COM/Path and http://MiXeD.com/Another";
$result = preg_replace_callback(
    '#\bhttps?://([^/\s]+)#i',
    function ($m) {
        // $m[0] is the whole match, $m[1] is the host
        $scheme = stripos($m[0], 'https://') === 0 ? 'https://' : 'http://';
        return $scheme . strtolower($m[1]);
    },
    $input
);
echo $result;
// http://example.com/Path and http://mixed.com/Another
```
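Dynamic numbering, one of the cases mentioned above, can be sketched with a closure that carries a counter by reference. The TODO marker and the sample sentence are made up for illustration.

```php
<?php
$n = 0; // running counter shared with the callback

$out = preg_replace_callback(
    '/\bTODO\b/',
    function (array $m) use (&$n): string {
        $n++;
        // Append the sequence number to the matched marker.
        return "{$m[0]}#{$n}";
    },
    'TODO write docs, TODO add tests'
);

echo $out, PHP_EOL; // TODO#1 write docs, TODO#2 add tests
```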
There is also a sibling function for applying several rules in one call: preg_replace_callback_array() (PHP 7.0+) takes a map from patterns to their callbacks:
```php
$text = "Price: 19.99 USD, Date: 2025-08-25";
$result = preg_replace_callback_array([
    '/\b(\d+(?:\.\d{2})?)\s*USD\b/' => fn($m) => '$' . $m[1],
    '/\b(\d{4})-(\d{2})-(\d{2})\b/' => fn($m) => "{$m[2]}/{$m[3]}/{$m[1]}",
], $text);
echo $result; // Price: $19.99, Date: 08/25/2025
```
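Both $pattern and $replacement accept arrays, and the patterns are applied in order, each to the result of the previous one. If $replacement is a single string, it is used for every pattern; if it is an array with fewer entries than $pattern, the leftover patterns are replaced with an empty string. A one-to-one example: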
```php
$input = "foo 123 bar 456 baz";
$patterns = ['/\bfoo\b/', '/\d+/', '/\bbaz\b/'];
$replacements = ['FOO', '[NUM]', 'BAZ'];
echo preg_replace($patterns, $replacements, $input);
// FOO [NUM] bar [NUM] BAZ
```
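The less obvious array cases can be checked with a few one-liners. This is a minimal sketch with toy inputs, covering a single string replacement, a replacement array that is too short, and the effect of patterns running in sequence:

```php
<?php
// One string replacement is reused for every pattern.
$same = preg_replace(['/a/', '/b/'], 'x', 'aabb');

// A shorter replacement array: patterns without a partner are replaced with ''.
$short = preg_replace(['/a/', '/b/'], ['1'], 'ab');

// Patterns run in order on the running result, so earlier output can be rewritten later:
// 'a' becomes 'b', and that 'b' then becomes 'c'.
$chain = preg_replace(['/a/', '/b/'], ['b', 'c'], 'a');

echo $same, PHP_EOL;  // xxxx
echo $short, PHP_EOL; // 1
echo $chain, PHP_EOL; // c
```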
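Add the u modifier by default so that multibyte characters are not split apart.
Can \b find word boundaries in Chinese text? Not reliably. Without u, \b only knows ASCII word characters; with u, Chinese characters count as word characters, so there is no boundary between adjacent ones. For Chinese text, use explicit character classes or lookarounds instead.
Example (inserting a space between Chinese characters and digits):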
```php
$str = "版本2已发布在2025年8月25日";
$str = preg_replace('/(?<=[\x{4e00}-\x{9fa5}])(?=\d)/u', ' ', $str);
$str = preg_replace('/(?<=\d)(?=[\x{4e00}-\x{9fa5}])/u', ' ', $str);
echo $str; // 版本 2 已发布在 2025 年 8 月 25 日
```
\x{4e00}-\x{9fa5} covers the commonly used CJK ideographs; the \x{...} syntax for code points above 0xFF requires the u modifier.
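What the u modifier changes can be shown by counting matches of a single dot. The sketch below assumes the source file and its strings are UTF-8 encoded, and it uses the Unicode script property \p{Han} as one possible alternative to the explicit range (a swapped-in variant, not the pattern used above).

```php
<?php
// Assumes this file is saved as UTF-8.
$char = '中';

// Without u the subject is treated as bytes: one Chinese character yields three matches.
preg_replace('/./', '*', $char, -1, $bytes);
// With u it is matched as a single character.
preg_replace('/./u', '*', $char, -1, $chars);
echo "$bytes vs $chars", PHP_EOL; // 3 vs 1

// \p{Han} matches characters of the Han script; both directions are handled in one pattern.
$spaced = preg_replace('/(?<=\p{Han})(?=\d)|(?<=\d)(?=\p{Han})/u', ' ', '版本2已发布');
echo $spaced, PHP_EOL; // 版本 2 已发布
```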
Goal: strip all tags and keep only the plain text.
```php
$html = "<p>Hello <strong>world</strong> &copy; 2025</p>";
$plain = preg_replace('/<[^>]+>/', '', $html);
echo $plain; // Hello world &copy; 2025
// preg_replace() does not decode entities; do that as a separate step
echo html_entity_decode($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8'); // Hello world © 2025
```
This is fine for simple cleanup; for complex HTML structures, a DOM parser is the robust choice.
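For comparison, a DOM-based version of the same task might look like the sketch below. It assumes the DOM extension (ext-dom) is enabled and uses the same sample markup; it is one possible approach, not the only one.

```php
<?php
// Assumes ext-dom is available, which is the case in most PHP builds.
$html = '<p>Hello <strong>world</strong> &copy; 2025</p>';

$doc = new DOMDocument();
// Silence parser warnings for imperfect markup; the fragment gets wrapped in <html><body>.
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

// textContent concatenates all text nodes and has entities already decoded.
$body = $doc->getElementsByTagName('body')->item(0);
$plain = $body !== null ? trim($body->textContent) : '';

echo $plain, PHP_EOL; // Hello world © 2025
```

Input that contains non-ASCII characters usually needs an explicit encoding declaration before loadHTML(), which this sketch leaves out.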
```php
$phone = "13812345678";
echo preg_replace('/(\d{3})\d{4}(\d{4})/', '$1****$2', $phone);
// 138****5678
```
```php
$template = "Hi {name}, your order {id} is {status}.";
$data = ['name' => 'Alice', 'id' => 42, 'status' => 'shipped'];
echo preg_replace_callback('/\{(\w+)\}/', function ($m) use ($data) {
    // Unknown placeholders are left untouched
    return (string) ($data[$m[1]] ?? $m[0]);
}, $template);
// Hi Alice, your order 42 is shipped.
```
```php
$md = '![alt text](/img/logo.png "Title")';
$img = preg_replace(
    '/!\[([^\]]*)\]\((\S+)(?:\s+"([^"]*)")?\)/',
    '<img src="$2" alt="$1" title="$3">',
    $md
);
echo $img;
// <img src="/img/logo.png" alt="alt text" title="Title">
```
```php
$text = "Hello,world! PHP\tis\ngreat.";
// Collapse runs of whitespace other than line breaks into a single space
$text = preg_replace('/[^\S\r\n]+/', ' ', $text);
// Replace the Chinese (full-width) comma with an ASCII comma plus a space
$text = preg_replace('/,/u', ', ', $text);
echo $text;
// Hello, world! PHP is
// great.
```
```php
$line = '2025-08-25 14:03:22 [INFO] user=alice ip=203.0.113.9';
$fmt = preg_replace(
    '/^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] user=(\w+) ip=([\d.]+)$/',
    '[$3][$1T$2Z] $4@$5',
    $line
);
echo $fmt;
// [INFO][2025-08-25T14:03:22Z] alice@203.0.113.9
```
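Specific first, general later: keep character classes as narrow as possible and avoid overusing .*. Switch to lazy quantifiers or lookarounds where needed.
Avoid catastrophic backtracking: structures such as (.+)+ or (.*){m,} can easily exhaust pcre.backtrack_limit, in which case preg_replace() returns null. Wherever a boundary can be stated explicitly, state it instead of relying on a greedy wildcard.
Use the u modifier: it is required whenever the text contains multibyte characters (Chinese, emoji); without it, characters can be split into broken bytes.
Callbacks instead of /e: compute replacements with preg_replace_callback(), which is safer.
Control $limit: set $limit to 1 when only the first match should be replaced.
Count and test: use $count to collect the number of replacements, and write unit tests that cover edge cases for critical patterns.
Precompilation and caching: PHP caches compiled PCRE patterns internally, but on hot paths avoid building a different pattern string on every loop iteration.
Minimal reproduction: split long patterns into small pieces and verify them one at a time.
Readability: use the x modifier to write patterns with comments: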
```php
$pattern = '/
    ^\s*            # leading whitespace allowed
    (?P<key>\w+)    # key
    \s*=\s*
    (?P<val>.+?)    # value (lazy)
    \s*$
/x';
echo preg_replace($pattern, '$1 => $2', '  name = Alice  '); // name => Alice
```
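Escaping awareness: a pattern is escaped twice, once by the PHP string literal and once by PCRE (compare "\d" and "\\d" in a double-quoted string); single-quoted strings keep surprises to a minimum.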
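Two points from the checklist above, failures that surface as null and the two layers of escaping, can be sketched as follows. The helper name replaceOrFail() is hypothetical, and preg_last_error_msg() requires PHP 8.0 or later. Invalid UTF-8 is used as the failure trigger because it is deterministic; an exhausted backtrack limit is reported through the same null return.

```php
<?php
// Hypothetical helper: turn a silent null result into an exception.
function replaceOrFail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }
    return $result;
}

echo replaceOrFail('/\d+/', '#', 'room 101'), PHP_EOL; // room #

try {
    // "\xFF" is not valid UTF-8, so matching with the u modifier fails.
    replaceOrFail('/./u', '*', "\xFF");
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}

// Escaping: in single quotes, \1 reaches PCRE untouched and acts as a backreference.
echo preg_replace('/(\w+)@/', '\1 at ', 'bob@host'), PHP_EOL; // bob at host
// In double quotes, PHP itself turns "\1" into the byte 0x01 before PCRE sees it.
var_dump("\1" === chr(1)); // bool(true)
```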
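Replace only the first match: preg_replace($p, $r, $s, 1, $count);
Remove script tags (a quick filter, not a replacement for a real HTML sanitizer): preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
Rename a URL query parameter: match ([?&])old=([^&#]*) and replace with $1new=$2
Insert thousands separators (plain integers only): preg_replace('/\B(?=(\d{3})+(?!\d))/', ',', $n);
Collapse runs of blank lines: preg_replace('/(\R)\s*(\R)/', "$1$2", $text);
Strip control characters (note that this range includes tab and newline): preg_replace('/[\x00-\x1F\x7F]/', '', $s);
camelCase to snake_case: strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $camel));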
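A few of these recipes, run on made-up inputs, give a feel for what they do. This is only a quick sketch; the sample URL and strings are invented, and json_encode() is used to make the line breaks visible.

```php
<?php
// Thousands separators for a plain integer string.
$grouped = preg_replace('/\B(?=(\d{3})+(?!\d))/', ',', '1234567');
echo $grouped, PHP_EOL; // 1,234,567

// Rename the query parameter "old" to "new".
$renamed = preg_replace('/([?&])old=([^&#]*)/', '$1new=$2', '/search?old=1&page=2#top');
echo $renamed, PHP_EOL; // /search?new=1&page=2#top

// camelCase to snake_case: insert underscores, then lowercase the result.
$snake = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', 'camelCaseString'));
echo $snake, PHP_EOL; // camel_case_string

// Collapse a run of line breaks into a single blank line.
$squeezed = preg_replace('/(\R)\s*(\R)/', '$1$2', "a\n\n\n\nb");
echo json_encode($squeezed), PHP_EOL; // "a\n\nb"
```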
```php
<?php
$log = <<<LOG
[2025-08-25 10:00:01] user=john phone=13812345678 email=john@example.com
[2025-08-25 10:05:09] user=林 phone=13987654321 email=lin@example.cn
LOG;
// 1) Basic masking: star out the middle four digits of each phone number,
//    and keep only the first character of each email local part
$log = preg_replace('/(\b1\d{2})\d{4}(\d{4}\b)/', '$1****$2', $log);
$log = preg_replace('/\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*(?=@)/', '$1*', $log);
// 2) Structured rewrite: turn each line into a CSV row
$csv = preg_replace(
    '/^\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\]\s+user=([^\s]+)\s+phone=([^\s]+)\s+email=([^\s]+)$/mu',
    '$1,$2,$3,$4,$5',
    $log
);
echo $csv;
/*
2025-08-25,10:00:01,john,138****5678,j*@example.com
2025-08-25,10:05:09,林,139****4321,l*@example.cn
*/
```
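The power of preg_replace() lies in describing precisely the pattern you are looking for and rewriting it into the shape you need. Get comfortable with delimiters and modifiers, make good use of capturing groups and lookarounds, and reach for a callback whenever the replacement has to be computed; with that, everyday text tasks from cleaning to rewriting become routine. Write plenty of small examples and test the edge cases, and regex turns from "black magic" into a handy everyday tool.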