Regular expression functions
preg_match()
Performs a regular expression match. The search stops after the first match.
int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )
Returns the number of times pattern matched. The value is either 0 (no match) or 1. Returns FALSE if an error occurs.
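The return value and the $matches array can be sketched as follows. The pattern and sample strings are made up for illustration and do not come from the original notes.

```php
<?php
// Hypothetical sample data: the pattern and subject are illustrative only.
$pattern = '/(\d{4})-(\d{2})/';
$subject = 'Released 2020-11, updated 2021-03';

// preg_match() stops at the first match, so $count is at most 1.
$count = preg_match($pattern, $subject, $matches);

// $matches[0] is the full match; $matches[1] and $matches[2] are the groups.
$first = $matches[0];
$year  = $matches[1];

// No match yields 0. Compare with === to tell 0 apart from false (error).
$none = preg_match($pattern, 'no dates here');

// The fifth argument is a byte offset at which the search starts.
// Offset 16 is just past the first date, so the second date is found.
preg_match($pattern, $subject, $later, 0, 16);

echo $count, ' ', $first, ' ', $year, ' ', $none, ' ', $later[0], "\n";
```

Because 0 and FALSE are both falsy, a strict comparison such as `=== 1` is the safer way to test for a match.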
preg_match_all()
Performs a global regular expression match. The search continues to the end of the subject string.
int preg_match_all ( string $pattern , string $subject [, array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0 ]]] )
Returns the number of full pattern matches (possibly 0), or FALSE if an error occurs.
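As a small sketch of how the $flags argument changes the shape of $matches, the snippet below compares the default PREG_PATTERN_ORDER with PREG_SET_ORDER. The subject string and the variable names are assumptions chosen for illustration.

```php
<?php
// Hypothetical sample string; the "id=" format is an assumption.
$subject = 'id=626987898197; id=629272537596';
$pattern = '/id=(\d+)/';

// Default PREG_PATTERN_ORDER:
// $byGroup[0] holds all full matches, $byGroup[1] holds all group-1 captures.
$count = preg_match_all($pattern, $subject, $byGroup);

// PREG_SET_ORDER:
// one sub-array per match, each holding [full match, group 1].
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);

print_r($byGroup[1]);
print_r($byMatch[0]);
```

PREG_PATTERN_ORDER is convenient when only one column of values is needed. PREG_SET_ORDER tends to read better when each match is a record with several fields.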
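Regular expressions - metacharacters
Character | Description |
---|---|
(pattern) | Matches pattern and captures the match. To match literal parenthesis characters, put a backslash before them: `\(` or `\)`. |
(?:pattern) | Matches pattern but does not capture the result; that is, a non-capturing group. It is often useful for grouping alternatives joined by the "or" character. For example, `industr(?:y\|ies)` is a more compact way to write `industry\|industries`. |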
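The difference between the two group types shows up in the $matches array. The following is a minimal sketch; the subject strings are invented for illustration.

```php
<?php
$subject = 'industry and industries';

// Capturing group: the matched suffix is stored as group 1.
preg_match_all('/industr(y|ies)/', $subject, $capturing);

// Non-capturing group: same full matches, but no group 1 is recorded.
preg_match_all('/industr(?:y|ies)/', $subject, $nonCapturing);

// Escaped parentheses match literal "(" and ")";
// the unescaped pair inside still captures the digits.
preg_match('/\((\d+)\)/', 'call (42) now', $literal);

print_r($capturing);
print_r($nonCapturing);
print_r($literal);
```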
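Regular expressions - modifiers
Commonly used pattern modifiers in PHP (PCRE) include i, m, s, x, A, D, and U. They can be combined, for example /pattern/im.
i: case-insensitive matching;
g: global matching in other regex flavors such as JavaScript; PHP does not accept it as a modifier, and preg_match_all() is used instead;
m: treats the subject as multiple lines, so ^ and $ match at the start and end of every line;
s: treats the subject as a single line, so the dot (.) also matches newline characters;
x: ignores whitespace in the pattern (unless escaped or inside a character class) and treats text from # to the end of the line as a comment;
A: forces the match to start at the beginning of the subject string;
D: when $ is used to anchor the end, it matches only at the very end of the string, not before a trailing newline (ignored when m is set);
U: makes quantifiers ungreedy (lazy) by default, so each one matches as little as possible;
e: used with preg_replace() to evaluate the replacement string as PHP code; deprecated in PHP 5.5 and removed in PHP 7.0, where preg_replace_callback() takes its place.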
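The effect of each modifier is easiest to see side by side. The snippet below is a sketch with made-up sample strings and variable names; it also shows preg_replace_callback() as the usual substitute for the removed e modifier.

```php
<?php
$text = "First line\nsecond LINE";

// i: case-insensitive, so "line" also matches "LINE".
$i = preg_match_all('/line/i', $text);

// m: ^ matches at the start of every line, not only at the start of the string.
preg_match_all('/^\w+/m', $text, $starts);

// s: the dot also matches newlines, so one match can span both lines.
$withS    = preg_match('/First.+LINE/s', $text);
$withoutS = preg_match('/First.+LINE/', $text);

// x: whitespace and # comments inside the pattern are ignored.
preg_match('/ (\d+) \s* px   # a number followed by its unit
            /x', 'width: 38px', $px);

// A: the match must begin at the start of the subject.
$anchored = preg_match('/\d+/A', 'ab12');

// D: $ no longer matches before a trailing newline.
$dollar     = preg_match('/abc$/', "abc\n");
$dollarOnly = preg_match('/abc$/D', "abc\n");

// U: quantifiers become lazy by default.
preg_match('/<.+>/', '<a><b>', $greedy);
preg_match('/<.+>/U', '<a><b>', $lazy);

// Instead of the removed e modifier, compute the replacement in a callback.
$title = preg_replace_callback('/\b[a-z]/', function ($m) {
    return strtoupper($m[0]);
}, 'hello world');

echo $i, ' ', $greedy[0], ' ', $lazy[0], ' ', $title, "\n";
```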
Practice or test with an online tool
http://c.runoob.com/front-end/854
Example
Prepare the content to parse
Create a new file, rival_goods.html, to hold the content that will be parsed with regular expressions.
<tbody class="ant-table-tbody">
<tr class="ant-table-row oui-table-row-tree-node-1 ant-table-row-level-0" data-row-key="tree-node-1">
<td class="">
<span class="ant-table-row-indent indent-level-0" style="padding-left: 0px;"></span><!-- react-empty: 1350 --><div class="sycm-goods-td" style="width: 260px;"><a class="goodsImg pull-left" href="//detail.tmall.com/item.htm?id=626987898197" target="_blank" rel="noopener noreferrer" title="秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋" style="width: 38px; height: 38px;"><img class="mediaObject" src="//img.alicdn.com/bao/uploaded/i1/2074376818/O1CN01U3Rzep20Egzw0ViDL_!!2074376818-0-lubanu-s.jpg_36x36.jpg"></a><div class="goodsInfo" style="width: 202px; max-height: 76px;"><p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=626987898197" target="_blank" rel="noopener noreferrer" title="秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋">秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋</a></p><p class="goodsShopName" style="width: 202px;">较前一日</p></div></div>
</td>
<td class="">
<div class="alife-dt-card-common-table-sortable-td alife-dt-card-common-table-cateRankId"><span class="alife-dt-card-common-table-sortable-value">26</span><span class="alife-dt-card-common-table-sortable-ratio-value"></span><div class="alife-dt-card-common-table-sortable-cycleCrc" style="margin-right: 0px;"><span style="color: red;">升6名</span></div><div class="alife-dt-card-common-table-sortable-syncCrc" style="margin-right: 0px;"></div></div>
</td>
<td class="">
<div class="alife-dt-card-common-table-sortable-td alife-dt-card-common-table-tradeIndex"><span class="alife-dt-card-common-table-sortable-value">37,135</span><span class="alife-dt-card-common-table-sortable-ratio-value"></span><div class="alife-dt-card-common-table-sortable-cycleCrc" style="margin-right: 0px;"><span style="color: gray;">-0.78%</span></div><div class="alife-dt-card-common-table-sortable-syncCrc" style="margin-right: 0px;"></div></div>
</td>
<td class="alife-dt-card-common-table-right-column">
<a href="/mc/ci/item/analysis?rivalItem1Id=626987898197&cateId=50011740" target="_blank">竞品分析</a>
</td>
</tr>
<tr class="ant-table-row oui-table-row-tree-node-2 ant-table-row-level-0" data-row-key="tree-node-2">
<td class="">
<span class="ant-table-row-indent indent-level-0" style="padding-left: 0px;"></span><!-- react-empty: 1377 --><div class="sycm-goods-td" style="width: 260px;"><a class="goodsImg pull-left" href="//detail.tmall.com/item.htm?id=629272537596" target="_blank" rel="noopener noreferrer" title="aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男" style="width: 38px; height: 38px;"><img class="mediaObject" src="//img.alicdn.com/bao/uploaded/i1/2932519149/O1CN016qZWP32HSIFYZUvdB_!!0-item_pic.jpg_36x36.jpg"></a><div class="goodsInfo" style="width: 202px; max-height: 76px;"><p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=629272537596" target="_blank" rel="noopener noreferrer" title="aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男">aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男</a></p><p class="goodsShopName" style="width: 202px;">较前一日</p></div></div>
</td>
<td class="">
<div class="alife-dt-card-common-table-sortable-td alife-dt-card-common-table-cateRankId"><span class="alife-dt-card-common-table-sortable-value">28</span><span class="alife-dt-card-common-table-sortable-ratio-value"></span><div class="alife-dt-card-common-table-sortable-cycleCrc" style="margin-right: 0px;"><span style="color: red;">升3名</span></div><div class="alife-dt-card-common-table-sortable-syncCrc" style="margin-right: 0px;"></div></div>
</td>
<td class="">
<div class="alife-dt-card-common-table-sortable-td alife-dt-card-common-table-tradeIndex"><span class="alife-dt-card-common-table-sortable-value">35,899</span><span class="alife-dt-card-common-table-sortable-ratio-value"></span><div class="alife-dt-card-common-table-sortable-cycleCrc" style="margin-right: 0px;"><span style="color: gray;">-7.61%</span></div><div class="alife-dt-card-common-table-sortable-syncCrc" style="margin-right: 0px;"></div></div>
</td>
<td class="alife-dt-card-common-table-right-column">
<a href="/mc/ci/item/analysis?rivalItem1Id=629272537596&cateId=50011740" target="_blank">竞品分析</a>
</td>
</tr>
</tbody>
Use regular expressions to parse and extract the data
<?php
// Load the saved HTML fragment.
$html = file_get_contents("rival_goods.html");
// One slot per field. preg_match_all() fills each with [0] => full matches, [1] => group 1.
$vars = array('detailUrls'=>array(),'titles'=>array(), 'images'=>array());
// The U modifier makes .+ lazy, so each capture stops at the first delimiter that follows.
preg_match_all("/(?:<p class=\"singleGoodsName\"><a href=\")(.+)(?:\" target=\".+\">.+<\/a>)/U", $html, $vars['detailUrls']);
preg_match_all("/(?:<img class=\"mediaObject\" src=\")(.+)(?:\">)/U", $html, $vars['images']);
preg_match_all("/(?:<p class=\"singleGoodsName\"><a .+>)(.+)(?:<\/a><\/p>)/U", $html, $vars['titles']);
print_r($vars);
Extraction result
Array
(
[detailUrls] => Array
(
[0] => Array
(
[0] => <p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=626987898197" target="_blank" rel="noopener noreferrer" title="秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋">秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋</a>
[1] => <p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=629272537596" target="_blank" rel="noopener noreferrer" title="aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男">aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男</a>
)
[1] => Array
(
[0] => //detail.tmall.com/item.htm?id=626987898197
[1] => //detail.tmall.com/item.htm?id=629272537596
)
)
[titles] => Array
(
[0] => Array
(
[0] => <p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=626987898197" target="_blank" rel="noopener noreferrer" title="秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋">秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋</a></p>
[1] => <p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=629272537596" target="_blank" rel="noopener noreferrer" title="aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男">aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男</a></p>
)
[1] => Array
(
[0] => 秋冬季2020新款高帮男鞋潮流百搭运动休闲加绒保暖棉鞋老爹潮鞋
[1] => aj男鞋正品官网旗舰店官空军一号2020新款aj1莆田篮球高帮潮鞋男
)
)
[images] => Array
(
[0] => Array
(
[0] => <img class="mediaObject" src="//img.alicdn.com/bao/uploaded/i1/2074376818/O1CN01U3Rzep20Egzw0ViDL_!!2074376818-0-lubanu-s.jpg_36x36.jpg">
[1] => <img class="mediaObject" src="//img.alicdn.com/bao/uploaded/i1/2932519149/O1CN016qZWP32HSIFYZUvdB_!!0-item_pic.jpg_36x36.jpg">
)
[1] => Array
(
[0] => //img.alicdn.com/bao/uploaded/i1/2074376818/O1CN01U3Rzep20Egzw0ViDL_!!2074376818-0-lubanu-s.jpg_36x36.jpg
[1] => //img.alicdn.com/bao/uploaded/i1/2932519149/O1CN016qZWP32HSIFYZUvdB_!!0-item_pic.jpg_36x36.jpg
)
)
)
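The result above holds each field in a separate column. When one record per product is more convenient, a single pattern with two capturing groups and PREG_SET_ORDER is one option. The sketch below is a variation, not the approach used above: it swaps the U modifier for negated character classes such as [^"]+, and it runs on a shortened stand-in fragment with placeholder titles ("Shoe A", "Shoe B") rather than the real file. The $goods array and its key names are hypothetical.

```php
<?php
// Shortened stand-in for rival_goods.html; titles are placeholders.
$html = '<p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=626987898197" target="_blank" title="Shoe A">Shoe A</a></p>'
      . '<p class="singleGoodsName"><a href="//detail.tmall.com/item.htm?id=629272537596" target="_blank" title="Shoe B">Shoe B</a></p>';

// Group 1: the href value, up to the closing quote.
// Group 2: the link text, up to the next tag.
$pattern = '/<p class="singleGoodsName"><a href="([^"]+)"[^>]*>([^<]+)<\/a><\/p>/';

// PREG_SET_ORDER yields one sub-array per product.
preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

$goods = array();
foreach ($rows as $row) {
    // $row[0] is the full match; [1] is the URL, [2] is the title.
    $goods[] = array('detailUrl' => $row[1], 'title' => $row[2]);
}

print_r($goods);
```

Negated character classes stop at the delimiter by construction, so the pattern does not rely on lazy quantifiers to avoid running past the end of an attribute or tag.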
PHP regular expressions (PCRE) https://www.runoob.com/php/php-pcre.html
Regular expressions - tutorial https://www.runoob.com/regexp/regexp-tutorial.html
Pattern modifiers in regular expressions explained (i, g, m, s, x, e) https://www.cnblogs.com/kevin-yuan/archive/2012/09/25/2702167.html