沙滩星空的博客

Golang Regular Expressions

Introduction

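The regexp package implements regular expression search. It accepts the RE2 syntax (except for \C), which is the same general syntax used by Perl, Python, and other languages.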

Command to view the syntax documentation: go doc regexp/syntax
Import the package: import "regexp"
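As a starting point, here is a minimal sketch of compiling a pattern and testing a string against it. The hexColor pattern and the isHexColor helper are illustrative names, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// hexColor is a hypothetical pattern for 6-digit hex colors such as "#1a2b3c".
// MustCompile panics if the pattern is invalid, which suits package-level patterns.
var hexColor = regexp.MustCompile(`^#[0-9a-fA-F]{6}$`)

// isHexColor reports whether s matches hexColor in full.
func isHexColor(s string) bool {
	return hexColor.MatchString(s)
}

func main() {
	fmt.Println(isHexColor("#1a2b3c")) // true
	fmt.Println(isHexColor("1a2b3c"))  // false

	// Compile returns an error instead of panicking, which suits
	// patterns that come from user input.
	if _, err := regexp.Compile(`a(b`); err != nil {
		fmt.Println("invalid pattern:", err)
	}
}
```

Compiling once at package level and reusing the *regexp.Regexp value avoids recompiling the pattern on every call.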

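The Regexp type provides 16 methods that match a regular expression and identify the matched text. Their names match the following regular expression: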

Find(All)?(String)?(Submatch)?(Index)?
Go package documentation: https://pkg.go.dev/regexp
RE2 syntax: https://github.com/google/re2

Method names

Find(All)?(String)?(Submatch)?(Index)?

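If 'All' is present, the method returns all successive non-overlapping matches in the input. An empty match that directly abuts the preceding match is ignored. Methods with 'All' take an extra integer argument n: if n >= 0, the method returns at most n matches; otherwise it returns all of them.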
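The effect of n and the empty-match rule can be sketched as follows. The helper names firstNumbers and countMatches are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var digits = regexp.MustCompile(`\d+`)

// firstNumbers returns at most n digit runs from s; a negative n returns all of them.
func firstNumbers(s string, n int) []string {
	return digits.FindAllString(s, n)
}

// countMatches returns how many non-overlapping matches of pattern occur in s.
func countMatches(pattern, s string) int {
	return len(regexp.MustCompile(pattern).FindAllString(s, -1))
}

func main() {
	s := "10 apples, 20 pears, 30 plums"
	fmt.Println(firstNumbers(s, -1)) // [10 20 30]
	fmt.Println(firstNumbers(s, 2))  // [10 20]

	// a* matches "" at offset 0, "aaa" at offsets 1-4, and "" at the end.
	// The empty match at offset 4 abuts the preceding "aaa" and is skipped.
	fmt.Println(countMatches(`a*`, "baaab")) // 3
}
```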

If 'String' is present, the input is a string; otherwise it is a []byte. The return values use the same type as the input.
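A short sketch of the two input types side by side. The word pattern and the helpers firstWord and firstWordBytes are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

var word = regexp.MustCompile(`[a-z]+`)

// firstWord uses the String variant: string in, string out.
func firstWord(s string) string {
	return word.FindString(s)
}

// firstWordBytes uses the []byte variant: []byte in, []byte out.
func firstWordBytes(b []byte) []byte {
	return word.Find(b)
}

func main() {
	fmt.Println(firstWord("42 gophers"))                     // gophers
	fmt.Printf("%s\n", firstWordBytes([]byte("42 gophers"))) // gophers

	// Without a match, the String variant returns "" and the []byte variant returns nil.
	fmt.Println(firstWord("123") == "")                // true
	fmt.Println(firstWordBytes([]byte("123")) == nil) // true
}
```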

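If 'Submatch' is present, the return value is a slice of the successful submatches of the expression. Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression, numbered from left to right in order of opening parenthesis. Submatch 0 is the match of the entire expression, submatch 1 is the match of the first group, and so on.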
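To make the numbering concrete, the sketch below uses a nested group so that the order of opening parentheses is visible. The isoDate pattern and the splitDate helper are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Groups are numbered by their opening parenthesis:
// 1 = year-month, 2 = year, 3 = month, 4 = day.
var isoDate = regexp.MustCompile(`((\d{4})-(\d{2}))-(\d{2})`)

// splitDate returns the full match followed by each group, or nil if s has no date.
func splitDate(s string) []string {
	return isoDate.FindStringSubmatch(s)
}

func main() {
	m := splitDate("released on 2024-02-06")
	fmt.Println(m[0]) // 2024-02-06
	fmt.Println(m[1]) // 2024-02
	fmt.Println(m[2]) // 2024
	fmt.Println(m[3]) // 02
	fmt.Println(m[4]) // 06
}
```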

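If 'Index' is present, matches and submatches are identified by byte index pairs within the input: result[2*n:2*n+2] holds the start and end indexes of the nth submatch. Without 'Index', a match is represented by the matched text. A negative index means that the group did not match any text in the input.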
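The index layout can be sketched as follows: the pair for group n sits at positions 2*n and 2*n+1 of the returned slice. The helpers submatchIndexes and span are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// submatchIndexes returns the byte index pairs of the first match of pattern in s.
func submatchIndexes(pattern, s string) []int {
	return regexp.MustCompile(pattern).FindStringSubmatchIndex(s)
}

// span returns the start and end byte offsets of group n from an index slice.
// ok is false when the group did not take part in the match (negative index).
func span(loc []int, n int) (start, end int, ok bool) {
	start, end = loc[2*n], loc[2*n+1]
	return start, end, start >= 0
}

func main() {
	input := "set key=value now"
	loc := submatchIndexes(`(\w+)=(\w*)`, input)
	fmt.Println(loc) // [4 13 4 7 8 13]

	start, end, _ := span(loc, 2)
	fmt.Println(input[start:end]) // value

	// Only the second alternative matches, so group 1 is reported as -1 -1.
	fmt.Println(submatchIndexes(`(a)|(b)`, "b")) // [0 1 -1 -1 0 1]
}
```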

Syntax

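The syntax below is what this package understands when parsing with the Perl flag. Parts of the syntax can be disabled by passing alternate flags to the parser.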

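Single characters:

        .              any character (includes newline when flag s is set)
        [xyz]          character class
        [^xyz]         negated character class
        \d             Perl character class
        \D             negated Perl character class
        [[:alpha:]]    ASCII character class
        [[:^alpha:]]   negated ASCII character class
        \pN            Unicode character class (one-letter name); see the unicode package
        \PN            negated Unicode character class (one-letter name)
        \p{Greek}      Unicode character class (full name)
        \P{Greek}      negated Unicode character class (full name)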
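A brief sketch of how a few of these classes behave on mixed ASCII and Unicode input. Note that the ASCII class is written with double brackets, as in [[:alpha:]]. The findAll helper and the sample string are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	s := "Go 1.22 αβγ 正则"

	fmt.Println(findAll(`\d+`, s))          // [1 22]
	fmt.Println(findAll(`[[:alpha:]]+`, s)) // [Go]  (ASCII letters only)
	fmt.Println(findAll(`\p{Greek}+`, s))   // [αβγ]
	fmt.Println(findAll(`\pL+`, s))         // [Go αβγ 正则]  (any Unicode letter)
}
```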

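Grouping:

        (re)           numbered capturing group
        (?P<name>re)   named and numbered capturing group
        (?:re)         non-capturing group
        (?flags)       set flags within the current group; captures nothing and matches nothing
        (?flags:re)    set flags for re only; non-capturing group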
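The following sketch combines named groups with a non-capturing group. The pairPattern expression and the parsePair helper are assumptions for illustration; SubexpIndex is available from Go 1.15 onward.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two named capturing groups separated by a non-capturing group for the "=" sign.
var pairPattern = regexp.MustCompile(`(?P<key>\w+)(?:\s*=\s*)(?P<value>\w+)`)

// parsePair extracts the key and value by group name rather than by position.
func parsePair(s string) (key, value string, ok bool) {
	m := pairPattern.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[pairPattern.SubexpIndex("key")], m[pairPattern.SubexpIndex("value")], true
}

func main() {
	key, value, ok := parsePair("port = 8080")
	fmt.Println(key, value, ok) // port 8080 true

	// The non-capturing group does not receive a number.
	fmt.Println(pairPattern.NumSubexp()) // 2
}
```

Looking groups up by name keeps the calling code stable if another group is later added to the pattern.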

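Flag syntax is xyz (set), -xyz (clear), or xy-z (set xy, clear z). The flags are:

        i              case-insensitive (default off)
        m              multi-line mode: ^ and $ match the start and end of each line in addition to the start and end of the text (default off)
        s              let . match \n (default off)
        U              ungreedy: swap the meaning of x* and x*?, x+ and x+?, and so on (default off)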
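Each flag can be switched on inline with (?flags). The sketch below compares the same pattern with and without a flag; the matches and findAll helpers are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	// i: case-insensitive matching.
	fmt.Println(matches(`go`, "GO"), matches(`(?i)go`, "GO")) // false true

	// m: ^ also matches at the start of each line.
	fmt.Println(findAll(`^\w+`, "one\ntwo"))     // [one]
	fmt.Println(findAll(`(?m)^\w+`, "one\ntwo")) // [one two]

	// s: . also matches a newline.
	fmt.Println(matches(`a.b`, "a\nb"), matches(`(?s)a.b`, "a\nb")) // false true

	// U: quantifiers become non-greedy.
	fmt.Println(findAll(`a+`, "aaa"))     // [aaa]
	fmt.Println(findAll(`(?U)a+`, "aaa")) // [a a a]
}
```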

Repetition:

        x*             zero or more x, prefer more
        x+             one or more x, prefer more
        x?             zero or one x, prefer one
        x{n,m}         n to m x, prefer more
        x{n,}          n or more x, prefer more
        x{n}           exactly n x
        x*?            zero or more x, prefer fewer
        x+?            one or more x, prefer fewer
        x??            zero or one x, prefer zero
        x{n,m}?        n to m x, prefer fewer
        x{n,}?         n or more x, prefer fewer
        x{n}?          exactly n x
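The difference between "prefer more" and "prefer fewer" shows up most clearly when a delimiter repeats in the input. A small sketch, with first as a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s, or "" if there is none.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "<b>one</b><b>two</b>"

	// Greedy: .* runs to the last </b>.
	fmt.Println(first(`<b>.*</b>`, s)) // <b>one</b><b>two</b>

	// Non-greedy: .*? stops at the first </b>.
	fmt.Println(first(`<b>.*?</b>`, s)) // <b>one</b>

	// Bounded repetition follows the same preference.
	fmt.Println(first(`a{2,3}`, "aaaa"))  // aaa
	fmt.Println(first(`a{2,3}?`, "aaaa")) // aa
}
```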

Examples

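  1. Match the content inside div tags
    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Sample input (hypothetical); the original snippet assumed an existing string named buf.
        buf := "<div>hello</div>\n<div>line 1\nline 2</div>"
        // Compile the pattern. MustCompile panics on an invalid pattern and never
        // returns nil, so no nil check is needed. (?s:...) lets . match newlines.
        reg := regexp.MustCompile(`<div>(?s:(.*?))</div>`)
        // Extract every match; text[0] is the full match, text[1] is the first group.
        result := reg.FindAllStringSubmatch(buf, -1)
        // Print only the captured content, without the surrounding <div></div> tags.
        for _, text := range result {
            fmt.Println("text[1] = ", text[1])
        }
    }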
  2. Extract a JSON string from JavaScript code: https://go.dev/play/p/vSDtL_6l2RS
package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := `==jsonDetailText=="Viewed Product",{"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true});====`
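    // MustCompile panics on an invalid pattern instead of silently discarding the error.
    // The capture group (.+) is greedy, so it extends to the last ");" in the input.
    re := regexp.MustCompile(`"Viewed Product",(.+)\);`)
    result := re.FindAllStringSubmatch(str, -1)
    fmt.Println(len(result))    // number of matches
    fmt.Println(result)         // all matches with their submatches
    fmt.Println(len(result[0])) // full match plus one capture group
    fmt.Println(result[0])
    fmt.Println(result[0][0]) // the full match
    fmt.Println(result[0][1]) // the captured JSON string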
}
Output:
1
[["Viewed Product",{"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true}); {"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true}]]
2
["Viewed Product",{"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true}); {"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true}]
"Viewed Product",{"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true});
{"currency":"USD","variantId":41635375382697,"productId":7302746734761,"productGid":"gid:\/\/shopify\/Product\/7302746734761","name":"Ultimate Sports Bra® - Prism Blue - XSmall","price":"75.00","sku":"110002-371","brand":"Shefit","variant":"XSmall","category":"Ultimate - Limited Edition","nonInteraction":true}

Go standard library: regexp https://wizardforcel.gitbooks.io/golang-stdlib-ref/content/107.html
Go regular expressions http://c.biancheng.net/view/5124.html