Java: Regular Expressions (regexp)

Time: 2023-09-14 09:13:13

Original links

https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html?spm=a2c6h.12873639.0.0.6f275664HyKZqi
https://docs.oracle.com/javase/tutorial/essential/regex/groups.html

Explanation

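Special constructs (non-capturing): none of these capture X as a numbered group, and the lookaround forms are zero-width, so they consume no input.
(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead: look ahead; X must appear immediately after the current position
(?!X) X, via zero-width negative lookahead: look ahead; X must not appear immediately after the current position
(?<=X) X, via zero-width positive lookbehind: look behind; X must appear immediately before the current position
(?<!X) X, via zero-width negative lookbehind: look behind; X must not appear immediately before the current position
(?>X) X, as an independent, non-capturing group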
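
The constructs above can be tried out with a small Java sketch. The class name LookaroundDemo, the helper findAll, and all sample strings are made up for illustration; only java.util.regex itself comes from the documentation linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Collects every substring of input that the regex matches, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // (?:X): "ab" is grouped but not captured, so the digits are group 1.
        Matcher m = Pattern.compile("(?:ab)+(\\d+)").matcher("abab42");
        if (m.matches()) {
            System.out.println(m.groupCount() + " group, value " + m.group(1));
        }
        // (?=X): digits that are immediately followed by "px".
        System.out.println(findAll("\\d+(?=px)", "10px 20em 30px"));
        // (?!X): words starting with "foo" where "foo" is not followed by "bar".
        System.out.println(findAll("foo(?!bar)\\w*", "foobar foobaz"));
        // (?<=X): digits immediately preceded by "$".
        System.out.println(findAll("(?<=\\$)\\d+", "$15 and 20 and $7"));
        // (?<!X): digits not immediately preceded by "$" or another digit.
        System.out.println(findAll("(?<![$\\d])\\d+", "$15 and 20 and $7"));
    }
}
```

In every lookaround case the asserted text (px, bar, $) is checked but never becomes part of the returned match.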

Online tool

https://tool.oschina.net/regex/

Examples

(?<=\s)[\-0-9a-zA-Z]+(?=( systemvm| abc))
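This matches a run of letters, digits, or hyphens that is preceded by a whitespace character and followed by " systemvm" or " abc". The leading whitespace and the suffix are checked but are not part of the match.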
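
A minimal Java sketch of this pattern follows. The class name SuffixLookaheadDemo and the sample line are hypothetical; the regex is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixLookaheadDemo {
    // Same regex as above, with backslashes doubled for a Java string literal.
    static final Pattern NAME =
        Pattern.compile("(?<=\\s)[\\-0-9a-zA-Z]+(?=( systemvm| abc))");

    // Returns the names that are followed by " systemvm" or " abc".
    static List<String> names(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = NAME.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample line; "db02" is skipped because " xyz" follows it.
        String line = "id r-12-VM systemvm up; id web01 abc up; id db02 xyz";
        System.out.println(names(line));
    }
}
```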

(?<![0-9])(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}(?![0-9])
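Matches an IPv4 address. The lookbehind (?<![0-9]) and the lookahead (?![0-9]) prevent a match from starting or ending in the middle of a longer run of digits.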
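
As a rough check, the sketch below runs this pattern over a made-up log line in Java. The class name Ipv4Demo and the sample text are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ipv4Demo {
    // One octet: 250-255, 200-249, or 0-199 (with optional leading digits).
    static final String OCTET = "(25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)";

    // Same structure as the regex above: octet, then three ".octet" repeats,
    // guarded on both sides so the match cannot touch another digit.
    static final Pattern IPV4 =
        Pattern.compile("(?<![0-9])" + OCTET + "(\\." + OCTET + "){3}(?![0-9])");

    static List<String> findAddresses(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = IPV4.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input; 10.0.0.256 is rejected because 256 is out of range.
        System.out.println(findAddresses("from 192.168.1.10 to 10.0.0.256 via 8.8.8.8"));
    }
}
```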

Example 3: word boundaries

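(?<=hostname:)["\s]+\b[0-9a-zA-Z]+\b   /// Must \b appear in pairs? No: each \b is an independent zero-width assertion.
" hostname:  May 13 22:34:57 localhost nodemgr hostname: abc; 
May 13 03:36:17 localhost"
Against this input the pattern finds two matches, one ending in May and one ending in abc. Each match also includes the whitespace consumed by ["\s]+, because only hostname: sits inside the lookbehind.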
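
The behaviour can be reproduced with the Java sketch below. The class name WordBoundaryDemo is made up, and the input is the sample text above written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    static final Pattern AFTER_HOSTNAME =
        Pattern.compile("(?<=hostname:)[\"\\s]+\\b[0-9a-zA-Z]+\\b");

    // Returns the raw matches, including the leading whitespace they consume.
    static List<String> rawMatches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = AFTER_HOSTNAME.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = " hostname:  May 13 22:34:57 localhost nodemgr hostname: abc; \n"
                    + "May 13 03:36:17 localhost";
        for (String raw : rawMatches(text)) {
            // trim() drops the whitespace that ["\s]+ pulled into the match.
            System.out.println("[" + raw + "] -> " + raw.trim());
        }
    }
}
```

If only the word is wanted, wrapping [0-9a-zA-Z]+ in a capturing group and reading that group would avoid the trim() step.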

Backreferences

(\d\d)(\d\d)\1\2
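matches
12321232
The first pair of parentheses is capturing group 1 and the second pair is capturing group 2.
\1 stands for the text that group 1 matched (here 12).
\2 stands for the text that group 2 matched (here 32).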
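
A short Java sketch of the same backreference pattern. The class name BackreferenceDemo and the helper repeatsPairs are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    static final Pattern REPEATED = Pattern.compile("(\\d\\d)(\\d\\d)\\1\\2");

    // True when the whole string is two digit pairs followed by the same two pairs.
    static boolean repeatsPairs(String s) {
        return REPEATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        Matcher m = REPEATED.matcher("12321232");
        if (m.matches()) {
            // group(1) and group(2) hold what \1 and \2 had to repeat.
            System.out.println("group 1 = " + m.group(1) + ", group 2 = " + m.group(2));
        }
        // The last digit differs, so \2 cannot repeat "32".
        System.out.println(repeatsPairs("12321233"));
    }
}
```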

\s covers a fairly wide range of characters

\s A whitespace character: [ \t\n\x0B\f\r]
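
To see which characters that class covers, a small Java sketch can test each one. The class name WhitespaceDemo and the helper isRegexWhitespace are made up for illustration.

```java
class WhitespaceDemo {
    // True when the single character c is matched by \s under default flags.
    static boolean isRegexWhitespace(char c) {
        return String.valueOf(c).matches("\\s");
    }

    public static void main(String[] args) {
        // The six characters listed in the class above; 0x0B is vertical tab.
        char[] listed = {' ', '\t', '\n', (char) 0x0B, '\f', '\r'};
        for (char c : listed) {
            System.out.println((int) c + " -> " + isRegexWhitespace(c));
        }
        // A letter, for contrast.
        System.out.println("a -> " + isRegexWhitespace('a'));
    }
}
```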

Error example

A lookbehind cannot contain a variable-length (unbounded) expression.
Some regular expression engines cannot process a lookbehind whose length has no upper bound, for example:

(?<=(abc\[[\d]*\]: ))\s*[0-9a-zA-Z]+(?=[-]+)
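The intent is to match text preceded by the string "abc[<any number of digits>]: ". The asterisk allows an unbounded number of digits, which such an engine cannot support inside a lookbehind; an explicit range for the number of digits has to be given.
Changed to:
(?<=(abc\[[\d]{2,10}\]: ))\s*[0-9a-zA-Z]+(?=[-]+)  // i.e. between 2 and 10 digits.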
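
The bounded form can be exercised with the Java sketch below. The class name BoundedLookbehindDemo and the sample inputs are hypothetical. The sketch deliberately uses only the {2,10} version, since whether the unbounded version is rejected depends on the engine and its version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookbehindDemo {
    // Lookbehind with an explicit upper bound on the number of digits.
    static final Pattern TOKEN =
        Pattern.compile("(?<=(abc\\[[\\d]{2,10}\\]: ))\\s*[0-9a-zA-Z]+(?=[-]+)");

    // Returns the first token found after "abc[<2 to 10 digits>]: ", or null.
    static String firstToken(String line) {
        Matcher m = TOKEN.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Four digits: inside the 2..10 range, so the token is found.
        System.out.println(firstToken("abc[1234]: node1--- started"));
        // One digit: outside the range, so nothing matches.
        System.out.println(firstToken("abc[1]: node1--- started"));
    }
}
```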

Problem: matching the start of a line

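To match at the start of every line, multi-line mode has to be enabled with the (?m) flag, as in (?m)(^abc)(?-m). Without it, ^ matches only at the beginning of the entire input, that is, only on the first line.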
https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html?spm=a2c6h.12873639.0.0.6f275664HyKZqi#MULTILINE
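
The difference can be seen by counting matches with and without the flag. This is a sketch in Java; the class name MultilineDemo, the helper count, and the three-line input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Counts how many times the regex is found in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "abc 1\nxyz\nabc 2";
        // Default mode: ^ only matches at the very start of the input.
        System.out.println(count("^abc", input));
        // (?m) lets ^ match after each line terminator as well.
        System.out.println(count("(?m)(^abc)(?-m)", input));
        // Equivalent: pass the flag to compile() instead of embedding it.
        Matcher m = Pattern.compile("^abc", Pattern.MULTILINE).matcher(input);
        int viaFlag = 0;
        while (m.find()) {
            viaFlag++;
        }
        System.out.println(viaFlag);
    }
}
```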

Python regex

With Python's regex module in VERSION0 mode, ending an expression with (?-m) to turn the flag off is not supported. For example:

 (?m)(?<=(^[a-zA-Z]{2,3})[\s][0-9]{1,2}[\s]([0-9]{2}:){2}[0-9]{2}[\s]{1,3})[a-zA-Z0-9]+(?=[ -])(?-m) 

In Python this expression produces the following error:

if(REG.match(expression,'')):
File "usr/lib64/python3.6/site-packages/regex/regex.py", line 251, in match
return _compile(pattern, flags, kwargs).match(string, pos, endpos,
File "usr/lib64/python3.6/site-packages/regex/regex.py", line 515, in _compile
caught_exception.pos)
regex._regex_core.error: bad inline flags: cannot turn flags off at position 99

The relevant logic in the regex module source (under VERSION0, inline flags can only be turned on):

V0 = VERSION0 = 0x2000    # Old legacy behaviour.
V1 = VERSION1 = 0x100     # New enhanced behaviour.
W = WORD = 0x800          # Default Unicode word breaks.
X = VERBOSE = 0x40        # Ignore whitespace and comments.
T = TEMPLATE = 0x1        # Template (present because re module has it).

DEFAULT_VERSION = VERSION1

_ALL_VERSIONS = VERSION0 | VERSION1
_ALL_ENCODINGS = ASCII | LOCALE | UNICODE

def parse_positional_flags(source, info, flags_on, flags_off):
    "Parses positional flags."
    version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION
    if version == VERSION0:
        # Positional flags are global and can only be turned on.
        if flags_off:
            raise error("bad inline flags: cannot turn flags off",
              source.string, source.pos)

.NET regular expressions

https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference