Python Regular Expressions

Date: 2023-09-14 09:10:54

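    I rarely use regular expressions (why? I don't know, although they are very useful), so every time I need one I have to look everything up again and re-test it before use.
    This post organizes the help text of Python's re module for study and reference. Items marked in blue and blank cells are still to be completed.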

  1. Functions
  2. Special characters
  3. Special sequences
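
1. Functions (This module exports the following functions)

Each entry gives the name, the description from the re module help, a note, the usage, and examples. The examples assume import re and s = 'abca'; a line starting with >>> shows the result of the expression above it.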

match
Description: Match a regular expression pattern to the beginning of a string.
Note: matching starts at the first character of the string.
Usage: match(pattern, string, flags=0)

re.match('a',s)
>>> <re.Match object; span=(0, 1), match='a'>
re.match('a',s)[0]
>>> a
re.match('ab',s)
>>> <re.Match object; span=(0, 2), match='ab'>
re.match('c',s)
>>> None

fullmatch
Description: Match a regular expression pattern to all of a string.
Note: succeeds only if the whole string matches the pattern.
Usage: fullmatch(pattern, string, flags=0)
re.fullmatch('abca',s)
>>> <re.Match object; span=(0, 4), match='abca'>
re.fullmatch('abc',s)
>>> None

search
Description: Search a string for the presence of a pattern.
Note: looks for the first match of the pattern anywhere in the string.
Usage: search(pattern, string, flags=0)
re.search('a',s)
>>> <re.Match object; span=(0, 1), match='a'>
re.search('bc',s)
>>> <re.Match object; span=(1, 3), match='bc'>
re.search('d',s)
>>> None
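
The three functions above differ only in where the match is allowed to sit. A minimal sketch on the same string s (the variable names are illustrative, not from the original post):

```python
import re

s = 'abca'

# match tries only at index 0, search scans forward,
# fullmatch must cover the whole string.
at_start = re.match('bc', s)        # None: 'bc' is not at the start
anywhere = re.search('bc', s)       # found at span (1, 3)
whole = re.fullmatch('abca', s)     # covers the whole string, span (0, 4)
partial = re.fullmatch('abc', s)    # None: one character is left over

for name, m in [('match', at_start), ('search', anywhere),
                ('fullmatch', whole), ('fullmatch', partial)]:
    print(name, m.span() if m else None)
```

Because all three return None on failure, test the result before calling span() or group() on it.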
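
sub
Description: Substitute occurrences of a pattern found in a string.
Note: replaces the matches of the pattern in the string, all of them by default; count limits the number of replacements.
Usage: sub(pattern, repl, string, count=0, flags=0)
re.sub('a','A',s)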
>>> AbcA
re.sub('a','A',s,1)
>>> Abca
re.sub('abc','ABC',s)
>>> ABCa
re.sub('d','A',s)
>>> abca   
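
As a small sketch of two further ways to call sub: count can be passed by keyword, and repl may be a function that receives each Match object. The helper name upper is hypothetical and only used for illustration.

```python
import re

s = 'abca'

# Passing count by keyword makes the intent explicit.
first_only = re.sub('a', 'A', s, count=1)

# repl may also be a function; it is called once per match
# and its return value becomes the replacement text.
def upper(match):
    return match.group(0).upper()

all_upper = re.sub('[ab]', upper, s)

print(first_only)
print(all_upper)
```

With s = 'abca' this prints Abca and then ABcA.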

subn
Description: Same as sub, but also return the number of substitutions made.
Note: returns a tuple (new string, number of substitutions made).
Usage: subn(pattern, repl, string, count=0, flags=0)
re.subn('a','A',s)
>>> ('AbcA', 2)
re.subn('a','A',s)[1]
>>> 2
re.subn('d','A',s)
>>> ('abca', 0)
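
split
Description: Split a string by the occurrences of a pattern.
Note: splits the string at each match and returns a list. The matched text itself is dropped, so a match at the start or end of the string leaves an empty string in the result. maxsplit limits the number of splits.
Usage: split(pattern, string, maxsplit=0, flags=0)
re.split('a',s)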
>>> ['', 'bc', '']
re.split('a',s,1)
>>> ['', 'bca']

findall
Description: Find all occurrences of a pattern in a string.
Note: returns all matches of the pattern in the string as a list.
Usage: findall(pattern, string, flags=0)
re.findall('a',s)
>>> ['a', 'a']
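
finditer
Description: Return an iterator yielding a Match object for each match.
Note: like findall, but yields Match objects instead of strings; wrap the iterator in list() to see them all.
Usage: finditer(pattern, string, flags=0)
re.finditer('a',s)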
>>> <callable_iterator object at 0x00000253AD29A9A0>
list(re.finditer('a',s))
>>> [<re.Match object; span=(0, 1), match='a'>, <re.Match object; span=(3, 4), match='a'>]

compile
Description: Compile a pattern into a Pattern object.
Note: creates a reusable pattern object.
Usage: compile(pattern, flags=0)
mo = re.compile('a')
re.findall(mo,s)
>>> ['a', 'a']
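
A compiled pattern can also be used through its own methods, which mirror the module-level functions without the pattern argument. A minimal sketch, reusing the name mo from the example above:

```python
import re

s = 'abca'
mo = re.compile('a')

# Pattern methods take the same arguments as the module functions,
# minus the pattern itself.
found = mo.findall(s)
replaced = mo.sub('A', s, count=1)

# Pattern.search also accepts an optional start position.
later = mo.search(s, 1)

print(found, replaced, later.span())
```

Here the search starting at index 1 skips the first a and finds the one at span (3, 4).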
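
purge
Description: Clear the regular expression cache.
Note: the re module keeps an internal cache of compiled patterns; purge() empties it. When this is needed in practice is still to be verified.
Usage: purge()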
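
escape
Description: Backslash all non-alphanumerics in a string.
Note: escapes a string by adding backslashes; letters and digits are left unchanged. Since Python 3.7 only characters that can have a special meaning in a regular expression are escaped.
Usage: escape(pattern)
re.escape('\\')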
>>> '\\\\'
re.escape('a')
>>> 'a'
re.escape('1')
>>> '1'
re.escape('#')
>>> '\\#'
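
A typical use of escape is searching for literal text that happens to contain special characters. A minimal sketch; the sample text and variable names are made up for illustration:

```python
import re

text = 'price (usd): 3.50, not 3x50'
literal = '3.50'

# Unescaped, '.' matches any character, so '3x50' is found too.
unescaped = re.findall(literal, text)

# Escaped, '.' matches only a literal dot.
escaped = re.findall(re.escape(literal), text)

print(unescaped)
print(escaped)
```

The first call returns ['3.50', '3x50'], the second only ['3.50'].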
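
2. Special characters (The special characters are)

Each entry gives the symbol, the description from the re module help, a note, and examples. The examples assume s = 'abcdaabc \n reg'.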

.
Description: Matches any character except a newline.
Note: matches every character except the newline \n; findall returns the matches as a list.
re.findall('.',s)
>>> ['a', 'b', 'c', 'd', 'a', 'a', 'b', 'c', ' ', ' ', 'r', 'e', 'g']

^
Description: Matches the start of the string.
Note: anchors the match at the start of the string.
re.findall('^a',s)
>>> ['a']
re.findall('^b',s)
>>> []

$
Description: Matches the end of the string or just before the newline at the end of the string.
Note: anchors the match at the end of the string.
re.findall('g$',s)
>>> ['g']
re.findall('a$',s)
>>> []
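
*
Description: Matches 0 or more (greedy) repetitions of the preceding RE.
Note: matches zero or more repetitions of the preceding character or group.
re.findall('a*',s)   # zero, one, or more consecutive a's; an empty string wherever there is no a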
>>> ['a', '', '', '', 'aa', '', '', '', '', '', '', '', '', '']  
re.findall('abcc*',s)
>>> ['abc', 'abc'] 

+
Description: Matches 1 or more (greedy) repetitions of the preceding RE.
Note: matches one or more repetitions of the preceding character or group.
re.findall('a+',s)   # one or more consecutive a's
>>> ['a', 'aa']
re.findall('abcc+',s)
>>> []

?
Description: Matches 0 or 1 (greedy) of the preceding RE.
Note: matches 0 or 1 occurrence of the preceding character or group.
re.findall('a?',s)   # zero or one a
>>> ['a', '', '', '', 'a', 'a', '', '', '', '', '', '', '', '', '']
re.findall('abcc?',s)
>>> ['abc', 'abc']

*?, +?, ??
Description: Non-greedy versions of the previous three special characters.
Note: a trailing ? turns off greedy matching, so as few characters as possible are matched.
re.findall('a*?b',s)
>>> ['ab', 'aab']
re.findall('a+?b',s)
>>> ['ab', 'aab']
re.findall('a??b',s)
>>> ['ab', 'ab']
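
The difference between greedy and non-greedy quantifiers shows more clearly when the text after the quantifier occurs more than once. A minimal sketch on a made-up sample string:

```python
import re

text = '<p>one</p><p>two</p>'

# Greedy: .* runs to the last '</p>' it can reach.
greedy = re.findall(r'<p>.*</p>', text)

# Non-greedy: .*? stops at the first '</p>' that lets the match succeed.
lazy = re.findall(r'<p>.*?</p>', text)

print(greedy)
print(lazy)
```

The greedy pattern returns the whole string as one match, while the non-greedy one returns the two paragraphs separately.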

{m,n}
Description: Matches from m to n repetitions of the preceding RE.
Note: matches the preceding character or group repeated m to n times.
re.findall('abc{0,0}',s)
>>> ['ab', 'ab']
re.findall('abc{0,1}',s)
>>> ['abc', 'abc']
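
{m,n}?
Description: Non-greedy version of the above.
Note: a trailing ? turns off greedy matching, so the fewest repetitions that still allow a match are used.
re.findall('abc{0,1}?',s)
>>> ['ab', 'ab']
re.findall('abc{0,1}?d',s)
>>> ['abcd']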

\
Description: Either escapes special characters or signals a special sequence.
Note: introduces an escaped special character or a special sequence.
re.findall('\\n',s)
>>> ['\n']
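
[]
Description: Indicates a set of characters. A "^" as the first character indicates a complementing set.
Note: each character inside [] is matched individually, and special characters such as * lose their special meaning there. With ^ as the first character the set is negated, so every character not listed after ^ is matched.
re.findall('[a*b\n]',s)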
>>> ['a', 'b', 'a', 'a', 'b', '\n']
re.findall('[^a*b\n]',s)
>>> ['c', 'd', 'c', ' ', ' ', 'r', 'e', 'g']

|
Description: A|B, creates an RE that will match either A or B.
Note: "or"; matches any one of several alternative patterns.
re.findall('a*?b|\\n',s)
>>> ['ab', 'aab', '\n']
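
(...)
Description: Matches the RE inside the parentheses. The contents can be retrieved or matched later in the string.
Note: matches the pattern inside () and stores it as a group. When a pattern contains groups, findall returns only the groups (as tuples), not the text outside them.
re.findall('(a*)(bc+)(d?)',s)
>>> [('a', 'bc', 'd'), ('aa', 'bc', '')]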

(?aiLmsux)
Description: The letters set the corresponding flags defined below.

(?:...)
Description: Non-grouping version of regular parentheses.
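
The two entries above have no examples yet. A minimal sketch of both on the same s: an inline (?i) flag at the start of the pattern, and the effect of (?:...) on what findall returns. Variable names are illustrative.

```python
import re

s = 'abcdaabc \n reg'

# (?i) at the start of the pattern switches on IGNORECASE for the whole pattern.
ignore_case = re.findall('(?i)ABC', s)

# With a capturing group, findall returns only the group's text.
capturing = re.findall('(a+)bc', s)

# With (?:...), the parentheses only group, so the whole match is returned.
non_capturing = re.findall('(?:a+)bc', s)

print(ignore_case, capturing, non_capturing)
```

The capturing version yields ['a', 'aa'], the non-capturing version ['abc', 'aabc'].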

(?P<name>...)
Description: The substring matched by the group is accessible by name.
Note: the text matched by the group can be accessed through the group's name.
re.search('(?P<TEST>.*)',s).groupdict()
>>> {'TEST': 'abcdaabc '}
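
(?P=name)
Description: Matches the text matched earlier by the group named name.
Note: matches the same text that the named group matched earlier. In the example the group must be followed by an exact copy of itself, which here is only possible for the empty string.
re.search('(?P<abc>.*)(?P=abc)',s).groupdict()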
>>> {'abc': ''}
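
(?#...)
Description: A comment; ignored.
Note: a comment; its content is not part of the pattern.
re.search(r'(\w*)',s).group()
>>> abcdaabc
re.search(r'(?#Comment:\w*)',s).group()   # the pattern is effectively empty
>>>    (empty string)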
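
(?=...)
Description: Matches if ... matches next, but doesn't consume the string.
Note: lookahead; the match must start where ... matches, and because nothing is consumed those characters can be part of the result.
re.findall(r'(?=a)\w*',s)    # starts at an a, which is included
>>> ['abcdaabc']
re.findall(r'(?<=a)\w*',s)   # for comparison, lookbehind: starts right after an a
>>> ['bcdaabc']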
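
(?!...)
Description: Matches if ... doesn't match next.
Note: negative lookahead; the match must start where ... does not match.
re.findall(r'(?! )\w*',s)    # must not start at a space
>>> ['abcdaabc', '', 'reg', '']
re.findall(r'(?<! )\w*',s)   # for comparison, negative lookbehind: must not start right after a space
>>> ['abcdaabc', '', '', 'eg', '']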

(?<=...)
Description: Matches if preceded by ... (must be fixed length).
Note: lookbehind; the match starts right after ..., which is not included in the result.
re.findall('(?<= )',s)  # positions preceded by a space
>>> ['', '']
re.findall(r'(?<= )\w*',s) # preceded by a space, followed by digits, letters, or underscores
>>> ['', 'reg']

(?<!...)
Description: Matches if not preceded by ... (must be fixed length).
Note: negative lookbehind; the match starts at a position that is not preceded by ....
re.findall('(?<! )',s)  # positions not preceded by a space
>>> ['', '', '', '', '', '', '', '', '', '', '', '', '']
re.findall(r'(?<! )\w*',s) # not preceded by a space, followed by digits, letters, or underscores
>>> ['abcdaabc', '', '', 'eg', '']
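
The four lookaround forms are easier to compare on text where the surrounding context matters. A minimal sketch on a made-up string; the variable names are illustrative:

```python
import re

text = 'price: $30, cost: $45, 60 units'

# Lookbehind: digits that come right after a dollar sign.
after_dollar = re.findall(r'(?<=\$)\d+', text)

# Lookahead: digits that are followed by ' units'.
before_units = re.findall(r'\d+(?= units)', text)

# Negative lookahead: whole numbers not followed by ' units'.
# The \b anchors stop the engine from backtracking to a partial number like '6'.
not_before_units = re.findall(r'\b\d+\b(?! units)', text)

# Negative lookbehind: digits not preceded by '$' or another digit.
not_after_dollar = re.findall(r'(?<![$\d])\d+', text)

print(after_dollar, before_units, not_before_units, not_after_dollar)
```

In every case the lookaround text itself is checked but never included in the returned match.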

(?(id/name)yes|no)
Description: Matches yes pattern if the group with id/name matched, the (optional) no pattern otherwise.
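
The conditional form has no example yet. One possible sketch, with a made-up pattern: a number that may be wrapped in parentheses, where the closing parenthesis is required only if the opening one was matched.

```python
import re

# Group 1 is an optional '('. The conditional (?(1)\)) demands a ')'
# only when group 1 took part in the match; there is no "no" branch here.
pattern = re.compile(r'(\()?\d+(?(1)\))')

results = {text: bool(pattern.fullmatch(text))
           for text in ['123', '(123)', '(123', '123)']}
print(results)
```

Only '123' and '(123)' match; the two unbalanced strings are rejected.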
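
3. Special sequences

The special sequences consist of "\\" and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character.

Each entry gives the sequence, the description from the re module help, a note, and examples. The examples assume s = '0123456789 this is regular tist express!' unless an entry sets s itself.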
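
\number
Description: Matches the contents of the group of the same number.
Note: () splits the match into groups; \num refers back to the text matched by group number num.
s = '122211'   # for this entry only
re.findall(r'(\d{1})(\d{1})(\d{1})',s)    # 3 groups of 1 digit each
>>> [('1', '2', '2'), ('2', '1', '1')]
re.findall(r'(\d{1})(\d{1})(\1)',s)        # 3 groups of 1 digit each; group 3 must equal group 1
>>> [('2', '2', '2')]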
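
A common use of numbered backreferences is finding and collapsing doubled words. A minimal sketch on a made-up sentence:

```python
import re

text = 'this is is a test test'

# (\w+) captures a word; \1 requires the same word again after one space.
doubled = re.findall(r'\b(\w+) \1\b', text)

# In the replacement string, \1 inserts the captured word once.
cleaned = re.sub(r'\b(\w+) \1\b', r'\1', text)

print(doubled)
print(cleaned)
```

The \b anchors keep the pattern from matching inside longer words, for example the "is" inside "this".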

\A
Description: Matches only at the start of the string.
Note: anchors the pattern at the start of the string, like ^.
re.findall(r'\A.',s)
>>> ['0']

\Z
Description: Matches only at the end of the string.
Note: anchors the pattern at the end of the string, like $.
re.findall(r'.\Z',s)
>>> ['!']

\b
Description: Matches the empty string, but only at the start or end of a word.
Note: matches at a word boundary, for example between a word and a space.
re.findall(r'is\b',s)
>>> ['is', 'is']
re.findall(r'\bis\b',s)
>>> ['is']
re.findall(r'.is\b',s)
>>> ['his', ' is']

\B
Description: Matches the empty string, but not at the start or end of a word.
Note: matches at a position that is not a word boundary, that is, inside a word.
re.findall(r'.is\B.',s)
>>> ['tist']

\d
Description: Matches any decimal digit; equivalent to the set [0-9] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode digits.
Note: matches any digit; equivalent to [0-9] for ASCII text.
re.findall(r'\d',s)
>>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

\D
Description: Matches any non-digit character; equivalent to [^\d].
Note: matches any character that is not a digit.
re.findall(r'\D',s)
>>> [' ', 't', 'h', 'i', 's', ' ', 'i', 's', ' ', 'r', 'e', 'g', 'u', 'l', 'a', 'r', ' ', 't', 'i', 's', 't', ' ', 'e', 'x', 'p', 'r', 'e', 's', 's', '!']

\s
Description: Matches any whitespace character; equivalent to [ \t\n\r\f\v] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode whitespace characters.
Note: matches any whitespace character; equivalent to [ \t\n\r\f\v] for ASCII text.
re.findall(r'\s',s)
>>> [' ', ' ', ' ', ' ', ' ']
re.findall(r'\s[a-z]*',s)
>>> [' this', ' is', ' regular', ' tist', ' express']

\S
Description: Matches any non-whitespace character; equivalent to [^\s].
Note: matches any character that is not whitespace.
re.findall(r'\S',s)
>>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 't', 'h', 'i', 's', 'i', 's', 'r', 'e', 'g', 'u', 'l', 'a', 'r', 't', 'i', 's', 't', 'e', 'x', 'p', 'r', 'e', 's', 's', '!']

\w
Description: Matches any alphanumeric character; equivalent to [a-zA-Z0-9_] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the range of Unicode alphanumeric characters (letters plus digits plus underscore). With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale.
Note: matches letters, digits, and the underscore.
re.findall(r'\s\w*',s)
>>> [' this', ' is', ' regular', ' tist', ' express']
re.findall(r'\w',s)
>>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 't', 'h', 'i', 's', 'i', 's', 'r', 'e', 'g', 'u', 'l', 'a', 'r', 't', 'i', 's', 't', 'e', 'x', 'p', 'r', 'e', 's', 's']

\W
Description: Matches the complement of \w.
Note: matches any character that is not a letter, digit, or underscore.
re.findall(r'\W',s)
>>> [' ', ' ', ' ', ' ', ' ', '!']
re.findall(r'\S\W',s)
>>> ['9 ', 's ', 's ', 'r ', 't ', 's!']

\\
Description: Matches a literal backslash.
Note: matches a backslash character.
s = 'abc\\'   # for this entry only
re.findall(r'\\',s)
>>> ['\\']
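
Notes

Greedy: Greedy means that it will match as many repetitions as possible.
pattern: the pattern string, that is, the regular expression to match.
flags: control how the regular expression is matched:
1. re.I (re.IGNORECASE): ignore case
2. re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line
3. re.S (re.DOTALL): changes the behavior of '.' so that it also matches a newline
4. re.L (re.LOCALE): makes \w \W \b \B and case-insensitive matching depend on the current locale (bytes patterns only)
5. re.U (re.UNICODE): makes \w \W \b \B \s \S \d \D follow Unicode character properties (already the default for str patterns in Python 3)
6. re.X (re.VERBOSE): verbose mode; the pattern may span several lines, whitespace is ignored, and comments can be added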
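
A minimal sketch of the most commonly used flags, on a made-up two-line string. Flags are combined with the | operator; all names below are illustrative.

```python
import re

text = 'First line\nsecond LINE'

plain = re.findall('line', text)
ignore_case = re.findall('line', text, flags=re.I)

# Without re.M, ^ matches only at the very start of the string.
first_words = re.findall(r'^\w+', text)
first_words_multiline = re.findall(r'^\w+', text, flags=re.M)

# Without re.S, '.' refuses to match the newline after 'line'.
dot_default = re.findall('line.', text)
dot_all = re.findall('line.', text, flags=re.S)

# re.X allows whitespace and comments inside the pattern.
two_words = re.compile(r'''
    (\w+)   # first word
    \s+     # separating whitespace
    (\w+)   # second word
''', re.X)
pair = two_words.match(text).groups()

# Several flags can be combined.
lines = re.findall(r'^\w+ line$', text, flags=re.I | re.M)

print(plain, ignore_case, first_words, first_words_multiline)
print(dot_default, dot_all, pair, lines)
```

Passing flags by keyword avoids confusing them with the count or maxsplit arguments of sub and split.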
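\: Python string literals process backslash escapes themselves, so patterns that use '\' should be written as raw strings such as r'\b' (add the r prefix). That way the backslash reaches the regular expression engine unchanged.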
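
The effect of the r prefix can be checked directly. A minimal sketch, assuming a short sample string:

```python
import re

s = 'this is regular'

# In a normal string literal '\b' is a single backspace character;
# in a raw string it stays as backslash + 'b'.
normal_len = len('\b')
raw_len = len(r'\b')

# Without r, the pattern looks for backspace + 'is' + backspace.
without_raw = re.findall('\bis\b', s)

# With r, \b is the word-boundary sequence.
with_raw = re.findall(r'\bis\b', s)

# Doubling each backslash is equivalent, but harder to read.
doubled = re.findall('\\bis\\b', s)

print(normal_len, raw_len, without_raw, with_raw, doubled)
```

Sequences such as '\d' or '\w' are not valid string escapes, so recent Python versions warn about them in non-raw literals; raw strings avoid both problems.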
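  • Example: match an ID card number by groups, ID number 61052420220129128X
    An ID card number consists of 18 characters, which mean:
      1) Digits 1-2: code of the province
      2) Digits 3-4: code of the city
      3) Digits 5-6: code of the district or county
      4) Digits 7-14: date of birth; digits 7-10 are the year, 11-12 the month, 13-14 the day
      5) Digits 15-16: code of the local police station
      6) Digit 17: gender; odd means male, even means female
      7) Digit 18: check digit; a digit from 0 to 9, or sometimes X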
import re

s = '61052420220129128X'

# Named groups, in order: province, city, county, year, month, day,
# police station, gender, check digit (a digit or a non-digit such as X).
pattern = r'(?P<省>\d{2})(?P<市>\d{2})(?P<县>\d{2})(?P<年>\d{4})(?P<月>\d{2})(?P<日>\d{2})(?P<派出所>\d{2})(?P<性别>\d{1})(?P<校验码>\d{1}|\D{1})'

# findall returns a list of tuples, one tuple of groups per match
idcard = re.findall(pattern, s)
print("FindAll: ",idcard)

# groupdict maps each group name to the text it matched
idcard = re.search(pattern, s).groupdict()
print("Search1: ",idcard)

print("Search2: ")
for i in idcard.items():
    print(i[0],": ",i[1])

Output:
FindAll:  [('61', '05', '24', '2022', '01', '29', '12', '8', 'X')]

Search1:  {'省': '61', '市': '05', '县': '24', '年': '2022', '月': '01', '日': '29', '派出所': '12', '性别': '8', '校验码': 'X'}

Search2: 
省 :  61
市 :  05
县 :  24
年 :  2022
月 :  01
日 :  29
派出所 :  12
性别 :  8
校验码 :  X
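
The example above extracts the fields but accepts longer strings and impossible dates. One possible stricter variant is sketched below; the function name parse_id and the English group names are hypothetical, the check digit is only tested for its form (digit or X), and its value is not verified.

```python
import re
from datetime import date

# Same field layout as above, with English group names (hypothetical).
ID_PATTERN = re.compile(
    r'(?P<province>\d{2})(?P<city>\d{2})(?P<county>\d{2})'
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})'
    r'(?P<station>\d{2})(?P<gender>\d)(?P<check>[\dX])'
)

def parse_id(number):
    """Return the fields of an 18-character ID number, or None if malformed."""
    # fullmatch rejects strings with extra or missing characters.
    m = ID_PATTERN.fullmatch(number)
    if m is None:
        return None
    fields = m.groupdict()
    try:
        # date() raises ValueError for impossible dates such as month 13.
        date(int(fields['year']), int(fields['month']), int(fields['day']))
    except ValueError:
        return None
    # Digit 17: odd means male, even means female.
    fields['gender'] = 'male' if int(fields['gender']) % 2 else 'female'
    return fields

print(parse_id('61052420220129128X'))
print(parse_id('61052420221329128X'))   # month 13
print(parse_id('6105242022012912'))     # too short
```

Compiling the pattern once and using fullmatch keeps the validation in one place; the last two calls return None.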
