Regular Expressions with the re Module in Python -- 688IT编程网


Using regular expressions in Python

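1. Escape characters

Escaping in regular expressions:

'\(' matches a literal parenthesis.

[()+*/?&.] Inside a character class, most special characters lose their special meaning and match themselves.

\s \d \w \S \D \W \n \t keep their usual meaning even inside a character class.

A hyphen is an ordinary minus sign only when it is the first (or last) character of a character class, as in [-].

Anywhere else it denotes a range, as in [1-9]; to match a literal hyphen in such a position, escape it: [1\-9].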
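A quick way to check these rules is to run a few patterns through re.findall. The sample strings in this sketch are made up; each comment states what the call returns.

```python
import re

# Outside a character class, ( must be escaped to match a literal parenthesis.
paren = re.findall(r'\(', 'f(x)')                # ['(']
# Inside [...], metacharacters such as ( ) + * ? . match themselves.
literal = re.findall(r'[()+*?.]', 'a+b*(c)?.')   # ['+', '*', '(', ')', '?', '.']
# \d keeps its meaning inside a class, so [\d.] means "digit or dot".
numbers = re.findall(r'[\d.]+', 'v1.25 and 3')   # ['1.25', '3']
# A hyphen is literal at the start of a class or when escaped...
first = re.findall(r'[-19]', '1-5-9')            # ['1', '-', '-', '9']
escaped = re.findall(r'[1\-9]', '1-5-9')         # ['1', '-', '-', '9']
# ...but between two characters it forms a range.
ranged = re.findall(r'[1-9]', '1-5-9')           # ['1', '5', '9']

print(paren, literal, numbers, first, escaped, ranged)
```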

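Escape characters in Python strings

Analysis:

'\n'  # \ is Python's escape character: it gives n a special meaning, so '\n' is a newline character
print('\\n')  # \\ is an escaped backslash, so this prints \n
print(r'\n')  # the r prefix (raw string) turns escaping off, so this also prints \n

Double escaping: to match the two characters \n literally, the regex must be \\n; written as an ordinary Python string literal, that becomes '\\\\n'.

Conclusion:

In Python, write patterns as raw strings: r'\\n' is the regex that matches a literal \n, and r'\n' is the regex that matches a newline. With the r prefix, the pattern text reaches the regex engine unchanged.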
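The difference is easiest to see side by side. In this sketch the sample string is an assumption: it contains a literal backslash followed by n, not a newline.

```python
import re

text = r'a\nb'   # raw string: 'a', backslash, 'n', 'b' (4 characters)

# Both literals below spell the same 3-character regex: \\n
assert '\\\\n' == r'\\n'
m1 = re.search('\\\\n', text)   # ordinary string: four backslashes
m2 = re.search(r'\\n', text)    # raw string: two backslashes

# The regex \n (given as a raw string) matches a real newline character.
m3 = re.search(r'\n', 'a\nb')
m4 = re.search(r'\n', text)     # None: text contains no newline

print(m1.span(), m2.span(), m3.span(), m4)   # (1, 3) (1, 3) (1, 2) None
```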

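2. The re module

Preparation:

The flags argument accepts several values:

re.I (IGNORECASE): ignore case; the name in parentheses is the full form.

re.M (MULTILINE): multiline mode; changes the behavior of ^ and $ so that they also match at the start and end of every line.

re.S (DOTALL): . matches any character, including a newline.

re.L (LOCALE): locale-aware matching; \w, \W, \b, \B and case-insensitive matching depend on the current locale. Not recommended; since Python 3.6 it can only be used with bytes patterns.

re.U (UNICODE): \w \W \s \S \d \D (and \b \B) follow Unicode character properties. This is the default for str patterns in Python 3.

re.X (VERBOSE): verbose mode; the pattern may span multiple lines, whitespace in it is ignored, and comments can be added.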
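A small sketch that exercises the most common flags on one hypothetical two-line string; the date pattern used for re.X is likewise only an illustration.

```python
import re

text = 'Hello\nhello world'

# re.I: case-insensitive matching.
ignore_case = re.findall(r'hello', text, re.I)        # ['Hello', 'hello']
# re.M: ^ matches at the start of every line, not only of the string.
single = re.findall(r'^\w+', text)                    # ['Hello']
multi = re.findall(r'^\w+', text, re.M)               # ['Hello', 'hello']
# re.S: . also matches the newline.
no_dotall = re.findall(r'o.h', text)                  # []
dotall = re.findall(r'o.h', text, re.S)               # ['o\nh']
# re.X: whitespace and # comments inside the pattern are ignored.
date = re.compile(r'''
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
''', re.X)
parts = date.findall('2021-07')                       # [('2021', '07')]
# Several flags can be combined with |.
combined = re.findall(r'^h\w+', text, re.I | re.M)    # ['Hello', 'hello']

print(ignore_case, single, multi, no_dotall, dotall, parts, combined)
```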

1) Matching methods: findall(), search(), match()

(1)findall()

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

Usage: returns every match in a list; if nothing matches, returns an empty list.

Example:

import re
res = re.findall(r'\w+', r'Jake@tom')
print(res)

Output:

['Jake', 'tom']
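The docstring above also mentions capturing groups. This sketch, with invented sample strings, shows the empty-list case, the tuple case for two groups, and passing flags.

```python
import re

# No match: findall returns an empty list, never None.
empty = re.findall(r'\d+', 'no digits here')          # []
# Two capturing groups: each element is a tuple of the groups.
pairs = re.findall(r'(\w+)=(\d+)', 'a=1, b=22')        # [('a', '1'), ('b', '22')]
# flags go in the third argument.
names = re.findall(r'tom', 'Jake@Tom', re.I)           # ['Tom']

print(empty, pairs, names)
```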

(2)search()

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

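Usage: scans the whole string and returns a match object for the first match; call the object's group() method to get the matched text. Returns None if nothing matches.

Example:

import re
res1 = re.search(r'\d+', r'222T333')
print('search', res1)
print(res1.group())

Output (Python 3.7 and later print <re.Match object; ...> instead of <_sre.SRE_Match object; ...>):

search <_sre.SRE_Match object; span=(0, 3), match='222'>
222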
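Because search() returns None when nothing matches, calling group() on the result directly can raise AttributeError. A minimal defensive pattern; first_number is a hypothetical helper name, not part of re.

```python
import re

def first_number(s):
    """Return the first run of digits in s, or None (hypothetical helper)."""
    m = re.search(r'\d+', s)
    # Calling .group() on None raises AttributeError, so check first.
    return m.group() if m is not None else None

print(first_number('abc123def456'))   # 123
print(first_number('abcdef'))         # None
```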

(3)match()

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

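Usage: the same as search(), except that match() only tries to match at the beginning of the string.

Example:

import re
res2 = re.match(r'\d+', r'222T333')  # unlike search, only matches at the start of the string
print('match', res2)
print(res2.group())

Output:

match <_sre.SRE_Match object; span=(0, 3), match='222'>
222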
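The example above cannot show the difference from search(), since its string starts with digits. With a made-up string that starts with a letter, as in this sketch, the two functions diverge:

```python
import re

s = 'T222'
# match() only tries position 0, where 'T' is not a digit.
m = re.match(r'\d+', s)           # None
# search() scans forward and finds the digits.
found = re.search(r'\d+', s)      # matches '222' at span (1, 4)
# search() with a leading ^ behaves like match() here.
anchored = re.search(r'^\d+', s)  # None

print(m, found.group(), anchored)
```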

2) Splitting and substitution: sub(), subn(), split()

(1)sub()

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

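Usage: replaces every substring matched by the regular expression (pattern) with the new string (repl); string is the target string, and count limits the number of replacements (0, the default, replaces all).

Example:

import re
res = re.sub(r'\d+', 'SSS', r'222XXX333V3')
print(res)

Output:

SSSXXXSSSVSSS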
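The docstring mentions two things the example does not use: count, and a callable repl. A short sketch on the same sample string; the doubling function is only an illustration.

```python
import re

s = '222XXX333V3'
# count=1: replace only the first match.
once = re.sub(r'\d+', 'SSS', s, count=1)                        # 'SSSXXX333V3'
# In a string repl, \1 inserts the text of group 1.
wrapped = re.sub(r'(\d+)', r'<\1>', s)                          # '<222>XXX<333>V<3>'
# repl can be a function; it receives each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), s)  # '444XXX666V6'

print(once, wrapped, doubled)
```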

(2)subn()

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl. number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

Usage: the same as sub(), but returns a tuple containing the new string and the number of replacements made.

Example:

import re
res = re.subn(r'\d+', 'SSS', r'222XXX333V3')
print(res)

Output:

('SSSXXXSSSVSSS', 3)
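Since subn() returns a tuple, it is convenient to unpack it. In this sketch the whitespace-cleanup case is only an illustration of where the count is useful.

```python
import re

# Unpack the (new_string, number) tuple directly.
text, n = re.subn(r'\d+', 'SSS', '222XXX333V3', count=2)
print(text, n)   # SSSXXXSSSV3 2

# Collapse runs of whitespace and report how many runs were replaced.
clean, changes = re.subn(r'\s+', ' ', 'a   b \t c')
print(repr(clean), changes)   # 'a b c' 2
```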

(3)split()

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings. If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list. If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)

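Usage: splits the target string at every substring matched by the regular expression and returns the pieces in a list. maxsplit limits the number of splits (0, the default, means no limit); whatever remains unsplit becomes the last element.

Example:

import re
res = re.split(r'\d+', r'333FF444FF44')
print(res)

Output:

['', 'FF', 'FF', '']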
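The empty strings in that output come from the digits at the very start and end of the string. A sketch of maxsplit and of dropping the empty pieces:

```python
import re

s = '333FF444FF44'
# maxsplit=1: split at the first match only; the rest stays whole.
once = re.split(r'\d+', s, maxsplit=1)            # ['', 'FF444FF44']
# A match at the very start or end leaves an empty string at that end;
# filter those out if they are not wanted.
pieces = [p for p in re.split(r'\d+', s) if p]    # ['FF', 'FF']

print(once, pieces)
```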

3) Advanced: compile() and finditer()

(1) compile(): time efficiency

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

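Usage: compiles a regular expression into a pattern object.

Purpose: saves time, but only when the same regular expression is used many times. (The module-level functions also cache recently compiled patterns, so the gain is usually modest.)

Example:

import re
res = re.compile(r'\d+')
print(res)

Output:

re.compile('\\d+')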
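Printing the pattern object is not very informative by itself; the benefit shows when the object is reused. A sketch with made-up input lines:

```python
import re

# Compile once; the pattern object offers findall, search, match, sub,
# split and finditer without the pattern argument.
digits = re.compile(r'\d+')
lines = ['a1b22', 'no digits', '333']
found = [digits.findall(line) for line in lines]   # [['1', '22'], [], ['333']]
replaced = digits.sub('#', 'a1b22')                # 'a#b#'
# Flags are given once, at compile time.
tom = re.compile(r'tom', re.I).search('Jake@Tom')  # matches 'Tom'

print(found, replaced, tom.group())
```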

(2) finditer(): space efficiency

def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string. For each match, the iterator returns a match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)

Usage: matches the regular expression against the string and returns an iterator; each element is a match object whose group() method returns the matched text.

Example:

import re
res = re.finditer(r'\d+', r'sss444ff333f')
print(res)
for r in res:
    print(r, '--------', r.group())

Output (the object address varies from run to run):

<callable_iterator object at 0x106f29668>
<_sre.SRE_Match object; span=(3, 6), match='444'> -------- 444
<_sre.SRE_Match object; span=(8, 11), match='333'> -------- 333
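The memory advantage matters for large inputs, or when only the first few matches are needed. This sketch repeats the sample string to simulate a large input; the cutoff of three matches is arbitrary.

```python
import re

text = 'sss444ff333f' * 100000   # a large, made-up input

# finditer yields match objects one at a time, so the loop can stop early
# without building the complete list that findall would create.
first_three = []
for m in re.finditer(r'\d+', text):
    first_three.append((m.group(), m.start()))
    if len(first_three) == 3:
        break

print(first_three)   # [('444', 3), ('333', 8), ('444', 15)]
```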

3. Advanced regular expressions (important)

Combining groups with the re module

1) Groups () with findall() and finditer()

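import re
# findall gives priority to the contents of groups: it returns what the group matched.
# Adding ?: right after the opening parenthesis cancels this priority.
# (?:regex) is a non-capturing group.
# With one group, findall returns a list of that group's matches; with two or more groups,
# the groups of each match are combined into a tuple and the tuples are returned in a list.
res = re.findall(r'<(\w+)>', r'<a>我爱你中国</a><h1>亲爱的母亲</h1>')
print(res)
res = re.findall(r'<(?:\w+)>', r'<a>我爱你中国</a><h1>亲爱的母亲</h1>')
print(res)

Output:

['a', 'h1']
['<a>', '<h1>']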

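# finditer does not give groups priority: group() returns the whole match,
# and group(n) or group(name) returns the value of a group.
import re
res = re.finditer(r'<(?:\w+)>', r'<a>我爱你中国</a><h1>亲爱的母亲</h1>')
for i in res:
    print(i.group())

Output:

<a>
<h1>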
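With a capturing group instead of (?:...), finditer still yields whole matches, and the group is available separately. The HTML-like string here is a made-up sample.

```python
import re

html = '<a>hello</a><h1>world</h1>'
results = []
for m in re.finditer(r'<(\w+)>', html):
    # group() is the whole match; group(1) is the first capturing group.
    results.append((m.group(), m.group(1)))

print(results)   # [('<a>', 'a'), ('<h1>', 'h1')]
```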

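2) Named groups and search()

# (?P<name>regex) gives the group a name.

# (?P=name) refers back to that group; the text matched here must be exactly the same as the text the group matched.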

<1>

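import re
# search returns the first match it finds.
# Call .group() on the result to get the matched text.
# When the pattern has groups, pass a number to group(): 1 returns the first group, 2 the second, and so on.
# groups() returns all groups as a tuple.
res = re.search(r'<(\w+)>(\w+)</(\w+)>', r'<a>hello</a>')
print(res.group())
print(res.group(1))
print(res.group(2))
print(res.group(3))
print(res.groups())

Output:

<a>hello</a>
a
hello
a
('a', 'hello', 'a')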

<2>

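import re
res = re.search(r'<(?P<name>\w+)>\w+</(?P=name)>', r'<a>hello</a>')
print(res.group('name'))
print(res.group())
# <a> and </h1> do not match each other, so the first pair is skipped and the second <a>hello</a> is found.
res = re.search(r'<(?P<name>\w+)>\w+</(?P=name)>', r'<a>hello</h1><a>hello</a>')
print(res.group('name'))

Output:

a
<a>hello</a>
a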

<3>

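import re
# Without (?P=name), the closing tag does not have to match the opening tag.
res = re.search(r'<(?P<tt>\w+)>(?P<cc>\w+)</\w+>', r'<a>hello</h1><a>hello</a>')
print(res.group('tt'))
print(res.group('cc'))
print(res.group())

Output:

a
hello
<a>hello</h1>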
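Besides group('name'), a match object can return all named groups at once with groupdict(). A sketch with a hypothetical key=value string; the group names key and value are illustrative.

```python
import re

m = re.search(r'(?P<key>\w+)=(?P<value>\d+)', 'id=42')
print(m.group('key'), m.group('value'))   # id 42
# groupdict() returns every named group as a dict.
print(m.groupdict())                       # {'key': 'id', 'value': '42'}
# Named groups are numbered too, so group(1) is the same group as 'key'.
print(m.group(1) == m.group('key'))        # True
```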

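3) Referring to groups by number

import re
# \1 refers to the first group; the text matched here must be exactly the same as what the first group matched.
res = re.search(r'<(\w+)>\w+</\1>', r'<a>hello</a>')
print(res.group(1))
print(res.group())

Output:

a
<a>hello</a>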
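Numbered backreferences are useful beyond HTML tags, for instance to find a word typed twice. The sample sentence is invented; note that the raw-string rule from section 1 matters here.

```python
import re

# \1 must repeat exactly what group 1 matched, so this finds a doubled word.
m = re.search(r'\b(\w+) \1\b', 'this is is a test')
print(m.group())    # is is
print(m.group(1))   # is

# Without the r prefix, '\1' is the control character chr(1), not a backreference.
print('\1' == chr(1))   # True
```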

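4) Groups and split()

If the pattern contains a capturing group, the text matched by the group (the separators) is kept in the resulting list.

import re
ret = re.split(r'(\d+)', 'Tom18Jake20Json22')
print(ret)

Output:

['Tom', '18', 'Jake', '20', 'Json', '22', '']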
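To see the effect of the group, compare the same split with plain, capturing and non-capturing patterns:

```python
import re

s = 'Tom18Jake20Json22'
plain = re.split(r'\d+', s)               # ['Tom', 'Jake', 'Json', '']
kept = re.split(r'(\d+)', s)              # ['Tom', '18', 'Jake', '20', 'Json', '22', '']
# A non-capturing group does not keep the separators.
non_capturing = re.split(r'(?:\d+)', s)   # ['Tom', 'Jake', 'Json', '']

print(plain, kept, non_capturing)
```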

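Summary:

# Using regular expressions in Python
# Escape characters: the regex escape \ versus Python's string escape \
# The re module
#   findall search match
#   sub subn split
#   compile finditer
# Regular expressions in Python
#   findall gives priority to group contents; to cancel this, use (?:regex)
#   split keeps the separators matched by a group
#   search: if the pattern has groups, group(n) returns the text matched by group n
# Advanced regular expressions
#   Named groups
#     (?P<name>regex) gives the group a name
#     (?P=name) refers to that group; the text matched here must be exactly the same as the group's
#   Referring to groups by number
#     \1 refers to the first group; the text matched here must be exactly the same as the first group's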
