Python: re.sub in Python explained in detail (688IT编程网)

The definition is:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. For example:

>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'

>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)

'Baked Beans & Spam'

The pattern may be a string or an RE object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous match.

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2.

Changed in version 2.7: Added the optional flags argument.
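The \g<number> form matters mostly when a digit follows the group reference. A minimal sketch with made-up input strings (the variable names are only for illustration); it also uses \g<0>, which stands for the whole match:

```python
import re

# \g<2>0 means "group 2, then a literal 0"; \20 would be read as group 20.
unambiguous = re.sub(r"(a)(b)", r"\g<2>0", "ab")

# \g<0> stands for the whole substring matched by the pattern.
wrapped = re.sub(r"\w+", r"[\g<0>]", "hello world")

print(unambiguous)  # b0
print(wrapped)      # [hello] [world]
```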

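What re.sub does

re is short for regular expression.

sub is short for substitute, meaning replacement.

re.sub is the regular-expression function for replacing text. It is more powerful than the plain string replace method, because it looks for a pattern rather than a fixed substring.

The simplest possible example: if the input string is

inputStr = "hello 111 world 111"

then

replacedStr = inputStr.replace("111", "222")

turns it into

"hello 222 world 222"

But if the input string is

inputStr = "hello 123 world 456"

and you want to replace both 123 and 456 with 222 (or handle something more complicated still), a single replace call cannot do it, because the two substrings differ.

That is where re.sub comes in, with a regular expression describing what to replace:

replacedStr = re.sub(r"\d+", "222", inputStr)

Real cases are often more complicated than this example, and re.sub is the tool for those replacements as well.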
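To make the comparison concrete, here is a small side-by-side sketch. The variable names are only for illustration:

```python
import re

input_str = "hello 123 world 456"

# str.replace only matches a fixed substring, so each number needs its own call.
by_replace = input_str.replace("123", "222").replace("456", "222")

# re.sub matches every run of digits with a single pattern.
by_sub = re.sub(r"\d+", "222", input_str)

print(by_replace)  # hello 222 world 222
print(by_sub)      # hello 222 world 222
```

The replace version only works because the numbers are known in advance; the re.sub version handles any digits.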

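So, in short, what re.sub does is this:

Given an input string, it uses a regular expression to carry out a (possibly complex) replacement and returns the resulting string.

re.sub also takes further parameters, such as count, which limits how many replacements are made.

Each parameter is explained in detail below.

The parameters of re.sub in detail

re.sub takes five parameters.

re.sub(pattern, repl, string, count=0, flags=0)

Three are required: pattern, repl, string

Two are optional: count, flags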

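First parameter: pattern

pattern is the regular expression to search for. There is not much to explain here.

One thing worth knowing:

A backslash followed by a number (\N) inside the pattern refers back to a matched group.

For example, \6 matches whatever the sixth group earlier in the pattern matched.

This means the pattern must already contain a sixth group before the point where you refer to it.

For example, suppose you want to process:

hello crifan, nihao crifan

where the two occurrences of crifan are guaranteed to be identical,

and you want to replace the whole string with crifanli.

This re.sub call does it:

inputStr = "hello crifan, nihao crifan"

replacedStr = re.sub(r"hello (\w+), nihao \1", "crifanli", inputStr)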
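The effect of \1 is easiest to see when the two names differ. A short sketch; the second input string is made up for contrast:

```python
import re

pattern = r"hello (\w+), nihao \1"

# Both names are identical, so \1 matches and the whole string is replaced.
same = re.sub(pattern, "crifanli", "hello crifan, nihao crifan")

# The names differ, so the pattern does not match and the string is unchanged.
different = re.sub(pattern, "crifanli", "hello crifan, nihao python")

print(same)       # crifanli
print(different)  # hello crifan, nihao python
```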

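Second parameter: repl

repl stands for replacement: it is the string that each match is replaced with.

repl can be a string or a function.

repl as a string

If repl is a string, any backslash escapes in it are processed.

That is:

\n is converted to a newline character;

\r is converted to a carriage return;

Escapes that are not recognized, such as \j, are left alone in Python 2, the version the definition quoted above comes from. Python 3.7 and later instead raise an error for an unknown escape made of a backslash and an ASCII letter.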
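A brief sketch of what this escape processing looks like in practice, using made-up inputs. The replacement is written as a raw string, so it is re.sub, not the Python parser, that interprets the backslash:

```python
import re

# re.sub receives backslash + "n" and converts it to a real newline character.
with_newline = re.sub(r", ", r"\n", "hello, world")

# \1 and \2 in the replacement refer to the groups of the pattern.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(repr(with_newline))  # 'hello\nworld'
print(swapped)             # world hello
```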

A backslash followed by g and a name in angle brackets, that is \g<name>, refers to a named group; \g<number> refers to a group by its number.

Continuing the example above, suppose that from

hello crifan, nihao crifan

you want to extract crifan, so that only

crifan

is left. This can be written as:

inputStr = "hello crifan, nihao crifan"
replacedStr = re.sub(r"hello (\w+), nihao \1", r"\g<1>", inputStr)
print("replacedStr=", replacedStr)  # crifan

The corresponding version with a named group is:

inputStr = "hello crifan, nihao crifan"
replacedStr = re.sub(r"hello (?P<name>\w+), nihao (?P=name)", r"\g<name>", inputStr)
print("replacedStr=", replacedStr)  # crifan
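Named groups pay off once a pattern has several groups. As a sketch, here is a hypothetical date rewrite; the input string and group names are invented for illustration:

```python
import re

# Hypothetical input: an ISO-style date to be rewritten as day/month/year.
date_str = "released on 2024-03-15"

reordered = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    date_str,
)

print(reordered)  # released on 15/03/2024
```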

repl as a function

An example:

Suppose the input is:

hello 123 world 456

and you want to add 111 to each number in it, giving:

hello 234 world 567

This can be written as:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re

def pythonReSubDemo():
    """
    demo Python re.sub
    """
    inputStr = "hello 123 world 456"

    def _add111(matched):
        # matched is the match object for one occurrence of the pattern
        intStr = matched.group("number")  # 123
        intValue = int(intStr)
        addedValue = intValue + 111  # 234
        addedValueStr = str(addedValue)
        return addedValueStr

    replacedStr = re.sub(r"(?P<number>\d+)", _add111, inputStr)
    print("replacedStr=", replacedStr)  # hello 234 world 567

if __name__ == "__main__":
    pythonReSubDemo()
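The same technique can be written more compactly. The sketch below uses group(0), the whole match, instead of a named group, and shows a lambda as an alternative; the function and variable names are illustrative only:

```python
import re

def add_111(match):
    """Return the matched number plus 111, as a string."""
    return str(int(match.group(0)) + 111)

result = re.sub(r"\d+", add_111, "hello 123 world 456")
print(result)  # hello 234 world 567

# The same idea with a lambda, here doubling each number instead.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), "hello 123 world 456")
print(doubled)  # hello 246 world 912
```

Whatever the function does, it has to return a string, which is why the result is wrapped in str().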

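Third parameter: string

string is the string to be processed, that is, the one in which replacements are made.

Nothing special to say about it.

Fourth parameter: count

count is the maximum number of matches to replace; the default, 0, means replace them all.

An example:

Continuing the previous example, suppose only some of the matches should be processed.

For instance, in

hello 123 world 456 nihao 789

you only want to process the first two numbers, 123 and 456, adding 111 to each, and leave 789 alone.

This can be written as:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re

def pythonReSubDemo():
    """
    demo Python re.sub
    """
    inputStr = "hello 123 world 456 nihao 789"

    def _add111(matched):
        intStr = matched.group("number")  # 123
        intValue = int(intStr)
        addedValue = intValue + 111  # 234
        addedValueStr = str(addedValue)
        return addedValueStr

    # count=2: only the first two matches are replaced
    replacedStr = re.sub(r"(?P<number>\d+)", _add111, inputStr, count=2)
    print("replacedStr=", replacedStr)  # hello 234 world 567 nihao 789

if __name__ == "__main__":
    pythonReSubDemo()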
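To see how different count values behave on the same input, here is a minimal sketch. The placeholder replacement "N" is arbitrary:

```python
import re

text = "hello 123 world 456 nihao 789"

first_only = re.sub(r"\d+", "N", text, count=1)
first_two = re.sub(r"\d+", "N", text, count=2)
all_of_them = re.sub(r"\d+", "N", text)  # count defaults to 0: replace all

print(first_only)   # hello N world 456 nihao 789
print(first_two)    # hello N world N nihao 789
print(all_of_them)  # hello N world N nihao N
```

Matches are always consumed from the left, so count=2 means the two leftmost matches.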

Fifth parameter: flags
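flags takes the flag constants of the re module, as in the re.IGNORECASE example in the definition quoted at the top. A small sketch of two commonly used flags, with made-up input strings:

```python
import re

# re.IGNORECASE: letters in the pattern match regardless of case.
ignore_case = re.sub(r"spam", "eggs", "Spam SPAM spam", flags=re.IGNORECASE)

# re.MULTILINE: ^ matches at the start of every line, not only of the string.
multi_line = re.sub(r"^", "> ", "line one\nline two", flags=re.MULTILINE)

# Flags are combined with the bitwise OR operator.
both = re.sub(r"^spam", "eggs", "Spam\nSPAM", flags=re.IGNORECASE | re.MULTILINE)

print(ignore_case)       # eggs eggs eggs
print(repr(multi_line))  # '> line one\n> line two'
print(repr(both))        # 'eggs\neggs'
```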

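Things to watch out for with re.sub

The replacement string, that is the repl parameter, is an ordinary string, not a pattern

Recall the syntax:

re.sub(pattern, repl, string, count=0, flags=0)

The second parameter is repl. Regular-expression syntax in it is not interpreted; only backslash escapes and group references are processed.

The pattern, on the other hand, is best written with the r prefix (a raw string), so that Python itself does not process the backslashes before the re module sees them:

r"xxxx"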
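A quick way to convince yourself of both points, sketched with an arbitrary input string:

```python
import re

# "[0-9]+" is a pattern in the first argument, but plain text in the second.
result = re.sub(r"[0-9]+", "[0-9]+", "abc 123")
print(result)  # abc [0-9]+

# A raw string keeps the backslash for the re module: r"\d" is two characters,
# the same as the escaped form "\\d".
print(r"\d" == "\\d", len(r"\d"))  # True 2
```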

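Do not pass the value meant for the fifth parameter, flags, in the position of the fourth parameter, count

Otherwise you run into the problem I describe in:

[Solved] In Python, (1) re.compile followed by sub works but re.sub does not, or (2) re.search followed by replace works, but calling re.sub directly, or re.compile followed by re.sub, does not

When I passed a fourth argument, I thought it was the flags value,

but it was actually taken as the value of count,

so re.sub did not work as expected.

So spell out the parameters explicitly:

replacedStr = re.sub(replacePattern, orignialStr, replacedPartStr, flags=re.I)  # can omit count parameter

Or pass count explicitly as well, so that the flags value ends up in the fifth position.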
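The pitfall is easy to reproduce. In the sketch below (input string invented for the purpose), re.I has the integer value 2, so passing it as the fourth positional argument silently becomes count=2 with no flags at all:

```python
import re
import warnings

text = "Cat cat cat cat"

# Pitfall: the fourth positional argument is count, so re.I (integer value 2)
# is taken as count=2 and the match stays case-sensitive.
with warnings.catch_warnings():
    # Newer Python versions may warn about a positional count; silence that here.
    warnings.simplefilter("ignore")
    wrong = re.sub(r"cat", "dog", text, re.I)

# Correct: name the parameter, with or without an explicit count.
right = re.sub(r"cat", "dog", text, flags=re.I)
explicit = re.sub(r"cat", "dog", text, count=0, flags=re.I)

print(wrong)     # Cat dog dog cat
print(right)     # dog dog dog dog
print(explicit)  # dog dog dog dog
```

Passing both count and flags by keyword avoids the ambiguity entirely.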
