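How do you remove unwanted characters from a string? Requirements:
1. Strip the extra leading and trailing whitespace from user input:
' nick2008@email '
2. Remove the '\r' from text that was edited on Windows:
'hello world\r\n'
3. Remove the Unicode combining marks (tone marks) from text:
tiān xià dì yī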
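Approach:
1. Use the string methods strip(), lstrip(), and rstrip() to remove characters from the ends of a string.
2. To delete a single character at a fixed position, use slicing plus concatenation.
3. Use the string method replace() or the regular-expression function re.sub() to delete characters at any position.
4. Use the string method translate() to delete several different characters in one pass.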
Code:
s1 = '  nick2008@email  '
s2 = '---abc+++'
# strip() removes characters from both ends of the string; by default it removes whitespace.
print(s1.strip())
print(s2.strip('-+'))
s3 = 'abc:123'
# Delete the ':' from the string
print(s3[:3]+s3[4:])
s4 = '\tabc\t123\txyz'
# Delete the '\t' characters from the string
print(s4.replace('\t',''))
s5 = '\tabc\t123\txyz\ropq\r'
# Remove '\t' and '\r' from the string
import re
ret = re.sub('[\t\r]','',s5)
print(ret)
s6 = 'abc1230323xyz'
# Map the characters a-->x, b-->y, c-->z, x-->a, y-->b, z-->c
# Build the translation table
table = str.maketrans('abcxyz','xyzabc')
# Use translate() to apply the mapping
ret2 = s6.translate(table)
print(ret2)
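The table above only swaps characters. To use translate() for deletion, as item 4 of the approach describes, the table can map characters to None. A minimal sketch, assuming the three-argument form of str.maketrans(); the sample string and the names text, delete_table, and cleaned are illustrative and not from the original:

```python
# Illustrative sample containing several unwanted control characters.
text = '\tabc\t123\txyz\ropq\r\f'

# The third argument of str.maketrans() lists characters that are mapped to None;
# translate() deletes every character mapped to None.
delete_table = str.maketrans('', '', '\t\r\f')
cleaned = text.translate(delete_table)
print(cleaned)  # abc123xyzopq
```

Unlike chained replace() calls, this removes all listed characters in a single pass over the string.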
s7 = 'tiān xià dì yī'
# Remove the tone marks from the string
import sys
import unicodedata
remp = {
# ord() returns the character's Unicode code point
ord('\t'): '',
ord('\f'): '',
ord('\r'): None
}
# Remove \t, \f, and \r
print(s5.translate(remp))
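'''
Use dict.fromkeys() to build a dictionary whose keys are the code points of all Unicode combining characters and whose values are all None,
then use unicodedata.normalize() to convert the original input to its decomposed form.
sys.maxunicode: an integer giving the largest Unicode code point, i.e. 1114111 (0x10FFFF in hexadecimal).
unicodedata.combining(chr): returns the canonical combining class assigned to the character chr as an integer; returns 0 if no combining class is defined.
'''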
cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode) if unicodedata.combining(chr(c)))
b = unicodedata.normalize('NFD',s7)
# Call translate() to delete all the combining marks
print(b.translate(cmb_chrs))
# Method 2:
print(unicodedata.normalize('NFKD',s7).encode('ascii','ignore').decode())
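The two methods differ once the text contains non-ASCII characters that are not accented Latin letters. A minimal sketch of that difference; the helper names strip_accents and to_ascii and the sample string are hypothetical, introduced only for illustration:

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: drop combining marks, keep every other character."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def to_ascii(text):
    """Hypothetical helper mirroring method 2: every non-ASCII character is discarded."""
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode()

sample = 'tiān xià 天下'
print(repr(strip_accents(sample)))  # 'tian xia 天下'
print(repr(to_ascii(sample)))       # 'tian xia '
```

Filtering with unicodedata.combining() per character also avoids building a table over the whole code point range, at the cost of a Python-level loop over the string.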
===============================================================================
>>> s = '  richardo@qq  '
>>> s.strip()
'richardo@qq'
>>> s.lstrip()
'richardo@qq  '
>>> s.rstrip()
'  richardo@qq'
>>> s = ' \t  richardo@qq \n  '
>>> s.strip()
'richardo@qq'
>>> s = '====richardo@qq======'
>>> s.strip('=')
'richardo@qq'
>>> s = '==+-==richardo@qq===+-==='
>>> s.strip('=+-')
'richardo@qq'
>>> s2 = 'abc:1234'
>>> s2[:3] + s2[4:]
'abc1234'
>>> s3 = '  xyz  acb  fn '
>>> s3.strip()
'xyz  acb  fn'
>>> s3.replace(' ','')
'xyzacbfn'
>>> s3 = ' \t abc \t xyx \n '
>>> s3.replace(' ','')
'\tabc\txyx\n'
>>> import re
>>> re.sub('[ \t\n]+', '', s3)
'abcxyx'
>>> re.sub(r'\s+','',s3)  # one or more whitespace characters
'abcxyx'
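Deleting every whitespace run also fuses the words together. When the words should stay separated, one alternative (a different technique from the session above, shown as a sketch) is to split on whitespace and rejoin with single spaces; the names no_space and collapsed are illustrative:

```python
import re

s3 = ' \t abc \t xyx \n '

# Delete all whitespace, as in the session above. The raw string keeps
# the backslash in \s from being treated as a string escape.
no_space = re.sub(r'\s+', '', s3)

# Alternative: split() with no argument splits on whitespace runs and drops
# leading/trailing whitespace; join() puts one space between the words.
collapsed = ' '.join(s3.split())

print(repr(no_space))   # 'abcxyx'
print(repr(collapsed))  # 'abc xyx'
```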
>>> s = 'abc1234xyz'
>>> s.translate?
Docstring:
Return a copy of the string S in which each character has been mapped through the given translation table. The table must implement
lookup/indexing via __getitem__, for instance a dictionary or list, mapping Unicode ordinals to Unicode ordinals, strings, or None. If
this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
Type:      builtin_function_or_method
>>> s.translate({'a':'X'})
'abc1234xyz'
>>> ord('a')
97
>>> s.translate({ord('a'):'X'})
'Xbc1234xyz'
>>> s.translate({ord('a'):'X',ord('b'):'Y'})
'XYc1234xyz'
>>> s.maketrans('abcxyz','XYZABC')
{97: 88, 98: 89, 99: 90, 120: 65, 121: 66, 122: 67}
>>> s.translate(s.maketrans('abcxyz','XYZABC'))
'XYZ1234ABC'
>>> s.translate({ord('a'):None})
'bc1234xyz'
>>> s4 = 'tiān xià dì yī'
>>> c = 'à'
>>> len(c)
1
>>> c[0]
'à'
>>> s5 = 'nǐ hǎo'
>>> d = 'ǎ'
>>> len(d)
1
>>> import unicodedata
>>> unicodedata.combining(d[1])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-103-a8ef43ca7ee4> in <module>
----> 1 unicodedata.combining(d[1])
IndexError: string index out of range
>>> unicodedata.combining(d[0])
0
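The IndexError above arises because 'ǎ' is stored here as one precomposed code point, so d[1] does not exist. A minimal sketch of how NFD normalization exposes the tone mark as a separate combining character; the names decomposed and classes are illustrative:

```python
import unicodedata

d = '\u01ce'  # 'ǎ' as a single precomposed code point
print(len(d), unicodedata.combining(d))  # 1 0

# NFD splits the letter into a base character plus a combining mark.
decomposed = unicodedata.normalize('NFD', d)
classes = [unicodedata.combining(ch) for ch in decomposed]
print(len(decomposed), classes)  # 2 [0, 230]
```

Only the second character has a nonzero combining class, which is why the string must be normalized to a decomposed form before the marks can be filtered out.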
>>> mystr = 'Lǐ Zhōu Wú'
>>> unicodedata.normalize('NFKD', mystr).encode('ascii','ignore')
b'Li Zhou Wu'
>>> str(unicodedata.normalize('NFKD', mystr).encode('ascii','ignore'))
"b'Li Zhou Wu'"
>>> type(unicodedata.normalize('NFKD', mystr).encode('ascii','ignore'))
bytes
>>> unicodedata.normalize('NFKD', mystr).encode('ascii','ignore').decode()
'Li Zhou Wu'
>>>
