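Counting the occurrences of each element in a Python list
How to call the list count() method
object.count(argument)
count() method examples
Given the list ['a','iplaypython','c','b','a'], you can count how many times the string 'a' appears in it like this: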
>>> ['a','iplaypython','c','b','a'].count('a')
2
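The return value is the number of times the argument occurs. In practice it is better to assign the list to a variable first and then call count() on it. When the object is a nested list, count() can also count how many times an inner list occurs among its elements: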
>>> x = [1,2,'a',[1,2],[1,2]]
>>> x.count([1,2])
2
>>> x.count(1)
1
>>> x.count('a')
1
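The same count() call can be repeated for every distinct value to tally a whole list. The sketch below assumes the elements are hashable (so they can serve as dict keys); the helper name count_with_list_count is made up for illustration. Because each count() call scans the entire list, this is best kept to short lists. The last lines also show that count() compares with ==, so 1, 1.0 and True are counted together.

```python
def count_with_list_count(items):
    """Tally a list by calling list.count() once per distinct value."""
    counts = {}
    for item in items:
        if item not in counts:          # first time this value is seen
            counts[item] = items.count(item)
    return counts


letters = ['a', 'iplaypython', 'c', 'b', 'a']
letter_counts = count_with_list_count(letters)
print(letter_counts)

# count() compares elements with ==, so equal values of different types match
mixed = [1, True, 1.0, '1']
ones = mixed.count(1)
print(ones)
```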
1. Counting occurrences of letters and digits
text = 'abc123abc456aa'   # named text rather than str to avoid shadowing the built-in
d = {}
for x in text:
    if x not in d:
        d[x] = 1
    else:
        d[x] = d[x] + 1
print(d)
{'a': 4, 'b': 2, 'c': 2, '1': 1, '2': 1, '3': 1, '4': 1, '5': 1, '6': 1}
#!/usr/bin/python3
text = "ABCdefabcdefabc"
text = text.lower()   # count case-insensitively
char_dict = {}
for char1 in text:
    if char1 in char_dict:
        count = char_dict[char1]
    else:
        count = 0
    count = count + 1
    char_dict[char1] = count
print(char_dict)
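The if/else in the two loops above can be folded into a single line with dict.get(), which returns a fallback value when the key is missing. A minimal sketch; the function name char_counts is illustrative and not from the original.

```python
def char_counts(text):
    """Count characters case-insensitively using dict.get() with a default of 0."""
    counts = {}
    for ch in text.lower():
        counts[ch] = counts.get(ch, 0) + 1
    return counts


result = char_counts("ABCdefabcdefabc")
print(result)
```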
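a = "aAsmr3idd4bgs7Dlsf9eAF"
1. Extract the digits from string a and output them as a new string.
2. Count how many times each letter occurs in string a (ignoring case, so a and A are the same letter) and output the result as a dict, e.g. {'a':3,'b':1}
3. Remove the letters that occur more than once in string a, keeping only the first occurrence, case-insensitively. For example, 'aAsmr3idd4bgs7Dlsf9eAF' becomes 'asmr3id4bg7lf9e' after removal.
a = "aAsmr3idd4bgs7Dlsf9eAF"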
def fun1_2(x):  # tasks 1 and 2
    x = x.lower()  # normalize case
    num = []
    dic = {}
    for i in x:
        if i.isdigit():  # task 1: collect the digits for the new string
            num.append(i)
        else:  # task 2: count each letter (ignoring case), e.g. {'a':3,'b':1}
            if i in dic:
                continue
            else:
                dic[i] = x.count(i)
    new = ''.join(num)
    print("the new numbers string is: " + new)
    print("the dictionary is: %s" % dic)
fun1_2(a)
def fun3(x):  # task 3
    x = x.lower()
    new3 = []
    for i in x:
        if i in new3:  # already kept once, skip the repeat
            continue
        else:
            new3.append(i)
    print(''.join(new3))
fun3(a)
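For comparison, the three tasks can also be sketched with standard-library tools: a generator expression for the digits, collections.Counter for the letter counts, and dict.fromkeys() for order-preserving de-duplication (this relies on dicts keeping insertion order, which is guaranteed since Python 3.7). The variable names here are illustrative.

```python
from collections import Counter

a = "aAsmr3idd4bgs7Dlsf9eAF"
lowered = a.lower()

# Task 1: keep only the digits, in order
digits = ''.join(ch for ch in a if ch.isdigit())

# Task 2: count letters only, ignoring case
letter_counts = Counter(ch for ch in lowered if ch.isalpha())

# Task 3: drop repeated characters, keeping the first occurrence of each
deduped = ''.join(dict.fromkeys(lowered))

print(digits)
print(dict(letter_counts))
print(deduped)
```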
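Three approaches:
① use a plain dict
② use defaultdict
③ use Counter
Note: `int()` called with no arguments returns 0, which is why defaultdict(int) starts every new key at 0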
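The note about int() explains how defaultdict(int) works: when a missing key is looked up, defaultdict calls the factory with no arguments and stores the result. A small sketch of that behavior (the keys are made-up examples):

```python
from collections import defaultdict

# int() with no arguments returns 0, so it works as a "start at zero" factory
zero = int()

counts = defaultdict(int)
counts['apple'] += 1      # missing key: int() supplies 0, then 1 is added

# Merely reading a missing key also inserts it with the default value
peek = counts['pear']

print(zero, dict(counts), peek)
```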
①dict
text = "I'm a hand some boy!"

frequency = {}

for word in text.split():
    if word not in frequency:
        frequency[word] = 1
    else:
        frequency[word] += 1
②defaultdict
import collections

frequency = collections.defaultdict(int)

text = "I'm a hand some boy!"

for word in text.split():
    frequency[word] += 1
③Counter
import collections

text = "I'm a hand some boy!"
frequency = collections.Counter(text.split())
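As a quick check, the three approaches can be run on the same input and compared. The sample sentence below is an assumption chosen so that some words repeat; it is not from the original. The sketch also shows one difference: looking up a missing word in a Counter returns 0 without adding the key.

```python
import collections

text = "the cat and the hat and the bat"   # hypothetical sample text
words = text.split()

# ① plain dict
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 1
    else:
        plain[word] += 1

# ② defaultdict
default = collections.defaultdict(int)
for word in words:
    default[word] += 1

# ③ Counter
counter = collections.Counter(words)

same = plain == dict(default) == dict(counter)
missing = counter['dog']          # 0, and 'dog' is not inserted
print(same, plain, missing)
```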
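Suppose we have the following list:
[6, 7, 5, 9, 4, 1, 8, 6, 2, 9]
We want to count how many times each element occurs, which can be treated as a word-frequency problem.
The result we are after looks like {6:2, }, that is, {element: number of occurrences, ...}
First use the elements as dictionary keys and build a dict whose counts all start at 0: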
>>> from random import randint
>>> l = [randint(1,10) for x in range(10)]
>>> l
[6, 7, 5, 9, 4, 1, 8, 6, 2, 9]
>>> d = dict.fromkeys(l, 0)
>>> d
{6: 0, 7: 0, 5: 0, 9: 0, 4: 0, 1: 0, 8: 0, 2: 0}
# the task now is to fill in the count for each key in d
>>> for x in l:
...     d[x] += 1
...
>>> d
{6: 2, 7: 1, 5: 1, 9: 2, 4: 1, 1: 1, 8: 1, 2: 1}
# every element's occurrences have now been counted
Another approach uses the Counter object from the collections module
>>> from collections import Counter
# Counter accepts a list directly and turns it into the finished tally
>>> d = Counter(l)
>>> d
Counter({6: 2, 9: 2, 7: 1, 5: 1, 4: 1, 1: 1, 8: 1, 2: 1})
# Counter is a subclass of dict, so values can also be looked up by key
>>> d[6]
2
# Counter conveniently has a built-in most_common(n) method that returns the n most frequent elements
>>> d.most_common(2)
[(6, 2), (9, 2)]
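When several elements share the same count, most_common() lists them in the order they were first encountered (documented behavior since Python 3.7). If a different tie-break is wanted, the items can be sorted explicitly. A sketch using the list from the session above; breaking ties by element value is just one possible choice.

```python
from collections import Counter

l = [6, 7, 5, 9, 4, 1, 8, 6, 2, 9]
d = Counter(l)

# Ties keep first-seen order: 6 and 9 (count 2), then 7 (first element with count 1)
top3 = d.most_common(3)

# Explicit ordering: highest count first, ties broken by the smaller element
ranked = sorted(d.items(), key=lambda kv: (-kv[1], kv[0]))

print(top3)
print(ranked[:3])
```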
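Word-frequency counting with Python
import string
import time
path='C:\\Users\\ZHANGSHUAILING\\Desktop\\'  # path to the input text (the file name is missing here)
with open(path,'r') as text:
    # strip surrounding punctuation from each word and lowercase it
    words=[raw_word.strip(string.punctuation).lower() for raw_word in text.read().split()]
words_index=set(words)
# map each distinct word to its number of occurrences
counts_dict={index:words.count(index) for index in words_index}
for word in sorted(counts_dict,key=lambda x:counts_dict[x],reverse=True):
    time.sleep(2)
    print ('{}--{} times'.format(word,counts_dict[word]))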
{'the': 2154, 'and': 1394, 'to': 1080, 'of': 871, 'a': 861, 'his': 639, 'The': 637, 'in': 515, 'he': 461, 'with': 310, 'that': 308, 'you': 295, 'for': 280, 'A': 269, 'was': 258, 'him': 246, 'I': 234, 'had': 220, 'as': 217, 'not': 215, 'by': 196, 'on': 189, 'it': 178, 'be': 164, 'at': 153, 'from': 149, 'they': 149, 'but': 149, 'is': 144, 'her': 144, 'their': 143, 'who': 131, 'all': 121, 'one': 119, 'which': 119,}  # partial results shown
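import re,collections
def get_words(file):
    with open (file) as f:
        words_box=[]
        for line in f:
            # intended to avoid interference from Chinese text; note that '*' also
            # matches the empty string, so this test passes for every line
            if re.match(r'[a-zA-Z0-9]*',line):
                words_box.extend(line.strip().split())
    return collections.Counter(words_box)
print(get_words('')+get_words('伊索寓言.txt'))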
import re,collections
def get_words(file):
    with open (file) as f:
        words_box=[]
        for line in f:
            # keep only lines that start with an ASCII letter or digit
            if re.match(r'[a-zA-Z0-9]',line):
                words_box.extend(line.strip().split())
    return collections.Counter(words_box)
a=get_words('')+get_words('伊索寓言.txt')
print(a.most_common(10))
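Both versions add two Counter objects with +, which sums the counts key by key. The sketch below shows the same idea on in-memory text so that it runs without any files; count_words and the sample lines are hypothetical stand-ins for get_words and the two text files.

```python
import collections
import io
import re


def count_words(lines):
    """Count whitespace-separated words on lines that start with an ASCII letter or digit."""
    words_box = []
    for line in lines:
        if re.match(r'[a-zA-Z0-9]', line):
            words_box.extend(line.strip().split())
    return collections.Counter(words_box)


first = count_words(io.StringIO("the fox and the crow\n狐狸和乌鸦\n"))
second = count_words(io.StringIO("the ant and the grasshopper\n"))

total = first + second            # counts for shared words are summed
print(total.most_common(2))
```

Note that + keeps only keys whose summed count is positive, which makes no difference for plain word tallies like these.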
Summary of counting methods in Python
Method 1: iteration
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
This is the most conventional approach: count the elements one by one.
Method 2: defaultdict
This uses the collections library
from collections import defaultdict
def get_counts2(sequence):
    counts = defaultdict(int)  # missing keys start at 0
    for x in sequence:
        counts[x] += 1
    return counts
The result is a dict that maps each element to its count
Method 3: value_counts()
This method comes from pandas, so pandas must be imported before use. It counts the elements and sorts the result by count in descending order
tz_counts = frame['tz'].value_counts()
tz_counts[:10]
>>>
America/New_York 1251
521
America/Chicago 400
America/Los_Angeles 382
America/Denver 191
Europe/London 74
Asia/Tokyo 37
Pacific/Honolulu 36
Europe/Madrid 35
America/Sao_Paulo 33
Name: tz, dtype: int64
Let's look at what the official documentation says
Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Returns object containing counts of unique values.
Note that the returned data is a Series
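Overall, Method 1 is the most basic and can be very slow when the data set is large, and Method 3 places requirements on the data format, so Method 2 is recommended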
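Which method is faster depends on the data and the Python version, so it is worth measuring rather than assuming. The sketch below times the two pure-Python functions from this summary and collections.Counter on a made-up list using timeit; the data size and repeat count are arbitrary assumptions, and no particular result is implied.

```python
import random
import timeit
from collections import Counter, defaultdict


def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts


def get_counts2(sequence):
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts


random.seed(0)
data = [random.randint(1, 100) for _ in range(10_000)]   # hypothetical test data

# All three should agree before their speed is compared
results_match = get_counts(data) == dict(get_counts2(data)) == dict(Counter(data))

for func in (get_counts, get_counts2, Counter):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```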