Grouping and ranking in Python: groupby() and rank()
1. Grouped statistics in Python
1.1 Maximum, minimum, and mean age by gender
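import pandas as pd

# Load the sample data (the examples use the columns gender, age and is_good)
df = pd.read_excel(r'./data.xlsx')
print(df)

# Select the age column within each gender group, then aggregate it
ages = df.groupby(['gender'])['age']
ages_min = ages.min()
ages_max = ages.max()
ages_mean = ages.mean()
print(ages_min)
print(ages_max)
print(ages_mean)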
'''
Output (女 = female, 男 = male)
gender
女 16
男 12
Name: age, dtype: int64
gender
女 32
男 32
Name: age, dtype: int64
gender
女 25.25
男 17.20
Name: age, dtype: float64
'''
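The three statistics can also be computed in a single call with agg(). The sketch below is only an illustration: data.xlsx is not available here, so the small DataFrame and its values are made up.

```python
import pandas as pd

# Hypothetical sample data standing in for data.xlsx
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "M"],
    "age": [16, 12, 32, 32, 20],
})

# One row per gender, one column per statistic
stats = df.groupby("gender")["age"].agg(["min", "max", "mean"])
print(stats)
```

The result is a DataFrame indexed by gender, which is often easier to work with than three separate Series.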
1.2 Add a column sum_age holding the running total of age
df['sum_age'] = df['age'].cumsum()
print(df)
1.3 Add a column sum_age_new holding the running total of age within each (gender, is_good) group
df['sum_age_new'] = df.groupby(['gender','is_good'])['age'].cumsum()
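To see how the plain and the grouped running totals differ, here is a minimal sketch on a made-up four-row DataFrame (the values are assumptions, not the contents of data.xlsx):

```python
import pandas as pd

# Hypothetical data with the same column names as the examples above
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "is_good": [1, 1, 1, 0],
    "age": [10, 20, 30, 40],
})

# Running total over the whole column, in row order
df["sum_age"] = df["age"].cumsum()

# Running total restarted for every (gender, is_good) combination
df["sum_age_new"] = df.groupby(["gender", "is_good"])["age"].cumsum()

print(df)
# sum_age     -> 10, 30, 60, 100
# sum_age_new -> 10, 20, 40, 40
```

The grouped version keeps the original row order; only the accumulation is done per group.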
2. Ranking in Python
2.1 Rank by age
df['rank'] = df['age'].rank()
df['rank_mean'] = df['age'].rank(method='average')
df['rank_min'] = df['age'].rank(method='min')
df['rank_max'] = df['age'].rank(method='max')
df['rank_first'] = df['age'].rank(method='first')
print(df)
Rank age separately within each gender:
df['rank_g'] = df.groupby(['gender'])['age'].rank()
print(df)
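A small sketch comparing the overall rank with the per-gender rank. The five rows are invented for illustration:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "age": [30, 25, 20, 25, 40],
})

# Rank across all rows (ties share the average position by default)
df["rank_all"] = df["age"].rank()

# Rank restarted within each gender
df["rank_g"] = df.groupby("gender")["age"].rank()

print(df)
# rank_all -> 4.0, 2.5, 1.0, 2.5, 5.0
# rank_g   -> 2.0, 1.5, 1.0, 1.5, 3.0
```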
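2.2 How to set the rank() parameters when values are tied or missing
First, what happens when the data contains equal values?
The method parameter of rank():
average (default): tied values share the mean of the positions they occupy. The two rows with age 32 occupy positions 8 and 9, so each gets rank 8.5.
min: tied values all get the lowest position in the tie; age 32 gets rank 8.
max: tied values all get the highest position in the tie; age 32 gets rank 9.
first: tied values are ranked in the order they appear in the data; the male in row 5 gets rank 8 and the female in row 7 gets rank 9.
dense: like 'min', but the rank always increases by exactly 1 between groups, so tied values share a rank and the next distinct value gets that rank plus one.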
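The five tie-breaking methods can be put side by side on a short Series. This is a sketch with made-up ages, chosen so that one value (32) is tied and a larger value follows it:

```python
import pandas as pd

# Hypothetical ages: 32 appears twice, 40 comes after the tie
ages = pd.Series([12, 32, 16, 32, 40])

methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: ages.rank(method=m) for m in methods})
ranks.insert(0, "age", ages)

print(ranks)
# average -> 1.0, 3.5, 2.0, 3.5, 5.0
# min     -> 1.0, 3.0, 2.0, 3.0, 5.0
# max     -> 1.0, 4.0, 2.0, 4.0, 5.0
# first   -> 1.0, 3.0, 2.0, 4.0, 5.0
# dense   -> 1.0, 3.0, 2.0, 3.0, 4.0
```

Note that only dense gives the value after the tie (40) rank 4 instead of 5.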
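Second, how are missing values ranked?
na_option : {'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values:
keep: assign NaN rank to NaN values, i.e. missing values are left out of the ranking (the default)
top: assign smallest rank to NaN values if ascending, i.e. missing values are treated as the smallest values
bottom: assign highest rank to NaN values if ascending, i.e. missing values are treated as the largest values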
df['rank'] = df['age'].rank(method='first')
df['rank_k'] = df['age'].rank(method='first',na_option='keep')
df['rank_t'] = df['age'].rank(method='first',na_option='top')
df['rank_b'] = df['age'].rank(method='first',na_option='bottom')
print(df)
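The effect of na_option is easiest to see on a Series with a single missing value. A minimal sketch with invented numbers:

```python
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([20, float("nan"), 10, 30])

keep = ages.rank(na_option="keep")      # NaN stays NaN
top = ages.rank(na_option="top")        # NaN ranked first
bottom = ages.rank(na_option="bottom")  # NaN ranked last

print(pd.DataFrame({"age": ages, "keep": keep, "top": top, "bottom": bottom}))
# keep   -> 2.0, NaN, 1.0, 3.0
# top    -> 3.0, 1.0, 2.0, 4.0
# bottom -> 2.0, 4.0, 1.0, 3.0
```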
# Dense ranking in descending order within each Name_y group
data['rank'] = data.groupby(['Name_y'])['Salary'].rank(ascending=False, method='dense')
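The DataFrame called data is not defined in this article, so here is a self-contained sketch of the same call. The column names Name_y and Salary come from the line above; the rows are assumptions:

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the article
data = pd.DataFrame({
    "Name_y": ["IT", "IT", "IT", "HR", "HR"],
    "Salary": [90, 80, 90, 70, 60],
})

# Highest salary in each group gets rank 1; equal salaries share a rank
data["rank"] = data.groupby(["Name_y"])["Salary"].rank(ascending=False, method="dense")

print(data)
# rank -> 1.0, 2.0, 1.0, 1.0, 2.0
```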
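3. Rank salary in descending order, breaking ties by emp_no
With pandas, first sort the rows by emp_no and salary, then rank. With rank(method='first'), equal salaries get distinct ranks in the order the rows appear, which after the sort is emp_no order. With rank(method='dense'), equal salaries share a rank, and sorting the result by rank and emp_no puts the tied rows in emp_no order. The rank is stored in a column named 排序-1 ("rank-1").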
df = pd.DataFrame({'emp_no':[10001,10002,10003,10004,10005,10006,10007,10010,10009,10011],'salary': [88958,72527,43311,74057,94692,43311,88070,94409,94409,25828]})
print(df)
df['排序-1'] = df.sort_values(by=['emp_no','salary'])['salary'].rank(method='first',ascending=False)
dt = df.sort_values(by=['排序-1'])
print(dt)
# Alternative: dense ranking, so equal salaries share the same rank
df['排序-1'] = df['salary'].rank(method='dense',ascending=False)
dt = df.sort_values(by=['排序-1','emp_no'])
print(dt)
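Another way to get the same ordering, shown here only as a sketch, is to sort by salary (descending) and emp_no (ascending) in one step and then number the rows. The four employees are made up, and the column names dense_rank and row_number are my own choice:

```python
import pandas as pd

# Hypothetical employees with two pairs of equal salaries
df = pd.DataFrame({
    "emp_no": [10001, 10002, 10003, 10004],
    "salary": [500, 700, 500, 700],
})

# Highest salary first; equal salaries ordered by emp_no
ordered = df.sort_values(
    by=["salary", "emp_no"], ascending=[False, True]
).reset_index(drop=True)

# Shared rank for equal salaries
ordered["dense_rank"] = ordered["salary"].rank(method="dense", ascending=False).astype(int)

# Distinct position for every row, ties already resolved by the sort
ordered["row_number"] = range(1, len(ordered) + 1)

print(ordered)
# emp_no     -> 10002, 10004, 10001, 10003
# dense_rank -> 1, 1, 2, 2
# row_number -> 1, 2, 3, 4
```

Because the tie-breaking happens in sort_values, this version does not depend on how rank(method='first') treats the order of appearance.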