[Python Data Analysis and Visualization] Pandas Statistical Analysis (Lab 2)
Data analysis and visualization of the tips dataset
1. Import the modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
2. Load the data
fdata = pd.read_excel('tips.xls')
fdata
     total_bill    tip     sex smoker   day    time  size
0         16.99   1.01  Female     No   Sun  Dinner     2
1         10.34   1.66    Male     No   Sun  Dinner     3
2         21.01   3.50    Male     No   Sun  Dinner     3
3         23.68   3.31    Male     No   Sun  Dinner     2
4         24.59   3.61  Female     No   Sun  Dinner     4
5         25.29   4.71    Male     No   Sun  Dinner     4
6          8.77   2.00    Male     No   Sun  Dinner     2
7         26.88   3.12    Male     No   Sun  Dinner     4
8         15.04   1.96    Male     No   Sun  Dinner     2
9         14.78   3.23    Male     No   Sun  Dinner     2
10        10.27   1.71    Male     No   Sun  Dinner     2
11        35.26   5.00  Female     No   Sun  Dinner     4
12        15.42   1.57    Male     No   Sun  Dinner     2
13        18.43   3.00    Male     No   Sun  Dinner     4
14        14.83   3.02  Female     No   Sun  Dinner     2
15        21.58   3.92    Male     No   Sun  Dinner     2
16        10.33   1.67  Female     No   Sun  Dinner     3
17        16.29   3.71    Male     No   Sun  Dinner     3
18        16.97   3.50  Female     No   Sun  Dinner     3
19        20.65   3.35    Male     No   Sat  Dinner     3
20        17.92   4.08    Male     No   Sat  Dinner     2
21        20.29   2.75  Female     No   Sat  Dinner     2
22        15.77   2.23  Female     No   Sat  Dinner     2
23        39.42   7.58    Male     No   Sat  Dinner     4
24        19.82   3.18    Male     No   Sat  Dinner     2
25        17.81   2.34    Male     No   Sat  Dinner     4
26        13.37   2.00    Male     No   Sat  Dinner     2
27        12.69   2.00    Male     No   Sat  Dinner     2
28        21.70   4.30    Male     No   Sat  Dinner     2
29        19.65   3.00  Female     No   Sat  Dinner     2
...         ...    ...     ...    ...   ...     ...   ...
214       28.17   6.50  Female    Yes   Sat  Dinner     3
215       12.90   1.10  Female    Yes   Sat  Dinner     2
216       28.15   3.00    Male    Yes   Sat  Dinner     5
217       11.59   1.50    Male    Yes   Sat  Dinner     2
218        7.74   1.44    Male    Yes   Sat  Dinner     2
219       30.14   3.09  Female    Yes   Sat  Dinner     4
220       12.16   2.20    Male    Yes   Fri   Lunch     2
221       13.42   3.48  Female    Yes   Fri   Lunch     2
222        8.58   1.92    Male    Yes   Fri   Lunch     1
223       15.98   3.00  Female     No   Fri   Lunch     3
224       13.42   1.58    Male    Yes   Fri   Lunch     2
225       16.27   2.50  Female    Yes   Fri   Lunch     2
226       10.09   2.00  Female    Yes   Fri   Lunch     2
227       20.45   3.00    Male     No   Sat  Dinner     4
228       13.28   2.72    Male     No   Sat  Dinner     2
229       22.12   2.88  Female    Yes   Sat  Dinner     2
230       24.01   2.00    Male    Yes   Sat  Dinner     4
231       15.69   3.00    Male    Yes   Sat  Dinner     3
232       11.61   3.39    Male     No   Sat  Dinner     2
233       10.77   1.47    Male     No   Sat  Dinner     2
234       15.53   3.00    Male    Yes   Sat  Dinner     2
235       10.07   1.25    Male     No   Sat  Dinner     2
236       12.60   1.00    Male    Yes   Sat  Dinner     2
237       32.83   1.17    Male    Yes   Sat  Dinner     2
238       35.83   4.67  Female     No   Sat  Dinner     3
239       29.03   5.92    Male     No   Sat  Dinner     3
240       27.18   2.00  Female    Yes   Sat  Dinner     2
241       22.67   2.00    Male    Yes   Sat  Dinner     2
242       17.82   1.75    Male     No   Sat  Dinner     2
243       18.78   3.00  Female     No  Thur  Dinner     2
244 rows × 7 columns
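If `tips.xls` is not at hand, the later steps can still be tried on a small stand-in. The sketch below is an assumption, not the author's code: the helper name `load_tips` is hypothetical, and its fallback frame simply copies rows 0-4 of the output above. Note that pandas reads legacy `.xls` files through the optional `xlrd` package, so `pd.read_excel('tips.xls')` raises an ImportError when `xlrd` is not installed.

```python
import pandas as pd


def load_tips(path="tips.xls"):
    """Hypothetical helper: read the tips file, or fall back to a 5-row sample."""
    try:
        # Legacy .xls files are read through the optional xlrd package
        return pd.read_excel(path)
    except (OSError, ImportError):
        # Fallback: rows 0-4 of the tips data as printed in the output above
        return pd.DataFrame({
            "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
            "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
            "sex": ["Female", "Male", "Male", "Male", "Female"],
            "smoker": ["No", "No", "No", "No", "No"],
            "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
            "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
            "size": [2, 3, 3, 2, 4],
        })


# A path that does not exist, so the fallback sample is used
fdata = load_tips("missing_tips.xls")
print(fdata.shape)
```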
3. Analyze the data
(1) View the descriptive statistics
fdata.describe().head()
       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
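`describe()` summarizes only the numeric columns by default, and the `.head()` call above keeps just its first five rows (count through 25%). A small sketch, using rows 0-4 of the data as a stand-in (the name `sample` is illustrative), shows that `include="all"` also reports count, unique, top and freq for the text columns:

```python
import pandas as pd

# Stand-in: rows 0-4 of the tips data shown earlier
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "size": [2, 3, 3, 2, 4],
})

num_desc = sample.describe()               # numeric columns only
all_desc = sample.describe(include="all")  # numeric and text columns

print(num_desc)
# For text columns: number of distinct values, most frequent value and its count
print(all_desc.loc[["unique", "top", "freq"], ["sex", "day"]])
```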
(2) Rename the columns to Chinese and display the first 5 rows
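`rename(columns=...)` takes a dict that maps old column labels to new ones. A minimal sketch on a two-row stand-in (rows 0 and 1 of the data) shows the form without `inplace=True`, which returns a renamed copy and leaves the original frame unchanged; the names `sample`, `col_map` and `renamed` are illustrative only.

```python
import pandas as pd

# Stand-in: rows 0 and 1 of the tips data
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "sex": ["Female", "Male"],
    "smoker": ["No", "No"],
    "day": ["Sun", "Sun"],
    "time": ["Dinner", "Dinner"],
    "size": [2, 3],
})

# Old English label -> new Chinese label
col_map = {
    "total_bill": "消费总额",  # total bill
    "tip": "小费",             # tip
    "sex": "性别",             # sex
    "smoker": "是否抽烟",      # smoker or not
    "day": "星期",             # day of the week
    "time": "聚餐时间段",      # meal time
    "size": "人数",            # party size
}

# Without inplace=True, rename returns a new DataFrame
renamed = sample.rename(columns=col_map)
print(renamed.columns.tolist())
```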
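# Rename the columns to Chinese (original names: total_bill tip sex smoker day time size)
fdata.rename(columns={'total_bill':'消费总额','tip':'小费','sex':'性别','smoker':'是否抽烟',
                      'day':'星期','time':'聚餐时间段','size':'人数'},inplace=True)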
fdata.head()
   消费总额  小费    性别  是否抽烟  星期  聚餐时间段  人数
0     16.99  1.01  Female        No   Sun      Dinner     2
1     10.34  1.66    Male        No   Sun      Dinner     3
2     21.01  3.50    Male        No   Sun      Dinner     3
3     23.68  3.31    Male        No   Sun      Dinner     2
4     24.59  3.61  Female        No   Sun      Dinner     4
(3) Add a column "人均消费" (spending per person)
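The new column is an element-wise division of two columns followed by rounding to two decimals. A small sketch on three stand-in rows (rows 0-2 of the data, already renamed) does the same computation with the `Series.round` method, which gives the same values as the built-in `round()` call used below:

```python
import pandas as pd

# Stand-in rows 0-2, using the Chinese column names
sample = pd.DataFrame({
    "消费总额": [16.99, 10.34, 21.01],  # total bill
    "人数": [2, 3, 3],                  # party size
})

# Per-person spend: divide row by row, then round to 2 decimals
sample["人均消费"] = (sample["消费总额"] / sample["人数"]).round(2)
print(sample)
```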
# Spending per person = total bill / party size, rounded to 2 decimals
fdata['人均消费']=round(fdata['消费总额']/fdata['人数'],2)
fdata.head()
   消费总额  小费    性别  是否抽烟  星期  聚餐时间段  人数  人均消费
0     16.99  1.01  Female        No   Sun      Dinner     2      8.49
1     10.34  1.66    Male        No   Sun      Dinner     3      3.45
2     21.01  3.50    Male        No   Sun      Dinner     3      7.00
3     23.68  3.31    Male        No   Sun      Dinner     2     11.84
4     24.59  3.61  Female        No   Sun      Dinner     4      6.15
(4) Find the male smokers whose spending per person is greater than 15
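The three filtering styles that follow are interchangeable. A sketch on four stand-in rows (rows 83, 220, 3 and 221 of the data, with 人均消费 computed as bill divided by party size) checks that the boolean mask and `query()` select the same row. Each comparison in the mask needs its own parentheses because `&` binds more tightly than `==` and `>`. The `engine="python"` argument is an addition here so that the sketch does not depend on the optional numexpr package.

```python
import pandas as pd

# Stand-in rows 83, 220, 3 and 221 (Chinese column names, 人均消费 precomputed)
sample = pd.DataFrame(
    {
        "性别": ["Male", "Male", "Male", "Female"],
        "是否抽烟": ["Yes", "Yes", "No", "Yes"],
        "人均消费": [16.34, 6.08, 11.84, 6.71],
    },
    index=[83, 220, 3, 221],
)

# Boolean mask: wrap each comparison in parentheses before combining with &
mask = (sample["是否抽烟"] == "Yes") & (sample["性别"] == "Male") & (sample["人均消费"] > 15)
by_mask = sample[mask]

# The same condition as a query string; the Chinese column names are valid identifiers
by_query = sample.query('是否抽烟 == "Yes" & 性别 == "Male" & 人均消费 > 15', engine="python")

print(by_mask)
```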
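# Method 1: boolean indexing with bracket column access
fdata[(fdata['是否抽烟']=='Yes')&(fdata['性别']=='Male')&(fdata['人均消费']>15)]
# Method 2: attribute access (works because the column names are valid identifiers)
# fdata[(fdata.是否抽烟=='Yes') & (fdata.性别=='Male') & (fdata.人均消费>15)]
# Method 3: query() with the condition written as a string
# fdata.query('是否抽烟=="Yes" & 性别=="Male" & 人均消费>15')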
     消费总额   小费  性别  是否抽烟  星期  聚餐时间段  人数  人均消费
83      32.68   5.00  Male       Yes  Thur       Lunch     2     16.34
170     50.81  10.00  Male       Yes   Sat      Dinner     3     16.94
173     31.85   3.18  Male       Yes   Sun      Dinner     2     15.92
175     32.90   3.11  Male       Yes   Sun      Dinner     2     16.45
179     34.63   3.55  Male       Yes   Sun      Dinner     2     17.32
182     45.35   3.50  Male       Yes   Sun      Dinner     3     15.12
184     40.55   3.00  Male       Yes   Sun      Dinner     2     20.27
237     32.83   1.17  Male       Yes   Sat      Dinner     2     16.42
(5) Analyze the relationship between tip and total bill
# Relationship between tip and total bill: scatter plot
fdata.plot(kind='scatter',x='消费总额',y='小费')
# Positive correlation: larger bills tend to come with larger tips
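The scatter plot shows the positive relationship by eye; `Series.corr` puts a number on it (the Pearson coefficient by default, between -1 and 1). A sketch on rows 0-9 of the data as a stand-in; the same call works on `fdata['消费总额']` and `fdata['小费']` for the full dataset.

```python
import pandas as pd

# Stand-in: total bill and tip for rows 0-9 of the data shown earlier
sample = pd.DataFrame({
    "消费总额": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78],
    "小费": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.96, 3.23],
})

# Pearson correlation coefficient between the two columns
r = sample["消费总额"].corr(sample["小费"])
print(f"Pearson r = {r:.3f}")
```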
(6) Which customers are more generous, men or women? Group by sex and compare the average tip
# Who is more generous? Group by sex and compare the average tip
fdata.groupby('性别')['小费'].mean()
性别
Female    2.833448
Male      3.089618
Name: 小费, dtype: float64
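A group mean does not say how many customers stand behind it. Passing a list to `agg` returns the mean and the group size side by side; the sketch uses rows 0-4 of the data as a stand-in, and the name `by_sex` is illustrative.

```python
import pandas as pd

# Stand-in: sex and tip for rows 0-4 of the data
sample = pd.DataFrame({
    "性别": ["Female", "Male", "Male", "Male", "Female"],
    "小费": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# One row per sex, with the average tip and the number of rows in each group
by_sex = sample.groupby("性别")["小费"].agg(["mean", "count"])
print(by_sex)
```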
(7) Analyze the relationship between day of the week and tip
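# Relationship between day of the week and tip: bar chart of the mean tip per day
print(fdata['星期'].unique())
r = fdata.groupby('星期')['小费'].mean()
# r is a Series indexed by 星期, so plot() needs no x/y arguments; it returns a matplotlib Axes
ax = r.plot(kind='bar', fontsize=12, rot=36)
# ax.title.set_size(16)
['Sun' 'Sat' 'Thur' 'Fri']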
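`groupby` sorts its keys, so the bars come out in alphabetical order (Fri, Sat, Sun, Thur) rather than in the order printed by `unique()`. Reindexing the result puts the days in calendar order before plotting. A sketch on six stand-in rows of the data (rows 0, 1, 220, 221, 237 and 243):

```python
import pandas as pd

# Stand-in: day and tip for rows 0, 1, 220, 221, 237 and 243
sample = pd.DataFrame({
    "星期": ["Sun", "Sun", "Fri", "Fri", "Sat", "Thur"],
    "小费": [1.01, 1.66, 2.20, 3.48, 1.17, 3.00],
})

by_day = sample.groupby("星期")["小费"].mean()
print(by_day.index.tolist())  # keys are sorted alphabetically

# Calendar order; by_day_ordered.plot(kind="bar") would draw the bars Thur..Sun
by_day_ordered = by_day.reindex(["Thur", "Fri", "Sat", "Sun"])
print(by_day_ordered)
```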
(8) Effect of sex and smoking status on generosity
# Effect of sex and smoking status on generosity
r = fdata.groupby(['性别','是否抽烟'])['小费'].mean()
# r has a two-level index (性别, 是否抽烟), so each bar is one combination
ax = r.plot(kind='bar', fontsize=12, rot=30)
ax.title.set_size(16)
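Grouping by two keys returns a Series with a two-level index, so the bar chart gets one bar per (sex, smoker) pair with tuple labels. `unstack` turns it into a sex-by-smoker table, and `table.plot(kind='bar')` would then draw grouped bars. A sketch on eight stand-in rows of the data (the names `r` and `table` are illustrative):

```python
import pandas as pd

# Stand-in: rows 0, 4, 243, 221, 1, 2, 220 and 237 of the data shown earlier
sample = pd.DataFrame({
    "性别": ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "是否抽烟": ["No", "No", "No", "Yes", "No", "No", "Yes", "Yes"],
    "小费": [1.01, 3.61, 3.00, 3.48, 1.66, 3.50, 2.20, 1.17],
})

r = sample.groupby(["性别", "是否抽烟"])["小费"].mean()  # Series with a 2-level index
table = r.unstack("是否抽烟")  # rows: 性别, columns: 是否抽烟
print(table)
```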
(9) Relationship between meal time and tip amount
# Relationship between meal time and tip amount
r = fdata.groupby('聚餐时间段')['小费'].mean()
ax = r.plot(kind='bar')
ax.title.set_size(16)
The chart shows that the average tip at dinner is higher than at lunch.
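A few large tips can pull a group mean up, so comparing the median alongside the mean is a cheap robustness check on this conclusion. A sketch using rows 0-6 (dinner) and 220-226 (lunch) of the data as a stand-in; the name `stats` is illustrative.

```python
import pandas as pd

# Stand-in: meal time and tip for rows 0-6 (Dinner) and 220-226 (Lunch)
sample = pd.DataFrame({
    "聚餐时间段": ["Dinner"] * 7 + ["Lunch"] * 7,
    "小费": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00,
             2.20, 3.48, 1.92, 3.00, 1.58, 2.50, 2.00],
})

# Mean, median and group size for each meal time
stats = sample.groupby("聚餐时间段")["小费"].agg(["mean", "median", "count"])
print(stats)
```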
Keep it up!
Thanks!