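Top 50 Data Visualizations in Python (with Code)
This article collects the 50 most useful data visualizations in Python. For each one, the code shows how to build it with matplotlib and seaborn, followed by the final result.
I. Introduction
The charts are grouped into 7 categories according to their purpose. For example, to plot the relationship between two variables, look for a suitable chart in the Correlation section; to plot a time series, look in the Time Series section.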
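An effective chart has the following qualities:
1. It conveys the data accurately and clearly, without distorting the facts;
2. It is simple in design, so that producing it does not take a great deal of time and effort;
3. It is aesthetically pleasing, rather than something like white text on a black background;
4. It does not carry more information than the reader can take in.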
II. Setup
Before showing the charts, define a few configuration settings. You can adjust them to your own preferences.
# !pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')
large = 22; med = 16; small = 12
params = {'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'axes.titlesize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.style.use('seaborn-whitegrid')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8-whitegrid'
sns.set_style("white")
plt.rcParams.update(params)  # apply the settings above; params is otherwise unused
%matplotlib inline
# Version
print(mpl.__version__)#> 3.0.0
print(sns.__version__)#> 0.9.0
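The style name 'seaborn-whitegrid' used above matches the matplotlib 3.0.0 reported by the version check. From matplotlib 3.6 onward the seaborn styles are shipped under 'seaborn-v0_8-...' names instead. The sketch below picks whichever name the installed version provides. The helper pick_style is a hypothetical name introduced here for illustration; it is not part of matplotlib.

```python
import matplotlib.pyplot as plt


def pick_style(candidates=("seaborn-v0_8-whitegrid", "seaborn-whitegrid"),
               fallback="default"):
    """Return the first style in `candidates` that this matplotlib provides.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        if name in plt.style.available:
            return name
    return fallback


style_name = pick_style()
plt.style.use(style_name)
print(style_name)
```

If neither seaborn style is present, the sketch falls back to matplotlib's built-in 'default' style rather than raising an error.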
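III. Correlation
Correlation charts show the relationship between two or more variables, that is, how one variable changes with respect to another.
1. Scatter plot
The scatter plot is a classic, fundamental chart that clearly shows the relationship between two variables. In Python you can create one conveniently with plt.scatter().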
# Import dataset
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")
# Prepare Data
# Create as many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])  # or: midwest.category.drop_duplicates()
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]
# Draw Plot for Each Category
plt.figure(figsize=(16,10), dpi=80, facecolor='w', edgecolor='k')
for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s=20, color=colors[i], label=str(category))
# Decorations
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)
plt.show()
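The example above downloads its data over the network. As a rough offline sketch of the same per-category pattern, the block below uses a small synthetic DataFrame. The column names mirror the midwest data, but the values and the categories "A", "B", "C" are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the midwest data (values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "area": rng.uniform(0.0, 0.1, size=90),
    "poptotal": rng.uniform(0, 90000, size=90),
    "category": np.repeat(["A", "B", "C"], 30),
})

categories = np.unique(demo["category"])
# max(..., 1) guards against division by zero when there is only one category
colors = [plt.cm.tab10(i / max(len(categories) - 1, 1)) for i in range(len(categories))]

fig, ax = plt.subplots(figsize=(8, 5))
for color, category in zip(colors, categories):
    subset = demo.loc[demo["category"] == category]
    # One scatter call per category gives each group its own color and legend entry
    ax.scatter(subset["area"], subset["poptotal"], s=20, color=color, label=str(category))

ax.set(xlabel="Area", ylabel="Population", title="Scatter plot by category (synthetic data)")
ax.legend()
```

Calling scatter once per category, rather than once for the whole frame, is what produces a separate legend entry for each group.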
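2. Bubble plot with encircling
Sometimes you want to draw a bubble plot and highlight a group of points by enclosing them in a boundary. The example below does this with a custom encircle() function applied to data from a pandas DataFrame.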
from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")
# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")
# As many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]
# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16,10), dpi=80, facecolor='w', edgecolor='k')
for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s='dot_size', color=colors[i], label=str(category), edgecolors='black', linewidths=.5)
# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x, y, ax=None, **kw):
    # Draw the convex hull of the points (x, y) as a polygon on ax
    if not ax: ax = plt.gca()
    p = np.c_[x, y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)
# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state=='IN',:]
# Draw polygon surrounding vertices
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)
# Step 4: Decorations
plt.gca().set(xlabel='Area', ylabel='Population')
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)
plt.show()
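The encircle() function relies on scipy's ConvexHull to find which points lie on the outer boundary. The sketch below shows that step in isolation on six made-up points: the four corners of the unit square plus two interior points.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of the unit square plus two interior points (made-up data)
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0],
                   [0.5, 0.5], [0.2, 0.7]])

hull = ConvexHull(points)

# hull.vertices holds the row indices of the points on the boundary;
# the interior points (rows 4 and 5) are left out.
boundary = points[hull.vertices]
print(sorted(hull.vertices.tolist()))
print(boundary)
```

Only the boundary rows are passed on to the polygon, which is why the drawn outline wraps the whole group without touching the interior points.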
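3. Scatter plot with a linear regression line of best fit
If you want to understand how two variables change with respect to each other, the line of best fit is the way to go. The plot below shows how the line of best fit differs between groups in the data. To disable the grouping and draw a single line of best fit for the whole dataset, remove the hue="cyl" parameter from the sns.lmplot() call below.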
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4,8]), :]
# Plot
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
height=7, aspect=1.6, robust=True, palette='tab10',
scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))
# Decorations
gridobj.set(xlim=(0.5,7.5), ylim=(0,50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
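To make "one line of best fit per group" concrete, the sketch below fits a straight line to each group separately. It swaps in ordinary least squares via np.polyfit, which is not the robust regression that robust=True requests from sns.lmplot(). The data is synthetic: the column names mirror the mpg data, but the values and slopes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the mpg data: two groups with different known slopes
rng = np.random.default_rng(42)
x = np.linspace(1.5, 7.0, 20)
demo = pd.concat([
    pd.DataFrame({"displ": x, "hwy": 40 - 3.0 * x + rng.normal(0, 0.5, x.size), "cyl": 4}),
    pd.DataFrame({"displ": x, "hwy": 30 - 2.0 * x + rng.normal(0, 0.5, x.size), "cyl": 8}),
], ignore_index=True)

# Fit hwy = slope * displ + intercept separately for each group
fits = {}
for cyl, group in demo.groupby("cyl"):
    slope, intercept = np.polyfit(group["displ"], group["hwy"], deg=1)
    fits[cyl] = (slope, intercept)
    print(f"cyl={cyl}: hwy = {slope:.2f} * displ + {intercept:.2f}")
```

Each group gets its own slope and intercept, which is the difference the grouped plot makes visible.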
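Each regression line in its own column
Alternatively, you can show the line of best fit for each group in its own column. To do this, set the col parameter inside the sns.lmplot() call.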
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4,8]), :]
# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy",
data=df_select,
height=7,
robust=True,
palette='Set1',
col="cyl",
scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))
# Decorations
gridobj.set(xlim=(0.5,7.5), ylim=(0,50))
plt.show()
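As a rough offline check of what col does, the sketch below calls sns.lmplot() on synthetic data and inspects the grid of axes it returns. The values are made up, and robust=True is left out here because seaborn's robust fit needs the statsmodels package.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the mpg data (values are made up)
rng = np.random.default_rng(1)
x = np.tile(np.linspace(1.5, 7.0, 20), 2)
cyl = np.repeat([4, 8], 20)
hwy = np.where(cyl == 4, 40 - 3.0 * x, 30 - 2.0 * x) + rng.normal(0, 1.0, x.size)
demo = pd.DataFrame({"displ": x, "hwy": hwy, "cyl": cyl})

# col="cyl" gives each cylinder count its own panel in a single row
gridobj = sns.lmplot(x="displ", y="hwy", col="cyl", data=demo, height=4)
print(gridobj.axes.shape)
```

The returned object is a seaborn FacetGrid, so calls such as gridobj.set(xlim=..., ylim=...) apply to every panel at once.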
