Merging Series into a DataFrame: [S01E04] Merging Datasets with pandas
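I.  Database-style merging: merge
i)  The simplest merge
pd.merge(df1, df2, on='key')  # 'key' is the overlapping column name
ii)  Join keys with different column names
pd.merge(left, right, left_on='lkey', right_on='rkey')
iii) Join type (default is inner)
pd.merge(left, right, on='key', how='outer')
iv)  Joining on multiple columns
pd.merge(left, right, on=['key1','key2'])
v)  Handling duplicate column names
pd.merge(left, right, on='key', suffixes=['_left','_right'])
vi)  Merging on the index (index used as the join key)
pd.merge(left, right, left_on='key', right_index=True)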
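II.  Merging on the index: join
i) The join instance method merges on the index
left.join(right, how='outer')
ii) Joining the index of the DataFrame passed as argument against a column of the calling DataFrame
left.join(right, on='key')
iii) Merging several DataFrames with join
df1.join([df2,df3], how='outer', sort=True)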
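III. Concatenation along an axis: concat
i)  Concatenating Series (axis=0)
ii)  Concatenating Series (axis=1)
iii) Join type (default join='outer')
iv)  Specifying the index to use on the non-concatenation axis
v)  Labeling the concatenated pieces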
names=['level0', 'level1'])
vi)  Discarding irrelevant row indexes
IV.  Combining overlapping data: combine_first()
df1.combine_first(df2)
V.  Appending data to the end of a DataFrame: append
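The concat entries in section III of the outline are listed without calls, so here is a minimal sketch of what they might look like. The Series and DataFrame contents are made up for illustration. Note also that DataFrame.append (section V) was removed in pandas 2.0; pd.concat with ignore_index=True covers the same use case.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original article.
s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3, 4], index=["c", "d", "e"])

# axis=0 (the default): stack the Series end to end.
stacked = pd.concat([s1, s2])

# axis=1: each Series becomes a column; with the default join='outer'
# the row index is the union of both indexes.
side_by_side = pd.concat([s1, s2], axis=1)

# keys labels each piece; names labels the resulting index levels.
labeled = pd.concat([s1, s2], keys=["one", "two"], names=["level0", "level1"])

# ignore_index=True discards the original row labels, which is also
# how rows are appended to a DataFrame in current pandas.
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3]})
appended = pd.concat([df1, df2], ignore_index=True)
```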
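pd.merge() joins rows from different DataFrames on one or more keys. (This resembles a database join: merge performs an "inner" join by default, while join performs a "left" join by default.)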
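To make the difference between the two defaults concrete, the following sketch runs both on the same pair of frames. The column names lval and rval and the data are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [4, 5, 6]})

# merge defaults to how='inner': only keys found in both frames survive.
inner = pd.merge(left, right, on="key")

# join defaults to how='left' and aligns on the index, so every row of
# the caller is kept and unmatched rows get NaN.
joined = left.set_index("key").join(right.set_index("key"))
```

Here key c is dropped by merge but kept by join, with a missing rval.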
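The instance method combine_first() splices overlapping data together, filling missing values in one object with the corresponding values from the other.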
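A small sketch of combine_first, using made-up frames: values from df1 take priority, and df2 is consulted only where df1 has NaN or lacks the row entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with overlapping indexes and gaps.
df1 = pd.DataFrame({"a": [1.0, np.nan, 5.0], "b": [np.nan, 2.0, np.nan]})
df2 = pd.DataFrame({"a": [5.0, 4.0, np.nan, 3.0], "b": [np.nan, 3.0, 4.0, 6.0]})

# Keep df1's values where present; fill the holes from df2.
# The result index is the union of both indexes (rows 0 to 3).
patched = df1.combine_first(df2)
```

A cell stays NaN only when both frames are missing it, as with row 0 of column b here.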
I. Database-style merging: merge
Merge DataFrame objects by performing a database-style join operation by columns or indexes.
merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'),
copy=True, indicator=False, validate=None)
left : DataFrame
right : DataFrame
how : {'left', 'right', 'outer', 'inner'}, default 'inner'
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order
* right: use only keys from right frame, similar to a SQL right outer join;
preserve key order
* outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically
* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys
on : label or list
Column or index level names to join on. These must be found in both
DataFrames. If `on` is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as
left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword)
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right
side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
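As an illustration of how left_on, right_index and suffixes from the docstring above fit together, here is a minimal sketch. The frames and the shared column name value are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data: both frames have a 'value' column, and the right
# frame carries its join keys in the index rather than in a column.
left = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
right = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])

# Match the left 'key' column against the right index; the overlapping
# 'value' columns are disambiguated with the given suffixes.
merged = pd.merge(
    left,
    right,
    left_on="key",
    right_index=True,
    suffixes=("_left", "_right"),
)
```

Without suffixes, the overlapping columns would get the default '_x' and '_y' endings instead.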
Key parameters
left
right
how
on
left_on
right_on
left_index
right_index
In other words: the data, the join type, and the join keys.
Merge and join operations combine datasets by linking rows on one or more keys. These operations are at the core of relational databases.
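i) The simplest merge
The simplest join: if no join key is specified explicitly, merge uses the overlapping column names as the keys.
It is still better to specify the join key explicitly with on.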
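The original example for this step did not survive, so the sketch below uses assumed sample frames (df1 and df2, sharing only the column key) to compare the implicit and explicit forms.

```python
import pandas as pd

# Hypothetical sample data; 'key' is the only overlapping column.
df1 = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "a", "b"],
                    "data1": range(7)})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": range(3)})

# Implicit: the key is inferred from the overlapping column names.
implicit = pd.merge(df1, df2)

# Explicit: same result, but robust if the frames later gain
# another column with a shared name.
explicit = pd.merge(df1, df2, on="key")
```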
ii) Join keys with different column names
If the join key columns have different names in the left and right DataFrames, specify them explicitly with the left_on and right_on keywords respectively.
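A minimal sketch of this case, assuming sample frames whose key columns are named lkey and rkey as in the outline:

```python
import pandas as pd

# Hypothetical sample data with differently named key columns.
left = pd.DataFrame({"lkey": ["b", "b", "a", "c", "a", "a", "b"],
                     "data1": range(7)})
right = pd.DataFrame({"rkey": ["a", "b", "d"], "data2": range(3)})

# Match left.lkey against right.rkey; both key columns are kept
# in the result because their names differ.
merged = pd.merge(left, right, left_on="lkey", right_on="rkey")
```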
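iii) Join type (default is inner)
The result above contains no rows for keys c and d, because merge performs an "inner" join by default, so the keys in the result are the intersection of both sides. For any other join type, specify it explicitly with the how keyword.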
how='outer'
how='left'
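The two options above can be sketched as follows, again with assumed sample frames in which c appears only on the left and d only on the right:

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "a", "b"],
                    "data1": range(7)})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": range(3)})

# outer: union of keys; rows without a partner are padded with NaN.
outer = pd.merge(df1, df2, on="key", how="outer")

# left: every row of df1 is kept; key 'd' from df2 is dropped.
left_join = pd.merge(df1, df2, on="key", how="left")
```

In the outer result, c has no data2 and d has no data1; in the left result, only c has a missing data2.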
