Pandas: read_excel() and to_excel() Functions Explained -- 688IT编程网

read_excel()

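The loading function is read_excel(). Its parameters are as follows.

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, co

Common parameters explained:

io : string, path object. The path of the Excel file.

sheetname : string, int, mixed list of strings/ints, or None, default 0. To read several sheets, use sheetname=[0,1]; sheetname=None returns all sheets. Note that an int or string returns a DataFrame, whereas None or a list returns a dict of DataFrames. Later pandas releases rename this parameter to sheet_name.

header : int, list of ints, default 0. The row that holds the column names. The default 0 takes the first row, and the data is everything below the header row. If the file has no column names, set header=None.

skiprows : list-like. Rows to skip at the beginning; the listed rows are left out.

skip_footer : int, default 0. Number of rows to leave out, counted from the end. Later pandas releases spell this skipfooter.

index_col : int, list of ints, default None. The column to use as the index; a u"string" column label can be used as well.

names : array-like, default None. The names to give the columns.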

Data source:

sheet1:

ID NUM-1 NUM-2 NUM-3

36901 142 168 661

36902 78 521 602

36903 144 600 521

36904 95 457 468

36905 69 596 695

sheet2:

ID NUM-1 NUM-2 NUM-3

36906 190 527 691

36907 101 403 470

(1) Basic call

import pandas as pd

basestation = "F://pythonBook_PyPDAM/data/test.xls"
data = pd.read_excel(basestation)
print(data)

Output: a DataFrame

      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695
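
The basic call can be tried without the article's F:// workbook. The sketch below first rebuilds the sample data as a temporary workbook and then reads it back. The temporary path, the .xlsx format and an installed openpyxl engine are assumptions made for illustration; they are not part of the original example.

```python
import os
import tempfile

import pandas as pd

# Sample data copied from the article's sheet1 and sheet2
cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
sheet2 = pd.DataFrame([[36906, 190, 527, 691], [36907, 101, 403, 470]],
                      columns=cols)

# Hypothetical stand-in for the article's test.xls
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(path) as writer:
    sheet1.to_excel(writer, sheet_name="sheet1", index=False)
    sheet2.to_excel(writer, sheet_name="sheet2", index=False)

# With no other arguments: first sheet, first row used as column names
data = pd.read_excel(path)
print(type(data).__name__)
print(data)
```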

(2) sheetname parameter: to read several sheets, use sheetname=[0,1]; sheetname=None returns all sheets. Note that an int or string returns a DataFrame, whereas None or a list returns a dict of DataFrames.

data_1 = pd.read_excel(basestation, sheetname=[0, 1])
print(data_1)
print(type(data_1))

Output: a dict of DataFrames

OrderedDict([(0,       ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695),
(1,       ID  NUM-1  NUM-2  NUM-3
0  36906    190    527    691
1  36907    101    403    470)])
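
The difference between the return types can be checked with a short sketch. It assumes a recent pandas release, where the parameter is spelled sheet_name, and it writes a temporary .xlsx workbook (a hypothetical stand-in for the article's file, with openpyxl assumed to be installed).

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
sheet2 = pd.DataFrame([[36906, 190, 527, 691], [36907, 101, 403, 470]],
                      columns=cols)

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
with pd.ExcelWriter(path) as writer:
    sheet1.to_excel(writer, sheet_name="sheet1", index=False)
    sheet2.to_excel(writer, sheet_name="sheet2", index=False)

single = pd.read_excel(path, sheet_name=0)         # int -> one DataFrame
by_pos = pd.read_excel(path, sheet_name=[0, 1])    # list -> dict keyed by 0 and 1
all_sheets = pd.read_excel(path, sheet_name=None)  # None -> dict keyed by sheet name

print(type(single).__name__, type(by_pos).__name__)
print(list(by_pos), list(all_sheets))
```

With a list, the dictionary keys are the requested positions; with None, they are the sheet names.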

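(3) header parameter: specifies the row that holds the column names. The default 0 takes the first row, and the data is everything below the header row. If the file has no column names, set header=None. Note that the sample file does have a header row, so with header=None that row appears as the first row of data.

data = pd.read_excel(basestation, header=None)
print(data)

Output:

       0      1      2      3
0     ID  NUM-1  NUM-2  NUM-3
1  36901    142    168    661
2  36902     78    521    602
3  36903    144    600    521
4  36904     95    457    468
5  36905     69    596    695

data = pd.read_excel(basestation, header=[3])
print(data)

Output:

   36903  144  600  521
0  36904   95  457  468
1  36905   69  596  695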
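
Both header variants can be reproduced with the sketch below. It passes the scalar header=3 rather than the list [3] used above, and it reads from a temporary .xlsx file (a hypothetical stand-in for the article's workbook; openpyxl is assumed to be installed).

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
sheet1.to_excel(path, sheet_name="sheet1", index=False)

# header=None: columns are numbered 0..3 and the name row becomes data row 0
no_header = pd.read_excel(path, header=None)

# header=3: file row 3 (0-indexed) supplies the labels, rows above it are dropped
from_row3 = pd.read_excel(path, header=3)

print(no_header)
print(from_row3)
```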

(4) skiprows parameter: skips the listed rows.

data = pd.read_excel(basestation, skiprows=[1])
print(data)

Output:

      ID  NUM-1  NUM-2  NUM-3
0  36902     78    521    602
1  36903    144    600    521
2  36904     95    457    468
3  36905     69    596    695
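
Row numbers in skiprows count from the top of the sheet, so row 0 is the header row and row 1 is the first data row. A minimal sketch, assuming a temporary .xlsx file in place of the article's workbook and an installed openpyxl:

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
sheet1.to_excel(path, sheet_name="sheet1", index=False)

# Skip file row 1 only: the header (row 0) stays, the 36901 record is dropped
data = pd.read_excel(path, skiprows=[1])
print(data)
```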

(5) skip_footer parameter: leaves out the given number of rows, counted from the end.

data = pd.read_excel(basestation, skip_footer=3)
print(data)

Output:

      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
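
The same result can be reproduced on a recent pandas release, where the parameter is spelled skipfooter. The sketch assumes a temporary .xlsx file in place of the article's workbook and an installed openpyxl.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
sheet1.to_excel(path, sheet_name="sheet1", index=False)

# Drop the last three rows of the sheet; two of the five records remain
data = pd.read_excel(path, skipfooter=3)
print(data)
```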

(6) index_col parameter: specifies the column to use as the index; a u"string" column label can be used as well.

data = pd.read_excel(basestation, index_col="NUM-3")
print(data)

Output:

          ID  NUM-1  NUM-2
NUM-3
661    36901    142    168
602    36902     78    521
521    36903    144    600
468    36904     95    457
695    36905     69    596
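
A minimal sketch of the same idea using the integer position of the column (3 for NUM-3) instead of its label, since the documented type is int or list of ints. The temporary .xlsx file and the openpyxl engine are assumptions.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
sheet1.to_excel(path, sheet_name="sheet1", index=False)

# Column 3 (NUM-3) becomes the row index and is removed from the columns
data = pd.read_excel(path, index_col=3)
print(data)
print(data.loc[661, "ID"])  # look up a row by its NUM-3 value
```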

(7) names parameter: the names to give the columns.

data = pd.read_excel(basestation, names=["a", "b", "c", "e"])
print(data)

Output:

       a    b    c    e
0  36901  142  168  661
1  36902   78  521  602
2  36903  144  600  521
3  36904   95  457  468
4  36905   69  596  695
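
As a runnable sketch: with the default header=0, the names list replaces the labels found in the file's header row. The temporary .xlsx file and the openpyxl engine are assumptions standing in for the article's workbook.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheet1 = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602],
                       [36903, 144, 600, 521], [36904, 95, 457, 468],
                       [36905, 69, 596, 695]], columns=cols)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
sheet1.to_excel(path, sheet_name="sheet1", index=False)

# The header row is still consumed; its labels are replaced by a, b, c, e
data = pd.read_excel(path, names=["a", "b", "c", "e"])
print(data)
```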

The full parameter list is as follows:

>>> help(pd.read_excel)

Help on function read_excel in module pandas.io.excel:

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None,

Read an Excel table into a pandas DataFrame

Parameters

----------

io : string, path object (pathlib.Path or py._path.local.LocalPath),

file-like object, pandas ExcelFile, or xlrd workbook.

The string could be a URL. Valid URL schemes include http, ftp, s3,

and file. For file URLs, a host is expected. For instance, a local

file could be file://localhost/path/to/workbook.xlsx

sheetname : string, int, mixed list of strings/ints, or None, default 0

Strings are used for sheet names, Integers are used in zero-indexed

sheet positions.

Lists of strings/integers are used to request multiple sheets.

Specify None to get all sheets.

str|int -> DataFrame is returned.

list|None -> Dict of DataFrames is returned, with keys representing

sheets.

Available Cases

* Defaults to 0 -> 1st sheet as a DataFrame

* 1 -> 2nd sheet as a DataFrame

* "Sheet1" -> 1st sheet as a DataFrame

* [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames

* None -> All sheets as a dictionary of DataFrames

header : int, list of ints, default 0

Row (0-indexed) to use for the column labels of the parsed

DataFrame. If a list of integers is passed those row positions will

be combined into a ``MultiIndex``

skiprows : list-like

Rows to skip at the beginning (0-indexed)

skip_footer : int, default 0

Rows at the end to skip (0-indexed)

index_col : int, list of ints, default None

Column (0-indexed) to use as the row labels of the DataFrame.

Pass None if there is no such column. If a list is passed,

those columns will be combined into a ``MultiIndex``. If a

subset of data is selected with ``parse_cols``, index_col

is based on the subset.

names : array-like, default None

List of column names to use. If file contains no header row,

then you should explicitly pass header=None

converters : dict, default None

Dict of functions for converting values in certain columns. Keys can

either be integers or column labels, values are functions that take one

input argument, the Excel cell content, and return the transformed

content.

dtype : Type name or dict of column -> type, default None

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}

Use `object` to preserve data as stored in Excel and not interpret dtype.

If converters are specified, they will be applied INSTEAD

of dtype conversion.

.. versionadded:: 0.20.0

true_values : list, default None

Values to consider as True

.. versionadded:: 0.19.0

false_values : list, default None

Values to consider as False

.. versionadded:: 0.19.0

parse_cols : int or list, default None

* If None then parse all columns,

* If int then indicates last column to be parsed

* If list of ints then indicates list of column numbers to be parsed

* If string then indicates comma separated list of Excel column letters and

column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of

both sides.

squeeze : boolean, default False

If the parsed data only contains one column then return a Series

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific

per-column NA values. By default the following values are interpreted

as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',

'1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.

thousands : str, default None

Thousands separator for parsing string columns to numeric. Note that

this parameter is only necessary for columns stored as TEXT in Excel,

any numeric columns will automatically be parsed, regardless of display

format.

keep_default_na : bool, default True

If na_values are specified and keep_default_na is False the default NaN

values are overridden, otherwise they're appended to.

verbose : boolean, default False

Indicate number of NA values placed in non-numeric columns

engine: string, default None

If io is not a buffer or path, this must be set to identify io.

Acceptable values are None or xlrd

convert_float : boolean, default True

convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric

data will be read in as floats: Excel stores all numbers as floats

internally

has_index_names : boolean, default None

DEPRECATED: for version 0.17+ index names will be automatically

inferred based on index_col. To read Excel output from 0.16.2 and

prior that had saved index names, use True.

Returns

to_excel()

The saving function is to_excel(). Note that it is a DataFrame that gets written to Excel, i.e. "Write DataFrame to an excel sheet". Its parameters are as follows:

to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)

Common parameters explained:

excel_writer : string or ExcelWriter object. File path or existing ExcelWriter; the output target.

sheet_name : string, default 'Sheet1'. Name of the sheet which will contain the DataFrame, i.e. which sheet of the workbook to write to.

na_rep : string, default ''. Missing data representation; the value written in place of missing data.

float_format : string, default None. Format string for floating point numbers.

columns : sequence, optional. Columns to write; selects which columns are output.

header : boolean or list of string, default True. Write out column names. If a list of strings is given, it is assumed to be aliases for the column names.

index : boolean, default True. Write row names (index).

index_label : string or sequence, default None. Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

startrow : upper left cell row to dump data frame.

startcol : upper left cell column to dump data frame.

engine : string, default None. Write engine to use; you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.

merge_cells : boolean, default True. Write MultiIndex and Hierarchical Rows as merged cells.

encoding : string, default None. Encoding of the resulting Excel file. Only necessary for xlwt; other writers support unicode natively.

inf_rep : string, default 'inf'. Representation for infinity (there is no native representation for infinity in Excel).

freeze_panes : tuple of integer (length 2), default None. Specifies the one-based bottommost row and rightmost column that is to be frozen.

Data source:

      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695
5  36906    165    453

Load the data:

basestation = "F://python/data/test.xls"
basestation_end = "F://python/data/test_end.xls"
data = pd.read_excel(basestation)

(1) excel_writer parameter: the output path.

data.to_excel(basestation_end)

Output:

      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695
5  36906    165    453
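
A self-contained sketch of the write step. Instead of loading the article's test.xls, it builds the same six rows in memory (the last NUM-3 value is missing) and writes them to a temporary .xlsx file; the path, the file format and an installed openpyxl are assumptions.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
rows = [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
        [36904, 95, 457, 468], [36905, 69, 596, 695], [36906, 165, 453, None]]
data = pd.DataFrame(rows, columns=cols)

out = os.path.join(tempfile.mkdtemp(), "test_end.xlsx")  # hypothetical path
data.to_excel(out)  # defaults: sheet "Sheet1", header and index both written

# Read the file back, treating the first column as the saved index
back = pd.read_excel(out, index_col=0)
print(back)
```

The missing NUM-3 value is written as an empty cell and comes back as NaN.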

(2) sheet_name: the sheet of the Excel file in which the data is stored.

data.to_excel(basestation_end, sheet_name="sheet2")
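
To confirm which sheet was written, the workbook's sheet names can be listed with pd.ExcelFile. This is a sketch under assumptions: a temporary .xlsx file as the target, a shortened two-row frame as the data, and openpyxl installed.

```python
import os
import tempfile

import pandas as pd

# Two rows of the article's data are enough to show the effect
data = pd.DataFrame([[36901, 142, 168, 661], [36902, 78, 521, 602]],
                    columns=["ID", "NUM-1", "NUM-2", "NUM-3"])

out = os.path.join(tempfile.mkdtemp(), "test_end.xlsx")  # hypothetical path
data.to_excel(out, sheet_name="sheet2")

# The new workbook holds exactly one sheet, carrying the requested name
with pd.ExcelFile(out) as book:
    names = book.sheet_names
print(names)
```

Passing a plain path creates a fresh workbook each time, so the result contains only the one named sheet.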

(3) na_rep: the value written in place of missing data.

data.to_excel(basestation_end, na_rep="NULL")

Output:

      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695
5  36906    165    453   NULL
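
The effect of na_rep can be verified by reading the file back. Because 'NULL' is one of the strings read_excel interprets as NaN by default, the sketch passes keep_default_na=False so that the literal text stays visible. The temporary .xlsx path and the openpyxl engine are assumptions.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
rows = [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
        [36904, 95, 457, 468], [36905, 69, 596, 695], [36906, 165, 453, None]]
data = pd.DataFrame(rows, columns=cols)

out = os.path.join(tempfile.mkdtemp(), "test_end.xlsx")  # hypothetical path
data.to_excel(out, na_rep="NULL")  # the missing cell is written as the text NULL

# keep_default_na=False stops read_excel from turning "NULL" back into NaN
back = pd.read_excel(out, index_col=0, keep_default_na=False)
print(back.loc[5, "NUM-3"])
```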

(4) columns parameter: sequence, optional. Columns to write; selects which columns are output.

data.to_excel(basestation_end, columns=["ID"])

Output:

0 36901

1 36902

2 36903

3 36904

4 36905

5 36906

(5) header parameter: boolean or list of string, default True. A list can be used to rename the columns; with header=False the header row is not written.

data.to_excel(basestation_end, header=["a", "b", "c", "d"])

Output:

       a    b    c    d
0  36901  142  168  661
1  36902   78  521  602
2  36903  144  600  521
3  36904   95  457  468
4  36905   69  596  695
5  36906  165  453

data.to_excel(basestation_end, header=False, columns=["ID"])

Output:

0 36901

1 36902

2 36903

3 36904

4 36905

5 36906
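
The columns and header options can be checked together by writing and reading back. This is a sketch under assumptions: the data is built in memory rather than loaded from test.xls, the target is a temporary .xlsx file, and openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
rows = [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
        [36904, 95, 457, 468], [36905, 69, 596, 695], [36906, 165, 453, None]]
data = pd.DataFrame(rows, columns=cols)
out = os.path.join(tempfile.mkdtemp(), "test_end.xlsx")  # hypothetical path

# columns=["ID"]: only the ID column (plus the index) is written
data.to_excel(out, columns=["ID"])
only_id = pd.read_excel(out, index_col=0)

# header=[...]: the list supplies aliases for the column names
data.to_excel(out, header=["a", "b", "c", "d"])
renamed = pd.read_excel(out, index_col=0)

# header=False: no header row at all, so the file starts with data
data.to_excel(out, columns=["ID"], header=False)
bare = pd.read_excel(out, header=None)

print(list(only_id.columns), list(renamed.columns), bare.shape)
```

Each to_excel call with the same path overwrites the previous file, which is why every write is read back before the next one.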

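(6) index : boolean, default True. Write row names (index).

The default True writes the index; with index=False the row index (row names) is not written.

index_label : string or sequence, default None.

Sets the column label of the index column.

data.to_excel(basestation_end, index=False)

Output: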

ID NUM-1 NUM-2 NUM-3

36901 142 168 661

36902 78 521 602

36903 144 600 521

36904 95 457 468

36905 69 596 695

36906 165 453

data.to_excel(basestation_end, index_label=["f"])

Output:

f     ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695
5  36906    165    453
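
Both index options can be compared by reading the written files back without index_col, so that a saved index shows up as an ordinary column. The sketch passes index_label as a plain string rather than a one-element list; the in-memory data, the temporary .xlsx path and the openpyxl engine are assumptions.

```python
import os
import tempfile

import pandas as pd

cols = ["ID", "NUM-1", "NUM-2", "NUM-3"]
rows = [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
        [36904, 95, 457, 468], [36905, 69, 596, 695], [36906, 165, 453, None]]
data = pd.DataFrame(rows, columns=cols)
out = os.path.join(tempfile.mkdtemp(), "test_end.xlsx")  # hypothetical path

# index=False: the file holds only the four data columns
data.to_excel(out, index=False)
no_index = pd.read_excel(out)

# index_label="f": the index is written as a first column headed "f"
data.to_excel(out, index_label="f")
labelled = pd.read_excel(out)

print(list(no_index.columns))
print(list(labelled.columns))
```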
