Reading and Writing Files with Python pandas: Reading Files with read_csv()
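1. Ways to read files with pandas
read_csv: loads delimited data from a file, a URL, or a file-like object. The default delimiter is a comma.
read_table: loads delimited data from a file, a URL, or a file-like object. The default delimiter is a tab ("\t").
read_fwf: reads data in fixed-width column format (that is, with no delimiter).
read_clipboard: reads data from the clipboard; it can be thought of as the clipboard version of read_table. It is useful when turning content copied from a web page into a table.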
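As a rough illustration of the differences listed above, the sketch below feeds the same small table to three of these readers through in-memory `io.StringIO` buffers instead of files on disk. The sample data, the variable names, and the `widths` values are made up for this example; read_clipboard is left out because it depends on the system clipboard.

```python
import io

import pandas as pd

# Hypothetical sample data, not taken from the article's files.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
tsv_text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"
fwf_text = "a  b  c\n1  2  3\n4  5  6\n"

# read_csv: the default separator is a comma.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table: the default separator is a tab.
df_tsv = pd.read_table(io.StringIO(tsv_text))

# read_fwf: columns are located by character position, not by a separator.
df_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=[3, 3, 1])

print(df_csv.equals(df_tsv))
print(df_fwf)
```

All three calls should produce the same three-column table; only the way the columns are located in each line differs.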
2. A simple example of reading files
Code:
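import pandas as pd

# Read a CSV file; the first line is used as the header row.
df = pd.read_csv('D:/project/python_instruct/test_data1.csv')
print('CSV file read with read_csv:', df, sep='\n')

# read_table reads the same file once the separator is given explicitly.
df = pd.read_table('D:/project/python_instruct/test_data1.csv', sep=',')
print('CSV file read with read_table:', df, sep='\n')

# The file has no header row: let pandas number the columns.
df = pd.read_csv('D:/project/python_instruct/test_data2.csv', header=None)
print('CSV file without a header row, read with read_csv:', df, sep='\n')

# Supply the column names ourselves.
df = pd.read_csv('D:/project/python_instruct/test_data2.csv', names=['a', 'b', 'c', 'd', 'message'])
print('CSV file with custom column names, read with read_csv:', df, sep='\n')

# Use the 'message' column as the index.
names = ['a', 'b', 'c', 'd', 'message']
df = pd.read_csv('D:/project/python_instruct/test_data2.csv', names=names, index_col='message')
print('Index specified when reading with read_csv:', df, sep='\n')

# A list of columns in index_col produces a hierarchical index.
parsed = pd.read_csv('D:/project/python_instruct/test_data3.csv', index_col=['key1', 'key2'])
print('read_csv building a hierarchical index from several columns:')
print(parsed)

# Fields separated by a variable amount of whitespace: use a regular expression.
print(list(open('D:/project/python_instruct/')))
result = pd.read_table('D:/project/python_instruct/', sep=r'\s+')
print('read_table handling the file with a regular expression separator:')
print(result)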
Output:
CSV file read with read_csv:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
CSV file read with read_table:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
CSV file without a header row, read with read_csv:
   0   1   2   3      4
0  1   2   3   4  hello
1  5   6   7   8  world
2  9  10  11  12    foo
CSV file with custom column names, read with read_csv:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
Index specified when reading with read_csv:
         a   b   c   d
message
hello    1   2   3   4
world    5   6   7   8
foo      9  10  11  12
read_csv building a hierarchical index from several columns:
           value1  value2
key1 key2
one  a          1       2
     b          3       4
     c          5       6
     d          7       8
two  a          9      10
     b         11      12
     c         13      14
     d         15      16
[' A B C \n', 'aaa -0.26 -0.1 -0.4\n', 'bbb -0.92 -0.4 -0.7\n', 'ccc -0.34 -0.5 -0.8\n', 'ddd -0.78 -0.3 -0.2']
read_table handling the file with a regular expression separator:
        A    B    C
aaa -0.26 -0.1 -0.4
bbb -0.92 -0.4 -0.7
ccc -0.34 -0.5 -0.8
ddd -0.78 -0.3 -0.2
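The options used in this section can also be tried without the data files. The sketch below rebuilds data shaped like the printed results using `io.StringIO`; the exact file contents are an assumption reconstructed from that output, and are shortened here. It also hints at why `aaa`, `bbb`, ... become the index in the last example: when the header row has one field fewer than the data rows, pandas uses the first column as the index.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed (and shortened) from the printed output.
no_header = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
two_keys = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\none,b,3,4\n"
    "two,a,9,10\ntwo,b,11,12\n"
)
ragged = (
    "      A     B     C\n"
    "aaa -0.26  -0.1  -0.4\n"
    "bbb -0.92  -0.4  -0.7\n"
)

# Custom column names, with one of them promoted to the index.
names = ['a', 'b', 'c', 'd', 'message']
by_message = pd.read_csv(io.StringIO(no_header), names=names, index_col='message')

# Passing a list to index_col builds a hierarchical (MultiIndex) index.
parsed = pd.read_csv(io.StringIO(two_keys), index_col=['key1', 'key2'])

# A raw string keeps the backslash in the regular expression \s+ intact.
result = pd.read_table(io.StringIO(ragged), sep=r'\s+')

print(by_message.loc['world', 'c'])
print(parsed.loc[('two', 'a'), 'value2'])
print(result.index.tolist())
```

Looking up values by label, as in the three print calls, is the main practical benefit of setting the index while reading rather than afterwards.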
3. Reading large datasets in chunks
First, the code:
result = pd.read_csv('D:/project/python_instruct/')
print('Original file:', result)
Output:
Traceback (most recent call last):
  File "<ipython-input-5-6eb71b2a5e94>", line 1, in <module>
    runfile('D:/project/python_instruct/Test.py', wdir='D:/project/python_instruct')
  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/project/python_instruct/Test.py", line 75, in <module>
    result = pd.read_csv('D:\project\python_instruct\')
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 325, in _read
    return parser.read()
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
    ret = self._engine.read(nrows)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)
  File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)
  File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)
  File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)
  File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)
CParserError: Error tokenizing data. C error: out of memory
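The dataset turns out to be too large to fit in memory. We can read just a few rows to see what it looks like, for example the first 10:
result = pd.read_csv('D:/project/python_instruct/', nrows=10)
print('Reading only a few rows:')
print(result)
Output: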
0 0\t296\t3\t1\t10\t1\t12\t1\t13\t1\t14\t1\
1 1\t271\t8\t1\t17\t1\t22\t1\t31\t0\t34\
2 2\t158\t0\t0\t5\t1\t10\t1\t11\t1\t13\t1\
3 3\t413\t0\t1\t5\t1\t194\t1\t354\t1\t3462\
4 4\t142\t1\t0\t5\t1\t7\t1\t11\t1\t14\t1\t18\t1\...
5 5\t272\t2\t1\t3\t1\t4\t1\t12\t1\t13\t1\t14\t1\...
6 6\t59\t9\t1\t13\t1\t46991\t0\t66930\t0\t85672\...
7 7\t131\t4\t1\t11\t1\t20\t1\t24\t1\t26\t0\
8 8\t326\t0\t0\t1\t1\t12\t1\t13\t1\t17\t1\
9 9\t12\t0\t0\t6\t1\t10\t1\t13\t1\t18\
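Two things follow from this output. The fields are separated by tabs, so without sep='\t' each line is read as a single column. And since the whole file does not fit in memory, it can be processed piece by piece: passing chunksize to read_csv returns an iterator that yields one DataFrame per chunk. The sketch below illustrates the idea on a small in-memory stand-in for the large file. The sample rows, the chunk size of 4, and the per-chunk sum are assumptions for illustration only; the real file appears to have a varying number of fields per line, which this sketch does not handle.

```python
import io

import pandas as pd

# Small tab-separated stand-in for the large file (made-up values).
rows = [f"{i}\t{i * 10}\t{i % 2}" for i in range(10)]
big_file = io.StringIO("\n".join(rows) + "\n")

# chunksize makes read_csv return an iterator of DataFrames
# instead of loading the whole file at once.
reader = pd.read_csv(big_file, sep='\t', header=None, chunksize=4)

total_rows = 0
column_sum = 0
for chunk in reader:
    # Each chunk is an ordinary DataFrame with at most 4 rows.
    total_rows += len(chunk)
    column_sum += int(chunk[1].sum())

print(total_rows, column_sum)
```

Only one chunk is held in memory at a time, so running totals like these can be accumulated over a file of any size.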