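Reading a Very Large File in Blocks with Multiple Processes in Python

This article walks through an example of reading a very large file in blocks with multiple Python processes. It is shared here for reference; the details are as follows: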
The script reads a very large text file in blocks using multiple processes and writes each block out to its own file.
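# -*- coding: GBK -*-
"""
Read a file in blocks using multiple processes.
"""
import datetime
import os
from multiprocessing import Process, Array, RLock

WORKERS = 4            # number of worker processes
BLOCKSIZE = 100000000  # nominal block size in bytes (about 100 MB)
FILE_SIZE = 0          # size of the input file in bytes, set by getFilesize()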

def getFilesize(file):
    """
    Get the size, in bytes, of the file to be read.
    """
    global FILE_SIZE
    fstream = open(file, 'rb')
    fstream.seek(0, os.SEEK_END)
    FILE_SIZE = fstream.tell()
    fstream.close()
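
def process_found(pid, array, file, rlock):
    """
    Worker process.
    Args:
        pid: worker index
        array: array shared between processes; slot i records the end
            position of the block that worker i has claimed
        file: name of the file to read
        rlock: lock that guards the shared array
    Each worker takes the current maximum in array as its start position
    (startpossition). The end position (endpossition) is
    startpossition+BLOCKSIZE if that is less than FILE_SIZE, else FILE_SIZE.
    If startpossition == FILE_SIZE, the worker exits.
    If startpossition == 0, reading starts at offset 0.
    If startpossition != 0, one line is read and discarded first, in case the
    block boundary cut a line in two; processing starts with the next line.
    While the current position <= endpossition, readline is called;
    once past the boundary, the worker looks up the maximum in array again.
    """
    global FILE_SIZE
    fstream = open(file, 'rb')  # binary mode, so tell() returns byte offsets
    while True:
        rlock.acquire()
        print('pid%s %s' % (pid, ','.join([str(v) for v in array])))
        startpossition = max(array)
        endpossition = array[pid] = (startpossition+BLOCKSIZE) if (startpossition+BLOCKSIZE) < FILE_SIZE else FILE_SIZE
        rlock.release()
        if startpossition == FILE_SIZE:  # end of the file
            print('pid%s end' % (pid))
            break
        elif startpossition != 0:
            fstream.seek(startpossition)
            fstream.readline()  # discard: this line belongs to the previous block
        pos = ss = fstream.tell()
        ostream = open('/data/download/tmp_pid'+str(pid)+'_jobs'+str(endpossition), 'wb')
        while pos <= endpossition:
            # process the line
            line = fstream.readline()
            if not line:  # end of file: stop instead of looping forever
                break
            ostream.write(line)
            pos = fstream.tell()
        print('pid:%s,startposition:%s,endposition:%s,pos:%s' % (pid, ss, endpossition, pos))
        ostream.flush()
        ostream.close()
    fstream.close()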

def main():
    global FILE_SIZE
    print(datetime.datetime.now().strftime("%Y/%d/%m %H:%M:%S"))
    file = "/data/pds/download/scmcc_log/tmp_format_2011004.log"
    # FILE_SIZE is set here, before the workers start. They inherit it only
    # when they are forked (the default on Linux), not when they are spawned.
    getFilesize(file)
    print(FILE_SIZE)
    rlock = RLock()
    array = Array('l', WORKERS, lock=rlock)
    threads = []
    for i in range(WORKERS):
        p = Process(target=process_found, args=[i, array, file, rlock])
        threads.append(p)
    for i in range(WORKERS):
        threads[i].start()
    for i in range(WORKERS):
        threads[i].join()
    print(datetime.datetime.now().strftime("%Y/%d/%m %H:%M:%S"))


if __name__ == '__main__':
    main()
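
The line-boundary rule described in the docstring of process_found can be checked in isolation, without any processes. The following is a minimal single-process sketch in Python 3, not part of the original script: the helper names read_block and split_offsets, the temporary sample file, and the 64-byte block size are all assumptions made for illustration. It applies the same rule (skip one line when the start is not 0, then keep reading while the position is <= the end) and confirms that the blocks together reproduce every line exactly once.

```python
import os
import tempfile


def read_block(path, start, end):
    """Return the lines owned by the block with nominal range [start, end].

    A block owns every line that starts at an offset in (start, end],
    plus the line at offset 0 when start == 0.
    (Hypothetical helper, mirroring the rule in process_found.)
    """
    lines = []
    with open(path, 'rb') as f:      # binary mode: tell() is a byte offset
        if start != 0:
            f.seek(start)
            f.readline()             # discard: owned by the previous block
        pos = f.tell()
        while pos <= end:
            line = f.readline()
            if not line:             # end of file
                break
            lines.append(line)
            pos = f.tell()
    return lines


def split_offsets(size, block_size):
    """Yield (start, end) pairs that cover a file of `size` bytes."""
    start = 0
    while start < size:
        end = min(start + block_size, size)
        yield start, end
        start = end


# Build a small sample file whose lines have different lengths.
sample = [('line-%d-' % i).encode('ascii') + b'x' * (i % 7) + b'\n'
          for i in range(50)]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.writelines(sample)

# Read the file block by block and stitch the pieces back together.
size = os.path.getsize(path)
rebuilt = []
for start, end in split_offsets(size, 64):
    rebuilt.extend(read_block(path, start, end))
os.remove(path)

print(rebuilt == sample)
```

The script prints True when no line is lost or duplicated at a block boundary. Changing the loop condition to pos < end would drop any line that happens to start exactly on a boundary, because the next block discards it as well.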
I hope this article is helpful for your Python programming.