Processing Text Files with Python -- 688IT编程网

[Repost] Processing Text Files with Python

2010-06-01 11:10:53 | Category: Python | Tags: python cookbook

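1. Reading text or data from a file

The simplest way to read a file's entire contents into one long string:

    all_the_text = open('').read()  # all the text of a text file
    all_the_data = open('abinfile', 'rb').read()  # all the data of a binary file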

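A better approach is to bind the file object to a variable, so that the file can be closed as soon as you are done with it. For example, to read the contents of a text file:

    file_object = open('')  # open the file
    all_the_text = file_object.read()  # all the text of the text file
    file_object.close()  # done with the file, so close it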
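The pattern above relies on an explicit close() call, which is skipped if read() raises an exception. A minimal sketch of the same two reads using a with statement, assuming Python 3; the helper names read_text and read_binary, the utf-8 encoding, and the temporary demo file are illustrative assumptions, not part of the original recipe:

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Return the whole text file as one string; the file is closed on exit."""
    with open(path, encoding=encoding) as file_object:
        return file_object.read()


def read_binary(path):
    """Return the whole file as a bytes object."""
    with open(path, "rb") as file_object:
        return file_object.read()


# Create a small throwaway file so the sketch runs as-is.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

all_the_text = read_text(demo_path)   # str
all_the_data = read_binary(demo_path)  # bytes
os.remove(demo_path)
```

The with block closes the file when it ends, whether the read succeeds or fails, so no separate close() call is needed.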

There are five ways to read the whole contents of a text file as a list of lines:

    list_of_all_the_lines = file_object.readlines()  # method 1
    list_of_all_the_lines = file_object.read().splitlines(1)  # method 2
    list_of_all_the_lines = file_object.read().splitlines()  # method 3
    list_of_all_the_lines = file_object.read().split('\n')  # method 4
    list_of_all_the_lines = list(file_object)  # method 5
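The five methods do not return identical lists. The following sketch, assuming Python 3 and using io.StringIO as a stand-in for an open text file (the sample text and the helper fresh_file are made up for illustration), shows which ones keep the line terminators:

```python
import io

SAMPLE = "alpha\nbeta\n"


def fresh_file():
    """Return a new in-memory text file positioned at the start."""
    return io.StringIO(SAMPLE)


method_1 = fresh_file().readlines()              # keeps the trailing '\n'
method_2 = fresh_file().read().splitlines(True)  # keepends=True: same result here
method_3 = fresh_file().read().splitlines()      # line terminators stripped
method_4 = fresh_file().read().split("\n")       # stripped, plus a final '' entry
method_5 = list(fresh_file())                    # iteration yields lines with '\n'

print(method_1)  # ['alpha\n', 'beta\n']
print(method_3)  # ['alpha', 'beta']
print(method_4)  # ['alpha', 'beta', '']
```

Method 4 produces the extra empty string because the text ends with a newline, so it is usually the one to avoid unless that trailing entry is wanted.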

2. Writing text or binary data to a file

The simplest way to write a (possibly large) string to a file:

    open('', 'w').write(all_the_text)  # write text to a text file
    open('abinfile', 'wb').write(all_the_data)  # write data to a binary file

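A better approach is to bind the file object to a variable, so that the file can be closed as soon as you are done with it. For example, to write to a text file:

    file_object = open('', 'w')
    file_object.write(all_the_text)
    file_object.close()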

Often the data to be written is not one big string but a list (or other sequence) of strings. In that case, use the writelines method, which works equally well for binary files:

    file_object.writelines(list_of_text_strings)
    open('abinfile', 'wb').writelines(list_of_data_strings)
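One detail that is easy to miss is that writelines writes each string exactly as given and adds no line terminators. A minimal Python 3 sketch, assuming a throwaway temporary directory and an illustrative file name out.txt (neither is from the original recipe):

```python
import os
import shutil
import tempfile

# Each string carries its own '\n'; writelines will not add any.
list_of_text_strings = ["first line\n", "second line\n"]

demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "out.txt")

# The with statement closes the file, and so flushes it, when the block ends.
with open(demo_path, "w", encoding="utf-8") as file_object:
    file_object.write("header\n")                 # one string
    file_object.writelines(list_of_text_strings)  # a sequence of strings

with open(demo_path, encoding="utf-8") as file_object:
    written = file_object.read()

shutil.rmtree(demo_dir)  # remove the demo directory and file
```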

3. Reading a specific line from a file

For example:

    import linecache
    # thefilepath: path of the file
    # desired_line_number: integer, the number of the line wanted
    theline = linecache.getline(thefilepath, desired_line_number)
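linecache.getline counts lines from 1 and returns an empty string, rather than raising an exception, when the requested line does not exist. A short Python 3 sketch of both cases, using a temporary file with made-up contents:

```python
import linecache
import os
import tempfile

# Throwaway three-line file for the demonstration.
fd, thefilepath = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("line one\nline two\nline three\n")

second = linecache.getline(thefilepath, 2)    # line numbers start at 1
missing = linecache.getline(thefilepath, 99)  # out of range: '' and no exception

linecache.clearcache()  # drop the cached copy of the file
os.remove(thefilepath)
```

Because linecache keeps the whole file in memory after the first call, it suits repeated lookups in the same file better than a single read of a very large one.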

4. Counting the lines in a file

For example:

    count = len(open(thefilepath).readlines())  # method 1

    count = 0  # method 2
    for line in open(thefilepath).xreadlines(): count += 1
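Method 1 loads every line into memory at once, while method 2 reads one line at a time; xreadlines exists only in Python 2, and in Python 3 iterating over the file object itself does the same job. A minimal Python 3 sketch, in which the function name count_lines and the sample file are illustrative assumptions:

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating, so only one line is held in memory at a time."""
    # Binary mode avoids any decoding; lines are split on b'\n'.
    with open(path, "rb") as file_object:
        return sum(1 for _ in file_object)


# Sample file: three lines, the last one without a trailing newline.
fd, thefilepath = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"a\nb\nc")

count = count_lines(thefilepath)
os.remove(thefilepath)
```

A final line without a trailing newline still counts as a line, and an empty file yields 0.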

5. Reading an INI configuration file

    import ConfigParser
    import string

    _ConfigDefault = {
        "database.dbms": "mysql",
        "database.name": "",
        "database.user": "root",
        "database.password": "",
        "database.host": "127.0.0.1"
    }

    def LoadConfig(file, config={}):
        """
        returns a dictionary with keys of the form
        <section>.<option> and the corresponding values
        """
        config = config.copy()  # work on a copy so the defaults are not modified
        cp = ConfigParser.ConfigParser()
        cp.read(file)  # parse the INI file
        for sec in cp.sections():
            name = string.lower(sec)
            for opt in cp.options(sec):
                config[name + "." + string.lower(opt)] = string.strip(
                    cp.get(sec, opt))
        return config

    if __name__ == "__main__":
        print LoadConfig("some.ini", _ConfigDefault)

6. Inspecting some or all of the files inside a ZIP archive directly, without unpacking it

For example:

    import zipfile
    z = zipfile.ZipFile("zipfile.zip", "r")
    for filename in z.namelist():
        print 'File:', filename,
        bytes = z.read(filename)
        print 'has', len(bytes), 'bytes'
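The loop above decompresses every member just to measure its length. When only names and sizes are needed, the archive's directory already records them. A Python 3 sketch under that assumption; the function name list_zip_contents and the in-memory sample archive are illustrative, not from the original recipe:

```python
import io
import zipfile


def list_zip_contents(zip_source):
    """Return (name, uncompressed size in bytes) for each archive member."""
    with zipfile.ZipFile(zip_source, "r") as z:
        # infolist() reads the archive's directory, not the member data.
        return [(info.filename, info.file_size) for info in z.infolist()]


# Build a small archive in memory so the sketch runs without a zipfile.zip on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("hello.txt", "hello")
    z.writestr("data/numbers.txt", "1 2 3\n")
buffer.seek(0)

for name, size in list_zip_contents(buffer):
    print("File:", name, "has", size, "bytes")
```

zip_source can be a path on disk or any seekable binary file object; z.read(name) remains the call to use when the contents themselves are needed.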

7. Splitting a file path into all of its components

For example:

    import os, sys

    def splitall(path):
        allparts = []
        while 1:
            parts = os.path.split(path)
            if parts[0] == path:  # sentinel for absolute paths
                allparts.insert(0, parts[0])
                break
            elif parts[1] == path:  # sentinel for relative paths
                allparts.insert(0, parts[1])
                break
            else:  # handle the remaining part
                path = parts[0]
                allparts.insert(0, parts[1])
        return allparts
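In Python 3, the standard pathlib module exposes the same decomposition through the parts attribute. The sketch below puts a POSIX-only variant of splitall next to it for comparison; using posixpath and PurePosixPath (so the results do not depend on the operating system) and the example paths are assumptions made for illustration:

```python
import posixpath
from pathlib import PurePosixPath


def splitall(path):
    """Split a POSIX-style path into all of its components."""
    allparts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:      # absolute path reduced to its root
            allparts.insert(0, head)
            break
        elif tail == path:    # relative path reduced to a single name
            allparts.insert(0, tail)
            break
        else:                 # peel off the last component and continue
            path = head
            allparts.insert(0, tail)
    return allparts


absolute_parts = splitall("/usr/local/bin")  # ['/', 'usr', 'local', 'bin']
relative_parts = splitall("a/b/c.txt")       # ['a', 'b', 'c.txt']

# pathlib gives the same components as a tuple.
pathlib_parts = PurePosixPath("/usr/local/bin").parts
```

The two approaches differ on edge cases such as an empty path or a trailing slash, so they are not drop-in replacements for each other.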

8. Walking a directory, or the whole directory tree rooted at a given directory, to collect all files that match given patterns

For example:

    import os.path, fnmatch

    def listFiles(root, patterns='*', recurse=1, return_folders=0):
        # Expand patterns from semicolon-separated string to list
        pattern_list = patterns.split(';')

        # Collect input and output arguments into one bunch
        class Bunch:
            def __init__(self, **kwds): self.__dict__.update(kwds)
        arg = Bunch(recurse=recurse, pattern_list=pattern_list,
                    return_folders=return_folders, results=[])

        def visit(arg, dirname, files):
            # Append to arg.results all relevant files (and perhaps folders)
            for name in files:
                fullname = os.path.normpath(os.path.join(dirname, name))  # normalize the path
                if arg.return_folders or os.path.isfile(fullname):  # folders are wanted, or this is a regular file
                    for pattern in arg.pattern_list:  # patterns are OR-ed: one match is enough
                        if fnmatch.fnmatch(name, pattern):
                            arg.results.append(fullname)
                            break
            # Block recursion if recursion was disallowed
            if not arg.recurse: files[:] = []  # empty the list in place so no subdirectories are visited

        os.path.walk(root, visit, arg)

        return arg.results
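os.path.walk, which the recipe above depends on, exists only in Python 2; Python 3 offers os.walk instead, a generator that makes the callback and the Bunch helper unnecessary. A sketch of the same idea under that assumption; the name list_files, the sorted result order within each directory, and the temporary demo tree are illustrative choices, not the original author's:

```python
import fnmatch
import os
import tempfile


def list_files(root, patterns="*", recurse=True, return_folders=False):
    """Return paths under root whose names match any ';'-separated pattern."""
    pattern_list = patterns.split(";")
    results = []
    for dirname, dirnames, filenames in os.walk(root):
        names = filenames + (dirnames if return_folders else [])
        for name in sorted(names):
            # Patterns are OR-ed: one match is enough.
            if any(fnmatch.fnmatch(name, p) for p in pattern_list):
                results.append(os.path.normpath(os.path.join(dirname, name)))
        if not recurse:
            dirnames[:] = []  # prune in place so os.walk does not descend
    return results


# Throwaway tree: a.txt, b.py and sub/c.txt.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "sub"))
for rel in ("a.txt", "b.py", os.path.join("sub", "c.txt")):
    with open(os.path.join(demo_root, rel), "w") as f:
        f.write("demo\n")

all_txt = list_files(demo_root, "*.txt")
top_level = list_files(demo_root, "*.txt;*.py", recurse=False)
```

Clearing dirnames in place plays the same role as files[:] = [] in the original: the walker consults that list to decide where to go next.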

9. Given a search path (a string of directories joined by a path separator), finding the first file whose name matches

For example:

    import os, string

    def search_file(filename, search_path, pathsep=os.pathsep):
        """ Given a search path, find file with requested name """
        for path in string.split(search_path, pathsep):
            candidate = os.path.join(path, filename)
            if os.path.exists(candidate): return os.path.abspath(candidate)
        return None

    if __name__ == '__main__':
        search_path = '/bin' + os.pathsep + '/usr/bin'  # ; on Windows, : on Unix
        find_file = search_file('ls', search_path)
        if find_file:
            print "File found at %s" % find_file
        else:
            print "File not found"
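The string.split function and the print statement used above are Python 2 only. A Python 3 sketch of the same search follows; two temporary directories stand in for a real search path, and the file name tool.cfg is made up for illustration:

```python
import os
import tempfile


def search_file(filename, search_path, pathsep=os.pathsep):
    """Return the absolute path of the first match on search_path, else None."""
    for directory in search_path.split(pathsep):
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    return None


# Two throwaway directories; only the second one contains the file.
first_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
with open(os.path.join(second_dir, "tool.cfg"), "w") as f:
    f.write("demo\n")

demo_search_path = first_dir + os.pathsep + second_dir
found = search_file("tool.cfg", demo_search_path)         # path inside second_dir
not_found = search_file("missing.cfg", demo_search_path)  # None
```

Directories are tried in the order they appear in the search path, so an earlier directory shadows a later one that holds a file of the same name.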

10. Backing up a file before updating it: the standard convention is to append a three-digit version number to the original file name

For example:

    def VersionFile(file_spec, vtype='copy'):
        import os, shutil

        if os.path.isfile(file_spec):
            # or, do other error checking:
            if vtype not in ('copy', 'rename'):
                vtype = 'copy'

            # Determine root filename so the extension doesn't get longer
            n, e = os.path.splitext(file_spec)

            # Is e an integer?
            try:
                num = int(e[1:])  # skip the leading dot of the extension
                root = n
            except ValueError:
                root = file_spec

            # Find next available file version
            for i in xrange(1000):
                new_file = '%s.%03d' % (root, i)
                if not os.path.isfile(new_file):
                    if vtype == 'copy':
                        shutil.copy(file_spec, new_file)
                    else:
                        os.rename(file_spec, new_file)
                    return 1

        return 0

    if __name__ == '__main__':
        # test code (you will need a file )
        print VersionFile('')
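The versioning scheme can be tried out safely in a temporary directory. The following Python 3 sketch is an adaptation, not the original code: the name version_file, the stricter three-digit check on the extension, and the demo file notes.txt are all assumptions made for illustration.

```python
import os
import shutil
import tempfile


def version_file(file_spec, vtype="copy"):
    """Back up file_spec as <root>.000, <root>.001, ...; return True on success."""
    if vtype not in ("copy", "rename"):
        vtype = "copy"  # fall back to copying, as the original recipe does
    if not os.path.isfile(file_spec):
        return False
    # Reuse the root name if file_spec already ends in a three-digit version.
    root, ext = os.path.splitext(file_spec)
    if not (len(ext) == 4 and ext[1:].isdigit()):
        root = file_spec
    # Find the next available version number.
    for i in range(1000):
        new_file = "%s.%03d" % (root, i)
        if not os.path.isfile(new_file):
            if vtype == "copy":
                shutil.copy(file_spec, new_file)
            else:
                os.rename(file_spec, new_file)
            return True
    return False  # all 1000 version numbers are taken


# Demo in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "notes.txt")
with open(demo_file, "w") as f:
    f.write("v1\n")

version_file(demo_file)  # creates notes.txt.000
version_file(demo_file)  # creates notes.txt.001
backups = sorted(os.listdir(demo_dir))
```

With vtype="rename" the original file is moved to the versioned name instead of copied, so the caller is expected to write a fresh file afterwards.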
