TensorFlow: Reading and Writing Files -- 688IT编程网

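Writing files

Here, "writing" means converting data in various formats (strings, images, and so on) into TFRecord, the standard format that TensorFlow supports.

TFRecord is the format used to manage input data uniformly; under the hood it is a binary file.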
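To make "binary file" concrete, here is a minimal sketch of how a single record is framed on disk, written without TensorFlow. It assumes the commonly documented layout (an 8-byte little-endian length, a masked CRC-32C of the length, the payload, and a masked CRC-32C of the payload). The function names are made up for this illustration and are not part of any TensorFlow API.

```python
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: slow, but needs no third-party package."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (_CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant (assumed mask)."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one payload as: length, crc(length), payload, crc(payload)."""
    length = struct.pack("<Q", len(payload))
    return (length + struct.pack("<I", masked_crc(length))
            + payload + struct.pack("<I", masked_crc(payload)))


def iter_records(blob: bytes):
    """Yield the payloads from a concatenation of framed records."""
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<Q", blob, pos)
        start = pos + 12  # skip the 8-byte length and its 4-byte checksum
        payload = blob[start:start + length]
        (stored,) = struct.unpack_from("<I", blob, start + length)
        if stored != masked_crc(payload):
            raise ValueError("corrupt record at offset %d" % pos)
        yield payload
        pos = start + length + 4


blob = frame_record(b"first") + frame_record(b"second record")
print(list(iter_records(blob)))
```

In practice the payload of each record is a serialized Example, and the writer and reader classes described in this article handle the framing for you.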

Writing:

Fill the data into a tf.train.Example. The Example protocol buffer holds a tf.train.Features field; populate the Features with your data, serialize the protocol buffer to a string, and write that string to a TFRecord file with tf.python_io.TFRecordWriter.

Reading:

Use the tf.TFRecordReader reader and parse each record with tf.parse_single_example, which turns an Example protocol buffer into tensors. Then decode any raw byte fields with tf.decode_raw.

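Related classes and functions

tf.python_io.TFRecordWriter

A class that writes records to a TFRecords file.

__init__()

'''

Purpose: creates a TFRecordWriter that writes records to the specified TFRecord file.

Parameters:

path: (required) path of the TFRecords file;

options: a TFRecordOptions object (optional).

'''

write()

'''

Purpose: writes one serialized string record to the file.

Parameters:

record: string, the serialized record.

'''

close()

'''

Purpose: closes the TFRecordWriter.

'''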

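tf.train.Example

An Example is data that has been normalized into a fixed schema; it is written to a TFRecord file through TFRecordWriter.

An Example holds a key-value structure (much like a dict) in its features attribute, so a features argument, which is a tf.train.Features object, is passed when it is constructed.

__init__()

'''

Purpose: initializes an Example.

Parameters:

features: a tf.train.Features object. Each entry's key names the data, and its value holds the data wrapped in one of a fixed set of list types.

'''

SerializeToString()

'''

Purpose: serializes the Example to a string, which is then written to the TFRecord file through TFRecordWriter.

'''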

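tf.train.Features

A protocol-defined description of the data, structured as key-value pairs. Each key is a string that names the data, and each value is a tf.train.Feature object. A Feature holds a list of a single data type, and the list can contain any number of values.

There are three list types: BytesList, FloatList, and Int64List.

__init__()

'''

Purpose: initializes a Features object.

Parameters:

feature: a dict whose keys are data names and whose values are tf.train.Feature objects (the typed lists).

'''

Example of one entry in the feature dict (fragment):

"label": tf.train.Feature(int64_list=tf.train.Int64List(value=[train_labels_values[i]]))

tf.train.Feature

A Feature holds a list of a single data type, and the list can contain any number of values.

The list is one of three types: BytesList, FloatList, or Int64List.

__init__()

'''

Purpose: initializes a Feature that wraps a list of data.

Parameters:

bytes_list: use when the values are bytes (typically strings); pass a tf.train.BytesList;

float_list: use when the values are floats; pass a tf.train.FloatList;

int64_list: use when the values are int64; pass a tf.train.Int64List.

'''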

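tf.train.BytesList, tf.train.FloatList, tf.train.Int64List

The three list types, for bytes (strings), floats, and integers.

__init__()

'''

Purpose: initializes a list of the given type with the supplied data.

Parameters:

value: a Python list holding data of the matching type.

'''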
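Choosing among the three list types is mechanical, so it can help to see the rule written down. The sketch below does not use TensorFlow; it only returns the keyword name (bytes_list, float_list, or int64_list) under which a list of Python values would be passed to tf.train.Feature, along with the cleaned values. The helper name to_feature_spec is hypothetical.

```python
def to_feature_spec(values):
    """Map a non-empty list of Python values to (field_name, cleaned_values).

    field_name is the tf.train.Feature keyword the values would be passed under.
    """
    if not values:
        raise ValueError("cannot infer a list type from an empty list")
    if all(isinstance(v, (bytes, str)) for v in values):
        # BytesList holds bytes, so text is encoded first.
        cleaned = [v.encode("utf-8") if isinstance(v, str) else v for v in values]
        return "bytes_list", cleaned
    if all(isinstance(v, int) for v in values):  # bool counts as int here
        return "int64_list", [int(v) for v in values]
    if all(isinstance(v, (int, float)) for v in values):
        return "float_list", [float(v) for v in values]
    raise TypeError("mixed or unsupported value types: %r" % (values,))


print(to_feature_spec([7]))
print(to_feature_spec([0.5, 2]))
print(to_feature_spec(["cat", b"dog"]))
```

Note that a single scalar still has to be wrapped in a list, which is why the examples in this article write value=[x] rather than value=x.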

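Examples: generating TFRecord files

Reading a CSV file and converting it to TFRecord

This example uses Digit Recognizer, a Getting Started competition on Kaggle that provides the MNIST handwritten digits in CSV format. It converts train.csv to a TFRecord file.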

import numpy as np
import pandas as pd
import tensorflow as tf

# Load the CSV and split off the label column
train_frame = pd.read_csv(filepath_or_buffer="train.csv")
train_labels_frame = train_frame.pop(item="label")
train_values = train_frame.values
train_labels_values = train_labels_frame.values
train_size = train_values.shape[0]

# tf.python_io is the TensorFlow 1.x location of TFRecordWriter
writer = tf.python_io.TFRecordWriter(path="train.tfrecords")
for i in range(train_size):
    # Raw bytes of one image row, in the array's own dtype
    # (tobytes() replaces the deprecated tostring())
    image_raw = train_values[i].tobytes()
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[train_labels_values[i]]))
            }
        )
    )
    writer.write(record=example.SerializeToString())
writer.close()
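One detail of the script above is easy to miss: each row is serialized in the array's own dtype. pandas usually parses integer CSV columns as int64, in which case every pixel takes 8 bytes and the reader has to decode with the same type. The following sketch illustrates the effect with the standard array module and made-up pixel values; it stands in for NumPy and tf.decode_raw and is not the author's code.

```python
from array import array

pixels = [0, 128, 255] * 4  # 12 made-up pixel values

as_int64 = array("q", pixels).tobytes()  # 8 bytes per value
as_uint8 = array("B", pixels).tobytes()  # 1 byte per value
print(len(as_int64), len(as_uint8))

# Decoding with the matching type code recovers the original values.
restored = array("q")
restored.frombytes(as_int64)

# Decoding the same bytes as uint8 yields 8 times as many numbers.
misread = array("B")
misread.frombytes(as_int64)
print(list(restored) == pixels, len(misread))
```

If 1 byte per pixel is wanted, one option is to cast the array before serializing (for example with astype(np.uint8)) and then decode with tf.uint8 on the reading side.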

Reading images and converting them to TFRecord

import os
import tensorflow as tf
from PIL import Image  # PIL is used here to read the images

# DEPTH and OUTPUT_SIZE are module-level constants that the original
# article does not define; set them before calling convert_to().

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def convert_to(data_path, name):
    rows = 64
    cols = 64
    depth = DEPTH
    # Write 12 shard files with up to 16384 images each
    for ii in range(12):
        writer = tf.python_io.TFRecordWriter(name + str(ii) + '.tfrecords')
        for img_name in os.listdir(data_path)[ii*16384 : (ii+1)*16384]:
            img_path = os.path.join(data_path, img_name)
            img = Image.open(img_path)
            w, h = img.size[:2]  # PIL reports size as (width, height)
            # Centered OUTPUT_SIZE x OUTPUT_SIZE crop: (left, upper, right, lower)
            j, k = (w - OUTPUT_SIZE) // 2, (h - OUTPUT_SIZE) // 2
            box = (j, k, j + OUTPUT_SIZE, k + OUTPUT_SIZE)
            img = img.crop(box=box)
            img = img.resize((cols, rows))
            img_raw = img.tobytes()
            example = tf.train.Example(features=tf.train.Features(feature={
                'height': _int64_feature(rows),
                'width': _int64_feature(cols),
                'depth': _int64_feature(depth),
                'image_raw': _bytes_feature(img_raw)}))
            writer.write(example.SerializeToString())
        writer.close()
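The crop step in convert_to is worth isolating, because PIL reports Image.size as (width, height) and expects the crop box as (left, upper, right, lower). A minimal sketch of the box arithmetic, using integer division so the coordinates stay whole numbers; the helper name center_crop_box is hypothetical.

```python
def center_crop_box(width, height, size):
    """Return (left, upper, right, lower) for a centered size x size crop."""
    if size > width or size > height:
        raise ValueError("crop size exceeds the image")
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


print(center_crop_box(100, 80, 64))  # -> (18, 8, 82, 72)
```

The result can be passed directly as the box argument of Image.crop.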

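Reading files

TensorFlow has three main reader classes for data files; they share a common set of file-handling ops.

tf.TextLineReader reads CSV files and is used together with tf.decode_csv();

tf.FixedLengthRecordReader reads binary files made of fixed-length records and is used together with the tf.decode_raw() decoder;

tf.TFRecordReader reads TFRecord files and is used together with the tf.parse_single_example() parser and the tf.decode_raw() decoder.

All three classes inherit from tf.ReaderBase. Commonly used methods include:

read()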

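FixedLengthRecordReader

__init__()

'''

Purpose: creates a reader that returns a fixed number of bytes on each read.

Parameters:

record_bytes: (required) integer, the fixed record length in bytes;

header_bytes: integer, length of the file header to skip, default 0;

footer_bytes: integer, length of the file footer to skip, default 0;

'''

For a usage example, see the data-reading example at the end of the chapter.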
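The three parameters are easier to picture on a concrete byte string. The sketch below mimics the slicing in plain Python: skip the header, drop the footer, and cut what remains into records of record_bytes each. It illustrates the idea only and is not the TensorFlow implementation; the function name and the sample layout (one label byte followed by three pixel bytes) are made up.

```python
def fixed_length_records(blob, record_bytes, header_bytes=0, footer_bytes=0):
    """Split blob into equal-sized records, ignoring header and footer."""
    body = blob[header_bytes:len(blob) - footer_bytes]
    if len(body) % record_bytes:
        # This sketch rejects a partial trailing record.
        raise ValueError("body is not a whole number of records")
    return [body[i:i + record_bytes] for i in range(0, len(body), record_bytes)]


# 2-byte header, two 4-byte records, 1-byte footer
blob = b"HD" + bytes([3, 10, 11, 12]) + bytes([7, 20, 21, 22]) + b"F"
records = fixed_length_records(blob, record_bytes=4, header_bytes=2, footer_bytes=1)

for record in records:
    label, pixels = record[0], list(record[1:])
    print(label, pixels)
```

Splitting each record into a label and its pixels corresponds to what tf.decode_raw followed by slicing does on the TensorFlow side.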

TextLineReader

__init__()

'''

Purpose: creates a reader that returns one line on each read.

Parameters:

skip_header_lines: integer, number of header lines to skip, default 0;

'''

Usage:

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
# Each column is a scalar tensor, so stack them into one vector
features = tf.stack([col1, col2, col3, col4])

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col5])
    # Ask the queue-runner threads to stop, then wait for them
    coord.request_stop()
    coord.join(threads)

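Each execution of read reads one line from the file. The decode_csv() op parses that line into a list of tensors. The record_defaults argument sets the type of each resulting tensor and supplies the value to use when a column is empty.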
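The role of record_defaults can be sketched with the standard csv module: each default both fixes the column's type and fills in for an empty field. This imitates the behavior described above in plain Python and is not how tf.decode_csv is implemented; decode_csv_line is a hypothetical name.

```python
import csv


def decode_csv_line(line, record_defaults):
    """Parse one CSV line, using each default for type and for empty fields."""
    fields = next(csv.reader([line]))
    if len(fields) != len(record_defaults):
        raise ValueError("expected %d columns, got %d"
                         % (len(record_defaults), len(fields)))
    decoded = []
    for text, (default,) in zip(fields, record_defaults):
        # An empty field takes the default; otherwise cast to the default's type.
        decoded.append(default if text == "" else type(default)(text))
    return decoded


print(decode_csv_line("5,,0.25,x", [[1], [1], [0.0], [""]]))  # -> [5, 1, 0.25, 'x']
```

Here the second column is empty, so it takes the integer default 1, while the third column is parsed as a float because its default is 0.0.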

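TFRecordReader

__init__()

'''

Purpose: creates a reader that returns one record (a serialized Example) from a TFRecord file on each read.

Parameters:

options: TFRecordOptions;

'''

Usage example

import tensorflow as tf

def read_and_decode(filename):
    # Build a filename queue from the file name
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    # read() returns a key and one serialized Example
    _, serialized_example = reader.read(filename_queue)
    # The keys and types must match those used when the file was written
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'img_raw': tf.FixedLenFeature([], tf.string),
        })
    # decode_raw returns a flat uint8 vector; reshape it to height x width x channels
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    # Scale pixel values from [0, 255] to [-0.5, 0.5]
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features['label'], tf.int32)
    return img, label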
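The decode, reshape, and scale steps in read_and_decode can be checked by hand on a tiny image. The sketch below works on plain bytes instead of tensors: it verifies that the byte count matches height x width x channels (the condition the reshape relies on) and applies the same scaling, x * (1 / 255) - 0.5. The function name and the 2 x 2 x 3 sample are made up for illustration.

```python
def decode_and_scale(raw, height, width, channels):
    """Turn raw uint8 bytes into floats in [-0.5, 0.5], checking the size."""
    expected = height * width * channels
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    # Iterating over bytes yields integers 0..255, like a uint8 decode.
    return [value * (1.0 / 255) - 0.5 for value in raw]


raw = bytes([0, 255, 128] * 4)  # a made-up 2 x 2 RGB image
scaled = decode_and_scale(raw, height=2, width=2, channels=3)
print(min(scaled), max(scaled))
```

For the 224 x 224 x 3 images in the example above, the same check means each img_raw field must hold exactly 150528 bytes.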

Helper functions for reading file data
