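Summary of ImageNet Image Classification Training (Implemented with TensorFlow 2.0)

I recently read a paper published by AWS at the end of 2018, Bag of Tricks for Image Classification with Convolutional Neural Networks, in which Mu Li and his colleagues summarize a number of tricks that improve image classification accuracy. I tried out the tricks described in the paper: using TensorFlow 2.1, I built a Darknet53 model (the well-known backbone of YOLO V3) and trained it on ImageNet classification.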

Building the network model

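import tensorflow as tf
from tensorflow.keras import Model

l = tf.keras.layers

# Not used for classification; presumably kept for the later YOLO V3 head
# (3 anchors, each with 1 objectness score + 4 box coordinates + class scores)
category_num = 80
vector_size = 3*(1+4+category_num)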

def _conv(inputs, filters, kernel_size, strides, padding, bias=False, normalize=True, activation='relu'):
    # channels_first convolution with optional explicit zero padding,
    # batch normalization and activation
    output = inputs
    padding_str = 'same'
    if padding>0:
        # Pad explicitly, then convolve without implicit padding
        output = l.ZeroPadding2D(padding=padding, data_format='channels_first')(output)
        padding_str = 'valid'
    output = l.Conv2D(filters, kernel_size, strides, padding_str,
                      'channels_first', use_bias=bias,
                      kernel_initializer='he_normal',
                      kernel_regularizer=tf.keras.regularizers.l2(5e-4))(output)
    if normalize:
        output = l.BatchNormalization(axis=1)(output)
    if activation=='relu':
        output = l.ReLU()(output)
    if activation=='relu6':
        output = l.ReLU(max_value=6)(output)
    if activation=='leaky_relu':
        output = l.LeakyReLU(alpha=0.1)(output)
    # Any other value (e.g. 'linear') leaves the output without an activation
    return output

def _residual(inputs, out_channels, activation='relu', name=None):
    # 1x1 bottleneck followed by a 3x3 convolution, added back to the input
    output1 = _conv(inputs, out_channels//2, 1, 1, 0, False, True, 'leaky_relu')
    output2 = _conv(output1, out_channels, 3, 1, 1, False, True, 'leaky_relu')
    output = l.Add(name=name)([inputs, output2])
    return output

def darknet53_base():
    image = tf.keras.Input(shape=(3,None,None))
    net = _conv(image, 32, 3, 1, 1, False, True, 'leaky_relu') #32*H*W
    net = _conv(net, 64, 3, 2, 1, False, True, 'leaky_relu') #64*H/2*W/2
    net = _residual(net, 64, 'leaky_relu') #64*H/2*W/2
    net = _conv(net, 128, 3, 2, 1, False, True, 'leaky_relu') #128*H/4*W/4
    net = _residual(net, 128, 'leaky_relu') #128*H/4*W/4
    net = _conv(net, 256, 3, 2, 1, False, True, 'leaky_relu') #256*H/8*W/8
    net = _residual(net, 256, 'leaky_relu') #256*H/8*W/8
    route1 = l.Activation('linear', dtype='float32', name='route1')(net)
    net = _conv(net, 512, 3, 2, 1, False, True, 'leaky_relu') #512*H/16*W/16
    net = _residual(net, 512, 'leaky_relu') #512*H/16*W/16
    route2 = l.Activation('linear', dtype='float32', name='route2')(net)
    net = _conv(net, 1024, 3, 2, 1, False, True, 'leaky_relu') #1024*H/32*W/32
    net = _residual(net, 1024, 'leaky_relu') #1024*H/32*W/32
    route3 = l.Activation('linear', dtype='float32', name='route3')(net)
    # Global average pooling, then a 1x1 convolution acting as the classifier
    net = tf.reduce_mean(net, axis=[2,3], keepdims=True)
    net = _conv(net, 1000, 1, 1, 0, True, False, 'linear') #1000
    net = l.Flatten(data_format='channels_first', name='logits')(net)
    # Cast the outputs back to float32 when training with mixed precision
    net = l.Activation('linear', dtype='float32', name='output')(net)
    model = tf.keras.Model(inputs=image, outputs=[net, route1, route2, route3])
    return model

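In the code above, the Darknet53 model has 4 outputs. Three of them, route1, route2 and route3, are reserved for building the YOLO V3 network later and are not needed for image classification for now.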
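To make the shape comments in the model code concrete, here is a small sketch in plain Python (no TensorFlow needed) that applies the standard convolution output-size formula to the five stride-2 convolutions. The helper names `conv_out_size` and `darknet53_feature_sizes` are made up for this illustration and are not part of the model code.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def darknet53_feature_sizes(input_size):
    """Spatial size after each of the five stride-2 convolutions."""
    sizes = []
    size = conv_out_size(input_size, stride=1)  # first 3x3 conv keeps the size
    for _ in range(5):
        size = conv_out_size(size, stride=2)    # each stride-2 conv halves it
        sizes.append(size)
    return sizes


sizes = darknet53_feature_sizes(224)
# route1, route2, route3 are taken at H/8, H/16 and H/32 respectively
route_sizes = {"route1": sizes[2], "route2": sizes[3], "route3": sizes[4]}
print(sizes)        # [112, 56, 28, 14, 7]
print(route_sizes)  # {'route1': 28, 'route2': 14, 'route3': 7}
```

For a 224*224 input the last feature map is 7*7, which the global average pooling then reduces to 1*1 before the classifier.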

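Image preprocessing

The paper describes the following image preprocessing steps:

1. Randomly sample an image and decode it into 32-bit floating-point values in [0, 255].
2. Randomly crop a rectangular region whose aspect ratio is in [3/4, 4/3] and whose area is a random fraction in [8%, 100%] of the image area, then resize the crop to 224*224.
3. Randomly flip the image horizontally.
4. Randomly adjust the hue, saturation and brightness, with coefficients drawn at random from [0.6, 1.4].
5. Add PCA noise with a coefficient sampled from the normal distribution N(0, 0.1).
6. Normalize the RGB values: subtract 123.68, 116.779 and 103.939 from the R, G and B channels respectively, then divide by 58.393, 57.12 and 57.375.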
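Steps 2 and 5 are the least obvious ones, so here is a rough plain-Python sketch of the arithmetic behind them. It only computes the crop size and the per-channel PCA offset and does not touch any image data. The eigenvectors and eigenvalues are copied from the training script in this article; the function names `sample_crop_size` and `pca_noise` are invented for this illustration.

```python
import math
import random

# Constants copied from the training script in this article.
EIGVEC = [[-0.5675, 0.7192, 0.4009],
          [-0.5808, -0.0045, -0.8140],
          [-0.5836, -0.6948, 0.4203]]
EIGVAL = [55.46, 4.794, 1.148]


def sample_crop_size(height, width, rng, min_area=0.08,
                     min_aspect=0.75, max_aspect=4 / 3):
    """Step 2: draw a crop (h, w) with a random area fraction and aspect ratio h / w."""
    aspect = rng.uniform(min_aspect, max_aspect)
    area = rng.uniform(min_area, 1.0) * height * width
    crop_w = math.sqrt(area / aspect)
    crop_h = crop_w * aspect
    # Clamp to the image bounds so the crop always fits inside the image.
    return min(int(crop_h), height), min(int(crop_w), width)


def pca_noise(rng, stddev=0.1):
    """Step 5: RGB offset, offset[i] = sum_j EIGVEC[i][j] * alpha[j] * EIGVAL[j]."""
    alpha = [rng.gauss(0.0, stddev) for _ in range(3)]
    return [sum(EIGVEC[i][j] * alpha[j] * EIGVAL[j] for j in range(3))
            for i in range(3)]


rng = random.Random(0)
crop_h, crop_w = sample_crop_size(375, 500, rng)
offset = pca_noise(rng)
print(crop_h, crop_w, [round(v, 3) for v in offset])
```

The same offset is added to every pixel of the image, so the PCA noise shifts the overall color balance rather than adding per-pixel grain.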

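Model training

The paper mentions the following tricks:

1. Large-batch training: increase the learning rate linearly with the batch size. For example, if the learning rate for a batch size of 128 is 0.1, then the learning rate for a batch size of 256 is 0.2. My GPU can fit a batch size of at most 128 at FP16 precision, so this trick was of no use to me.
2. Learning rate warmup: at the very start of training, increase the learning rate gradually from 0. For example, with an initial learning rate of 0.1, the learning rate grows linearly from 0 to 0.1 over the first 1000 batches, which helps the model reach a stable learning state sooner.
3. Initialize γ to 0 in the last Batch Normalization layer of every residual block of a residual network, which helps the model in the initial phase of training.
4. Do not apply L2 regularization to the bias terms.
5. If the GPU supports mixed-precision computation, training becomes faster with no drop in network accuracy.
6. Cosine learning rate decay. In practice I still used step decay, because it is more controllable and shortens training: with cosine decay the learning rate changes too slowly, whereas step decay lets me adjust the learning rate flexibly according to the loss and finish training sooner. Of course, if the GPU is powerful enough, cosine decay is the most convenient, hands-off option.
7. Label smoothing of the training targets. For example, ImageNet has 1000 image classes; for any given image, the target of its true class is set to 0.9 instead of 1, and each of the other 999 classes is set to 0.1/999.
8. Knowledge distillation: use a more complex and more accurate teacher network to improve the current network, for example a ResNet-152 network helping a ResNet-50 network. I did not use this trick here.
9. Mixup training: each time, linearly combine 2 sampled images and combine their labels in the same way. This approach needs more training epochs. I did not use this trick here either, although it should be quite useful for object detection, where it can make the model more robust.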
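Tricks 7 and 9 both change the training targets, and a small plain-Python sketch may make them clearer. This is only an illustration on flat Python lists, not the code used for training. The Beta(0.2, 0.2) distribution for the mixing weight is an assumption that does not come from this article, and `smooth_label` and `mixup` are hypothetical helper names.

```python
import random


def smooth_label(label, num_classes=1000, epsilon=0.1):
    """Target is 1 - epsilon for the true class and epsilon / (K - 1) elsewhere."""
    off_value = epsilon / (num_classes - 1)
    target = [off_value] * num_classes
    target[label] = 1.0 - epsilon
    return target


def mixup(x1, y1, x2, y2, lam):
    """Blend two samples and their targets with the same weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


rng = random.Random(0)
lam = rng.betavariate(0.2, 0.2)  # assumed mixing distribution, not from the article

# Two tiny 2-pixel "images" with smoothed 1000-class targets.
image_a, target_a = [0.0, 100.0], smooth_label(3)
image_b, target_b = [200.0, 100.0], smooth_label(7)
mixed_image, mixed_target = mixup(image_a, target_a, image_b, target_b, lam)
print(round(target_a[3], 4), round(sum(mixed_target), 6))
```

Because both targets sum to 1 and the weights `lam` and `1 - lam` sum to 1, the mixed target is still a valid probability distribution.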

Code

The complete training code is as follows:

import tensorflow as tf
import tensorflow_addons as tfa
import math
import os
import random
import time
import numpy as np
from darknet53_model import darknet53_base
# Experimental mixed-precision API as shipped with TF 2.1
from tensorflow.keras.mixed_precision import experimental as mixed_precision

l = tf.keras.layers

# Compute in float16 while keeping the variables in float32
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

imageWidth = 224
imageHeight = 224
imageDepth = 3
batch_size = 128
resize_min = 256
train_images = 1280000
batches_per_epoch = train_images//batch_size
train_epochs = 80
total_steps = batches_per_epoch*train_epochs
random_min_aspect = 0.75
random_max_aspect = 1/0.75
random_min_area = 0.08
random_angle = 7.
initial_warmup_steps = 1000
initial_lr = 0.02

# Eigenvectors and eigenvalues used for the PCA noise
eigvec = tf.constant([[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]], shape=[3,3], dtype=tf.float32)
eigval = tf.constant([55.46, 4.794, 1.148], shape=[3,1], dtype=tf.float32)
# Per-channel mean and standard deviation (R, G, B)
mean_RGB = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32)
std_RGB = tf.constant([58.393, 57.12, 57.375], dtype=tf.float32)

train_files_names = os.listdir('../train_tf/')
train_files = ['../train_tf/'+item for item in train_files_names]
valid_files_names = os.listdir('../valid_tf/')
valid_files = ['../valid_tf/'+item for item in valid_files_names]

# Parse TFRECORD and distort the image for train
def _parse_function(example_proto):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "height": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "width": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "channels": tf.io.FixedLenFeature([1], tf.int64, default_value=[3]),
        "colorspace": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "img_format": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "label": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "bbox_xmin": tf.io.VarLenFeature(tf.float32),
        "bbox_xmax": tf.io.VarLenFeature(tf.float32),
        "bbox_ymin": tf.io.VarLenFeature(tf.float32),
        "bbox_ymax": tf.io.VarLenFeature(tf.float32),
        "text": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "filename": tf.io.FixedLenFeature([], tf.string, default_value="")
    }
    parsed_features = tf.io.parse_single_example(example_proto, features)
    image_decoded = tf.image.decode_jpeg(parsed_features["image"], channels=3)
    image_decoded = tf.cast(image_decoded, dtype=tf.float32)
    # Random crop the image
    shape = tf.shape(image_decoded)
    height, width = shape[0], shape[1]
    random_aspect = tf.random.uniform(shape=[], minval=random_min_aspect, maxval=random_max_aspect)
    random_area = tf.random.uniform(shape=[], minval=random_min_area, maxval=1.0)
    # crop_width = sqrt(height * width * random_area / random_aspect)
    crop_width = tf.math.sqrt(
        tf.divide(
            tf.multiply(
                tf.cast(tf.multiply(height,width), tf.float32),
                random_area),
            random_aspect)
    )
    # Clamp the crop to the image size
    crop_height = tf.cast(crop_width * random_aspect, tf.int32)
    crop_height = tf.cond(crop_height<height, lambda:crop_height, lambda:height)
    crop_width = tf.cast(crop_width, tf.int32)
    crop_width = tf.cond(crop_width<width, lambda:crop_width, lambda:width)
    cropped = tf.image.random_crop(image_decoded, [crop_height, crop_width, 3])
    resized = tf.image.resize(cropped, [imageHeight, imageWidth])
    # Flip to add a little more random distortion in.
    flipped = tf.image.random_flip_left_right(resized)
    # Random rotate the image (angle in radians)
    angle = tf.random.uniform(shape=[], minval=-random_angle, maxval=random_angle)*np.pi/180
    rotated = tfa.image.rotate(flipped, angle)
    # Random distort the image
    distorted = tf.image.random_hue(rotated, max_delta=0.3)
    distorted = tf.image.random_saturation(distorted, lower=0.6, upper=1.4)
    distorted = tf.image.random_brightness(distorted, max_delta=0.3)
    # Add PCA noise
    alpha = tf.random.normal([3], mean=0.0, stddev=0.1)
    pca_noise = tf.reshape(tf.matmul(tf.multiply(eigvec,alpha), eigval), [3])
    distorted = tf.add(distorted, pca_noise)
    # Normalize RGB
    distorted = tf.subtract(distorted, mean_RGB)
    distorted = tf.divide(distorted, std_RGB)
    # HWC -> CHW, because the model uses channels_first
    image_train = tf.transpose(distorted, perm=[2, 0, 1])
    features = {'input_1': image_train}
    labels = tf.one_hot(parsed_features["label"][0], depth=1000)
    return features, labels

def train_input_fn():
    dataset_train = tf.data.TFRecordDataset(train_files)
    dataset_train = dataset_train.map(_parse_function, num_parallel_calls=4)
    dataset_train = dataset_train.shuffle(
        buffer_size=12800,
        reshuffle_each_iteration=True
    )
    dataset_train = dataset_train.repeat(10)
    dataset_train = dataset_train.batch(batch_size)
    dataset_train = dataset_train.prefetch(batch_size)
    return dataset_train

def _parse_test_function(example_proto):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "height": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "width": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "channels": tf.io.FixedLenFeature([1], tf.int64, default_value=[3]),
        "colorspace": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "img_format": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "label": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "bbox_xmin": tf.io.VarLenFeature(tf.float32),
        "bbox_xmax": tf.io.VarLenFeature(tf.float32),
        "bbox_ymin": tf.io.VarLenFeature(tf.float32),
        "bbox_ymax": tf.io.VarLenFeature(tf.float32),
        "text": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "filename": tf.io.FixedLenFeature([], tf.string, default_value="")
    }
    parsed_features = tf.io.parse_single_example(example_proto, features)
    image_decoded = tf.image.decode_jpeg(parsed_features["image"], channels=3)
    image_decoded = tf.cast(image_decoded, dtype=tf.float32)
    # Resize so that the shorter side equals resize_min, keeping the aspect ratio
    shape = tf.shape(image_decoded)
    height, width = shape[0], shape[1]
    resized_height, resized_width = tf.cond(height<width,
        lambda: (resize_min, tf.cast(tf.multiply(tf.cast(width, tf.float64),tf.divide(resize_min,height)), tf.int32)),
        lambda: (tf.cast(tf.multiply(tf.cast(height, tf.float64),tf.divide(resize_min,width)), tf.int32), resize_min))
    image_resized = tf.image.resize(image_decoded, [resized_height, resized_width])
    # calculate how many to be center crop
    shape = tf.shape(image_resized)
    height, width = shape[0], shape[1]
    amount_to_be_cropped_h = (height - imageHeight)
    crop_top = amount_to_be_cropped_h // 2
    amount_to_be_cropped_w = (width - imageWidth)
    crop_left = amount_to_be_cropped_w // 2
    image_cropped = tf.slice(image_resized, [crop_top, crop_left, 0], [imageHeight, imageWidth, -1])
    # Normalize RGB
    image_valid = tf.subtract(image_cropped, mean_RGB)
    image_valid = tf.divide(image_valid, std_RGB)
    # HWC -> CHW, because the model uses channels_first
    image_valid = tf.transpose(image_valid, perm=[2, 0, 1])
    features = {'input_1': image_valid}
    labels = tf.one_hot(parsed_features["label"][0], depth=1000)
    return features, labels

def val_input_fn():
    dataset_valid = tf.data.TFRecordDataset(valid_files)
    dataset_valid = dataset_valid.map(_parse_test_function, num_parallel_calls=4)
    dataset_valid = dataset_valid.batch(batch_size)
    dataset_valid = dataset_valid.prefetch(batch_size)
    return dataset_valid

# Step decay: values[i] applies until boundaries[i], then values[i+1] takes over
boundaries = [30000, 60000, 90000, 120000, 150000, 170000, 190000, 220000, 260000, 275000]
values = [0.02, 0.01, 0.005, 0.002, 0.001, 0.0005, 0.00025, 0.0001, 0.00005, 0.000025, 0.00001]
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(boundaries, values)

class LRCallback(tf.keras.callbacks.Callback):
    def __init__(self, starttime):
        super(LRCallback, self).__init__()
        self.epoch_starttime = starttime
        self.batch_starttime = starttime

    def on_train_batch_end(self, batch, logs):
        step = tf.keras.backend.get_value(self.model.optimizer.iterations)
        lr = tf.keras.backend.get_value(self.model.optimizer.lr)
        # Initial warmup phase, linearly increase the learning rate
        if step < initial_warmup_steps:
            newlr = (initial_lr/initial_warmup_steps)*step
            tf.keras.backend.set_value(self.model.optimizer.lr, newlr)
        # Calculate the lr based on cosine decay, not used here
        # else:
        #     newlr = (1.0+math.cos(step/total_steps*math.pi))*initial_lr/2
        #     tf.keras.backend.set_value(self.model.optimizer.lr, newlr)
        if step%100==0:
            elasp_time = time.time()-self.batch_starttime
            self.batch_starttime = time.time()
            #Step decay learning rate
            if step >= initial_warmup_steps:
                tf.keras.backend.set_value(self.model.optimizer.lr, learning_rate_fn(step))
            print("Steps:{}, LR:{:6.4f}, Loss:{:4.2f}, Time:{:4.1f}s"\
                .format(step, lr, logs['loss'], elasp_time))
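To see what learning rate the callback above aims for at a given step, the schedule can be sketched in plain Python without TensorFlow. This is an idealized per-step version under my own assumptions: the real callback only refreshes the step-decay value every 100 steps, and the function names here (`scaled_lr`, `step_decay_lr`, `cosine_lr`) are invented for the illustration. The boundaries and values are copied from the script.

```python
import bisect
import math

INITIAL_LR = 0.02
WARMUP_STEPS = 1000
BOUNDARIES = [30000, 60000, 90000, 120000, 150000,
              170000, 190000, 220000, 260000, 275000]
VALUES = [0.02, 0.01, 0.005, 0.002, 0.001, 0.0005,
          0.00025, 0.0001, 0.00005, 0.000025, 0.00001]


def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: the learning rate grows in proportion to the batch size."""
    return base_lr * batch / base_batch


def step_decay_lr(step):
    """Linear warmup from 0, then piecewise-constant (step) decay."""
    if step < WARMUP_STEPS:
        return INITIAL_LR * step / WARMUP_STEPS
    # VALUES[i] applies while step <= BOUNDARIES[i]
    return VALUES[bisect.bisect_left(BOUNDARIES, step)]


def cosine_lr(step, total_steps, initial_lr=INITIAL_LR):
    """Cosine decay from initial_lr at step 0 down to 0 at total_steps."""
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps)) * initial_lr


for s in (0, 500, 1000, 30001, 300000):
    print(s, step_decay_lr(s))
```

Comparing `step_decay_lr` with `cosine_lr` at the same step shows why cosine decay feels slow here: halfway through training it is still at half of the initial learning rate, while the step schedule has already dropped by several orders of magnitude.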
