Batch normalization: tf.keras.layers.BatchNormalization parameters explained and applied


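Notes:

1. I am using TensorFlow 1.14.0. Some parameters mean different things in 1.* and 2.*, and this needs attention. Take the "trainable=True/False" argument in the code below: the same setting behaves differently in 1.* and in 2.*. (My advice: escape from TensorFlow while you can.)

2. The trainable parameter is a constructor argument of the batch normalization layer class.

The training parameter is an argument of the layer's call() method.

3. The trainable parameter:

As mentioned above, the same trainable setting on a batch normalization layer means different things in TensorFlow 2.0 and in TensorFlow 1.*. The second code box below explains this.

To find out which meaning applies to you, read the docstring in the source of your installed version. I am on tf 1.14.0, so I looked at the batch normalization layer's docstring there.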
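To make the distinction in note 2 concrete, here is a minimal pure-Python sketch. It is not TensorFlow code: `ToyBatchNorm` and the dictionary it returns are hypothetical names invented for illustration. It only shows where each flag lives: `trainable` is stored on the object at construction time, while `training` is supplied on every call.

```python
class ToyBatchNorm:
    """Hypothetical stand-in for a batch-norm layer (illustration only)."""

    def __init__(self, trainable=True):
        # `trainable` is a property of the layer object itself.
        self.trainable = trainable

    def __call__(self, inputs, training=None):
        # `training` is chosen per call and may differ between calls.
        # None is treated as False here for brevity; the real layer
        # consults the Keras learning phase in that case.
        mode = "batch statistics" if training else "moving statistics"
        return {"trainable": self.trainable, "normalizes_with": mode}


layer = ToyBatchNorm(trainable=True)
train_info = layer([1.0, 2.0], training=True)
test_info = layer([1.0, 2.0], training=False)
print(train_info)
print(test_info)
```

The same object answers differently to the two calls, while `layer.trainable` stays whatever the constructor was given.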

Parameter overview

The base class is defined as follows:

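class BatchNormalizationBase(Layer):

    def __init__(self,
                 axis=-1,  # the channel axis of [NHWC] data; use axis=1 when the data shape is [NCHW]
                 momentum=0.99,  # decay used for the moving averages of the mean and variance (the "beta" of the moving-average formula; do not confuse it with the beta below)
                 epsilon=1e-3,
                 center=True,  # bool: whether to use the beta parameter of batch normalization (whether to shift)
                 scale=True,  # bool: whether to use the gamma parameter of batch normalization (whether to scale)
                 beta_initializer='zeros',  # beta, the shift parameter, is initialized to 0
                 gamma_initializer='ones',  # gamma, the scale parameter, is initialized to 1
                 moving_mean_initializer='zeros',  # the moving mean starts at 0
                 moving_variance_initializer='ones',  # the moving variance starts at 1
                 # so the initial mean and variance are those of the standard normal distribution
                 beta_regularizer=None,  # regularizer for beta, rarely used
                 gamma_regularizer=None,  # regularizer for gamma, rarely used
                 beta_constraint=None,  # constraint on beta, rarely used
                 gamma_constraint=None,  # constraint on gamma, rarely used
                 renorm=False,
                 renorm_clipping=None,
                 renorm_momentum=0.99,
                 fused=None,
                 trainable=True,  # defaults to True; I would leave it alone and save the trouble.
                 # It marks the parameters of the normalization formula as trainable
                 # (GraphKeys.TRAINABLE_VARIABLES): γ and β are learned parameters and
                 # can only be updated if they are in that collection.
                 # What tf.keras.layers.BatchNormalization does not do is register the
                 # moving-average updates in GraphKeys.UPDATE_OPS, so that step has to
                 # be done by hand (shown later).
                 virtual_batch_size=None,
                 adjustment=None,
                 name=None,
                 **kwargs):
        # ...
        # Only the parameters are described here; the implementation is omitted.
        # ...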

    def _get_training_value(self, training=None):
        # Shows how each value of `training` is handled: the incoming
        # argument is converted to a bool and returned.
        # The case to focus on here is training=None.
        if training is None:
            training = K.learning_phase()  # K is keras.backend; learning_phase() returns the current phase flag (train or test) for Keras to use
        if self._USE_V2_BEHAVIOR:
            if isinstance(training, int):
                training = bool(training)
            if base_layer_utils.is_in_keras_graph():
                training = math_ops.logical_and(training, self._get_trainable_var())
            else:
                training = math_ops.logical_and(training, self.trainable)
        return training

    def call(self, inputs,  # the input data, shape [NHWC] by default; for other layouts change `axis` above
             training=None  # three choices: True, False, None; tells the layer whether the network is training or testing.
             # `training=True`: training phase. The layer will normalize its inputs
             # using the mean and variance of the current batch of inputs.
             # `training=False`: test or inference phase. The layer will normalize its inputs using
             # the mean and variance of its moving statistics, learned during training.
             # In short: training=True normalizes with the current batch's mean and variance;
             # training=False normalizes with the moving mean and moving variance.
             ):
        training = self._get_training_value(training)
        # ...
        # Only the parameters are described here; the implementation is omitted.
        # ...
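The roles of `momentum`, `epsilon`, gamma, beta and the two values of `training` can be sketched in plain Python for a single feature column. This is an illustration under stated assumptions, not the TensorFlow implementation: `batch_norm_1d` and the `state` dictionary are hypothetical names, the population (biased) variance is assumed, and the moving averages are updated in the textbook form `moving = momentum * moving + (1 - momentum) * batch_value`.

```python
import math


def batch_norm_1d(batch, state, training, momentum=0.99, epsilon=1e-3):
    """Normalize one feature column; `state` holds gamma, beta and moving stats."""
    if training:
        # Training: normalize with the statistics of the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        # The moving averages drift slowly toward the batch statistics.
        state["moving_mean"] = momentum * state["moving_mean"] + (1 - momentum) * mean
        state["moving_var"] = momentum * state["moving_var"] + (1 - momentum) * var
    else:
        # Inference: normalize with the stored moving statistics.
        mean, var = state["moving_mean"], state["moving_var"]
    scale = state["gamma"] / math.sqrt(var + epsilon)
    return [scale * (x - mean) + state["beta"] for x in batch]


# Initial values mirror the initializer defaults listed in the signature:
# gamma=1, beta=0, moving mean=0, moving variance=1.
state = {"gamma": 1.0, "beta": 0.0, "moving_mean": 0.0, "moving_var": 1.0}
batch = [1.0, 2.0, 3.0, 4.0]

train_out = batch_norm_1d(batch, state, training=True)
test_out = batch_norm_1d(batch, state, training=False)
print(train_out)
print(state["moving_mean"], state["moving_var"])
print(test_out)
```

With `momentum=0.99` a single training step moves the stored statistics only 1% of the way toward the batch values, which is why the inference-mode output right after one step is still close to the unnormalized input.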

Regarding the trainable setting, here is what the Keras docstring says:

"""

class BatchNormalization(normalization.BatchNormalizationBase):

__doc__ = normalization.replace_in_base_docstring([

('{{TRAINABLE_ATTRIBUTE_NOTE}}',

'''

**About setting `layer.trainable = False` on a `BatchNormalization` layer:**

The meaning of setting `layer.trainable = False` is to freeze the layer,

i.e. its internal state will not change during training:

its trainable weights will not be updated

during `fit()` or `train_on_batch()`, and its state updates will not be run.

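In other words: for an ordinary layer, setting layer.trainable = False freezes the layer's parameters so that its internal state does not change during training; its trainable weights are not updated during `fit()` or `train_on_batch()`.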

Usually, this does not necessarily mean that the layer is run in inference

mode (which is normally controlled by the `training` argument that can

be passed when calling a layer). "Frozen state" and "inference mode"

are two separate concepts.

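In other words: setting layer.trainable = False does not by itself put the layer in inference (test) mode. Whether a layer runs in inference mode is normally controlled by the training argument passed to its call function.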

However, in the case of the `BatchNormalization` layer, **setting

`trainable = False` on the layer means that the layer will be

subsequently run in inference mode** (meaning that it will use

the moving mean and the moving variance to normalize the current batch,

rather than using the mean and variance of the current batch).

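In other words: for BatchNormalization, setting trainable = False means the layer runs in "inference mode".

So if trainable = False is set on a batch normalization layer during training, the current batch is normalized with the moving mean and moving variance rather than with its own mean and variance.

--> My understanding: what we want from batch normalization is to normalize each mini-batch with its own mean and variance during training, while maintaining a moving mean and moving variance for use at test time.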

This behavior has been introduced in TensorFlow 2.0, in order

to enable `layer.trainable = False` to produce the most commonly

expected behavior in the convnet fine-tuning use case.

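In other words: this behavior was introduced in TensorFlow 2.0 so that `layer.trainable = False` does what is most commonly expected when fine-tuning a network.

--> My understanding: when fine-tuning, we want to freeze the parameters of some layers and train only a few others. For a batch normalization layer, we want it to keep using the already-trained…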

Note that:

- This behavior only occurs as of TensorFlow 2.0. In 1.*,

setting `layer.trainable = False` would freeze the layer but would

not switch it to inference mode.

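Note: this behavior only occurs as of TensorFlow 2.0. In 1.*, setting `layer.trainable = False` on a batch normalization layer

still only freezes the layer's gamma and beta, and the layer still normalizes with the current batch's mean and variance.

--> My understanding: in 1.*, setting `layer.trainable = False` on a batch normalization layer results in the following:

1) the layer's gamma and beta are not trained;

2) normalization uses the current batch's mean and variance, not the moving mean and moving variance;

3) whether the moving mean and moving variance are still updated remains to be confirmed.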

- Setting `trainable` on an model containing other layers will

recursively set the `trainable` value of all inner layers.

When trainable is set on a whole model, the same value is applied recursively to every layer inside it.

- If the value of the `trainable`

attribute is changed after calling `compile()` on a model,

the new value doesn't take effect for this model

until `compile()` is called again.

If trainable is changed after `compile()` has been called on a model, the new value has no effect on that model until `compile()` is called again.

''')])

"""
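The difference the docstring describes can be condensed into a small truth table. The sketch below is plain Python rather than TensorFlow: `bn_uses_batch_stats` is a hypothetical helper that mirrors the `logical_and(training, trainable)` step of `_get_training_value` when the V2 behavior is active, and simply follows `training` otherwise.

```python
def bn_uses_batch_stats(training, trainable, v2_behavior):
    """Return True if the layer would normalize with current-batch statistics."""
    if v2_behavior:
        # TF 2.x: a frozen layer (trainable=False) is forced into inference mode.
        return bool(training) and bool(trainable)
    # TF 1.x: `trainable` only freezes gamma/beta; the mode follows `training`.
    return bool(training)


rows = []
for v2 in (False, True):
    for trainable in (True, False):
        uses_batch = bn_uses_batch_stats(training=True, trainable=trainable, v2_behavior=v2)
        rows.append(("2.x" if v2 else "1.x", trainable, uses_batch))
        print("TF %s  trainable=%-5s  training=True -> %s statistics"
              % (rows[-1][0], trainable, "batch" if uses_batch else "moving"))
```

Only the last row differs: with the 2.x behavior, `trainable=False` switches the layer to the moving statistics even though `training=True` was passed.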

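Calling the layer

In summary, when calling tf.keras.layers.BatchNormalization we hardly need to set any parameter; passing the input data is enough.

However:

1. tf.keras.layers.BatchNormalization has what I consider a bug: whether "trainable=True" or "trainable=False", it does not add the batch normalization update ops (bn_update_ops) to tf.GraphKeys.UPDATE_OPS, so they have to be added manually.

Example: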

import tensorflow as tf

inputs = tf.ones([1, 2, 2, 3])
output = tf.keras.layers.BatchNormalization(trainable=None)(inputs, training=True)
# output = tf.keras.layers.BatchNormalization(trainable=True)(inputs, training=True)
# output = tf.keras.layers.BatchNormalization(trainable=False)(inputs, training=True)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
print(update_ops)

"""
All three cases above print []
"""

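As the output shows, Keras does not add the batch normalization update ops to tf.GraphKeys.UPDATE_OPS, so they cannot be retrieved directly through tf.GraphKeys.UPDATE_OPS.

The BN update ops therefore have to be added to tf.GraphKeys.UPDATE_OPS by hand, as follows: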

import tensorflow as tf

inputs = tf.ones([1, 2, 2, 3])
output = tf.keras.layers.BatchNormalization()(inputs, training=True)

# Add the update ops manually: pick the ops that write the moving averages.
ops = tf.get_default_graph().get_operations()
bn_update_ops = [x for x in ops
                 if "AssignMovingAvg" in x.name and x.type == "AssignSubVariableOp"]
# add_to_collection stores its argument as a single entry, so add the ops one by one.
for op in bn_update_ops:
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, op)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
print(update_ops)
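The selection step can be tried without building a TensorFlow graph. In the sketch below, `Op` is a hypothetical stand-in for `tf.Operation` that carries only the `name` and `type` fields the filter reads; the first two op names are made up for contrast, the last two are the ones this article prints further down. It also shows why the ops are best added one at a time: a collection stores whatever single value it is handed, so passing a whole list produces one nested entry.

```python
from collections import namedtuple

# Hypothetical stand-in for tf.Operation: only the two fields the filter reads.
Op = namedtuple("Op", ["name", "type"])

ops = [
    Op("batch_normalization/moments/mean", "Mean"),
    Op("batch_normalization/AssignMovingAvg/sub", "Sub"),
    Op("batch_normalization/AssignMovingAvg/AssignSubVariableOp", "AssignSubVariableOp"),
    Op("batch_normalization/AssignMovingAvg_1/AssignSubVariableOp", "AssignSubVariableOp"),
]

# The name marks a moving-average update, the type marks the op that
# actually writes the variable; both conditions must hold.
bn_update_ops = [op for op in ops
                 if "AssignMovingAvg" in op.name and op.type == "AssignSubVariableOp"]

collection_nested = []
collection_nested.append(bn_update_ops)   # one entry: the list itself

collection_flat = []
for op in bn_update_ops:                  # one entry per op
    collection_flat.append(op)

print(len(collection_nested), len(collection_flat))
```

A flat collection is what later code usually expects when it iterates over the update ops.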

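2. An important parameter of batch normalization is "training". Some other batch normalization functions require it to be set by hand. In Keras it can be set by hand, or in another way: through tf.keras.backend.set_learning_phase().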

from tensorflow.keras import backend as K

# Set the Keras learning phase to simulate training or testing
K.set_learning_phase(1)  # 1 = training, 0 = testing
is_training = K.learning_phase()
print(is_training)

"""
Prints: 1
"""

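Once the learning phase (a global value) has been set with tf.keras.backend.set_learning_phase(), tf.keras.layers.BatchNormalization picks it up and handles training=None automatically, resolving it to True or False. This is visible in the source of tf.keras.layers.BatchNormalization (see _get_training_value above), so I will not repeat it here.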
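The fallback from `training=None` to the global learning phase can be mimicked in a few lines of plain Python. Everything here is a hypothetical stand-in: `_BACKEND`, `set_learning_phase` and `resolve_training` only imitate the first step of `_get_training_value` and are not the Keras backend itself.

```python
_BACKEND = {"learning_phase": 0}  # hypothetical global flag; 0 = test, 1 = train


def set_learning_phase(value):
    """Toy counterpart of K.set_learning_phase()."""
    if value not in (0, 1):
        raise ValueError("Expected learning phase to be 0 or 1.")
    _BACKEND["learning_phase"] = value


def resolve_training(training=None):
    """Fall back to the global phase only when no explicit value is given."""
    if training is None:
        training = _BACKEND["learning_phase"]
    return bool(training)


set_learning_phase(1)
from_global_train = resolve_training(None)    # follows the global flag
explicit_override = resolve_training(False)   # an explicit argument wins
set_learning_phase(0)
from_global_test = resolve_training(None)
print(from_global_train, explicit_override, from_global_test)
```

An explicit `training=True` or `training=False` always takes precedence; the global flag matters only for `training=None`.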

Example:

import tensorflow as tf
from tensorflow.keras import backend as K

# Set the Keras learning phase to simulate training or testing
K.set_learning_phase(0)  # 1 = training, 0 = testing
is_training = K.learning_phase()
print("is_training =", is_training)

inputs = tf.ones([1, 2, 2, 3])
output = tf.keras.layers.BatchNormalization()(inputs, training=None)

# Add the update ops manually
ops = tf.get_default_graph().get_operations()
bn_update_ops = [x for x in ops
                 if "AssignMovingAvg" in x.name and x.type == "AssignSubVariableOp"]
# add_to_collection stores its argument as a single entry, so add the ops one by one.
for op in bn_update_ops:
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, op)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
print("update_ops=", update_ops)

"""
With K.set_learning_phase(0), this prints:
is_training = 0
update_ops= []

With K.set_learning_phase(1), this prints:
is_training = 1
update_ops= [
<tf.Operation 'batch_normalization/AssignMovingAvg/AssignSubVariableOp' type=AssignSubVariableOp>,
<tf.Operation 'batch_normalization/AssignMovingAvg_1/AssignSubVariableOp' type=AssignSubVariableOp>
]
"""

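A possible problem when training=None: how tf.keras.backend.learning_phase() behaves

The learning phase must be set with tf.keras.backend.set_learning_phase() before any node (batch normalization layers in particular) is defined.

tf.keras.backend.learning_phase() returns a global value, which is a TensorFlow bool tensor. It should be set before use. If it is not set beforehand, it evaluates to False by default, as shown below: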

import tensorflow as tf
from tensorflow.keras import backend as K

is_training = K.learning_phase()
print(is_training)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(is_training))

"""
Prints:
Tensor("keras_learning_phase:0", shape=(), dtype=bool)
False
"""

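One more remark: unless you set the training or test state in advance with tensorflow.keras.backend.set_learning_phase(), neither Keras nor TensorFlow will work out from the rest of your code whether you are training or testing. It is not that smart, and there is hardly any obvious behavioral cue that would indicate it. So if you plan to use tf.keras.layers.BatchNormalization with training=None, be sure to set the program's running state beforehand with tf.keras.backend.set_learning_phase().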
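Why the order matters can be illustrated with a toy model of graph construction. This is a loose analogy under stated assumptions, not TensorFlow code: `_PHASE` and `define_bn_node` are hypothetical, and the sketch assumes the mode is decided once, at the moment the node is defined, as the advice above implies.

```python
_PHASE = {"learning_phase": 0}  # hypothetical global flag; 0 = test, 1 = train


def define_bn_node(training=None):
    """Toy 'graph construction': the mode is fixed when the node is defined."""
    if training is None:
        training = _PHASE["learning_phase"]
    use_batch_stats = bool(training)
    # The returned closure plays the role of the already-built graph node.
    return lambda: "batch" if use_batch_stats else "moving"


# Wrong order: the node is defined first, the phase is set afterwards.
_PHASE["learning_phase"] = 0
late_node = define_bn_node()
_PHASE["learning_phase"] = 1
late_result = late_node()

# Right order: the phase is already set when the node is defined.
early_node = define_bn_node()
early_result = early_node()
print(late_result, early_result)
```

The node defined before the phase change keeps using the moving statistics, because the flag was read only once, at definition time.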
