Generative Adversarial Nets: Paper Notes + Code Walkthrough
Paper link:
Source code link:
Generative Adversarial Networks
The paper's title is Generative Adversarial Networks, which is simple and to the point.
First, Generative. Machine learning has two broad families of models: generative models and discriminative models. A generative model learns the joint probability distribution and is mainly used to produce samples that follow the same distribution as the training data, while a discriminative model learns the conditional probability distribution and is mainly used to classify samples; the detailed differences are not repeated here. Because of their many practical advantages, discriminative models have dominated past research and applications, especially in object recognition and classification, whereas generative models have received far less study and development: they tend to be computationally heavy and complicated, so they also see far fewer real-world applications. This paper studies generative models, and as soon as it appeared it caused a stir in the machine learning community; deep learning pioneer Yann LeCun even called generative adversarial networks the deep learning development that excited him most. That shows how significant this work is, and it also shows that the generative model described here differs greatly from traditional generative models. So where does the difference lie?
Next, Adversarial. This is where the paper departs from traditional generative models. Adversarial against what? Against the discriminative model, of course. And how exactly? That is the paper's main contribution. In the original text the authors write: "The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency." In other words, the generative model is like a counterfeiter making fake money and the discriminative model is like the police trying to spot it; the generator keeps improving its craft so that its fakes go undetected, while the discriminator keeps improving its ability to tell them apart. This is essentially a two-player game, and the end result is that the two players reach an equilibrium. That is the "adversarial" part.
So how is this adversarial game implemented in a machine?
In the paper, the real data x has distribution p_data (the illustrative example, and the code below, use a one-dimensional Gaussian). The generator G is a multilayer perceptron: it takes a sample z drawn from a random noise distribution as input and outputs G(z), whose distribution is p_g. The discriminator D is also a multilayer perceptron, and its output D(x) is the probability that its input x came from the real data rather than from the generator. Back to the game: G wants p_g to be so close to p_data that D cannot tell the generated samples from real ones, i.e. it wants D(G(z)) to be large; D wants to classify correctly, i.e. it wants D(x) to be large and D(G(z)) to be small, which is to say it wants log D(x) + log(1 - D(G(z))) to be large.
This gives the model's objective functions: the generator G is trained by gradient descent on its objective, while the discriminator D is trained by gradient ascent on its objective.
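Written out (these are the standard objectives from the paper, restated here in LaTeX):

\min_G \; \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

\max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

Together they form the minimax game

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]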
The model then reduces to optimizing these two objectives, and plain backpropagation is enough to train it; there is no maximum-likelihood computation, no Markov chain, and no other approximate inference machinery as in traditional generative models.
That is the rough idea of the paper. For the detailed content and the derivations of the mathematical results, read the original paper and my translation of it, linked at the top of this post; both should be fairly easy to follow.
GAN in TensorFlow
Reference:
This code uses a generative adversarial network in TensorFlow to approximate a one-dimensional Gaussian distribution.
At the top of the code, after the usual imports, the target Gaussian with mean 4 and standard deviation 0.5 is defined:
import numpy as np
import tensorflow as tf
from scipy.stats import norm

class DataDistribution(object):
    def __init__(self):
        self.mu = 4
        self.sigma = 0.5

    def sample(self, N):
        samples = np.random.normal(self.mu, self.sigma, N)
        samples.sort()
        return samples
along with the noise distribution from which the generator draws its inputs (evenly spaced points plus a small random perturbation):
class GeneratorDistribution(object):
    def __init__(self, range):
        self.range = range

    def sample(self, N):
        return np.linspace(-self.range, self.range, N) + \
            np.random.random(N) * 0.01
Next, a simple linear (fully connected) operation is defined:
def linear(input, output_dim, scope=None, stddev=1.0):
    norm = tf.random_normal_initializer(stddev=stddev)
    const = tf.constant_initializer(0.0)
    with tf.variable_scope(scope or 'linear'):
        # the second dimension of the input is the feature dimension
        w = tf.get_variable('w', [input.get_shape()[1], output_dim], initializer=norm)
        b = tf.get_variable('b', [output_dim], initializer=const)
        return tf.matmul(input, w) + b
This is just y = wx + b. Note the use of tf.variable_scope(): it opens a variable scope named scope, and tf.get_variable() then creates the variables inside that scope, so they are named "scope/w" and "scope/b". In a complex model this keeps the code tidy and, more importantly, makes it easy to share variables, which is exactly what happens later on.
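To see the sharing mechanism in isolation, here is a minimal sketch (assuming TensorFlow 1.x; this snippet is not part of the repo):

import tensorflow as tf

with tf.variable_scope('linear') as scope:
    w1 = tf.get_variable('w', [3, 2])  # creates the variable 'linear/w'
    scope.reuse_variables()            # switch the scope into reuse mode
    w2 = tf.get_variable('w', [3, 2])  # looks up and returns the existing 'linear/w'

print(w1 is w2)  # True: both names refer to one shared variable

Later, the same mechanism lets the two copies of the discriminator reuse a single set of weights.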
Next, the generator network is defined:
def generator(input, h_dim):
    h0 = tf.nn.softplus(linear(input, h_dim, 'g0'))
    h1 = linear(h0, 1, 'g1')
    return h1
as well as the discriminator network:
def discriminator(input, h_dim, minibatch_layer=True):
    h0 = tf.tanh(linear(input, h_dim * 2, 'd0'))
    h1 = tf.tanh(linear(h0, h_dim * 2, 'd1'))

    # without the minibatch layer, the discriminator needs an additional layer
    # to have enough capacity to separate the two distributions correctly
    if minibatch_layer:
        h2 = minibatch(h1)
    else:
        h2 = tf.tanh(linear(h1, h_dim * 2, scope='d2'))

    h3 = tf.sigmoid(linear(h2, 1, scope='d3'))
    return h3
Notice the minibatch layer here. Minibatch discrimination does not appear in the original paper; we will come back to it later. Overall, both the generator and the discriminator are very simple models.
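For reference, a minibatch-discrimination layer in the spirit of Salimans et al. (2016) can be sketched roughly as follows; the exact minibatch() in the repo may differ in its details:

def minibatch(input, num_kernels=5, kernel_dim=3):
    # project each sample to num_kernels * kernel_dim features
    x = linear(input, num_kernels * kernel_dim, scope='minibatch', stddev=0.02)
    activation = tf.reshape(x, (-1, num_kernels, kernel_dim))
    # pairwise L1 distances between the samples in the batch, per kernel
    diffs = tf.expand_dims(activation, 3) - \
        tf.expand_dims(tf.transpose(activation, [1, 2, 0]), 0)
    abs_diffs = tf.reduce_sum(tf.abs(diffs), 2)
    # similarity features: samples that are close to others in the batch get large values
    minibatch_features = tf.reduce_sum(tf.exp(-abs_diffs), 2)
    # append the batch-similarity features to each sample's own features
    return tf.concat([input, minibatch_features], 1)

The idea is to let the discriminator look at a whole batch at once, which discourages the generator from collapsing onto a single mode.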
def optimizer(loss, var_list, initial_learning_rate):
    decay = 0.95
    num_decay_steps = 150
    batch = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(
        initial_learning_rate,
        batch,
        num_decay_steps,
        decay,
        staircase=True
    )
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss,
        global_step=batch,
        var_list=var_list
    )
    return optimizer
This defines the model's optimizer: the learning rate decays exponentially over time, and plain gradient descent is used to minimize the loss.
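With staircase=True, tf.train.exponential_decay computes initial_learning_rate * decay^floor(step / num_decay_steps). A small standalone sketch of that schedule (the 0.03 initial rate is just an assumed example value, not taken from the repo):

def decayed_lr(initial_lr, step, decay=0.95, decay_steps=150, staircase=True):
    # mirrors what tf.train.exponential_decay computes
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay ** exponent

print(decayed_lr(0.03, 0))    # 0.03
print(decayed_lr(0.03, 450))  # 0.03 * 0.95**3 ≈ 0.0257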
Next comes the GAN class; its main parts are discussed below:
def _create_model(self):
    # In order to make sure that the discriminator is providing useful gradient
    # information to the generator from the start, we're going to pretrain the
    # discriminator using a maximum likelihood objective. We define the network
    # for this pretraining step scoped as D_pre.
    with tf.variable_scope('D_pre'):
        self.pre_input = tf.placeholder(tf.float32, shape=(self.batch_size, 1))
        self.pre_labels = tf.placeholder(tf.float32, shape=(self.batch_size, 1))
        D_pre = discriminator(self.pre_input, self.mlp_hidden_size, self.minibatch)
        self.pre_loss = tf.reduce_mean(tf.square(D_pre - self.pre_labels))
        self.pre_opt = optimizer(self.pre_loss, None, self.learning_rate)

    # This defines the generator network - it takes samples from a noise
    # distribution as input, and passes them through an MLP.
    with tf.variable_scope('G'):
        self.z = tf.placeholder(tf.float32, shape=(self.batch_size, 1))
        self.G = generator(self.z, self.mlp_hidden_size)

    # The discriminator tries to tell the difference between samples from the
    # true data distribution (self.x) and the generated samples (self.z).
    #
    # Here we create two copies of the discriminator network (that share parameters),
    # as you cannot use the same network with different inputs in TensorFlow.
    with tf.variable_scope('D') as scope:
        self.x = tf.placeholder(tf.float32, shape=(self.batch_size, 1))
        self.D1 = discriminator(self.x, self.mlp_hidden_size, self.minibatch)
        scope.reuse_variables()
        self.D2 = discriminator(self.G, self.mlp_hidden_size, self.minibatch)

    # Define the loss for discriminator and generator networks (see the original
    # paper for details), and create optimizers for both
    self.loss_d = tf.reduce_mean(-tf.log(self.D1) - tf.log(1 - self.D2))
    self.loss_g = tf.reduce_mean(-tf.log(self.D2))

    vars = tf.trainable_variables()
    self.d_pre_params = [v for v in vars if v.name.startswith('D_pre/')]
    self.d_params = [v for v in vars if v.name.startswith('D/')]
    self.g_params = [v for v in vars if v.name.startswith('G/')]

    self.opt_d = optimizer(self.loss_d, self.d_params, self.learning_rate)
    self.opt_g = optimizer(self.loss_g, self.g_params, self.learning_rate)
Unlike the paper, the code builds three networks: D_pre, G and D.
D_pre pretrains the discriminator before G is trained, so that D can provide useful gradients to G right from the start of training.
G is the generator: a noise sample is fed through this multilayer perceptron, and its output follows the distribution p_g.
D is the discriminator. The code calls scope.reuse_variables() to share variables, because both the real data and the generator's samples are fed through the discriminator and must use the same set of weights; without sharing, the two copies would be different networks and the model would be badly broken. The discriminator's output is the probability that its input came from the real data.
Then come the two loss functions, the parameter lists of the three networks, and the two optimizers.
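One detail worth pointing out: loss_g is the "non-saturating" heuristic -log D(G(z)) that the paper recommends for practical training, rather than the minimax term log(1 - D(G(z))) itself. Side by side (a sketch reusing the tensors defined above):

# the paper's minimax generator term; its gradient vanishes when D2 is close to 0 early in training
loss_g_minimax = tf.reduce_mean(tf.log(1 - self.D2))
# the non-saturating heuristic actually used in this code
loss_g = tf.reduce_mean(-tf.log(self.D2))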
def train(self):
    with tf.Session() as session:
        tf.global_variables_initializer().run()

        # pretraining discriminator
        num_pretrain_steps = 1000
        for step in range(num_pretrain_steps):
            d = (np.random.random(self.batch_size) - 0.5) * 10.0
            labels = norm.pdf(d, loc=self.data.mu, scale=self.data.sigma)
            pretrain_loss, _ = session.run([self.pre_loss, self.pre_opt], {
                self.pre_input: np.reshape(d, (self.batch_size, 1)),
                self.pre_labels: np.reshape(labels, (self.batch_size, 1))
            })
        self.weightsD = session.run(self.d_pre_params)

        # copy weights from pre-training over to new D network
        for i, v in enumerate(self.d_params):
            session.run(v.assign(self.weightsD[i]))

        for step in range(self.num_steps):
            # update discriminator
            x = self.data.sample(self.batch_size)
            z = self.gen.sample(self.batch_size)
            loss_d, _ = session.run([self.loss_d, self.opt_d], {
                self.x: np.reshape(x, (self.batch_size, 1)),
                self.z: np.reshape(z, (self.batch_size, 1))
            })

            # update generator
            z = self.gen.sample(self.batch_size)
            loss_g, _ = session.run([self.loss_g, self.opt_g], {
                self.z: np.reshape(z, (self.batch_size, 1))
            })

            if step % self.log_every == 0:
                print('{}: {}\t{}'.format(step, loss_d, loss_g))

        self._plot_distributions(session)
The training procedure covers all three networks. First D_pre is pretrained for 1000 steps: the training samples are random numbers, the labels are the values of the target normal density at those points, and the loss is the mean squared error. Once pretraining is done, D_pre's weights are copied over to D, and after that D and G are updated alternately within each training step.
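As a quick check of what those pretraining labels look like, here is a tiny standalone example (the sample points are arbitrary; mu=4 and sigma=0.5 come from the data distribution defined earlier):

import numpy as np
from scipy.stats import norm

d = np.array([3.0, 4.0, 5.0])
print(norm.pdf(d, loc=4, scale=0.5))
# -> [0.10798193 0.79788456 0.10798193]: the label is simply the Gaussian density at each sample point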
After that there are a few helper functions for sampling from the trained model and plotting; the code is straightforward and is not covered here.
Now we can run the code and see the result. In a terminal, run:
python gan.py
After a short wait, the training log is printed in the form (step, D loss, G loss), and at the end a figure of the learned distributions is displayed.