Implementing L2 and L1 regularization in PyTorch
1. Implementing L2 regularization with the torch.optim optimizers
torch.optim provides many optimizers, such as SGD, Adadelta, Adam, Adagrad, RMSprop, etc. These optimizers come with a parameter weight_decay that specifies the weight decay rate, which corresponds to the λ coefficient in L2 regularization. Note that the torch.optim optimizers only offer L2 regularization. You can check the docstring; the description of the weight_decay parameter is:
weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
With a torch.optim optimizer, L2 regularization can therefore be set up as follows:
optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.01)
However, this approach has several issues:
(1) Normally regularization only penalizes the model's weight parameters W, not the bias parameters b, whereas the weight decay specified by the weight_decay parameter of a torch.optim optimizer is applied to all parameters in the network, penalizing both the weights w and the biases b. Applying L2 regularization to b can often cause severe underfitting, so usually only the weights w should be regularized (a possible workaround using parameter groups is sketched after this list). (PS: I am really not sure about this; the docstring only says "weight decay (L2 penalty)", but some readers say this method also penalizes the bias b. If anyone can clear this up, please leave a definitive answer.)
(2) Drawback: the torch.optim optimizers only implement L2 regularization and cannot implement L1 regularization. If you need L1 regularization, it can be implemented as shown below.
(3) According to the regularization formula, the loss should become larger once the penalty term is added. For example, if the loss is 10 with weight_decay=1, then with weight_decay=100 the reported loss should also grow by roughly a factor of 100. But with a torch.optim optimizer, if you still compute the loss with loss_fun = nn.CrossEntropyLoss(), you will find that no matter how you change weight_decay, the reported loss stays about the same as without regularization. This is because the loss function does not include the penalty on the weights W: the built-in weight decay is applied inside the optimizer's update step rather than being added to the loss you compute.
(4) Implementing regularization via a torch.optim optimizer works correctly! It is just easy to misunderstand. Personally, I prefer TensorFlow's way of implementing regularization, which only needs tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) and almost directly mirrors the regularization formula.
(5) Github project source code:
To solve these problems, I wrote a custom regularization method, similar to the way regularization is implemented in TensorFlow.
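As a side note on point (1), the built-in weight_decay can be restricted to the weights by passing parameter groups to the optimizer. The following is only a minimal sketch, assuming that bias parameters can be identified by 'bias' in their name; model and learning_rate are the same placeholders used elsewhere in this article.
import torch.optim as optim

# Minimal sketch: apply the built-in L2 weight decay only to the weights,
# not to the biases, by splitting the parameters into two groups.
decay_params, no_decay_params = [], []
for name, param in model.named_parameters():
    if 'bias' in name:                 # assumption: bias parameters contain 'bias' in their name
        no_decay_params.append(param)  # biases: no penalty
    else:
        decay_params.append(param)     # weights: L2 penalty
optimizer = optim.Adam([
    {'params': decay_params, 'weight_decay': 0.01},
    {'params': no_decay_params, 'weight_decay': 0.0},
], lr=learning_rate)
Even with this split, the penalty is still applied inside the optimizer update, so the reported loss still does not include it (see point (3) above).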
2. How to tell whether regularization is acting on the model?
Generally speaking, the main purpose of regularization is to prevent the model from overfitting. Admittedly, overfitting can sometimes be hard to diagnose, but it is quite easy to check whether regularization is acting on the model. Below are two sets of loss and accuracy logs produced during training: one without regularization and one with regularization.
2.1 Loss and accuracy without regularization
The optimizer is Adam with weight_decay=0.0, i.e. no regularization:
optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.0)
Loss and accuracy output during training:
step/epoch:0/0,Train Loss: 2.418065, Acc: [0.15625]
step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
step/epoch:20/0,Train Loss: 0.973226, Acc: [0.8125]
step/epoch:30/0,Train Loss: 1.215165, Acc: [0.65625]
step/epoch:40/0,Train Loss: 1.808068, Acc: [0.65625]
step/epoch:50/0,Train Loss: 1.661446, Acc: [0.625]
step/epoch:60/0,Train Loss: 1.552345, Acc: [0.6875]
step/epoch:70/0,Train Loss: 1.052912, Acc: [0.71875]
step/epoch:80/0,Train Loss: 0.910738, Acc: [0.75]
step/epoch:90/0,Train Loss: 1.142454, Acc: [0.6875]
step/epoch:100/0,Train Loss: 0.546968, Acc: [0.84375]
step/epoch:110/0,Train Loss: 0.415631, Acc: [0.9375]
step/epoch:120/0,Train Loss: 0.533164, Acc: [0.78125]
step/epoch:130/0,Train Loss: 0.956079, Acc: [0.6875]
step/epoch:140/0,Train Loss: 0.711397, Acc: [0.8125]
2.2 Loss and accuracy with regularization
The optimizer is Adam with weight_decay=10.0, i.e. the regularization weight lambda = 10.0:
optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=10.0)
Now the loss and accuracy output during training is:
step/epoch:0/0,Train Loss: 2.467985, Acc: [0.09375]
step/epoch:10/0,Train Loss: 5.435320, Acc: [0.40625]
step/epoch:20/0,Train Loss: 1.395482, Acc: [0.625]
step/epoch:30/0,Train Loss: 1.128281, Acc: [0.6875]
step/epoch:40/0,Train Loss: 1.135289, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1.455040, Acc: [0.5625]
step/epoch:60/0,Train Loss: 1.023273, Acc: [0.65625]
step/epoch:70/0,Train Loss: 0.855008, Acc: [0.65625]
step/epoch:80/0,Train Loss: 1.006449, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.939148, Acc: [0.625]
step/epoch:100/0,Train Loss: 0.851593, Acc: [0.6875]
step/epoch:110/0,Train Loss: 1.093970, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1.699520, Acc: [0.625]
step/epoch:130/0,Train Loss: 0.861444, Acc: [0.75]
step/epoch:140/0,Train Loss: 0.927656, Acc: [0.625]
With weight_decay=10000.0:
step/epoch:0/0,Train Loss: 2.337354, Acc: [0.15625]
step/epoch:10/0,Train Loss: 2.222203, Acc: [0.125]
step/epoch:20/0,Train Loss: 2.184257, Acc: [0.3125]
step/epoch:30/0,Train Loss: 2.116977, Acc: [0.5]
step/epoch:40/0,Train Loss: 2.168895, Acc: [0.375]
step/epoch:50/0,Train Loss: 2.221143, Acc: [0.1875]
step/epoch:60/0,Train Loss: 2.189801, Acc: [0.25]
step/epoch:70/0,Train Loss: 2.209837, Acc: [0.125]
step/epoch:80/0,Train Loss: 2.202038, Acc: [0.34375]
step/epoch:90/0,Train Loss: 2.192546, Acc: [0.25]
step/epoch:100/0,Train Loss: 2.215488, Acc: [0.25]
step/epoch:110/0,Train Loss: 2.169323, Acc: [0.15625]
step/epoch:120/0,Train Loss: 2.166457, Acc: [0.3125]
step/epoch:130/0,Train Loss: 2.144773, Acc: [0.40625]
step/epoch:140/0,Train Loss: 2.173397, Acc: [0.28125]
2.3 Notes on regularization
Overall, comparing the training loss and accuracy of the model with and without regularization, we can see that after adding regularization the loss decreases more slowly and the accuracy rises more slowly. The loss and accuracy of the model without regularization also fluctuate more (i.e. have a larger variance), while the regularized model's loss and accuracy curves are smoother, and they become smoother as the regularization weight lambda increases. This is the penalizing effect of regularization on the model: by making the model behave more smoothly, regularization can effectively mitigate overfitting.
3. A custom regularization method
To address the limitations that the torch.optim optimizers can only implement L2 regularization and penalize all parameters in the network, here is an implementation similar to TensorFlow's regularization.
3.1 The custom Regularization class
Regularization is wrapped here in a Regularization class. Every method is commented; read through it at your own pace and leave a comment if you have questions.
import torch
import torch.nn as nn

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = 'cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

class Regularization(nn.Module):
    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model to regularize
        :param weight_decay: regularization coefficient
        :param p: the order of the norm; defaults to the 2-norm,
                  i.e. p=2 is L2 regularization and p=1 is L1 regularization
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            print("param weight_decay can not <= 0")
            exit(0)
        self.weight_decay = weight_decay
        self.p = p
        self.weight_list = self.get_weight(model)
        self.weight_info(self.weight_list)

    def to(self, device):
        '''
        Specify the device to run on
        :param device: cuda or cpu
        :return:
        '''
        self.device = device
        super().to(device)
        return self

    def forward(self, model):
        self.weight_list = self.get_weight(model)  # get the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss

    def get_weight(self, model):
        '''
        Get the list of weight parameters of the model (biases are excluded)
        :param model:
        :return:
        '''
        weight_list = []
        for name, param in model.named_parameters():
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list

    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Compute the norm penalty over the weight tensors
        :param weight_list:
        :param p: the order of the norm; defaults to the 2-norm
        :param weight_decay:
        :return:
        '''
        # weight_decay=Variable(torch.FloatTensor([weight_decay]).to(self.device),requires_grad=True)
        # reg_loss=Variable(torch.FloatTensor([0.]).to(self.device),requires_grad=True)
        # weight_decay=torch.FloatTensor([weight_decay]).to(self.device)
        # reg_loss=torch.FloatTensor([0.]).to(self.device)
        reg_loss = 0
        for name, w in weight_list:
            l2_reg = torch.norm(w, p=p)
            reg_loss = reg_loss + l2_reg
        reg_loss = weight_decay * reg_loss
        return reg_loss

    def weight_info(self, weight_list):
        '''
        Print the list of regularized weights
        :param weight_list:
        :return:
        '''
        print("---------------regularization weight---------------")
        for name, w in weight_list:
            print(name)
        print("---------------------------------------------------")
3.2 How to use Regularization
Using it is straightforward; just treat it as an ordinary PyTorch module. For example:
# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

weight_decay = 100.0  # regularization coefficient
model = my_net().to(device)
# initialize the regularization term
if weight_decay > 0:
    reg_loss = Regularization(model, weight_decay, p=2).to(device)
else:
    print("no regularization")

criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = softmax + cross entropy
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no need to set the weight_decay parameter

# train
batch_train_data = ...
batch_train_label = ...

out = model(batch_train_data)
# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss.item()

# backprop
optimizer.zero_grad()  # clear all accumulated gradients
loss.backward()
optimizer.step()
Loss and accuracy output during training:
(1) With weight_decay=0.0, i.e. no regularization:
step/epoch:0/0,Train Loss: 2.379627, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1.473092, Acc: [0.6875]
step/epoch:20/0,Train Loss: 0.931847, Acc: [0.8125]
step/epoch:30/0,Train Loss: 0.625494, Acc: [0.875]
step/epoch:40/0,Train Loss: 2.241885, Acc: [0.53125]
step/epoch:50/0,Train Loss: 1.132131, Acc: [0.6875]
step/epoch:60/0,Train Loss: 0.493038, Acc: [0.8125]
step/epoch:70/0,Train Loss: 0.819410, Acc: [0.78125]
step/epoch:80/0,Train Loss: 0.996497, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.474205, Acc: [0.8125]
step/epoch:100/0,Train Loss: 0.744587, Acc: [0.8125]
step/epoch:110/0,Train Loss: 0.502217, Acc: [0.78125]
step/epoch:120/0,Train Loss: 0.531865, Acc: [0.8125]
step/epoch:130/0,Train Loss: 1.016807, Acc: [0.875]
step/epoch:140/0,Train Loss: 0.411701, Acc: [0.84375]
(2) With weight_decay=10.0, i.e. with regularization:
---------------------------------------------------
step/epoch:0/0,Train Loss: 1563.402832, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1530.002686, Acc: [0.53125]
step/epoch:20/0,Train Loss: 1495.115234, Acc: [0.71875]
step/epoch:30/0,Train Loss: 1461.114136, Acc: [0.78125]
step/epoch:40/0,Train Loss: 1427.868164, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1395.430054, Acc: [0.6875]
step/epoch:60/0,Train Loss: 1363.358154, Acc: [0.5625]
step/epoch:70/0,Train Loss: 1331.439697, Acc: [0.75]
step/epoch:80/0,Train Loss: 1301.334106, Acc: [0.625]
step/epoch:90/0,Train Loss: 1271.505005, Acc: [0.6875]
step/epoch:100/0,Train Loss: 1242.488647, Acc: [0.75]
step/epoch:110/0,Train Loss: 1214.184204, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1186.174561, Acc: [0.71875]
step/epoch:130/0,Train Loss: 1159.148438, Acc: [0.78125]
step/epoch:140/0,Train Loss: 1133.020020, Acc: [0.65625]
(3) With weight_decay=10000.0, i.e. with regularization:
step/epoch:0/0,Train Loss: 1570211.500000, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1522952.125000, Acc: [0.3125]
step/epoch:20/0,Train Loss: 1486256.125000, Acc: [0.125]
step/epoch:30/0,Train Loss: 1451671.500000, Acc: [0.25]
step/epoch:40/0,Train Loss: 1418959.750000, Acc: [0.15625]
step/epoch:50/0,Train Loss: 1387154.000000, Acc: [0.125]
step/epoch:60/0,Train Loss: 1355917.500000, Acc: [0.125]
step/epoch:70/0,Train Loss: 1325379.500000, Acc: [0.125]
step/epoch:80/0,Train Loss: 1295454.125000, Acc: [0.3125]
step/epoch:90/0,Train Loss: 1266115.375000, Acc: [0.15625]
step/epoch:100/0,Train Loss: 1237341.000000, Acc: [0.0625]
step/epoch:110/0,Train Loss: 1209186.500000, Acc: [0.125]
step/epoch:120/0,Train Loss: 1181584.250000, Acc: [0.125]
step/epoch:130/0,Train Loss: 1154600.125000, Acc: [0.1875]
step/epoch:140/0,Train Loss: 1128239.875000, Acc: [0.125]
Compared with the L2 regularization built into the torch.optim optimizers, this Regularization class achieves the same regularizing effect, and, similar to TensorFlow, the reported loss now includes the regularization term.
You can also change the parameter p: p=2 gives L2 regularization and p=1 gives L1 regularization.
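For example, switching to L1 regularization only requires changing the p argument when the class is instantiated; everything else in the training loop above stays the same:
reg_loss = Regularization(model, weight_decay, p=1).to(device)  # p=1 -> L1 penalty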
4. Github project source code download
Github project source code
The above is my personal experience; I hope it can serve as a useful reference, and thank you for your support. If there are errors or anything I have not fully considered, please do not hesitate to point them out.
