Manually changing the learning rate during training
This article builds on three earlier posts.
Previously we used:
exp_lr_scheduler = optim.lr_scheduler.StepLR(optimizer_conv, step_size=3, gamma=0.1)
to decay the learning rate automatically, but that approach is quite rigid; we would like to adjust the learning rate by hand according to how well training is converging. That is what the ipdb debugging tool is used for here.
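As a side note, here is a minimal sketch (my own addition, not from the original post) of the schedule that StepLR(step_size=3, gamma=0.1) produces, which shows why it is "rigid": the learning rate only ever drops by a fixed factor at fixed intervals.
import torch
from torch import nn, optim

params = [nn.Parameter(torch.zeros(1))]  # dummy parameter, just to build an optimizer
optimizer = optim.SGD(params, lr=0.0001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(9):
    optimizer.step()                               # stands in for one epoch of training
    print(epoch, optimizer.param_groups[0]['lr'])  # 1e-4, 1e-4, 1e-4, 1e-5, 1e-5, 1e-5, 1e-6, ...
    scheduler.step()                               # decays the lr by gamma every step_size steps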
1)
First we use argparse to parse the command line, so that a --debugFile argument can be passed; creating or deleting the specified folder then toggles the debugging state:
def getArgs():
    # set up the parser
    parser = argparse.ArgumentParser()
    parser.add_argument('--debugFile', nargs='?', default='None', type=str)
    args = parser.parse_args()
    return vars(args)  # the built-in vars() turns the namespace into a dict
2) Then add the following inside the train_model() function:
# enter debug mode
# print('args : ', args['debugFile'])
if os.path.exists(args['debugFile']):
    import ipdb
    ipdb.set_trace()
This drops into the debugger as soon as the specified folder is found.
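To see the trigger pattern in isolation, here is a minimal, self-contained sketch (the names and paths are illustrative, not from the original project); touching a folder called debug next to the script drops the loop into ipdb on its next iteration:
import os
import time

DEBUG_FLAG = './debug'        # hypothetical trigger path, used only for this demo

for step in range(1000):
    time.sleep(1)             # stands in for one training step
    if os.path.exists(DEBUG_FLAG):
        import ipdb           # requires `pip install ipdb`
        ipdb.set_trace()      # inspect or modify state here, then `c` to continue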
3)
The full code is as follows:
# coding:utf8
from torchvision import datasets, models
from torch import nn, optim
from torchvision import transforms as T
from torch.utils import data
import os
import copy
import time
import torch
import argparse
# data preprocessing comes first
data_dir = './data'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# image transforms
normalize = T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
data_transforms = {
    'train': T.Compose([
        T.RandomResizedCrop(224),   # random crop, resized to 224x224
        T.RandomHorizontalFlip(),   # randomly flip the given PIL.Image horizontally with probability 0.5
        T.ToTensor(),               # convert to a Tensor with values in [0, 1]
        normalize
    ]),
    'val': T.Compose([
        T.Resize(256),              # resize
        T.CenterCrop(224),
        T.ToTensor(),
        normalize
    ]),
}
# load the images
# man has label 0, woman has label 1
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# number of samples in train and val
dataset_sizes = {x: len(image_datasets[x].imgs) for x in ['train', 'val']}
dataloaders = {x: data.DataLoader(image_datasets[x], batch_size=4, shuffle=True, num_workers=4) for x in ['train', 'val']}

# then choose the model to use
model_conv = models.resnet18(pretrained=True)
# freeze the parameters so that the convolutional layers are not trained
# for param in model_conv.parameters():
#     param.requires_grad = False
# get the number of input features of the fc layer; the later training only optimizes the fc parameters
fc_features = model_conv.fc.in_features
# change the number of classes to 2, i.e. man and woman
model_conv.fc = nn.Linear(fc_features, 2)
model_conv = model_conv.to(device)
# the loss function is the cross-entropy loss
criterion = nn.CrossEntropyLoss()
# the optimizer
# optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.0001, momentum=0.9)
# optimizer_conv = optim.SGD(model_conv.parameters(), lr=0.0001, momentum=0.9)
optimizer_conv = optim.Adam(model_conv.parameters(), lr=0.0001, betas=(0.9, 0.99))
# automatic learning-rate decay at equal intervals: every step_size epochs the lr becomes lr * gamma
# exp_lr_scheduler = optim.lr_scheduler.StepLR(optimizer_conv, step_size=3, gamma=0.1)
# exp_lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer_conv, mode='min', verbose=True)
# Train the model
# Arguments:
# model: the model to train
# criterion: the loss function
# optimizer: the optimizer
# scheduler: the learning-rate scheduler
# num_epochs: how many full passes over the data to run; one epoch is one complete training pass
# def train_model(model, criterion, optimizer, scheduler, args, num_epochs=20):
def train_model(model, criterion, optimizer, args, num_epochs=20):
    # record the time at which training starts
    since = time.time()
    # used to save the best weights
    best_model_wts = copy.deepcopy(model.state_dict())
    # best accuracy values
    best_train_acc = 0.0
    best_val_acc = 0.0
    best_iteration = 0
    # meters (from torchnet) for smoothed loss statistics and a confusion matrix, left commented out:
    # loss_meter = meter.AverageValueMeter()      # mean and std of the loss over an epoch
    # confusion_matrix = meter.ConfusionMeter(2)  # per-class statistics, more detailed than plain accuracy
    # train over the whole dataset num_epochs times
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # holds the train acc before it has been compared with the val acc
        temp = 0
        # Each epoch has a training and validation phase
        # each epoch runs over both the `train` and the `val` data
        for phase in ['train', 'val']:
            if phase == 'train':
                # step the learning-rate scheduler
                # scheduler.step()
                # set the model to training mode (layers such as `Dropout` behave differently in training and eval mode)
                model.train()
            else:
                # evaluation mode
                model.eval()  # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            # `dataloaders[phase]` behaves like an iterator: each iteration yields one batch of `inputs` and `labels`.
            # A batch holds four images, so there are dataset_sizes['train']/4 or dataset_sizes['val']/4 batches,
            # which is how many times this loop runs.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)  # inputs of the current batch
                labels = labels.to(device)  # labels of the current batch
                # print('input : ', inputs)
                # print('labels : ', labels)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward pass
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # outputs for the current inputs
                    outputs = model(inputs)
                    # print('outputs : ', outputs)
                    # take the maximum along dim=1 as the prediction preds; the column index tells us whether it is 0 or 1
                    _, preds = torch.max(outputs, 1)
                    # print('preds : ', preds)
                    # error between the predicted outputs and the ground-truth labels
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        # backpropagate the error
                        loss.backward()
                        # scheduler.step(loss)  # use this here when the scheduler is optim.lr_scheduler.ReduceLROnPlateau
                        # let the optimizer update the parameters
                        optimizer.step()
                # loss_meter.add(loss.item())
                # confusion_matrix.add(outputs.detach(), labels.detach())
                # enter debug mode
                # print('args : ', args['debugFile'])
                if os.path.exists(args['debugFile']):
                    import ipdb
                    ipdb.set_trace()
                # statistics
                # accumulate `running_loss` and `running_corrects`
                # loss.item() is the loss value at this point,
                # inputs.size(0) is the number of images in the batch (4 here),
                # their product is the total loss over those 4 images,
                # and summing over batches gives the loss over all of the data
                running_loss += loss.item() * inputs.size(0)
                # torch.sum(preds == labels.data) counts how many of the 4 predictions are correct;
                # summing over batches gives the number of correct predictions over all of the data
                running_corrects += torch.sum(preds == labels.data)
            # loss of the current phase, divided by the number of samples to get the average loss
            epoch_loss = running_loss / dataset_sizes[phase]
            # accuracy of the current phase, divided by the number of samples to get the average accuracy
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'train' and epoch_acc > best_train_acc:
                temp = epoch_acc
            if phase == 'val' and epoch_acc > 0 and epoch_acc < temp:
                best_train_acc = temp
                best_val_acc = epoch_acc
                best_iteration = epoch
                best_model_wts = copy.deepcopy(model.state_dict())
    # total time taken by training
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best epoch: {:4f}'.format(best_iteration))
    print('Best train Acc: {:4f}'.format(best_train_acc))
    print('Best val Acc: {:4f}'.format(best_val_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
def getArgs():
    # set up the parser
    parser = argparse.ArgumentParser()
    parser.add_argument('--debugFile', nargs='?', default='None', type=str)
    args = parser.parse_args()
    return vars(args)  # the built-in vars() turns the namespace into a dict

if __name__ == '__main__':
    args_dist = getArgs()
    print(args_dist)
    # model_train = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, args_dist)
    model_train = train_model(model_conv, criterion, optimizer_conv, args_dist)
    torch.save(model_train, 'GenderTest1_18.pkl')
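The script saves the whole model object with torch.save. As a quick aside (my own addition, not part of the original post), here is a minimal sketch of loading it back for inference; the image path is hypothetical:
import torch
from torchvision import transforms as T
from PIL import Image

model = torch.load('GenderTest1_18.pkl', map_location='cpu')  # on recent PyTorch you may need weights_only=False
model.eval()

normalize = T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(), normalize])

img = preprocess(Image.open('some_face.jpg')).unsqueeze(0)    # hypothetical test image
with torch.no_grad():
    pred = model(img).argmax(dim=1).item()                    # 0 = man, 1 = woman, per the folder labels
print(pred)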
4)
Then run:
(deeplearning) userdeMBP:resnet18 user$ python train.py --debugFile=./debug
{'debugFile': './debug'}
Epoch 0/19
----------
train Loss: 0.7313 Acc: 0.6000
val Loss: 0.6133 Acc: 0.5500
Epoch 1/19
----------
train Loss: 0.3051 Acc: 0.9500
val Loss: 0.5630 Acc: 0.7000
Epoch 2/19
----------
train Loss: 0.1872 Acc: 0.9000
val Loss: 0.8300 Acc: 0.6500
Epoch 3/19
----------
train Loss: 0.3791 Acc: 0.8500
val Loss: 1.1445 Acc: 0.6000
Epoch 4/19
----------
train Loss: 0.4880 Acc: 0.8000
val Loss: 0.5832 Acc: 0.7000
Epoch 5/19
----------
Now create a folder named debug in the current directory, and the program enters debug mode:
Epoch 5/19
----------
--Call--
> /anaconda3/envs/deeplearning/lib/python3.6/site-packages/torch/autograd/grad_mode.py(129)__exit__()
128
--> 129 def __exit__(self, *args):
130 torch.set_grad_enabled(self.prev)
ipdb> u  # move up to the calling frame
> /Users/user/pytorch/gender_test_work/resnet18/train.py(151)train_model()
150 import ipdb;
--> 151 ipdb.set_trace()
152
ipdb> for group in optimizer.param_groups: group['lr']  # check the current learning rate
0.0001
ipdb> for group in optimizer.param_groups: group['lr']=0.01  # set a new learning rate
Then delete the debug folder and enter the c command in the debugger to continue running:
ipdb> c
train Loss: 1.0321 Acc: 0.7500
val Loss: 8590.6042 Acc: 0.6000
Epoch 6/19
----------
train Loss: 2.5962 Acc: 0.4000
val Loss: 23884.8344 Acc: 0.5000
Epoch 7/19
----------
train Loss: 1.0793 Acc: 0.5500
val Loss: 65956.7039 Acc: 0.5000
Epoch 8/19
----------
train Loss: 1.6199 Acc: 0.4500
val Loss: 16973.9813 Acc: 0.5000
Epoch 9/19
----------
train Loss: 1.4478 Acc: 0.3500
val Loss: 1580.6444 Acc: 0.5000
Epoch 10/19
----------
Then create the debug folder again to drop back into the debugger and check the learning rate; as expected, it is now the adjusted value of 0.01:
Epoch 10/19
----------
> /Users/user/pytorch/gender_test_work/resnet18/train.py(151)train_model()
150 import ipdb;
--> 151 ipdb.set_trace()
152
ipdb> for group in optimizer.param_groups: group['lr']
0.01
ipdb>
Don't pay too much attention to the training results above; this was just a quick run for demonstration purposes.
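For reference, the two ipdb one-liners shown above simply walk the optimizer's param_groups. Here is a minimal sketch (my own addition; the helper names are made up, and it assumes the optimizer_conv defined in the script) of doing the same thing in ordinary code:
def get_lrs(optimizer):
    # read the current learning rate of every parameter group
    return [group['lr'] for group in optimizer.param_groups]

def set_lr(optimizer, new_lr):
    # overwrite the learning rate of every parameter group
    for group in optimizer.param_groups:
        group['lr'] = new_lr

print(get_lrs(optimizer_conv))   # [0.0001]
set_lr(optimizer_conv, 0.01)
print(get_lrs(optimizer_conv))   # [0.01]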
5)
One problem came up along the way:
SyntaxError: non-default argument follows default argument
This error occurs when a parameter without a default value is placed after a parameter with a default value in the function definition, e.g.:
def train_model(model, criterion, optimizer, scheduler, num_epochs=200, args_dist):
It should instead be written as:
def train_model(model, criterion, optimizer, scheduler, args_dist, num_epochs=200):
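A tiny, self-contained illustration of the same rule (the function and argument names here are arbitrary, not from the project):
# def f(a, b=1, c): ...   # SyntaxError: non-default argument follows default argument
def f(a, c, b=1):         # fine: all parameters without defaults come first
    return a + b + c

print(f(1, 2))            # 4, positional call
print(f(1, c=2, b=5))     # 8, keyword arguments can be passed in any order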