CS231n Computer Vision Assignment 1, Q1: Writing a k-Nearest Neighbor Classifier (Getting Started)
1 Getting Started
Download the assignment package from the following website.
2 Downloading the Dataset
You need the CIFAR-10 dataset. On Linux, just run:
cd cs231n/datasets
./get_datasets.sh
On Windows, you can run the script from Git Bash, or simply open get_datasets.sh in Notepad. Its contents are:
# Get CIFAR10
wget o.edu/~kriz/
tar -xzvf
rm
# marks a comment.
wget downloads a file: wget url.
tar extracts the archive.
rm deletes a file.
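If these shell commands are not available, you can reproduce the extract-then-delete part of the script (tar -xzvf followed by rm) with Python's standard tarfile module. This is a minimal sketch that assumes the archive has already been downloaded. The function name extract_tar_gz and the remove_archive flag are my own, not part of the assignment:

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir, remove_archive=True):
    """Rough Python stand-in for `tar -xzvf archive -C dest_dir` followed by `rm archive`."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()  # like tar's "v" flag: list the archive members
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data" rejects unsafe paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    if remove_archive:
        os.remove(archive_path)  # the `rm` step
    return names
```

For example, running extract_tar_gz("cifar-10-python.tar.gz", "cs231n/datasets") would unpack a downloaded archive (the file name here is illustrative).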
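You can also paste the download link into a download manager such as Xunlei (Thunder), or open it directly in a browser. After downloading, put the files in the datasets folder.
Note that the code later reads the dataset from cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'. After extraction, that folder sits inside cifar-10-python, so you need to move it out so that it sits directly under cs231n/datasets.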
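You can also script this move. The sketch below searches the datasets directory for cifar-10-batches-py at any depth and moves it directly under that directory, so that cifar10_dir resolves. The helper ensure_batches_dir is hypothetical and not part of the assignment code:

```python
import os
import shutil


def ensure_batches_dir(datasets_dir, name="cifar-10-batches-py"):
    """Return datasets_dir/name, moving a nested copy up to that location if needed."""
    target = os.path.join(datasets_dir, name)
    if os.path.isdir(target):
        return target  # already where cifar10_dir expects it
    for root, dirs, _files in os.walk(datasets_dir):
        if name in dirs:
            shutil.move(os.path.join(root, name), target)
            return target
    raise FileNotFoundError(f"{name} not found under {datasets_dir}")
```

Calling ensure_batches_dir("cs231n/datasets") once after extraction is enough. Later calls return immediately because the folder is already in place.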
3 Start Coding
# Run some setup code for this notebook.
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
# Some more magic so that the notebook will reload external python modules;
# see stackoverflow/questions/1907993/autoreload-of-modules-in-ipython
# In other words, if you edit a function in an imported .py file and then call it
# from the notebook, you get the edited version.
%load_ext autoreload
%autoreload 2
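Running this cell produced an error.
Following advice found online, I downgraded scipy to 1.2.0, but that did not help. Installing the pillow package fixed the problem.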
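Understanding the code (not something you need to run)
The following two lines load IPython's autoreload extension and set it to mode 2, which reloads all imported modules before each cell is executed: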
%load_ext autoreload
%autoreload 2
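%autoreload 2 saves you from re-importing modules by hand. What it automates is roughly what importlib.reload does explicitly. The standalone sketch below shows this. The temporary module helper_mod_demo is made up for illustration and is unrelated to the assignment:

```python
import importlib
import os
import sys
import tempfile


def demo_reload():
    """Import a throwaway module, edit its file, and show what reload changes."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True            # no .pyc files, so reload re-reads the source
    sys.modules.pop("helper_mod_demo", None)  # start from a clean import
    tmp = tempfile.mkdtemp()
    path = os.path.join(tmp, "helper_mod_demo.py")
    with open(path, "w") as f:
        f.write("def greet():\n    return 'original'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        mod = importlib.import_module("helper_mod_demo")
        before = mod.greet()
        # Edit the file on disk, as you would edit a .py file under cs231n/.
        with open(path, "w") as f:
            f.write("def greet():\n    return 'edited version'\n")
        stale = mod.greet()          # still the old function: nothing re-read the file
        mod = importlib.reload(mod)  # re-execute the module's source in place
        after = mod.greet()
    finally:
        sys.path.remove(tmp)
        sys.dont_write_bytecode = old_flag
    return before, stale, after
```

demo_reload() returns ('original', 'original', 'edited version'). The middle value shows why, without autoreload, edits to cs231n/*.py files do not appear in an already-running notebook.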
4 Loading the Raw CIFAR-10 Data
# Load the raw CIFAR-10 data.
cifar10_dir ='cs231n/datasets/cifar-10-batches-py'
# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except NameError:
    # Nothing has been loaded yet
    pass
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
load_CIFAR10 is defined in cs231n/data_utils.py, which comes with the assignment package:
import os
import numpy as np

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b,))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte
def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = load_pickle(f)  # a dict
        X = datadict['data']       # ndarray of raw pixel values, one 3072-value row per image
        Y = datadict['labels']     # list of class labels
        # reshape: each 3072-value row becomes a 3x32x32 array (channels first)
        # transpose: reorder the axes to (N, 32, 32, 3), i.e. channels last
        # astype: return a copy converted to float
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y
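Here is a small check on fake data that shows what the reshape/transpose line does. According to the CIFAR-10 documentation, each 3072-value row stores the 1024 red values first, then the 1024 green, then the 1024 blue, each in row-major order. rows_to_images is a hypothetical helper. It repeats the same conversion but takes the row count from the input instead of hard-coding 10000:

```python
import numpy as np


def rows_to_images(data):
    """Turn an (N, 3072) CIFAR-style array into (N, 32, 32, 3) float images."""
    n = data.shape[0]
    # (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), then copy as float64
    return data.reshape(n, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")


fake = np.arange(2 * 3072).reshape(2, 3072)  # 2 fake "images" with values 0..6143
imgs = rows_to_images(fake)
print(imgs.shape)     # (2, 32, 32, 3)
print(imgs[0, 0, 0])  # top-left pixel of image 0 holds R=0, G=1024, B=2048
```

The top-left pixel collects entries 0, 1024 and 2048 of the row, one from each colour block. That is exactly the channels-last layout matplotlib's imshow expects.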
load_pickle, which load_CIFAR_batch calls:
import pickle
import platform

def load_pickle(f):
    version = platform.python_version_tuple()
    if version[0] == '2':
        return pickle.load(f)
    elif version[0] == '3':
        return pickle.load(f, encoding='latin1')
    raise ValueError("invalid python version: {}".format(version))
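It checks the Python version, then uses pickle.load to deserialize the file back into Python objects (here, a dict). On Python 3 it passes encoding='latin1', because the CIFAR-10 batch files were pickled with Python 2's cPickle.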
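Why encoding='latin1'? In a Python 2 pickle, raw byte strings are stored as str. By default, Python 3's pickle decodes those strings as ASCII, which fails on any byte above 127. latin1 maps every byte to exactly one character, so decoding always succeeds. The hand-built pickle below is my own construction, not taken from CIFAR-10. It imitates a Python 2 byte string holding the single byte 0xFF:

```python
import pickle

# Protocol-2 header, SHORT_BINSTRING opcode 'U' with length 1 and the byte 0xFF,
# then STOP. This is how Python 2 pickled a one-byte str.
PY2_PICKLE = b"\x80\x02U\x01\xff."


def load_py2_bytes(data, encoding):
    """Unpickle data with the given encoding; return the exception if decoding fails."""
    try:
        return pickle.loads(data, encoding=encoding)
    except UnicodeDecodeError as exc:
        return exc


print(repr(load_py2_bytes(PY2_PICKLE, "ASCII")))   # a UnicodeDecodeError (the default fails)
print(repr(load_py2_bytes(PY2_PICKLE, "latin1")))  # the one-character str '\xff'
print(repr(load_py2_bytes(PY2_PICKLE, "bytes")))   # b'\xff', kept as raw bytes
```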
5 Looking at Samples from the Dataset
Here we randomly pick a few training samples from each class and display them.
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
# Subsample the data for more efficient code execution in this exercise
num_training =5000
mask =list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]
num_test =500
mask =list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]
# Reshape the image data into rows
X_train = np.reshape(X_train,(X_train.shape[0],-1))
X_test = np.reshape(X_test,(X_test.shape[0],-1))
print(X_train.shape, X_test.shape)
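5.1 numpy.flatnonzero()
Given an array, this function returns the indices of its non-zero elements in the flattened array. In the loop above, np.flatnonzero(y_train == y) therefore returns the indices of all training samples with label y.
The input must be a NumPy array. If y_train were a plain Python list, y_train == y would compare the whole list with an integer and return a single False, so the result would be empty.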
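The following minimal example shows the array-versus-list difference. The label values are made up:

```python
import numpy as np

labels_arr = np.array([3, 1, 3, 0, 3])
labels_list = [3, 1, 3, 0, 3]

idx_from_array = np.flatnonzero(labels_arr == 3)  # elementwise compare -> indices 0, 2, 4
idx_from_list = np.flatnonzero(labels_list == 3)  # list == int is just False -> empty result

# Works on 2-D input too: the indices refer to the flattened (row-major) array.
grid = np.array([[0, 5], [7, 0]])
idx_grid = np.flatnonzero(grid)                   # indices 1 and 2
```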
5.2 np.random.choice
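np.random.choice(a, size, replace=False) draws size elements from a at random without replacement, so no element is picked twice. In the loop above, it selects samples_per_class distinct training indices for each class.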
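Here is a short sketch of sampling without replacement. The index values are invented, and the seed is only there to make reruns repeatable:

```python
import numpy as np

np.random.seed(0)  # fixed seed so reruns pick the same samples
class_idxs = np.array([3, 8, 15, 21, 42, 57, 60, 77, 81, 99])  # made-up indices of one class
picked = np.random.choice(class_idxs, 7, replace=False)        # 7 distinct entries, random order

# Asking for more distinct samples than exist raises a ValueError.
try:
    np.random.choice(class_idxs, 11, replace=False)
    too_many_ok = True
except ValueError:
    too_many_ok = False
```

With replace=True, the same index could appear several times, which would draw the same image twice in the grid.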
5.3 plt.subplot
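subplot(numRows, numCols, plotNum)
This divides the figure into a numRows x numCols grid and makes cell plotNum the current axes. Cells are numbered from 1, left to right and then top to bottom.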
import numpy as np
import matplotlib.pyplot as plt
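# Split into a 2x2 grid and use cell 1: first row, first column
plt.subplot(2, 2, 1)
# Split into a 2x2 grid and use cell 2: first row, second column
plt.subplot(2, 2, 2)
# Split into a 2x1 grid and use cell 2: the whole second row
plt.subplot(2, 1, 2)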
plt.show()
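The formula plt_idx = i * num_classes + y + 1 in the visualization loop relies on this numbering. With samples_per_class rows and num_classes columns, sample i of class y lands in row i, column y. The following check confirms this without drawing anything. subplot_position is a hypothetical helper:

```python
def subplot_position(i, y, num_classes):
    """Return (plot_index, row, col) for sample i of class y in the visualization grid."""
    plt_idx = i * num_classes + y + 1            # same formula as the visualization loop
    row, col = divmod(plt_idx - 1, num_classes)  # subplot counts row by row, starting at 1
    return plt_idx, row, col
```

For example, subplot_position(2, 3, 10) gives (24, 2, 3): the third sample of class 3 ("cat") lands in subplot 24, at row 2, column 3 (both counted from 0).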
6 Creating the kNN Classifier Object
Remember that training a kNN classifier does no real work: it simply stores the training data.
from cs231n.classifiers import KNearestNeighbor
import pickle
classifier = KNearestNeighbor()
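Running this cell produced another error.
Installing the future package fixed it. I recommend Anaconda, which makes it easy to install and reconfigure packages whenever you need to.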
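Now we can use the kNN classifier to classify the test data. Testing splits into two steps. First, compute the distance from each test sample to every training sample.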
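Step one can be sketched as a fully vectorized L2 distance matrix. This is one common way to compute it, not necessarily the assignment's intended solution, and compute_distances is a name I made up. It assumes float arrays with one flattened image per row, as produced by the reshape above:

```python
import numpy as np


def compute_distances(X_test, X_train):
    """Euclidean (L2) distance between every test row and every training row.

    Uses ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, so no Python loops are needed.
    Returns an array of shape (num_test, num_train).
    """
    test_sq = np.sum(X_test ** 2, axis=1)[:, np.newaxis]    # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)[np.newaxis, :]  # (1, num_train)
    cross = X_test @ X_train.T                              # (num_test, num_train)
    sq = np.maximum(test_sq - 2 * cross + train_sq, 0)      # clip tiny negative rounding errors
    return np.sqrt(sq)
```

Broadcasting the (num_test, 1) and (1, num_train) terms against the cross term yields the full matrix in one step. With 500 test and 5000 training rows, that is a 500 x 5000 array.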
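Second, once we have the distance matrix, find the k training samples closest to each test sample and assign it the label that occurs most often among them.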
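Step two can be sketched as follows: sort each row of the distance matrix, take the labels of the k nearest training samples, and vote. predict_labels here is a hypothetical stand-alone function, not the assignment's KNearestNeighbor method. np.bincount(...).argmax() breaks ties in favour of the smaller label, and it requires non-negative integer labels, which CIFAR-10's labels 0 to 9 satisfy:

```python
import numpy as np


def predict_labels(dists, y_train, k=1):
    """Majority vote among the k nearest training samples for each test row of dists."""
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        nearest = np.argsort(dists[i])[:k]           # indices of the k smallest distances
        closest_y = y_train[nearest]                 # their labels
        y_pred[i] = np.bincount(closest_y).argmax()  # most frequent label; ties -> smaller label
    return y_pred
```

For example, if a test sample's two nearest neighbours have labels 1 and 0, the vote is tied and the prediction is 0.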