Probably the most detailed source-code analysis of Mosaic and Mixup in YOLOX

The cnblogs layout is buggy; for a better reading experience, see the original post.

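Preface

After reading YOLOX I found its data augmentation really impressive, but when I tried to work out how to implement it myself I got stuck (I could not do it concisely). Then I figured that an augmentation trick like this will surely be reused in the dataloaders of other networks, so I studied the code carefully and reproduced it.

At the end I attach my own packaged versions of mosaic and mixup. If I did not package them myself and just copied someone else's code when needed, I would not even know where the bugs were, although the core is much the same as in the original paper.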

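Mosaic

Source code analysis

The analysis below follows the YOLOX source code.

The idea in YOLOX is to first build a Dataset class and then iterate over it, which is why the dataset provides a pull_item function.

Building on that, the MosaicDetection class is defined: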

class MosaicDetection(Dataset):
    """Detection dataset wrapper that performs mixup for normal dataset."""

    def __init__(
        self, dataset, img_size, mosaic=True, preproc=None,
        degrees=10.0, translate=0.1, mosaic_scale=(0.5, 1.5),
        mixup_scale=(0.5, 1.5), shear=2.0, perspective=0.0,
        enable_mixup=True, mosaic_prob=1.0, mixup_prob=1.0, *args
    ):
        super().__init__(img_size, mosaic=mosaic)
        self._dataset = dataset
        self.preproc = preproc
        self.degrees = degrees
        self.translate = translate
        self.scale = mosaic_scale
        self.shear = shear
        self.perspective = perspective
        self.mixup_scale = mixup_scale
        self.enable_mosaic = mosaic
        self.enable_mixup = enable_mixup
        self.mosaic_prob = mosaic_prob
        self.mixup_prob = mixup_prob
        self.local_rank = get_local_rank()

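I will skip the meaning of the parameters. The key is the self._dataset field, which shows that Mosaic is implemented on top of the original Dataset.

In other words, all that is needed is to override __getitem__ and __len__. Let's start with __getitem__.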
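To make the wrapper idea concrete, here is a minimal sketch in plain Python. `ToyDataset` and `WrappedDataset` are hypothetical names used only for illustration (they are not YOLOX classes), and the "images" are just strings; the point is that the wrapper keeps a reference to the inner dataset and only overrides `__len__` and `__getitem__`.

```python
class ToyDataset:
    """Hypothetical stand-in for a detection dataset with a pull_item method."""

    def __init__(self, n):
        # Each sample is a fake (image, labels) pair.
        self.samples = [("img%d" % i, [i]) for i in range(n)]

    def __len__(self):
        return len(self.samples)

    def pull_item(self, index):
        # Return one raw sample without any augmentation.
        return self.samples[index]


class WrappedDataset:
    """Wraps another dataset; only __len__ and __getitem__ are overridden."""

    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        img, labels = self._dataset.pull_item(idx)
        # An augmentation such as mosaic or mixup would be applied here.
        return img, labels


wrapped = WrappedDataset(ToyDataset(4))
```

Because the wrapper only depends on `pull_item` and `__len__`, the same augmentation code could in principle sit on top of any dataset that offers those two methods.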

Part 1: stitching the images

    def __getitem__(self, idx):
        if self.enable_mosaic and random.random() < self.mosaic_prob:
            mosaic_labels = []
            input_dim = self._dataset.input_dim
            input_h, input_w = input_dim[0], input_dim[1]

            # yc, xc = s, s # mosaic center x, y
            # The canvas is (2 * input_h, 2 * input_w)
            # Position of the point shared by the four images
            yc = int(random.uniform(0.5 * input_h, 1.5 * input_h))
            xc = int(random.uniform(0.5 * input_w, 1.5 * input_w))

            # 3 additional image indices
            indices = [idx] + [random.randint(0, len(self._dataset) - 1) for _ in range(3)]

            for i_mosaic, index in enumerate(indices):
                img, _labels, _, img_id = self._dataset.pull_item(index)
                # Original size of the image just loaded
                h0, w0 = img.shape[:2]
                scale = min(1. * input_h / h0, 1. * input_w / w0)
                # Resize to fit the input size, keeping the aspect ratio
                img = cv2.resize(
                    img, (int(w0 * scale), int(h0 * scale)), interpolation=cv2.INTER_LINEAR
                )
                # generate output mosaic image
                (h, w, c) = img.shape[:3]
                # Create a new canvas filled with the value 114
                if i_mosaic == 0:
                    mosaic_img = np.full((input_h * 2, input_w * 2, c), 114, dtype=np.uint8)

                # suffix l means large image, while s means small image in mosaic aug.
                # Images go to the top-left, top-right, bottom-left and bottom-right in order.
                # The function returns the target coordinates on the canvas and the source
                # coordinates in the image (note that with the 0.5-1.5 center range, the
                # image may extend past the canvas)
                (l_x1, l_y1, l_x2, l_y2), (s_x1, s_y1, s_x2, s_y2) = get_mosaic_coordinate(
                    mosaic_img, i_mosaic, xc, yc, w, h, input_h, input_w
                )

                # Copy onto the canvas
                mosaic_img[l_y1:l_y2, l_x1:l_x2] = img[s_y1:s_y2, s_x1:s_x2]
                plt.imshow(mosaic_img)
                plt.show()

                # Coordinate offsets
                padw, padh = l_x1 - s_x1, l_y1 - s_y1

                labels = _labels.copy()
                # Normalized xywh to pixel xyxy format
                # I think this comment is misleading (maybe I misunderstood?); discussed below
                # This maps the boxes into the coordinates of the new canvas
                if _labels.size > 0:
                    # Top-left corner
                    labels[:, 0] = scale * _labels[:, 0] + padw
                    labels[:, 1] = scale * _labels[:, 1] + padh
                    # Bottom-right corner
                    labels[:, 2] = scale * _labels[:, 2] + padw
                    labels[:, 3] = scale * _labels[:, 3] + padh
                mosaic_labels.append(labels)

            plt.imshow(mosaic_img)
            plt.show()

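The rough idea is to pick four images at random (the requested index plus three random ones), create a canvas twice the network input size, and randomly generate a mosaic center at 0.5 to 1.5 times the input size along each axis (put simply, the point shared by all four images). The images are then pasted in order into the top-left, top-right, bottom-left and bottom-right parts.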
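As a small illustration of the placement step, the sketch below pastes a single top-left tile onto a toy canvas. The sizes and the fixed center `(xc, yc)` are made-up values chosen so the numbers are easy to follow; the coordinate formulas mirror the `mosaic_index == 0` branch of `get_mosaic_coordinate`.

```python
import numpy as np

input_h, input_w = 4, 6          # toy network input size (assumed values)
xc, yc = 5, 3                    # mosaic center, fixed here instead of random
canvas = np.full((input_h * 2, input_w * 2, 3), 114, dtype=np.uint8)

tile = np.full((4, 6, 3), 255, dtype=np.uint8)  # an already resized image
h, w = tile.shape[:2]

# Top-left tile: its bottom-right corner touches the mosaic center.
l_x1, l_y1, l_x2, l_y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
# Matching region of the tile: only the part that fits on the canvas.
s_x1, s_y1, s_x2, s_y2 = w - (l_x2 - l_x1), h - (l_y2 - l_y1), w, h

canvas[l_y1:l_y2, l_x1:l_x2] = tile[s_y1:s_y2, s_x1:s_x2]
```

Here the tile is one pixel too wide and too tall to fit left of and above the center, so its first row and column are cut off and only `tile[1:4, 1:6]` lands on the canvas.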

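When an image is pasted onto the canvas, we get its x and y offsets (padw, padh) and then compute the shifted bbox positions.

One open question is the new bbox coordinates: the comment says it converts xywh to x1 y1 x2 y2, but in my own implementation the boxes are drawn correctly when the input bboxes are already in x1y1x2y2 form. If anyone in the comments can explain this, please do.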
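The label update can be checked by hand with a tiny example. This is only a sketch under the assumption that the labels are pixel `[x1, y1, x2, y2, cls]` rows, which is what the observation above suggests; `shift_labels` is a hypothetical helper name, and the scale and offsets are made-up numbers.

```python
import numpy as np


def shift_labels(labels, scale, padw, padh):
    """Map [x1, y1, x2, y2, cls] rows from image pixels to canvas pixels."""
    out = labels.copy()
    if labels.size > 0:
        out[:, [0, 2]] = scale * labels[:, [0, 2]] + padw  # x1, x2
        out[:, [1, 3]] = scale * labels[:, [1, 3]] + padh  # y1, y2
    return out


labels = np.array([[10.0, 20.0, 110.0, 220.0, 3.0]])
# Image resized by 0.5; offsets padw = l_x1 - s_x1 = 100, padh = l_y1 - s_y1 = 50.
shifted = shift_labels(labels, scale=0.5, padw=100, padh=50)
```

Both corners go through the same scale-then-shift mapping, which only makes sense when columns 2 and 3 hold a corner rather than a width and height. That is consistent with the input already being in corner form.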

Part 2: rotation and shear

            if len(mosaic_labels):
                # Clamp the parts of each bbox that fall outside the canvas to its edge
                mosaic_labels = np.concatenate(mosaic_labels, 0)
                np.clip(mosaic_labels[:, 0], 0, 2 * input_w, out=mosaic_labels[:, 0])
                np.clip(mosaic_labels[:, 1], 0, 2 * input_h, out=mosaic_labels[:, 1])
                np.clip(mosaic_labels[:, 2], 0, 2 * input_w, out=mosaic_labels[:, 2])
                np.clip(mosaic_labels[:, 3], 0, 2 * input_h, out=mosaic_labels[:, 3])

            # Rotate by a random angle within +/- degrees and return the new image and bboxes
            mosaic_img, mosaic_labels = random_perspective(
                mosaic_img,
                mosaic_labels,
                degrees=self.degrees,
                translate=self.translate,
                scale=self.scale,
                shear=self.shear,
                perspective=self.perspective,
                border=[-input_h // 2, -input_w // 2],
            )  # border to remove

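This part is fairly simple: first clip the boxes to the canvas, then rotate by a small random angle. You can mostly ignore how the bbox coordinates change under rotation, because the angle is small and the object hardly leaves its bbox. If you want the details of the rotation code, read it yourself; I did not want to. Finally the image is cropped back to the input size, so the output is input size rather than 2 * input size.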
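The clipping and the final output size can be verified with a few lines of NumPy. This is a minimal sketch with an assumed 640 x 640 input and made-up boxes; the output-size arithmetic follows the `height`/`width` lines at the top of `random_perspective`.

```python
import numpy as np

input_h, input_w = 640, 640
canvas_h, canvas_w = 2 * input_h, 2 * input_w

# Boxes in [x1, y1, x2, y2] form; the second one spills off the canvas.
boxes = np.array([[100.0, 200.0, 300.0, 400.0],
                  [-50.0, 1200.0, 1400.0, 1500.0]])
boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, canvas_w)  # x1, x2
boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, canvas_h)  # y1, y2

# A negative border shrinks the output: 1280 + 2 * (-320) = 640.
border = (-input_h // 2, -input_w // 2)
out_h = canvas_h + border[0] * 2
out_w = canvas_w + border[1] * 2
```

So the doubled canvas only exists as an intermediate; what leaves this stage has the network input size again.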

Mix up

In the paper, the second half of the mosaic pipeline also adds mixup (optional, but used by default).

            # -----------------------------------------------------------------
            # CopyPaste: /abs/2012.07177
            # -----------------------------------------------------------------
            if (
                self.enable_mixup
                and not len(mosaic_labels) == 0
                and random.random() < self.mixup_prob
            ):
                # With mosaic_prob=0.5 and mixup_prob=0.5, mixup runs with
                # probability of roughly 0.5 * 0.5 = 0.25
                mosaic_img, mosaic_labels = self.mixup(mosaic_img, mosaic_labels, self.input_dim)
            # Additional preprocessing is applied here
            mix_img, padded_labels = self.preproc(mosaic_img, mosaic_labels, self.input_dim)
            img_info = (mix_img.shape[1], mix_img.shape[0])

            # -----------------------------------------------------------------
            # img_info and img_id are not used for training.
            # They are also hard to be specified on a mosaic image.
            # -----------------------------------------------------------------
            return mix_img, padded_labels, img_info, img_id

        else:
            # This else pairs with the mosaic if: without mosaic, only preprocessing is applied
            self._dataset._input_dim = self.input_dim
            img, label, img_info, img_id = self._dataset.pull_item(idx)
            img, label = self.preproc(img, label, self.input_dim)
            return img, label, img_info, img_id

    # The mixup function
    def mixup(self, origin_img, origin_labels, input_dim):
        jit_factor = random.uniform(*self.mixup_scale)
        # Whether to flip the image horizontally
        FLIP = random.uniform(0, 1) > 0.5
        cp_labels = []
        # Make sure the sample is not pure background; load_anno does not read
        # the image, so it is faster (COCO dataset class)
        while len(cp_labels) == 0:
            cp_index = random.randint(0, self.__len__() - 1)
            cp_labels = self._dataset.load_anno(cp_index)
        # Load the image only after confirming it is not background
        img, cp_labels, _, _ = self._dataset.pull_item(cp_index)

        # Create the canvas
        if len(img.shape) == 3:
            cp_img = np.ones((input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114
        else:
            cp_img = np.ones(input_dim, dtype=np.uint8) * 114

        # Compute the scale
        cp_scale_ratio = min(input_dim[0] / img.shape[0], input_dim[1] / img.shape[1])  # resize
        resized_img = cv2.resize(
            img,
            (int(img.shape[1] * cp_scale_ratio), int(img.shape[0] * cp_scale_ratio)),
            interpolation=cv2.INTER_LINEAR,
        )
        # Paste onto the canvas
        cp_img[
            : int(img.shape[0] * cp_scale_ratio), : int(img.shape[1] * cp_scale_ratio)
        ] = resized_img

        # Scale the canvas by jit_factor
        cp_img = cv2.resize(
            cp_img,
            (int(cp_img.shape[1] * jit_factor), int(cp_img.shape[0] * jit_factor)),
        )
        cp_scale_ratio *= jit_factor

        if FLIP:
            cp_img = cp_img[:, ::-1, :]

        # At this point we have an image that is ready to be mixed in.
        # The mixup itself starts below:
        # the canvas just built is overlaid on the input image.
        origin_h, origin_w = cp_img.shape[:2]
        target_h, target_w = origin_img.shape[:2]
        # Take the larger extent on each axis and pad everything with 0
        padded_img = np.zeros(
            (max(origin_h, target_h), max(origin_w, target_w), 3), dtype=np.uint8
        )
        # Paste the new canvas (and only the new canvas)
        padded_img[:origin_h, :origin_w] = cp_img

        # Random offsets
        x_offset, y_offset = 0, 0
        if padded_img.shape[0] > target_h:
            y_offset = random.randint(0, padded_img.shape[0] - target_h - 1)
        if padded_img.shape[1] > target_w:
            x_offset = random.randint(0, padded_img.shape[1] - target_w - 1)
        # Crop the canvas
        padded_cropped_img = padded_img[
            y_offset: y_offset + target_h, x_offset: x_offset + target_w
        ]

        # bbox coordinates of the image on the canvas after rescaling
        cp_bboxes_origin_np = adjust_box_anns(
            cp_labels[:, :4].copy(), cp_scale_ratio, 0, 0, origin_w, origin_h
        )
        # Mirror the boxes if the image was flipped
        if FLIP:
            cp_bboxes_origin_np[:, 0::2] = (
                origin_w - cp_bboxes_origin_np[:, 0::2][:, ::-1]
            )
        # bbox coordinates after cropping (the crop's top-left corner is the new origin)
        cp_bboxes_transformed_np = cp_bboxes_origin_np.copy()
        cp_bboxes_transformed_np[:, 0::2] = np.clip(
            cp_bboxes_transformed_np[:, 0::2] - x_offset, 0, target_w
        )
        cp_bboxes_transformed_np[:, 1::2] = np.clip(
            cp_bboxes_transformed_np[:, 1::2] - y_offset, 0, target_h
        )
        # Check whether each shifted box is still reasonable; details below
        keep_list = box_candidates(cp_bboxes_origin_np.T, cp_bboxes_transformed_np.T, 5)

        # If any box passes, merge the labels and the images
        if keep_list.sum() >= 1.0:
            cls_labels = cp_labels[keep_list, 4:5].copy()
            box_labels = cp_bboxes_transformed_np[keep_list]
            labels = np.hstack((box_labels, cls_labels))
            origin_labels = np.vstack((origin_labels, labels))
            origin_img = origin_img.astype(np.float32)
            origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype(np.float32)

        return origin_img.astype(np.uint8), origin_labels

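Overall this is fairly easy to follow, because the coordinate transforms work the same way as in mosaic, and coordinate transforms are the most painful part.

First a non-background image (one guaranteed to have bboxes) is picked at random, resized to fit the input size, and pasted onto a canvas of input size (for example 640*640). The whole canvas is then scaled by jit_factor. The larger extent of the original image and the new image on each axis gives a big canvas, a random crop offset is drawn inside that big canvas, the crop is taken, and once the boxes pass the check the two images are mixed.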
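Two of the steps above are easy to get wrong: mirroring the boxes when the image is flipped, and blending the two images. The sketch below isolates both on toy data. `flip_boxes_horizontally` and `blend` are hypothetical helper names introduced for this illustration; the expressions inside them follow the corresponding lines of `mixup`.

```python
import numpy as np


def flip_boxes_horizontally(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes inside an image of the given width."""
    flipped = boxes.copy()
    # New x1 is width - old x2, and new x2 is width - old x1.
    flipped[:, 0::2] = width - boxes[:, 0::2][:, ::-1]
    return flipped


def blend(origin_img, other_img):
    """Average two uint8 images in float32 so the sum cannot overflow."""
    mixed = 0.5 * origin_img.astype(np.float32) + 0.5 * other_img.astype(np.float32)
    return mixed.astype(np.uint8)


boxes = np.array([[10.0, 5.0, 30.0, 25.0]])
flipped = flip_boxes_horizontally(boxes, width=100)

a = np.full((2, 2, 3), 200, dtype=np.uint8)
b = np.full((2, 2, 3), 100, dtype=np.uint8)
mixed = blend(a, b)
```

Reversing the two x columns before subtracting keeps `x1 < x2` after the flip. Adding the uint8 arrays directly would wrap around once a sum exceeds 255, which is why both images are converted to float32 before averaging.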

[Figure: rough flow of mixup; the step of finding the largest canvas is omitted]

Next, the check function box_candidates:

def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.2):
    # box1(4,n), box2(4,n)
    # Compute candidate boxes which include following 5 things:
    # box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + 1e-16), h2 / (w2 + 1e-16))  # aspect ratio
    return (
        (w2 > wh_thr)
        & (h2 > wh_thr)
        & (w2 * h2 / (w1 * h1 + 1e-16) > area_thr)
        & (ar < ar_thr)
    )  # candidates

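It simply compares each box after the offset with the same box before it. The four criteria are the width and height of the shifted box, its area relative to the original box, and its aspect ratio.

The comment mentions five things, but only four checks are implemented.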
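A quick worked example shows how the four checks behave. The function body is copied from the listing above so the block runs on its own; the boxes are made-up values, and `wh_thr=5` matches the value passed in the `mixup` call.

```python
import numpy as np


def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.2):
    # box1 and box2 have shape (4, n): rows are x1, y1, x2, y2.
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + 1e-16), h2 / (w2 + 1e-16))  # aspect ratio
    return (
        (w2 > wh_thr)
        & (h2 > wh_thr)
        & (w2 * h2 / (w1 * h1 + 1e-16) > area_thr)
        & (ar < ar_thr)
    )


before = np.array([[0, 0, 100, 100],
                   [0, 0, 100, 100],
                   [0, 0, 100, 100]], dtype=float)
after = np.array([[0, 0, 90, 80],    # mostly intact after the crop
                  [0, 0, 100, 4],    # height 4 is not above wh_thr=5
                  [0, 0, 30, 30]],   # keeps only 9% of the original area
                 dtype=float)

keep = box_candidates(before.T, after.T, 5)
```

Only the first box survives: the second is too thin, and the third has lost too much of its area, so keeping it would label a fragment of the object.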

[Figure: final result; the two in the middle are mix up]

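My own code

YOLOX and similar projects certainly use all sorts of things to speed up the dataloader, such as the pycoco class wrappers (I do not really understand that package) and preloading, and I cannot read through all of it in a short time. So I stripped them out. The loader is probably not that efficient; I will add them back once I become an expert.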

# -*- coding:utf-8 -*-
# @Author : Dummerfu
# @Contact : github/dummerchen
# @Time : 2021/9/25 14:06
import math
import random

import cv2
import matplotlib as mpl
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import Dataset

from draw_box_utli import draw_box
from VocDataset import VocDataSet


def get_mosaic_coordinate(mosaic_image, mosaic_index, xc, yc, w, h, input_h, input_w):
    # TODO update doc
    # index0 to top left part of image
    if mosaic_index == 0:
        x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
        small_coord = w - (x2 - x1), h - (y2 - y1), w, h
    # index1 to top right part of image
    elif mosaic_index == 1:
        x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc
        small_coord = 0, h - (y2 - y1), min(w, x2 - x1), h
    # index2 to bottom left part of image
    elif mosaic_index == 2:
        x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h)
        small_coord = w - (x2 - x1), 0, w, min(y2 - y1, h)
    # index3 to bottom right part of image
    elif mosaic_index == 3:
        x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, yc + h)  # noqa
        small_coord = 0, 0, min(w, x2 - x1), min(y2 - y1, h)
    return (x1, y1, x2, y2), small_coord

def random_perspective(
    img,
    targets=(),
    degrees=10,
    translate=0.1,
    scale=0.1,  # callers pass a (min, max) pair; the scalar default would fail at scale[0]
    shear=10,
    perspective=0.0,
    border=(0, 0),
):
    # targets = [cls, xyxy]
    height = img.shape[0] + border[0] * 2  # shape(h,w,c)
    width = img.shape[1] + border[1] * 2

    # Center
    C = np.eye(3)
    C[0, 2] = -img.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -img.shape[0] / 2  # y translation (pixels)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    # a += random.choice([-180, -90, 0, 90]) # add 90deg rotations to small rotations
    s = random.uniform(scale[0], scale[1])
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)

    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)
    T[0, 2] = (
        random.uniform(0.5 - translate, 0.5 + translate) * width
    )  # x translation (pixels)
    T[1, 2] = (
        random.uniform(0.5 - translate, 0.5 + translate) * height
    )  # y translation (pixels)

    # Combined rotation matrix
    M = T @ S @ R @ C  # order of operations (right to left) is IMPORTANT

    ###########################
    # For Aug out of Mosaic
    # s = 1.
    # M = np.eye(3)
    ###########################

    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            img = cv2.warpPerspective(
