Training YOLOv3 on your own data (COCO data preparation)

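With the GPU build configured, the next step is preparing your own data.

Our dataset is in COCO format. Most other tutorials cover the VOC format, so this post records how to prepare data that is in COCO format.

Because YOLOv3 ultimately needs each image's box positions and label information stored in txt files, the first job is to parse the COCO-format JSON file. (COCO object-detection annotations are stored as JSON, which is essentially a dictionary structure built from nested dictionaries and lists. The COCO category ids map to class names: there are 80 classes in total, but the ids run up to 90.)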
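Since the category ids are not contiguous (80 classes, ids up to 90), while darknet-style label files generally expect class indices running from 0 to N-1, it may help to build a lookup table before writing any labels. The following is a minimal sketch, assuming the usual COCO layout with a top-level `categories` list; `build_class_index` is a hypothetical helper name and the small dictionary is made-up sample data, not taken from a real annotation file.

```python
def build_class_index(coco):
    """Map each COCO category id to a contiguous 0-based class index."""
    ids = sorted(cat["id"] for cat in coco["categories"])
    return {cat_id: idx for idx, cat_id in enumerate(ids)}


# Made-up sample in the COCO layout (illustration only).
sample = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 3, "name": "car"},
        {"id": 90, "name": "toothbrush"},
    ]
}

class_index = build_class_index(sample)
print(class_index)
```

When writing a label line, `class_index[ann["category_id"]]` could then be used in place of the raw `category_id`.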

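1. Check the class and bbox information in the original dataset

Take a look at how the COCO JSON is laid out. Since we are only doing object detection, we only need to extract the bbox. The next question is what the four bbox values actually mean. After checking, they are the coordinates of the top-left corner followed by w and h.

Taking the 4 bboxes of this image, I drew the rectangles as follows (my laptop doesn't have cv2 installed, so I drew them with plt; the code is below):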
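Besides plotting, the top-left + w, h reading can be checked numerically: under that interpretation, `x + w` and `y + h` should stay within the image size for essentially every annotation. A small sketch of that check follows; the helper names are hypothetical, and real annotations may overshoot the border by a fraction of a pixel, so treat a few failures as noise rather than proof.

```python
def bbox_to_corners(bbox):
    """Convert a COCO bbox [x, y, w, h] to corner form (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h


def bbox_inside_image(bbox, img_w, img_h):
    """Return True if the box, read as top-left + size, fits in the image."""
    x1, y1, x2, y2 = bbox_to_corners(bbox)
    return 0 <= x1 <= x2 <= img_w and 0 <= y1 <= y2 <= img_h


print(bbox_to_corners([10, 20, 30, 40]))
print(bbox_inside_image([10, 20, 30, 40], 640, 480))
print(bbox_inside_image([600, 400, 100, 100], 640, 480))
```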

import json
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

with open('instances_train2018.json', 'r') as f:  # COCO-format annotation file
    data = json.load(f)
image = mpimg.imread('/Users/zhangzhenghao/Desktop/000001.jpg')
img_id = 2018006000001  # the image id I used for testing

fig = plt.figure()
ax = fig.add_subplot(111)
ax.imshow(image)
for ann in data['annotations']:
    if ann['image_id'] == img_id:
        # bbox is [x, y, w, h], with (x, y) the top-left corner
        x, y, w, h = (int(v) for v in ann['bbox'])
        print(x, y, w, h)
        # draw only the outline so the image underneath stays visible
        rect = plt.Rectangle((x, y), w, h, fill=False, edgecolor='red')
        ax.add_patch(rect)
plt.show()

Now that we know what the bbox values mean, the next step is to convert the annotation information into txt files.

2. Parsing the JSON

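The final label file must have the same name as its image, with a .txt extension, and each line holds five values:

class id, normalized x, y (the coordinates of the box center), w, h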
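To make the five-value format concrete, here is a sketch of the reverse mapping, from the four normalized numbers on a label line back to a pixel box in COCO form. It is only a sanity-check aid; `yolo_to_coco` is a hypothetical name and the numbers are made up.

```python
def yolo_to_coco(size, label):
    """Turn normalized (x_center, y_center, w, h) into a pixel [x, y, w, h] box.

    size  -- (image width, image height) in pixels
    label -- the four normalized values that follow the class id
    """
    img_w, img_h = size
    xc, yc, w, h = label
    box_w = w * img_w
    box_h = h * img_h
    x = xc * img_w - box_w / 2.0  # back from the center to the left edge
    y = yc * img_h - box_h / 2.0  # back from the center to the top edge
    return (x, y, box_w, box_h)


pixel_box = yolo_to_coco((640, 480), (0.25, 0.5, 0.25, 0.5))
print(pixel_box)
```

Running a few generated label lines through such a function and comparing against the original bbox is a cheap way to catch a swapped width and height.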

Below is my Python code that parses the COCO JSON and generates the corresponding txt files:

from __future__ import print_function
import os
import json

def convert(size, box):
    # size: (image width, image height); box: COCO bbox [x, y, w, h]
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = box[0] + box[2] / 2.0  # x of the box center
    y = box[1] + box[3] / 2.0  # y of the box center
    w = box[2]
    h = box[3]
    # normalize by the image size
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

json_file = 'pascal_train2012_cococate.json'  # Object Instance annotations
data = json.load(open(json_file, 'r'))
ana_txt_save_path = "/Users/zhangzhenghao/Desktop/dataset/new"  # output directory
if not os.path.exists(ana_txt_save_path):
    os.makedirs(ana_txt_save_path)

for img in data['images']:
    filename = img["file_name"]
    img_width = img["width"]
    img_height = img["height"]
    img_id = img["id"]
    ana_txt_name = filename.split(".")[0] + ".txt"  # txt name matches the jpg name
    print(ana_txt_name)
    f_txt = open(os.path.join(ana_txt_save_path, ana_txt_name), 'w')
    for ann in data['annotations']:
        if ann['image_id'] == img_id:
            box = convert((img_width, img_height), ann["bbox"])
            f_txt.write("%s %s %s %s %s\n" % (ann["category_id"], box[0], box[1], box[2], box[3]))
    f_txt.close()
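The nested loop above scans every annotation once per image, which can get slow on a large dataset. One possible alternative is to group the annotations by `image_id` in a single pass and then look them up per image. This is a sketch of that idea, not the code used in this post; `group_annotations` is a hypothetical helper and the sample dictionary is made up.

```python
from collections import defaultdict


def group_annotations(coco):
    """Return a dict mapping image_id to the list of its annotations."""
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return by_image


# Made-up sample in the COCO layout (illustration only).
sample = {
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"image_id": 1, "category_id": 3, "bbox": [10, 20, 30, 40]},
        {"image_id": 1, "category_id": 1, "bbox": [50, 60, 70, 80]},
    ],
}

grouped = group_annotations(sample)
for img in sample["images"]:
    # .get() avoids creating empty entries for images without annotations
    anns = grouped.get(img["id"], [])
    print(img["file_name"], len(anns))
```

Inside the per-image loop, `grouped.get(img_id, [])` would replace the inner `for ann in data['annotations']` scan.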

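The result is one txt file per image, with one line per annotation.

3. Generating the absolute paths of the images

YOLO requires the absolute paths of all images to be listed in a single txt file, so the next step is to write the absolute paths of the training images into a txt file.

Part of our data is in .tiff format. I wasn't sure whether it could be used, since the official demos all use jpg files, so I converted it. The code is as follows: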

import os
from PIL import Image

yourpath = '/media/pengjk/30213d25-fae8-4100-9d8b-9aed2bb5a8df/myimages'
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        if os.path.splitext(os.path.join(root, name))[1].lower() == ".tiff":
            if os.path.isfile(os.path.splitext(os.path.join(root, name))[0] + ".jpg"):
                print("A jpeg file already exists for %s" % name)
            # If a jpeg is *NOT* present, create one from the tiff.
            else:
                outfile = os.path.splitext(os.path.join(root, name))[0] + ".jpg"
                try:
                    im = Image.open(os.path.join(root, name))
                    print("Generating jpeg for %s" % name)
                    im.thumbnail(im.size)
                    im.save(outfile, "JPEG", quality=100)
                except Exception as e:
                    print(e)

Next is the code that generates the absolute paths:

# -*- coding: utf-8 -*-
import os

def readFilename(path, allfile):
    # recursively collect the path of every file under `path`
    filelist = os.listdir(path)
    for filename in filelist:
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            readFilename(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile

if __name__ == '__main__':
    path1 = "/Users/zhangzhenghao/Desktop/test/val_coco"  # image folder
    allfile1 = []
    allfile1 = readFilename(path1, allfile1)
    txtpath = "/Users/zhangzhenghao/Desktop/test/val_coco" + ""  # txt file to write into; append the file name here
    for name in allfile1:
        file_cls = name.split("/")[-1].split(".")[-1]  # file extension
        if file_cls == 'txt':  # extension to list; use 'jpg' to list the images
            with open(txtpath, 'a+') as fp:
                fp.write("".join(name) + "\n")

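And that's it.

There was one small hiccup along the way: our dataset has roughly 180,000 training images, but only a little over 89,000 .txt files were actually generated, so the surplus images had to be filtered out. I used a fairly clumsy method and the filtering took more than an hour, so go easy on the code. (It uses Python's shutil package, which can carry out file operations such as copying directly; for convenience I copied these 89,000-odd images into a new folder.)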
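For comparison, the same listing can be written more compactly with `pathlib`. This is a sketch under the assumption that one path per line is all the list file needs; `list_files` and `write_list` are hypothetical names, and the demo runs against a throwaway temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path


def list_files(root, suffix):
    """Return sorted absolute paths of files under root with the given suffix."""
    root = Path(root).resolve()
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


def write_list(paths, out_file):
    """Write one path per line, overwriting any earlier list."""
    with open(out_file, "w") as fp:
        for p in paths:
            fp.write(p + "\n")


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.jpg").touch()
    (Path(tmp) / "a.txt").touch()
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.JPG").touch()
    found = list_files(tmp, ".jpg")
    write_list(found, Path(tmp) / "list.txt")
    written = (Path(tmp) / "list.txt").read_text().splitlines()

print(len(found))
```

Opening the list file with `"w"` instead of `'a+'` means a second run replaces the list rather than appending duplicate entries.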

# -*- coding: utf-8 -*-
import os
import shutil

def readFilename(path, allfile):
    # recursively collect the path of every file under `path`
    filelist = os.listdir(path)
    for filename in filelist:
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            readFilename(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile

if __name__ == '__main__':
    path1 = "/Users/zhangzhenghao/Desktop/test/val_data"
    new_path = "/Users/zhangzhenghao/Desktop/test/new"
    allfile1 = []
    allfile1 = readFilename(path1, allfile1)
    for name in allfile1:
        file_cls = name.split("/")[-1].split(".")[-1]  # file extension
        file_nas = name.split("/")[-1].split(".")[0]   # file name without extension
        if file_cls == 'jpg':
            # txt file listing the label paths; its file name is missing in the source
            label = open('/Users/zhangzhenghao/Desktop/test/', 'r')
            for line in label:
                label_nas = line.split("/")[-1].split(".")[0]
                if file_nas == label_nas:
                    # The source is cut off here. Assumption, based on the
                    # description above: copy the matched image to new_path.
                    shutil.copy(name, new_path)
            label.close()

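Then run the path-generation code once more and the images and labels match one to one.

The last step is to put the corresponding jpg and txt files for train, and the jpg and txt files for val, in the same folder. (Some people say they can be kept separately, but YOLO seems to have some problems with that, so to be safe we just put them together.)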
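The filtering step is slow mainly because the label list is re-read for every image. A possible faster variant loads the label names into a set once and tests each image against it. This is a sketch of that idea rather than the code that was actually run; the helper names and sample paths are hypothetical, and it uses `os.path.splitext`, which splits at the last dot, whereas the code above splits at the first dot.

```python
import os


def stem(path):
    """File name without directory or extension, ignoring trailing newlines."""
    return os.path.splitext(os.path.basename(path.strip()))[0]


def images_with_labels(image_paths, label_paths):
    """Keep only the images whose name matches an existing label file."""
    labelled = {stem(p) for p in label_paths}  # built once, O(1) lookups
    return [p for p in image_paths if stem(p) in labelled]


# Made-up paths for illustration.
images = ["/data/000001.jpg", "/data/000002.jpg", "/data/000003.jpg"]
labels = ["/labels/000001.txt\n", "/labels/000003.txt\n"]

kept = images_with_labels(images, labels)
print(kept)
```

Each path in `kept` could then be passed to `shutil.copy` to fill the new folder.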
