Counting the label types and their counts in XML files
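For a dataset that you (or your junior labmates, just kidding) annotated, the annotated XML files should be checked once labeling is finished: for example, to see whether any label name is misspelled and which label names actually occur. The code is below; just set SrcDir to the folder containing the XML files you want to count.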
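The script assumes Pascal VOC-style annotation files, in which every <object> element holds a <name> and a <bndbox> with xmin, ymin, xmax and ymax. As a minimal sketch under that assumption, the snippet below runs the same minidom traversal on a made-up annotation string, so you can see what ReadXml extracts before pointing the script at a real folder. SAMPLE_XML and parse_objects are hypothetical names used only for illustration.

```python
import xml.dom.minidom

# A made-up annotation in the Pascal VOC layout assumed by ReadXml:
# each <object> holds a <name> and a <bndbox> with four corner coordinates.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(xml_text):
    """Return [xmin, ymin, xmax, ymax, name] for every <object> in xml_text."""
    root = xml.dom.minidom.parseString(xml_text).documentElement
    objects = []
    for obj in root.getElementsByTagName("object"):
        # The first <name> under <object> is the object's own label
        name = obj.getElementsByTagName("name")[0].firstChild.data
        box = obj.getElementsByTagName("bndbox")[0]
        # int(float(...)) also accepts coordinates written like "48.0"
        coords = [int(float(box.getElementsByTagName(tag)[0].firstChild.data))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append(coords + [name])
    return objects


if __name__ == "__main__":
    for row in parse_objects(SAMPLE_XML):
        print(row)
```

Run as a script, it prints [48, 240, 195, 371, 'dog'] and then [8, 12, 352, 498, 'person'], which is the same row shape ReadXml appends to its info list.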
import os
import xml.dom.minidom

from tqdm import tqdm


def ReadXml(FilePath):
    """Return [xmin, ymin, xmax, ymax, name] for every <object> in a VOC-style XML file."""
    if not os.path.exists(FilePath):
        return None
    dom = xml.dom.minidom.parse(FilePath)
    root_ = dom.documentElement
    object_ = root_.getElementsByTagName('object')
    info = []
    for object_1 in object_:
        # Label name and bounding box of this annotated object
        name = object_1.getElementsByTagName("name")[0].firstChild.data
        bndbox = object_1.getElementsByTagName("bndbox")[0]
        # int(float(...)) also accepts coordinates written like "12.0"
        xmin = int(float(bndbox.getElementsByTagName("xmin")[0].firstChild.data))
        ymin = int(float(bndbox.getElementsByTagName("ymin")[0].firstChild.data))
        xmax = int(float(bndbox.getElementsByTagName("xmax")[0].firstChild.data))
        ymax = int(float(bndbox.getElementsByTagName("ymax")[0].firstChild.data))
        info.append([xmin, ymin, xmax, ymax, name])
    return info


def CountLabelKind(Path):
    """Walk Path recursively and count how often each label name occurs."""
    LabelDict = {}
    print("Start to count")
    for root, dirs, files in os.walk(Path):
        for file in tqdm(files):
            # Only process .xml files
            if file.lower().endswith('.xml'):
                # os.path.join works on Windows and Linux alike
                Infos = ReadXml(os.path.join(root, file))
                if Infos is None:
                    continue
                for Info in Infos:
                    # Info[-1] is the label name
                    LabelDict[Info[-1]] = LabelDict.get(Info[-1], 0) + 1
    return dict(sorted(LabelDict.items(), key=lambda x: x[0]))


if __name__ == '__main__':
    SrcDir = r"G:\Temp\Temp5"  # folder containing the XML files to count
    LabelDict = CountLabelKind(SrcDir)
    KeyDict = sorted(LabelDict)
    print("%d kinds of labels and %d labels in total:" % (len(KeyDict), sum(LabelDict.values())))
    print(KeyDict)
    print("Label name and its count:")
    for key in KeyDict:
        print("%s\t: %d" % (key, LabelDict[key]))
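Since the point of the check is to catch misspelled label names, it can help to compare the counted names against the class list you meant to use. The sketch below is one way to do that with the standard library only; it swaps minidom for xml.etree.ElementTree and the hand-written dictionary for collections.Counter. count_labels, find_unexpected and EXPECTED are hypothetical names, and the class list is a placeholder to replace with your own. Unlike getElementsByTagName, findall("object") only looks at direct children of the root element.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def count_labels(src_dir):
    """Count <object>/<name> values across all .xml files under src_dir."""
    counts = Counter()
    for root, _dirs, files in os.walk(src_dir):
        for file in files:
            if not file.lower().endswith(".xml"):
                continue
            tree = ET.parse(os.path.join(root, file))
            # findall("object") only matches direct children of <annotation>
            for obj in tree.getroot().findall("object"):
                name = obj.findtext("name")
                if name is not None:
                    # Names are kept verbatim so stray spaces show up as separate labels
                    counts[name] += 1
    return counts


def find_unexpected(counts, expected):
    """Return label names that are not in the expected set (likely typos)."""
    return sorted(set(counts) - set(expected))


if __name__ == "__main__":
    SRC_DIR = r"G:\Temp\Temp5"       # folder with the XML files, as in SrcDir
    EXPECTED = {"dog", "person"}     # hypothetical class list
    counts = count_labels(SRC_DIR)
    print("%d kinds of labels and %d labels in total" % (len(counts), sum(counts.values())))
    for name in sorted(counts):
        print("%s\t: %d" % (name, counts[name]))
    print("Unexpected label names:", find_unexpected(counts, EXPECTED))
```

Any name listed under "Unexpected label names" is either a typo in the annotations or a class you forgot to add to EXPECTED.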
The counting result is shown below: