...CVPR2009 - Learning to detect unseen object classes--688IT编程网

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Christoph H. Lampert, Hannes Nickisch, Stefan Harmeling

Max Planck Institute for Biological Cybernetics, Tübingen, Germany

{firstname.lastname}@tuebingen.mpg.de

Abstract

We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and for only a very few of them image collections have been formed and annotated with suitable class labels.

In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson’s classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.

1. Introduction

Learning-based methods for recognizing objects in natural images have made large progress over the last years. For specific object classes, in particular faces and vehicles, reliable and efficient detectors are available, based on the combination of powerful low-level features, e.g. SIFT or HoG, with modern machine learning techniques, e.g. boosting or support vector machines. However, in order to achieve good classification accuracy, these systems require a lot of manually labeled training data, typically hundreds or thousands of example images for each class to be learned.

It has been estimated that humans distinguish between at least 30,000 relevant object classes [3]. Training conventional object detectors for all these would require millions of well-labeled training images and is likely out of reach for years to come. Therefore, numerous techniques for reducing the number of necessary training images have been developed, some of which we will discuss in Section 3. However, all of these techniques still require at least some labeled training examples to detect future object instances.

otter: black: yes, white: no, brown: yes, stripes: no, water: yes, eats fish: yes
polar bear: black: no, white: yes, brown: no, stripes: no, water: yes, eats fish: yes
zebra: black: yes, white: yes, brown: no, stripes: yes, water: no, eats fish:

Figure 1. A description by high-level attributes allows the transfer of knowledge between object categories: after learning the visual appearance of attributes from any classes with training examples, we can detect also object classes that do not have any training images, based on which attribute description a test image fits best.

Human learning is different: although humans can learn and abstract well from examples, they are also capable of detecting completely unseen classes when provided with a high-level description. E.g., from the phrase “eight-sided red traffic sign with white writing”, we will be able to detect stop signs, and when looking for “large gray animals with long trunks”, we will reliably identify elephants. We build on this paradigm and propose a system that is able to detect objects from a list of high-level attributes. The attributes serve as an intermediate layer in a classifier cascade and they enable the system to detect object classes for which it had not seen a single training example.

Clearly, a large number of possible attributes exist and collecting separate training material to learn an ordinary classifier for each of them would be as tedious as for all object classes. But, instead of creating a separate training set for each attribute, we can exploit the fact that meaningful high-level concepts transcend class boundaries. To learn such attributes, we can therefore make use of existing training data by merging images of several object classes. To learn, e.g., the attribute striped, we can use images of zebras, bees and tigers. For the attribute yellow, zebras would not be included, but bees and tigers would still prove useful, possibly together with canary birds. It is this possibility to obtain knowledge about attributes from different object classes, and, vice versa, the fact that each attribute can be used for the detection of many object classes that makes our proposed learning method statistically efficient.
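The merging of class-labeled images into attribute training material can be illustrated with a small Python sketch. It is only a toy example under stated assumptions: the dictionary `CLASS_ATTRIBUTES`, the function `positive_images` and the image identifiers are hypothetical names introduced here, and the negative entries of the table are assumed for illustration.

```python
# Hypothetical per-class attribute annotations; the classes and attributes
# follow the examples in the text, the False entries are assumptions.
CLASS_ATTRIBUTES = {
    "zebra": {"striped": True, "yellow": False},
    "bee": {"striped": True, "yellow": True},
    "tiger": {"striped": True, "yellow": True},
    "canary": {"striped": False, "yellow": True},
}


def positive_images(samples, attribute):
    """Collect the images whose class is annotated with the given attribute.

    samples is a list of (image_id, class_name) pairs.
    """
    return [image_id for image_id, class_name in samples
            if CLASS_ATTRIBUTES[class_name][attribute]]


samples = [("img0", "zebra"), ("img1", "bee"), ("img2", "tiger"), ("img3", "canary")]
striped_images = positive_images(samples, "striped")  # zebra, bee and tiger images
yellow_images = positive_images(samples, "yellow")    # bee, tiger and canary images
```

The same class-labeled images are reused for every attribute, so no attribute needs its own image collection.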

2. Information Transfer by Attribute Sharing

We begin by formalizing the problem setting and our intuition from the previous section that the use of attributes allows us to transfer information between object classes. We first define the problem of our interest:

Learning with Disjoint Training and Test Classes:

Let $(x_1, l_1), \dots, (x_n, l_n) \subset X \times Y$ be training samples where $X$ is an arbitrary feature space and $Y = \{y_1, \dots, y_K\}$ consists of $K$ discrete classes. The task is to learn a classifier $f: X \to Z$ for a label set $Z = \{z_1, \dots, z_L\}$ that is disjoint from $Y$.¹

Clearly, this task cannot be solved by an ordinary multi-class classifier. Figure 2(a) provides a graphical illustration of the problem: typical classifiers learn one parameter vector (or other representation) $\alpha_k$ for each training class $y_1, \dots, y_K$. Because the classes $z_1, \dots, z_L$ were not present during the training step, no parameter vector can be derived for them, and it is impossible to make predictions about these classes for future samples.

In order to make predictions about classes for which no training data is available, we need to introduce a coupling between classes in $Y$ and $Z$. Since no training data for the unobserved classes is available, this coupling cannot be learned from samples, but has to be inserted into the system by human effort. This introduces two severe constraints on what kind of coupling mechanisms are feasible: 1) the amount of human effort to specify new classes should be minimal, because otherwise collecting and labeling training samples would be a simpler solution; 2) coupling data that requires only common knowledge is preferable over specialized expert knowledge, because the latter is often difficult and expensive to obtain.

2.1. Attribute-Based Classification

We achieve both goals by introducing a small set of high-level semantic per-class attributes. These can be, e.g., color and shape for arbitrary objects, or the natural habitat for animals. Humans are typically able to provide good prior knowledge about such attributes, and it is therefore possible to collect the necessary information without a lot of overhead. Because the attributes are assigned on a per-class basis instead of a per-image basis, the manual effort to add a new object class is kept minimal.

¹ The condition that $Y$ and $Z$ are disjoint is included only to clarify the later presentation. The problem described also occurs if just $Z \not\subseteq Y$.

For the situation where attribute data of this kind is available, we introduce attribute-based classification:

Attribute-Based Classification:

Given the situation of learning with disjoint training and test classes. If for each class $z \in Z$ and $y \in Y$ an attribute representation $a \in A$ is available, then we can learn a non-trivial classifier $\alpha: X \to Z$ by transferring information between $Y$ and $Z$ through $A$.

In the rest of this paper, we will demonstrate that attribute-based classification is indeed a solution to the problem of learning with disjoint training and test classes, and how it can be practically used for object classification. For this, we introduce and compare two generic methods to integrate attributes into multi-class classification:

Direct attribute prediction (DAP), illustrated by Figure 2(b), uses an in-between layer of attribute variables to decouple the images from the layer of labels. During training, the output class label of each sample induces a deterministic labeling of the attribute layer. Consequently, any supervised learning method can be used to learn per-attribute parameters $\beta_m$. At test time, these allow the prediction of attribute values for each test sample, from which the test class label is inferred. Note that the classes during testing can differ from the classes used for training, as long as the coupling attribute layer is determined in a way that does not require a training phase.

Indirect attribute prediction (IAP), depicted in Figure 2(c), also uses the attributes to transfer knowledge between classes, but the attributes form a connecting layer between two layers of labels, one for classes that are known at training time and one for classes that are not. The training phase of IAP is ordinary multi-class classification. At test time, the predictions for all training classes induce a labeling of the attribute layer, from which a labeling over the test classes can be inferred.

The major difference between both approaches lies in the relationship between training classes and test classes. Directly learning the attributes results in a network where all classes are treated equally. When class labels are inferred at test time, the decisions for all classes are based only on the attribute layer. We can therefore expect it to also handle the situation where training and test classes are not disjoint. In contrast, when predicting the attribute values indirectly, the training classes also occur at test time as an intermediate feature layer. On the one hand, this can introduce a bias, if training classes are also potential output classes during testing. On the other hand, one can argue that deriving the attribute layer from the label layer instead of from the samples will act as a regularization step that creates only sensible attribute combinations and therefore makes the system more robust. In the following, we will develop implementations for both methods and benchmark their performance.

Figure 2. Graphical representation of the proposed across-class learning task: dark gray nodes are always observed, light gray nodes are observed only during training. White nodes are never observed but must be inferred. An ordinary, flat, multi-class classifier (left) learns one parameter $\alpha_k$ for each training class. It cannot generalize to classes $(z_l)_{l=1,\dots,L}$ that are not part of the training set. In an attribute-based classifier (middle) with fixed class–attribute relations (thick lines), training labels $(y_k)_{k=1,\dots,K}$ imply training values for the attributes $(a_m)_{m=1,\dots,M}$, from which parameters $\beta_m$ are learned. At test time, attribute values can directly be inferred, and these imply output class labels even for previously unseen classes. A multi-class based attribute classifier (right) combines both ideas: multi-class parameters $\alpha_k$ are learned for each training class. At test time, the posterior distribution of the training class labels induces a distribution over the labels of unseen classes by means of the class–attribute relationship.

2.2. Implementation

Both cascaded classification methods, DAP and IAP, can in principle be implemented by combining a supervised classifier or regressor for the image–attribute or image–class prediction with a parameter-free inference method to channel the information through the attribute layer. In the following, we use a probabilistic model that reflects the graphical structures of Figures 2(b) and 2(c). For simplicity, we assume that all attributes have binary values such that the attribute representation $a^y = (a^y_1, \dots, a^y_M)$ for any training class $y$ is a fixed-length binary vector. Continuous attributes can in principle be handled in the same way by using regression instead of classification.

For DAP, we start by learning probabilistic classifiers for each attribute $a_m$. We use all images from all training classes as training samples with their label determined by the entry of the attribute vector corresponding to the sample’s label, i.e. a sample of class $y$ is assigned the binary label $a^y_m$. The trained classifiers provide us with estimates of $p(a_m \mid x)$, from which we form a model for the complete image–attribute layer as $p(a \mid x) = \prod_{m=1}^{M} p(a_m \mid x)$. At test time, we assume that every class $z$ induces its attribute vector $a^z$ in a deterministic way, i.e. $p(a \mid z) = [\![a = a^z]\!]$, making use of Iverson’s bracket notation: $[\![P]\!] = 1$ if the condition $P$ is true and it is $0$ otherwise [19]. Applying Bayes’ rule we obtain $p(z \mid a) = \frac{p(z)}{p(a^z)} [\![a = a^z]\!]$ as representation of the attribute–class layer. Combining both layers, we can calculate the posterior of a test class given an image:

$$p(z \mid x) = \sum_{a \in \{0,1\}^M} p(z \mid a)\, p(a \mid x) = \frac{p(z)}{p(a^z)} \prod_{m=1}^{M} p(a^z_m \mid x). \tag{1}$$

In the absence of more specific knowledge, we assume identical class priors, which allows us to ignore the factor $p(z)$ in the following. For the factor $p(a)$ we assume a factorial distribution $p(a) = \prod_{m=1}^{M} p(a_m)$, using the empirical means $p(a_m) = \frac{1}{K} \sum_{k=1}^{K} a^{y_k}_m$ over the training classes as attribute priors.² As decision rule $f: X \to Z$ that assigns the best output class from all test classes $z_1, \dots, z_L$ to a test sample $x$, we use MAP prediction:

$$f(x) = \operatorname*{argmax}_{l=1,\dots,L} \prod_{m=1}^{M} \frac{p(a^{z_l}_m \mid x)}{p(a^{z_l}_m)}. \tag{2}$$
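The decision rule in Equation (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`attribute_priors`, `dap_predict`), the evaluation in log-space and the clipping constant `eps` are choices made here and are not part of the method as described. The per-attribute probabilities are assumed to come from already trained attribute classifiers.

```python
import math


def attribute_priors(train_class_attributes):
    """Empirical mean of each binary attribute over the K training classes."""
    num_classes = len(train_class_attributes)
    num_attributes = len(train_class_attributes[0])
    return [sum(row[m] for row in train_class_attributes) / num_classes
            for m in range(num_attributes)]


def _clip(p, eps=1e-12):
    # Assumption: keep probabilities away from 0 and 1 so the logarithm is finite.
    return min(max(p, eps), 1.0 - eps)


def dap_predict(p_attr_given_x, test_class_attributes, priors):
    """Index l of the test class z_l that maximises Eq. (2).

    p_attr_given_x[m] is an estimate of p(a_m = 1 | x) and priors[m] of
    p(a_m = 1). The product is evaluated as a sum of logarithms.
    """
    best_index, best_score = -1, -math.inf
    for index, class_attributes in enumerate(test_class_attributes):
        score = 0.0
        for p_one, prior_one, bit in zip(p_attr_given_x, priors, class_attributes):
            # Probability of the value (0 or 1) that the class prescribes.
            posterior = p_one if bit == 1 else 1.0 - p_one
            prior = prior_one if bit == 1 else 1.0 - prior_one
            score += math.log(_clip(posterior)) - math.log(_clip(prior))
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy example with M = 3 attributes, K = 4 training and L = 2 test classes.
train_attributes = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
test_attributes = [[1, 1, 1], [0, 0, 0]]
priors = attribute_priors(train_attributes)
predicted = dap_predict([0.9, 0.8, 0.7], test_attributes, priors)
```

In the toy example every attribute is present in half of the training classes, so the priors cancel and the class whose attribute vector agrees best with the predicted attributes is selected.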

In order to implement IAP, we only modify the image–attribute stage: as first step, we learn a probabilistic multi-class classifier estimating $p(y_k \mid x)$ for all training classes $y_1, \dots, y_K$. Again assuming a deterministic dependence between attributes and classes, we set $p(a_m \mid y) = [\![a_m = a^y_m]\!]$. The combination of both steps yields

$$p(a_m \mid x) = \sum_{k=1}^{K} p(a_m \mid y_k)\, p(y_k \mid x), \tag{3}$$

so inferring the attribute posterior probabilities $p(a_m \mid x)$ requires only a matrix-vector multiplication. Afterwards, we continue in the same way as for DAP, classifying test samples using Equation (2).

² In practice, the prior $p(a)$ is not crucial to the procedure and setting $p(a_m) = \frac{1}{2}$ yields comparable results.
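The matrix-vector multiplication behind Equation (3) can be written out as a short Python sketch. The function name `iap_attribute_posteriors` and the toy numbers are assumptions made for illustration. The class posteriors are assumed to be the output of an already trained multi-class classifier.

```python
def iap_attribute_posteriors(p_class_given_x, train_class_attributes):
    """Eq. (3) for a_m = 1: p(a_m = 1 | x) = sum_k a_m^{y_k} * p(y_k | x).

    train_class_attributes is the K x M binary class-attribute matrix and
    p_class_given_x holds the K class posteriors p(y_k | x).
    """
    num_attributes = len(train_class_attributes[0])
    return [sum(row[m] * p_k for row, p_k in zip(train_class_attributes, p_class_given_x))
            for m in range(num_attributes)]


# Toy example: K = 3 training classes, M = 2 attributes.
train_attributes = [[1, 0], [1, 1], [0, 1]]
class_posteriors = [0.5, 0.25, 0.25]
attribute_posteriors = iap_attribute_posteriors(class_posteriors, train_attributes)
```

Because $p(a_m \mid y_k)$ is an indicator, the sum for $a_m = 1$ simply adds up the posteriors of those training classes that possess attribute $m$. The resulting values can then be passed to the same decision rule as in DAP.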

3. Connections to Previous Work

Multi-layer or cascaded classifiers have a long tradition in pattern recognition and computer vision: multi-layer perceptrons [29], decision trees [5], mixtures of experts [17] and boosting [14] are prominent examples of classification systems built as feed-forward architectures with several stages. Multi-class classifiers are also often constructed as layers of binary decisions, from which the final output is inferred, e.g. [7, 28]. These methods differ in their training methodologies, but they share the goal of decomposing a difficult classification problem into a collection of simpler ones. Because their emphasis lies on the classification performance in a fully supervised scenario, the methods are not capable of generalizing across class boundaries.

Especially in the area of computer vision, multi-layered classification systems have been constructed, in which intermediate layers have interpretable properties: artificial neural networks or deep belief networks have been shown to learn interpretable filters, but these are typically restricted to low-level properties like edge and corner detectors [27]. Popular local feature descriptors, such as SIFT [21] or HoG [6], can be seen as hand-crafted stages in a feed-forward architecture that transform an image from the pixel domain into a representation invariant to non-informative image variations. Similarly, image segmentation has been proposed as an unsupervised method to extract contours that are discriminative for object classes [37]. Such preprocessing steps are generic in the sense that they still allow the subsequent detection of arbitrary object classes. However, the basic elements, local image descriptors or segment shapes, alone are not reliable enough indicators of generic visual object classes, unless they are used as input to a subsequent statistical learning step.

On a higher level, pictorial structures [13], the constellation model [10] and recent discriminatively trained deformable part models [9] are examples of the many methods that recognize objects in images by detecting discriminative parts. In principle, humans can give descriptions of object classes in terms of such parts, e.g. arms or wheels. However, it is a difficult problem to build a system that learns to detect exactly the parts described. Instead, the identification of parts is integrated into the training of the model, which often reduces the parts to co-occurrence patterns of local feature points, not to units with a semantic meaning. In general, parts learned this way do generalize across class boundaries.

3.1. Sharing Information between Classes

The aspect of sharing information between classes has also been recognized as an interesting field before. A common idea is to construct multi-class classifiers in a cascaded way. By making similar classes share large parts of their decision paths, fewer classification functions need to be learned, thereby increasing the system’s performance [26]. Similarly, one can reduce the number of feature calculations by actively selecting low-level features that help discrimination for many classes simultaneously [33]. Combinations of both approaches are also possible [39].

In contrast, inter-class transfer does not aim at higher speed, but at better generalization performance, typically for object classes with only few available training instances. From known object classes, one infers prior distributions over the expected intra-class variance in terms of distortions [22] or shapes and appearances [20]. Alternatively, features that are known to be discriminative for some classes can be reused and adapted to support the detection of new classes [1]. To our knowledge, no previous approach allows the direct incorporation of human prior knowledge. Also, all methods require at least some training examples and cannot handle completely new object classes.

A notable exception is [8], which uses high-level attributes to learn descriptions of objects. Like our approach, this opens the possibility to generalize between categories.

3.2. Learning Semantic Attributes

A different line of relevant research occurring as one building block for attribute-based classification is the learning of high-level semantic attributes from images. Prior work in the area of computer vision has mainly studied elementary properties like colors and geometric patterns [11, 36, 38], achieving high accuracy by developing task-specific features and representations. In the field of multimedia retrieval, the annual TRECVID contest [32] contains a subtask of high-level feature extraction. It has stimulated a lot of research in the detection of semantic concepts, including the categorization of scene types, e.g. outdoor, urban, and high-level actions, e.g. sports. Typical systems in this area combine many feature representations and, because they were designed for retrieval scenarios, they aim at high precision for low recall levels [34, 40].

Our own task of attribute learning targets a similar problem, but our final goal is not the prediction of a few individual attributes. Instead, we want to infer class labels by combining the predictions of many attributes. Therefore, we are relatively robust to prediction errors on the level of individual attributes, and we will rely on generic classifiers and standard image features instead of specialized setups.

In contrast to computer science, a lot of work in cognitive science has been dedicated to studying the relations between object recognition and attributes. Typical questions in the field are how human judgements are influenced by characteristic object attributes [23, 31]. A related line of research studies how the human performance in object detection tasks depends on the presence or absence of object properties and contextual cues [16]. Since one of our goals is to integrate human knowledge into a computer vision task, we would like to benefit from the prior work in this field, at least as a source of high quality data that, so far, cannot be obtained by an automatic process. In the following section, we describe a new dataset of animal images that allows us to make use of existing class-attribute association data, which was collected from cognitive science research.

Figure 3. Class–attribute matrices from [24, 18]. Rows (classes): zebra, giant panda, deer, bobcat, pig, lion, mouse, polar bear, collie, walrus, raccoon, cow, dolphin. Columns (attributes): black, white, blue, brown, gray, orange, red, yellow, patches, spots, stripes, furry, hairless, toughskin, big, small, bulbous, lean, flippers, hands, hooves, pads, paws, longleg, longneck, tail, chewteeth, meatteeth, buckteeth, strainteeth, horns, claws, tusks. The persons’ responses were averaged to determine the real-valued association strength between attributes and classes. The darker the boxes, the less is the attribute associated with the class. Binary attributes are obtained by thresholding at the overall matrix mean.
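The binarization described in the caption of Figure 3 can be sketched as follows. The function name `binarize_by_global_mean` and the numbers are made up for illustration, and the handling of entries that lie exactly on the threshold is an assumption, since the text does not specify it.

```python
def binarize_by_global_mean(strengths):
    """Threshold a real-valued class-attribute matrix at its overall mean."""
    values = [value for row in strengths for value in row]
    threshold = sum(values) / len(values)
    # Assumption: entries strictly above the mean become 1, all others 0.
    return [[1 if value > threshold else 0 for value in row] for row in strengths]


strengths = [[0.0, 10.0], [20.0, 30.0]]  # made-up association strengths
binary = binarize_by_global_mean(strengths)
```

A single global threshold is used here, so classes with generally weak associations may end up with very few active attributes.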

4. The Animals with Attributes Dataset

For their studies on attribute-based object similarity, Osherson and Wilkie [24] collected judgements from human subjects on the “relative strength of association” between 85 attributes and 48 animal classes. Kemp et al. [18] made use of the same data in a machine learning context and added 2 more animal classes. Figure 3 illustrates an excerpt of the resulting 50×85 class-attribute matrix. However, so far this data was not usable in a computer vision context, because the animals and attributes are only specified by their abstract names, not by example images. To overcome this problem, we have collected the Animals with Attributes data.³

4.1. Image Collection

We have collected example images for all 50 Osherson/Kemp animal classes by querying four large internet search engines, Google, Microsoft, Yahoo and Flickr, using the animal names as keywords. The resulting set of over 180,000 images was manually processed to remove outliers and duplicates, and to ensure that the target animal is in a prominent view in all cases. The remaining collection consists of 30475 images with a minimum of 92 images for any class. Figure 1 shows examples of some classes with the values of exemplary attributes assigned to each class. Altogether, animals are uniquely characterized by their attribute vector. Consequently, the Animals with Attributes dataset, formed by combining the collected images with the semantic attribute table, can serve as a testbed for the task of incorporating human knowledge into an object detection system.

³ Available at attributes.kyb.tuebingen.mpg.de

4.2. Feature Representations

Feature extraction is known to have a big influence in computer vision tasks. For most image datasets, e.g. Caltech [15] and PASCAL VOC⁴, it has become difficult to judge the true performance of newly proposed classification methods, because results based on very different feature sets need to be compared. We have therefore decided to include a reference set of pre-extracted features into the Animals with Attributes dataset.

We have selected six different feature types: RGB color histograms, SIFT [21], rgSIFT [35], PHOG [4], SURF [2] and local self-similarity histograms [30]. The color histograms and PHOG feature vectors are extracted separately for all 21 cells of a 3-level spatial pyramid (1×1, 2×2, 4×4). For each cell, 128-dimensional color histograms are extracted and concatenated to form a 2688-dimensional feature vector. For PHOG, the same construction is used, but with 12-dimensional base histograms. The other feature vectors each are 2000-bin bag-of-visual-words histograms.

For the consistent evaluation of attribute-based object classification methods, we have selected 10 test classes: chimpanzee, giant panda, hippopotamus, humpback whale, leopard, persian cat, pig, raccoon, rat, seal. The 6180 images of those classes act as test data, whereas the 24295 images of the remaining 40 classes can be used for training. Additionally, we also encourage the use of the dataset for regular large-scale multi-class or multi-label classification. For this we provide ordinary training/test splits with both parts containing images of all classes. In particular, we expect the Animals with Attributes dataset to be suitable to test hierarchical classification techniques, because the classes contain natural subgroups of similar appearance.
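The feature dimensions quoted above follow from simple arithmetic over the pyramid cells, which the following Python sketch makes explicit. The function names `pyramid_cells` and `pyramid_feature_dim` are hypothetical. The PHOG length is not stated in the text and is derived here only from the stated 12-dimensional base histograms.

```python
def pyramid_cells(grid_sizes=(1, 2, 4)):
    """Number of cells in a spatial pyramid with the given grid size per level."""
    return sum(size * size for size in grid_sizes)


def pyramid_feature_dim(bins_per_cell, grid_sizes=(1, 2, 4)):
    """Length of the vector obtained by concatenating one histogram per cell."""
    return pyramid_cells(grid_sizes) * bins_per_cell


cells = pyramid_cells()               # 1 + 4 + 16 = 21 cells
color_dim = pyramid_feature_dim(128)  # 21 * 128 = 2688
phog_dim = pyramid_feature_dim(12)    # 21 * 12 = 252
```

The image counts are consistent in the same way: 6180 test images and 24295 training images add up to the 30475 images of the full collection.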

5. Experimental Evaluation

In Section 2 we introduced DAP and IAP, two methods for attribute-based classification that allow the learning of object classification systems for classes for which no training samples are available. In the following, we evaluate both methods by applying them to the Animals with Attributes dataset. For DAP, we train a non-linear support vector machine (SVM) to predict each binary attribute $a_1, \dots, a_M$. All attribute SVMs are based on the same kernel, the sum of individual χ²-kernels for each feature, where the bandwidth parameters are fixed to five times the inverse of the median of the χ²-distances over the training samples. The SVM’s parameter C is set to 10, which had been determined a priori by cross-validation on a subset of the training

⁴ /challenges/VOC/
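The kernel construction used for the attribute SVMs in Section 5 can be sketched in Python as follows. Several details are assumptions made here and not taken from the text: the particular form of the χ²-distance, the exponential form of the kernel, the use of all pairs of distinct training samples for the median, and the function names `chi2_distance`, `bandwidth` and `summed_chi2_kernel`.

```python
import math
from itertools import combinations
from statistics import median


def chi2_distance(h1, h2):
    """Chi-squared distance between two non-negative histograms.

    Assumption: the variant sum_i (h1_i - h2_i)^2 / (h1_i + h2_i) is used and
    bins that are empty in both histograms are skipped.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def bandwidth(train_histograms, factor=5.0):
    """factor divided by the median pairwise distance of the training set."""
    distances = [chi2_distance(h1, h2)
                 for h1, h2 in combinations(train_histograms, 2)]
    return factor / median(distances)


def summed_chi2_kernel(x, y, gammas):
    """Sum over feature types of exp(-gamma * chi2_distance).

    x and y are lists with one histogram per feature type, gammas holds the
    matching bandwidth parameters.
    """
    return sum(math.exp(-gamma * chi2_distance(hx, hy))
               for hx, hy, gamma in zip(x, y, gammas))


# Toy training set of three 2-bin histograms for a single feature type.
train = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
gamma = bandwidth(train)
```

With six feature types, one bandwidth would be estimated per feature type and the six kernel values added. Note that `bandwidth` fails with a division by zero if the median distance is zero, which a practical implementation would need to guard against.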
