A Comparison of Methods for Multiclass
Support Vector Machines
Chih-Wei Hsu and Chih-Jen Lin
Abstract—Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend them for multiclass classification is still an ongoing research issue. Several methods have been proposed in which a multiclass classifier is typically constructed by combining several binary classifiers. Some authors have also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multiclass SVM in one step, a much larger optimization problem is required, so up to now experiments have been limited to small data sets. In this paper we give decomposition implementations for two such "all-together" methods. We then compare their performance with three methods based on binary classifications: "one-against-all," "one-against-one," and directed acyclic graph SVM (DAGSVM). Our experiments indicate that the "one-against-one" and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems, methods that consider all data at once in general need fewer support vectors.

Index Terms—Decomposition methods, multiclass classification, support vector machines (SVMs).
I. INTRODUCTION
SUPPORT vector machines (SVMs) [6] were originally designed for binary classification. How to effectively extend them for multiclass classification is still an ongoing research issue. Currently there are two types of approaches for multiclass SVM. One is by constructing and combining several binary classifiers, while the other is by directly considering all data in one optimization formulation. Up to now there are still no comparisons which cover most of these methods.
The formulation to solve multiclass SVM problems in one step has variables proportional to the number of classes. Therefore, for multiclass SVM methods, either several binary classifiers have to be constructed or a larger optimization problem is needed. Hence in general it is computationally more expensive to solve a multiclass problem than a binary problem with the same number of data. Up to now experiments have been limited to small data sets. In this paper we will give a decomposition implementation for two such "all-together" methods: [25], [27] and [7]. We then compare their performance with three methods based on binary classification: "one-against-all," "one-against-one," and DAGSVM [23].
Manuscript received April 9, 2001; revised August 22, 2001. This work was supported in part by the National Science Council of Taiwan under Grant NSC 89-2213-E-002-106.
The authors are with the Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan (e-mail: u.edu.tw).
Publisher Item Identifier S 1045-9227(02)01805-2.
Note that it was pointed out in [11] that the primal forms proposed in [25], [27] are also equivalent to those in [3], [11]. Besides the methods mentioned above, there are other implementations for multiclass SVM: for example, [14], [19]. However, due to the limit of space, here we do not conduct experiments on them. An earlier comparison between the one-against-one and one-against-all methods is in [5].
In Section II, we review the one-against-all, one-against-one, and directed acyclic graph SVM (DAGSVM) methods, which are based on solving several binary classifications. In Section III, we give a brief introduction to the method in [25], [27] which considers all classes at once, and show that the decomposition method proposed in [12] can be applied. Another method which also considers all variables together is by Crammer and Singer [7], which will be discussed in Section IV. Numerical experiments are in Section V, where we show that the "one-against-one" and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems the method proposed in [25], [27], by considering all variables at once, generally needs fewer support vectors. Finally, we have some discussions and conclusions in Section VI.
II. ONE-AGAINST-ALL, ONE-AGAINST-ONE, AND DAGSVM METHODS
The earliest used implementation for SVM multiclass classification is probably the one-against-all method (for example, [2]). It constructs $k$ SVM models, where $k$ is the number of classes. The $i$th SVM is trained with all of the examples in the $i$th class with positive labels, and all other examples with negative labels. Thus given $l$ training data $(x_1, y_1), \ldots, (x_l, y_l)$, where $x_j \in R^n$, $j = 1, \ldots, l$, and $y_j \in \{1, \ldots, k\}$ is the class of $x_j$, the $i$th SVM solves the following problem:

$$\min_{w^i, b^i, \xi^i} \quad \frac{1}{2}(w^i)^T w^i + C \sum_{j=1}^{l} \xi_j^i$$
$$(w^i)^T \phi(x_j) + b^i \ge 1 - \xi_j^i, \quad \text{if } y_j = i,$$
$$(w^i)^T \phi(x_j) + b^i \le -1 + \xi_j^i, \quad \text{if } y_j \ne i,$$
$$\xi_j^i \ge 0, \quad j = 1, \ldots, l \qquad (1)$$

where the training data $x_j$ are mapped to a higher dimensional space by the function $\phi$ and $C$ is the penalty parameter. Minimizing $\frac{1}{2}(w^i)^T w^i$ means maximizing $2/\|w^i\|$, the margin between two groups of data. When data are not linearly separable, there is a penalty term $C \sum_{j} \xi_j^i$ which can reduce the number of training errors. The basic concept behind SVM is to search for a balance between the regularization term $\frac{1}{2}(w^i)^T w^i$ and the training errors.

After solving (1), there are $k$ decision functions $(w^i)^T \phi(x) + b^i$, $i = 1, \ldots, k$. We say $x$ is in the class which has the largest value of the decision function:

$$\text{class of } x \equiv \arg\max_{i = 1, \ldots, k} \left( (w^i)^T \phi(x) + b^i \right). \qquad (2)$$
Another major method is the one-against-one method. It constructs $k(k-1)/2$ classifiers, where each one is trained on data from two classes. For training data from the $i$th and the $j$th classes, we solve the following binary classification problem:

$$\min_{w^{ij}, b^{ij}, \xi^{ij}} \quad \frac{1}{2}(w^{ij})^T w^{ij} + C \sum_{t} \xi_t^{ij}$$
$$(w^{ij})^T \phi(x_t) + b^{ij} \ge 1 - \xi_t^{ij}, \quad \text{if } y_t = i,$$
$$(w^{ij})^T \phi(x_t) + b^{ij} \le -1 + \xi_t^{ij}, \quad \text{if } y_t = j,$$
$$\xi_t^{ij} \ge 0. \qquad (3)$$

After all $k(k-1)/2$ classifiers are constructed, we use the following voting strategy: if the sign of $(w^{ij})^T \phi(x) + b^{ij}$ says $x$ is in the $i$th class, then the vote for the $i$th class is increased by one; otherwise the vote for the $j$th class is increased by one. Then we predict that $x$ is in the class with the largest vote (the so-called "Max Wins" strategy).
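The "Max Wins" voting above can be sketched as follows. Here `pairwise_sign` is a hypothetical stand-in for the trained classifiers of (3) — for illustration it simply compares distances to one-dimensional class centers.

```python
# Sketch of "Max Wins" voting in the one-against-one method: each of the
# k(k-1)/2 pairwise classifiers casts one vote, and the class with the
# most votes is predicted.  pairwise_sign is a hypothetical stand-in for
# the classifiers of (3).
from itertools import combinations
from collections import Counter

centers = {"A": 0.0, "B": 1.0, "C": 2.0}  # hypothetical 1-D class centers

def pairwise_sign(i, j, x):
    # Positive output means the (i, j) classifier votes for class i.
    return 1.0 if abs(x - centers[i]) < abs(x - centers[j]) else -1.0

def predict_one_vs_one(classes, x):
    votes = Counter()
    for i, j in combinations(classes, 2):
        votes[i if pairwise_sign(i, j, x) > 0 else j] += 1
    return votes.most_common(1)[0][0]

print(predict_one_vs_one(["A", "B", "C"], 1.9))  # "C" collects two votes
```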
The third algorithm discussed here is directed acyclic graph SVM (DAGSVM) [23]. Its training phase is the same as the one-against-one method, solving $k(k-1)/2$ binary SVMs. However, in the testing phase it uses a rooted binary directed acyclic graph which has $k(k-1)/2$ internal nodes and $k$ leaves. Each node is a binary SVM of the $i$th and $j$th classes. Given a test sample $x$, starting at the root node, the binary decision function is evaluated; then the algorithm moves to either the left or the right child depending on the output value. Therefore, we go through a path before reaching a leaf node, which indicates the predicted class.
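One standard way to implement the DAG traversal is list elimination: keep a list of candidate classes, let each internal node be the pairwise classifier for the first and last remaining candidates, and drop the loser, so a leaf is reached after $k - 1$ evaluations. Below, `pairwise_sign` is the same hypothetical distance-to-center stand-in for trained one-against-one classifiers, not a real SVM.

```python
# Sketch of the DAGSVM testing phase via list elimination: each node
# compares the first and last remaining candidate classes and eliminates
# the loser, reaching a leaf after k - 1 binary evaluations.
centers = {"A": 0.0, "B": 1.0, "C": 2.0}  # hypothetical 1-D class centers

def pairwise_sign(i, j, x):
    # Positive output means the node's binary SVM decides for class i.
    return 1.0 if abs(x - centers[i]) < abs(x - centers[j]) else -1.0

def predict_dag(classes, x):
    remaining = list(classes)
    evaluations = 0
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        evaluations += 1
        if pairwise_sign(i, j, x) > 0:
            remaining.pop()       # j loses: eliminate it
        else:
            remaining.pop(0)      # i loses: eliminate it
    return remaining[0], evaluations

print(predict_dag(["A", "B", "C"], 1.9))  # ("C", 2): two evaluations for k = 3
```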
III. A METHOD BY CONSIDERING ALL DATA AT ONCE AND A DECOMPOSITION IMPLEMENTATION

In [25], [27], an approach for multiclass problems by solving one single optimization problem was proposed. The idea is similar to the one-against-all approach: it constructs $k$ two-class rules, where the $m$th function $w_m^T \phi(x) + b_m$ separates training vectors of the class $m$ from the other vectors. Hence there are $k$ decision functions, but all are obtained by solving one problem:

$$\min_{w, b, \xi} \quad \frac{1}{2}\sum_{m=1}^{k} w_m^T w_m + C \sum_{i=1}^{l} \sum_{m \ne y_i} \xi_i^m$$
$$w_{y_i}^T \phi(x_i) + b_{y_i} \ge w_m^T \phi(x_i) + b_m + 2 - \xi_i^m,$$
$$\xi_i^m \ge 0, \quad i = 1, \ldots, l, \quad m \in \{1, \ldots, k\} \setminus \{y_i\}. \qquad (4)$$

Then the decision function is

$$\arg\max_{m = 1, \ldots, k} \left( w_m^T \phi(x) + b_m \right)$$

which is the same as (2) of the one-against-all method. Like binary SVM, it is easier to solve the dual problem here. Following [27], the dual formulation of (4) is
a quadratic program (5a)-(5c) in the dual variables $\alpha_i^m$: (5a) gives the objective and the linear equality constraints, (5b) the bound constraints on the variables, and (5c) a condition on the components with $m = y_i$, where (6) expresses the optimal $w_m$ in terms of the dual variables and the mapped training data, and the decision function is (7a). An alternative primal form, from [3], is (8), with the same constraints as (4).

For the optimal solution of (4), from (6) and (5c), part of the dual variables are always zero; hence we can also say the dual effectively has $(k-1)l$ variables.
Because the number of dual variables is very large, a decomposition method is used: in each iteration, most variables are fixed while a subproblem on the variables corresponding to a small working set is solved. The size of the working set and the selection of its contents are both important issues in designing a decomposition method. For example, the sequential minimal optimization (SMO) by Platt [22] considers only two variables in each iteration. Then in each iteration of the decomposition method the subproblem on two variables can be analytically solved. Hence no optimization software is needed.
However, such working set selections for the binary case may not work here for the dual problem (5). Instead of only one linear constraint in the dual of (1), now in (5a) we have $k$ linear equations. If we select fewer than $k$ variables, then (5a) may be an over-determined linear system with more equations than variables, so the solution of the decomposition algorithm cannot be moved. Therefore, in general we have to select more than $k$ variables in each iteration. To avoid this difficulty, following [12], the bias terms are added to the objective function:

$$\min_{w, b, \xi} \quad \frac{1}{2}\sum_{m=1}^{k} \left( w_m^T w_m + b_m^2 \right) + C \sum_{i=1}^{l} \sum_{m \ne y_i} \xi_i^m \qquad (9)$$

with the same constraints as (4). Then in the dual problem (10) the linear equality constraints disappear, and checking the optimality of a variable requires only simple comparisons of gradient components with the bounds: a variable is optimal if (11) holds, where the Hessian of (10) is given by (12), in which a permutation of the kernel matrix appears.
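The practical benefit of removing the equality constraints can be illustrated on a toy problem: once only bound constraints such as $0 \le \alpha_i \le C$ remain, each variable can be minimized analytically in turn and clipped to its bounds. Below, `Q` and `p` define a small hypothetical quadratic program, not the actual dual (10).

```python
# Toy illustration: with only bound constraints 0 <= alpha_i <= C (as
# after adding the bias terms in (9)), single-variable coordinate
# descent with clipping is enough.  Q and p are a small hypothetical
# problem, not the actual dual (10).
def coordinate_descent(Q, p, C, sweeps=50):
    n = len(p)
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Gradient of (1/2) a^T Q a + p^T a in coordinate i.
            g = sum(Q[i][j] * alpha[j] for j in range(n)) + p[i]
            # Exact minimizer over alpha_i with the others fixed, clipped
            # back into the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] - g / Q[i][i]))
    return alpha

Q = [[2.0, 0.0], [0.0, 2.0]]
p = [-2.0, -6.0]
alpha = coordinate_descent(Q, p, C=2.0)
# Unconstrained minimizers are 1.0 and 3.0; the second is clipped to C = 2.0.
```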
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 2, MARCH 2002
Assume that $q$ is the size of the working set. We use the following working set selection of BSVM [12]:

1) Let $r$ be the number of free variables and calculate the vector $v$, where each component of $v$ measures the violation of the optimality condition by the corresponding variable.
2) Select the indexes of the smallest $\min(q/2, r)$ elements in $v$ whose corresponding $\alpha$ are free, and of the other smallest elements whose corresponding $\alpha$ are at a bound, into the working set.

If $\alpha$ is an optimal solution of (13), then every component of $v$ defined above must be nonnegative. Thus selecting the smallest elements in $v$ picks the variables which violate the optimality condition the most, balancing those whose corresponding $\alpha$ are free and those whose corresponding $\alpha$ are at a bound. The main computational cost comes from the columns of the Hessian matrix, which are mainly accessed as kernel entries arranged as in (12).
As any quadratic formulation can be written as a symmetric form, we must reformulate (10); this will be explained in (15). Note that the resulting matrix is positive semidefinite, so (10) is a convex optimization problem; here (14a) to (14b) is by direct calculation. It is easy to see how the first and third terms of the quadratic objective enter the Hessian. In particular, we will show what a column of the Hessian looks like.

In (10), the quadratic terms compose of three groups. The first group includes, in a given column, only the elements whose row indexes correspond to the same class as the column; these rows receive a contribution from this part. We use one index set to represent the rows of the class of the column and another for the remaining rows.
For the third term, every row of the column will include a kernel contribution. The most complicated one is the second part: in a given column, the elements corresponding to the class of the column and those corresponding to the other classes receive contributions of opposite signs.

In summary, a column of the Hessian can be obtained as follows:

1) Obtain the column vector of kernel values. Initialize the column.
2) To all elements, add the kernel values (from the third part).
3) For elements corresponding to the class of the column, add the kernel values (from the second part).
4) For elements corresponding to each of the classes, add the contribution of the first part.
5) For elements corresponding to each of the other classes, subtract the kernel values (from the second part).

Thus only the $l \times l$ kernel matrix has to be computed and cached rather than the full Hessian. The reduction of the cached matrix from the size of the full Hessian to $l \times l$ can further improve the training time of this algorithm.
There are different implementations of the above procedure. As the situation may vary for different computational environments, here we do not go into further details.
We have implemented this method as an extension of BSVM and LIBSVM, which will be used for the experiments in Section V. In addition, the software is available at u.edu.tw/~cjlin/bsvm.
IV. METHOD BY CRAMMER AND SINGER
In [7], Crammer and Singer proposed an approach for multiclass problems by solving a single optimization problem. We will also give a decomposition implementation here. Basically [7] solves the following primal problem:

$$\min_{w_1, \ldots, w_k, \xi} \quad \frac{1}{2}\sum_{m=1}^{k} w_m^T w_m + C \sum_{i=1}^{l} \xi_i$$
$$w_{y_i}^T \phi(x_i) - w_m^T \phi(x_i) \ge e_i^m - \xi_i, \quad i = 1, \ldots, l, \quad m = 1, \ldots, k \qquad (16)$$

where $e_i^m \equiv 1 - \delta_{y_i, m}$ (that is, $e_i^m = 0$ if $y_i = m$ and $e_i^m = 1$ otherwise) and $\xi_i$, $i = 1, \ldots, l$, are slack variables. That is, instead of using $\xi_i^m$ as the gap between each two decision planes, here the maximum of them is used, so each data point has a single slack variable. Note that here we do not have to explicitly write down the constraints $\xi_i \ge 0$: when $m = y_i$, $e_i^m = 0$, so (16) becomes $\xi_i \ge 0$.

The dual problem of (16) is

$$\min_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} K(x_i, x_j)\, \bar{\alpha}_i^T \bar{\alpha}_j + \sum_{i=1}^{l} \bar{\alpha}_i^T \bar{e}_i \qquad (17a)$$
$$\sum_{m=1}^{k} \alpha_i^m = 0, \quad \alpha_i^m \le 0 \ \text{if } y_i \ne m, \quad \alpha_i^m \le C \ \text{if } y_i = m, \quad i = 1, \ldots, l \qquad (17b)$$

where $\bar{\alpha}_i \equiv [\alpha_i^1, \ldots, \alpha_i^k]^T$ and $\bar{e}_i \equiv [e_i^1, \ldots, e_i^k]^T$. The Hessian of (17a) is the Kronecker product of the kernel matrix $K$ with a $k \times k$ identity matrix; since $K$ is positive semidefinite, the Hessian of the dual objective function is also positive semidefinite. This is another way to explain that (17) is a convex optimization problem.
The decision function is

$$\arg\max_{m = 1, \ldots, k} \sum_{i=1}^{l} \alpha_i^m K(x_i, x).$$

(17a) has $kl$ variables, while the constraints (17b) contain $l$ sets of equations, each involving only the $k$ variables $\bar{\alpha}_i$ associated with the same data point. Hence, for the decomposition method, in each iteration we select all $k$ variables associated with the same $x_i$ in the working set. That is, one $\bar{\alpha}_i$ is updated while the other variables are fixed; the selection of the index $i$ will be discussed in (20). Then the subproblem is a smaller quadratic program (18) on the $k$ components of $\bar{\alpha}_i$, with the equality and bound constraints of (17b).
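The decision function above can be sketched directly. The `alpha` values below are hypothetical — they satisfy $\sum_m \alpha_i^m = 0$ from (17b) but are not an optimal solution of (17) — and only the argmax over kernel-weighted sums is illustrated.

```python
# Sketch of the Crammer-Singer decision function: with dual variables
# alpha[i][m] and a kernel K, the predicted class is
#     argmax_m  sum_i alpha[i][m] * K(x_i, x).
# The alpha values below are hypothetical, not an optimal solution.
import math

def kernel(u, v, gamma=1.0):
    # RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

support = [[0.0, 0.0], [1.0, 1.0]]   # training points x_i
alpha = [[1.0, -1.0], [-1.0, 1.0]]   # alpha[i][m]; each row sums to zero

def cs_predict(support, alpha, x, k=2):
    scores = [sum(alpha[i][m] * kernel(xi, x) for i, xi in enumerate(support))
              for m in range(k)]
    return max(range(k), key=lambda m: scores[m])

print(cs_predict(support, alpha, [0.0, 0.0]))  # class 0
print(cs_predict(support, alpha, [1.0, 1.0]))  # class 1
```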
where the coefficients of (18) are determined by the currently fixed variables: its linear term, whose $m$th component comes from the gradient of (17a), can be calculated from the information of the gradient, and after (18) is solved the gradient can be updated with the help of one kernel column, so only a small number of kernel operations are needed per iteration.
Next we discuss the selection of the working set and the stopping condition of the decomposition method. The KKT condition requires that for each $i$ there is a scalar such that the optimality conditions hold for all components of $\bar{\alpha}_i$ together with the constraints (17b). We can rewrite this as a measure of how much each index $i$ violates optimality: if the variables $\bar{\alpha}_i$ which are selected do not satisfy the KKT condition of the subproblem (18), then solving (18) will guarantee the strict decrease on the objective function of the dual problem.

Following (19), the stopping criterion can be that the largest violation among all indexes falls below $\epsilon$, where $\epsilon$ is the stopping tolerance.

The convergence of the above decomposition method has been proved in [17]. In addition, [17] shows that the limit of
