The Method of Least Squares

Hervé Abdi¹

1 Introduction

The least squares method (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be cast within this framework. For example, the mean of a distribution is the value that minimizes the sum of squared deviations of the scores. Second, using squares makes LSM mathematically very tractable because the Pythagorean theorem indicates that, when the error is independent of an estimated quantity, one can add the squared error and the squared estimated quantity. Third, the mathematical tools and algorithms involved in LSM (derivatives, eigendecomposition, singular value decomposition) have been well studied for a relatively long time.
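The first two points can be checked numerically. The sketch below is only an illustration: the scores are made-up values and `sum_squared_deviations` is a helper name chosen for this example. It scans a grid of candidate centers to confirm that the mean gives the smallest sum of squared deviations, and then checks the Pythagorean decomposition of the total sum of squares.

```python
# Illustrative data (hypothetical scores, not taken from the article).
scores = [2.0, 4.0, 4.0, 5.0, 10.0]

def sum_squared_deviations(values, center):
    """Sum of squared deviations of the values from a candidate center."""
    return sum((v - center) ** 2 for v in values)

mean = sum(scores) / len(scores)

# Scan a grid of candidate centers and keep the one with the smallest sum.
candidates = [i / 100 for i in range(0, 1201)]  # 0.00, 0.01, ..., 12.00
best = min(candidates, key=lambda c: sum_squared_deviations(scores, c))

# Pythagorean decomposition: the total sum of squares splits into the
# squared estimate (N * mean**2) plus the squared error around the mean.
total_ss = sum(v ** 2 for v in scores)
estimate_ss = len(scores) * mean ** 2
error_ss = sum_squared_deviations(scores, mean)

print(mean, best)
print(total_ss, estimate_ss + error_ss)
```

A grid search is used here only to make the minimization visible; in practice the mean is computed directly.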

LSM is one of the oldest techniques of modern statistics, and even though ancestors of LSM can be traced back to Greek mathematics, the first modern precursor is probably Galileo (see Harper, 1974, for a history and pre-history of LSM). The modern approach was first presented in 1805 by the French mathematician Legendre in a now classic memoir, but the method is somewhat older because it turned out that, after the publication of Legendre's memoir, Gauss (the famous German mathematician) contested Legendre's priority. Gauss often did not publish ideas when he thought that they could be controversial or not yet ripe, but would mention his discoveries when others published them (as he did, for example, for the discovery of non-Euclidean geometry). In 1809, Gauss published another memoir in which he mentioned that he had previously discovered LSM and had used it as early as 1795 in estimating the orbit of an asteroid. A somewhat bitter priority dispute followed (a bit reminiscent of the Leibniz-Newton controversy about the invention of calculus), which, however, did not diminish the popularity of this technique.

¹ In: Neil Salkind (Ed.) (2007). Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage.
Address correspondence to: Hervé Abdi, Program in Cognition and Neurosciences, MS: Gr.4.1, The University of Texas at Dallas, Richardson, TX 75083–0688, USA.
E-mail: herve@utdallas.edu www.utd.edu/∼herve

The use of LSM in a modern statistical framework can be traced to Galton (1886), who used it in his work on the heritability of size, which laid down the foundations of correlation and (also gave the name to) regression analysis. The two antagonistic giants of statistics, Pearson and Fisher, who did so much in the early development of statistics, used and developed it in different contexts (factor analysis for Pearson and experimental design for Fisher).

Nowadays, the least squares method is widely used to find or estimate the numerical values of the parameters that fit a function to a set of data and to characterize the statistical properties of estimates. It exists in several variations: its simplest version is called ordinary least squares (OLS); a more sophisticated version is called weighted least squares (WLS), which often performs better than OLS because it can modulate the importance of each observation in the final solution. More recent variations of the least squares method are alternating least squares (ALS) and partial least squares (PLS).

2 Functional fit example: regression

The oldest (and still the most frequent) use of OLS is linear regression, which corresponds to the problem of finding a line (or curve) that best fits a set of data points. In the standard formulation, a set of N pairs of observations $\{Y_i, X_i\}$ is used to find a function relating the value of the dependent variable (Y) to the values of an independent variable (X). With one variable and a linear function, the prediction is given by the following equation:

$$\hat{Y} = a + bX. \tag{1}$$

This equation involves two free parameters, which specify the intercept (a) and the slope (b) of the regression line. The least squares method defines the estimates of these parameters as the values which minimize the sum of the squared differences (hence the name least squares) between the measurements and the model (i.e., the predicted values). This amounts to minimizing the expression:

$$E = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i [Y_i - (a + bX_i)]^2 \tag{2}$$

(where E stands for "error", which is the quantity to be minimized). The estimation of the parameters is obtained using basic results from calculus and, specifically, uses the property that a quadratic expression reaches its minimum value when its derivatives vanish. Taking the derivatives of E with respect to a and b and setting them to zero gives the following set of equations (called the normal equations):

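$$\frac{\partial E}{\partial a} = 2Na + 2b \sum_i X_i - 2 \sum_i Y_i = 0 \tag{3}$$

and

$$\frac{\partial E}{\partial b} = 2b \sum_i X_i^2 + 2a \sum_i X_i - 2 \sum_i Y_i X_i = 0. \tag{4}$$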

Solving the normal equations gives the following least squares estimates of a and b:

$$a = M_Y - bM_X \tag{5}$$

(with $M_Y$ and $M_X$ denoting the means of Y and X) and

$$b = \frac{\sum_i (Y_i - M_Y)(X_i - M_X)}{\sum_i (X_i - M_X)^2}. \tag{6}$$

OLS can be extended to more than one independent variable (using matrix algebra) and to non-linear functions.
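For the one-variable case, Equations 5 and 6 translate directly into code. The following is a minimal sketch under stated assumptions: the data are made up, and `ols_line` is a name chosen for this illustration rather than a function from any library. It also evaluates the two normal equations (3) and (4) at the solution to confirm that both derivatives vanish.

```python
def ols_line(xs, ys):
    """Return (a, b) for the least squares line Y_hat = a + b * X."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    # Equation 6: cross-products of deviations over squared X deviations.
    sxy = sum((y - m_y) * (x - m_x) for x, y in zip(xs, ys))
    sxx = sum((x - m_x) ** 2 for x in xs)
    b = sxy / sxx
    # Equation 5: the fitted line passes through the point of means.
    a = m_y - b * m_x
    return a, b

# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = ols_line(xs, ys)

# Normal equations (3) and (4): both derivatives should be zero at (a, b).
n = len(xs)
dE_da = 2 * n * a + 2 * b * sum(xs) - 2 * sum(ys)
dE_db = (2 * b * sum(x * x for x in xs) + 2 * a * sum(xs)
         - 2 * sum(x * y for x, y in zip(xs, ys)))
print(a, b, dE_da, dE_db)
```

With these data the intercept is 2.2 and the slope is 0.6, up to floating-point rounding.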

2.1 The geometry of least squares

OLS can be interpreted in a geometrical framework as an orthogonal projection of the data vector onto the space defined by the independent variables. The projection is orthogonal because the predicted values and the errors are uncorrelated. This is illustrated in Figure 1, which depicts the case of two independent variables (vectors x1 and x2) and the data vector (y), and shows that the error vector (y − ŷ) is orthogonal to the least squares estimate (ŷ), which lies in the subspace defined by the two independent variables.

Figure 1: The least squares estimate of the data is the orthogonal projection of the data vector onto the independent variable subspace.
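The orthogonality shown in Figure 1 can be verified on a small example. The sketch below assumes made-up vectors for x1, x2 and y and fits a model without an intercept, so that the estimate lies exactly in the subspace spanned by x1 and x2. The 2 × 2 normal equations are solved with Cramer's rule to keep the example free of external libraries.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical vectors standing in for x1, x2 and y of Figure 1.
x1 = [1.0, 0.0, 1.0, 2.0]
x2 = [0.0, 1.0, 1.0, 1.0]
y = [1.0, 2.0, 2.0, 5.0]

# Normal equations for y_hat = b1 * x1 + b2 * x2, solved by Cramer's rule.
g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
c1, c2 = dot(x1, y), dot(x2, y)
det = g11 * g22 - g12 * g12
b1 = (c1 * g22 - g12 * c2) / det
b2 = (g11 * c2 - g12 * c1) / det

y_hat = [b1 * u + b2 * v for u, v in zip(x1, x2)]
error = [yi - yh for yi, yh in zip(y, y_hat)]

# The error vector is orthogonal to x1 and x2, and therefore to y_hat.
print(dot(error, x1), dot(error, x2), dot(error, y_hat))
# Pythagorean relation: squared length of y = that of y_hat + that of error.
print(dot(y, y), dot(y_hat, y_hat) + dot(error, error))
```

The three inner products printed on the first line are zero up to rounding, which is the algebraic counterpart of the right angle in the figure.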

2.2 Optimality of least squares estimates

OLS estimates have some strong statistical properties. Specifically, when (1) the data obtained constitute a random sample from a well-defined population, (2) the population model is linear, (3) the error has a zero expected value, (4) the independent variables are linearly independent, and (5) the error is normally distributed, has constant variance (the so-called homoscedasticity assumption), and is uncorrelated with the independent variables; then the OLS estimate is the best linear unbiased estimate, often denoted with the acronym "BLUE" (the 5 conditions and the proof are called the Gauss-Markov conditions and theorem). In addition, when the Gauss-Markov conditions hold, OLS estimates are also maximum likelihood estimates.

2.3 Weighted least squares

The optimality of OLS relies heavily on the homoscedasticity assumption. When the data come from different sub-populations for which an independent estimate of the error variance is available, a better estimate than OLS can be obtained using weighted least squares (WLS), a special case of generalized least squares (GLS). The idea is to assign to each observation a weight that reflects the uncertainty of the measurement. In general, the weight $w_i$, assigned to the ith observation, will be a function of the variance of this observation, denoted $\sigma_i^2$. A straightforward weighting scheme is to define $w_i = \sigma_i^{-1}$ (but other, more sophisticated weighting schemes can also be proposed). For the linear regression example, WLS will find the values of a and b minimizing:

$$E_w = \sum_i w_i (Y_i - \hat{Y}_i)^2 = \sum_i w_i [Y_i - (a + bX_i)]^2. \tag{7}$$
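Minimizing Equation 7 for the regression line has the same closed form as OLS, with weighted means and weighted sums in place of the ordinary ones. The sketch below is an illustration under stated assumptions: the data and the standard deviations in `sigmas` are invented, and `wls_line` is a name chosen for this example. The weights follow the scheme given in the text; inverse-variance weights (1 divided by the variance) are another common choice and could be passed in the same way.

```python
def wls_line(xs, ys, ws):
    """Return (a, b) minimizing sum_i w_i * (Y_i - (a + b * X_i))**2."""
    sw = sum(ws)
    # Weighted means replace the ordinary means of the OLS solution.
    m_x = sum(w * x for w, x in zip(ws, xs)) / sw
    m_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - m_x) * (y - m_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - m_x) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    a = m_y - b * m_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 15.0]      # the last observation is very noisy
sigmas = [1.0, 1.0, 1.0, 1.0, 10.0]  # hypothetical error standard deviations
weights = [1.0 / s for s in sigmas]  # the scheme w_i = sigma_i ** -1 from the text

a_w, b_w = wls_line(xs, ys, weights)
a_o, b_o = wls_line(xs, ys, [1.0] * len(xs))  # equal weights reduce to OLS
print(a_o, b_o)
print(a_w, b_w)
```

Because the noisy last observation receives a small weight, the weighted slope is pulled toward it much less than the unweighted one.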

2.4 Iterative methods: Gradient descent

When estimating the parameters of a nonlinear function with OLS or WLS, the standard approach using derivatives is not always possible. In this case, iterative methods are very often used. These methods search in a stepwise fashion for the best values of the estimate. Often they proceed by using at each step a linear approximation of the function and refine this approximation by successive corrections. The techniques involved are known as gradient descent and Gauss-Newton approximations. They correspond to nonlinear least squares approximation in numerical analysis and nonlinear regression in statistics. Neural networks constitute a popular recent application of these techniques.
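The stepwise search can be sketched with plain gradient descent. To keep the result checkable, the example below applies it to the linear model of Equation 1, where the closed-form answer is known; for a nonlinear function the loop would stay the same and only the prediction and the gradient would change. The data, the step size `rate`, the number of `steps`, and the function name are all assumptions made for this illustration.

```python
def gradient_descent_line(xs, ys, rate=0.01, steps=2000):
    """Minimize E = sum_i (Y_i - (a + b * X_i))**2 by gradient descent."""
    a, b = 0.0, 0.0  # arbitrary starting values
    for _ in range(steps):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Partial derivatives of E with respect to a and b.
        grad_a = -2.0 * sum(residuals)
        grad_b = -2.0 * sum(r * x for r, x in zip(residuals, xs))
        # Step against the gradient; the rate must be small enough to converge.
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from Y = 1 + 2 * X, no noise
a, b = gradient_descent_line(xs, ys)
print(a, b)
```

The estimates approach the intercept 1 and slope 2 used to generate the data. A step size that is too large for the data at hand makes the iterations diverge, which is one reason practical implementations adapt the step or use Gauss-Newton style updates.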

3 Problems with least squares, and alternatives

Despite its popularity and versatility, LSM has its problems. Probably the most important drawback of LSM is its high sensitivity to outliers (i.e., extreme observations). This is a consequence of using squares, because squaring exaggerates the magnitude of differences (e.g., the difference between 20 and 10 is equal to 10, but the difference between 20² and 10² is equal to 300) and therefore gives a much stronger importance to extreme observations. This problem is addressed by using robust techniques, which are less sensitive to the effect of outliers. This field is currently under development and is likely to become more important in the near future.

References

[1] Abdi, H., Valentin, D., & Edelman, B. E. (1999). Neural networks. Thousand Oaks: Sage.

[2] Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and its applications. New York: Wiley.

[3] Greene, W. H. (2002). Econometric analysis. New York: Prentice Hall.

[4] Harper, H. L. (1974–1976). The method of least squares and some alternatives. Parts I, II, III, IV, V, VI. International Statistical Review, 42, 147–174; 42, 235–264; 43, 1–44; 43, 125–190; 43, 269–272; 44, 113–159.

[5] Nocedal, J., & Wright, S. (1999). Numerical optimization. New York: Springer.
