Title
heckman postestimation -- Postestimation tools for heckman
Contents: Postestimation commands, predict, margins, Remarks and examples, Reference, Also see
Postestimation commands
The following postestimation commands are available after heckman:
    Command           Description
    ---------------------------------------------------------------------------
    contrast          contrasts and ANOVA-style joint tests of estimates
  * estat ic          Akaike’s, consistent Akaike’s, corrected Akaike’s, and
                        Schwarz’s Bayesian information criteria (AIC, CAIC,
                        AICc, and BIC)
    estat summarize   summary statistics for the estimation sample
    estat vce         variance–covariance matrix of the estimators (VCE)
    estat (svy)       postestimation statistics for survey data
    estimates         cataloging estimation results
    etable            table of estimation results
  † hausman           Hausman’s specification test
    lincom            point estimates, standard errors, testing, and inference
                        for linear combinations of coefficients
  † lrtest            likelihood-ratio test; not available with two-step estimator
    margins           marginal means, predictive margins, marginal effects, and
                        average marginal effects
    marginsplot       graph the results from margins (profile plots,
                        interaction plots, etc.)
    nlcom             point estimates, standard errors, testing, and inference
                        for nonlinear combinations of coefficients
    predict           linear predictions and their SEs, probabilities, etc.
    predictnl         point estimates, standard errors, testing, and inference
                        for generalized predictions
    pwcompare         pairwise comparisons of estimates
  * suest             seemingly unrelated estimation
    test              Wald tests of simple and composite linear hypotheses
    testnl            Wald tests of nonlinear hypotheses
    ---------------------------------------------------------------------------
  * estat ic and suest are not appropriate after heckman, twostep.
  † hausman and lrtest are not appropriate with svy estimation results.
predict
Description for predict
predict creates a new variable containing predictions such as linear predictions, standard errors, probabilities, expected values, and nonselection hazards.
Menu for predict
Statistics > Postestimation
Syntax for predict
After ML or twostep
    predict [type] newvar [if] [in] [, statistic nooffset]
After ML
    predict [type] stub* [if] [in], scores
    statistic          Description
    ---------------------------------------------------------------------------
    Main
      xb               linear prediction; the default
      stdp             standard error of the prediction
      stdf             standard error of the forecast
      xbsel            linear prediction for selection equation
      stdpsel          standard error of the linear prediction for selection
                         equation
      pr(a,b)          $\Pr(y_j \mid a < y_j < b)$
      e(a,b)           $E(y_j \mid a < y_j < b)$
      ystar(a,b)       $E(y_j^*)$, $y_j^* = \max\{a, \min(y_j, b)\}$
      ycond            $E(y_j \mid y_j \text{ observed})$
      yexpected        $E(y_j^*)$, $y_j$ taken to be 0 where unobserved
      nshazard or mills
                       nonselection hazard (also called the inverse of Mills’s ratio)
      psel             $\Pr(y_j \text{ observed})$
    ---------------------------------------------------------------------------
where a and b may be numbers or variables; a missing ($a \geq .$) means $-\infty$, and b missing ($b \geq .$) means $+\infty$; see [U] 12.2.1 Missing values.
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
stdf is not allowed with svy estimation results.
Options for predict
Main
xb, the default, calculates the linear prediction $\mathbf{x}_j\mathbf{b}$.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of the predicted expected value or mean for the observation’s covariate pattern. The standard error of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction for 1 observation. It is commonly referred to as the standard error of the future or forecast value. By construction, the standard errors produced by stdf are always larger than those produced by stdp; see Methods and formulas in [R] regress postestimation.
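The stdp/stdf relationship can be illustrated with a small Python sketch. It assumes, for illustration only, that heckman follows the regress convention cited above, in which the forecast variance adds the residual variance $\sigma^2$ to the squared standard error of the prediction; the function name and the input values are hypothetical.

```python
import math


def stdf_from_stdp(stdp: float, sigma: float) -> float:
    """Standard error of the forecast for one observation.

    Assumption (regress-style convention): stdf**2 = stdp**2 + sigma**2,
    where sigma is the estimated standard deviation of the regression error.
    """
    return math.sqrt(stdp ** 2 + sigma ** 2)


# Hypothetical inputs: stdp = 3 and sigma = 4 give stdf = 5.
print(stdf_from_stdp(3.0, 4.0))  # 5.0
```

Because $\sigma^2 > 0$, this stdf exceeds stdp for every observation, which is the "by construction" property stated above.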
xbsel calculates the linear prediction for the selection equation.
stdpsel calculates the standard error of the linear prediction for the selection equation.
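pr(a,b) calculates $\Pr(a < \mathbf{x}_j\mathbf{b} + u_1 < b)$, the probability that $y_j \mid \mathbf{x}_j$ would be observed in the interval $(a,b)$.
a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_1 < 30)$;
pr(lb,ub) calculates $\Pr(lb < \mathbf{x}_j\mathbf{b} + u_1 < ub)$; and
pr(20,ub) calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_1 < ub)$.
a missing ($a \geq .$) means $-\infty$; pr(.,30) calculates $\Pr(-\infty < \mathbf{x}_j\mathbf{b} + u_1 < 30)$; pr(lb,30) calculates $\Pr(-\infty < \mathbf{x}_j\mathbf{b} + u_1 < 30)$ in observations for which $lb \geq .$ and calculates $\Pr(lb < \mathbf{x}_j\mathbf{b} + u_1 < 30)$ elsewhere.
b missing ($b \geq .$) means $+\infty$; pr(20,.) calculates $\Pr(+\infty > \mathbf{x}_j\mathbf{b} + u_1 > 20)$; pr(20,ub) calculates $\Pr(+\infty > \mathbf{x}_j\mathbf{b} + u_1 > 20)$ in observations for which $ub \geq .$ and calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_1 < ub)$ elsewhere.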
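Assuming $u_1 \sim N(0, \sigma^2)$, the probability computed by pr(a,b) reduces to $\Phi(\beta) - \Phi(\alpha)$ with $\alpha = (a - \mathbf{x}_j\mathbf{b})/\sigma$ and $\beta = (b - \mathbf{x}_j\mathbf{b})/\sigma$. The Python sketch below shows that calculation for a single observation, using None as a hypothetical stand-in for Stata's missing value so that the "missing means infinity" rule can be seen; it is an illustration, not Stata's implementation.

```python
import math
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def pr_interval(xb: float, sigma: float,
                a: Optional[float], b: Optional[float]) -> float:
    """Pr(a < xb + u1 < b) with u1 ~ N(0, sigma**2).

    None stands in for a Stata missing value: a missing lower bound
    means -infinity and a missing upper bound means +infinity.
    """
    lo = -math.inf if a is None else a
    hi = math.inf if b is None else b
    alpha = (lo - xb) / sigma  # standardized lower bound
    beta = (hi - xb) / sigma   # standardized upper bound
    return _STD_NORMAL.cdf(beta) - _STD_NORMAL.cdf(alpha)


# Hypothetical observation with xb = 25 and sigma = 5.
print(round(pr_interval(25.0, 5.0, 20.0, 30.0), 4))  # like pr(20,30): 0.6827
print(pr_interval(25.0, 5.0, None, 25.0))             # like pr(.,25): 0.5
```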
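e(a,b) calculates $E(\mathbf{x}_j\mathbf{b} + u_1 \mid a < \mathbf{x}_j\mathbf{b} + u_1 < b)$, the expected value of $y_j \mid \mathbf{x}_j$ conditional on $y_j \mid \mathbf{x}_j$ being in the interval $(a,b)$, meaning that $y_j \mid \mathbf{x}_j$ is truncated. a and b are specified as they are for pr().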
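Under the same normality assumption, e(a,b) is the mean of a truncated normal: $\mathbf{x}_j\mathbf{b} + \sigma\{\phi(\alpha) - \phi(\beta)\}/\{\Phi(\beta) - \Phi(\alpha)\}$, with $\alpha$ and $\beta$ the standardized bounds. A minimal Python sketch, again using None as a hypothetical stand-in for a missing bound:

```python
import math
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()


def e_interval(xb: float, sigma: float,
               a: Optional[float], b: Optional[float]) -> float:
    """E(xb + u1 | a < xb + u1 < b) for u1 ~ N(0, sigma**2) (truncated mean).

    None stands in for a Stata missing bound (-inf for a, +inf for b).
    """
    lo = -math.inf if a is None else a
    hi = math.inf if b is None else b
    alpha = (lo - xb) / sigma
    beta = (hi - xb) / sigma
    mass = _STD_NORMAL.cdf(beta) - _STD_NORMAL.cdf(alpha)  # Pr(a < y < b)
    # The normal density is 0 at +/-infinity, so a missing bound drops out.
    return xb + sigma * (_STD_NORMAL.pdf(alpha) - _STD_NORMAL.pdf(beta)) / mass


print(e_interval(25.0, 5.0, 20.0, 30.0))          # symmetric interval: 25.0
print(round(e_interval(0.0, 1.0, 0.0, None), 4))  # truncated below at 0: 0.7979
```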
ystar(a,b) calculates $E(y_j^*)$, where $y_j^* = a$ if $\mathbf{x}_j\mathbf{b} + u_1 \leq a$, $y_j^* = b$ if $\mathbf{x}_j\mathbf{b} + u_1 \geq b$, and $y_j^* = \mathbf{x}_j\mathbf{b} + u_1$ otherwise, meaning that $y_j^*$ is censored. a and b are specified as they are for pr().
ycond calculates the expected value of the dependent variable conditional on the dependent variable being observed, that is, selected; $E(y_j \mid y_j \text{ observed})$.
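The censored expectation behind ystar(a,b) combines three pieces: the probability mass piled up at a, the mass piled up at b, and the uncensored middle. Assuming $u_1 \sim N(0, \sigma^2)$, $E(y_j^*) = a\Phi(\alpha) + \{\Phi(\beta) - \Phi(\alpha)\}\mathbf{x}_j\mathbf{b} + \sigma\{\phi(\alpha) - \phi(\beta)\} + b\{1 - \Phi(\beta)\}$. The Python sketch below is illustrative only; None is a hypothetical stand-in for a missing bound.

```python
import math
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()


def ystar_interval(xb: float, sigma: float,
                   a: Optional[float], b: Optional[float]) -> float:
    """E(y*) for y* = max(a, min(xb + u1, b)), u1 ~ N(0, sigma**2)."""
    lo = -math.inf if a is None else a
    hi = math.inf if b is None else b
    alpha = (lo - xb) / sigma
    beta = (hi - xb) / sigma
    cdf, pdf = _STD_NORMAL.cdf, _STD_NORMAL.pdf
    # Mass censored at a (and at b) times the censoring value; a missing
    # (infinite) bound carries no mass, so its term is 0.
    at_a = 0.0 if a is None else a * cdf(alpha)
    at_b = 0.0 if b is None else b * (1.0 - cdf(beta))
    # Uncensored part: Pr(a < y < b) * E(y | a < y < b).
    inside = (cdf(beta) - cdf(alpha)) * xb + sigma * (pdf(alpha) - pdf(beta))
    return at_a + inside + at_b


print(round(ystar_interval(0.0, 1.0, 0.0, None), 4))  # E[max(0, Z)]: 0.3989
print(ystar_interval(25.0, 5.0, None, None))          # no censoring: 25.0
```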
yexpected calculates the expected value of the dependent variable ($y_j^*$), where that value is taken to be 0 when it is expected to be unobserved; $E(y_j^*) = \Pr(y_j \text{ observed})\,E(y_j \mid y_j \text{ observed})$.
The assumption of 0 is valid for many cases where nonselection implies nonparticipation (for example, unobserved wage levels, insurance claims from those who are uninsured) but may be inappropriate for some problems (for example, unobserved disease incidence).
nshazard and mills are synonyms; both calculate the nonselection hazard, what Heckman (1979) referred to as the inverse of the Mills ratio, from the selection equation.
psel calculates the probability of selection (or being observed): $\Pr(y_j \text{ observed}) = \Pr(\mathbf{z}_j\boldsymbol{\gamma} + u_{2j} > 0)$.
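The selection-based statistics (psel, nshazard/mills, ycond, and yexpected) can be sketched in Python for a single observation under the textbook Heckman (1979) setup: $u_{2j} \sim N(0,1)$, $u_1$ with standard deviation $\sigma$, and $\operatorname{corr}(u_1, u_{2j}) = \rho$. Under those assumptions, $E(y_j \mid y_j \text{ observed}) = \mathbf{x}_j\mathbf{b} + \rho\sigma\lambda_j$ with $\lambda_j = \phi(\mathbf{z}_j\boldsymbol{\gamma})/\Phi(\mathbf{z}_j\boldsymbol{\gamma})$. The function names and inputs are hypothetical, and this is not Stata's internal code.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def psel(zg: float) -> float:
    """Pr(y observed) = Pr(zg + u2 > 0) = Phi(zg), with u2 ~ N(0, 1)."""
    return _STD_NORMAL.cdf(zg)


def mills(zg: float) -> float:
    """Nonselection hazard (inverse Mills ratio): phi(zg) / Phi(zg)."""
    return _STD_NORMAL.pdf(zg) / _STD_NORMAL.cdf(zg)


def ycond(xb: float, zg: float, rho: float, sigma: float) -> float:
    """E(y | y observed) = xb + rho * sigma * mills(zg)."""
    return xb + rho * sigma * mills(zg)


def yexpected(xb: float, zg: float, rho: float, sigma: float) -> float:
    """E(y*) with y* = 0 when unobserved: Pr(observed) * E(y | observed)."""
    return psel(zg) * ycond(xb, zg, rho, sigma)


# Hypothetical observation: xb = 20, zg = 0, rho = 0.5, sigma = 6.
print(psel(0.0))                                   # 0.5
print(round(mills(0.0), 4))                        # 0.7979
print(round(ycond(20.0, 0.0, 0.5, 6.0), 4))        # 22.3937
print(round(yexpected(20.0, 0.0, 0.5, 6.0), 4))    # 11.1968
```

With $\rho = 0$ the correction term vanishes and ycond equals the linear prediction xb, which is why the selection adjustment matters only when the two errors are correlated.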
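scores, not available with twostep, calculates equation-level score variables.
The first new variable will contain $\partial \ln L / \partial(\mathbf{x}_j\boldsymbol{\beta})$.
The second new variable will contain $\partial \ln L / \partial(\mathbf{z}_j\boldsymbol{\gamma})$.
The third new variable will contain $\partial \ln L / \partial(\operatorname{atanh} \rho)$.
The fourth new variable will contain $\partial \ln L / \partial(\ln \sigma)$.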
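The third and fourth scores are taken with respect to $\operatorname{atanh}\rho$ and $\ln\sigma$, the transformed ancillary parameters that heckman's ML estimator works with, rather than with respect to $\rho$ and $\sigma$ themselves. A minimal Python sketch of the back-transformation and of the chain rule that converts those scores into derivatives with respect to $\rho$ and $\sigma$; the function names are hypothetical.

```python
import math
from typing import Tuple


def ancillary(athrho: float, lnsigma: float) -> Tuple[float, float, float]:
    """Map the transformed parameters back to rho, sigma, and lambda."""
    rho = math.tanh(athrho)    # tanh keeps rho inside (-1, 1)
    sigma = math.exp(lnsigma)  # exp keeps sigma positive
    return rho, sigma, rho * sigma  # lambda = rho * sigma


def scores_natural(score_athrho: float, score_lnsigma: float,
                   rho: float, sigma: float) -> Tuple[float, float]:
    """Chain rule: d lnL/d rho and d lnL/d sigma from the transformed scores.

    d atanh(rho)/d rho = 1 / (1 - rho**2) and d ln(sigma)/d sigma = 1 / sigma.
    """
    return score_athrho / (1.0 - rho ** 2), score_lnsigma / sigma


print(ancillary(0.0, 0.0))  # (0.0, 1.0, 0.0)
```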
nooffset is relevant when you specify offset(varname) for heckman. It modifies the calculations made by predict so that they ignore the offset variable; the linear prediction is treated as $\mathbf{x}_j\mathbf{b}$ rather than as $\mathbf{x}_j\mathbf{b} + \text{offset}_j$.
margins
Description for margins
margins estimates margins of response for linear predictions, probabilities, expected values, and nonselection hazards.
Menu for margins
Statistics > Postestimation
Syntax for margins
    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]
    statistic          Description
    ---------------------------------------------------------------------------
    xb                 linear prediction; the default
    xbsel              linear prediction for selection equation
    pr(a,b)            $\Pr(y_j \mid a < y_j < b)$
    e(a,b)             $E(y_j \mid a < y_j < b)$
    ystar(a,b)         $E(y_j^*)$, $y_j^* = \max\{a, \min(y_j, b)\}$
  * ycond              $E(y_j \mid y_j \text{ observed})$
  * yexpected          $E(y_j^*)$, $y_j$ taken to be 0 where unobserved
    nshazard or mills  nonselection hazard (also called the inverse of Mills’s ratio)
    psel               $\Pr(y_j \text{ observed})$
    stdp               not allowed with margins
    stdf               not allowed with margins
    stdpsel            not allowed with margins
    ---------------------------------------------------------------------------
  * ycond and yexpected are not allowed with margins after heckman, twostep.
Statistics not allowed with margins are functions of stochastic quantities other than e(b).
For the full syntax, see [R] margins.
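Called without a marginlist, margins reports the sample average of the requested statistic (a predictive margin), together with delta-method standard errors. Only the point estimate is sketched below, for psel, using hypothetical values of $\mathbf{z}_j\boldsymbol{\gamma}$; the standard errors that margins also reports are not reproduced here.

```python
from statistics import NormalDist, fmean
from typing import Iterable


def average_psel(zg_values: Iterable[float]) -> float:
    """Average of Pr(y observed) = Phi(z_j * gamma) over observations."""
    cdf = NormalDist().cdf
    return fmean(cdf(zg) for zg in zg_values)


# Hypothetical linear predictions for the selection equation.
print(round(average_psel([-1.0, 0.0, 1.0]), 4))  # 0.5
```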
Remarks and examples
Example 1
The default statistic produced by predict after heckman is the expected value of the dependent variable from the underlying distribution of the regression model. In the wage model of [R] heckman, this is the expected wage rate among all women, regardless of whether they were observed to participate in the labor force:
. use www.stata-press/data/r18/womenwk
. heckman wage educ age, select(married children educ age) vce(cluster county)
  (output omitted)
. predict heckwage
(option xb assumed; fitted values)
It is instructive to compare these predicted wage values from the Heckman model with an ordinary regression model, that is, a model without the selection adjustment:
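. regress wage educ age

      Source |       SS           df       MS      Number of obs   =     1,343
-------------+----------------------------------   F(2, 1340)      =    227.49
       Model |  13524.0337         2  6762.01687   Prob > F        =    0.0000
    Residual |  39830.8609     1,340  29.7245231   R-squared       =    0.2535
-------------+----------------------------------   Adj R-squared   =    0.2524
       Total |  53354.8946     1,342  39.7577456   Root MSE        =     5.452

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   .8965829   .0498061    18.00   0.000     .7988765    .9942893
         age |   .1465739   .0187135     7.83   0.000      .109863    .1832848
       _cons |   6.084875   .8896182     6.84   0.000     4.339679    7.830071
------------------------------------------------------------------------------

. predict regwage
(option xb assumed; fitted values)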
. summarize heckwage regwage

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    heckwage |      2,000    21.15532     3.83965    14.6479   32.85949
     regwage |      2,000    23.12291    3.241911   17.98218   32.66439

Because this dataset was concocted, we know the true coefficients of the wage regression equation to be 1, 0.2, and 1 (on education, age, and the constant, respectively). We can compute the true mean wage for our sample.
. generate truewage = 1 + .2*age + 1*educ
. summarize truewage

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    truewage |      2,000     21.3256    3.797904         15       32.8

Whereas the mean of the predictions from heckman is within 18 cents of the true mean wage, ordinary regression yields predictions that are on average about $1.80 per hour too high because of the selection effect. The regression predictions also show somewhat less variation than the true wages.
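The dollar figures in the paragraph above follow directly from the summarize output. A short Python check of that arithmetic, using only the reported means and standard deviations:

```python
# Means reported by summarize in Example 1.
true_mean = 21.3256   # truewage
heck_mean = 21.15532  # heckwage
reg_mean = 23.12291   # regwage

heck_gap = true_mean - heck_mean  # heckman shortfall, dollars per hour
reg_gap = reg_mean - true_mean    # OLS overshoot, dollars per hour

print(f"{heck_gap:.4f}")  # 0.1703
print(f"{reg_gap:.4f}")   # 1.7973

# Spread of the OLS predictions versus the true wages.
reg_sd, true_sd = 3.241911, 3.797904
print(reg_sd < true_sd)   # True
```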
The coefficients from heckman are so close to the true values that they are not worth testing.
Conversely, the regression equation is significantly off but seems to give the right sense. Would we be led far astray if we relied on the OLS coefficients? The effect of age is off by more than 5 cents per year of age, and the coefficient on education level is off by about 10%. We can test the OLS coefficient on education level against the true value by using test.
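The coefficient gaps quoted above can be checked by hand from the regress output. The sketch below also forms the t ratio for the hypothesis that the education coefficient equals 1; for a single linear restriction, the Wald F statistic is the square of that ratio. This is only a hand calculation from the rounded output; test works from the full-precision estimates and VCE, so its statistic can differ in the last digits.

```python
# OLS estimates and standard error from the regress output in Example 1,
# alongside the true coefficients used to generate the data.
b_educ, se_educ = 0.8965829, 0.0498061
b_age = 0.1465739
true_educ, true_age = 1.0, 0.2

age_gap = true_age - b_age                       # dollars per year of age
educ_rel_gap = (true_educ - b_educ) / true_educ  # relative shortfall

# t ratio for H0: coefficient on education = 1; F = t**2 for one restriction.
t_educ = (b_educ - true_educ) / se_educ
f_educ = t_educ ** 2

print(f"{age_gap:.4f} {educ_rel_gap:.1%} {t_educ:.2f} {f_educ:.2f}")
# 0.0534 10.3% -2.08 4.31
```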