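Assignment 2:
Open "bankloan.sav". The data are customer default records collected by a bank. The dependent variable to analyze is default; the other variables are factors that may influence whether a customer defaults.
1. Use logistic regression, discriminant analysis, and classification trees to determine which variables affect customer default.
2. Compare the classification accuracy of these methods.
Logistic regression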
Classification Table (a, b)
Step | Observed: Previously defaulted | Predicted: No | Predicted: Yes | Percentage Correct |
Step 0 | No | 516 | 0 | 100.0 |
Step 0 | Yes | 183 | 0 | .0 |
Step 0 | Overall Percentage | | | 73.8 |
a. Constant is included in the model.
b. The cut value is .500
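As the table shows, "No" is the most frequent category of Previously defaulted, so the constant-only model assigns every case to "No". The resulting overall classification accuracy is 73.8%.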
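Hosmer and Lemeshow Test
Step | Chi-square | df | Sig. |
1 | 8.467 | 8 | .389 |
As the table shows, Chi-square = 8.467 and Sig. = 0.389 > 0.05. At the 5% significance level we cannot reject the null hypothesis that the model fits the data, so the test gives no evidence of poor fit.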
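Model Summary
Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
1 | 551.427a | .303 | .443 |
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
As the table shows, the Cox & Snell R Square (.303) and the Nagelkerke R Square (.443) are both moderate, which indicates that the model's fit is only moderate. The -2 Log likelihood (551.427) is a deviance, so smaller values are better; it is mainly useful for comparing models.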
Contingency Table for Hosmer and Lemeshow Test
Step | Group | Previously defaulted = No, Observed | Previously defaulted = No, Expected | Previously defaulted = Yes, Observed | Previously defaulted = Yes, Expected | Total |
Step 1 | 1 | 70 | 69.745 | 0 | .255 | 70 |
Step 1 | 2 | 69 | 68.805 | 1 | 1.195 | 70 |
Step 1 | 3 | 64 | 66.852 | 6 | 3.148 | 70 |
Step 1 | 4 | 64 | 63.931 | 6 | 6.069 | 70 |
Step 1 | 5 | 65 | 59.868 | 5 | 10.132 | 70 |
Step 1 | 6 | 51 | 54.527 | 19 | 15.473 | 70 |
Step 1 | 7 | 49 | 48.175 | 21 | 21.825 | 70 |
Step 1 | 8 | 39 | 40.568 | 31 | 29.432 | 70 |
Step 1 | 9 | 34 | 30.362 | 36 | 39.638 | 70 |
Step 1 | 10 | 11 | 13.166 | 58 | 55.834 | 69 |
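The Hosmer and Lemeshow statistic can be checked by hand from the contingency table: it is the sum of (observed - expected)^2 / expected over all twenty cells. The following is a minimal sketch in Python, not SPSS's own routine. The function names are hypothetical, and the p-value uses a closed-form chi-square tail that only holds for an even number of degrees of freedom (here 10 groups - 2 = 8).

```python
import math

# (observed_no, expected_no, observed_yes, expected_yes) for the ten groups,
# copied from the contingency table.
groups = [
    (70, 69.745, 0, 0.255),
    (69, 68.805, 1, 1.195),
    (64, 66.852, 6, 3.148),
    (64, 63.931, 6, 6.069),
    (65, 59.868, 5, 10.132),
    (51, 54.527, 19, 15.473),
    (49, 48.175, 21, 21.825),
    (39, 40.568, 31, 29.432),
    (34, 30.362, 36, 39.638),
    (11, 13.166, 58, 55.834),
]


def hosmer_lemeshow(rows):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o_no - e_no) ** 2 / e_no + (o_yes - e_yes) ** 2 / e_yes
        for o_no, e_no, o_yes, e_yes in rows
    )


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


hl_stat = hosmer_lemeshow(groups)
hl_df = len(groups) - 2  # 10 groups give 8 degrees of freedom
hl_p = chi2_sf_even_df(hl_stat, hl_df)
print(f"chi-square = {hl_stat:.3f}, df = {hl_df}, p = {hl_p:.3f}")
```

Up to rounding of the expected counts, the result should agree with the reported Chi-square of 8.467 and Sig. of .389.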
Classification Table (a)
Step | Observed: Previously defaulted | Predicted: No | Predicted: Yes | Percentage Correct |
Step 1 | No | 471 | 45 | 91.3 |
Step 1 | Yes | 90 | 93 | 50.8 |
Step 1 | Overall Percentage | | | 80.7 |
a. The cut value is .500
Comparing this table with the Step 0 table, the classification has clearly improved: overall accuracy rises from 73.8% to 80.7%.
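The percentages in the two classification tables follow directly from the cell counts. A short Python sketch makes the arithmetic explicit; the helper name `classification_rates` is hypothetical, and "Yes" (defaulted) is treated as the positive class.

```python
def classification_rates(tn, fp, fn, tp):
    """Return per-class and overall percent correct from a 2x2 table.

    tn: observed No,  predicted No     fp: observed No,  predicted Yes
    fn: observed Yes, predicted No     tp: observed Yes, predicted Yes
    """
    total = tn + fp + fn + tp
    return {
        "percent_correct_no": 100.0 * tn / (tn + fp),
        "percent_correct_yes": 100.0 * tp / (fn + tp),
        "overall": 100.0 * (tn + tp) / total,
    }


# Counts copied from the Step 0 and Step 1 classification tables.
step0 = classification_rates(tn=516, fp=0, fn=183, tp=0)
step1 = classification_rates(tn=471, fp=45, fn=90, tp=93)

for label, rates in (("Step 0", step0), ("Step 1", step1)):
    print(label, {name: round(value, 1) for name, value in rates.items()})
```

The gain in overall accuracy comes entirely from the "Yes" row: the full model identifies about half of the defaulters, while the constant-only model identifies none.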
Variables in the Equation
Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
Step 1a | age | .034 | .017 | 3.887 | 1 | .049 | 1.035 |
Step 1a | ed | .090 | .123 | .532 | 1 | .466 | 1.094 |
Step 1a | employ | -.258 | .033 | 60.385 | 1 | .000 | .773 |
Step 1a | address | -.105 | .023 | 20.251 | 1 | .000 | .901 |
Step 1a | income | -.009 | .008 | 1.159 | 1 | .282 | .991 |
Step 1a | debtinc | .067 | .031 | 4.881 | 1 | .027 | 1.070 |
Step 1a | creddebt | .625 | .113 | 30.724 | 1 | .000 | 1.869 |
Step 1a | othdebt | .062 | .077 | .642 | 1 | .423 | 1.064 |
Step 1a | Constant | -1.551 | .619 | 6.274 | 1 | .012 | .212 |
a. Variable(s) entered on step 1: age, ed, employ, address, income, debtinc, creddebt, othdebt.
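As the table shows, ed (Sig. = .466), income (Sig. = .282) and othdebt (Sig. = .423) have Sig. > 0.10, so their effects are not significant in this model. age (Sig. = .049) and debtinc (Sig. = .027) are significant at the 5% level, and employ, address and creddebt are highly significant (Sig. < .001).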
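The Exp(B) and Wald columns can be recovered from B and S.E.: Exp(B) = e^B is the odds ratio for a one-unit increase, and the Wald statistic is (B / S.E.)^2. The Python sketch below recomputes both from the rounded table values, so the Wald figures will differ slightly from the SPSS output, which uses unrounded estimates. The function names are hypothetical.

```python
import math

# (B, S.E.) pairs copied from the full-model "Variables in the Equation" table.
coefficients = {
    "age": (0.034, 0.017),
    "ed": (0.090, 0.123),
    "employ": (-0.258, 0.033),
    "address": (-0.105, 0.023),
    "income": (-0.009, 0.008),
    "debtinc": (0.067, 0.031),
    "creddebt": (0.625, 0.113),
    "othdebt": (0.062, 0.077),
}


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds of default per unit increase."""
    return math.exp(b)


def wald_statistic(b, se):
    """Wald chi-square with 1 df, computed from a coefficient and its S.E."""
    return (b / se) ** 2


summary = {
    name: (odds_ratio(b), wald_statistic(b, se))
    for name, (b, se) in coefficients.items()
}
for name, (ratio, wald) in summary.items():
    print(f"{name:9s} Exp(B) = {ratio:.3f}  Wald (from rounded inputs) = {wald:.1f}")
```

For example, the Exp(B) of about .773 for employ means that each additional year with the current employer multiplies the odds of default by roughly 0.77, holding the other variables fixed.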
Forward stepwise selection:
Variables not in the Equation
Step | Variable | Score | df | Sig. |
Step 0 | age | 13.210 | 1 | .000 |
Step 0 | ed | 9.099 | 1 | .003 |
Step 0 | employ | 55.843 | 1 | .000 |
Step 0 | address | 18.768 | 1 | .000 |
Step 0 | income | 3.526 | 1 | .060 |
Step 0 | debtinc | 106.506 | 1 | .000 |
Step 0 | creddebt | 42.116 | 1 | .000 |
Step 0 | othdebt | 14.871 | 1 | .000 |
Step 0 | Overall Statistics | 201.719 | 8 | .000 |
Step 1 | age | 16.400 | 1 | .000 |
Step 1 | ed | 10.307 | 1 | .001 |
Step 1 | employ | 60.633 | 1 | .000 |
Step 1 | address | 23.220 | 1 | .000 |
Step 1 | income | 3.218 | 1 | .073 |
Step 1 | creddebt | 2.302 | 1 | .129 |
Step 1 | othdebt | 6.679 | 1 | .010 |
Step 1 | Overall Statistics | 113.903 | 7 | .000 |
Step 2 | age | .006 | 1 | .940 |
Step 2 | ed | 3.792 | 1 | .052 |
Step 2 | address | 8.306 | 1 | .004 |
Step 2 | income | 21.294 | 1 | .000 |
Step 2 | creddebt | 64.810 | 1 | .000 |
Step 2 | othdebt | 4.429 | 1 | .035 |
Step 2 | Overall Statistics | 83.978 | 6 | .000 |
Step 3 | age | .637 | 1 | .425 |
Step 3 | ed | .016 | 1 | .898 |
Step 3 | address | 17.701 | 1 | .000 |
Step 3 | income | .798 | 1 | .372 |
Step 3 | othdebt | .009 | 1 | .925 |
Step 3 | Overall Statistics | 22.632 | 5 | .000 |
Step 4 | age | 3.591 | 1 | .058 |
Step 4 | ed | .380 | 1 | .538 |
Step 4 | income | .015 | 1 | .902 |
Step 4 | othdebt | .302 | 1 | .582 |
Step 4 | Overall Statistics | 5.179 | 4 | .269 |
Model if Term Removed
Step | Variable | Model Log Likelihood | Change in -2 Log Likelihood | df | Sig. of the Change |
Step 1 | debtinc | -401.879 | 103.208 | 1 | .000 |
Step 2 | employ | -350.275 | 69.972 | 1 | .000 |
Step 2 | debtinc | -369.531 | 108.484 | 1 | .000 |
Step 3 | employ | -349.117 | 123.054 | 1 | .000 |
Step 3 | debtinc | -299.505 | 23.831 | 1 | .000 |
Step 3 | creddebt | -315.289 | 55.398 | 1 | .000 |
Step 4 | employ | -333.305 | 110.179 | 1 | .000 |
Step 4 | address | -287.590 | 18.748 | 1 | .000 |
Step 4 | debtinc | -289.870 | 23.308 | 1 | .000 |
Step 4 | creddebt | -310.976 | 65.521 | 1 | .000 |
Classification Table (a)
Step | Observed: Previously defaulted | Predicted: No | Predicted: Yes | Percentage Correct |
Step 1 | No | 488 | 28 | 94.6 |
Step 1 | Yes | 135 | 48 | 26.2 |
Step 1 | Overall Percentage | | | 76.7 |
Step 2 | No | 479 | 37 | 92.8 |
Step 2 | Yes | 109 | 74 | 40.4 |
Step 2 | Overall Percentage | | | 79.1 |
Step 3 | No | 476 | 40 | 92.2 |
Step 3 | Yes | 99 | 84 | 45.9 |
Step 3 | Overall Percentage | | | 80.1 |
Step 4 | No | 477 | 39 | 92.4 |
Step 4 | Yes | 91 | 92 | 50.3 |
Step 4 | Overall Percentage | | | 81.4 |
a. The cut value is .500
Model Summary
Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
1 | 700.550a | .137 | .201 |
2 | 630.578b | .219 | .321 |
3 | 575.180b | .279 | .408 |
4 | 556.432c | .298 | .436 |
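The two pseudo R Square columns can be reproduced from the -2 Log likelihood values. The sketch below uses the standard Cox & Snell and Nagelkerke formulas and rests on two assumptions read off the other tables rather than stated by SPSS here: the sample size is 516 + 183 = 699 cases (from the classification tables), and the constant-only -2 Log likelihood is 2 x 401.879 = 803.758 (the model log likelihood with debtinc removed at Step 1 in "Model if Term Removed").

```python
import math

n_cases = 516 + 183          # assumption: cases counted in the classification tables
null_deviance = 2 * 401.879  # assumption: -2LL of the constant-only model
step_deviance = {1: 700.550, 2: 630.578, 3: 575.180, 4: 556.432}


def cox_snell_r2(dev_null, dev_model, n):
    """Cox & Snell R^2 = 1 - exp(-(D0 - D1) / n), with D = -2 log likelihood."""
    return 1.0 - math.exp(-(dev_null - dev_model) / n)


def nagelkerke_r2(dev_null, dev_model, n):
    """Cox & Snell R^2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(-dev_null / n)
    return cox_snell_r2(dev_null, dev_model, n) / max_r2


pseudo_r2 = {
    step: (
        cox_snell_r2(null_deviance, dev, n_cases),
        nagelkerke_r2(null_deviance, dev, n_cases),
    )
    for step, dev in step_deviance.items()
}
for step, (cs, nk) in pseudo_r2.items():
    print(f"Step {step}: Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")
```

Both measures rise at every step, with the largest jumps when employ and creddebt enter, which is consistent with the classification accuracy improving from 76.7% to 81.4%.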
Variables in the Equation
Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
Step 1a | debtinc | .132 | .014 | 85.544 | 1 | .000 | 1.141 |
Step 1a | Constant | -2.530 | .195 | 168.488 | 1 | .000 | .080 |
Step 2b | employ | -.141 | .019 | 53.493 | 1 | .000 | .869 |
Step 2b | debtinc | .145 | .016 | 87.340 | 1 | .000 | 1.156 |
Step 2b | Constant | -1.695 | .219 | 59.886 | 1 | .000 | .184 |
Step 3c | employ | -.244 | .027 | 79.984 | 1 | .000 | .784 |
Step 3c | debtinc | .088 | .018 | 23.375 | 1 | .000 | 1.092 |
Step 3c | creddebt | .502 | .081 | 38.608 | 1 | .000 | 1.653 |
Step 3c | Constant | -1.228 | .231 | 28.222 | 1 | .000 | .293 |
Step 4d | employ | -.242 | .028 | 74.540 | 1 | .000 | .785 |
Step 4d | address | -.081 | .020 | 17.043 | 1 | .000 | .922 |
Step 4d | debtinc | .088 | .019 | 22.686 | 1 | .000 | 1.092 |
Step 4d | creddebt | .572 | .087 | 43.043 | 1 | .000 | 1.773 |
Step 4d | Constant | -.794 | .252 | 9.954 | 1 | .002 | .452 |
a. Variable(s) entered on step 1: debtinc.
b. Variable(s) entered on step 2: employ.
c. Variable(s) entered on step 3: creddebt.
d. Variable(s) entered on step 4: address.
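To see how the final Step 4 model turns into a classification, the coefficients can be plugged into the logistic function and compared with the .500 cut value. This is a minimal Python sketch; the two customers are hypothetical values made up for illustration and are not rows from bankloan.sav, and the function names are likewise hypothetical.

```python
import math

# Step 4 coefficients (B column) from "Variables in the Equation".
STEP4 = {
    "constant": -0.794,
    "employ": -0.242,
    "address": -0.081,
    "debtinc": 0.088,
    "creddebt": 0.572,
}


def default_probability(employ, address, debtinc, creddebt):
    """Predicted probability of default under the Step 4 logistic model."""
    logit = (
        STEP4["constant"]
        + STEP4["employ"] * employ
        + STEP4["address"] * address
        + STEP4["debtinc"] * debtinc
        + STEP4["creddebt"] * creddebt
    )
    return 1.0 / (1.0 + math.exp(-logit))


def predicted_default(probability, cut=0.5):
    """Apply the cut value used in the classification tables."""
    return "Yes" if probability >= cut else "No"


# Hypothetical customers, for illustration only.
stable = default_probability(employ=10, address=5, debtinc=10, creddebt=1)
risky = default_probability(employ=1, address=1, debtinc=25, creddebt=4)

print(f"stable customer: p = {stable:.3f} -> {predicted_default(stable)}")
print(f"risky customer:  p = {risky:.3f} -> {predicted_default(risky)}")
```

Longer employment and residence lower the predicted probability, while a higher debt to income ratio and more credit card debt raise it, matching the signs of B in the table.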
Discriminant analysis
Stepwise discriminant analysis is used here; the results are shown in the tables below.
Variables in the Analysis
Step | Variable | Tolerance | F to Remove | Wilks' Lambda |
1 | Debt to income ratio (x100) | 1.000 | 125.293 | |
2 | Debt to income ratio (x100) | .992 | 130.842 | .920 |
2 | Years with current employer | .992 | 65.708 | .848 |
3 | Debt to income ratio (x100) | .766 | 36.043 | .766 |
3 | Years with current employer | .716 | 111.035 | .844 |
3 | Credit card debt in thousands | .573 | 44.384 | .775 |
4 | Debt to income ratio (x100) | .766 | 35.137 | .753 |
4 | Years with current employer | .691 | 89.788 | .809 |
4 | Credit card debt in thousands | .564 | 48.856 | .767 |
4 | Years at current address | .898 | 10.895 | .728 |
Variables Not in the Analysis
Step | Variable | Tolerance | Min. Tolerance | F to Enter | Wilks' Lambda |
0 | Age in years | 1.000 | 1.000 | 13.426 | .981 |
0 | Level of education | 1.000 | 1.000 | 9.192 | .987 |
0 | Years with current employer | 1.000 | 1.000 | 60.518 | .920 |
0 | Years at current address | 1.000 | 1.000 | 19.231 | .973 |
0 | Household income in thousands | 1.000 | 1.000 | 3.534 | .995 |
0 | Debt to income ratio (x100) | 1.000 | 1.000 | 125.293 | .848 |
0 | Credit card debt in thousands | 1.000 | 1.000 | 44.688 | .940 |
0 | Other debt in thousands | 1.000 | 1.000 | 15.151 | .979 |
1 | Age in years | .994 | .994 | 17.403 | .827 |
1 | Level of education | .999 | .999 | 10.146 | .835 |
1 | Years with current employer | .992 | .992 | 65.708 | .775 |
1 | Years at current address | .993 | .993 | 23.970 | .819 |
1 | Household income in thousands | 1.000 | 1.000 | 3.029 | .844 |
1 | Credit card debt in thousands | .793 | .793 | 2.723 | .844 |
1 | Other debt in thousands | .664 | .664 | 8.594 | .837 |
2 | Age in years | .725 | .723 | .003 | .775 |
2 | Level of education | .983 | .977 | 4.404 | .770 |
2 | Years at current address | .912 | .911 | 6.601 | .767 |
2 | Household income in thousands | .604 | .599 | 17.057 | .756 |
2 | Credit card debt in thousands | .573 | .573 | 44.384 | .728 |
2 | Other debt in thousands | .486 | .486 | 1.980 | .772 |
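At Step 0 each candidate variable is tested on its own, so its Wilks' Lambda and F to Enter carry the same information: with two groups, F = ((1 - Lambda) / Lambda) x (n - 2), hence Lambda = 1 / (1 + F / (n - 2)). The Python sketch below checks this against the Step 0 rows. The sample size n = 699 is an assumption taken from the logistic regression classification tables, since the discriminant output shown here does not state it, and the function name is hypothetical.

```python
def wilks_lambda_single(f_to_enter, n_cases, n_groups=2):
    """Wilks' Lambda of a one-variable model, recovered from its F statistic."""
    df_between = n_groups - 1
    df_within = n_cases - n_groups
    return 1.0 / (1.0 + f_to_enter * df_between / df_within)


N_CASES = 699  # assumption: same cases as the logistic regression tables

# F to Enter at Step 0, copied from "Variables Not in the Analysis".
step0_f = {
    "Age in years": 13.426,
    "Level of education": 9.192,
    "Years with current employer": 60.518,
    "Years at current address": 19.231,
    "Household income in thousands": 3.534,
    "Debt to income ratio (x100)": 125.293,
    "Credit card debt in thousands": 44.688,
    "Other debt in thousands": 15.151,
}

step0_lambda = {
    name: wilks_lambda_single(f_value, N_CASES) for name, f_value in step0_f.items()
}
# Smallest Lambda first: that variable separates the two groups best on its own.
for name, value in sorted(step0_lambda.items(), key=lambda item: item[1]):
    print(f"{name:32s} Wilks' Lambda = {value:.3f}")
```

The variable with the smallest Lambda (equivalently the largest F to Enter) is Debt to income ratio, which is why it enters first, the same first variable chosen by the forward logistic regression.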