* FLS 6183
* FLS 468
**
* Lab -- Introducing Linear Probability Models and Logits

***********************************************************************
* (A) LINEAR PROBABILITY MODEL
***********************************************************************

* Open the data set
use "C:\Documents and Settings\Lorena\Meus documentos\Dropbox\Research Methods\Fundamentals\nes2004subset.dta"

* Create a log file
log using "C:\Documents and Settings\Lorena\Meus documentos\Dropbox\Research Methods\USP FLS 6183\Laboratorios\lab 1.smcl", replace

* Let's look at the distributions of our variables.
sum bush partyid eval_WoT eval_HoE
tab bush
tab partyid
tab eval_WoT
tab eval_HoE

* Linear Probability Model Estimation
regress bush partyid eval_WoT eval_HoE
estimates store lpm

* Let us use estout, a user-written package that helps us organize our output.
estout lpm

* Let us now add the standard errors and significance stars to make our results easier to interpret.
estout lpm, cells(b(star fmt(3)) se(par fmt(2)))

* An even more polished table: adjusted R-squared, root MSE and N, plus a legend and labels.
estout lpm, cells(b(star fmt(3)) se(par fmt(2))) ///
    stats(r2_a rmse N, fmt(%9.3f %9.0g) labels("Adj. R-squared")) ///
    legend label collabels(none) varlabels(_cons Constant)

* Now let us try to understand the model and how it works in this case.

* First, let's look at the residuals of the model.
predict uhat, resid
label variable uhat "residuals"
sum uhat, detail
kdensity uhat

* Second, let us examine the predicted values.
predict yhat, xb
sum yhat
kdensity yhat

* Interpreting Parameters in the LPM
* P(bush=1) = _b[_cons] + _b[partyid]*partyid + _b[eval_WoT]*eval_WoT + _b[eval_HoE]*eval_HoE

* What is the predicted probability of voting for Bush for an individual with a party identification of 3,
* with the other variables set to 0 (the value used in the computation below)?
di .0888*3 + .0790844*0 + .076001*0 + .2459423
* alternatively
lincom _b[_cons] + _b[partyid]*3 + _b[eval_WoT]*0 + _b[eval_HoE]*0

* What is the estimated effect on the predicted probability of voting for Bush if an individual changed
* his party identification from 3 to 4, with the other variables held at the same values?
di .0888*4 + .0790844*0 + .076001*0 + .2459423
* alternatively
lincom _b[_cons] + _b[partyid]*4 + _b[eval_WoT]*0 + _b[eval_HoE]*0
* The difference between the two predictions equals the partyid coefficient.
di .6011423-.5123423

* Now let us examine the observed values in the data alongside the predicted probability that bush=1 (yhat).
list bush partyid eval_WoT eval_HoE yhat
rvfplot, yline(0)

* Finally, let's examine the problems with the LPM.

* heteroskedasticity
estat hettest
rvfplot, yline(0)

* normal distribution of the error term
kdensity uhat, normal

* linearity in the parameters
graph twoway (scatter yhat partyid) (lfit yhat partyid)

***********************************************************************
* (B) GLM and MLE
***********************************************************************

* Let us compare the GLM estimates (binomial family, logit link) with the logit (MLE) estimates.
glm bush partyid eval_WoT eval_HoE, family(binomial 1) link(logit)
logit bush partyid eval_WoT eval_HoE

***********************************************************************
* (C) Maximum Likelihood Estimation (MLE) for Binary Outcomes
***********************************************************************

* Returning to our example, let's review the estimation of the logit model for the vote for Bush in 2004.
logit bush partyid eval_WoT eval_HoE

* If we want to see the exponentiated coefficients, we estimate the same model with ", or" to obtain the odds ratios.
logit bush partyid eval_WoT eval_HoE, or

* To obtain the predicted log odds for a "Mega Republican"
adjust partyid=6 eval_WoT=-2 eval_HoE=2, xb

* To obtain the odds, exponentiate the predicted log odds reported above
di exp(2.7477)

* To obtain the predicted probability for a "Mega Republican"
adjust partyid=6 eval_WoT=-2 eval_HoE=2, pr

***********************************************************************
* (D) Maximum Likelihood Estimation (MLE) for Binary Outcomes: Intercept-Only Model
***********************************************************************

* Logit for intercept only
logit bush
adjust, pr
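
* A possible check on the intercept-only model (a sketch added for illustration, not part of the original lab).
* With no covariates, the logit constant is the log odds of the sample proportion of Bush voters,
* so transforming it back should reproduce the mean of bush in the estimation sample.
* invlogit() is Stata's built-in inverse logit, exp(x)/(1+exp(x)).
logit bush
di invlogit(_b[_cons])
sum bush if e(sample)
* The two numbers should agree up to rounding and should match the "adjust, pr" result above.

* Assumption: the log opened at the start of the lab is still open and should be closed here.
* "capture" avoids an error if no log is open.
capture log close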