* FLS 6183 * FLS 468 ** 

* Lab -- Introducing Linear Probability Models and Logits


***********************************************************************
* (A) LINEAR PROBABILITY MODEL
***********************************************************************

* Open the data set
use "C:\Documents and Settings\Lorena\Meus documentos\Dropbox\Research Methods\Fundamentals\nes2004subset.dta", clear

* Create a log file (first close any log left open by a previous run)
capture log close
log using "C:\Documents and Settings\Lorena\Meus documentos\Dropbox\Research Methods\USP FLS 6183\Laboratorios\lab 1.smcl", replace

* Let's look at the distributions of our variables.
sum bush partyid eval_WoT eval_HoE
tab bush
tab partyid
tab eval_WoT
tab eval_HoE


* Linear Probability Model Estimation
regress bush partyid eval_WoT eval_HoE
estimates store lpm
* Let us use estout, a user-written package (install it once with: ssc install estout), to help us organize our output
estout lpm
* Let us now add the standard errors and significance stars to make our results easier to interpret.
estout lpm, cells(b(star fmt(3)) se(par fmt(2)))
* An even cleaner table, with fit statistics and labels
estout lpm, cells(b(star fmt(3)) se(par fmt(2))) stats(r2_a rmse N, fmt(%9.3f %9.0g) labels("Adj. R-squared" "RMSE" "N")) legend label collabels(none) varlabels(_cons Constant)

* Now let us try to understand the model and how it works in this case.

* First, let's look at the residuals of the model.

predict uhat, resid
label variable uhat "residuals"
sum uhat, detail
kdensity uhat

* Second, let us examine the predicted values. 
predict yhat, xb
sum yhat
kdensity yhat

* Interpreting Parameters in the LPM

* P(Y=1) = b0 + b1*partyid + b2*eval_WoT + b3*eval_HoE

* What is the predicted probability of voting for Bush for an individual with a party identification of 3 when the other variables are held at 0?
* (The coefficients are copied by hand from the regress output above.)
di .2459423 + .0888*3 + .0790844*0 + .076001*0

* alternatively

lincom _b[_cons] + _b[partyid]*3 + _b[eval_WoT]*0 + _b[eval_HoE]*0

* What is the estimated effect on the predicted probability of voting for Bush if an individual changed their party identification from 3 to 4 while the other variables are held at 0?

di .2459423 + .0888*4 + .0790844*0 + .076001*0

* alternatively
lincom _b[_cons] + _b[partyid]*4 + _b[eval_WoT]*0 + _b[eval_HoE]*0

di .6011423-.5123423
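
* A minimal check, added as a sketch and not part of the original lab: in a linear
* model, a one-unit change in partyid shifts the predicted probability by exactly the
* partyid coefficient, whatever values the other variables are held at. This assumes
* the regress results above are still the active estimates.
di _b[partyid]
lincom partyid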

* Now let us compare the actual observed values in the data with the predicted probability that bush = 1 (yhat).

list bush partyid eval_WoT eval_HoE yhat

rvfplot, yline(0)

* Finally, let's examine the problems with the LPM Model. 

* heteroskedasticity

estat hettest
rvfplot, yline(0)
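
* One common response to heteroskedasticity in the LPM, shown here as a sketch that is
* not part of the original lab, is to re-estimate the model with
* heteroskedasticity-robust standard errors. The coefficients are unchanged; only the
* standard errors (and hence the t-statistics) differ.
regress bush partyid eval_WoT eval_HoE, vce(robust)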

* normal distribution of the error term

kdensity uhat, normal

* linearity in the parameters

graph twoway (scatter yhat partyid) (lfit yhat partyid)
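
* out-of-range predictions
* (A sketch added to the lab: nothing in the LPM constrains the fitted values to the
* [0,1] interval. Assuming yhat was created by the predict command above, this counts
* how many predicted "probabilities" fall outside that interval.)
count if (yhat < 0 | yhat > 1) & !missing(yhat)
sum yhat if (yhat < 0 | yhat > 1) & !missing(yhat)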

***********************************************************************
* (B) GLM and MLE
***********************************************************************

* Let us compare the GLM estimates (binomial family, logit link) with the logit estimates obtained by maximum likelihood

glm bush partyid eval_WoT eval_HoE, family(binomial 1) link(logit)

logit bush partyid eval_WoT eval_HoE 
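
* To make the comparison easier to read, the two sets of estimates can be placed side
* by side. This is a sketch added to the lab; the names m_glm and m_logit are
* hypothetical labels chosen here, not names used elsewhere in the course material.
quietly glm bush partyid eval_WoT eval_HoE, family(binomial) link(logit)
estimates store m_glm
quietly logit bush partyid eval_WoT eval_HoE
estimates store m_logit
* The coefficients in the two columns should agree up to numerical precision
estimates table m_glm m_logit, b(%9.4f) se(%9.4f)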


***********************************************************************
* (C) Maximum Likelihood Estimation (MLE) for Binary Outcomes
***********************************************************************

* Returning to our example, let's review the estimation of the logit model for the vote for Bush in 2004

logit bush partyid eval_WoT eval_HoE 

* If we want to see the exponentiated coefficients, we estimate the same model with the ", or" option, which reports odds ratios.

logit bush partyid eval_WoT eval_HoE, or
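
* A quick check, added as a sketch: the "or" option only changes how the results are
* displayed. The stored coefficients are still on the log-odds scale, so exponentiating
* one by hand should reproduce the odds ratio reported for partyid in the table above.
di _b[partyid]
di exp(_b[partyid])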

* To obtain the predicted log odds for a "Mega Republican"

adjust partyid=6 eval_WoT=-2 eval_HoE=2, xb

* To obtain the odds, exponentiate the predicted log odds reported by adjust above

di exp(2.7477)

* To obtain the predicted probability for a "Mega Republican"

adjust partyid=6 eval_WoT=-2 eval_HoE=2, pr
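
* A sketch of the same calculations with margins, which superseded adjust in Stata 11.
* This assumes Stata 11 or later and is not part of the original lab. The first line
* gives the predicted log odds and the second the predicted probability (the default
* prediction after logit).
margins, at(partyid=6 eval_WoT=-2 eval_HoE=2) predict(xb)
margins, at(partyid=6 eval_WoT=-2 eval_HoE=2)
* The probability can also be recovered by hand from the log odds used above:
* p = exp(xb) / (1 + exp(xb)), which is what invlogit() computes
di invlogit(2.7477)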

***********************************************************************
* (D) Maximum Likelihood Estimation (MLE) for Binary Outcomes: Intercept-Only Model
***********************************************************************

* Logit for intercept only
logit bush
adjust, pr
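
* A sketch added to the lab: with no covariates, the predicted probability from the
* logit model is simply the sample proportion of bush = 1, and the intercept is the log
* odds of that proportion. The two numbers displayed below should therefore match.
* This assumes the intercept-only logit above is still the active estimation result.
quietly sum bush
di r(mean)
di invlogit(_b[_cons])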