* USP Prof. Lorena G Barberia
* 2019
* Lab Class 9 - Robustness tests
* Outliers and Jackknife

* For this lab we will continue working with Cox's data testing Duverger's
* theory on the determinants of the number of parties.

cd "C:\Users\Gui\Dropbox\2019 USP Methods II\Labs\Lab Class 9 - Four Approaches to Robustness Testing"
use "Cox Data.dta", clear

* First, let's look at the data to identify outliers in each of our variables:
graph box enps
graph box eneth
graph box ml

* As we can see, there are outliers in our main variables.
* One way of dealing with oddly distributed data is to work with its log:
gen lnenps = ln(enps)
graph box lnenps
graph box lneneth
graph box lnml

* Even with fewer outliers, we still have some, and these cases could be biasing
* our results.

* We can also check whether any observations look like outliers in relation to
* our dependent variable:
twoway (scatter enps eneth) (lfit enps eneth)
twoway (scatter enps ml) (lfit enps ml)

* We can look at these same figures in terms of the logs of our variables:
twoway (scatter lnenps lneneth) (lfit lnenps lneneth)
twoway (scatter lnenps lnml) (lfit lnenps lnml)

* These figures look much better in terms of outliers, so it might be preferable
* to work with logs.
* For now, let's return to the original variables we analyzed in the last lab.

* Can you find some countries that are outliers for each variable?

* One strategy for assessing whether outliers are biasing our results is to
* remove these cases from our sample.
* What are the costs and benefits?

* First, let's examine our regression results with the entire sample.
findit eststo
eststo m1: reg enps c.eneth##c.lnml
eststo m2: reg enps c.eneth##c.lnml, rob
esttab m1 m2, se

* Now, let's examine our regression results excluding some specific cases.
* What differences do you observe in the models? Are you more confident in these
* results or in the earlier full-sample results?
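* (Added sketch, not part of the original lab: before dropping cases by hand,
* one common diagnostic is to inspect studentized residuals from the full-sample
* model; observations with |rstudent| > 2 are conventional candidates for
* influential cases. The variable name rstu is our own choice.)
quietly reg enps c.eneth##c.lnml
predict rstu, rstudent
list ctry enps eneth rstu if abs(rstu) > 2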
eststo m3: reg enps c.eneth##c.lnml if ctry != "BELGIUM" & ctry != "NETHERLANDS" & ctry != "ISRAEL"
esttab m1 m2 m3, se

* Now, it is your turn. Please consider the outlier cases and perform your own
* analysis without the cases you identify as most problematic.

* Jackknife:
/* There are other ways to increase the robustness of our hypothesis tests.
One technique, called "jackknifing", is described by Neumayer and Plumper in
the assigned readings for this week. The jackknife reruns the regression
removing one observation at a time; in our case it runs 54 regressions, each
one leaving out a different country. */

* The loop below runs a separate regression excluding one country at a time and
* stores the coefficients estimated without that country.
gen ctrynumber = _n
gen beta_lnml = 0
gen beta_eneth = 0
gen beta_interaction = 0
gen interaction = lnml*eneth
levelsof ctrynumber, local(ctrynumber)
foreach i of local ctrynumber {
	display "regression without ctry `i'"
	regress enps eneth lnml interaction if ctrynumber != `i'
	replace beta_eneth = _b[eneth] if ctrynumber == `i'
	replace beta_lnml = _b[lnml] if ctrynumber == `i'
	replace beta_interaction = _b[interaction] if ctrynumber == `i'
}

* What do you see in the different regressions run? Can you identify the ones
* with results furthest from the full-sample regression? Are those the same
* outliers you found previously?

* Now let us work with the jackknife command:
* It will run the 54 regressions we estimated previously, saving the output from
* each regression (where one country is dropped from the sample at a time).
findit jknife
reg enps eneth lnml
jackknife, noisily: reg enps eneth lnml /* this option shows all 54 regressions */
jackknife: reg enps eneth lnml
reg enps eneth lnml, vce(jackknife) /* this option shows the final model based on the 54 regressions */
eststo m4: reg enps c.eneth##c.lnml, vce(jackknife)
esttab m1 m2 m3 m4, se

* Can you identify what changed with the jackknife command? Why is it a robustness test?
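
* (Added sketch, not part of the original lab: the coefficients stored by the
* leave-one-out loop above can be summarized to spot the countries whose removal
* moves the estimates most. The beta_* variable names come from that loop.)
sum beta_eneth beta_lnml beta_interaction
gsort -beta_interaction
list ctry beta_eneth beta_lnml beta_interaction in 1/5
gsort beta_interaction
list ctry beta_eneth beta_lnml beta_interaction in 1/5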