* Lorena G. Barberia
* Adapted from Andy Philips (2013)
* 31/10/2019
* Lab Class 10 - Clarify
* Bootstrapping and Clarify
* In this lab, we will briefly explore how bootstrapping and Clarify work.
*
* -----------------------------------------------------------------------

* This .do file is based on generated data
* Data Generating Process:
clear
set seed 345
set obs 120
gen e1 = rnormal()
gen x1 = rnormal()
gen x2 = rnormal()
sum e1 x1 x2
gen y = 2*x1 + 3*x2 + e1
kdensity y

* Now let's examine the regression results for the DGP
regress y x1 x2
estimates store m1_original

* Let's examine the residuals and fitted values
predict uhat_m1_original, resid
predict yhat_m1_original, xb
sum uhat_m1_original yhat_m1_original

* BOOTSTRAPPING --------------------------------------------------------------
/* Bootstrapped standard errors for a statistic can be obtained as follows:
1. write a program (only if you are bootstrapping a custom statistic;
   a sketch follows the example below)
2. load in the data
3. drop missing values (Stata will not flag them for you)
4. drop unneeded variables (this speeds up the bootstrap)
5. set the seed
6. run bootstrap */

* First drop any missing obs
foreach var in y x1 x2 {
	drop if `var' == .
}
reg y x1 x2
bootstrap, reps(1000): regress y x1 x2
estimates store m2_bootstrap
predict uhat_m2_bootstrap, resid
predict yhat_m2_bootstrap, xb
sum uhat_m1_original uhat_m2_bootstrap yhat_m1_original yhat_m2_bootstrap
* if needed: ssc install coefplot
coefplot m1_original m2_bootstrap, drop(_cons) xline(0)
/* Can you identify the difference produced by the bootstrapping? */
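* A minimal sketch of step 1 in the list above: bootstrapping a user-written
* (rclass) statistic. The program name ratio_x1x2 and the choice of statistic
* (the ratio of the two slopes) are illustrative assumptions, not part of the
* original lab.
capture program drop ratio_x1x2
program define ratio_x1x2, rclass
	regress y x1 x2
	return scalar ratio = _b[x1]/_b[x2]
end
bootstrap ratio = r(ratio), reps(1000) seed(345): ratio_x1x2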
* Jackknife --------------------------------------------------------------
reg y x1 x2, vce(jackknife)
estimates store m3_jacknife
predict uhat_m3_jacknife, resid
predict yhat_m3_jacknife, xb
sum uhat_m1_original uhat_m2_bootstrap uhat_m3_jacknife yhat_m1_original yhat_m2_bootstrap yhat_m3_jacknife
coefplot m1_original m2_bootstrap m3_jacknife, drop(_cons) xline(0)
/* Can you identify the difference produced by the jackknife? */

* Clarify --------------------------------------------------------------
* To install Clarify (run once):
* net from http://gking.harvard.edu/clarify
* net install clarify
/* Clarify can do some of the things we have done so far more quickly; look at
the help file for more information on the commands we will use in this lab */
estsimp regress y x1 x2
* Note the labels for the new variables b1, b2, b3, b4:
* b1 and b2 are the simulated coefficients on x1 and x2
* b3 is the simulated parameter for the constant
* b4 is sigma^2
sum
* How do the standard deviations of b1-b3 compare with the original standard
* errors of the coefficients?
* What about the coefficients themselves? What differences do you recognize?

* Now let's compare these outputs with the uhat above
gen uhat_clarify = sqrt(b4)
sum uhat_m1_original uhat_m2_bootstrap uhat_m3_jacknife uhat_clarify

* Let's compare predicted values of y with pv and ev
drop b1 b2 b3 b4
estsimp regress y x1 x2
setx mean
simqi, pv
simqi, ev

* Let's compare with the original yhat and the yhat under bootstrap and jackknife.
* Note that these are standard deviations, and we would need to calculate the
* standard errors to compare them directly.
sum yhat_m1_original yhat_m2_bootstrap yhat_m3_jacknife

* Questions
* 1. How do the estimates from the bootstrap model compare to the original results? Why?
* 2. How do the jackknife estimates compare with the original regression estimates? Why?
* 3. How do the Clarify estimates compare with the original regression estimates? Why?
* 4. How do the standard errors from Clarify compare to the original, bootstrap, and jackknife results? Why?
* 5. Why do the results of the Clarify simqi command differ between expected values (ev option) and predicted values (pv option)?
* 6. How does running Clarify relate to the robustness tests we ran in the last lab? What are the differences?
* 7. What are the advantages and disadvantages of each of the methods discussed in this lab?
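* -----------------------------------------------------------------------
* Optional sketch for question 5, using core Stata only (no Clarify). This is
* a minimal illustration under simplifying assumptions: the betas are drawn
* from a multivariate normal centered on the estimates, and sigma is held
* fixed at the root MSE rather than simulated (unlike Clarify). The sim_*
* variable names are illustrative, not part of the lab.
* Expected values reflect only estimation uncertainty in the betas; predicted
* values add a fresh error draw, so their spread is wider.
preserve
quietly regress y x1 x2
matrix b = e(b)
matrix V = e(V)
scalar rmse = e(rmse)
sum x1, meanonly
scalar x1bar = r(mean)
sum x2, meanonly
scalar x2bar = r(mean)
clear
set seed 345
set obs 1000
drawnorm sim_b1 sim_b2 sim_cons, means(b) cov(V)
* systematic component evaluated at the means of x1 and x2 (the "ev" analogue)
gen sim_ev = sim_b1*x1bar + sim_b2*x2bar + sim_cons
* add fundamental uncertainty: one error draw per simulation (the "pv" analogue)
gen sim_pv = sim_ev + rnormal(0, rmse)
sum sim_ev sim_pv
restore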