* Lorena G. Barberia
* Adapted from Andy Philips (2013)
* 11/1/2018
*
* Bootstrapping and Clarify
* In this lab, we will briefly explore how bootstrapping and Clarify work.
*
* -----------------------------------------------------------------------

* This .do file is based on generated data
* Data Generating Process:
clear
set seed 345
set obs 120
gen e1 = rnormal()
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 + e1
kdensity y

* Now let's examine the regression results of the DGP
regress y x1 x2
estimates store m1_original

* Let's examine the residuals and fitted values
predict uhat_m1_original, resid
predict yhat_m1_original, xb
sum uhat_m1_original yhat_m1_original

* BOOTSTRAPPING --------------------------------------------------------------
/* Bootstrapped standard errors for a statistic can be obtained as follows:
1. write a program (if you have a custom statistic)
2. load in the data
3. drop missing values (Stata will not discern if you have missing values)
4. drop unneeded variables (this speeds up the bootstrap)
5. set seed
6. run bootstrap
*/
* First drop any missing obs
foreach var in y x1 x2 {
    drop if `var' == .
}

reg y x1 x2
bootstrap, reps(1000): regress y x1 x2
estimates store m2_bootstrap
predict uhat_m1_bootstrap, resid
predict yhat_m1_bootstrap, xb
sum uhat_m1_original uhat_m1_bootstrap yhat_m1_original yhat_m1_bootstrap
* coefplot is a user-written command: ssc install coefplot
coefplot m1_original m2_bootstrap, drop(_cons) xline(0)
/* Note how the t-scores are slightly lower in the bootstrap, but the coeffs.
remain the same. For speed we can drop unneeded vars. */

* Clarify --------------------------------------------------------------
* net from http://gking.harvard.edu/clarify
* net install clarify
estsimp regress y x1 x2
* Note the labels for b1, b2, b3, b4
* b4 is sigma^2
* b1 and b2 are the betas for the regression parameters
* b3 is the simulated parameter for the constant
sum
* How do the coefficient standard errors compare with the original standard errors for the coefficients?
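
* A possible shortcut (a sketch, not part of the original lab): the mean, the
* standard deviation and a percentile interval of the simulated parameters can
* be read from the draws themselves rather than typed in from the output.
* This assumes estsimp has already created b1-b4 and that m1_original is stored.
quietly summarize b1
display "simulated b1: mean = " r(mean) ", sd = " r(sd)
_pctile b1, p(2.5 97.5)
display "95% simulation interval for b1: " r(r1) " to " r(r2)
quietly summarize b2
display "simulated b2: mean = " r(mean) ", sd = " r(sd)
_pctile b2, p(2.5 97.5)
display "95% simulation interval for b2: " r(r1) " to " r(r2)
* OLS coefficients and standard errors, for a side-by-side comparison
estimates table m1_original, b(%9.4f) se(%9.4f)
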
* standard error of b1: sd of the simulated b1 divided by sqrt(1000 simulations)
* (the numbers below are taken from the sum output above)
di .0881776/sqrt(1000)
* confidence interval of b1
di 1.9432 + (.00278842*1.96)
di 1.9432 - (.00278842*1.96)

* standard error of b2
di .1115355/sqrt(1000)
* confidence interval of b2
di 2.792914 + (.00352706*1.96)
di 2.792914 - (.00352706*1.96)

* Now let's compare these outputs with the uhat above.
* sqrt(b4) is the simulated sigma, so compare it with the sd of the residuals
gen uhat_clarify = sqrt(b4)
sum uhat_m1_original uhat_m1_bootstrap uhat_clarify

* Let's compare predicted values of y with pv and ev
setx mean
simqi, pv
simqi, ev

* Let's compare with the original yhat and yhat under bootstrap
* Note that these are standard deviations and we would need to calculate the standard errors to compare them directly
sum yhat_m1_original yhat_m1_bootstrap

drop b1 b2 b3 b4

* Questions
* 1. How do the estimates from the bootstrap model compare to the original results? Why?
* 2. How do the estimates from Clarify compare with the original regression estimates? Why?
* 3. How do the standard errors from Clarify compare to the original results? Why?
* 4. Why do the Clarify results with expected values and predicted values differ?
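
* Optional illustration (a sketch, not part of the original lab) of what the
* bootstrap prefix does: resample rows with replacement, refit the model, and
* keep the coefficients. The sd of the stored coefficients across replications
* is the bootstrap standard error. The names sims, bootres, b_x1, b_x2,
* boot_se_x1 and boot_se_x2 are made up for this example. It assumes y, x1, x2
* are still in memory and that m1_original and m2_bootstrap are stored.
set seed 345
tempname sims
tempfile bootres
postfile `sims' b_x1 b_x2 using `bootres'
forvalues i = 1/1000 {
    preserve
    * keep complete rows only (estsimp may have added empty observations)
    quietly keep if !missing(y, x1, x2)
    bsample                              // draw _N rows with replacement
    quietly regress y x1 x2
    post `sims' (_b[x1]) (_b[x2])
    restore
}
postclose `sims'
preserve
use `bootres', clear
quietly summarize b_x1
scalar boot_se_x1 = r(sd)
quietly summarize b_x2
scalar boot_se_x2 = r(sd)
restore
display "hand-rolled bootstrap s.e.: x1 = " boot_se_x1 ", x2 = " boot_se_x2
* compare with the OLS and bootstrap-prefix standard errors
estimates table m1_original m2_bootstrap, b(%9.4f) se(%9.4f)
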