* USP 2/2019
* Course: FLS6183
* Authors: Lorena Barberia & Maria Leticia Claro
* Lab 6 - Omitted Variable Bias

* eststo and estout are user-written commands from the estout package
* (install once with: ssc install estout).

clear

* Part A. No correlation between X and Z
* We will create a data set with 500 observations. X has a mean of 7 and a
* standard deviation of 8; Y has a mean of 100 and a standard deviation of 20;
* Z has a mean of 20 and a standard deviation of 2.
* In addition, we stipulate that the correlation between X and Z is 0, the
* correlation between X and Y is 0.7, and the correlation between Y and Z is 0.3.
* x, y and z are drawn jointly from a multivariate normal distribution.
* Note that in the matrix C below the order is (y, x, z) in each column and row.

* We name the correlation matrix "C", the vector of means "m" and the vector of
* standard deviations "sd". The elements are entered by row, with a comma
* between elements and a backslash ("\") separating rows.
matrix C = (1, 0.7, 0.3 \ 0.7, 1, 0 \ 0.3, 0, 1)

* Means of (y, x, z)
matrix m = (100, 7, 20)

* Standard deviations of (y, x, z)
matrix sd = (20, 8, 2)

* To generate a 500-observation data set with the correlation structure defined
* above, we issue the following command. drawnorm draws a sample from a
* multivariate normal distribution with the desired means, standard deviations
* and correlation matrix. The values generated depend on the current
* random-number seed (set seed) or on the number given in the seed() option.
drawnorm y x z, n(500) means(m) sds(sd) corr(C) seed(12345)
* y, x and z are the new variables that will be generated, n() is the number of
* observations and corr() is the correlation matrix used to generate them.

* Now, let's see what happens when we estimate the models with the sample data:
eststo model1: regress y x
predict u_hat, resid

eststo model2: regress y x z
predict u_hat_2, resid

* Next, we are going to use a new command, estout, to show the results.
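* Before turning to estout, a quick check (an addition to this lab, not part of
* the original exercise): in OLS the short-regression slope equals the
* long-regression slope plus the omitted-variable bias,
*     b_x(short) = b_x(long) + b_z(long) * delta,
* where delta is the slope from regressing z on x. The scalar names below are
* hypothetical. Because corr(x, z) = 0 in this part, delta and the bias should
* be small, differing from 0 only by sampling noise.
quietly regress y x z
scalar b_long_x = _b[x]
scalar b_long_z = _b[z]
quietly regress z x
scalar delta = _b[x]
quietly regress y x
scalar b_short_x = _b[x]
display "short b_x = " %9.4f b_short_x "   long b_x = " %9.4f b_long_x ///
    "   bias b_z*delta = " %9.4f b_long_z*delta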
* We want the betas, the standard errors and the confidence intervals; estout
* also marks significance with stars. Please note the format of the stars in
* the output. To see more options, type "help estout".
estout model1 model2, cells(b(star fmt(3)) se(fmt(3)) ci(par fmt(2))) stats(r2 N) ///
    legend label collabels(none) varlabels(_cons Constant)

* Let's look at the residuals
kdensity u_hat, normal
kdensity u_hat_2, normal
scatter u_hat x || scatter u_hat_2 x

* Part B. Correlation between X and Z = 0.75
* We will create a data set with 500 observations. X has a mean of 7 and a
* standard deviation of 8; Y has a mean of 100 and a standard deviation of 20;
* Z has a mean of 20 and a standard deviation of 2.
* In addition, we stipulate that the correlation between X and Z is 0.75, the
* correlation between X and Y is 0.7, and the correlation between Y and Z is 0.3.
* x, y and z are drawn jointly from a multivariate normal distribution.
clear
matrix m = (100, 7, 20)
matrix sd = (20, 8, 2)
matrix C = (1, 0.7, 0.3 \ 0.7, 1, 0.75 \ 0.3, 0.75, 1)
drawnorm y x z, n(500) means(m) sds(sd) corr(C) seed(12345)

eststo model4: regress y x
predict u_hat, resid

eststo model5: regress y x z
predict u_hat_2, resid

estout model4 model5, cells(b(star fmt(3)) se(fmt(3)) ci(par fmt(2))) stats(r2 N) ///
    legend label collabels(none) varlabels(_cons Constant)

kdensity u_hat, normal
kdensity u_hat_2, normal
scatter u_hat x || scatter u_hat_2 x

* Part C. Analysis with a dummy variable
* 1. No correlation between X and Z
clear
matrix m = (100, 7, 20)
matrix sd = (20, 8, 2)
matrix C = (1, 0.7, 0.3 \ 0.7, 1, 0 \ 0.3, 0, 1)
drawnorm y x z, n(500) means(m) sds(sd) corr(C) seed(12345)
corr x y z

* Transform z into a dummy variable equal to 0 if z is at or below its mean
* and 1 if it is above the mean.
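* Before doing so, a quick check (an addition to this lab, not part of the
* original exercise): for jointly normal variables, splitting z at its mean
* shrinks its correlation with another variable by a factor of about
* sqrt(2/pi), roughly 0.80. zd_check is a hypothetical temporary variable
* that is dropped at the end; the sample ratio will only approximate 0.80.
quietly summarize z
gen byte zd_check = (z > r(mean))
quietly correlate y z
scalar rho_cont = r(rho)
quietly correlate y zd_check
scalar rho_dummy = r(rho)
display "corr(y, z) = " %6.3f rho_cont "   corr(y, dummy) = " %6.3f rho_dummy ///
    "   ratio = " %6.3f rho_dummy/rho_cont "   theory: " %6.3f sqrt(2/_pi)
drop zd_check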
summarize z
return list
gen zmean = r(mean)        // store the mean of z in a new variable
replace z = (z > zmean)    // recode z as a dummy: 1 if above the mean, 0 otherwise
corr x y z

eststo model6: regress y x
predict u_hat, resid

eststo model7: regress y x z
predict u_hat_2, resid

* Compare with the continuous-z models from Part A (model2) and Part B (model5)
estout model2 model5 model6 model7, cells(b(star fmt(3)) se(fmt(3)) ci(par fmt(2))) stats(r2 N) ///
    legend label collabels(none) varlabels(_cons Constant)

scatter u_hat x || scatter u_hat_2 x

* 2. Correlation between X and Z = 0.75
clear
matrix m = (100, 7, 20)
matrix sd = (20, 8, 2)
matrix C = (1, 0.7, 0.3 \ 0.7, 1, 0.75 \ 0.3, 0.75, 1)
drawnorm y x z, n(500) means(m) sds(sd) corr(C) seed(12345)
corr x y z

summarize z
return list
gen zmean = r(mean)
replace z = (z > zmean)
corr x y z   // the correlations between z and the other variables change once z is transformed

eststo model8: regress y x
predict u_hat, resid

eststo model9: regress y x z
predict u_hat_2, resid

estout model8 model9, cells(b(star fmt(3)) se(fmt(3)) ci(par fmt(2))) stats(r2 N) ///
    legend label collabels(none) varlabels(_cons Constant)

scatter u_hat x || scatter u_hat_2 x

* Advanced examples with unit effects
* Create an ID variable that is correlated with X
egen newid = group(x)
* Because x takes 500 distinct values, group(x) puts each observation in its own
* unit, so the unit dummies and the constant saturate the 500 observations:
* Stata will omit collinear terms and no residual degrees of freedom remain.
* set matsize raises the maximum number of variables allowed in a model
* (recent Stata versions may not require it).
set matsize 550
regress y x z i.newid
regress y x i.newid

* Create an ID variable that is not correlated with X
* (random must exist before egen can group on it)
set seed 12345
gen double random = runiform()
egen newid2 = group(random)
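* A variation (an addition to this lab, not the original design): group(x), and
* likewise group(random), give essentially every observation its own ID, so a
* regression on those unit dummies is saturated. Coarser IDs keep the
* unit-effects models estimable. The variable names newid_bin and newid_rand
* and the choice of 50 groups are hypothetical.

* ID correlated with X: 50 equal-frequency bins of x (coded 0-49)
egen newid_bin = cut(x), group(50)
regress y x z i.newid_bin
regress y x i.newid_bin

* ID not correlated with X: each observation assigned at random to one of 50 groups (0-49)
set seed 12345
gen newid_rand = floor(50*runiform())
regress y x i.newid_rand
regress y x z i.newid_rand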