*USP 2/2019
*Course: FLS6183
*Authors: Lorena Barberia & Maria Leticia Claro
*Lab 2 - Bivariate vs multivariate linear regression models.
* How to interpret multivariate OLS models and confidence intervals?
*
*Today, we will again use our own simulated data, as in the prior lab.
*First of all, we need to set a seed. Setting a seed allows us to replicate our analysis.
clear
set seed 77555

*We will also define how many observations we want to generate. For our exercise, we
* will generate 100 observations.
set obs 100

* Creating X, Z and Y
*Again, we will first create a random variable (w).
gen w=rnormal()

*Now that we have our auxiliary variable, w, we will create our dummy variable, z.
*We define z=1 for w<0 and z=0 for w>=0.
gen z=0
replace z=1 if w<0
histogram z, percent

* We will create a random variable (x). We are telling Stata to generate a random variable with
*a normal distribution with mean 3 and sd 0.7.
gen x=rnormal(3,0.7)

*Now, let's get a sense of our data with summary statistics.
sum x, detail
kdensity x

*Now, let's generate our stochastic term, or the error.
gen r=invnorm(uniform())
sum r, detail
kdensity r

** Estimation and Interpretation of Models
* Case 1. Bivariate Regression Model with x as a continuous variable
gen y = 1 + 1.5*x + r

*Now, let's get a sense of our data with summary statistics.
sum y, detail

* Let's regress y on x and look at the output. Try to describe the results.
regress y x

* We know the "real" effect of x on y because we "created" the y values based on a specific coefficient value for x.
* We also know the expected value of y when x is 0 (the intercept).
display _b[_cons]
display _b[x]

*We can calculate the predicted values of y and the residuals to compare them with
* the observed values of y.
qui reg y x
predict y_hat
predict u_hat, residuals
list y y_hat u_hat in 1/50

*We have some assumptions about our residuals. One of them is that the mean of the residuals
*should be 0.
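* A minimal sketch, added for illustration: before checking the residual mean, we can
* confirm that each observed y equals its predicted value plus its residual.
* The variable names y_check and diff_check are our own (hypothetical) choices.
gen y_check = y_hat + u_hat
gen diff_check = y - y_check
sum diff_check // should be essentially 0, up to floating-point rounding
drop y_check diff_check // remove the helper variables so they do not clutter the data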
* Let's check the zero-mean property of the residuals in our simulation.
sum u_hat
sum y_hat

*Another way to interpret the relationship between two variables is visually.
*This graph shows us the predicted values with a 95% confidence interval. How
*do you interpret this graph and the CI?
twoway lfitci y x

*Now, let's plot the observed values and the predicted values.
twoway lfitci y x || scatter y x

* The confidence intervals can help us to interpret our estimated results in OLS.
* One very useful command in Stata is coefplot, which we can use after the regression command
* to visually display the confidence intervals for the estimated coefficients.
* (coefplot is a user-written command; if it is not installed, run: ssc install coefplot)
* Please run the command below and interpret what you observe.
qui reg y x
coefplot, drop(_cons) xline(0)

* Stata has a very useful help file which we can always use to request more information on a specific command.
* Coefplot has many options. Be sure to read through the help file for coefplot.
help coefplot

*Now it is your turn. For the two cases below, please run the same commands to estimate the regression.
* In the case of the bivariate regression, you can also once again analyze how the model compares with the observed values.
* (If you re-run predict, drop y_hat and u_hat first, since those variables already exist.)
drop y // we must drop the variable y and generate a new one

* Case 2. Bivariate Regression Model with z as a dichotomous variable
gen y = 1 + 2*z + r

* Case 3. Multivariate Regression Model with x and z
*We are going to use x, z and r to generate y:
drop y // y already exists from Case 2, so we drop it before generating it again
gen y = 1 + 1.5*x + 2*z + r
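
* A minimal sketch of how Case 3 could be estimated, assuming y was generated by the
* line above. This is one possible approach, following the same steps as Case 1.
regress y x z
* Compare the estimates with the values we used to build y
* (1 for the constant, 1.5 for x, 2 for z).
display _b[_cons]
display _b[x]
display _b[z]
* Plot both slope coefficients with their 95% confidence intervals.
coefplot, drop(_cons) xline(0)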