*USP 2/2019
*Course: FLS6183
*Authors: Lorena Barberia & Maria Leticia Claro
*Lab 5 - Multicollinearity simulation

* This is an exercise in which we will simulate various degrees of multicollinearity.
* In order to do this, we will have to introduce some new commands. As always, we will
* begin with a clear command:

clear

* The first command that we are going to introduce creates a matrix.
* In this command, "mat" tells Stata that we are going to define a matrix. In this case,
* we have named the matrix "C". The next step is to define the elements of the matrix.
* This is done by row, with a comma between elements and a backslash ("\") separating
* each row.

mat C=(1,0\0,1)

* This two-by-two matrix that we have just created will hold the correlations between
* the variables that we will generate. In this case,
* 1 0
* 0 1
* which is
* corr(x1,x1) corr(x1,x2)
* corr(x2,x1) corr(x2,x2)

* Because simulations are based on random number generation, we will get a different set
* of data each time we run the same program unless we specify a seed number. For
* instance:

set seed 999

* One useful Stata command for simulating different data scenarios is "corr2data".
* This command allows us to randomly generate a set of variables with a particular
* pattern of correlation. For instance, if we wanted to generate a 10-observation data
* set with the correlation structure that we defined above, we would issue the following
* command:

corr2data x1 x2, n(10) corr(C)

* where x1 and x2 are the new variables that will be generated, n() is the number of
* observations, and corr() is the correlation matrix used to generate the new variables.

* Let's look at what we just generated.

graph twoway scatter x1 x2

* Let's check whether Stata followed our command.
* The pwcorr command calculates pairwise correlation coefficients using all the
* available information. This command has the advantage that it can also report
* statistical significance.
pwcorr x1 x2, sig obs

* By now you have probably guessed where we are heading. We have our two independent
* variables, x1 and x2. The next step is to generate a stochastic component:

generate r=invnorm(uniform())

* And now, let's generate our Population Regression Function:

generate y=.5 + x1 + x2 + r

* What are the population parameters?
* Now, let's see what happens when we estimate them with the sample data:

regress y x1 x2

* Sometimes, we want to test a directional hypothesis for our regression coefficients.
* When we want to know whether our slope or intercept is higher or lower than 0, we can
* calculate the p-values directly from the regression output. When our estimated
* coefficient is positive (as our slope is), we can test three different hypotheses with
* the p-value as follows:
* H0: intercept=0   p-value = 0.078 (given in the output)
* H0: intercept<=0  p-value = 0.078/2 = 0.039
* H0: intercept>=0  p-value = 1-(0.078/2) = 0.961

* We can also use commands to calculate this in Stata:

test _b[x1]=0
local sign_x1 = sign(_b[x1])
display "Ho: coef <= 0 p-value = " ttail(r(df_r),`sign_x1'*sqrt(r(F)))
display "Ho: coef >= 0 p-value = " 1-ttail(r(df_r),`sign_x1'*sqrt(r(F)))

* "vif" is a command for estimating the variance inflation factor for each variable in
* the most recently estimated model:

vif

* We may also be interested in the confidence interval associated with the correlation
* coefficient. The Stata extension ci2 (you need to install this command before using
* it) allows us to obtain these results.

findit ci2
ci2 y x1, corr
ci2 y x2, corr

* What do you interpret from this output?

* Now, it is your turn. Please complete the analysis in the assignment this week. To do
* so, you will need to change the correlations between the explanatory variables and the
* number of observations.
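
* As an illustrative sketch of the kind of change the assignment asks for (the
* .9 correlation and n(10) below are values we assume for illustration, not the
* assignment's required specification), you could repeat the simulation with
* highly correlated regressors and compare the VIFs:

clear
set seed 999
mat C2=(1,.9\.9,1)
corr2data x1 x2, n(10) corr(C2)
generate r=invnorm(uniform())
generate y=.5 + x1 + x2 + r
regress y x1 x2
vif

* Because corr2data reproduces the requested correlations exactly, with
* corr(x1,x2)=.9 each VIF should be about 5.26, since VIF = 1/(1-R2_j) and here
* R2_j = .9^2 = .81. Compare this with the identity-matrix case above, where
* both VIFs equal 1.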