*USP 2/2017
*Course: FLS6183
*Author: Lorena Barberia

*Lab 1 - What is the difference between hypothesis tests using a difference
*of means test versus a bivariate regression model
*when you have a categorical dummy variable as the explanatory variable?

*Our objective here is to compare a difference of means test with hypothesis testing for a bivariate regression.
*When should you use a bivariate regression? Are there any differences
*between these two hypothesis tests?

*In order to examine this question, we will use our own simulated data.

*First of all, we need to set a seed. Setting a seed will allow us
*to replicate our analysis.
clear
set seed 12345

*We will also define how many observations we want to generate. For our exercise, we
*will set the number of obs to 100.
set obs 100

*We will first create a random variable (z). We will then use z to create our dummy variable (x).
gen z=rnormal()

*Now, let's get a sense of our data with summary statistics.
*The normal option overlays a normal density on the kernel density estimate.
sum z, detail
kdensity z, normal

*The estimated density looks very similar to the normal distribution. It is likely that
*if we increased our sample size they would be even more similar.

*Now that we have our auxiliary variable, z, we will create our dummy variable.
*We define x=0 for z<0 and x=1 for z>=0.
gen x=0
replace x=1 if z>=0

*Now, let's get a sense of our data with summary statistics.
sum x, detail
histogram x, percent

*Now, let's generate our stochastic term.
gen r=invnorm(uniform())

*We will use x and r to generate y, with a true intercept of 1 and a true slope of 2:
gen y = 1 + 2*x +r

*Now, let's get a sense of our data with summary statistics.
sum y, detail

*First, let us carry out a hypothesis test
*for a single variable using our variable y, which is a continuous variable.
*We test the null hypothesis that the mean of y is 0.5.
ttest y == 0.5

*Can you visually illustrate the results of the hypothesis test?

*Now, let's consider a new problem.
*Our dummy variable could be a characteristic for which we want to test
*if there are differences between groups, for example.
*How can we carry out hypothesis testing in this case, in which we have
*a continuous variable (y) which differs between two groups?
*One possibility would be to carry out a difference of means test.
*Another possibility would be to use a bivariate regression model.

*Are there any differences between carrying out a hypothesis test of the difference
*in the mean of y by group type (x=1 or x=0) and testing this hypothesis using
*a bivariate regression in which we test the statistical significance of our
*dummy independent variable? When is a bivariate regression a more appropriate hypothesis test?

*As x is a categorical variable, we can examine the mean of y for each value of x.
*To do so, we first need to sort all of the cases by x.
sort x
by x: sum y

*In this case, is the mean higher when x=1 or x=0?

*Let's now do a t-test to test if there are differences, assuming that the variances in each sample group are equal.
ttest y, by(x)

*Now, let's estimate a bivariate regression model.
reg y x

*What can you observe in these results? Are they similar or different? How?
*Do you think there is a preferred method in this particular case for this
*type of data (y=continuous and x=dummy)?
*Are there any advantages of using a regression in this case?

*To help you think through your answer, let's look at a scatterplot
*of the data and plot our estimated regression line.
twoway (scatter y x) (lfit y x)
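
*A minimal sketch, not part of the original lab, of one way to compare the two tests
*numerically. It assumes y and x are still in memory from the steps above. The scalar
*names (t_ttest, diff_ttest, t_reg) are made up here for illustration.
*With equal variances assumed, the difference of means should equal the slope on x,
*and the two t statistics should match in size. ttest reports mean(x=0) minus mean(x=1),
*so its t statistic has the opposite sign to the one from the regression.
quietly ttest y, by(x)
scalar t_ttest = r(t)
scalar diff_ttest = r(mu_2) - r(mu_1)
quietly regress y x
scalar t_reg = _b[x]/_se[x]
display "Difference of means (x=1 minus x=0): " diff_ttest
display "Estimated slope on x:                " _b[x]
display "t statistic from ttest:              " t_ttest
display "t statistic from regress:            " t_reg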