Recall that the regression equation for predicting an outcome variable (y) on the basis of a predictor variable (x) can be written simply as y = b0 + b1*x, where b0 and b1 are the regression beta coefficients, representing the intercept and the slope, respectively.

Suppose that we wish to investigate differences in salaries between males and females. Based on the gender variable, we can create a new dummy variable that takes the value:

- 1 if a person is male
- 0 if a person is female

and use this variable as a predictor in the regression equation, leading to the following model:

- b0 + b1 if a person is male
- b0 if a person is female

The coefficients can be interpreted as follows:

- b0 is the average salary among females,
- b0 + b1 is the average salary among males,
- and b1 is the average difference in salary between males and females.

For a simple demonstration, the following example models the salary difference between males and females by fitting a simple linear regression model on the Salaries data set. R creates the dummy variables automatically:

```r
# Load the Salaries data set (provided by the carData package)
data("Salaries", package = "carData")

# Compute the model: salary as a function of sex
model <- lm(salary ~ sex, data = Salaries)
summary(model)$coef
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   101002       4809   21.00 2.68e-66
```

From the output above, the average salary for females is estimated to be 101002, whereas for males it is estimated to be 101002 + 14088 = 115090. The p-value for the dummy variable sexMale is significant, suggesting that there is statistical evidence of a difference in average salary between the genders.

The contrasts() function returns the coding that R has used to create the dummy variables:

```r
contrasts(Salaries$sex)
#        Male
# Female    0
# Male      1
```

R has created a sexMale dummy variable that takes on a value of 1 if the sex is Male, and 0 otherwise. The decision to code males as 1 and females as 0 (the baseline) is arbitrary: it has no effect on the regression computation (the fitted group means are the same), but it does alter the interpretation of the coefficients.

You can use the function relevel() to set the baseline category to males as follows:

```r
library(dplyr)

# Make "Male" the reference (baseline) level of the sex factor
Salaries <- Salaries %>%
  mutate(sex = relevel(sex, ref = "Male"))
```

The regression is then refitted in the same way:

```r
model <- lm(salary ~ sex, data = Salaries)
summary(model)$coef
```

Now the estimates for b0 and b1 are 115090 and -14088, respectively, leading once again to a prediction of average salary of 115090 for males and a prediction of 115090 - 14088 = 101002 for females. The fact that the coefficient for sexFemale in the regression output is negative indicates that being female is associated with a decrease in salary (relative to males).

Alternatively, instead of a 0/1 coding scheme, we could create a dummy variable coded -1 (male) / 1 (female).
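As a worked check of the -1 (male) / 1 (female) coding mentioned above, the model becomes b0 - b1 for males and b0 + b1 for females. Solving for the coefficients from the group means reported earlier (115090 for males, 101002 for females) gives the values below. These are derived by hand from the rounded means, not read from an R fit, so an actual lm() fit may differ slightly in the last digits.

```latex
\begin{aligned}
\hat{y}_{\text{male}} &= b_0 - b_1, \qquad \hat{y}_{\text{female}} = b_0 + b_1 \\
b_0 &= \frac{\hat{y}_{\text{male}} + \hat{y}_{\text{female}}}{2} = \frac{115090 + 101002}{2} = 108046 \\
b_1 &= \frac{\hat{y}_{\text{female}} - \hat{y}_{\text{male}}}{2} = \frac{101002 - 115090}{2} = -7044
\end{aligned}
```

Under this coding, b0 is the midpoint of the two group means rather than the mean of either group, and b1 is half the female-minus-male difference: plugging back in, 108046 - (-7044) = 115090 for males and 108046 + (-7044) = 101002 for females.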
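The three codings discussed here (0/1 with females as the baseline, 0/1 after relevel() to males, and -1/1) can be compared side by side in a minimal sketch. It assumes a small, made-up data frame (the hypothetical toy, with salaries in arbitrary units) instead of the Salaries data, so that it runs without extra packages; the column name sex_pm is likewise hypothetical. Only base R functions (lm(), relevel(), contrasts(), ifelse()) are used.

```r
# Hypothetical toy data (NOT the Salaries data set): 3 females, 3 males.
# Female mean = 102, male mean = 115 (arbitrary units).
toy <- data.frame(
  salary = c(100, 104, 102, 118, 112, 115),
  sex    = factor(c("Female", "Female", "Female", "Male", "Male", "Male"))
)

# 1) Default treatment (0/1) coding: "Female" is the first level, so it is the baseline.
print(contrasts(toy$sex))
fit_treat <- lm(salary ~ sex, data = toy)
print(coef(fit_treat))     # intercept = female mean (102), sexMale = 115 - 102 = 13

# 2) Same 0/1 coding, but with "Male" as the baseline via relevel().
toy_male_ref <- toy
toy_male_ref$sex <- relevel(toy_male_ref$sex, ref = "Male")
fit_male_ref <- lm(salary ~ sex, data = toy_male_ref)
print(coef(fit_male_ref))  # intercept = male mean (115), sexFemale = 102 - 115 = -13

# 3) Hand-made -1 (male) / 1 (female) coding in the hypothetical column sex_pm.
toy$sex_pm <- ifelse(toy$sex == "Female", 1, -1)
fit_pm <- lm(salary ~ sex_pm, data = toy)
print(coef(fit_pm))        # intercept = (115 + 102) / 2 = 108.5, slope = (102 - 115) / 2 = -6.5

# All three codings give identical fitted values (the group means);
# only the meaning of the coefficients changes.
stopifnot(isTRUE(all.equal(unname(fitted(fit_treat)), unname(fitted(fit_male_ref)))))
stopifnot(isTRUE(all.equal(unname(fitted(fit_treat)), unname(fitted(fit_pm)))))
```

The final checks make the point from the text concrete: recoding the dummy variable changes what b0 and b1 mean, not the predicted average salary for each group.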