“
regression as dummy variables Explain the importance of the error term plot Identify assumptions of regression, and know how to test and correct assumption violations Multiple regression is one of the most widely used multivariate statistical techniques for analyzing three or more variables. This chapter uses multiple regression to examine such relationships, and thereby extends the discussion in Chapter 14. The popularity of multiple regression is due largely to the ease with which it takes control variables (or rival hypotheses) into account. In Chapter 10, we discussed briefly how contingency tables can be used for this purpose, but doing so is often a cumbersome and sometimes inconclusive effort. By contrast, multiple regression easily incorporates multiple independent variables. Another reason for its popularity is that it also takes into account nominal independent variables. However, multiple regression is no substitute for bivariate analysis. Indeed, managers or analysts with an interest in a specific bivariate relationship will conduct a bivariate analysis first, before examining whether the relationship is robust in the presence of numerous control variables. And before conducting bivariate analysis, analysts need to conduct univariate analysis to better understand their variables. Thus, multiple regression is usually one of the last steps of analysis. Indeed, multiple regression is often used to test the robustness of bivariate relationships when control variables are taken into account. The flexibility with which multiple regression takes control variables into account comes at a price, though. Regression, like the t-test, is based on numerous assumptions. Regression results cannot be assumed to be robust in the face of assumption violations. Testing of assumptions is always part of multiple regression analysis. Multiple regression is carried out in the following sequence: (1) model specification (that is, identification of dependent and independent variables), (2) testing of regression assumptions, (3) correction of assumption violations, if any, and (4) reporting of the results of the final regression model. This chapter examines these four steps and discusses essential concepts related to simple and multiple regression. Chapters 16 and 17 extend this discussion by examining the use of logistic regression and time series analysis. MODEL SPECIFICATION Multiple regression is an extension of simple regression, but an important difference exists between the two methods: multiple regression aims for full model specification. This means that analysts seek to account for all of the variables that affect the dependent variable; by contrast, simple regression examines the effect of only one independent variable. Philosophically, the phrase identifying the key difference—“all of the variables that affect the dependent variable”—is divided into two parts. The first part involves identifying the variables that are of most (theoretical and practical) relevance in explaining the dependent
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)