“
(e). Hence the expressions are equivalent, as is y = ŷ + e. Certain assumptions about e are important, such as that it is normally distributed. When error term assumptions are violated, incorrect conclusions may be made about the statistical significance of relationships. This important issue is discussed in greater detail in Chapter 15 and, for time series data, in Chapter 17. Hence, the above is a pertinent but incomplete list of assumptions. Getting Started Conduct a simple regression, and practice writing up your results. PEARSON’S CORRELATION COEFFICIENT Pearson’s correlation coefficient, r, measures the association (significance, direction, and strength) between two continuous variables; it is a measure of association for two continuous variables. Also called the Pearson’s product-moment correlation coefficient, it does not assume a causal relationship, as does simple regression. The correlation coefficient indicates the extent to which the observations lie closely or loosely clustered around the regression line. The coefficient r ranges from –1 to +1. The sign indicates the direction of the relationship, which, in simple regression, is always the same as the slope coefficient. A “–1” indicates a perfect negative relationship, that is, that all observations lie exactly on a downward-sloping regression line; a “+1” indicates a perfect positive relationship, whereby all observations lie exactly on an upward-sloping regression line. Of course, such values are rarely obtained in practice because observations seldom lie exactly on a line. An r value of zero indicates that observations are so widely scattered that it is impossible to draw any well-fitting line. Figure 14.2 illustrates some values of r. Key Point Pearson’s correlation coefficient, r, ranges from –1 to +1. It is important to avoid confusion between Pearson’s correlation coefficient and the coefficient of determination. For the two-variable, simple regression model, r2 = R2, but whereas 0 ≤ R ≤ 1, r ranges from –1 to +1. Hence, the sign of r tells us whether a relationship is positive or negative, but the sign of R, in regression output tables such as Table 14.1, is always positive and cannot inform us about the direction of the relationship. In simple regression, the regression coefficient, b, informs us about the direction of the relationship. Statistical software programs usually show r rather than r2. Note also that the Pearson’s correlation coefficient can be used only to assess the association between two continuous variables, whereas regression can be extended to deal with more than two variables, as discussed in Chapter 15. Pearson’s correlation coefficient assumes that both variables are normally distributed. When Pearson’s correlation coefficients are calculated, a standard error of r can be determined, which then allows us to test the statistical significance of the bivariate correlation. For bivariate relationships, this is the same level of significance as shown for the slope of the regression coefficient. For the variables given earlier in this chapter, the value of r is .272 and the statistical significance of r is p ≤ .01. Use of the Pearson’s correlation coefficient assumes that the variables are normally distributed and that there are no significant departures from linearity.7 It is important not to confuse the correlation coefficient, r, with the regression coefficient, b. Comparing the measures r and b (the slope) sometimes causes confusion. The key point is that r does not indicate the regression slope but rather the extent to which observations lie close to it. A steep regression line (large b) can have observations scattered loosely or closely around it, as can a shallow (more horizontal) regression line. The purposes of these two statistics are very different.8 SPEARMAN’S RANK CORRELATION
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)