Linear Regression Quotes

We've searched our database for all the quotes and captions related to Linear Regression. Here they are! All 64 of them:

Working an integral or performing a linear regression is something a computer can do quite effectively. Understanding whether the result makes sense—or deciding whether the method is the right one to use in the first place—requires a guiding human hand. When we teach mathematics we are supposed to be explaining how to be that guide. A math course that fails to do so is essentially training the student to be a very slow, buggy version of Microsoft Excel.
Jordan Ellenberg (How Not to Be Wrong: The Power of Mathematical Thinking)
History fancies itself linear - but yields to a cyclical temptation.
Criss Jami (Healology)
You can do linear regression without thinking about whether the phenomenon you’re modeling is actually close to linear. But you shouldn’t.
Jordan Ellenberg (How Not to Be Wrong: The Power of Mathematical Thinking)
In principle, more analytic power can be achieved by varying multiple things at once in an uncorrelated (random) way, and doing standard analysis, such as multiple linear regression. In practice, though, A/B testing is widely used, because A/B tests are easy to deploy, easy to understand, and easy to explain to management.
Christopher D. Manning (Introduction to Information Retrieval)
One day, Carmona had an idea. Axcom had been employing various approaches to using their pricing data to trade, including relying on breakout signals. They also used simple linear regressions, a basic forecasting tool relied upon by many investors that analyzes the relationships between two sets of data or variables under the assumption those relationships will remain linear. Plot crude-oil prices on the x-axis and the price of gasoline on the y-axis, place a straight regression line through the points on the graph, extend that line, and you usually can do a pretty good job predicting prices at the pump for a given level of oil price.
Gregory Zuckerman (The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution)
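The mechanics Zuckerman describes are easy to sketch. Here is a minimal Python example using numpy's polyfit; the crude and gasoline prices are entirely made up, so this shows the method, not a real market relationship:

```python
# A minimal sketch of the idea in the quote above, on synthetic numbers:
# fit a straight line relating crude-oil prices (x) to gasoline prices (y),
# then extend the line to predict pump prices at a new crude price.
import numpy as np

crude = np.array([60.0, 65.0, 70.0, 75.0, 80.0, 85.0])     # $/barrel (made up)
gasoline = np.array([2.10, 2.25, 2.42, 2.55, 2.71, 2.83])  # $/gallon (made up)

slope, intercept = np.polyfit(crude, gasoline, deg=1)  # least-squares line

# Extend the line: predict the pump price for a crude price outside the sample.
print(f"predicted gasoline price at $90/barrel: "
      f"${slope * 90 + intercept:.2f}/gallon")
```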
Just as functioning isn't uniform, it isn't linear either. There is a commonly seen phenomenon in autistic children where they'll make big gains in elementary school then regress when they hit adolescence. Or a child will be labeled a late bloomer, seeming practically "normal" in their teen years, then seem to backslide dramatically when they go off to college or enter the adult world of work and independent living
Cynthia Kim (Nerdy, Shy, and Socially Inappropriate: A User Guide to an Asperger Life)
The prediction of false rape-related beliefs (rape myth acceptance [RMA]) was examined using the Illinois Rape Myth Acceptance Scale (Payne, Lonsway, & Fitzgerald, 1999) among a nonclinical sample of 258 male and female college students. Predictor variables included measures of attitudes toward women, gender role identity (GRI), sexual trauma history, and posttraumatic stress disorder (PTSD) symptom severity. Using linear regression and testing interaction effects, negative attitudes toward women significantly predicted greater RMA for individuals without a sexual trauma history. However, neither attitudes toward women nor GRI were significant predictors of RMA for individuals with a sexual trauma history." Rape Myth Acceptance, Sexual Trauma History, and Posttraumatic Stress Disorder Shannon N. Baugher, PhD, Jon D. Elhai, PhD, James R. Monroe, PhD, Ruth Dakota, Matt J. Gray, PhD
Shannon N. Baugher
All we may expect of time is its reversibility. Speed and acceleration are merely the dream of making time reversible. You hope that by speeding up time, it will start to whirl like a fluid. It is a fact that, as linear time and history have retreated, we have been left with the ephemerality of networks and fashion, which is unbearable. All that remain are the rudiments of a supratemporal peripeteia—a few short sequences, a few whirling moments, like the ones physicists observe in certain particles.
Jean Baudrillard (Cool Memories)
The problem is that an overemphasis on linear time tends to magnify the pain we feel when joy ebbs. If we view the future as a blank, uncertain space, then it’s hard to trust that joy will return once it has gone. Each downswing of joy feels like a regression, each nadir like stagnation. But if instead we can rely on the repetition of certain delights at regular intervals, then the wavelike quality of joy becomes more present in our lives. Cycles create a symmetry between past and future that reminds us joy will come back again.
Ingrid Fetell Lee (Joyful: The Surprising Power of Ordinary Things to Create Extraordinary Happiness)
Jerry Hirshberg, in his book The Creative Priority: Putting Innovation to Work in Your Business, writes, No one in a corporation deliberately sets out to stifle creative thought. Yet, a traditional bureaucratic structure, with its need for predictability, linear logic, conformance to accepted norms, and the dictates of the most recent “long-range” vision statement, is a nearly perfect idea-killing machine. People in groups regress toward the security of the familiar and the well-regulated. Even creative people do it. It’s easier. It avoids the ambiguity, the fear of unpredictability, the threat of the unfamiliar, and the messiness of intuition and human emotion.
John C. Maxwell (The 15 Invaluable Laws of Growth: Live Them and Reach Your Potential)
Econometrics is the application of classical statistical methods to economic and financial series. The essential tool of econometrics is multivariate linear regression, an 18th-century technology that was already mastered by Gauss before 1794. Standard econometric models do not learn. It is hard to believe that something as complex as 21st-century finance could be grasped by something as simple as inverting a covariance matrix.
Marcos López de Prado (Advances in Financial Machine Learning)
The key takeaway is that correlation is an understandable equation that relates the amount of change in x and y. If the two variables have consistent change, there will be a high correlation; otherwise, there will be a lower correlation.
Scott Hartshorn (Linear Regression And Correlation: A Beginner's Guide)
The correlation value, r, will be the same for either set of units. Note, however, that the slope of the regression line won't be the same, since the standard deviation parts of the slope equation still have units baked into them.
Scott Hartshorn (Linear Regression And Correlation: A Beginner's Guide)
The p-value for each independent variable tests the null hypothesis that the variable has no relationship with the dependent variable.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
In statistics, correlation is a quantitative assessment that measures both the direction and the strength of this tendency to vary together.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Pearson’s correlation takes all of the data points on this graph and represents them with a single summary statistic. In this case, the statistical output below indicates that the correlation is 0.705.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
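For readers who want to reproduce this kind of number, here is a minimal sketch in Python using numpy (an implementation choice, not something the book prescribes). The heights and weights below are invented, so the resulting r will differ from the book's 0.705:

```python
# Compute Pearson's r for a small, invented height/weight sample.
import numpy as np

height = np.array([54, 55, 57, 58, 60, 61, 63])    # inches (illustrative)
weight = np.array([76, 81, 90, 84, 99, 107, 112])  # pounds (illustrative)

r = np.corrcoef(height, weight)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(f"Pearson's r = {r:.3f}")
```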
Pearson’s correlation coefficient is represented by the Greek letter rho (ρ) for the population parameter and r for a sample statistic. This coefficient is a single number that measures both the strength and direction of the linear relationship between two continuous variables. Values can range from -1 to +1.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
When the value is between 0 and +1 or between 0 and -1, there is a relationship, but the points don't all fall on a line. As r approaches -1 or +1, the strength of the relationship increases and the data points tend to fall closer to a line.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Pearson’s correlation coefficient is unaffected by scaling issues. Consequently, a statistical assessment is better for determining the precise strength of the relationship.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Pearson’s correlation measures only linear relationships. Consequently, if your data contain a curvilinear relationship, the correlation coefficient will not detect it.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Correlations have a hypothesis test. As with any hypothesis test, this test takes sample data and evaluates two mutually exclusive statements about the population from which the sample was drawn. For Pearson correlations, the two hypotheses are the following: Null hypothesis: There is no linear relationship between the two variables. ρ = 0. Alternative hypothesis: There is a linear relationship between the two variables. ρ ≠ 0. A correlation of zero indicates that no linear relationship exists. If your p-value is less than your significance level, the sample contains sufficient evidence to reject the null hypothesis and conclude that the correlation does not equal zero. In other words, the sample data support the notion that the relationship exists in the population.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
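A sketch of this hypothesis test in Python, using scipy's pearsonr, which returns both r and the two-sided p-value for the null hypothesis ρ = 0. The data are simulated for illustration:

```python
# Correlation hypothesis test: H0 is rho = 0, H1 is rho != 0.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)  # simulated data with a built-in linear signal

r, p_value = pearsonr(x, y)
alpha = 0.05
print(f"r = {r:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the sample supports a nonzero linear relationship.")
else:
    print("Fail to reject H0.")
```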
correlation does not mean that the changes in one variable actually cause the changes in the other variable.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
What is a good correlation? How high should it be? These are commonly asked questions. I have seen several schemes that attempt to classify correlations as strong, medium, and weak. However, there is only one correct answer. The correlation coefficient should accurately reflect the strength of the relationship. Take a look at the correlation between the height and weight data, 0.705. It’s not a very strong relationship, but it accurately represents our data. An accurate representation is the best-case scenario for using a statistic to describe an entire dataset.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
R-squared is a primary measure of how well a regression model fits the data. This statistic represents the percentage of variation in one variable that other variables explain. For a pair of variables, R-squared is simply the square of the Pearson's correlation coefficient. For example, squaring the height-weight correlation coefficient of 0.705 produces an R-squared of 0.497, or 49.7%. In other words, height explains about half the variability of weight in preteen girls.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
The dependent variable is a variable that you want to explain or predict using the model. The values of this variable depend on other variables. It's also known as the response variable or outcome variable, and it is commonly denoted using a Y. Traditionally, analysts graph dependent variables on the vertical, or Y, axis.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Independent variables are the variables that you include in the model to explain or predict changes in the dependent variable.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Regression analysis mathematically describes the relationships between independent variables and a dependent variable. Use regression for two primary goals: To understand the relationships between these variables. How do changes in the independent variables relate to changes in the dependent variable? To predict the dependent variable by entering values for the independent variables into the regression equation.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
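Both goals can be shown in a short, hypothetical sketch. The statsmodels library and the simulated data here are assumptions of this example, not anything the book specifies:

```python
# Fit a simple regression, then use it for both goals named above:
# (1) inspect the estimated relationship, (2) predict a new value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=2.0, size=100)  # simulated dependent variable

X = sm.add_constant(x)          # adds the intercept column
model = sm.OLS(y, X).fit()

print(model.params)                  # goal 1: intercept and slope estimates
print(model.predict([[1.0, 7.5]]))   # goal 2: predicted y at x = 7.5
```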
A beautiful aspect of regression analysis is that you hold the other independent variables constant by merely including them in your model!
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
P-values and coefficients are the key regression output. Collectively, these statistics indicate whether the variables are statistically significant and describe the relationships between the independent variables and the dependent variable. Low p-values (typically < 0.05) indicate that the independent variable is statistically significant. Regression analysis is a form of inferential statistics. Consequently, the p-values help determine whether the relationships that you observe in your sample also exist in the larger population. The coefficients for the independent variables represent the average change in the dependent variable given a one-unit change in the independent variable (IV) while controlling the other IVs.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
The low p-values indicate that both education and IQ are statistically significant. The coefficient for IQ (4.796) indicates that each additional IQ point increases your income by an average of approximately $4.80 while controlling everything else in the model. Furthermore, the education coefficient (24.215) indicates that an additional year of education increases average earnings by $24.22 while holding the other variables constant.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
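The income example can be imitated on simulated data. The coefficients 24.215 and 4.796 are the book's; a fit on synthetic data will only recover them approximately, and every variable name below is hypothetical:

```python
# Simulate income as a function of education and IQ, then fit a
# multiple regression and read off coefficients and p-values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
education = rng.uniform(8, 20, size=n)   # years of education (invented)
iq = rng.normal(100, 15, size=n)         # IQ scores (invented)

# True effects borrowed from the quote: 24.215 per year, 4.796 per IQ point.
income = 24.215 * education + 4.796 * iq + rng.normal(scale=50.0, size=n)

X = sm.add_constant(np.column_stack([education, iq]))
fit = sm.OLS(income, X).fit()
print(fit.params)    # [intercept, education, iq] coefficient estimates
print(fit.pvalues)   # low p-values flag statistical significance
```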
ordinary least squares (OLS).
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Continuous variables can take on almost any numeric value and can be meaningfully divided into smaller increments, including fractional and decimal values. You often measure a continuous variable on a scale. For example, when you measure height, weight, and temperature, you have continuous data. Categorical variables have values that you can put into a countable number of distinct groups based on a characteristic. Categorical variables are also called qualitative variables or attribute variables. For example, college major is a categorical variable that can have values such as psychology, political science, engineering, biology, etc.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Observed values of the dependent variable are the values of the dependent variable that you record during your study or experiment along with the values of the independent variables. These values are denoted using Y. Fitted values are the values that the model predicts for the dependent variable using the independent variables. If you input values for the independent variables into the regression equation, you obtain the fitted value. Predicted values and fitted values are synonyms.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
A residual is the distance between an observed value and the corresponding fitted value.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Graphically, residuals are the vertical distances between the observed values and the fitted values. On the graph, the line represents the fitted values from the regression model. We call this line . . . the fitted line! The lines that connect the data points to the fitted line represent the residuals. The length of the line is the value of the residual.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
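A minimal sketch of computing those vertical distances, with invented data:

```python
# Residuals: observed value minus the corresponding fitted value.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept   # points on the fitted line
residuals = y - fitted           # positive above the line, negative below
print(residuals)
```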
For a good model, the residuals should be relatively small and unbiased. In statistics, bias indicates that estimates are systematically too high or too low. Unbiased estimates are correct on average.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
OLS regression squares those residuals so they’re always positive. In this manner, the process can add them up without canceling each other out.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
This process produces squared residuals, which statisticians call squared errors.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
OLS draws the line that minimizes the sum of squared errors (SSE). Hopefully, you’re gaining an appreciation for why the procedure is named ordinary least squares!
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
SSE is a measure of variability. As the points spread out further from the fitted line, SSE increases. Because the calculations use squared differences, the variance is in squared units rather than the original units of the data. While higher values indicate greater variability, there is no intuitive interpretation of specific values. However, for a given data set, smaller SSE values signal that the observations fall closer to the fitted values. OLS minimizes this value, which means you're getting the best possible line.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
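For a single predictor, the line that minimizes SSE has a closed form. A short sketch with invented data:

```python
# Closed-form OLS for one predictor: the slope and intercept that
# minimize the sum of squared errors (SSE).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

sse = np.sum((y - (slope * x + intercept)) ** 2)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, SSE={sse:.4f}")
# No other straight line through these points yields a smaller SSE.
```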
The result is that an individual outlier can exert a strong influence over the entire model and, by itself, dramatically change the results.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
These three sums of squares have the following mathematical relationship: RSS + SSE = TSS
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Understanding this relationship is fairly straightforward. RSS represents the variability that your model explains. Higher is usually good. SSE represents the variability that your model does not explain. Smaller is usually good. TSS represents the variability inherent in your dependent variable. Or, Explained Variability + Unexplained Variability = Total Variability
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Additionally, if you take RSS / TSS, you’ll obtain the percentage of the variability of the dependent variable around its mean that your model explains. This statistic is R-squared!
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
Based on the mathematical relationship shown above, you know that R-squared can range from 0 – 100%. Zero indicates that the model accounts for none of the variability in the dependent variable around its mean. 100% signifies that the model explains all of that variability.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
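Both the identity RSS + SSE = TSS and the definition R-squared = RSS / TSS can be checked numerically. A sketch with invented data:

```python
# Verify RSS + SSE = TSS and compute R-squared as RSS / TSS.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)        # total variability
rss = np.sum((fitted - y.mean()) ** 2)   # explained variability
sse = np.sum((y - fitted) ** 2)          # unexplained variability

print(f"RSS + SSE = {rss + sse:.4f}, TSS = {tss:.4f}")
print(f"R-squared = {rss / tss:.4f}")
```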
This graph shows all the observations together with a line that represents the fitted relationship. As is traditional, the Y-axis displays the dependent variable, which is weight. The X-axis shows the independent variable, which is height. The line is the fitted line. If you enter the full range of height values that are on the X-axis into the regression equation that the chart displays, you will obtain the line shown on the graph. This line produces a smaller SSE than any other line you can draw through these observations. Visually, we see that the fitted line has a positive slope that corresponds to the positive correlation we obtained earlier. The line follows the data points, which indicates that the model fits the data. The slope of the line equals the coefficient that I circled. This coefficient indicates how much mean weight tends to increase as we increase height. We can also enter a height value into the equation and obtain a prediction for the mean weight. Each point on the fitted line represents the mean weight for a given height. However, like any mean, there is variability around the mean. Notice how there is a spread of data points around the line. You can assess this variability by picking a spot on the line and observing the range of data points above and below that point. Finally, the vertical distance between each data point and the line is the residual for that observation.
Jim Frost (Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models)
The linear regression model assumes that the effect of one feature is the same regardless of the values of the other features (= no interactions). But often there are interactions in the data. To predict the number of bicycles rented, there may be an interaction between temperature and whether it is a working day or not. Perhaps, when people have to work, the temperature does not influence the number of rented bikes much, because people will ride the rented bike to work no matter what happens. On days off, many people ride for pleasure, but only when it is warm enough. When it comes to rental bicycles, you might expect an interaction between temperature and working day.
Christoph Molnar (Interpretable Machine Learning: A Guide For Making Black Box Models Explainable)
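Molnar's bike-rental interaction can be expressed by adding a product term to the linear model. A hypothetical sketch (variable names, data, and the statsmodels library are all assumptions of this example):

```python
# Let the temperature effect differ between working days and days off
# by including a temperature x working-day interaction term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
temp = rng.uniform(0, 30, size=n)       # degrees Celsius (invented)
workday = rng.integers(0, 2, size=n)    # 1 = working day (invented)

# Simulated rentals: temperature matters mostly on days off.
rentals = (200 + 1.0 * temp + 150 * workday
           + 8.0 * temp * (1 - workday)
           + rng.normal(scale=30.0, size=n))

X = sm.add_constant(np.column_stack([temp, workday, temp * workday]))
fit = sm.OLS(rentals, X).fit()
print(fit.params)  # last coefficient: how the temperature slope shifts on working days
```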
Therefore all the things that appear in CLS—network analysis, digital mapping, linear and nonlinear regressions, topic modeling, topology, entropy—are just fancier ways of talking about word frequency changes.
Nan Z. Da
Examples of common algorithms used in supervised learning include regression analysis (i.e. linear regression, logistic regression, non-linear regression), decision trees, k-nearest neighbors, neural networks, and support vector machines, each of which is examined in later chapters.
Oliver Theobald (Machine Learning for Absolute Beginners: A Plain English Introduction)
Table 14.1 also shows R-square (R²), which is called the coefficient of determination. R-square is of great interest: its value is interpreted as the percentage of variation in the dependent variable that is explained by the independent variable. R-square varies from zero to one, and is called a goodness-of-fit measure. In our example, teamwork explains only 7.4 percent of the variation in productivity. Although teamwork is significantly associated with productivity, it is quite likely that other factors also affect it. It is conceivable that other factors might be more strongly associated with productivity and that, when controlled for other factors, teamwork is no longer significant. Typically, values of R² below 0.20 are considered to indicate weak relationships, those between 0.20 and 0.40 indicate moderate relationships, and those above 0.40 indicate strong relationships. Values of R² above 0.65 are considered to indicate very strong relationships. R is called the multiple correlation coefficient and is always 0 ≤ R ≤ 1.

To summarize up to this point, simple regression provides three critically important pieces of information about bivariate relationships involving two continuous variables: (1) the level of significance at which two variables are associated, if at all (t-statistic), (2) whether the relationship between the two variables is positive or negative (b), and (3) the strength of the relationship (R²).

Key Point: R-square is a measure of the strength of the relationship. Its value goes from 0 to 1.

The primary purpose of regression analysis is hypothesis testing, not prediction. In our example, the regression model is used to test the hypothesis that teamwork is related to productivity. However, if the analyst wants to predict the variable "productivity," the regression output also shows the SEE, or the standard error of the estimate (see Table 14.1). This is a measure of the spread of y values around the regression line as calculated for the mean value of the independent variable, only, and assuming a large sample. The standard error of the estimate has an interpretation in terms of the normal curve, that is, 68 percent of y values lie within one standard error from the calculated value of y, as calculated for the mean value of x using the preceding regression model. Thus, if the mean index value of the variable "teamwork" is 5.0, then the calculated (or predicted) value of "productivity" is [4.026 + 0.223*5 =] 5.141. Because SEE = 0.825, it follows that 68 percent of productivity values will lie ±0.825 from 5.141 when "teamwork" = 5. Predictions of y for other values of x have larger standard errors.

Assumptions and Notation
There are three simple regression assumptions. First, simple regression assumes that the relationship between two variables is linear. The linearity of bivariate relationships is easily determined through visual inspection, as shown in Figure 14.2. In fact, all analysis of relationships involving continuous variables should begin with a scatterplot. When variable
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
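The worked example in this passage is simple enough to reproduce directly. The intercept (4.026), slope (0.223), and SEE (0.825) below are taken from the quote itself:

```python
# Predicted productivity at teamwork = 5 from y = 4.026 + 0.223 x,
# with a rough 68% band of plus or minus one SEE around the prediction.
b0, b1, see = 4.026, 0.223, 0.825  # values taken from the quote

teamwork = 5.0
predicted = b0 + b1 * teamwork
print(f"predicted productivity = {predicted:.3f}")          # 5.141
print(f"~68% of values in [{predicted - see:.3f}, {predicted + see:.3f}]")
```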
relationships are nonlinear (parabolic or otherwise heavily curved), it is not appropriate to use linear regression. Then, one or both variables must be transformed, as discussed in Chapter 12. Second, simple regression assumes that the linear relationship is constant over the range of observations. This assumption is violated when the relationship is "broken," for example, by having an upward slope for the first half of independent variable values and a downward slope over the remaining values. Then, analysts should consider using two regression models each for these different, linear relationships. The linearity assumption is also violated when no relationship is present in part of the independent variable values. This is particularly problematic because regression analysis will calculate a regression slope based on all observations. In this case, analysts may be misled into believing that the linear pattern holds for all observations. Hence, regression results always should be verified through visual inspection. Third, simple regression assumes that the variables are continuous. In Chapter 15, we will see that regression can also be used for nominal and dichotomous independent variables. The dependent variable, however, must be continuous. When the dependent variable is dichotomous, logistic regression should be used (Chapter 16).

[Figure 14.2: Three Examples of r]

The following notations are commonly used in regression analysis. The predicted value of y (defined, based on the regression model, as y = a + bx) is typically different from the observed value of y. The predicted value of the dependent variable y is sometimes indicated as ŷ (pronounced "y-hat"). Only when R² = 1 are the observed and predicted values identical for each observation. The difference between y and ŷ is called the regression error or error term
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
(e). Hence the expressions are equivalent, as is y = ŷ + e. Certain assumptions about e are important, such as that it is normally distributed. When error term assumptions are violated, incorrect conclusions may be made about the statistical significance of relationships. This important issue is discussed in greater detail in Chapter 15 and, for time series data, in Chapter 17. Hence, the above is a pertinent but incomplete list of assumptions.

Getting Started: Conduct a simple regression, and practice writing up your results.

PEARSON'S CORRELATION COEFFICIENT
Pearson's correlation coefficient, r, measures the association (significance, direction, and strength) between two continuous variables; it is a measure of association for two continuous variables. Also called the Pearson's product-moment correlation coefficient, it does not assume a causal relationship, as does simple regression. The correlation coefficient indicates the extent to which the observations lie closely or loosely clustered around the regression line. The coefficient r ranges from –1 to +1. The sign indicates the direction of the relationship, which, in simple regression, is always the same as the slope coefficient. A "–1" indicates a perfect negative relationship, that is, that all observations lie exactly on a downward-sloping regression line; a "+1" indicates a perfect positive relationship, whereby all observations lie exactly on an upward-sloping regression line. Of course, such values are rarely obtained in practice because observations seldom lie exactly on a line. An r value of zero indicates that observations are so widely scattered that it is impossible to draw any well-fitting line. Figure 14.2 illustrates some values of r.

Key Point: Pearson's correlation coefficient, r, ranges from –1 to +1.

It is important to avoid confusion between Pearson's correlation coefficient and the coefficient of determination. For the two-variable, simple regression model, r² = R², but whereas 0 ≤ R ≤ 1, r ranges from –1 to +1. Hence, the sign of r tells us whether a relationship is positive or negative, but the sign of R, in regression output tables such as Table 14.1, is always positive and cannot inform us about the direction of the relationship. In simple regression, the regression coefficient, b, informs us about the direction of the relationship. Statistical software programs usually show r rather than r². Note also that the Pearson's correlation coefficient can be used only to assess the association between two continuous variables, whereas regression can be extended to deal with more than two variables, as discussed in Chapter 15. Pearson's correlation coefficient assumes that both variables are normally distributed. When Pearson's correlation coefficients are calculated, a standard error of r can be determined, which then allows us to test the statistical significance of the bivariate correlation. For bivariate relationships, this is the same level of significance as shown for the slope of the regression coefficient. For the variables given earlier in this chapter, the value of r is .272 and the statistical significance of r is p ≤ .01. Use of the Pearson's correlation coefficient assumes that the variables are normally distributed and that there are no significant departures from linearity. It is important not to confuse the correlation coefficient, r, with the regression coefficient, b. Comparing the measures r and b (the slope) sometimes causes confusion.
The key point is that r does not indicate the regression slope but rather the extent to which observations lie close to it. A steep regression line (large b) can have observations scattered loosely or closely around it, as can a shallow (more horizontal) regression line. The purposes of these two statistics are very different.

SPEARMAN'S RANK CORRELATION
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
to the measures described earlier. Hence, 90 percent of the variation in one variable can be explained by the other. For the variables given earlier, the Spearman's rank correlation coefficient is .274 (p < .01), which is comparable to r reported in preceding sections. Box 14.1 illustrates another use of the statistics described in this chapter, in a study of the relationship between crime and poverty.

SUMMARY
When analysts examine relationships between two continuous variables, they can use simple regression or the Pearson's correlation coefficient. Both measures show (1) the statistical significance of the relationship, (2) the direction of the relationship (that is, whether it is positive or negative), and (3) the strength of the relationship. Simple regression assumes a causal and linear relationship between the continuous variables. The statistical significance and direction of the slope coefficient is used to assess the statistical significance and direction of the relationship. The coefficient of determination, R², is used to assess the strength of relationships; R² is interpreted as the percent variation explained. Regression is a foundation for studying relationships involving three or more variables, such as control variables. The Pearson's correlation coefficient does not assume causality between two continuous variables. A nonparametric alternative to testing the relationship between two continuous variables is the Spearman's rank correlation coefficient, which examines correlation among the ranks of the data rather than among the values themselves. As such, this measure can also be used to study relationships in which one or both variables are ordinal.

KEY TERMS
Coefficient of determination, R²
Error term
Observed value of y
Pearson's correlation coefficient, r
Predicted value of the dependent variable y, ŷ
Regression coefficient
Regression line
Scatterplot
Simple regression assumptions
Spearman's rank correlation coefficient
Standard error of the estimate
Test of significance of the regression coefficient

Notes
1. See Chapter 3 for a definition of continuous variables. Although the distinction between ordinal and continuous is theoretical (namely, whether or not the distance between categories can be measured), in practice ordinal-level variables with seven or more categories (including Likert variables) are sometimes analyzed using statistics appropriate for interval-level variables. This practice has many critics because it violates an assumption of regression (interval data), but it is often
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
There are four main predictive modeling techniques detailed in this book as important upstream O&G data-driven analytic methodologies:
Decision trees
Regression: linear regression, logistic regression
Neural networks: artificial neural networks, self-organizing maps (SOMs)
K-means clustering
Keith Holdaway (Harness Oil and Gas Big Data with Analytics: Optimize Exploration and Production with Data-Driven Models (Wiley and SAS Business Series))
Figure 3.35 shows examples of nonstandard trend lines:

FIGURE 3.35 Nonstandard Trend Lines in XLF

A is drawn between lows in a downtrend instead of between highs in a downtrend.
B is also drawn between lows in a downtrend. Furthermore, it ignores a large price spike in an effort to fit the line to later data.
C is more of a best-fit line drawn through the center of a price area. These may be drawn freehand or via a procedure like linear regression.
D is drawn between highs in an uptrend.
E raises a critical point about trend lines: They are lines drawn between successive swings in the market. If there are no swings, there should be no trend line. It would be hard to argue that the market was showing any swings at E, at least on this time frame. This trend line may be valid on a lower time frame, but it is nonstandard on this time frame.

In general, trend lines are tools to define the relationship between swings, and are a complement to the simple length of swing analysis. As such, one of the requirements for drawing trend lines is that there must actually be swings in the market. We see many cases where markets are flat, and it is possible to draw trend lines that touch the tops or bottoms of many consecutive price bars. With one important exception later in this chapter, these types of trend lines do not tend to be very significant. They are penetrated easily by the smallest motions in the market, and there is no reliable price action after the penetration. Avoid drawing these trend lines in flat markets with no definable swings.
Adam H. Grimes (The Art and Science of Technical Analysis: Market Structure, Price Action, and Trading Strategies (Wiley Trading Book 547))
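Grimes's option C, a best-fit line through the center of a price area, is exactly a linear regression of price on bar number. A sketch with invented prices:

```python
# A regression "trend line" through a price series: regress price on
# the bar index and take the fitted line as the center of the move.
import numpy as np

prices = np.array([31.2, 31.5, 31.1, 31.9, 32.4, 32.2, 32.8, 33.1])  # invented
bars = np.arange(len(prices))

slope, intercept = np.polyfit(bars, prices, deg=1)
trend = slope * bars + intercept   # the fitted line through the price area
print(f"trend slope per bar: {slope:.3f}")
```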
There are a hundred thousand species of love, separately invented, each more ingenious than the last, and every one of them keeps making things. OLIVIA VANDERGRIFF SNOW IS THIGH-HIGH and the going slow. She plunges through drifts like a pack animal, Olivia Vandergriff, back to the boardinghouse on the edge of campus. Her last session ever of Linear Regression and Time Series Models has finally ended. The carillon on the quad peals five, but this close to the solstice, blackness closes around Olivia like midnight. Breath crusts her upper lip. She sucks it back in, and ice crystals coat her pharynx. The cold drives a metal filament up her nose. She could die out here, for real, five blocks from home. The novelty thrills her. December of senior year. The semester so close to over. She might stumble now, fall face-first, and still roll across the finish line. What’s left? A short-answer exam on survival analysis. Final paper in Intermediate Macroeconomics. Hundred and ten slide IDs in Masterpieces of World Art, her blow-off elective. Ten
Richard Powers (The Overstory)
At its core, regression analysis seeks to find the “best fit” for a linear relationship between two variables. A simple example is the relationship between height and weight. People who are taller tend to weigh more—though that is obviously not always the case.
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
Once the initial paradox dilemma is maximized to the ultimate conundrum of linear reasoning, the conundrum can be translated into the relationally stratified juxtaposition of Truth (consistency) and Derivability (completeness). This juxtaposition has misled nearly everyone into believing, despite being simultaneously contradicted by direct facial perception, an idiotically incorrect “dualism” between internal-cognition and external-perception. Cognition and perception are coupled into complementary (mental and physical) aspects of a singly unified logical (telic) identity which regresses into a merger of consistency and completeness, period.
Council of Human Hybrid-Attractors (Incessance: Incesancia)
Regression analysis enables us to go one step further and “fit a line” that best describes a linear relationship between the two variables.
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
This is not about regression, she thought. I am not linear. I just hurt. I just want the world to be quiet.
Amy Reed (The Nowhere Girls)
Beginners typically start out using simple supervised learning algorithms such as linear regression, logistic regression, decision trees, and k-nearest neighbors. Beginners are also likely to apply unsupervised learning in the form of k-means clustering and descending dimension algorithms.
Oliver Theobald (Machine Learning For Absolute Beginners: A Plain English Introduction (Second Edition) (AI, Data Science, Python & Statistics for Beginners))
Chaos should be taught, he argued. It was time to recognize that the standard education of a scientist gave the wrong impression. No matter how elaborate linear mathematics could get, with its Fourier transforms, its orthogonal functions, its regression techniques, May argued that it inevitably misled scientists about their overwhelmingly nonlinear world. “The mathematical intuition so developed ill equips the student to confront the bizarre behaviour exhibited by the simplest of discrete nonlinear systems,” he wrote. “Not only in research, but also in the everyday world of politics and economics, we would all be better off if more people realized that simple nonlinear systems do not necessarily possess simple dynamical properties.
James Gleick (Chaos: Making a New Science)