What Is the Coefficient of Determination?

The coefficient of determination is a measure used in statistical analysis that assesses how well a model explains observed outcomes and, by extension, how well it may predict future ones. It indicates the share of variability in the data set that the model explains. The coefficient of determination, also commonly known as "R-squared," is used as a guideline to gauge how well the model fits the data.

One way of interpreting this figure is to say that the variables included in a given model explain approximately x% of the observed variation. So, if the R-squared is 0.50, then approximately half of the observed variation can be explained by the model.
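The interpretation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the common definition of R-squared as one minus the ratio of the residual sum of squares to the total sum of squares; the function name and the data are hypothetical and chosen so that the result is 0.50.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired observations."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the observations around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and model predictions (illustrative only).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [4.0, 5.0, 5.0, 6.0]

print(r_squared(actual, predicted))  # 0.5: half of the variation is explained
```

Predicting the mean of the observations for every point gives an R-squared of 0 under this definition, and reproducing every observation exactly gives 1.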

Understanding the Coefficient of Determination

The coefficient of determination is used to explain how much of the variability in one factor can be explained by its relationship to another factor. It is relied on heavily in trend analysis and is typically represented as a value between 0 and 1.

The closer the value is to 1, the better the fit, or relationship, between the two factors. In a simple linear regression, the coefficient of determination is the square of the correlation coefficient, also known as "R," which allows it to display the degree of linear correlation between two variables.

This measure is often described as the "goodness of fit." A value of 1.0 indicates a perfect fit to the observed data, meaning that the model explains all of the variation observed; on its own, it does not guarantee reliable forecasts for new data. A value of 0, on the other hand, indicates that the model explains none of the variation in the data. For a model with several variables, such as a multiple regression model, the adjusted R-squared is a better measure, because the plain R-squared never decreases when another variable is added. As a rough rule of thumb in economics, an R-squared above 0.60 is sometimes seen as worthwhile, though what counts as acceptable varies by field.
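The adjustment for models with several variables can be sketched as follows. This assumes the widely used formula 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of explanatory variables; the function name and the example figures are hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of explanatory variables used."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical case: the same raw R-squared of 0.60 from 30 observations,
# reached with either 1 or 5 explanatory variables.
print(adjusted_r_squared(0.60, n_obs=30, n_predictors=1))
print(adjusted_r_squared(0.60, n_obs=30, n_predictors=5))
```

With the same raw R-squared, the model that needed five variables receives the lower adjusted value, which is the point of the adjustment.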

Advantages of Analyzing the Coefficient of Determination

The coefficient of determination is the square of the correlation between the predicted scores and the actual scores in a data set. In a simple regression, it can also be expressed as the square of the correlation between X and Y scores, with X being the independent variable and Y being the dependent variable.

Regardless of representation, an R-squared equal to 0 means that the dependent variable cannot be predicted using the independent variable. Conversely, if it equals 1, it means that the dependent variable is always predicted without error by the independent variable.

A coefficient of determination that falls within this range measures the extent to which the dependent variable is predicted by the independent variable. An R-squared of 0.20, for example, means that 20% of the variance in the dependent variable is explained by the independent variable.
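The two representations can be compared numerically. The sketch below assumes a simple least-squares line with an intercept, fitted by hand with the standard library; the helper names and the data are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))
    return slope, my - slope * mx


# Hypothetical X (independent) and Y (dependent) scores.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
fitted = [intercept + slope * u for u in x]

ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
ss_tot = sum((v - mean(y)) ** 2 for v in y)

r2 = 1.0 - ss_res / ss_tot                # explained share of variation
r_xy_squared = pearson_r(x, y) ** 2       # squared correlation of X and Y
r_fit_squared = pearson_r(y, fitted) ** 2  # squared correlation of actual and predicted

print(r2, r_xy_squared, r_fit_squared)  # the three agree up to rounding
```

The agreement with the squared X-Y correlation is specific to a single predictor fitted with an intercept; with several predictors, only the correlation between actual and predicted scores carries over.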

The goodness of fit, or the degree of linear correlation, reflects the distance between a fitted line on a graph and all the data points that are scattered around it. A tight set of data will have a regression line that's very close to the points and a high level of fit, meaning that the distance between the line and the data is very small. A good fit has an R-squared that is close to 1.

However, R-squared is unable to determine whether the data points or predictions are biased. It also doesn't tell the analyst or user whether the coefficient of determination value is good or not. A low R-squared is not necessarily bad, for example, and it's up to the person to make a decision based on the R-squared number.
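A small experiment illustrates the point about bias. The sketch below fits a straight line to hypothetical data that actually follow a curve (y equals x squared); the setup is an assumption made for illustration. The R-squared comes out high even though the line is systematically wrong.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical data generated by a curve, not a straight line.
x = [float(i) for i in range(11)]
y = [u ** 2 for u in x]

# Ordinary least-squares line with an intercept.
mx, my = mean(x), mean(y)
slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
intercept = my - slope * mx

residuals = [v - (intercept + slope * u) for u, v in zip(x, y)]
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((v - my) ** 2 for v in y)
r2 = 1.0 - ss_res / ss_tot

print(round(r2, 4))  # above 0.9 despite the wrong functional form
print(residuals)     # positive at both ends, negative in the middle
```

The residuals are not scattered randomly around zero: the line underpredicts at both ends and overpredicts in the middle. That pattern only shows up when the residuals are inspected, not in the R-squared figure itself.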

The coefficient of determination should not be interpreted naively. For example, if a model’s R-squared is reported at 75%, the variance of its errors is 75% less than the variance of the dependent variable, and the standard deviation of its errors is 50% less than the standard deviation of the dependent variable. In other words, the standard deviation of the model’s errors is about one-half the size of the standard deviation of the errors that you would get with a constant-only model.

Finally, even if an R-squared value is large, there may be no statistical significance of the explanatory variables in a model, or the effect size of these variables may be very small in practical terms.
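The link between R-squared and the size of the errors can be checked with a short calculation. This sketch assumes that the error variance is (1 - R-squared) times the variance of the dependent variable and ignores degrees-of-freedom adjustments; the function name is hypothetical.

```python
from math import sqrt


def error_sd_ratio(r2):
    """Ratio of the model's error standard deviation to that of a constant-only model."""
    return sqrt(1.0 - r2)


for r2 in (0.25, 0.50, 0.75, 0.90):
    ratio = error_sd_ratio(r2)
    print(f"R-squared {r2:.2f}: error SD is {ratio:.3f} of the baseline, "
          f"a reduction of {100 * (1 - ratio):.1f}%")
```

Because the ratio depends on the square root of the unexplained share, a seemingly large R-squared shrinks typical errors by less than the headline figure suggests.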

Key Takeaways

  • The coefficient of determination is a statistical measure of how well a model accounts for the variation in a set of data.
  • The coefficient of determination is used to explain how much of the variability in one factor can be explained by its relationship to another factor.