Global Data Science Forum

R Squared, the Coefficient of Determination

By Moloy De posted Thu February 18, 2021 08:19 PM

In statistics, the coefficient of determination, denoted R² or r², is the proportion of the variance in the dependent variable that is predictable from the independent variables.

It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.
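Concretely, this proportion is usually computed as R² = 1 − SSres/SStot, where SSres is the residual sum of squares and SStot is the total sum of squares. A minimal numpy sketch of that definition, using made-up observations and predictions:

```python
import numpy as np

# Hypothetical observed outcomes and model predictions (illustrative numbers only).
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.1, 7.2, 8.7, 11.3])

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                 # proportion of variance explained
print(r2)
```

Here the predictions track the observations closely, so R² comes out near 1.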

There are several definitions of R² that are only sometimes equivalent. One class of such cases includes simple linear regression, where r² is used instead of R². When an intercept is included, r² is simply the square of the sample correlation coefficient between the observed outcomes and the observed predictor values. If additional regressors are included, R² is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination ranges from 0 to 1.
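The equivalence for simple linear regression with an intercept can be checked numerically. A small sketch with simulated data (the slope, intercept, and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)  # toy linear relationship

# OLS fit with an intercept (degree-1 polynomial fit).
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]  # sample correlation between outcome and predictor
print(r2, r ** 2)            # the two definitions agree up to floating-point error
```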

When the predictors are calculated by ordinary least-squares regression, that is, by minimizing the residual sum of squares, R² never decreases as variables are added to the model. This illustrates a drawback of one possible use of R²: one might keep adding variables simply to inflate the R² value. For example, if one is trying to predict the sales of a model of car from the car's gas mileage, price, and engine power, one could include such irrelevant factors as the first letter of the model's name or the height of the lead engineer designing the car, because R² will never decrease as variables are added and will probably increase due to chance alone.
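This monotone behaviour is easy to demonstrate. The sketch below (simulated data; the helper `r2_ols` is a hypothetical name) fits an OLS model, then refits after appending a purely random regressor, standing in for something like the lead engineer's height:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)  # outcome depends only on x1

def r2_ols(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

junk = rng.normal(size=n)  # irrelevant regressor, unrelated to y
r2_small = r2_ols(x1.reshape(-1, 1), y)
r2_big = r2_ols(np.column_stack([x1, junk]), y)
print(r2_big >= r2_small)  # adding a regressor never lowers in-sample R^2
```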

This leads to the alternative approach of looking at the adjusted R².

The interpretation of this statistic is almost the same as that of R², but it penalizes the value as extra variables are included in the model. For cases other than fitting by ordinary least squares, the R² statistic can be calculated as above and may still be a useful measure. If fitting is by weighted least squares or generalized least squares, alternative versions of R² can be calculated that are appropriate to those statistical frameworks, while the "raw" R² may still be useful if it is more easily interpreted. Values for R² can be calculated for any type of predictive model, which need not have a statistical basis.
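The penalty is visible in the standard adjusted formula, 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p regressors. A small sketch (the R² value and sample sizes are illustrative):

```python
import numpy as np

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p regressors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Holding R^2 fixed, each extra regressor lowers the adjusted value.
print(adjusted_r2(0.90, n=50, p=2))   # ~0.8957
print(adjusted_r2(0.90, n=50, p=10))  # ~0.8744
```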

QUESTION I: Could R Squared be negative?
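As a hint, consider the 1 − SSres/SStot form with a model that fits worse than simply predicting the mean. A minimal sketch with made-up numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([5.0, 4.0, 3.0, 2.0, 1.0])  # a deliberately terrible "model"

ss_res = np.sum((y - y_hat) ** 2)      # 40.0
ss_tot = np.sum((y - y.mean()) ** 2)   # 10.0
r2 = 1 - ss_res / ss_tot
print(r2)  # -3.0: the model does far worse than always predicting the mean
```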

QUESTION II: When analyzing time series data, is it justified to calculate R Squared as the square of the correlation coefficient between actuals and predicted values?
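Related to this question, the squared correlation between actuals and predictions can disagree sharply with the 1 − SSres/SStot form when the predictions are biased, as often happens with time series forecasts. A small sketch with illustrative numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = y + 10.0  # perfectly correlated with the actuals, but badly biased

r = np.corrcoef(y, y_hat)[0, 1]
corr_sq = r ** 2                                              # ~1.0: ignores the bias
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(corr_sq, r2)  # squared correlation ~1.0, yet R^2 is deeply negative
```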

REFERENCE: Wikipedia