Global Data Science Forum

R Squared, the Coefficient of Determination

By Moloy De posted Thu February 18, 2021 08:19 PM

In statistics, the coefficient of determination, denoted R² or r², is the proportion of the variance in the dependent variable that is predictable from the independent variables.

It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.
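Concretely, this proportion is usually computed as R² = 1 − SSres/SStot, where SSres is the residual sum of squares and SStot is the total sum of squares. A minimal numpy sketch of that definition, using made-up observations and predictions:

```python
import numpy as np

# Hypothetical observed outcomes and model predictions (illustrative numbers only).
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.1, 7.2, 8.7, 11.3])

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                 # proportion of variance explained
print(r2)
```

Here the predictions track the observations closely, so R² comes out near 1.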

There are several definitions of R² that are only sometimes equivalent. One class of such cases includes simple linear regression, where r² is used instead of R². When an intercept is included, r² is simply the square of the sample correlation coefficient between the observed outcomes and the observed predictor values. If additional regressors are included, R² is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination ranges from 0 to 1.
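The equivalence for simple linear regression with an intercept can be checked numerically. A small sketch with simulated data (the slope, intercept, and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)  # toy linear relationship

# OLS fit with an intercept (degree-1 polynomial fit).
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]  # sample correlation between outcome and predictor
print(r2, r ** 2)            # the two definitions agree up to floating-point error
```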

When the predictors are calculated by ordinary least-squares regression, that is, by minimizing the residual sum of squares, R² never decreases as variables are added to the model. This illustrates a drawback of one possible use of R²: one might keep adding variables simply to inflate the R² value. For example, if one is trying to predict the sales of a model of car from the car's gas mileage, price, and engine power, one could include such irrelevant factors as the first letter of the model's name or the height of the lead engineer designing the car, because R² will never decrease as variables are added and will probably increase due to chance alone.
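This monotone behaviour is easy to demonstrate. The sketch below (simulated data; the helper `r2_ols` is a hypothetical name) fits an OLS model, then refits after appending a purely random regressor, standing in for something like the lead engineer's height:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)  # outcome depends only on x1

def r2_ols(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

junk = rng.normal(size=n)  # irrelevant regressor, unrelated to y
r2_small = r2_ols(x1.reshape(-1, 1), y)
r2_big = r2_ols(np.column_stack([x1, junk]), y)
print(r2_big >= r2_small)  # adding a regressor never lowers in-sample R^2
```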

This leads to the alternative approach of looking at the adjusted R².

The interpretation of this statistic is almost the same as that of R², but it penalizes the value as extra variables are included in the model. For cases other than fitting by ordinary least squares, the R² statistic can be calculated as above and may still be a useful measure. If fitting is by weighted least squares or generalized least squares, alternative versions of R² can be calculated that are appropriate to those statistical frameworks, while the "raw" R² may still be useful if it is more easily interpreted. Values for R² can be calculated for any type of predictive model, which need not have a statistical basis.
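The penalty is visible in the standard adjusted formula, 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p regressors. A small sketch (the R² value and sample sizes are illustrative):

```python
import numpy as np

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p regressors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Holding R^2 fixed, each extra regressor lowers the adjusted value.
print(adjusted_r2(0.90, n=50, p=2))   # ~0.8957
print(adjusted_r2(0.90, n=50, p=10))  # ~0.8744
```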

QUESTION I: Could R Squared be negative?
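As a hint, consider the 1 − SSres/SStot form with a model that fits worse than simply predicting the mean. A minimal sketch with made-up numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([5.0, 4.0, 3.0, 2.0, 1.0])  # a deliberately terrible "model"

ss_res = np.sum((y - y_hat) ** 2)      # 40.0
ss_tot = np.sum((y - y.mean()) ** 2)   # 10.0
r2 = 1 - ss_res / ss_tot
print(r2)  # -3.0: the model does far worse than always predicting the mean
```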

QUESTION II: When analyzing time series data, is it justified to calculate R Squared as the square of the correlation coefficient between actuals and predicted values?
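Related to this question, the squared correlation between actuals and predictions can disagree sharply with the 1 − SSres/SStot form when the predictions are biased, as often happens with time series forecasts. A small sketch with illustrative numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = y + 10.0  # perfectly correlated with the actuals, but badly biased

r = np.corrcoef(y, y_hat)[0, 1]
corr_sq = r ** 2                                              # ~1.0: ignores the bias
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(corr_sq, r2)  # squared correlation ~1.0, yet R^2 is deeply negative
```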

REFERENCE: Wikipedia