Linear regression approximates the relationship between two or more variables with a linear model. We can use linear regression to predict a continuous value from the values of other variables. In simple linear regression, there are two variables: a dependent variable and an independent variable. Linear Regression is the easiest and most basic regression to use and understand; it is fast and highly interpretable. It also doesn’t require hyperparameter tuning: tuning the K parameter in K-Nearest Neighbors or the learning rate in Neural Networks isn’t something to worry about in Linear Regression.
For an overview of Linear Regression in Machine Learning, please go through my article titled “Regression in Machine Learning” at
https://community.ibm.com/community/user/ibmz-and-linuxone/blogs/subhasish-sarkar1/2020/03/14/regression-in-machine-learning.
In order to understand linear regression, let us consider a hypothetical example: estimating the approximate CO2 emission of a new car model after its production, with ‘Engine Size of the Car’ as the independent variable and ‘CO2 Emission’ as the target value that we would like to predict. Let us assume that we have the following dataset.
[Figure: Dataset of car engine sizes and their CO2 emissions]
Let us first plot our independent and dependent variables using a scatter plot. With linear regression, we can fit a line through the data. The whole objective of linear regression is to find a line that is a good fit for the data at hand. A good fit, here, means that if we have, for instance, a car with engine size x1 = 3.9 and an actual CO2 Emission of 350, the fit line should predict a value very close to 350 for a new or unknown car model with that engine size.
[Figure: Scatter plot of Engine Size vs. CO2 Emission]
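As a minimal sketch, here is how such a scatter plot could be produced in Python with matplotlib. The engine sizes and emissions below are made-up stand-ins for the dataset above; only the (3.9, 350) pair comes from the example in the text.

```python
import matplotlib.pyplot as plt

# Made-up engine sizes (x1) and CO2 emissions (y) standing in for the dataset;
# only the (3.9, 350) pair is taken from the example in the text.
engine_size = [1.5, 2.0, 2.4, 3.3, 3.5, 3.9]
co2_emission = [136, 196, 221, 233, 255, 350]

plt.scatter(engine_size, co2_emission, color="blue")
plt.xlabel("Engine Size")
plt.ylabel("CO2 Emission")
plt.title("Engine Size vs. CO2 Emission")
plt.show()
```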
The fit line is traditionally written as a polynomial, which, for a simple regression problem with a single independent variable, takes the form
ŷ = θo + θ1 * x1
In this equation, ŷ (usually called y-hat) is the dependent variable, that is, the variable whose value we want to predict, and x1 is the independent variable. θo and θ1 are called the coefficients of the linear equation: θ1 is known as the "slope" or "gradient" of the fit line, and θo is known as the "intercept".
[Figure: The fit line annotated with its slope θ1 and intercept θo]
Linear regression estimates the coefficients θo and θ1 of the line. In linear regression, we calculate θo and θ1 to find the best line to ‘fit’ the data. But, how do we do that? Let us assume for a moment that we have already found the best fit line for our data.
[Figure: The fit line through the data, with actual and predicted values marked for one car]
The green dotted lines represent the actual CO2 Emission values; the orange dotted lines represent the values predicted by the fit line. Now, if we compare the actual emission of this car (y = 250) with what we have predicted (ŷ = 340), we find a (340 − 250) = 90-unit error (also called the residual error), represented by the red bi-directional arrow: Error = y − ŷ = 250 − 340 = −90. This error is only for a single data point in our dataset. A 90-unit error suggests that our prediction line is not very accurate. What we usually do is average the residual errors over all the data points in our dataset, and the goal is to find the line for which this average error is as small as possible. Because positive and negative residuals would cancel each other out, we average the squared residuals instead; this quantity is called the Mean Squared Error (MSE), mathematically represented by the following equation.
MSE = (1/n) * Σ (yi − ŷi)²
Here, the sum runs over all n data points, with yi the actual value and ŷi the predicted value for the i-th point.
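As a quick illustration (not the article's dataset), the MSE can be computed in a few lines of Python with numpy. The first actual/predicted pair reuses the 250 vs. 340 example above; the rest are made up.

```python
import numpy as np

# Actual CO2 emissions and the values a candidate fit line predicted for them.
# The first pair reuses the 250 (actual) vs. 340 (predicted) example from the
# text; the remaining values are made up for illustration.
y_actual = np.array([250, 221, 136, 255])
y_predicted = np.array([340, 230, 150, 240])

# MSE = (1/n) * sum of squared residuals (y - y_hat).
residuals = y_actual - y_predicted
mse = np.mean(residuals ** 2)
print(mse)  # 2150.5
```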
Therefore, the objective of linear regression is to minimize the MSE by finding the best parameters, θo and θ1. For simple linear regression, θo and θ1 can be calculated directly using the following equations.
θ1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
θo = ȳ − θ1 * x̄
Here, x̄ is the mean of the x1 values and ȳ is the mean of the y values in the dataset.
Quite evidently, we first calculate the average of x1 and the average of y. Next, we use those averages to find θ1. Once we have the value of θ1, calculating θo is child’s play. We really don’t need to remember the formulas for calculating the parameters θo and θ1; most of the machine learning libraries in Python, R, and Scala can easily calculate the parameters for us. However, it is always good to understand how everything works.
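To make this concrete, here is a small Python sketch that applies the closed-form equations above to a made-up dataset (not the article's actual values) and then cross-checks the result against scikit-learn's LinearRegression, one of the libraries that computes the parameters for us.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up engine sizes (x1) and CO2 emissions (y); not the article's dataset.
x = np.array([1.5, 2.0, 2.4, 3.3, 3.5, 3.9])
y = np.array([136, 196, 221, 233, 255, 350])

# Closed-form estimates from the equations above.
x_bar, y_bar = x.mean(), y.mean()
theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
theta0 = y_bar - theta1 * x_bar
print(f"manual:  theta0 = {theta0:.2f}, theta1 = {theta1:.2f}")

# The same fit computed by scikit-learn; both should agree.
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(f"sklearn: theta0 = {model.intercept_:.2f}, theta1 = {model.coef_[0]:.2f}")
```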
Suppose, in our case, θo = 100 and θ1 = 25, so that ŷ = 100 + 25 * x1.
Now, let us imagine that we need to predict the CO2 Emission (y) from the Engine Size (x1) for a car with an engine size of 3.3.
ŷ =100 + 25 * x1
=> CO2 Emission = 100 + 25 * EngineSize = 100 + 25 * 3.3 = 182.5
Thus, we have predicted a CO2 Emission of 182.5 for the specific car under consideration.
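The same worked example in Python, with θo = 100 and θ1 = 25 as the assumed coefficients from above:

```python
# Fit line with the assumed coefficients theta0 = 100 and theta1 = 25.
def predict_co2(engine_size):
    return 100 + 25 * engine_size

print(predict_co2(3.3))  # 182.5
```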