This post describes a technique for computing the coefficients in Multivariate Linear Regression. The problem is also known as Ordinary Least Squares Regression, and the Normal Equation is one approach to solving it.
The method of least squares was published by Carl Friedrich Gauss in 1809. It finds the regression coefficients analytically, making it a one-step learning algorithm as opposed to the iterative updates of Gradient Descent.
In the Multivariate Linear Regression problem, suppose we have m training examples (x_i, y_i), each with n features, x_i = [x_{i1}, ..., x_{in}] ∈ R^n. We can stack all the x_i as rows of a matrix X, sometimes called the Design Matrix, and collect the observed values into a vector y = [y_1, ..., y_m]^T ∈ R^m. This expresses the multivariate linear regression problem in the matrix form Xw = y. Note that there is usually an additional feature x_{i0} = 1 to incorporate the intercept term, so that x_i ∈ R^{n+1} and X has m rows and n+1 columns.
Thus, we have the system Xw = y. Now, how do we solve for w, and if there is no exact solution, how do we find the best possible w?
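As a concrete illustration (a minimal NumPy sketch, not part of the original derivation; the feature values and targets below are made up), the design matrix with an intercept column can be built like this:

```python
import numpy as np

# Toy data: m = 4 examples, n = 2 features (values chosen only for illustration)
features = np.array([
    [1.0, 2.0],
    [2.0, 0.5],
    [3.0, 1.5],
    [4.0, 3.0],
])
y = np.array([3.1, 3.9, 6.2, 9.8])

# Prepend the constant feature x_i0 = 1 so the intercept becomes part of w
X = np.hstack([np.ones((features.shape[0], 1)), features])
print(X.shape)  # (4, 3): m rows, n + 1 columns
```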
Let w be the best-fit solution to Xw ≈ y. We try to minimize the error e = y − Xw, also called the residuals. We take the squared norm of this error, so the objective is to minimize J(w) = ∥e∥^2 = ∥y − Xw∥^2 over w. Our problem is therefore to find w = arg min_w J(w) = arg min_w ∥y − Xw∥^2. Expanding, J(w) = y^T y − 2 w^T X^T y + w^T X^T X w. To minimize J(w) with respect to w, we set:
∂J(w)/∂w = −2 X^T y + 2 X^T X w = 0
or, X^T X w = X^T y
or, w = (X^T X)^{-1} X^T y
The equation X^T X w = X^T y is known as the Normal Equation, and it is used to solve for the regression coefficients w in Multivariate Linear Regression.
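As a hedged sketch of how this looks in code (the data below is synthetic and chosen only for illustration), the Normal Equation can be solved directly with NumPy. Solving the linear system with np.linalg.solve, or using np.linalg.lstsq, is preferred in practice over forming the explicit inverse (X^T X)^{-1}:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3                                               # toy problem sizes
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # design matrix with intercept column
true_w = np.array([2.0, -1.0, 0.5, 3.0])                    # coefficients used to generate the data
y = X @ true_w + 0.1 * rng.normal(size=m)                   # noisy observations

# Normal Equation: solve X^T X w = X^T y (no explicit matrix inverse)
w = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq performs the same least-squares fit with better numerical behavior
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)        # close to true_w
print(w_lstsq)  # matches w when X^T X is well conditioned
```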
QUESTION I: How do we compute the Normal Equation for Polynomial Regression? (One possible starting point is sketched below.)
QUESTION II: Could we apply an iterative method to solve the Normal Equation?
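Regarding Question I, a minimal sketch (my own illustration, with a made-up cubic example, not necessarily the intended answer): polynomial regression in a single variable x reduces to linear regression once the powers 1, x, x^2, ..., x^d are used as the columns of the design matrix, after which the same Normal Equation X^T X w = X^T y applies.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=50)                              # single input variable (toy data)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.1 * rng.normal(size=50)   # noisy cubic relationship

degree = 3
# Design matrix whose columns are 1, x, x^2, ..., x^degree
X = np.vander(x, N=degree + 1, increasing=True)

# Same Normal Equation as before: X^T X w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # roughly [1.0, -2.0, 0.0, 0.5]
```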
#GlobalAIandDataScience #GlobalDataScience