Principal Component Analysis using Singular Value Decomposition

By Moloy De posted Fri June 02, 2023 09:12 PM

In data analysis, we often examine data tables using both domain knowledge and techniques from linear algebra. Viewing data as a matrix gives us access to the rich library of tools from linear algebra, including the Singular Value Decomposition (SVD). This allows us to answer questions such as "What patterns emerge from all the data?"
 
The Singular Value Decomposition
 
The singular value decomposition expresses any n×d matrix X as a product of three matrices U, Σ, and V such that X=UΣV'. There are restrictions on U, Σ, and V. U has dimensions n×n, Σ has dimensions n×d, and V has dimensions d×d. U and V are orthogonal matrices; that is, U'U=I and V'V=I, which also implies that all columns of U and V have magnitude 1 and are mutually orthogonal. Σ is a rectangular diagonal matrix; that is, every element of Σ is 0 unless it lies on the diagonal. In addition, the diagonal elements of Σ (the singular values) are arranged from largest to smallest.
 
As a simple example, take a small 3×2 matrix X and compute its SVD. We can check that the resulting matrices U, Σ, and V fulfill the requirements of the SVD. First, X has dimensions 3×2, so U has dimensions 3×3, Σ has dimensions 3×2, and V has dimensions 2×2. Second, U and V are orthogonal matrices. Note that we select a simple example here; as we will soon see, for real datasets the elements of U, Σ, and V' are rarely integers.
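A minimal sketch in NumPy of how one might carry out this check; the 3×2 matrix below is an arbitrary choice for illustration, not a matrix taken from any particular dataset.

import numpy as np

# An arbitrary 3x2 matrix, chosen only to illustrate the shapes and properties.
X = np.array([[4.0, 0.0],
              [3.0, -5.0],
              [0.0, 2.0]])

# full_matrices=True returns U (3x3), the singular values, and V' (2x2).
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Assemble the 3x2 rectangular diagonal matrix Sigma from the singular values.
Sigma = np.zeros(X.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)      # (3, 3) (3, 2) (2, 2)
print(np.allclose(X, U @ Sigma @ Vt))      # X = U Sigma V'
print(np.allclose(U.T @ U, np.eye(3)))     # U'U = I
print(np.allclose(Vt @ Vt.T, np.eye(2)))   # V'V = I
print(np.all(np.diff(s) <= 0))             # singular values sorted largest to smallest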
 
Principal Directions
 
SVD takes a data matrix X and produces the matrix factorization X=UΣV'. This factorization has useful properties that we can use to perform dimensionality reduction. First, we examine V'. The rows of V' (or equivalently, the columns of the un-transposed V) contain the principal directions of the matrix X. Consider a dataset of country-level child mortality and fertility rates.
To examine the principal directions of this data matrix, we first center the data by subtracting the mean of each column.
Then, we compute the SVD and display the values in V'.  We plot the rows of V' as vectors on top of a scatter plot of the original data (we scaled the vectors for presentation). The first row of V' is drawn in red, and the second in purple.
Observe that the principal directions of X, or the rows of V', align with the directions in which the data vary. This is no coincidence: the principal directions of X are precisely the orthogonal directions along which X has the greatest variance.
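As a rough sketch of this step in NumPy: the array below is a made-up stand-in for the fertility and child-mortality table (the actual country-level data is not reproduced here), and the last lines center the data and read off the principal directions from V'.

import numpy as np

# Made-up stand-in data: column 0 = fertility, column 1 = child mortality.
X = np.array([[1.7,  4.2],
              [4.8, 65.0],
              [2.2,  8.1],
              [5.9, 92.0],
              [1.4,  3.0]])

# Center each column so the principal directions describe variation about the mean.
X_centered = X - X.mean(axis=0)

# The rows of Vt are the principal directions of the centered data,
# ordered from the direction of greatest variance downward.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
print(Vt)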
 
Principal Component Analysis
 
We can use principal directions to sketch a procedure for dimensionality reduction. First, we find the principal directions of X by centering X, then using the SVD. If X has 100 dimensions, this will produce 100 principal directions. Next, we decide how many dimensions we want to reduce to; a common choice is 2. To reduce to 2 dimensions, we keep the first two principal directions and throw out the other 98. Finally, we project the data onto the first two directions. This is principal component analysis (PCA), a method for dimensionality reduction. Recall that the SVD computes X=UΣV'; in our two-dimensional example, the matrix V' contains the two principal directions of X. To remove a direction, we simply drop the last row from V'.
 
Finally, we project the original data points X onto the single remaining principal direction. Since the SVD enforces that the principal directions in V' have magnitude one, this is simply the matrix multiplication XV, after removing a direction from V.
 
We have successfully performed PCA to reduce X from 2 dimensions to 1. Dimensionality reduction attempts to summarize the data with fewer dimensions while preserving patterns in the data. One way to verify this is to check that points that appear together in the two-dimensional data (scatter plot) also appear together in the one-dimensional data (rug plot).
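Continuing the sketch above (reusing X_centered and Vt from the made-up fertility and mortality example), the reduction to one dimension might look like this:

# Keep only the first principal direction (drop the last row of V'),
# then project the centered data onto it.
first_direction = Vt[[0], :]            # shape (1, 2)
X_1d = X_centered @ first_direction.T   # shape (n, 1): one value per data point
print(X_1d.ravel())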
In practice, we use a simpler method to compute the reduced-dimension X, which arises from the following equivalence: XV = UΣV'V = UΣ, since V'V = I.
This means that instead of removing rows from V' and then computing XV, we can compute UΣ and then remove columns until we achieve our desired dimensionality. For this reason, we call the columns of UΣ the principal components of X. To reduce X from 2 dimensions to 1, we compute UΣ and keep only its first column.
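A quick numerical check of this equivalence, again reusing the arrays from the sketch above:

# X V equals U Sigma because V'V = I.
XV = X_centered @ Vt.T      # project onto all principal directions
US = U * s                  # U Sigma (multiplying by s scales the columns of U)
print(np.allclose(XV, US))  # True: the columns of U Sigma are the principal components
print(US[:, 0])             # first principal component, identical to X_1d above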
 
With this, we can describe the final procedure for PCA. To reduce an n×d matrix X to k dimensions (where k<d), we perform the following steps:
1. Center X
2. Use the SVD to find X=UΣV'
3. Compute the principal components UΣ
4. Keep the first k columns of UΣ
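These four steps fit into a short function. Here is a minimal sketch, assuming NumPy and a two-dimensional array X of shape (n, d); it is meant to mirror the steps above rather than replace a library implementation.

import numpy as np

def pca(X, k):
    # 1. Center X.
    X_centered = X - X.mean(axis=0)
    # 2. Use the SVD to find X = U Sigma V'.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # 3. Compute the principal components U Sigma.
    components = U * s
    # 4. Keep the first k columns of U Sigma.
    return components[:, :k]

# Example: reduce the illustrative data above from 2 dimensions to 1.
# X_1d = pca(X, k=1)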
 
Rank-k Approximation
 
We can also use PCA to construct a rank-k approximation of X, which produces a rank k matrix with the same dimensions as X. The rank-k approximation projects the points in X onto the top k principal directions. The procedure is similar to that of PCA:
1. Center X
2. Use the SVD to find X=UΣV'
3. Keep the first k columns of UΣ, and the first k rows of V'.
4. Compute X≈UΣV' with the reduced UΣ and V'.
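A corresponding sketch of the rank-k approximation, under the same assumptions as the pca function above; note that it approximates the centered data matrix.

import numpy as np

def rank_k_approximation(X, k):
    # 1. Center X.
    X_centered = X - X.mean(axis=0)
    # 2. Use the SVD to find X = U Sigma V'.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # 3. Keep the first k columns of U Sigma and the first k rows of V'.
    US_k = (U * s)[:, :k]
    Vt_k = Vt[:k, :]
    # 4. Compute X ≈ U Sigma V' with the reduced factors.
    return US_k @ Vt_k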
 
Thanks.



QUESTION I : What are the efficient SVD algorithms that work for large data?
QUESTION II : How does SVD help in recommendation engines?

REFERENCE : PCA using the Singular Value Decomposition
