Often a matrix represents a process that takes in one kind of data and produces another. In other words, a matrix can represent a transformation
A: \mathbb{R}^n \to \mathbb{R}^m
The input space and output space may be different worlds.
Consider
A = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}.
Then
A(x,y,z) = (x,y).
Geometrically, A drops the z-coordinate: it projects three-dimensional space onto the xy-plane. Already we see something important:
Some directions in the input space matter, and some directions don’t.
Every vector \mathbf{x} \in \mathbb{R}^3 can be written as
\mathbf{x} = (\text{part in the }xy\text{-plane}) + (\text{part in the }z\text{-direction}).
Apply A:
A\mathbf{x} = A(\text{xy part}) + A(\text{z part}).
But the second term is always zero, so
A\mathbf{x} = A(\text{xy part}).
Before doing anything interesting, a transformation throws away the part of the input it can’t see.
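This split is easy to verify numerically. A minimal sketch with numpy (the matrix and the xy/z split are the ones from the example above):

```python
import numpy as np

# The projection matrix from the text: it keeps x and y, discards z.
A = np.array([[1, 0, 0],
              [0, 1, 0]])

x = np.array([3.0, 5.0, 7.0])

# Split x into its xy-plane part and its z-direction part.
xy_part = np.array([3.0, 5.0, 0.0])
z_part = np.array([0.0, 0.0, 7.0])

print(A @ x)        # [3. 5.]
print(A @ xy_part)  # [3. 5.]  -- same output: the z part is invisible to A
print(A @ z_part)   # [0. 0.]
```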
Sometimes a transformation effectively responds to only one input direction.
For instance, let
A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} : \mathbb{R}^2 \to \mathbb{R}^3.
It takes (x,y) and returns
A(x,y) = (x+y,0,0).
Multiply A by a vector:
A \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ 0 \\ 0 \end{bmatrix}
Notice that the output depends only on the combination
x+y.
So the component of the input in the direction
\begin{bmatrix}1\\1\end{bmatrix}
affects the output, but the component of the input in the orthogonal direction
\begin{bmatrix}1\\-1\end{bmatrix}
is completely ignored.
So even though the input space is 2-dimensional, the transformation really only responds to the component of the input in one direction.
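A quick numerical check of this claim with numpy: shifting the input along the ignored direction [1, -1] leaves the output unchanged.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 0],
              [0, 0]])

# Two inputs with the same component along [1, 1] but very different
# components along the orthogonal direction [1, -1]:
u = np.array([2.0, 3.0])
v = u + 10 * np.array([1.0, -1.0])  # shift along the ignored direction

print(A @ u)  # [5. 0. 0.]
print(A @ v)  # [5. 0. 0.]  -- identical: only x + y matters
```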
Now count the directions the map actually responds to: there is exactly one. So
\mathrm{rank}(A) = 1.
Geometrically, the image of A is a single line in \mathbb{R}^3.
This will show up later as having only one nonzero singular value.
The number of independent directions the map can actually do anything with is its rank — regardless of input or output dimension.
Rank tells us how many directions a linear map can act on. SVD tells us which directions, and how strongly.
Let r = \mathrm{rank}(A). We’ll try to pick orthonormal input directions \mathbf{v}_1,\dots,\mathbf{v}_r that capture everything A can actually do.
Ask:
Can we choose input directions \mathbf{v}_1,\dots,\mathbf{v}_r so that their images [A\mathbf{v}_1,\dots,A\mathbf{v}_r] are orthogonal in the output space?
That would mean the transformation treats these directions independently.
Define
\mathbf{w}_j = A\mathbf{v}_j.
We want:
\mathbf{w}_i^T \mathbf{w}_j = 0 \quad (i \ne j).
But
\mathbf{w}_i^T \mathbf{w}_j = (A\mathbf{v}_i)^T(A\mathbf{v}_j) = \mathbf{v}_i^T A^T A \mathbf{v}_j.
Since A^T A is symmetric, it has a full orthonormal set of eigenvectors. So the output vectors will be orthogonal if we choose the \mathbf{v}_j to be eigenvectors of A^T A.
Let
A^T A \mathbf{v}_j = \sigma_j^2 \mathbf{v}_j.
Then
\|\mathbf{w}_j\|^2 = (A\mathbf{v}_j)^T(A\mathbf{v}_j) = \mathbf{v}_j^T A^T A \mathbf{v}_j = \sigma_j^2.
So the outputs are orthogonal, with known lengths.
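This is the crux of the construction, so it is worth checking numerically. A sketch with numpy (the random matrix is arbitrary; `np.linalg.eigh` returns orthonormal eigenvectors of the symmetric matrix A^T A):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # a generic 4x3 matrix

# Orthonormal eigenvectors of the symmetric matrix A^T A.
eigvals, V = np.linalg.eigh(A.T @ A)

W = A @ V        # images w_j = A v_j
gram = W.T @ W   # all inner products w_i^T w_j at once

# Off-diagonal entries vanish: the images are orthogonal.
# Diagonal entries are the eigenvalues, i.e. the squared lengths sigma_j^2.
print(np.round(gram, 8))
print(np.round(eigvals, 8))
```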
Define
\mathbf{u}_j = \frac{1}{\sigma_j} \mathbf{w}_j = \frac{1}{\sigma_j} A\mathbf{v}_j.
Then
\mathbf{u}_1,\dots,\mathbf{u}_r \text{ are orthonormal}
and
A\mathbf{v}_j = \sigma_j \mathbf{u}_j.
Let
V_r = [\mathbf{v}_1\ \cdots\ \mathbf{v}_r], \qquad U_r = [\mathbf{u}_1\ \cdots\ \mathbf{u}_r], \qquad \Sigma_r = \mathrm{diag}(\sigma_1,\dots,\sigma_r).
Then the relations A\mathbf{v}_j = \sigma_j \mathbf{u}_j become
A V_r = U_r \Sigma_r.
For any input \mathbf{x}:
A\mathbf{x} = \sum_{j=1}^{r} \sigma_j (\mathbf{v}_j^T \mathbf{x})\, \mathbf{u}_j.
So, in the right orthonormal coordinates in the input and output spaces, multiplication by A is just independent scalings of the components along orthogonal directions in the two spaces.
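These relations can be checked against a library SVD. A sketch with numpy (`np.linalg.svd` returns U, the singular values, and V^T directly):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)  # U: 4x4, s: singular values, Vt: 3x3
r = int(np.sum(s > 1e-12))   # numerical rank

V_r = Vt[:r].T               # first r right singular vectors as columns
U_r = U[:, :r]               # first r left singular vectors
Sigma_r = np.diag(s[:r])

# The defining relation A V_r = U_r Sigma_r:
print(np.allclose(A @ V_r, U_r @ Sigma_r))  # True
```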
We’ve described what happens to the r independent directions in the input space.
But what if the input space has more than r dimensions? Or the output space?
The trick is to extend the orthonormal bases to full bases.
Let’s start with the case where the input space has more than r dimensions.
We started with the r singular vectors \mathbf{v}_1,\dots,\mathbf{v}_r. These capture everything A can actually do.
The rest of the input space is the null space of A, and our additional vectors will be a basis for this null space.
Let’s call these additional vectors \mathbf{v}_{r+1},\dots,\mathbf{v}_n. We know that
A\mathbf{v}_{r+1} = \mathbf{0}, \quad \dots, \quad A\mathbf{v}_n = \mathbf{0}.
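Numerically, these extra vectors come for free from a full SVD: the right singular vectors beyond the first r span the null space. A sketch with numpy, reusing the rank-1 example from above:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A)  # full SVD: U is 3x3, Vt is 2x2
r = int(np.sum(s > 1e-12))   # rank 1 for this A

# The remaining right singular vectors span the null space of A.
V0 = Vt[r:].T
print(A @ V0)                # a column of zeros: A v = 0 on the null space
```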
Similarly, we can extend the singular vectors \mathbf{u}_1,\dots,\mathbf{u}_r to a basis for the output space.
Let’s call these additional vectors \mathbf{u}_{r+1},\dots,\mathbf{u}_m. They represent directions that cannot be reached by A\mathbf{x} for any \mathbf{x}.
This means that the projection of any output A\mathbf{x} onto any of these additional vectors must be zero:
\mathbf{u}_{r+k}^T A\mathbf{x} = 0 \quad \text{for all } \mathbf{x}, \qquad k = 1, \dots, m - r.
And the only way this can be true for all \mathbf{x} is if
\mathbf{u}_{r+k}^T A = \mathbf{0}.
We have built orthonormal bases:
For the input space:
V = [\mathbf{v}_1\ \cdots\ \mathbf{v}_r\ \mathbf{v}_{r+1}\ \cdots\ \mathbf{v}_n].
For the output space:
U = \begin{bmatrix} \mathbf u_1 & \cdots & \mathbf u_r & \mathbf u_{r+1} & \cdots & \mathbf u_m \end{bmatrix}.
Split them into blocks:
V = \begin{bmatrix} V_r & V_0 \end{bmatrix}, \qquad U = \begin{bmatrix} U_r & U_0 \end{bmatrix}.
where V_0 = [\mathbf{v}_{r+1}\ \cdots\ \mathbf{v}_n] and U_0 = [\mathbf{u}_{r+1}\ \cdots\ \mathbf{u}_m].
Put A between the two bases:
Consider the matrix
U^T A V.
\begin{bmatrix} U_r^T \\ U_0^T \end{bmatrix} A \begin{bmatrix} V_r & V_0 \end{bmatrix}=\begin{bmatrix} U_r^T A V_r & U_r^T A V_0 \\ U_0^T A V_r & U_0^T A V_0 \end{bmatrix}
Each block has a geometric meaning.
Because A V_0 = 0,
we get
U_r^T A V_0 = 0.
Because U_0^T A = 0,
we get
U_0^T A V_r = 0, \qquad U_0^T A V_0 = 0.
So the matrix collapses to
U^T A V = \begin{bmatrix} U_r^T A V_r & 0 \\ 0 & 0 \end{bmatrix}.
Everything outside the rank-r interaction vanishes.
By construction,
A\mathbf v_i = \sigma_i \mathbf u_i, \qquad i=1,\dots,r.
Therefore
U_r^T A V_r = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix}.
Call this matrix \Sigma_r.
U^T A V = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = \Sigma.
Multiplying back gives
\boxed{A = U \Sigma V^T.}
After choosing the right input and output coordinates, the transformation does only one thing: it scales each input direction \mathbf{v}_j by \sigma_j into the output direction \mathbf{u}_j, and sends everything else to zero.
The SVD is simply the matrix form of this geometric statement.
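The full factorization can be confirmed numerically in a few lines with numpy (note that for a non-square A, the Σ returned implicitly by `np.linalg.svd` must be padded to the full m×n shape):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))

U, s, Vt = np.linalg.svd(A)  # U: 3x3, s: 3 singular values, Vt: 5x5

# Build the full 3x5 Sigma with the singular values on the diagonal.
Sigma = np.zeros((3, 5))
Sigma[:3, :3] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T
```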