A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
Columns are \mathbf{a}_1 = (1,1,0), \mathbf{a}_2 = 2\mathbf{a}_1, \mathbf{a}_3 = \mathbf{0}.
The only vectors that we can form by taking linear combinations of the columns of A are multiples of \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.
A basis for the column space is just this single vector: \left\lbrace \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right\rbrace.
Because there is only one vector in the basis, the matrix has rank 1.
Here are some vectors that are in the column space of A:
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix}, \begin{bmatrix} 5 \\ 5 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ -2 \\ 0 \end{bmatrix}
If we draw these vectors in \mathbb{R}^3, we see that they all lie on the same line. This line is the column space of A.
(draw it)
Remember, the column space tells us which vectors \mathbf{b} can be written as A\mathbf{x} for some \mathbf{x}.
So if a vector \mathbf{b} is not on this line, we know that there is no solution to A\mathbf{x}=\mathbf{b}.
(draw a b slightly off the line, and one very far off the line)
A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 1 & 3 \end{bmatrix}
The first two columns are independent, but the third is a linear combination of the first two.
We can make a basis for the column space by taking the first two columns: \left\lbrace \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right\rbrace.
\mathcal{C}(A) = \mathrm{span}\left\lbrace \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right\rbrace
Here are some vectors that are in the column space of A:
\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = 1 \cdot \mathbf{a}_1 + 0 \cdot \mathbf{a}_2, \qquad \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = 0 \cdot \mathbf{a}_1 + 1 \cdot \mathbf{a}_2,
\begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix} = 1 \cdot \mathbf{a}_1 + 1 \cdot \mathbf{a}_2, \qquad \begin{bmatrix} 3 \\ 3 \\ 6 \end{bmatrix} = 3 \cdot \mathbf{a}_1 + 0 \cdot \mathbf{a}_2
In \mathbb{R}^3, the dimension of \mathcal{C}(A) equals the rank r of A:
In \mathbb{R}^n, the dimension of \mathcal{C}(A) equals the rank r of A:
The pictures we have drawn of the column space represent the possible outputs \mathbf{b} from the linear system A\mathbf{x}=\mathbf{b}.
In our example, all potential outputs \mathbf{b} live in \mathbb{R}^3, but the only ones that are actually reachable are those that are on a line or a plane within \mathbb{R}^3. So the column space of A is a special subspace of the output space.
Formally, we call the output space the codomain of A, and the column space of A is a subspace of the codomain.
But there’s another space that’s important: the space of all the possible inputs \mathbf{x} that can be fed into A. We call this the domain of A.
In our examples so far, because the vectors \mathbf{x} are all in \mathbb{R}^3, the domain is also \mathbb{R}^3.
Let’s consider a simple 2x3 matrix:
F = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \end{bmatrix}
For the system F\mathbf{x}=\mathbf{b}, the possible inputs \mathbf{x} are all in \mathbb{R}^3, and the possible outputs \mathbf{b} are all in \mathbb{R}^2.
For example, if we take \mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, then F\mathbf{x} = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \end{bmatrix} = \mathbf{b}
So \mathbf{x}=\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} is in the domain of F, and \mathbf{b}=\begin{bmatrix} 5 \\ 5 \end{bmatrix} is in the codomain of F.
We know that \mathbf{b}=\begin{bmatrix} 5 \\ 5 \end{bmatrix} is in the column space of F because we constructed it from multiplying F by \mathbf{x}.
There are other vectors, such as \begin{bmatrix} 1 \\ -1 \end{bmatrix}, that are in the codomain of F but not in its column space.
The column space is a subspace of the codomain, and it tells us something about the matrix F.
Is there a similar special subset of the vectors in the domain that tells us something about the matrix F? There is!
Yes, it’s the null space of F. The null space of F is the set of all vectors \mathbf{x} such that F\mathbf{x}=\mathbf{0}.
For example, the vector \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} is in the null space of F because F\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \cdot 1 + 2 \cdot (-1) + 0 \cdot 0 \\ 1 \cdot 1 + 2 \cdot (-1) + 0 \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \mathbf{0}
If we know a basis for the null space, we can visualize it in the domain.
We haven’t learned how to find these yet, but for our example matrix F=\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \end{bmatrix}, I’ll just tell you that the basis for the null space is \left\lbrace \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\rbrace.
So the null space will form a plane in \mathbb{R}^3.
But because the null space is a subspace of the input space (the domain) and the column space is a subspace of the output space (the codomain), we can’t visualize them together in the same picture.
So we’ll visualize them separately.
The column space \mathcal{C}(F) is all vectors in \mathbb{R}^2 that are multiples of \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
The null space \mathcal{N}(F) is that spanned by the vectors \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} and \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
The row space of a matrix A is the subspace of \mathbb{R}^m spanned by the rows of A. It is also the column space of A^T.
The left null space of a matrix A is just the null space of A^T.
Because the column space is so often a plane, let’s build our intuition about planes.
A plane through the origin is the set of all vectors \vec x such that \vec a \cdot \vec x = 0, where the nonzero vector \vec a is a normal vector to the plane (we often take \vec a of unit length).
For \mathbb{R}^3, we can find the normal vector by taking the cross product of two vectors in the plane.
Let \mathbf{i}, \mathbf{j}, and \mathbf{k} represent the standard basis \mathbf{e}_{1}, \mathbf{e}_{2}, \mathbf{e}_{3}.
\mathbf{u} \times \mathbf{v}=\left(u_{2} v_{3}-u_{3} v_{2}\right) \mathbf{i}+\left(u_{3} v_{1}-u_{1} v_{3}\right) \mathbf{j}+\left(u_{1} v_{2}-u_{2} v_{1}\right) \mathbf{k}=\left|\begin{array}{ccc} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_{1} & u_{2} & u_{3} \\ v_{1} & v_{2} & v_{3} \end{array}\right|
The cross product is orthogonal to both \mathbf{u} and \mathbf{v}: \mathbf{u} \cdot (\mathbf{u} \times \mathbf{v})=0 and \mathbf{v} \cdot (\mathbf{u} \times \mathbf{v})=0.
We can use the cross product to find the normal vector to the plane by taking the cross product of two vectors in the plane.
If the plane is specified by ax + by + cz = d, then a normal vector is \vec a = (a, b, c).
Check (optional verification):
Take the cross product \vec u \times \vec v:
\begin{aligned} \vec u \times \vec v &= \left|\begin{array}{ccc} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 0 & d/b & -d/c \\ d/a & 0 & -d/c \end{array}\right| \\ &= -\frac{d^2}{bc} \mathbf{i} + \frac{d^2}{ac} \mathbf{j} - \frac{d^2}{ab} \mathbf{k} = -\frac{d^2}{abc} \left( a \mathbf{i} - b \mathbf{j} + c \mathbf{k} \right) \end{aligned}
This is parallel to \vec{a} = (a, b, c).
The distance from a point \vec x to the plane \vec a \cdot \vec x = 0 is the scalar projection of \vec x onto the normal:
The displacement from \vec x to the closest point on the plane has length |\vec a \cdot \vec x|/\|\vec a\| and direction along \vec a.
The closest point on the plane to \vec x is \vec x - \vec a (\vec x \cdot \vec a)/\|\vec a\|^2 — i.e., \vec x minus the projection of \vec x onto \vec a.
A plane displaced from the origin can be described by a normal \vec a and a scalar q: the plane is displaced by q \vec a from the origin.
Points \vec x on the plane satisfy \vec a \cdot \vec x = q (with \vec a unit length, q is the signed distance of the plane from the origin).
A hyperplane in \mathbb{R}^{n} is the set of all \mathbf{x} \in \mathbb{R}^{n} such that \mathbf{a} \cdot \mathbf{x}=b, where the nonzero vector \mathbf{a} \in \mathbb{R}^{n} and scalar b are given.
Let H be the hyperplane in \mathbb{R}^{n} defined by \mathbf{a} \cdot \mathbf{x}=b and let \mathbf{x}_{*} \in H. Then
\mathbf{a}^{\perp}=\{\mathbf{y} \in \mathbb{R}^{n} \mid \mathbf{a} \cdot \mathbf{y}=0\} is a subspace of \mathbb{R}^{n} of dimension n-1.
H=\mathbf{x}_{*}+\mathbf{a}^{\perp}=\{\mathbf{x}_{*}+\mathbf{y} \mid \mathbf{y} \in \mathbf{a}^{\perp}\}.
Find an equation that defines the plane containing the three (noncollinear) points P, Q, and R with coordinates (1,0,2), (2,1,0), and (3,1,1), respectively.
\begin{aligned} \overrightarrow{P Q}&=(2,1,0)-(1,0,2)=(1,1,-2) \\ \overrightarrow{P R}&=(3,1,1)-(1,0,2)=(2,1,-1) \end{aligned}
\mathbf{u} \times \mathbf{v}=\left|\begin{array}{ccc} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 1 & -2 \\ 2 & 1 & -1 \end{array}\right|=\mathbf{i}-3 \mathbf{j}-\mathbf{k}
The normal vector is \mathbf{a}=(1,-3,-1).
The equation of the plane is \mathbf{a} \cdot \mathbf{x}=b. Using P: b=\mathbf{a} \cdot \mathbf{P}=(1,-3,-1) \cdot (1,0,2)=1+0-2=-1.
So the plane is x - 3y - z = -1.
The closest point on the hyperplane \vec a \cdot \vec x = q to a point \vec x is \vec x - \vec a (\vec a \cdot \vec x - q) (for unit \vec a).
Let \mathbf{u} and \mathbf{v} be vectors, \mathbf{v} \neq \mathbf{0}.
\mathbf{p}=\frac{\mathbf{v} \cdot \mathbf{u}}{\mathbf{v} \cdot \mathbf{v}} \mathbf{v} \quad \text{and} \quad \mathbf{q}=\mathbf{u}-\mathbf{p}
Then \mathbf{p} is parallel to \mathbf{v}, \mathbf{q} is orthogonal to \mathbf{v}, and \mathbf{u}=\mathbf{p}+\mathbf{q}.
We write \operatorname{proj}_{\mathbf{v}} \mathbf{u}=\frac{\mathbf{v} \cdot \mathbf{u}}{\mathbf{v} \cdot \mathbf{v}} \mathbf{v} and \operatorname{orth}_{\mathbf{v}} \mathbf{u}=\mathbf{u}-\operatorname{proj}_{\mathbf{v}} \mathbf{u}.
When A\mathbf{x}=\mathbf{b} is inconsistent, \mathbf{b} is not in the column space. Least squares finds \widehat{\mathbf{x}} so that A\widehat{\mathbf{x}} is the closest point in \mathcal{C}(A) to \mathbf{b}—i.e., the projection of \mathbf{b} onto the plane \mathcal{C}(A). The residual \mathbf{b} - A\widehat{\mathbf{x}} is perpendicular to \mathcal{C}(A).
When data are not on a line, \mathbf{y} - m\mathbf{x} \neq \mathbf{0}. The vector \mathbf{y} - m\mathbf{x} is the residual vector; we minimize \|\mathbf{y} - m\mathbf{x}\|^2.
The best m minimizes \|\mathbf{y} - m\mathbf{x}\|^2. Geometrically, that happens when the residual \mathbf{y} - m\mathbf{x} is orthogonal to \mathbf{x}.
m \mathbf{x} - \mathbf{y} should be orthogonal to \mathbf{x}, so \mathbf{x} \cdot (m \mathbf{x} - \mathbf{y}) = 0.
\begin{aligned} m \mathbf{x} \cdot \mathbf{x} - \mathbf{x} \cdot \mathbf{y} &= 0 \\ m \,\mathbf{x} \cdot \mathbf{x} &= \mathbf{x} \cdot \mathbf{y} \end{aligned}
In matrix notation: \mathbf{x} \to \mathbf{A}, \mathbf{y} \to \mathbf{b}, m \to \mathbf{x}. Then \mathbf{A} \mathbf{x} = \mathbf{b} is inconsistent; the best fit satisfies
\mathbf{A}^T \mathbf{A} \mathbf{x} = \mathbf{A}^T \mathbf{b}
These are the normal equations.
When there is one predictor and one response, \mathbf{x} (the unknown) is a scalar. Write the predictor vector as \mathbf{a} and the response vector as \mathbf{b}. Then:
\mathbf{a}^T \mathbf{a}\, x = \mathbf{a}^T \mathbf{b} \quad \Rightarrow \quad x = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}
This is the slope of the least-squares line (through the origin).
You have a scale that measures in unknown units. You have items of known weight approximately 2, 5, and 7 pounds. You measure the scale output and get 0.7, 2.4, and 3.2. What is the conversion factor from the scale’s units to pounds?
Set \mathbf{a}= [2,\, 5,\, 7]^T and \mathbf{b}= [0.7,\, 2.4,\, 3.2]^T. We want x such that \mathbf{a} \cdot x = \mathbf{b} (componentwise); the system is inconsistent, so we use least squares.
\mathbf{a}^T \mathbf{a}\, x = \mathbf{a}^T \mathbf{b} \quad \Rightarrow \quad x = \frac{2(0.7) + 5(2.4) + 7(3.2)}{2^2 + 5^2 + 7^2} \approx 0.459
So to convert from scale units to pounds, multiply by \approx 0.459 (i.e., scale reading \times 0.459 \approx pounds).
(Book may use the opposite convention: conversion from pounds to scale units.)