Ch4 Lecture 2
More Least Squares
Normal equations for multiple predictors
When we have multiple predictors, \(\mathbf{A}\) is a matrix
. . .
Each column corresponds to a predictor variable.
. . .
\(\mathbf{x}\) is a column vector of the coefficients we are trying to find, one per predictor.
. . .
What do the normal equations tell us? We can break them apart into one equation corresponding to each column of \(\mathbf{A}\).
. . .
\[ \begin{aligned} \mathbf{A}^{T} \mathbf{A} \mathbf{x} &=\mathbf{A}^{T} \mathbf{b} \\ \text{becomes} \\ \mathbf{a_i}\cdot \mathbf{A} \mathbf{x} &=\mathbf{a_i} \cdot \mathbf{b}\text{ for every i} \\ \mathbf{a_i}\cdot \left(\mathbf{b}-\mathbf{Ax}\right)&=0\text{ for every i} \end{aligned} \]
. . .
We need to find \(\mathbf{x}\) such that the residuals are orthogonal to every column of \(\mathbf{A}\).
Example
We have two predictors \(x_1\) and \(x_2\) and a response variable \(y\). We have the following data:
\(x_1\) | \(x_2\) | \(y\) |
---|---|---|
2 | 1 | 0 |
1 | 1 | 0 |
2 | 1 | 2 |
. . .
We are hoping to find a linear relationship of the form \(\beta_1 x_1 + \beta_2 x_2 = y\) for some values of \(\beta_1\) and \(\beta_2\).
. . .
This gives us the following system of equations:
\[ \begin{aligned} 2 \beta_{1}+\beta_{2} & =0 \\ \beta_{1}+\beta_{2} & =0 \\ 2 \beta_{1}+\beta_{2} & =2 . \end{aligned} \]
. . .
Obviously inconsistent! (\(2 \beta_{1}+\beta_{2}\) cannot equal both 0 and 2.)
. . .
Find the least squares solution using the normal equations:
Change variable names, so the \(\beta\)s are now \(\mathbf{x}\), the \(x\)s are now the columns of \(\mathbf{A}\), and the \(y\)s are now \(\mathbf{b}\).
\[ A=\left[\begin{array}{ll} 2 & 1 \\ 1 & 1 \\ 2 & 1 \end{array}\right], \text { and } \mathbf{b}=\left[\begin{array}{l} 0 \\ 0 \\ 2 \end{array}\right] \]
. . .
Residuals are \(\mathbf{b}-\mathbf{A} \mathbf{x}\).
Normal equations are \(\mathbf{A}^{T} \mathbf{A} \mathbf{x}=\mathbf{A}^{T} \mathbf{b}\).
\[ A^{T} A=\left[\begin{array}{lll} 2 & 1 & 2 \\ 1 & 1 & 1 \end{array}\right]\left[\begin{array}{ll} 2 & 1 \\ 1 & 1 \\ 2 & 1 \end{array}\right]=\left[\begin{array}{ll} 9 & 5 \\ 5 & 3 \end{array}\right] \]
with inverse \[ \left(A^{T} A\right)^{-1}=\left[\begin{array}{ll} 9 & 5 \\ 5 & 3 \end{array}\right]^{-1}=\frac{1}{2}\left[\begin{array}{rr} 3 & -5 \\ -5 & 9 \end{array}\right] \]
. . .
\[ A^{T} \mathbf{b}=\left[\begin{array}{lll} 2 & 1 & 2 \\ 1 & 1 & 1 \end{array}\right]\left[\begin{array}{l} 0 \\ 0 \\ 2 \end{array}\right]=\left[\begin{array}{l} 4 \\ 2 \end{array}\right] \]
Least Squares Solution
\[ \mathbf{x}=\left(A^{T} A\right)^{-1} A^{T} \mathbf{b}=\frac{1}{2}\left[\begin{array}{rr} 3 & -5 \\ -5 & 9 \end{array}\right]\left[\begin{array}{l} 4 \\ 2 \end{array}\right]=\left[\begin{array}{r} 1 \\ -1 \end{array}\right] \]
. . .
Back in our original variables, this means the best fit estimate for \(\beta_1\) is 1 and the best \(\beta_2\) is -1.
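As a quick check, here is a short sympy sketch (same \(A\) and \(\mathbf{b}\) as above) that solves the normal equations directly:

```python
import sympy as sp

# Data matrix and right-hand side from the example above
A = sp.Matrix([[2, 1], [1, 1], [2, 1]])
b = sp.Matrix([0, 0, 2])

# Solve the normal equations (A^T A) x = A^T b
x = (A.T * A).solve(A.T * b)
print(x.T)                      # Matrix([[1, -1]])

# The residual b - A x is orthogonal to every column of A
print((A.T * (b - A * x)).T)    # Matrix([[0, 0]])
```

The residual itself is \((-1, 0, 1)\): nonzero (the system is inconsistent), but orthogonal to both columns of \(A\).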
Orthogonal and Orthonormal Sets
The set of vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) is an orthogonal set if \(\mathbf{v}_{i} \cdot \mathbf{v}_{j}=0\) whenever \(i \neq j\).
If, in addition, each vector has unit length, i.e., \(\mathbf{v}_{i} \cdot \mathbf{v}_{i}=1\), then the set of vectors is orthonormal.
Orthogonal Coordinates Theorem
If \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) are nonzero and orthogonal, and
\(\mathbf{v} \in \operatorname{span}\left\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\right\}\),
\(\mathbf{v}\) can be expressed uniquely (up to order) as a linear combination of \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\), namely
\[ \mathbf{v}=\frac{\mathbf{v}_{1} \cdot \mathbf{v}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}+\frac{\mathbf{v}_{2} \cdot \mathbf{v}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}+\cdots+\frac{\mathbf{v}_{n} \cdot \mathbf{v}}{\mathbf{v}_{n} \cdot \mathbf{v}_{n}} \mathbf{v}_{n} \]
Proof
Since \(\mathbf{v} \in \operatorname{span}\left\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\right\}\),
we can write \(\mathbf{v}\) as a linear combination of the \(\mathbf{v}_{i}\) ’s:
\[ \mathbf{v}=c_{1} \mathbf{v}_{1}+c_{2} \mathbf{v}_{2}+\cdots+c_{n} \mathbf{v}_{n} \]
. . .
Now take the inner product of both sides with \(\mathbf{v}_{k}\):
. . .
Since \(\mathbf{v}_{k} \cdot \mathbf{v}_{j}=0\) if \(j \neq k\),
\[ \begin{aligned} \mathbf{v}_{k} \cdot \mathbf{v} & =\mathbf{v}_{k} \cdot\left(c_{1} \mathbf{v}_{1}+c_{2} \mathbf{v}_{2}+\ldots \cdots+c_{n} \mathbf{v}_{n}\right) \\ & =c_{1} \mathbf{v}_{k} \cdot \mathbf{v}_{1}+c_{2} \mathbf{v}_{k} \cdot \mathbf{v}_{2}+\cdots+c_{n} \mathbf{v}_{k} \cdot \mathbf{v}_{n}=c_{k} \mathbf{v}_{k} \cdot \mathbf{v}_{k} \end{aligned} \]
. . .
Since \(\mathbf{v}_{k} \neq 0\), \(\left\|\mathbf{v}_{k}\right\|^{2}=\mathbf{v}_{k} \cdot \mathbf{v}_{k} \neq 0\), so we can divide:
\[ c_{k}=\frac{\mathbf{v}_{k} \cdot \mathbf{v}}{\mathbf{v}_{k} \cdot \mathbf{v}_{k}} \]
. . .
Any vector in the span of an orthogonal set of nonzero vectors is the sum of its projections onto each vector in the set.
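A small sympy sketch of the theorem (the orthogonal set below is my own illustrative choice):

```python
import sympy as sp

# An orthogonal (not orthonormal) set in R^3, chosen for illustration
v1, v2, v3 = sp.Matrix([1, 1, 0]), sp.Matrix([1, -1, 0]), sp.Matrix([0, 0, 1])

# A vector known to lie in their span
v = 2*v1 + 3*v2 - v3

# Recover the coefficients by projection: c_k = (v_k . v)/(v_k . v_k)
coeffs = [vk.dot(v) / vk.dot(vk) for vk in (v1, v2, v3)]
print(coeffs)   # [2, 3, -1]
```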
Orthogonal matrix
Suppose \(\mathbf{u}_{1}, \mathbf{u}_{2}, \ldots, \mathbf{u}_{n}\) is an orthonormal basis of \(\mathbb{R}^{n}\)
. . .
Make these the column vectors of \(\mathbf{A}\): \(A=\left[\mathbf{u}_{1}, \mathbf{u}_{2}, \ldots, \mathbf{u}_{n}\right]\)
. . .
Because the \(\mathbf{u}_{i}\) are orthonormal, \(\mathbf{u}_{i}^{T} \mathbf{u}_{j}=\delta_{i j}\)
. . .
Use this to calculate \(A^{T} A\).
- its \((i, j)\)th entry is \(\mathbf{u}_{i}^{T} \mathbf{u}_{j}\)
- \(= \delta_{i j}\)
- So \(A^{T} A=\left[\delta_{i j}\right]=I\)
- Therefore, \(A^{T} = A^{-1}\)
Orthogonal matrix
A square real matrix \(Q\) is called orthogonal if \(Q^{T}=Q^{-1}\).
A square matrix \(U\) is called unitary if \(U^{*}=U^{-1}\).
Example
Show that the matrix \(R(\theta)=\left[\begin{array}{rr}\cos \theta & -\sin \theta \\ \sin \theta & \cos \theta\end{array}\right]\) is orthogonal.
. . .
\[ \begin{aligned} R(\theta)^{T} R(\theta) & =\left[\begin{array}{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right]^{T}\left[\begin{array}{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right] \\ & =\left[\begin{array}{rr} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{array}\right]\left[\begin{array}{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right] \\ & =\left[\begin{array}{cc} \cos ^{2} \theta+\sin ^{2} \theta & -\cos \theta \sin \theta+\sin \theta \cos \theta \\ -\sin \theta \cos \theta+\cos \theta \sin \theta & \sin ^{2} \theta+\cos ^{2} \theta \end{array}\right]=\left[\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right] \end{aligned} \]
Rigidity of orthogonal (and unitary) matrices
This rotation matrix preserves vector lengths and angles between vectors (illustrate by hand).
Thus, \(R(\theta)\mathbf{x}\cdot R(\theta)\mathbf{y}=\mathbf{x} \cdot \mathbf{y}\)
. . .
This is true in general for orthogonal matrices:
\[ Q \mathbf{x} \cdot Q \mathbf{y}=(Q \mathbf{x})^{T} Q \mathbf{y}=\mathbf{x}^{T} Q^{T} Q \mathbf{y}=\mathbf{x}^{T} \mathbf{y}=\mathbf{x} \cdot \mathbf{y} \]
also,
\[ \|Q \mathbf{x}\|^{2}=Q \mathbf{x} \cdot Q \mathbf{x}=(Q \mathbf{x})^{T} Q \mathbf{x}=\mathbf{x}^{T} Q^{T} Q \mathbf{x}=\mathbf{x}^{T} \mathbf{x}=\|\mathbf{x}\|^{2} \]
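A symbolic sanity check of both identities for the rotation matrix \(R(\theta)\), as a sympy sketch:

```python
import sympy as sp

theta = sp.symbols('theta', real=True)
R = sp.Matrix([[sp.cos(theta), -sp.sin(theta)],
               [sp.sin(theta),  sp.cos(theta)]])

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2', real=True)
x = sp.Matrix([x1, x2])
y = sp.Matrix([y1, y2])

# R x . R y - x . y and ||R x||^2 - ||x||^2 both simplify to zero
print(sp.simplify((R*x).dot(R*y) - x.dot(y)))   # 0
print(sp.simplify((R*x).dot(R*x) - x.dot(x)))   # 0
```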
Finding orthogonal bases
Gram-Schmidt Algorithm
Let \(\mathbf{w}_{1}, \mathbf{w}_{2}, \ldots, \mathbf{w}_{n}\) be linearly independent vectors in a standard space.
. . .
Define vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) recursively:
Take each \(\mathbf{w}_{k}\) and subtract off the projection of \(\mathbf{w}_{k}\) onto each of the previous vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{k-1}\)
. . .
\[ \mathbf{v}_{k}=\mathbf{w}_{k}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{k}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{k}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}-\cdots-\frac{\mathbf{v}_{k-1} \cdot \mathbf{w}_{k}}{\mathbf{v}_{k-1} \cdot \mathbf{v}_{k-1}} \mathbf{v}_{k-1}, \quad k=1, \ldots, n \]
. . .
Then
The vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) form an orthogonal set.
For each index \(k=1, \ldots, n\),
. . .
\[ \operatorname{span}\left\{\mathbf{w}_{1}, \mathbf{w}_{2}, \ldots, \mathbf{w}_{k}\right\}=\operatorname{span}\left\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{k}\right\} \]
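A minimal sympy sketch of the recursion above (exact arithmetic, no normalization yet; the helper name gram_schmidt is mine). The vectors used are the same ones that appear in the example that follows:

```python
import sympy as sp

def gram_schmidt(ws):
    """Orthogonalize a list of linearly independent sympy column vectors."""
    vs = []
    for w in ws:
        v = w
        for vi in vs:
            # subtract the projection of w onto each previously built v_i
            v = v - (vi.dot(w) / vi.dot(vi)) * vi
        vs.append(v)
    return vs

w1 = sp.Matrix([1, 1, 1, -1])
w2 = sp.Matrix([2, -1, -1, 1])
w3 = sp.Matrix([-1, 2, 2, 1])
print([v.T for v in gram_schmidt([w1, w2, w3])])
# [[1, 1, 1, -1], [9/4, -3/4, -3/4, 3/4], [0, 1, 1, 2]] (as row matrices)
```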
Example
Let \(V=\mathcal{C}(A)\) with the standard inner product and compute an orthonormal basis of \(V\), where
\[ A=\left[\begin{array}{rrrr} 1 & 2 & 0 & -1 \\ 1 & -1 & 3 & 2 \\ 1 & -1 & 3 & 2 \\ -1 & 1 & -3 & 1 \end{array}\right] \]
. . .
RREF of \(A\) is:
\[ R=\left[\begin{array}{rrrr} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right] \]
. . .
The independent columns are columns 1, 2, and 4.
. . .
So let these be our \(\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\).
Step 1: \(\mathbf{v}_{1}=\mathbf{w}_{1}=(1,1,1,-1)\)
. . .
Step 2: \(\mathbf{v}_{2}=\mathbf{w}_{2}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}\)
. . .
\(=(2,-1,-1,1)-\frac{-1}{4}(1,1,1,-1)=\frac{1}{4}(9,-3,-3,3)\)
. . .
Step 3: \(\mathbf{v}_{3}=\mathbf{w}_{3}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}\)
. . .
\(=(-1,2,2,1)-\frac{2}{4}(1,1,1,-1)-\frac{-18}{108}(9,-3,-3,3)=(0,1,1,2)\)
Finding orthonormal basis
Now we need to normalize these vectors to get an orthonormal basis.
. . .
\(\mathbf{u}_{1}=\frac{1}{\|\mathbf{v}_{1}\|} \mathbf{v}_{1}=\frac{1}{2}(1,1,1,-1)\)
. . .
\(\mathbf{u}_{2}=\frac{1}{\|\mathbf{v}_{2}\|} \mathbf{v}_{2}=\frac{1}{\sqrt{108}}(9,-3,-3,3)\)
etc.
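sympy's built-in GramSchmidt can do the orthogonalization and normalization in one step; the output should match \(\mathbf{u}_1\) and \(\mathbf{u}_2\) above (and supply \(\mathbf{u}_3\)):

```python
import sympy as sp

# Columns 1, 2, and 4 of A (the independent columns found above)
ws = [sp.Matrix([1, 1, 1, -1]),
      sp.Matrix([2, -1, -1, 1]),
      sp.Matrix([-1, 2, 2, 1])]

us = sp.GramSchmidt(ws, orthonormal=True)
for u in us:
    print(sp.simplify(u.T))
```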
QR Factorization
If \(A\) is an \(m \times n\) full-column-rank matrix, then \(A=Q R\), where the columns of the \(m \times n\) matrix \(Q\) are orthonormal vectors and the \(n \times n\) matrix \(R\) is upper triangular with nonzero diagonal entries.
. . .
Why do we care?
- Can use to solve linear systems:
. . .
- can be less susceptible to round-off error than Gauss-Jordan.
- Can be used to solve least squares problems (will show you how!)
How to find \(Q\) and \(R\)
- Start with the columns of A, \(A=\left[\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\right]\). (For now assume they are linearly independent.)
- Do Gram-Schmidt on the columns of \(A\):
. . .
\[ \begin{aligned} & \mathbf{v}_{1}=\mathbf{w}_{1} \\ & \mathbf{v}_{2}=\mathbf{w}_{2}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1} \\ & \mathbf{v}_{3}=\mathbf{w}_{3}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2} . \end{aligned} \]
. . .
- Solve these equations for \(\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\):
. . .
\[ \begin{aligned} & \mathbf{w}_{1}=\mathbf{v}_{1} \\ & \mathbf{w}_{2}=\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}+\mathbf{v}_{2} \\ & \mathbf{w}_{3}=\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}+\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}+\mathbf{v}_{3} . \end{aligned} \]
In matrix form, these become:
. . .
\[ A=\left[\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\right]=\left[\mathbf{v}_{1}, \mathbf{v}_{2}, \mathbf{v}_{3}\right]\left[\begin{array}{ccc} 1 & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \\ 0 & 1 & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \\ 0 & 0 & 1 \end{array}\right] \]
Normalizing \(Q\)
We can normalize the columns of \(Q\) by dividing each column by its length.
. . .
Set \(\mathbf{q}_{j}=\mathbf{v}_{j} /\left\|\mathbf{v}_{j}\right\|\).
. . .
\[ \begin{aligned} A & =\left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & 0 & 0 \\ 0 & \left\|\mathbf{v}_{2}\right\| & 0 \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right]\left[\begin{array}{ccc} 1 & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \\ 0 & 1 & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \\ 0 & 0 & 1 \end{array}\right] \end{aligned} \]
. . .
\[ \begin{aligned} & =\left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\left\|\mathbf{v}_{1}\right\|} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{1}\right\|} \\ 0 & \left\|\mathbf{v}_{2}\right\| & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{2}\right\|} \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right] . \end{aligned} \]
Final form
\[ \begin{aligned} A & =\left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\left\|\mathbf{v}_{1}\right\|} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{1}\right\|} \\ 0 & \left\|\mathbf{v}_{2}\right\| & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{2}\right\|} \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right] \\[10pt] &= \left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & \mathbf{q}_{1} \cdot \mathbf{w}_{2} & \mathbf{q}_{1} \cdot \mathbf{w}_{3} \\ 0 & \left\|\mathbf{v}_{2}\right\| & \mathbf{q}_{2} \cdot \mathbf{w}_{3} \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right] \end{aligned} \]
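A sympy sketch of this construction (classical Gram-Schmidt, assuming \(A\) has full column rank; the helper name is mine):

```python
import sympy as sp

def qr_via_gram_schmidt(A):
    """Build A = Q R by Gram-Schmidt on the columns of A (full column rank assumed)."""
    n = A.cols
    qs, R = [], sp.zeros(n, n)
    for j in range(n):
        w = A[:, j]
        v = w
        for i, q in enumerate(qs):
            R[i, j] = q.dot(w)         # entry q_i . w_j above the diagonal
            v = v - R[i, j] * q        # subtract the projection onto q_i
        R[j, j] = sp.sqrt(v.dot(v))    # ||v_j|| on the diagonal
        qs.append(v / R[j, j])
    return sp.Matrix.hstack(*qs), R
```

sympy's built-in QRdecomposition() and numpy.linalg.qr do this for you (possibly with different sign conventions).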
Example
Find the QR factorization of the matrix
\[ A=\left[\begin{array}{rrr} 1 & 2 & -1 \\ 1 & -1 & 2 \\ 1 & -1 & 2 \\ -1 & 1 & 1 \end{array}\right] \]
. . .
We already found the Gram-Schmidt orthogonalization of this matrix:
\[ \begin{aligned} \mathbf{v}_{1} & =\mathbf{w}_{1}=(1,1,1,-1), \\ \mathbf{v}_{2} & =\mathbf{w}_{2}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1} \\ & =(2,-1,-1,1)-\frac{-1}{4}(1,1,1,-1)=\frac{1}{4}(9,-3,-3,3), \\ \mathbf{v}_{3} & =\mathbf{w}_{3}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2} \\ & =(-1,2,2,1)-\frac{2}{4}(1,1,1,-1)-\frac{-18}{108}(9,-3,-3,3) \\ & =\frac{1}{4}(-4,8,8,4)-\frac{1}{4}(2,2,2,-2)+\frac{1}{4}(6,-2,-2,2)=(0,1,1,2) . \end{aligned} \]
. . .
\[ \begin{aligned} & \mathbf{u}_{1}=\frac{\mathbf{v}_{1}}{\left\|\mathbf{v}_{1}\right\|}=\frac{1}{2}(1,1,1,-1), \\ & \mathbf{u}_{2}=\frac{\mathbf{v}_{2}}{\left\|\mathbf{v}_{2}\right\|}=\frac{1}{\sqrt{108}}(9,-3,-3,3)=\frac{1}{2 \sqrt{3}}(3,-1,-1,1), \\ & \mathbf{u}_{3}=\frac{\mathbf{v}_{3}}{\left\|\mathbf{v}_{3}\right\|}=\frac{1}{\sqrt{6}}(0,1,1,2) . \end{aligned} \]
. . .
The \(\mathbf{u}\) vectors we calculated before are the \(\mathbf{q}\) vectors in the QR factorization.
\[ \begin{gathered} \left\|\mathbf{v}_{1}\right\|=\|(1,1,1,-1)\|=2 \text { and } \mathbf{q}_{1}=\frac{1}{2}(1,1,1,-1) \\ \left\|\mathbf{v}_{2}\right\|=\left\|\frac{1}{4}(9,-3,-3,3)\right\|=\frac{3}{2} \sqrt{3} \text { and } \mathbf{q}_{2}=\frac{1}{2 \sqrt{3}}(3,-1,-1,1) \\ \left\|\mathbf{v}_{3}\right\|=\|(0,1,1,2)\|=\sqrt{6} \text { and } \mathbf{q}_{3}=\frac{1}{\sqrt{6}}(0,1,1,2) \end{gathered} \]
. . .
\[ \begin{aligned} \left\langle\mathbf{q}_{1}, \mathbf{w}_{2}\right\rangle & =\frac{1}{2}(1,1,1,-1) \cdot(2,-1,-1,1)=-\frac{1}{2} \\ \left\langle\mathbf{q}_{1}, \mathbf{w}_{3}\right\rangle & =\frac{1}{2}(1,1,1,-1) \cdot(-1,2,2,1)=1 \\ \left\langle\mathbf{q}_{2}, \mathbf{w}_{3}\right\rangle & =\frac{1}{2 \sqrt{3}}(3,-1,-1,1) \cdot(-1,2,2,1)=-\sqrt{3} . \end{aligned} \]
. . .
\[ A=\left[\begin{array}{rrr} 1 / 2 & 3 /(2 \sqrt{3}) & 0 \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ -1 / 2 & 1 /(2 \sqrt{3}) & 2 / \sqrt{6} \end{array}\right]\left[\begin{array}{rrr} 2 & -1 / 2 & 1 \\ 0 & \frac{3}{2} \sqrt{3} & -\sqrt{3} \\ 0 & 0 & \sqrt{6} \end{array}\right]=Q R \]
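You can confirm the hand computation with sympy's built-in QRdecomposition; it should reproduce the same factors up to simplification:

```python
import sympy as sp

A = sp.Matrix([[1, 2, -1],
               [1, -1, 2],
               [1, -1, 2],
               [-1, 1, 1]])

Q, R = A.QRdecomposition()
print(sp.simplify(R))             # should match the upper triangular factor above
print(sp.simplify(Q * R - A))     # zero matrix: Q R reproduces A
```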
Solving a linear system with QR
We would like to solve the system \(A \mathbf{x}=\mathbf{b}\).
. . .
\(Q R \mathbf{x}=\mathbf{b}\).
. . .
Multiply both sides by \(Q^{T}\):
Since \(Q\) is orthogonal, \(Q^{T} Q=I\) and \(Q^{T} Q R \mathbf{x}=R \mathbf{x}=Q^{T} \mathbf{b}\).
. . .
Suppose we have \(\mathbf{b}=(1,1,1,1)\). Then we are trying to solve
\[\left[\begin{array}{rrr} 2 & -1 / 2 & 1 \\ 0 & \frac{3}{2} \sqrt{3} & -\sqrt{3} \\ 0 & 0 & \sqrt{6} \end{array}\right]\mathbf{x}=\left[\begin{array}{rrr} 1 / 2 & 3 /(2 \sqrt{3}) & 0 \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ -1 / 2 & 1 /(2 \sqrt{3}) & 2 / \sqrt{6} \end{array}\right]^T \left[\begin{array}{rrr}1 \\ 1 \\ 1 \\ 1 \end{array}\right] \]
```python
import sympy as sp
Q = sp.Matrix([[1/2, 3/(2*sp.sqrt(3)), 0], [1/2, -1/(2*sp.sqrt(3)), 1/sp.sqrt(6)], [1/2, -1/(2*sp.sqrt(3)), 1/sp.sqrt(6)], [-1/2, 1/(2*sp.sqrt(3)), 2/sp.sqrt(6)]])
R = sp.Matrix([[2, -1/2, 1], [0, 3*sp.sqrt(3)/2, -sp.sqrt(3)], [0, 0, sp.sqrt(6)]])
b = sp.Matrix([1, 1, 1, 1])
print('$$\n Q^T b =' + sp.latex(Q.T*b) + '\n$$')
print('$$\n ' + sp.latex(R) + r' \mathbf{x} =' + sp.latex(Q.T*b) + '\n$$')
```
\[ Q^T b =\left[\begin{matrix}1.0\\\frac{\sqrt{3}}{3}\\\frac{2 \sqrt{6}}{3}\end{matrix}\right] \] \[ \left[\begin{matrix}2 & -0.5 & 1\\0 & \frac{3 \sqrt{3}}{2} & - \sqrt{3}\\0 & 0 & \sqrt{6}\end{matrix}\right] \mathbf{x} =\left[\begin{matrix}1.0\\\frac{\sqrt{3}}{3}\\\frac{2 \sqrt{6}}{3}\end{matrix}\right] \]
. . .
Find \[ \mathbf{x}=\left[\begin{array}{l} 1 / 3 \\ 2 / 3 \\ 2 / 3 \end{array}\right] \]
Check:
\[ \mathbf{r}=\mathbf{b}-A \mathbf{x}=\left[\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \end{array}\right]-\left[\begin{array}{rrr} 1 & 2 & -1 \\ 1 & -1 & 2 \\ 1 & -1 & 2 \\ -1 & 1 & 1 \end{array}\right]\left[\begin{array}{l} 1 / 3 \\ 2 / 3 \\ 2 / 3 \end{array}\right]=\left[\begin{array}{l} 0 \\ 0 \\ 0 \\ 0 \end{array}\right] . \]
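The whole computation can also be scripted end to end; a self-contained sympy sketch (using the built-in QRdecomposition rather than the hand-built factors) reproduces the same solution:

```python
import sympy as sp

A = sp.Matrix([[1, 2, -1], [1, -1, 2], [1, -1, 2], [-1, 1, 1]])
b = sp.Matrix([1, 1, 1, 1])

Q, R = A.QRdecomposition()
x = R.solve(Q.T * b)               # back-solve R x = Q^T b
print(sp.simplify(x.T))            # [1/3, 2/3, 2/3]
print(sp.simplify((b - A*x).T))    # zero residual: x solves A x = b exactly
```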
Least Squares with QR factorization
What if we have a system that’s not consistent?
. . .
The QR factorization can be used to solve the least squares problem \(A \mathbf{x} \approx \mathbf{b}\).
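As a preview, the recipe looks just like the consistent case: factor \(A = QR\) and solve \(R\mathbf{x} = Q^{T}\mathbf{b}\). A sympy sketch on the inconsistent system from the start of this lecture:

```python
import sympy as sp

# The inconsistent system from the first example
A = sp.Matrix([[2, 1], [1, 1], [2, 1]])
b = sp.Matrix([0, 0, 2])

Q, R = A.QRdecomposition()    # Q is 3x2 with orthonormal columns, R is 2x2
x = R.solve(Q.T * b)          # solve R x = Q^T b
print(sp.simplify(x.T))       # [1, -1], the same least squares solution as before
```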
Haar Wavelet Transform
Image compression idea
Haar wavelet transform
In 1D, N data points \(\left\{x_{k}\right\}_{k=1}^{N}\)
. . .
Can average terms:
\[ y_{k}=\frac{1}{2}\left(x_{k}+x_{k-1}\right), \quad k \in \mathbb{Z} \]
. . .
At the same time, we can apply a difference filter (“unsmoothing”):
\[ z_{k}=\frac{1}{2}\left(x_{k}-x_{k-1}\right), \quad k=1, \ldots, N \]
. . .
Idea: take our data, apply both filters to it and keep the results.
Example
\(g(t)=\frac{3}{2}+\cos \left(\frac{\pi}{4} t\right)-\frac{1}{4} \cos \left(\frac{7 \pi}{4} t\right), 0 \leq t \leq 8\)
. . .
Sample at \(t_{k}=k / 5, k=0,1, \ldots, 40\)
. . .
\[ \begin{array}{rlll} \frac{1}{2}\left(x_{1}+x_{2}\right)=y_{2} & \frac{1}{2}\left(x_{3}+x_{4}\right)=y_{4} & \frac{1}{2}\left(x_{5}+x_{6}\right)=y_{6} \\ \frac{1}{2}\left(-x_{1}+x_{2}\right)=z_{2} & \frac{1}{2}\left(-x_{3}+x_{4}\right)=z_{4} & \frac{1}{2}\left(-x_{5}+x_{6}\right)=z_{6} \end{array} \]
. . .
\[ A_{6} \mathbf{x} \equiv \frac{1}{2}\left[\begin{array}{rrrrrr} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{array}\right]\left[\begin{array}{l} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \\ x_{6} \end{array}\right]=\left[\begin{array}{l} y_{2} \\ y_{4} \\ y_{6} \\ z_{2} \\ z_{4} \\ z_{6} \end{array}\right] \]
Almost orthogonal
\[ \begin{gathered} A_{6} A_{6}^{T}=\frac{1}{4}\left[\begin{array}{rrrrrr} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{array}\right]\left[\begin{array}{cccccc} 1 & 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \\[10pt] =\frac{1}{2}\left[\begin{array}{llllll} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]=\frac{1}{2} I_{6} \end{gathered} \]
In general,
\(A_{N} A_{N}^{T}=\frac{1}{2} I_{N}\)
. . .
These matrices are nearly orthogonal.
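A quick numpy check of this identity; the Kronecker-product construction below is just a compact way to build \(A_N\) for even \(N\):

```python
import numpy as np

def haar_filter_matrix(N):
    """The averaging/differencing matrix A_N above (N even)."""
    U = np.kron(np.eye(N // 2), [1, 1])    # averaging rows
    L = np.kron(np.eye(N // 2), [-1, 1])   # differencing rows
    return 0.5 * np.vstack([U, L])

A6 = haar_filter_matrix(6)
print(np.allclose(A6 @ A6.T, 0.5 * np.eye(6)))   # True: A_N A_N^T = (1/2) I_N
```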
Haar Wavelet Transform Matrix
Can make them orthogonal by dividing by \(\sqrt{2}\) instead of by 2:
\[ y_{k}=\frac{\sqrt{2}}{2}\left(x_{k}+x_{k-1}\right), \quad k \in \mathbb{Z} \]
and
\[ z_{k}=\frac{\sqrt{2}}{2}\left(x_{k}-x_{k-1}\right), \quad k \in \mathbb{Z} \]
. . .
\[ W_{N}=\frac{\sqrt{2}}{2}\left[\begin{array}{ccccccc} 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 \\ \hline -1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -1 & 1 \end{array}\right]=\frac{\sqrt{2}}{2}\left[\begin{array}{c} U \\ \hline L \end{array}\right] \]
Moving to 2D
For a 2D image represented pixel-by-pixel in the matrix \(\mathbf{A}\), we can apply the 1D transform to each row and then to each column: \(W_{m} A W_{n}^{T}\)
. . .
Result ends up in block form:
\[ W_{m} A W_{n}^{T}=2\left[\begin{array}{ll} B & V \\ H & D \end{array}\right] \]
. . .
\(B\) represents the blurred image of \(A\), while \(V, H\) and \(D\) represent edges of the image \(A\) along vertical, horizontal and diagonal directions, respectively.
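A minimal numpy sketch of the 2D transform on a small block (the random "image" is only for illustration):

```python
import numpy as np

def haar_matrix(N):
    """The orthogonal Haar transform matrix W_N above (N even)."""
    U = np.kron(np.eye(N // 2), [1, 1])
    L = np.kron(np.eye(N // 2), [-1, 1])
    return (np.sqrt(2) / 2) * np.vstack([U, L])

rng = np.random.default_rng(0)
A = rng.random((8, 8))                   # stand-in for an 8x8 block of pixel values
W = haar_matrix(8)

T = W @ A @ W.T                          # equals 2*[[B, V], [H, D]] in the notation above
B, V = T[:4, :4] / 2, T[:4, 4:] / 2      # blur and vertical detail
H, D = T[4:, :4] / 2, T[4:, 4:] / 2      # horizontal and diagonal detail

# W is orthogonal, so the transform is exactly invertible
print(np.allclose(W.T @ T @ W, A))       # True
```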