Ch4 Lecture 2

More Least Squares

Normal equations for multiple predictors

When we have multiple predictors, \(\mathbf{A}\) is a matrix

. . .

Each column corresponds to a predictor variable.

. . .

\(\mathbf{x}\) is a column vector of the coefficients we are trying to find, one per predictor.

. . .

What do the normal equations tell us? We can break them apart into one equation corresponding to each column of \(\mathbf{A}\).

. . .

\[ \begin{aligned} \mathbf{A}^{T} \mathbf{A} \mathbf{x} &=\mathbf{A}^{T} \mathbf{b} \\ &\text{becomes} \\ \mathbf{a}_i\cdot \mathbf{A} \mathbf{x} &=\mathbf{a}_i \cdot \mathbf{b} \quad\text{for every } i, \\ \mathbf{a}_i\cdot \left(\mathbf{b}-\mathbf{Ax}\right)&=0 \quad\text{for every } i. \end{aligned} \]

. . .

We need to find \(\mathbf{x}\) such that the residual vector \(\mathbf{b}-\mathbf{Ax}\) is orthogonal to every column \(\mathbf{a}_i\) of \(\mathbf{A}\).

Example

We have two predictors \(x_1\) and \(x_2\) and a response variable \(y\). We have the following data:

\(x_1\) \(x_2\) \(y\)
2 1 0
1 1 0
2 1 2

. . .

We are hoping to find a linear relationship of the form \(\beta_1 x_1 + \beta_2 x_2 = y\) for some values of \(\beta_1\) and \(\beta_2\).

. . .

This gives us the following system of equations:

\[ \begin{aligned} 2 \beta_{1}+\beta_{2} & =0 \\ \beta_{1}+\beta_{2} & =0 \\ 2 \beta_{1}+\beta_{2} & =2 . \end{aligned} \]

. . .

Obviously inconsistent! (\(2 \beta_{1}+\beta_{2}\) cannot equal both 0 and 2.)

. . .

Find the least squares solution using the normal equations:

Change variable names: the \(\beta\)s become the entries of \(\mathbf{x}\), the \(x\)s become the columns of \(\mathbf{A}\), and the \(y\)s become \(\mathbf{b}\).

\[ A=\left[\begin{array}{ll} 2 & 1 \\ 1 & 1 \\ 2 & 1 \end{array}\right], \text { and } \mathbf{b}=\left[\begin{array}{l} 0 \\ 0 \\ 2 \end{array}\right] \]

. . .

Residuals are \(\mathbf{b}-\mathbf{A} \mathbf{x}\).

Normal equations are \(\mathbf{A}^{T} \mathbf{A} \mathbf{x}=\mathbf{A}^{T} \mathbf{b}\).

\[ A^{T} A=\left[\begin{array}{lll} 2 & 1 & 2 \\ 1 & 1 & 1 \end{array}\right]\left[\begin{array}{ll} 2 & 1 \\ 1 & 1 \\ 2 & 1 \end{array}\right]=\left[\begin{array}{ll} 9 & 5 \\ 5 & 3 \end{array}\right] \]

with inverse \[ \left(A^{T} A\right)^{-1}=\left[\begin{array}{ll} 9 & 5 \\ 5 & 3 \end{array}\right]^{-1}=\frac{1}{2}\left[\begin{array}{rr} 3 & -5 \\ -5 & 9 \end{array}\right] \]

. . .

\[ A^{T} \mathbf{b}=\left[\begin{array}{lll} 2 & 1 & 2 \\ 1 & 1 & 1 \end{array}\right]\left[\begin{array}{l} 0 \\ 0 \\ 2 \end{array}\right]=\left[\begin{array}{l} 4 \\ 2 \end{array}\right] \]

Least Squares Solution

\[ \mathbf{x}=\left(A^{T} A\right)^{-1} A^{T} \mathbf{b}=\frac{1}{2}\left[\begin{array}{rr} 3 & -5 \\ -5 & 9 \end{array}\right]\left[\begin{array}{l} 4 \\ 2 \end{array}\right]=\left[\begin{array}{r} 1 \\ -1 \end{array}\right] \]

. . .

Back in our original variables, this means the best-fit estimate for \(\beta_1\) is \(1\) and for \(\beta_2\) is \(-1\).
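A minimal sympy check of this example (sympy also appears later in these notes): it forms the normal equations and confirms the residual is orthogonal to the columns of \(A\).

```python
import sympy as sp

# Design matrix and response from the example above
A = sp.Matrix([[2, 1], [1, 1], [2, 1]])
b = sp.Matrix([0, 0, 2])

# Solve the normal equations (A^T A) x = A^T b exactly
x = (A.T * A).solve(A.T * b)
print(x.T)          # Matrix([[1, -1]])

# The residual is orthogonal to every column of A
r = b - A * x
print((A.T * r).T)  # Matrix([[0, 0]])
```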

Orthogonal and Orthonormal Sets

The set of vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) is an orthogonal set if \(\mathbf{v}_{i} \cdot \mathbf{v}_{j}=0\) whenever \(i \neq j\).

If, in addition, each vector has unit length, i.e., \(\mathbf{v}_{i} \cdot \mathbf{v}_{i}=1\), then the set of vectors is orthonormal.

Orthogonal Coordinates Theorem

If \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) are nonzero and orthogonal, and

\(\mathbf{v} \in \operatorname{span}\left\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\right\}\),

\(\mathbf{v}\) can be expressed uniquely (up to order) as a linear combination of \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\), namely

\[ \mathbf{v}=\frac{\mathbf{v}_{1} \cdot \mathbf{v}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}+\frac{\mathbf{v}_{2} \cdot \mathbf{v}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}+\cdots+\frac{\mathbf{v}_{n} \cdot \mathbf{v}}{\mathbf{v}_{n} \cdot \mathbf{v}_{n}} \mathbf{v}_{n} \]

Proof

Since \(\mathbf{v} \in \operatorname{span}\left\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\right\}\),

we can write \(\mathbf{v}\) as a linear combination of the \(\mathbf{v}_{i}\) ’s:

\[ \mathbf{v}=c_{1} \mathbf{v}_{1}+c_{2} \mathbf{v}_{2}+\cdots+c_{n} \mathbf{v}_{n} \]

. . .

Now take the inner product of both sides with \(\mathbf{v}_{k}\):

. . .

Since \(\mathbf{v}_{k} \cdot \mathbf{v}_{j}=0\) if \(j \neq k\),

\[ \begin{aligned} \mathbf{v}_{k} \cdot \mathbf{v} & =\mathbf{v}_{k} \cdot\left(c_{1} \mathbf{v}_{1}+c_{2} \mathbf{v}_{2}+\cdots+c_{n} \mathbf{v}_{n}\right) \\ & =c_{1} \mathbf{v}_{k} \cdot \mathbf{v}_{1}+c_{2} \mathbf{v}_{k} \cdot \mathbf{v}_{2}+\cdots+c_{n} \mathbf{v}_{k} \cdot \mathbf{v}_{n}=c_{k} \mathbf{v}_{k} \cdot \mathbf{v}_{k} \end{aligned} \]

. . .

Since \(\mathbf{v}_{k} \neq 0\), \(\left\|\mathbf{v}_{k}\right\|^{2}=\mathbf{v}_{k} \cdot \mathbf{v}_{k} \neq 0\), so we can divide:

\[ c_{k}=\frac{\mathbf{v}_{k} \cdot \mathbf{v}}{\mathbf{v}_{k} \cdot \mathbf{v}_{k}} \]

. . .

Any linear combination of an orthogonal set of nonzero vectors is the sum of its projections in the direction of each vector in the set.
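A short sympy sketch of the coordinate formula, on an orthogonal set chosen purely for illustration: each coefficient is \(c_k = (\mathbf{v}_k\cdot\mathbf{v})/(\mathbf{v}_k\cdot\mathbf{v}_k)\).

```python
import sympy as sp

# An orthogonal (not orthonormal) set in R^3, chosen for illustration
v1, v2, v3 = sp.Matrix([1, 1, 0]), sp.Matrix([1, -1, 0]), sp.Matrix([0, 0, 2])
v = sp.Matrix([3, 1, 4])   # a vector in their span

# Coefficients from the theorem: c_k = (v_k . v) / (v_k . v_k)
cs = [vk.dot(v) / vk.dot(vk) for vk in (v1, v2, v3)]
print(cs)                                    # [2, 1, 2]
print(cs[0]*v1 + cs[1]*v2 + cs[2]*v3 == v)   # True: the projections recombine to v
```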

Orthogonal matrix

Suppose \(\mathbf{u}_{1}, \mathbf{u}_{2}, \ldots, \mathbf{u}_{n}\) is an orthonormal basis of \(\mathbb{R}^{n}\)

. . .

Make these the column vectors of \(\mathbf{A}\): \(A=\left[\mathbf{u}_{1}, \mathbf{u}_{2}, \ldots, \mathbf{u}_{n}\right]\)

. . .

Because the \(\mathbf{u}_{i}\) are orthonormal, \(\mathbf{u}_{i}^{T} \mathbf{u}_{j}=\delta_{i j}\)

. . .

Use this to calculate \(A^{T} A\).

  • its \((i, j)\)th entry is \(\mathbf{u}_{i}^{T} \mathbf{u}_{j}\)
  • \(= \delta_{i j}\)
  • So we have \(A^{T} A=\left[\delta_{i j}\right]=I\)
  • Therefore, \(A^{T} = A^{-1}\)

Orthogonal matrix

A square real matrix \(Q\) is called orthogonal if \(Q^{T}=Q^{-1}\).

A square matrix \(U\) is called unitary if \(U^{*}=U^{-1}\).

Example

Show that the matrix \(R(\theta)=\left[\begin{array}{rr}\cos \theta & -\sin \theta \\ \sin \theta & \cos \theta\end{array}\right]\) is orthogonal.

. . .

\[ \begin{aligned} R(\theta)^{T} R(\theta) & =\left[\begin{array}{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right]^{T}\left[\begin{array}{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right] \\ & =\left[\begin{array}{rr} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{array}\right]\left[\begin{array}{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right] \\ & =\left[\begin{array}{cc} \cos ^{2} \theta+\sin ^{2} \theta & -\cos \theta \sin \theta+\sin \theta \cos \theta \\ -\sin \theta \cos \theta+\cos \theta \sin \theta & \sin ^{2} \theta+\cos ^{2} \theta \end{array}\right]=\left[\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right], \end{aligned} \]
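The same check done symbolically (a minimal sympy sketch):

```python
import sympy as sp

theta = sp.symbols('theta')
R = sp.Matrix([[sp.cos(theta), -sp.sin(theta)],
               [sp.sin(theta),  sp.cos(theta)]])

# R^T R simplifies to the identity, so R(theta) is orthogonal
print(sp.simplify(R.T * R))   # Matrix([[1, 0], [0, 1]])
```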

Rigidity of orthogonal (and unitary) matrices

This rotation matrix preserves vector lengths and angles between vectors (illustrate by hand).

Thus, \(R(\theta)\mathbf{x}\cdot R(\theta)\mathbf{y}=\mathbf{x} \cdot \mathbf{y}\)

. . .

This is true in general for orthogonal matrices:

\[ Q \mathbf{x} \cdot Q \mathbf{y}=(Q \mathbf{x})^{T} Q \mathbf{y}=\mathbf{x}^{T} Q^{T} Q \mathbf{y}=\mathbf{x}^{T} \mathbf{y}=\mathbf{x} \cdot \mathbf{y} \]

also,

\[ \|Q \mathbf{x}\|^{2}=Q \mathbf{x} \cdot Q \mathbf{x}=(Q \mathbf{x})^{T} Q \mathbf{x}=\mathbf{x}^{T} Q^{T} Q \mathbf{x}=\mathbf{x}^{T} \mathbf{x}=\|\mathbf{x}\|^{2} \]
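A quick symbolic confirmation that such a \(Q\) preserves dot products and lengths, using the rotation matrix and arbitrary vectors \(\mathbf{x}\), \(\mathbf{y}\) (a sketch):

```python
import sympy as sp

theta, x1, x2, y1, y2 = sp.symbols('theta x1 x2 y1 y2')
Q = sp.Matrix([[sp.cos(theta), -sp.sin(theta)], [sp.sin(theta), sp.cos(theta)]])
x, y = sp.Matrix([x1, x2]), sp.Matrix([y1, y2])

print(sp.simplify((Q * x).dot(Q * y) - x.dot(y)))   # 0: dot products preserved
print(sp.simplify((Q * x).dot(Q * x) - x.dot(x)))   # 0: lengths preserved
```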

Finding orthogonal bases

Gram-Schmidt Algorithm

Let \(\mathbf{w}_{1}, \mathbf{w}_{2}, \ldots, \mathbf{w}_{n}\) be linearly independent vectors in a standard space.

. . .

Define vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) recursively:

Take each \(\mathbf{w}_{k}\) and subtract off the projection of \(\mathbf{w}_{k}\) onto each of the previous vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{k-1}\)

. . .

\[ \mathbf{v}_{k}=\mathbf{w}_{k}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{k}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{k}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}-\cdots-\frac{\mathbf{v}_{k-1} \cdot \mathbf{w}_{k}}{\mathbf{v}_{k-1} \cdot \mathbf{v}_{k-1}} \mathbf{v}_{k-1}, \quad k=1, \ldots, n \]


. . .

Then

  1. The vectors \(\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{n}\) form an orthogonal set.

  2. For each index \(k=1, \ldots, n\),

. . .

\[ \operatorname{span}\left\{\mathbf{w}_{1}, \mathbf{w}_{2}, \ldots, \mathbf{w}_{k}\right\}=\operatorname{span}\left\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{k}\right\} \]
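A direct translation of the recursion into Python, as a sketch (sympy for exact arithmetic; the input is assumed to be a list of linearly independent sympy column vectors). The test vectors are the columns used in the example that follows.

```python
import sympy as sp

def gram_schmidt(ws):
    """Orthogonalize linearly independent column vectors w_1, ..., w_n."""
    vs = []
    for w in ws:
        v = w
        for u in vs:
            v = v - (u.dot(w) / u.dot(u)) * u   # subtract projection of w onto u
        vs.append(v)
    return vs

# The columns used in the example that follows
ws = [sp.Matrix(c) for c in ([1, 1, 1, -1], [2, -1, -1, 1], [-1, 2, 2, 1])]
v1, v2, v3 = gram_schmidt(ws)
print(v2.T)   # Matrix([[9/4, -3/4, -3/4, 3/4]])
print(v3.T)   # Matrix([[0, 1, 1, 2]])
```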

Example

Let \(V=\mathcal{C}(A)\) with the standard inner product and compute an orthonormal basis of \(V\), where

\[ A=\left[\begin{array}{rrrr} 1 & 2 & 0 & -1 \\ 1 & -1 & 3 & 2 \\ 1 & -1 & 3 & 2 \\ -1 & 1 & -3 & 1 \end{array}\right] \]

. . .

RREF of \(A\) is:

\[ R=\left[\begin{array}{rrrr} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right] \]

. . .

The independent columns are columns 1, 2, and 4.

. . .

So let these be our \(\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\).

Step 1: \(\mathbf{v}_{1}=\mathbf{w}_{1}=(1,1,1,-1)\)

. . .

Step 2: \(\mathbf{v}_{2}=\mathbf{w}_{2}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}\) . . .

\(=(2,-1,-1,1)-\frac{-1}{4}(1,1,1,-1)=\frac{1}{4}(9,-3,-3,3)\)

. . .

Step 3: \(\mathbf{v}_{3}=\mathbf{w}_{3}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}\)

. . .

\(=(-1,2,2,1)-\frac{2}{4}(1,1,1,-1)-\frac{-18}{108}(9,-3,-3,3)=(0,1,1,2)\)

Finding orthonormal basis

Now we need to normalize these vectors to get an orthonormal basis.

. . .

\(\mathbf{u}_{1}=\frac{1}{\|\mathbf{v}_{1}\|} \mathbf{v}_{1}=\frac{1}{2}(1,1,1,-1)\)

. . .

\(\mathbf{u}_{2}=\frac{1}{\|\mathbf{v}_{2}\|} \mathbf{v}_{2}=\frac{1}{\sqrt{108}}(9,-3,-3,3)\)

etc.
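sympy also provides a GramSchmidt helper that orthonormalizes in one call; a quick cross-check of the vectors above (passing orthonormal=True divides each \(\mathbf{v}_k\) by its length):

```python
import sympy as sp

ws = [sp.Matrix(c) for c in ([1, 1, 1, -1], [2, -1, -1, 1], [-1, 2, 2, 1])]
u1, u2, u3 = sp.GramSchmidt(ws, orthonormal=True)
print(u1.T)   # equals (1/2)*(1, 1, 1, -1)
print(u2.T)   # equals (1/(2*sqrt(3)))*(3, -1, -1, 1)
print(u3.T)   # equals (1/sqrt(6))*(0, 1, 1, 2)
```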

QR Factorization

If \(A\) is an \(m \times n\) full-column-rank matrix, then \(A=Q R\), where the columns of the \(m \times n\) matrix \(Q\) are orthonormal vectors and the \(n \times n\) matrix \(R\) is upper triangular with nonzero diagonal entries.

. . .

Why do we care?

  • Can use to solve linear systems:

. . .

  • Can be less susceptible to round-off error than Gauss-Jordan elimination.
  • Can be used to solve least squares problems (will show you how!)

How to find \(Q\) and \(R\)

  1. Start with the columns of A, \(A=\left[\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\right]\). (For now assume they are linearly independent.)
  2. Do Gram-Schmidt on the columns of \(A\):

. . .

\[ \begin{aligned} & \mathbf{v}_{1}=\mathbf{w}_{1} \\ & \mathbf{v}_{2}=\mathbf{w}_{2}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1} \\ & \mathbf{v}_{3}=\mathbf{w}_{3}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2} . \end{aligned} \]

. . .

  3. Solve these equations for \(\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\):

. . .

\[ \begin{aligned} & \mathbf{w}_{1}=\mathbf{v}_{1} \\ & \mathbf{w}_{2}=\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}+\mathbf{v}_{2} \\ & \mathbf{w}_{3}=\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}+\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2}+\mathbf{v}_{3} . \end{aligned} \]


In matrix form, these become:

. . .

\[ A=\left[\mathbf{w}_{1}, \mathbf{w}_{2}, \mathbf{w}_{3}\right]=\left[\mathbf{v}_{1}, \mathbf{v}_{2}, \mathbf{v}_{3}\right]\left[\begin{array}{ccc} 1 & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \\ 0 & 1 & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \\ 0 & 0 & 1 \end{array}\right] \]

Normalizing \(Q\)

We can normalize the columns of \(Q\) by dividing each column by its length.

. . .

Set \(\mathbf{q}_{j}=\mathbf{v}_{j} /\left\|\mathbf{v}_{j}\right\|\).

. . .

\[ \begin{aligned} A & =\left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & 0 & 0 \\ 0 & \left\|\mathbf{v}_{2}\right\| & 0 \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right]\left[\begin{array}{ccc} 1 & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \\ 0 & 1 & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \\ 0 & 0 & 1 \end{array}\right] \end{aligned} \]

. . .

\[ \begin{aligned} & =\left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\left\|\mathbf{v}_{1}\right\|} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{1}\right\|} \\ 0 & \left\|\mathbf{v}_{2}\right\| & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{2}\right\|} \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right] . \end{aligned} \]

Final form

\[ \begin{aligned} A & =\left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\left\|\mathbf{v}_{1}\right\|} & \frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{1}\right\|} \\ 0 & \left\|\mathbf{v}_{2}\right\| & \frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\left\|\mathbf{v}_{2}\right\|} \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right] \\[10pt] &= \left[\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\right]\left[\begin{array}{ccc} \left\|\mathbf{v}_{1}\right\| & \mathbf{q}_{1} \cdot \mathbf{w}_{2} & \mathbf{q}_{1} \cdot \mathbf{w}_{3} \\ 0 & \left\|\mathbf{v}_{2}\right\| & \mathbf{q}_{2} \cdot \mathbf{w}_{3} \\ 0 & 0 & \left\|\mathbf{v}_{3}\right\| \end{array}\right] \end{aligned} \]
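The construction above, written as a sketch: build \(Q\) and \(R\) directly from the Gram-Schmidt vectors, assuming \(A\) has full column rank (the helper name here is illustrative, not from any particular library). sympy's built-in `A.QRdecomposition()` does the same job and is used below as a cross-check.

```python
import sympy as sp

def qr_by_gram_schmidt(A):
    """QR factorization of a full-column-rank sympy Matrix via Gram-Schmidt."""
    ws = [A.col(j) for j in range(A.cols)]
    vs = sp.GramSchmidt(ws)              # orthogonal, not yet normalized
    qs = [v / v.norm() for v in vs]      # q_j = v_j / ||v_j||
    Q = sp.Matrix.hstack(*qs)
    R = sp.zeros(A.cols, A.cols)
    for i in range(A.cols):
        R[i, i] = vs[i].norm()           # diagonal: ||v_i||
        for j in range(i + 1, A.cols):
            R[i, j] = qs[i].dot(ws[j])   # above the diagonal: q_i . w_j
    return Q, R
```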

Example

Find the QR factorization of the matrix

\[ A=\left[\begin{array}{rrr} 1 & 2 & -1 \\ 1 & -1 & 2 \\ 1 & -1 & 2 \\ -1 & 1 & 1 \end{array}\right] \]

. . .

We already found the Gram-Schmidt orthogonalization of this matrix:

\[ \begin{aligned} \mathbf{v}_{1} & =\mathbf{w}_{1}=(1,1,1,-1), \\ \mathbf{v}_{2} & =\mathbf{w}_{2}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{2}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1} \\ & =(2,-1,-1,1)-\frac{-1}{4}(1,1,1,-1)=\frac{1}{4}(9,-3,-3,3), \\ \mathbf{v}_{3} & =\mathbf{w}_{3}-\frac{\mathbf{v}_{1} \cdot \mathbf{w}_{3}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \mathbf{v}_{1}-\frac{\mathbf{v}_{2} \cdot \mathbf{w}_{3}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \mathbf{v}_{2} \\ & =(-1,2,2,1)-\frac{2}{4}(1,1,1,-1)-\frac{-18}{108}(9,-3,-3,3) \\ & =\frac{1}{4}(-4,8,8,4)-\frac{1}{4}(2,2,2,-2)+\frac{1}{4}(6,-2,-2,2)=(0,1,1,2) . \end{aligned} \]

. . .

\[ \begin{aligned} & \mathbf{u}_{1}=\frac{\mathbf{v}_{1}}{\left\|\mathbf{v}_{1}\right\|}=\frac{1}{2}(1,1,1,-1), \\ & \mathbf{u}_{2}=\frac{\mathbf{v}_{2}}{\left\|\mathbf{v}_{2}\right\|}=\frac{1}{\sqrt{108}}(9,-3,-3,3)=\frac{1}{2 \sqrt{3}}(3,-1,-1,1), \\ & \mathbf{u}_{3}=\frac{\mathbf{v}_{3}}{\left\|\mathbf{v}_{3}\right\|}=\frac{1}{\sqrt{6}}(0,1,1,2) . \end{aligned} \]

. . .

The \(\mathbf{u}\) vectors we calculated before are the \(\mathbf{q}\) vectors in the QR factorization.

\[ \begin{gathered} \left\|\mathbf{v}_{1}\right\|=\|(1,1,1,-1)\|=2 \text { and } \mathbf{q}_{1}=\frac{1}{2}(1,1,1,-1) \\ \left\|\mathbf{v}_{2}\right\|=\left\|\frac{1}{4}(9,-3,-3,3)\right\|=\frac{3}{2} \sqrt{3} \text { and } \mathbf{q}_{2}=\frac{1}{2 \sqrt{3}}(3,-1,-1,1) \\ \left\|\mathbf{v}_{3}\right\|=\|(0,1,1,2)\|=\sqrt{6} \text { and } \mathbf{q}_{3}=\frac{1}{\sqrt{6}}(0,1,1,2) \end{gathered} \]

. . .

\[ \begin{aligned} \left\langle\mathbf{q}_{1}, \mathbf{w}_{2}\right\rangle & =\frac{1}{2}(1,1,1,-1) \cdot(2,-1,-1,1)=-\frac{1}{2} \\ \left\langle\mathbf{q}_{1}, \mathbf{w}_{3}\right\rangle & =\frac{1}{2}(1,1,1,-1) \cdot(-1,2,2,1)=1 \\ \left\langle\mathbf{q}_{2}, \mathbf{w}_{3}\right\rangle & =\frac{1}{2 \sqrt{3}}(3,-1,-1,1) \cdot(-1,2,2,1)=-\sqrt{3} . \end{aligned} \]

. . .

\[ A=\left[\begin{array}{rrr} 1 / 2 & 3 /(2 \sqrt{3}) & 0 \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ -1 / 2 & 1 /(2 \sqrt{3}) & 2 / \sqrt{6} \end{array}\right]\left[\begin{array}{rrr} 2 & -1 / 2 & 1 \\ 0 & \frac{3}{2} \sqrt{3} & -\sqrt{3} \\ 0 & 0 & \sqrt{6} \end{array}\right]=Q R \]
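A cross-check with sympy's built-in QR routine, which for a full-column-rank matrix should reproduce the \(Q\) and \(R\) computed by hand above:

```python
import sympy as sp

A = sp.Matrix([[1, 2, -1], [1, -1, 2], [1, -1, 2], [-1, 1, 1]])
Q, R = A.QRdecomposition()
print(R)                         # should match the upper triangular factor above
print(sp.simplify(Q * R - A))    # zero matrix
print(sp.simplify(Q.T * Q))      # 3 x 3 identity
```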

Solving a linear system with QR

We would like to solve the system \(A \mathbf{x}=\mathbf{b}\).

. . .

\(Q R \mathbf{x}=\mathbf{b}\).

. . .

Multiply both sides by \(Q^{T}\):

Since the columns of \(Q\) are orthonormal, \(Q^{T} Q=I\), so \(Q^{T} Q R \mathbf{x}=R \mathbf{x}=Q^{T} \mathbf{b}\).

. . .

Suppose we have \(\mathbf{b}=(1,1,1,1)\). Then we are trying to solve

\[\left[\begin{array}{rrr} 2 & -1 / 2 & 1 \\ 0 & \frac{3}{2} \sqrt{3} & -\sqrt{3} \\ 0 & 0 & \sqrt{6} \end{array}\right]\mathbf{x}=\left[\begin{array}{rrr} 1 / 2 & 3 /(2 \sqrt{3}) & 0 \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ 1 / 2 & -1 /(2 \sqrt{3}) & 1 / \sqrt{6} \\ -1 / 2 & 1 /(2 \sqrt{3}) & 2 / \sqrt{6} \end{array}\right]^T \left[\begin{array}{rrr}1 \\ 1 \\ 1 \\ 1 \end{array}\right] \]

import sympy as sp
# Q and R from the factorization above; b = (1, 1, 1, 1)
Q = sp.Matrix([[1/2, 3/(2*sp.sqrt(3)), 0], [1/2, -1/(2*sp.sqrt(3)), 1/sp.sqrt(6)], [1/2, -1/(2*sp.sqrt(3)), 1/sp.sqrt(6)], [-1/2, 1/(2*sp.sqrt(3)), 2/sp.sqrt(6)]])
R = sp.Matrix([[2, -1/2, 1], [0, 3*sp.sqrt(3)/2, -sp.sqrt(3)], [0, 0, sp.sqrt(6)]])
b = sp.Matrix([1, 1, 1, 1])
# Print Q^T b and the triangular system R x = Q^T b as LaTeX
print('$$\n Q^T b =' + sp.latex(Q.T*b) + '\n$$')
print('$$\n ' + sp.latex(R) + r' \mathbf{x} =' + sp.latex(Q.T*b) + '\n$$')

\[ Q^T b =\left[\begin{matrix}1.0\\\frac{\sqrt{3}}{3}\\\frac{2 \sqrt{6}}{3}\end{matrix}\right] \] \[ \left[\begin{matrix}2 & -0.5 & 1\\0 & \frac{3 \sqrt{3}}{2} & - \sqrt{3}\\0 & 0 & \sqrt{6}\end{matrix}\right] \mathbf{x} =\left[\begin{matrix}1.0\\\frac{\sqrt{3}}{3}\\\frac{2 \sqrt{6}}{3}\end{matrix}\right] \]

. . .

Find \[ \mathbf{x}=\left[\begin{array}{l} 1 / 3 \\ 2 / 3 \\ 2 / 3 \end{array}\right] \]

Check:

\[ \mathbf{r}=\mathbf{b}-A \mathbf{x}=\left[\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \end{array}\right]-\left[\begin{array}{rrr} 1 & 2 & -1 \\ 1 & -1 & 2 \\ 1 & -1 & 2 \\ -1 & 1 & 1 \end{array}\right]\left[\begin{array}{l} 1 / 3 \\ 2 / 3 \\ 2 / 3 \end{array}\right]=\left[\begin{array}{l} 0 \\ 0 \\ 0 \\ 0 \end{array}\right] . \]
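The full computation in sympy, a sketch using the built-in QR factorization (exact arithmetic rather than the hand-entered floating-point \(Q\) and \(R\)), solving the triangular system and checking the residual:

```python
import sympy as sp

A = sp.Matrix([[1, 2, -1], [1, -1, 2], [1, -1, 2], [-1, 1, 1]])
b = sp.Matrix([1, 1, 1, 1])

Q, R = A.QRdecomposition()
x = sp.simplify(R.solve(Q.T * b))   # solve the triangular system R x = Q^T b
print(x.T)                          # Matrix([[1/3, 2/3, 2/3]])
print((b - A * x).T)                # Matrix([[0, 0, 0, 0]]) -- residual is zero
```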

Least Squares with QR factorization

What if we have a system that’s not consistent?

. . .


The QR factorization can be used to solve the least squares problem \(A \mathbf{x} \approx \mathbf{b}\).
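A preview of one standard route, sketched here: when \(A\) has full column rank, \(A^{T}A = R^{T}Q^{T}QR = R^{T}R\), so the normal equations \(A^{T}A\mathbf{x}=A^{T}\mathbf{b}\) reduce to the triangular system \(R\mathbf{x}=Q^{T}\mathbf{b}\). Applied to the inconsistent example from the start of this lecture:

```python
import sympy as sp

A = sp.Matrix([[2, 1], [1, 1], [2, 1]])   # the inconsistent system from earlier
b = sp.Matrix([0, 0, 2])

Q, R = A.QRdecomposition()
x = sp.simplify(R.solve(Q.T * b))   # least squares solution via R x = Q^T b
print(x.T)                          # Matrix([[1, -1]]) -- agrees with the normal equations
```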

Haar Wavelet Transform

Image compression idea

Haar wavelet transform

In 1D, N data points \(\left\{x_{k}\right\}_{k=1}^{N}\)

. . .

Can average adjacent terms (a smoothing filter):

\[ \begin{equation*} y_{k}=\frac{1}{2}\left(x_{k}+x_{k-1}\right), \quad k \in \mathbb{Z} \end{equation*} \]

. . .

At the same time, we can apply a difference filter (“unsmoothing”):

\[ \begin{equation*} z_{k}=\frac{1}{2}\left(x_{k}-x_{k-1}\right), \quad k=1, \ldots, N \end{equation*} \]

. . .

Idea: take our data, apply both filters to it and keep the results.

Example

\(g(t)=\frac{3}{2}+\cos \left(\frac{\pi}{4} t\right)-\frac{1}{4} \cos \left(\frac{7 \pi}{4} t\right), 0 \leq t \leq 8\)

. . .

Sample at \(t_{k}=k / 5, k=0,1, \ldots, 40\)

. . .

Plot of the samples and the filtered sequences: \(x\) in black, \(y\) in red, \(z\) in green.

\[ \begin{array}{rlll} \frac{1}{2}\left(x_{1}+x_{2}\right)=y_{2} & \frac{1}{2}\left(x_{3}+x_{4}\right)=y_{4} & \frac{1}{2}\left(x_{5}+x_{6}\right)=y_{6} \\ \frac{1}{2}\left(-x_{1}+x_{2}\right)=z_{2} & \frac{1}{2}\left(-x_{3}+x_{4}\right)=z_{4} & \frac{1}{2}\left(-x_{5}+x_{6}\right)=z_{6} \end{array} \]

. . .

\[ A_{6} \mathbf{x} \equiv \frac{1}{2}\left[\begin{array}{rrrrrr} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{array}\right]\left[\begin{array}{l} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \\ x_{6} \end{array}\right]=\left[\begin{array}{l} y_{2} \\ y_{4} \\ y_{6} \\ z_{2} \\ z_{4} \\ z_{6} \end{array}\right] \]

Almost orthogonal

\[ \begin{gather} A_{6} A_{6}^{T}=\frac{1}{4}\left[\begin{array}{rrrrrr} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{array}\right]\left[\begin{array}{cccccc} 1 & 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \\[10pt] =\frac{1}{2}\left[\begin{array}{llllll} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]=\frac{1}{2} I_{6} \end{gather} \]

In general,

\(A_{N} A_{N}^{T}=\frac{1}{2} I_{N}\)

. . .

These matrices are nearly orthogonal.

Haar Wavelet Transform Matrix

Can make them orthogonal by dividing by \(\sqrt{2}\) instead of by 2:

\[ \begin{equation*} y_{k}=\frac{\sqrt{2}}{2}\left(x_{k}+x_{k-1}\right), \quad k \in \mathbb{Z} \end{equation*} \]

and

\[ \begin{equation*} z_{k}=\frac{\sqrt{2}}{2}\left(x_{k}-x_{k-1}\right), \quad k \in \mathbb{Z} \end{equation*} \]

. . .

\[ W_{N}=\frac{\sqrt{2}}{2}\left[\begin{array}{ccccccc} 1 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 0 & 0 & 1 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 1 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ -1 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 0 & 0 & -1 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & -1 & 1 \end{array}\right]=\frac{\sqrt{2}}{2}\left[\begin{array}{c} U \\ \ldots \\ L \end{array}\right] \]
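A sketch that builds \(W_N\) for even \(N\) (the layout, averaging block \(U\) above differencing block \(L\), follows the block form above) and checks orthogonality:

```python
import sympy as sp

def haar_matrix(N):
    """N x N Haar transform matrix for even N: averages on top, differences below."""
    U, L = sp.zeros(N // 2, N), sp.zeros(N // 2, N)
    for i in range(N // 2):
        U[i, 2*i], U[i, 2*i + 1] = 1, 1     # pairwise averaging rows
        L[i, 2*i], L[i, 2*i + 1] = -1, 1    # pairwise differencing rows
    return sp.sqrt(2) / 2 * sp.Matrix.vstack(U, L)

W6 = haar_matrix(6)
print(W6 * W6.T == sp.eye(6))   # True: W_N is orthogonal
```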

Moving to 2D

For a 2D image represented pixel-by-pixel in the matrix \(\mathbf{A}\), we can apply the 1D transform to each row and then to each column: \(W_{m} A W_{n}^{T}\)

. . .

Result ends up in block form:

\[ W_{m} A W_{n}^{T}=2\left[\begin{array}{ll} B & V \\ H & D \end{array}\right] \]

. . .

Figure 1: Effects of the Haar wavelet transform on an image: the original image, the transformed image, an enlarged view of the blurred block, and the edge image.

. . .

\(B\) represents the blurred image of \(A\), while \(V, H\) and \(D\) represent edges of the image \(A\) along vertical, horizontal and diagonal directions, respectively.
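In code, a numpy sketch of the 2D transform on a small random "image" (illustrative data only), extracting the four blocks; haar_matrix is a numpy version of the \(W_N\) sketch above:

```python
import numpy as np

def haar_matrix(N):
    """N x N Haar transform matrix for even N, as a numpy array."""
    W = np.zeros((N, N))
    for i in range(N // 2):
        W[i, 2*i:2*i + 2] = [1, 1]              # averaging rows (top block)
        W[N // 2 + i, 2*i:2*i + 2] = [-1, 1]    # differencing rows (bottom block)
    return np.sqrt(2) / 2 * W

rng = np.random.default_rng(0)
A = rng.random((8, 8))                       # stand-in for an 8 x 8 block of pixels

T = haar_matrix(8) @ A @ haar_matrix(8).T    # transform columns and rows
B_blk, V_blk = T[:4, :4], T[:4, 4:]          # blocks corresponding to B and V above
H_blk, D_blk = T[4:, :4], T[4:, 4:]          # blocks corresponding to H and D above
```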