Ch5 Lecture 2

Eigenvalues of Symmetric Matrices

Real eigenvalues, orthogonal eigenvectors, and positive definiteness.

Positive Definite Matrices

A matrix \(A\) is called positive definite if \(x^{T} A x>0\) for all nonzero vectors \(x\).

. . .

A symmetric matrix \(K=K^{T}\) is positive definite if and only if all of its eigenvalues are strictly positive.

. . .

Proof: Use the positive definiteness of the matrix to show that the eigenvalues are positive:

If \(\mathbf{x}=\mathbf{v} \neq \mathbf{0}\) is an eigenvector with (necessarily real) eigenvalue \(\lambda\), then

\[ 0<\mathbf{v}^{T} K \mathbf{v}=\mathbf{v}^{T}(\lambda \mathbf{v})=\lambda \mathbf{v}^{T} \mathbf{v}=\lambda\|\mathbf{v}\|^{2} \]

So \(\lambda>0\)

Conversely, suppose \(K\) has all positive eigenvalues.

Let \(\mathbf{u}_{1}, \ldots, \mathbf{u}_{n}\) be the orthonormal eigenvector basis with \(K \mathbf{u}_{j}=\lambda_{j} \mathbf{u}_{j}\) with \(\lambda_{j}>0\).

Writing any vector \(\mathbf{x} \neq \mathbf{0}\) in this basis as

\[ \mathbf{x}=c_{1} \mathbf{u}_{1}+\cdots+c_{n} \mathbf{u}_{n}, \quad \text { we obtain } \quad K \mathbf{x}=c_{1} \lambda_{1} \mathbf{u}_{1}+\cdots+c_{n} \lambda_{n} \mathbf{u}_{n} \]

. . .

Therefore,

\[ \mathbf{x}^{T} K \mathbf{x}=\left(c_{1} \mathbf{u}_{1}^{T}+\cdots+c_{n} \mathbf{u}_{n}^{T}\right)\left(c_{1} \lambda_{1} \mathbf{u}_{1}+\cdots+c_{n} \lambda_{n} \mathbf{u}_{n}\right)=\lambda_{1} c_{1}^{2}+\cdots+\lambda_{n} c_{n}^{2}>0, \]

since orthonormality gives \(\mathbf{u}_{i}^{T} \mathbf{u}_{j}=0\) for \(i \neq j\) and \(\mathbf{u}_{i}^{T} \mathbf{u}_{i}=1\), and \(\mathbf{x} \neq \mathbf{0}\) means at least one \(c_{i} \neq 0\). So \(K\) is positive definite.
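As a quick numerical check of this criterion (a sketch; the matrix below is just an illustrative choice), we can test positive definiteness by inspecting the eigenvalues:

import numpy as np
K = np.array([[4.0, 1.0],
              [1.0, 2.0]])            # an illustrative symmetric matrix
eigs = np.linalg.eigvalsh(K)          # eigenvalues of a symmetric matrix, ascending order
print(eigs)                           # both positive...
print(np.all(eigs > 0))               # ...so K is positive definite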

Let \(A=A^{T}\) be a real symmetric \(n \times n\) matrix. Then

  (a) All the eigenvalues of \(A\) are real.

. . .

  (b) Eigenvectors corresponding to distinct eigenvalues are orthogonal.

  (b2) If an eigenvalue \(\lambda\) has algebraic multiplicity \(m\), then its eigenspace has dimension \(m\). In particular, there exist \(m\) orthonormal eigenvectors corresponding to \(\lambda\).

  (c) There is an orthonormal basis of \(\mathbb{R}^{n}\) consisting of \(n\) eigenvectors of \(A\). In particular, all real symmetric matrices are non-defective and real diagonalizable.

Example

\[ A=\left(\begin{array}{ll}3 & 1 \\ 1 & 3\end{array}\right) \]

. . .

We compute the determinant in the characteristic equation

\[ \operatorname{det}(A-\lambda \mathrm{I})=\operatorname{det}\left(\begin{array}{cc} 3-\lambda & 1 \\ 1 & 3-\lambda \end{array}\right)=(3-\lambda)^{2}-1=\lambda^{2}-6 \lambda+8 \]

. . .

\[ \lambda^{2}-6 \lambda+8=(\lambda-4)(\lambda-2)=0 \]

. . .

Eigenvectors:

For the first eigenvalue, the eigenvector equation is

\[ (A-4 \mathrm{I}) \mathbf{v}=\left(\begin{array}{rr} -1 & 1 \\ 1 & -1 \end{array}\right)\binom{x}{y}=\binom{0}{0}, \quad \text { or } \quad \begin{array}{r} -x+y=0 \\ x-y=0 \end{array} \]

General solution: \[ x=y=a, \quad \text { so } \quad \mathbf{v}=\binom{a}{a}=a\binom{1}{1} \]

. . .

\[ \lambda_{1}=4, \quad \mathbf{v}_{1}=\binom{1}{1}, \quad \lambda_{2}=2, \quad \mathbf{v}_{2}=\binom{-1}{1} \]

. . .

The eigenvectors are orthogonal: \(\mathbf{v}_{1} \cdot \mathbf{v}_{2}=0\)
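We can confirm this with a quick sympy check:

import sympy as sp
A = sp.Matrix([[3, 1], [1, 3]])
A.eigenvects()      # eigenvalue 4 with eigenvector (1, 1); eigenvalue 2 with eigenvector (-1, 1)
v1, v2 = sp.Matrix([1, 1]), sp.Matrix([-1, 1])
v1.dot(v2)          # 0, so the eigenvectors are orthogonal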

Proof of part (a)

Let \(A=A^{T}\) be a real symmetric \(n \times n\) matrix. Then

  (a) All the eigenvalues of \(A\) are real.

. . .

Suppose \(\lambda\) is a possibly complex eigenvalue with eigenvector \(\mathbf{v} \in \mathbb{C}^{n}\), and use the Hermitian dot product \(\mathbf{v} \cdot \mathbf{w}=\mathbf{v}^{T} \overline{\mathbf{w}}\), so that \(\mathbf{v} \cdot \mathbf{v}=\|\mathbf{v}\|^{2}>0\). Then

\[ (A \mathbf{v}) \cdot \mathbf{v}=(\lambda \mathbf{v}) \cdot \mathbf{v}=\lambda\|\mathbf{v}\|^{2} \]

. . .

Now, if \(A\) is real and symmetric,

\[ (A \mathbf{v}) \cdot \mathbf{w}=(A \mathbf{v})^{T} \overline{\mathbf{w}}=\mathbf{v}^{T} A^{T} \overline{\mathbf{w}}=\mathbf{v}^{T} \overline{A \mathbf{w}}=\mathbf{v} \cdot(A \mathbf{w}) \quad \text { for all } \quad \mathbf{v}, \mathbf{w} \in \mathbb{C}^{n} \]

Therefore

\[ (A \mathbf{v}) \cdot \mathbf{v}=\mathbf{v} \cdot(A \mathbf{v})=\mathbf{v} \cdot(\lambda \mathbf{v})=\mathbf{v}^{T} \overline{\lambda \mathbf{v}}=\bar{\lambda}\|\mathbf{v}\|^{2} \]

. . .

Therefore \(\lambda\|\mathbf{v}\|^{2}=\bar{\lambda}\|\mathbf{v}\|^{2}\), and since \(\|\mathbf{v}\| \neq 0\), we conclude \(\lambda=\bar{\lambda}\), so \(\lambda\) is real.

Proof of part (b)

Part b: Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Suppose \(A \mathbf{v}=\lambda \mathbf{v}, A \mathbf{w}=\mu \mathbf{w}\), where \(\lambda \neq \mu\) are distinct real eigenvalues.

. . .

\(\lambda \mathbf{v} \cdot \mathbf{w}=(A \mathbf{v}) \cdot \mathbf{w}=\mathbf{v} \cdot(A \mathbf{w})=\mathbf{v} \cdot(\mu \mathbf{w})=\mu \mathbf{v} \cdot \mathbf{w}, \quad\) and hence \(\quad(\lambda-\mu) \mathbf{v} \cdot \mathbf{w}=0\).

. . .

Since \(\lambda \neq \mu\), this implies that \(\mathbf{v} \cdot \mathbf{w}=0\), so the eigenvectors \(\mathbf{v}, \mathbf{w}\) are orthogonal.

Proof of part (c)

Part c: There is an orthonormal basis of \(\mathbb{R}^{n}\) consisting of \(n\) eigenvectors of \(A\).

. . .

If the eigenvalues are distinct, then the eigenvectors are automatically orthogonal by part (b).

. . .

If an eigenvalue is repeated, its eigenvectors need not come out orthogonal automatically. But within each eigenspace we can apply the Gram-Schmidt process to orthogonalize them, and eigenvectors from different eigenspaces are already orthogonal by part (b). Together this gives an orthonormal basis of eigenvectors.

Skills

  • State the spectral theorem for symmetric matrices
  • Prove that symmetric matrices have real eigenvalues
  • Show eigenvectors for distinct eigenvalues are orthogonal
  • Check if a symmetric matrix is positive definite via its eigenvalues

Diagonalization of Symmetric Matrices

The spectral theorem, quadratic forms, and orthogonal diagonalization.

Diagonalizability of Symmetric Matrices: the Spectral Theorem

  • Every real, symmetric matrix admits an eigenvector basis, and hence is diagonalizable.
  • Moreover, since we can choose eigenvectors that form an orthonormal basis, the diagonalizing matrix takes a particularly simple form.
  • Recall that an \(n \times n\) matrix \(Q\) is orthogonal if and only if its columns form an orthonormal basis of \(\mathbb{R}^{n}\).

. . .

Writing our diagonalization from the previous lecture, specifically for the case of a real symmetric matrix \(A\):

. . .

If \(A=A^{T}\) is a real symmetric \(n \times n\) matrix, then there exists an orthogonal matrix \(Q\) and a real diagonal matrix \(\Lambda\) such that

\[ A=Q \Lambda Q^{-1}=Q \Lambda Q^{T} \]

The eigenvalues of \(A\) appear on the diagonal of \(\Lambda\), while the columns of \(Q\) are the corresponding orthonormal eigenvectors.
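For the \(2 \times 2\) example from earlier in this lecture, we can verify the factorization numerically (a sketch; numpy's eigh is designed for symmetric matrices and returns orthonormal eigenvectors):

import numpy as np
A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)              # eigenvalues (ascending) and orthonormal eigenvector columns
Lam = np.diag(lam)
print(np.allclose(Q @ Lam @ Q.T, A))    # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal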

One example of a useful symmetric matrix: Quadratic Forms

A quadratic form is a homogeneous polynomial of degree 2 in \(n\) variables \(x_{1}, \ldots, x_{n}\). For example, in \(x, y, z\): \(Q(x, y, z)=a x^{2}+b y^{2}+c z^{2}+2 d x y+2 e y z+2 f z x\).

. . .

Every quadratic form can be written in matrix form as \(Q(\mathbf{x})=\mathbf{x}^{T} A \mathbf{x}\).

Example:

\[ Q(x, y, z)=x^{2}+2 y^{2}+z^{2}+2 x y+y z+3 x z . \]

. . .

\[ \begin{aligned} x(x+2 y+3 z)+y(2 y+z)+z^{2} & =\left[\begin{array}{lll} x & y & z \end{array}\right]\left[\begin{array}{c} x+2 y+3 z \\ 2 y+z \\ z \end{array}\right] \\ & =\left[\begin{array}{lll} x & y & z \end{array}\right]\left[\begin{array}{lll} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{c} x \\ y \\ z \end{array}\right]=\mathbf{x}^{T} A \mathbf{x}, \end{aligned} \]

Now, if we have a quadratic form \(Q(\mathbf{x})=\mathbf{x}^{T} A \mathbf{x}\) where \(A\) is not necessarily symmetric, we can always rewrite it in terms of an equivalent symmetric matrix \(B\) as \(Q(\mathbf{x})=\mathbf{x}^{T} B \mathbf{x}\), where \(B=\frac{1}{2}(A+A^{T})\).

(See Exercise 2.4.34 in your textbook.)

So in this case, we can write \(Q(\mathbf{x})=\mathbf{x}^{T} B \mathbf{x}\) where

\[ B=\frac{1}{2}\left[\begin{array}{lll} 2 & 2 & 3 \\ 2 & 4 & 1 \\ 3 & 1 & 2 \end{array}\right] \]

Check:

import sympy as sp
# symmetric matrix B = (A + A^T)/2 for the quadratic form above
B = sp.Matrix([[2,2,3],[2,4,1],[3,1,2]])/2
x = sp.Matrix(sp.symbols(['x','y','z']))
(x.T*B*x).expand()   # should reproduce the original quadratic form

\(\displaystyle \left[\begin{matrix}x^{2} + 2 x y + 3 x z + 2 y^{2} + y z + z^{2}\end{matrix}\right]\)

Yes, this is the same as the original quadratic form.

Diagonalizing Quadratic Forms

Symmetric matrices are diagonalizable \(\Rightarrow\) we can always find a basis in which the quadratic form takes a particularly simple form. Just diagonalize:

. . .

\(Q(\mathbf{x})=\mathbf{x}^{T} B \mathbf{x}=\mathbf{x}^{T} P D P^{T} \mathbf{x}\), where \(P\) is the orthogonal matrix whose columns are the orthonormal eigenvectors of \(B\) and \(D\) is the diagonal matrix of eigenvalues of \(B\).

. . .

Then if we define new variables \(\mathbf{y}=P^{T} \mathbf{x}\), we have \(Q(\mathbf{x})=\mathbf{y}^{T} D \mathbf{y}\)

. . .

which just becomes a sum of squares:

\[ q(\mathbf{x})=\lambda_{1} y_{1}^{2}+\cdots+\lambda_{n} y_{n}^{2} \]

Example:

Suppose we have the quadratic form \(3x_1^2+2x_1x_2+3x_2^2\). We can write this in matrix form as \(Q(\mathbf{x})=\mathbf{x}^{T} B \mathbf{x}\) where \(\mathbf{x}=\binom{x_1}{x_2}\) and

\[ B=\left[\begin{array}{ll} 3 & 1 \\ 1 & 3 \end{array}\right] \]

. . .

We diagonalize \(B\):

\[ B=\left(\begin{array}{ll} 3 & 1 \\ 1 & 3 \end{array}\right)=P D P^{T}=\left(\begin{array}{cc} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array}\right)\left(\begin{array}{ll} 4 & 0 \\ 0 & 2 \end{array}\right)\left(\begin{array}{cc} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array}\right) \]

. . .

Now, if we define \(\mathbf{y}=P^{T} \mathbf{x}=\frac{1}{\sqrt{2}}\binom{x_1+x_2}{-x_1+x_2}\), we have \(Q(\mathbf{x})=\mathbf{y}^{T} D \mathbf{y}\), or

\[ q(\mathbf{x})=3 x_{1}^{2}+2 x_{1} x_{2}+3 x_{2}^{2}=4 y_{1}^{2}+2 y_{2}^{2} \]
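We can check this substitution with sympy (a quick sketch; the symbols below are introduced just for this check):

import sympy as sp
x1, x2 = sp.symbols('x1 x2')
y1 = (x1 + x2)/sp.sqrt(2)     # first new coordinate from y = P^T x
y2 = (-x1 + x2)/sp.sqrt(2)    # second new coordinate
sp.expand(4*y1**2 + 2*y2**2)  # gives 3*x1**2 + 2*x1*x2 + 3*x2**2, the original form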

The numbers aren’t always clean, though! Returning to the \(3 \times 3\) matrix \(B\) from the quadratic form example above:

Q,Lambda = B.diagonalize()
Q

\(\displaystyle \left[\begin{matrix}\frac{- 13248 \cdot \left(1 + \sqrt{3} i\right) \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}} - \sqrt[3]{53 + 9 \sqrt{1167} i} \left(184 + \left(-16 + \left(1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right)^{2} + 72 \left(1 + \sqrt{3} i\right)^{2} \cdot \left(11 - \left(1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(53 + 9 \sqrt{1167} i\right)}{792 \left(1 + \sqrt{3} i\right)^{2} \cdot \left(53 + 9 \sqrt{1167} i\right)} & \frac{72 \left(1 - \sqrt{3} i\right)^{3} \cdot \left(11 + \left(-1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(53 + 9 \sqrt{1167} i\right) + \left(-1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i} \left(184 + \left(-16 + \left(1 - \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(1 - \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right)^{2} - 13248 \left(1 - \sqrt{3} i\right)^{2} \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}}}{792 \left(1 - \sqrt{3} i\right)^{3} \cdot \left(53 + 9 \sqrt{1167} i\right)} & \frac{126 \sqrt{1167} - 289 i \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}} - 3 \sqrt{1167} \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}} - 742 i + 352 i \sqrt[3]{53 + 9 \sqrt{1167} i} + 60 \sqrt{1167} \sqrt[3]{53 + 9 \sqrt{1167} i}}{66 \cdot \left(9 \sqrt{1167} - 53 i\right)}\\\frac{28 \left(-22 + \left(1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(1 + \sqrt{3} i\right)^{2} \cdot \left(53 + 9 \sqrt{1167} i\right) + \sqrt[3]{53 + 9 \sqrt{1167} i} \left(184 + \left(-16 + \left(1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(1 + \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right)^{2} + 5152 \cdot \left(1 + \sqrt{3} i\right) \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}}}{264 \left(1 + \sqrt{3} i\right)^{2} \cdot \left(53 + 9 \sqrt{1167} i\right)} & \frac{5152 \cdot \left(1 - \sqrt{3} i\right) \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}} + \sqrt[3]{53 + 9 \sqrt{1167} i} \left(184 + \left(-16 + \left(1 - \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(1 - \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right)^{2} + 28 \left(-22 + \left(1 - \sqrt{3} i\right) \sqrt[3]{53 + 9 \sqrt{1167} i}\right) \left(1 - \sqrt{3} i\right)^{2} \cdot \left(53 + 9 \sqrt{1167} i\right)}{264 \left(1 - \sqrt{3} i\right)^{2} \cdot \left(53 + 9 \sqrt{1167} i\right)} & \frac{18 \sqrt{1167} - 2222 i \sqrt[3]{53 + 9 \sqrt{1167} i} - 145 i \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}} - 106 i + 18 \sqrt{1167} \sqrt[3]{53 + 9 \sqrt{1167} i} + 9 \sqrt{1167} \left(53 + 9 \sqrt{1167} i\right)^{\frac{2}{3}}}{66 \cdot \left(9 \sqrt{1167} - 53 i\right)}\\1 & 1 & 1\end{matrix}\right]\)

Lambda

\(\displaystyle \left[\begin{matrix}\frac{4}{3} + \left(- \frac{1}{2} - \frac{\sqrt{3} i}{2}\right) \sqrt[3]{\frac{53}{216} + \frac{\sqrt{1167} i}{24}} + \frac{23}{18 \left(- \frac{1}{2} - \frac{\sqrt{3} i}{2}\right) \sqrt[3]{\frac{53}{216} + \frac{\sqrt{1167} i}{24}}} & 0 & 0\\0 & \frac{4}{3} + \frac{23}{18 \left(- \frac{1}{2} + \frac{\sqrt{3} i}{2}\right) \sqrt[3]{\frac{53}{216} + \frac{\sqrt{1167} i}{24}}} + \left(- \frac{1}{2} + \frac{\sqrt{3} i}{2}\right) \sqrt[3]{\frac{53}{216} + \frac{\sqrt{1167} i}{24}} & 0\\0 & 0 & \frac{4}{3} + \frac{23}{18 \sqrt[3]{\frac{53}{216} + \frac{\sqrt{1167} i}{24}}} + \sqrt[3]{\frac{53}{216} + \frac{\sqrt{1167} i}{24}}\end{matrix}\right]\)

Yuck!

We can visualize the previous example as a rotation of the axes (a change of basis) to a new coordinate system where the quadratic form is just a sum of squares.

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-2,2,100)
y = np.linspace(-2,2,100)
X,Y = np.meshgrid(x,y)
Z = 3*X**2+2*X*Y+3*Y**2
plt.contour(X,Y,Z,levels=[1,2,3,4,5,6,7,8,9,10])
plt.xlabel('x1')
plt.ylabel('x2')
plt.axis('equal')
# draw the y1 and y2 axis directions (the columns of P) in the x-coordinate plane
P = np.array([[1/np.sqrt(2),-1/np.sqrt(2)],[1/np.sqrt(2),1/np.sqrt(2)]])
v1 = P @ np.array([1,0])   # y1-axis direction, the eigenvector for lambda_1 = 4
v2 = P @ np.array([0,1])   # y2-axis direction, the eigenvector for lambda_2 = 2
plt.quiver(0,0,v1[0],v1[1],angles='xy',scale_units='xy',scale=1,color='r')
plt.quiver(0,0,v2[0],v2[1],angles='xy',scale_units='xy',scale=1,color='r')
# label the two principal-axis directions
plt.text(v1[0],v1[1],'y1 axis',fontsize=12)
plt.text(v2[0],v2[1],'y2 axis',fontsize=12)
plt.show()

Now make the same plot but in y coordinates:

x = np.linspace(-2,2,100)
y = np.linspace(-2,2,100)
X,Y = np.meshgrid(x,y)
Z = 4*X**2+2*Y**2
plt.contour(X,Y,Z,levels=[1,2,3,4,5,6,7,8,9,10])
plt.xlabel('y1')
plt.ylabel('y2')
plt.axis('equal')
plt.quiver(0,0,1,0,angles='xy',scale_units='xy',scale=1,color='r')
plt.quiver(0,0,0,1,angles='xy',scale_units='xy',scale=1,color='r')
# in y-coordinates the principal axes are simply the coordinate axes
plt.text(1,0,'y1 axis',fontsize=12)
plt.text(0,1,'y2 axis',fontsize=12)
plt.show()

In general, we can think of the diagonalization as a rotation of the axes followed by a scaling of the axes.

We often visualize this by plotting the effects of the transformations on the unit circle.

Skills

  • State the spectral theorem in the form \(A=Q\Lambda Q^T\)
  • Write a quadratic form as \(\mathbf{x}^T B \mathbf{x}\) with symmetric \(B\)
  • Diagonalize a quadratic form to a sum of squares
  • Interpret the diagonalization geometrically as a rotation of axes

Coupled Springs

Example: Coupled Springs (Physics)

Two identical masses \(m\) connected by springs: wall—\(k\)—mass 1—\(k_c\)—mass 2—\(k\)—wall. Let \(x_1\), \(x_2\) be displacements from equilibrium.

Figure 1: Coupled springs: two masses, three springs.

. . .

We can define \(\mathbf{x}=\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\) to be the vector of displacements.

Let’s take a moment to think intuitively about what will happen here if we pull one of the masses.

. . .

It’s complicated, right? Moving one mass stretches or compresses the coupling spring, so it changes the force on the other mass as well as on itself.

. . .

But we can use diagonalization to help us find a new set of coordinates that will make the problem simpler.

Coupled Springs via Newton’s Law

We begin with Newton’s second law for the two masses. Each spring provides a force proportional to how far it’s stretched, and acceleration is proportional to force.

The middle spring stretches by \((x_2-x_1)\), so the force on mass 1 from the middle spring is \(k_c(x_2-x_1)\).

. . .

\[ m \ddot{x}_1 = -k x_1 + k_c (x_2-x_1) = -(k+k_c)x_1 + k_c x_2, \]

. . .

\[ m \ddot{x}_2 = -k x_2 + k_c (x_1-x_2) = -(k+k_c)x_2 + k_c x_1. \]

. . .

We see the relationship between the two masses in the fact that \(x_1\) and \(x_2\) appear in each equation.

Matrix Form

Remember, we defined \(\mathbf{x}=\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\) to be the vector of displacements.

. . .

Then if we define \(K = \begin{bmatrix} k+k_c & -k_c \\ -k_c & k+k_c \end{bmatrix}\),

the system can be written compactly as

\[ m \ddot{\mathbf{x}} = -K \mathbf{x} \]

. . .

Because \(K\) is not diagonal, each acceleration depends on both \(x_1\) and \(x_2\). This is what makes the system coupled.

Diagonalizing the System

Suppose we diagonalize \(K\):

\[ K = P D P^T, \]

where \(P\) is orthogonal with columns that are the normalized eigenvectors of \(K\), and

\[ D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}. \]

. . .

As it happens, for our system, we get

\[ P = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \]

with eigenvalues \(\lambda_1=k+2k_c\) and \(\lambda_2=k\) (the first column of \(P\) is the eigenvector for \(\lambda_1\), the second for \(\lambda_2\)).
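We can confirm this diagonalization symbolically (a sketch with sympy; k and k_c are the spring constants treated as positive symbols):

import sympy as sp
k, kc = sp.symbols('k k_c', positive=True)
K = sp.Matrix([[k + kc, -kc], [-kc, k + kc]])
P = sp.Matrix([[1, 1], [-1, 1]])/sp.sqrt(2)
(P.T * K * P).applyfunc(sp.simplify)   # diagonal matrix with entries k + 2*k_c and k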

Using the diagonalized form to define new coordinates

We want to use this to make the behavior of the system more understandable.

. . .

Remember, Newton’s Law was

\[ m \ddot{\mathbf{x}} = -K \mathbf{x} \]

. . .

Define new coordinates

\[ \mathbf{y} = P^T \mathbf{x} \]

. . .

Each coordinate \(y_i\) is the projection of the positions \(\mathbf{x}\) onto the eigenvector \(\mathbf{v}_i\). . . .

\[ \mathbf{y} = P^T \mathbf{x} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

We can multiply to see what these new coordinates correspond to. \(y_1\) is the dot product of the first column of \(P\) with \(\mathbf{x}\):

\[ y_1 = \frac{1}{\sqrt{2}}(x_1-x_2) \]

This represents the relative displacement between the masses.

. . .

\(y_2\) is the dot product of the second column of \(P\) with \(\mathbf{x}\):

\[ y_2 = \frac{1}{\sqrt{2}}(x_1+x_2) \]

This represents the center-of-mass position.

. . .

The eigenvectors describe special patterns of motion (called normal modes), and the evolution of the variables \(y_1, y_2\) measures how strongly the system is moving in each of those patterns.

Let’s express the laws of motion in these new coordinates.

\[ \mathbf{y} = P^T \mathbf{x}. \]

. . .

Differentiating twice gives

\[ \ddot{\mathbf{y}} = P^T \ddot{\mathbf{x}}. \]

. . .

Multiplying by \(m\) and applying Newton’s law, \(m \ddot{\mathbf{x}} = -K \mathbf{x}\), gives:

\[ m \ddot{\mathbf{y}} = m P^T \ddot{\mathbf{x}} = - P^T K \mathbf{x}. \]

. . .

Using \(K = P D P^T\):

\[ m \ddot{\mathbf{y}} = - P^T (P D P^T) \mathbf{x} = - D P^T \mathbf{x} = - D \mathbf{y}. \]


Decoupled Equations

\[ m \ddot{\mathbf{y}} = - D \mathbf{y}. \]

. . .

Because \(D\) is diagonal, this represents two separate equations – each coordinate only interacts with itself!

. . .

\[ m \ddot{y}_1 + \lambda_1 y_1 = 0, \]

. . .

\[ m \ddot{y}_2 + \lambda_2 y_2 = 0. \]

. . .

These are two independent harmonic oscillators!

Motion in the Center-of-Mass Mode

\[ m \ddot{y}_2 + \lambda_2 y_2 = 0; \]

\[ y_2 = \frac{1}{\sqrt{2}}(x_1+x_2) \]

The center of mass of the system, which is proportional to \(y_2\), will oscillate back and forth with frequency \(\sqrt{\lambda_2/m} = \sqrt{k/m}\). This is one of the patterns of motion, defined by one of the eigenvectors of \(K\).

Motion in the Relative Displacement Mode

\[ m \ddot{y}_1 + \lambda_1 y_1 = 0; \]

\[ y_1 = \frac{1}{\sqrt{2}}(x_1-x_2) \]

At the same time, the distance between the two masses, which is proportional to \(y_1\), will stretch and compress with frequency \(\sqrt{\lambda_1/m}=\sqrt{(k+2k_c)/m}\).

Interpretation

Diagonalizing \(K\) finds patterns of motion which evolve independently.

The eigenvectors describe the shapes of motion in which the system naturally prefers to oscillate, and the eigenvalues (divided by the mass) give the squared frequencies of these motions:

\[ \omega_i = \sqrt{\frac{\lambda_i}{m}}. \]

Thus linear algebra reveals the natural motions of the physical system.
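As a concrete illustration (a sketch; the values m = 1, k = 1, k_c = 0.5 and the initial condition are illustrative assumptions), we can build the motion mode by mode and map back to the physical coordinates with \(\mathbf{x}=P\mathbf{y}\):

import numpy as np
import matplotlib.pyplot as plt
m, k, kc = 1.0, 1.0, 0.5                 # assumed illustrative values
lam = np.array([k + 2*kc, k])            # modal eigenvalues lambda_1, lambda_2
omega = np.sqrt(lam/m)                   # natural frequencies of the two modes
P = np.array([[1, 1], [-1, 1]])/np.sqrt(2)
t = np.linspace(0, 20, 500)
y0 = P.T @ np.array([1.0, 0.0])          # pull mass 1 only, zero initial velocity
y = np.array([y0[0]*np.cos(omega[0]*t),  # each mode oscillates independently
              y0[1]*np.cos(omega[1]*t)])
x = P @ y                                # physical displacements x1(t), x2(t)
plt.plot(t, x[0], label='x1')
plt.plot(t, x[1], label='x2')
plt.xlabel('t'); plt.legend(); plt.show()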

The SVD

Singular values, the SVD theorem, and geometric interpretation for general \(m\times n\) matrices.

Singular Values

We’ve talked a lot about eigenvalues and eigenvectors, but these only make sense for square matrices. What can we do for a general \(m \times n\) matrix \(A\)?

. . .

It turns out we can learn a lot from the matrix \(A^{T} A\) (or \(A A^{T}\)), which is always square and symmetric.

. . .

The singular values \(\sigma_{1}, \ldots, \sigma_{r}\) of an \(m \times n\) matrix \(A\) are the positive square roots, \(\sigma_{i}=\sqrt{\lambda_{i}}>0\), of the nonzero eigenvalues of the associated “Gram matrix” \(K=A^{T} A\).

. . .

The corresponding eigenvectors of \(K\) are known as the singular vectors of \(A\).

. . .

All of the eigenvalues of \(K\) are real and nonnegative – but some may be zero.

. . .

If \(K=A^{T} A\) has repeated eigenvalues, the singular values of \(A\) are repeated with the same multiplicities.

. . .

The number \(r\) of singular values is equal to the rank of the matrices \(A\) and \(K\).

Example

Let \(A=\left(\begin{array}{ll}3 & 5 \\ 4 & 0\end{array}\right)\).

\[ K=A^{T} A=\left(\begin{array}{ll} 3 & 4 \\ 5 & 0 \end{array}\right)\left(\begin{array}{ll} 3 & 5 \\ 4 & 0 \end{array}\right)=\left(\begin{array}{ll} 25 & 15 \\ 15 & 25 \end{array}\right) \]

. . .

This has eigenvalues \(\lambda_{1}=40, \lambda_{2}=10\), and corresponding eigenvectors \(\mathbf{v}_{1}=\binom{1}{1}, \mathbf{v}_{2}=\binom{1}{-1}\).

. . .

Therefore, the singular values of \(A\) are \(\sigma_{1}=\sqrt{40}=2 \sqrt{10}, \sigma_{2}=\sqrt{10}\).
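A quick numerical check (a sketch with numpy):

import numpy as np
A = np.array([[3.0, 5.0], [4.0, 0.0]])
np.linalg.svd(A, compute_uv=False)   # array([6.3245..., 3.1622...]) = (2*sqrt(10), sqrt(10))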

Singular Values of a Symmetric Matrix

If \(A=A^{T}\) is symmetric with eigenvalues \(\lambda_{i}\), then \(K=A^{T} A=A^{2}\) has eigenvalues \(\lambda_{i}^{2}\) (with the same eigenvectors), so the singular values of \(A\) are the absolute values \(\left|\lambda_{i}\right|\) of its nonzero eigenvalues.

Singular Value Decomposition

Let \(A\) be an \(m \times n\) real matrix. Then there exist an \(m \times m\) orthogonal matrix \(U\), an \(n \times n\) orthogonal matrix \(V\), and an \(m \times n\) diagonal matrix \(\Sigma\) with diagonal entries \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{p} \geq 0\), with \(p=\min \{m, n\}\), such that \(U^{T} A V=\Sigma\). Moreover, the numbers \(\sigma_{1}, \sigma_{2}, \ldots, \sigma_{p}\) are uniquely determined by \(A\).

Proof:

Geometric interpretation of the SVD

(following closely this blog post)

Goal: to understand the SVD as finding perpendicular axes that remain perpendicular after a transformation.

Take a very simple matrix:

\[ A=\left(\begin{array}{cc} 6 & 2 \\ -7 & 6 \end{array}\right) \]

. . .

Represents a linear map \(T: \mathbb{R}^{2} \rightarrow \mathbb{R}^{2}\) with respect to the standard basis \(e_{1}=(1,0)\) and \(e_{2}=(0,1)\).

. . .

Sends the usual basis elements \(e_{1} \rightsquigarrow(6,-7)\) and \(e_{2} \rightsquigarrow(2,6)\).

We can see what T does to any vector:

\[ \begin{aligned} T(2,3) & =T(2,0)+T(0,3)=T\left(2 e_{1}\right)+T\left(3 e_{2}\right) \\ & =2 T\left(e_{1}\right)+3 T\left(e_{2}\right)=2(6,-7)+3(2,6) \\ & =(18,4) \end{aligned} \]
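A one-line numpy check of this computation:

import numpy as np
A = np.array([[6, 2], [-7, 6]])
print(A @ np.array([2, 3]))   # [18  4]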

The usual basis elements are no longer perpendicular after the transformation:

Given a linear map \(T: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) (which is represented by an \(n \times m\) matrix \(A\) with respect to the standard bases),

  • can we find an orthonormal basis \(\{\mathbf{v}_{1}, \mathbf{v}_{2}, \ldots, \mathbf{v}_{m}\}\) of \(\mathbb{R}^{m}\) such that
  • \(\{T(\mathbf{v}_{1}), T(\mathbf{v}_{2}), \ldots, T(\mathbf{v}_{m})\}\) are still mutually perpendicular to each other?

This is what the SVD does!

Let the SVD of \(A\) be \(A=U \Sigma V^{T}\)

. . .

How are \(U\) and \(V\) related?
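We can start to see the relationship numerically (a sketch with numpy: since \(A=U \Sigma V^{T}\), each column \(\mathbf{v}_i\) of \(V\) is sent to \(\sigma_i\) times the corresponding column of \(U\), and the images stay perpendicular):

import numpy as np
A = np.array([[6.0, 2.0], [-7.0, 6.0]])
U, S, Vt = np.linalg.svd(A)           # A = U @ np.diag(S) @ Vt
v1, v2 = Vt[0], Vt[1]                 # rows of Vt = columns of V: orthonormal input directions
print(A @ v1, S[0]*U[:, 0])           # A v1 equals sigma_1 u_1
print(A @ v2, S[1]*U[:, 1])           # A v2 equals sigma_2 u_2
print(np.dot(A @ v1, A @ v2))         # ~0: the images are still perpendicular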

Skills

  • Define singular values from \(A^T A\) and state the SVD theorem
  • Compute singular values for a given matrix
  • Explain the SVD geometrically: orthogonal input axes map to orthogonal output axes