FCLA Similarity and Diagonalization

Section SD Similarity and Diagonalization

This section's topic will perhaps seem out of place at first, but we will make the connection soon with eigenvalues and eigenvectors. This is also our first look at one of the central ideas of Chapter R.

Subsection SM Similar Matrices

The notion of matrices being “similar” is a lot like saying two matrices are row-equivalent. Two similar matrices are not equal, but they share many important properties. This section, and later sections in Chapter R will be devoted in part to discovering just what these common properties are.

First, the main definition for this section.

Definition SIM. Similar Matrices.

Suppose \(A\) and \(B\) are two square matrices of size \(n\text{.}\) Then \(A\) and \(B\) are similar if there exists a nonsingular matrix of size \(n\text{,}\) \(S\text{,}\) such that \(\similar{A}{S}=B\text{.}\)

Equivalently, we can require that \(AS=SB\text{.}\)

We will say “\(A\) is similar to \(B\) via \(S\)” when we want to emphasize the role of \(S\) in the relationship between \(A\) and \(B\text{.}\) Also, it does not matter if we say \(A\) is similar to \(B\text{,}\) or \(B\) is similar to \(A\text{.}\) If one statement is true then so is the other, as can be seen by using \(\inverse{S}\) in place of \(S\) (see Theorem SER for the careful proof). Finally, we will refer to \(\similar{A}{S}\) as a similarity transformation when we want to emphasize the way \(S\) changes \(A\text{.}\) OK, enough about language, let us build a few examples.

Example SMS5. Similar matrices of size 5.

If you wondered if there are examples of similar matrices, then it will not be hard to convince you they exist. Define

\begin{align*} A=\begin{bmatrix} -4 & 1 & -3 & -2 & 2 \\ 1 & 2 & -1 & 3 & -2 \\ -4 & 1 & 3 & 2 & 2 \\ -3 & 4 & -2 & -1 & -3 \\ 3 & 1 & -1 & 1 & -4 \end{bmatrix}&& S=\begin{bmatrix} 1 & 2 & -1 & 1 & 1 \\ 0 & 1 & -1 & -2 & -1 \\ 1 & 3 & -1 & 1 & 1 \\ -2 & -3 & 3 & 1 & -2 \\ 1 & 3 & -1 & 2 & 1\\ \end{bmatrix}\text{.} \end{align*}

Check that \(S\) is nonsingular and then compute

\begin{align*} &B=\similar{A}{S}\\ &= \begin{bmatrix} 10 & 1 & 0 & 2 & -5 \\ -1 & 0 & 1 & 0 & 0 \\ 3 & 0 & 2 & 1 & -3 \\ 0 & 0 & -1 & 0 & 1 \\ -4 & -1 & 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} -4 & 1 & -3 & -2 & 2 \\ 1 & 2 & -1 & 3 & -2 \\ -4 & 1 & 3 & 2 & 2 \\ -3 & 4 & -2 & -1 & -3 \\ 3 & 1 & -1 & 1 & -4 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 & 1 & 1 \\ 0 & 1 & -1 & -2 & -1 \\ 1 & 3 & -1 & 1 & 1 \\ -2 & -3 & 3 & 1 & -2 \\ 1 & 3 & -1 & 2 & 1 \end{bmatrix}\\ &= \begin{bmatrix} -10 & -27 & -29 & -80 & -25 \\ -2 & 6 & 6 & 10 & -2 \\ -3 & 11 & -9 & -14 & -9 \\ -1 & -13 & 0 & -10 & -1 \\ 11 & 35 & 6 & 49 & 19 \end{bmatrix}\text{.} \end{align*}

So by this construction, we know that \(A\) and \(B\) are similar.

Let us do that again.

Example SMS3. Similar matrices of size 3.

Define

\begin{align*} A=\begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}&& S=\begin{bmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{bmatrix}\text{.} \end{align*}

Check that \(S\) is nonsingular and then compute

\begin{align*} B&=\similar{A}{S}\\ &= \begin{bmatrix} -6 & -4 & -1 \\ -3 & -2 & -1 \\ 5 & 3 & 1 \end{bmatrix} \begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix} \begin{bmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{bmatrix}\\ &= \begin{bmatrix} -1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix}\text{.} \end{align*}

So by this construction, we know that \(A\) and \(B\) are similar. But before we move on, look at how pleasing the form of \(B\) is. Not convinced? Then consider that several computations related to \(B\) are especially easy. For example, the eigenvalues are transparently \(\lambda=3,\,-1\) since the eigenvectors of \(B\) are just the standard unit vectors (Definition SUV).

Subsection PSM Properties of Similar Matrices

Similar matrices share many properties and it is these theorems that justify the choice of the word “similar.” First we will show that similarity is an equivalence relation. Equivalence relations are important in the study of various algebras and can always be regarded as a kind of weak version of equality. Sort of alike, but not quite equal. The notion of two matrices being row-equivalent is an example of an equivalence relation we have been working with since the beginning of the course (see Exercise RREF.T11). Row-equivalent matrices are not equal, but they are a lot alike. For example, row-equivalent matrices have the same rank. Formally, an equivalence relation requires three conditions hold: reflexive, symmetric and transitive. We will illustrate these as we prove that similarity is an equivalence relation.

Theorem SER. Similarity is an Equivalence Relation.

Suppose \(A\text{,}\) \(B\) and \(C\) are square matrices of size \(n\text{.}\) Then we have the following three properties, by name.

Reflexive: \(A\) is similar to \(A\text{.}\)
Symmetric: If \(A\) is similar to \(B\text{,}\) then \(B\) is similar to \(A\text{.}\)
Transitive: If \(A\) is similar to \(B\) and \(B\) is similar to \(C\text{,}\) then \(A\) is similar to \(C\text{.}\)

Proof.

To see that \(A\) is similar to \(A\text{,}\) we need only demonstrate a nonsingular matrix that effects a similarity transformation of \(A\) to \(A\text{.}\) \(I_n\) is nonsingular (since it row-reduces to the identity matrix, Theorem NMRRI). Then

\begin{equation*} AI_n = A = I_nA \end{equation*}

and we see that \(A\) is similar to \(A\) via the nonsingular matrix \(I_n\text{.}\)

If we assume that \(A\) is similar to \(B\text{,}\) then we know there is a nonsingular matrix \(S\) so that \(AS=SB\) by Definition SIM. By Theorem MIMI, \(\inverse{S}\) is invertible, and by Theorem NI is therefore nonsingular. Then

\begin{equation*} B\inverse{S} = \left(\inverse{S}S\right)B\inverse{S} = \inverse{S}\left(SB\right)\inverse{S} = \inverse{S}\left(AS\right)\inverse{S} = \inverse{S}A\left(S\inverse{S}\right) = \inverse{S}A \end{equation*}

and we see that \(B\) is similar to \(A\) via the nonsingular matrix \(\inverse{S}\text{.}\)

If we assume that \(A\) is similar to \(B\text{,}\) and \(B\) is similar to \(C\text{,}\) then we know there are two nonsingular matrices, \(S\) and \(R\text{,}\) such that \(AS=SB\) and \(BR=RC\text{,}\) by Definition SIM. (Notice how we cannot presume \(S\) and \(R\) are the same matrix!) Since \(S\) and \(R\) are invertible, so too \(SR\) is invertible by Theorem SS and then nonsingular by Theorem NI. Then

\begin{equation*} A\left(SR\right) = \left(AS\right)R = \left(SB\right)R = S\left(BR\right) = S\left(RC\right) = \left(SR\right)C \end{equation*}

so \(A\) is similar to \(C\) via the nonsingular matrix \(SR\text{.}\)

When \(A\) and \(B\) are similar via \(S\) you can think of reversing the order of the matrix product \(AS\) at a small cost—the matrix \(A\) must “change into” \(B\) in the new matrix product \(SB\text{.}\) That is wildly informal, but you might employ this mental picture as you study the next theorem.

Theorem PSMS. Polynomials of Similar Matrices are Similar.

Suppose \(A\) and \(B\) are similar matrices and \(p(x)\) is a polynomial. Then \(p(A)\) and \(p(B)\) are similar matrices. More precisely, if \(AS=SB\) then \(p(A)S=Sp(B)\text{.}\)

Proof.

Suppose \(AS=SB\text{.}\) Then

\begin{equation*} A^kS = A^{k-1}AS = A^{k-1}SB = A^{k-2}ASB = A^{k-2}SBB = A^{k-2}SB^2 = \cdots = SB^k\text{.} \end{equation*}

(Note how there is a simple proof by induction, Proof Technique I, disguised by the ellipsis, see Exercise SD.T10.) So like powers of \(A\) and \(B\) are similar.

For concreteness, define \(p(x)=a_0+a_1x+a_2x^2+a_3x^3+\cdots+a_nx^n\text{.}\) Then

\begin{align*} p(A)S &=\left(a_0+a_1A+a_2A^2+a_3A^3+\cdots+a_nA^n\right)S\\ &=a_0S+a_1AS+a_2A^2S+a_3A^3S+\cdots+a_nA^nS&&\knowl{./knowl/theorem-MMDAA.html}{\text{Theorem MMDAA}}\\ &=a_0S+a_1SB+a_2SB^2+a_3SB^3+\cdots+a_nSB^n&&\text{Similarity of powers}\\ &=Sa_0+Sa_1B+Sa_2B^2+Sa_3B^3+\cdots+Sa_nB^n&&\knowl{./knowl/theorem-MMSMM.html}{\text{Theorem MMSMM}}\\ &=S\left(a_0+a_1B+a_2B^2+a_3B^3+\cdots+a_nB^n\right)&&\knowl{./knowl/theorem-MMDAA.html}{\text{Theorem MMDAA}}\\ &=Sp(B)&&\knowl{./knowl/theorem-MMDAA.html}{\text{Theorem MMDAA}}\text{.} \end{align*}

Notice that we never exercised our assumption that \(S\) was nonsingular. However a consideration of sizes will imply that \(S\) cannot be entirely arbitrary, and must be square, with of the same size as that common to \(A\) and \(B\text{.}\)

Here is another theorem that tells us exactly what sorts of properties similar matrices share.

Theorem SMEE. Similar Matrices have Equal Eigenvalues.

Suppose \(A\) and \(B\) are similar matrices. Then \(A\) and \(B\) have the same eigenvalues, with identical algebraic and geometric multiplicities.

Proof.

Suppose that \(A\) and \(B\) are similar via \(S\text{,}\) so \(AS=SB\text{,}\) and suppose these matrices are all square of size \(n\text{.}\) We can get all three parts of the conclusion easily from the following claim. For any \(k\text{,}\)

\begin{equation*} \dimension{\nsp{\left(B-\lambda I_n\right)^k}} \leq \dimension{\nsp{\left(A-\lambda I_n\right)^k}}\text{.} \end{equation*}

Suppose that \(C=\set{\vectorlist{x}{\ell}}\) is a basis for \(\nsp{\left(B-\lambda I_n\right)^k}\text{.}\) Consider the vectors \(S\vect{x}_i\) and apply Theorem PSMS with \(p(x)=(x-\lambda)^k\)

\begin{equation*} \left(A-\lambda I_n\right)^kS\vect{x}_i = S\left(B-\lambda I_n\right)^k\vect{x}_i = S\zerovector = \zerovector \end{equation*}

which qualifies \(S\vect{x}_i\) for membership in \(\nsp{\left(A-\lambda I_n\right)^k}\text{.}\) So the set \(D=\set{S\vect{x}_1,\,S\vect{x}_2,\,S\vect{x}_3,\ldots,\,S\vect{x}_\ell}\) is contained in \(\nsp{\left(A-\lambda I_n\right)^k}\text{.}\) Furthermore, it is an exercise to show that because \(S\) is nonsingular, the linear independence of \(C\) implies the linear independence of \(D\) (see Exercise MM.T80). Thus, by Theorem G we have the claimed inequality for the dimensions.

We are acting as if \(\lambda\) is an eigenvalue (of something) but the claim never assumed this, but rather only assumed \(\lambda\) is a scalar. Suppose now that \(\lambda\) is an eigenvector of \(B\text{.}\) Then \(\dimension{\nsp{B-\lambda I_n}}\geq 1\) and thus by the claim with \(k=1\) we have \(\dimension{\nsp{A-\lambda I_n}}\geq 1\text{.}\) By Theorem ESM, we see that \(\lambda\) is an eigenvalue of \(A\text{.}\) By entirely the same logic we can show that if \(\lambda\) is an eigenvalue of \(A\text{,}\) then \(\lambda\) is an eigenvalue of \(B\text{.}\) So the eigenvalues of \(A\) and the eigenvalues of \(B\) are equal as sets (Definition SE).

Employing the claim again with \(k=1\) we have

\begin{equation*} \geomult{B}{\lambda} = \dimension{\nsp{B-\lambda I_n}} \leq \dimension{\nsp{A-\lambda I_n}} = \geomult{A}{\lambda}\text{.} \end{equation*}

Reversing the roles of the matrices will reverse the inequality, and together the inequalities give the desired equality of the geometric multiplicities.

Employing the claim yet again with \(k=n\text{,}\) and employing Theorem GENS, we have

\begin{equation*} \algmult{B}{\lambda} = \dimension{\nsp{\left(B-\lambda I_n\right)^n}} \leq \dimension{\nsp{\left(A-\lambda I_n\right)^n}} = \algmult{A}{\lambda}\text{.} \end{equation*}

Reversing the roles of the matrices will reverse the inequality, and together the inequalities give the desired equality of the algebraic multiplicities.

Be very careful with this theorem. It is tempting to think the converse is true, and argue that if two matrices have the same eigenvalues, then they are similar. Not so, as the following example illustrates. So do not think this theorem is a route to establishing that two matrices are similar.

Example EENS. Equal eigenvalues, not similar.

Define

\begin{align*} A&=\begin{bmatrix}1&1\\0&1\end{bmatrix} & B&=\begin{bmatrix}1&0\\0&1\end{bmatrix}\text{.} \end{align*}

Since \(A\) and \(B\) are both upper triangular, Theorem ETM describes their eigenvalues and we see that their eigenvalues are equal.

If the converse of Theorem SMEE were true, then \(A\) and \(B\) would be similar. Suppose this is the case. More precisely, suppose there is a nonsingular matrix \(S\) so that \(A=\similar{B}{S}\text{.}\)

Then

\begin{equation*} A=\similar{B}{S}=\similar{I_2}{S}=\inverse{S}S=I_2\text{.} \end{equation*}

Clearly \(A\neq I_2\) and this contradiction tells us that two matrices can have the same eigenvalues but not be similar. (Note that the geometric multiplicity of the eigenvalue of \(\lambda=1\) is different for the two matrices. So strictly speaking this example does not prove that the converse of Theorem SMEE is false. We are simply cautioning against a frequent temptation.)

Sage SM. Similar Matrices.

It is quite easy to determine if two matrices are similar, using the matrix method .is_similar(). However, computationally this can be a very difficult proposition, so support in Sage is incomplete now, though it will always return a result for matrices with rational entries. Here are examples where the two matrices are, and are not, similar. Notice that the keyword option transformation=True will cause a pair to be returned, such that if the matrices are indeed similar, the matrix effecting the similarity transformation will be in the second slot of the pair.

Since we knew in advance these two matrices are similar, we requested the transformation matrix, so the output is a pair. The similarity matrix is a bit of a mess, so we will use three Sage routines to clean up trans. We convert the entries to numerical approximations, clip very small values (less than \(10^{-5}\)) to zero and then round to three decimal places. You can experiment printing just trans all by itself.

The matrix C is not similar to A (and hence not similar to B by Theorem SER), so we illustrate the return value when we do not request the similarity matrix (since it does not even exist).

Subsection D Diagonalization

Good things happen when a matrix is similar to a diagonal matrix. For example, the eigenvalues of the matrix are the entries on the diagonal of the diagonal matrix. And it can be a much simpler matter to compute high powers of the matrix. Diagonalizable matrices are also of interest in more abstract settings. Here are the relevant definitions, then our main theorem for this section.

Definition DIM. Diagonal Matrix.

Suppose that \(A\) is a square matrix. Then \(A\) is a diagonal matrix if \(\matrixentry{A}{ij}=0\) whenever \(i\neq j\text{.}\)

Definition DZM. Diagonalizable Matrix.

Suppose \(A\) is a square matrix. Then \(A\) is diagonalizable if \(A\) is similar to a diagonal matrix.

Example DAB. Diagonalization of Archetype B.

Archetype B has a \(3\times 3\) coefficient matrix

\begin{equation*} B=\begin{bmatrix} -7&-6&-12\\ 5&5&7\\ 1&0&4 \end{bmatrix} \end{equation*}

and is similar to a diagonal matrix, as can be seen by the following computation with the nonsingular matrix \(S\text{,}\)

\begin{align*} \similar{B}{S}&= \inverse{\begin{bmatrix}-5&-3&-2\\3&2&1\\1&1&1\end{bmatrix}}\begin{bmatrix} -7&-6&-12\\ 5&5&7\\ 1&0&4 \end{bmatrix} \begin{bmatrix}-5&-3&-2\\3&2&1\\1&1&1\end{bmatrix}\\ &=\begin{bmatrix}-1&-1&-1\\2&3&1\\-1&-2&1\end{bmatrix} \begin{bmatrix} -7&-6&-12\\ 5&5&7\\ 1&0&4 \end{bmatrix} \begin{bmatrix}-5&-3&-2\\3&2&1\\1&1&1\end{bmatrix}\\ &= \begin{bmatrix}-1&0&0\\0&1&0\\0&0&2\end{bmatrix}\text{.} \end{align*}

Example SMS3 provides yet another example of a matrix that is subjected to a similarity transformation and the result is a diagonal matrix. Alright, just how would we find the magic matrix \(S\) that can be used in a similarity transformation to produce a diagonal matrix? Before you read the statement of the next theorem, you might study the eigenvalues and eigenvectors of Archetype B and compute the eigenvalues and eigenvectors of the matrix in Example SMS3.

Theorem DC. Diagonalization Characterization.

Suppose \(A\) is a square matrix of size \(n\text{.}\) Then \(A\) is diagonalizable if and only if there exists a linearly independent set \(S\) that contains \(n\) eigenvectors of \(A\text{.}\)

Proof.

(⇐)

Let \(S=\set{\vectorlist{x}{n}}\) be a linearly independent set of eigenvectors of \(A\) for the eigenvalues \(\scalarlist{\lambda}{n}\text{.}\) Recall Definition SUV and define

\begin{align*} R&=\matrixcolumns{x}{n} & D&= \begin{bmatrix} \lambda_1 & 0 & 0 &\cdots & 0\\ 0 &\lambda_2 & 0 &\cdots & 0\\ 0 & 0 &\lambda_3 &\cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 &\cdots & \lambda_n \end{bmatrix} =[\lambda_1\vect{e}_1|\lambda_2\vect{e}_2|\lambda_3\vect{e}_3|\ldots|\lambda_n\vect{e}_n]\text{.} \end{align*}

The columns of \(R\) are the vectors of the linearly independent set \(S\) and so by Theorem NMLIC the matrix \(R\) is nonsingular. We have

\begin{align*} AR &=A\matrixcolumns{x}{n}\\ &=[A\vect{x}_1|A\vect{x}_2|A\vect{x}_3|\ldots|A\vect{x}_n]&& \knowl{./knowl/definition-MM.html}{\text{Definition MM}}\\ &=[\lambda_1\vect{x}_1|\lambda_2\vect{x}_2|\lambda_3\vect{x}_3|\ldots|\lambda_n\vect{x}_n]&& \knowl{./knowl/definition-EEM.html}{\text{Definition EEM}}\\ &=[\lambda_1R\vect{e}_1|\lambda_2R\vect{e}_2|\lambda_3R\vect{e}_3|\ldots|\lambda_nR\vect{e}_n]&& \knowl{./knowl/definition-MVP.html}{\text{Definition MVP}}\\ &=[R(\lambda_1\vect{e}_1)|R(\lambda_2\vect{e}_2)|R(\lambda_3\vect{e}_3)|\ldots|R(\lambda_n\vect{e}_n)]&& \knowl{./knowl/theorem-MMSMM.html}{\text{Theorem MMSMM}}\\ &=R[\lambda_1\vect{e}_1|\lambda_2\vect{e}_2|\lambda_3\vect{e}_3|\ldots|\lambda_n\vect{e}_n]&& \knowl{./knowl/definition-MM.html}{\text{Definition MM}}\\ &=RD\text{.} \end{align*}

This says that \(A\) is similar to the diagonal matrix \(D\) via the nonsingular matrix \(R\text{.}\) Thus \(A\) is diagonalizable (Definition DZM).

(⇒)

Suppose that \(A\) is diagonalizable, so there is a nonsingular matrix \(T\) of size \(n\) and a diagonal matrix \(E\) (recall Definition SUV)

\begin{align*} T&=\matrixcolumns{y}{n} & E&=\begin{bmatrix} d_1 & 0 & 0 &\cdots & 0\\ 0 &d_2 & 0 &\cdots & 0\\ 0 & 0 &d_3 &\cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 &\cdots & d_n \end{bmatrix} =[d_1\vect{e}_1|d_2\vect{e}_2|d_3\vect{e}_3|\ldots|d_n\vect{e}_n]&&\text{} \end{align*}

such that \(AT=TE\text{.}\)

Then consider,

\begin{align*} [A\vect{y}_1|A\vect{y}_2|A\vect{y}_3&|\ldots|A\vect{y}_n]\\ &=A\matrixcolumns{y}{n}&& \knowl{./knowl/definition-MM.html}{\text{Definition MM}}\\ &=AT\\ &=TE&&\knowl{./knowl/definition-SIM.html}{\text{Definition SIM}}\\ &=T[d_1\vect{e}_1|d_2\vect{e}_2|d_3\vect{e}_3|\ldots|d_n\vect{e}_n]\\ &=[T(d_1\vect{e}_1)|T(d_2\vect{e}_2)|T(d_3\vect{e}_3)|\ldots|T(d_n\vect{e}_n)]&& \knowl{./knowl/definition-MM.html}{\text{Definition MM}}\\ &=[d_1T\vect{e}_1|d_2T\vect{e}_2|d_3T\vect{e}_3|\ldots|d_nT\vect{e}_n]&& \knowl{./knowl/theorem-MMSMM.html}{\text{Theorem MMSMM}}\\ &=[d_1\vect{y}_1|d_2\vect{y}_2|d_3\vect{y}_3|\ldots|d_n\vect{y}_n]&& \knowl{./knowl/definition-MVP.html}{\text{Definition MVP}}\text{.} \end{align*}

This equality of matrices (Definition ME) allows us to conclude that the individual columns are equal vectors (Definition CVE). That is, \(A\vect{y}_i=d_i\vect{y}_i\) for \(1\leq i\leq n\text{.}\) In other words, \(\vect{y}_i\) is an eigenvector of \(A\) for the eigenvalue \(d_i\text{,}\) \(1\leq i\leq n\text{.}\) (Why does \(\vect{y}_i\neq\zerovector\text{?}\)). Because \(T\) is nonsingular, the set containing \(T\)'s columns, \(S=\set{\vectorlist{y}{n}}\text{,}\) is a linearly independent set (Theorem NMLIC). So the set \(S\) has all the required properties.

Notice that the proof of Theorem DC is constructive. To diagonalize a matrix, we need only locate \(n\) linearly independent eigenvectors. Then we can construct a nonsingular matrix using the eigenvectors as columns (\(R\)) so that \(\inverse{R}AR\) is a diagonal matrix (\(D\)). The entries on the diagonal of \(D\) will be the eigenvalues of the eigenvectors used to create \(R\text{,}\) in the same order as the eigenvectors appear in \(R\text{.}\) We illustrate this by diagonalizing some matrices.

Example DMS3. Diagonalizing a matrix of size 3.

Consider the matrix

\begin{equation*} F= \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \end{equation*}

of Example CPMS3, Example EMS3 and Example ESMS3. \(F\)'s eigenvalues and eigenspaces are

\begin{align*} \lambda&=3&\eigenspace{F}{3}&=\spn{\set{\colvector{-\frac{1}{2}\\\frac{1}{2}\\1}}}\\ \lambda&=-1&\eigenspace{F}{-1}&=\spn{\set{\colvector{-\frac{2}{3}\\1\\0},\,\colvector{-\frac{1}{3}\\0\\1}}}\text{.} \end{align*}

Define the matrix \(S\) to be the \(3\times 3\) matrix whose columns are the three basis vectors in the eigenspaces for \(F\text{,}\)

\begin{equation*} S= \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}\text{.} \end{equation*}

Check that \(S\) is nonsingular (row-reduces to the identity matrix, Theorem NMRRI, or has a nonzero determinant, Theorem SMZD). Then the three columns of \(S\) are a linearly independent set (Theorem NMLIC). By Theorem DC we now know that \(F\) is diagonalizable. Furthermore, the construction in the proof of Theorem DC tells us that if we apply the matrix \(S\) to \(F\) in a similarity transformation, the result will be a diagonal matrix with the eigenvalues of \(F\) on the diagonal. The eigenvalues appear on the diagonal of the matrix in the same order as the eigenvectors appear in \(S\text{.}\) So,

\begin{align*} \similar{F}{S}&= \inverse{ \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix} } \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 6 & 4 & 2\\ -3 & -1 & -1\\ -6 & -4 & -1 \end{bmatrix} \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 3 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1 \end{bmatrix}\text{.} \end{align*}

Note that the above computations can be viewed two ways. The proof of Theorem DC tells us that the four matrices (\(F\text{,}\) \(S\text{,}\) \(\inverse{F}\) and the diagonal matrix) will interact the way we have written the equation. Or as an example, we can actually perform the computations to verify what the theorem predicts.

The dimension of an eigenspace can be no larger than the algebraic multiplicity of the eigenvalue by Theorem ME. When every eigenvalue's eigenspace is this large, then we can diagonalize the matrix, and only then. Three examples we have seen so far in this section, Example SMS5, Example DAB and Example DMS3, illustrate the diagonalization of a matrix, with varying degrees of detail about just how the diagonalization is achieved. However, in each case, you can verify that the geometric and algebraic multiplicities are equal for every eigenvalue. This is the substance of the next theorem.

Example SEE, Example CASE, Example ESMS3, [cross-reference to target(s) "example-ESMS4" missing or not unique], [cross-reference to target(s) "example-DEMS5" missing or not unique], Archetype B, Archetype F, Archetype K and Archetype L are all examples of matrices that are diagonalizable and that illustrate Theorem DMFE. While we have provided many examples of matrices that are diagonalizable, especially among the archetypes, there are many matrices that are not diagonalizable. Here is one now.

Example NDMS4. A non-diagonalizable matrix of size 4.

In [cross-reference to target(s) "example-EMMS4" missing or not unique] the matrix

\begin{equation*} B= \begin{bmatrix} -2 & 1 & -2 & -4\\ 12 & 1 & 4 & 9\\ 6 & 5 & -2 & -4\\ 3 & -4 & 5 & 10 \end{bmatrix} \end{equation*}

was determined to have characteristic polynomial

\begin{equation*} \charpoly{B}{x}=(x-1)(x-2)^3 \end{equation*}

and an eigenspace for \(\lambda=2\) of

\begin{equation*} \eigenspace{B}{2}=\spn{\set{\colvector{-\frac{1}{2}\\1\\-\frac{1}{2}\\1}}}\text{.} \end{equation*}

So the geometric multiplicity of \(\lambda=2\) is \(\geomult{B}{2}=1\text{,}\) while the algebraic multiplicity is \(\algmult{B}{2}=3\text{.}\) By Theorem DMFE, the matrix \(B\) is not diagonalizable.

Archetype A is the lone archetype with a square matrix that is not diagonalizable, as the algebraic and geometric multiplicities of the eigenvalue \(\lambda=0\) differ. [cross-reference to target(s) "example-HMEM5" missing or not unique] is another example of a matrix that cannot be diagonalized due to the difference between the geometric and algebraic multiplicities of \(\lambda=2\text{,}\) as is [cross-reference to target(s) "example-CEMS6" missing or not unique] which has two complex eigenvalues, each with differing multiplicities. Likewise, [cross-reference to target(s) "example-EMMS4" missing or not unique] has an eigenvalue with different algebraic and geometric multiplicities and so cannot be diagonalized.

Sage MD. Matrix Diagonalization.

The third way to get eigenvectors is the matrix method .eigenmatrix_right() (and the analogous .eigenmatrix_left()). It always returns two square matrices of the same size as the original matrix. The first matrix of the output is a diagonal matrix with the eigenvalues of the matrix filling the diagonal entries of the matrix. The second matrix has eigenvectors in the columns, in the same order as the corresponding eigenvalues. For a single eigenvalue, these columns/eigenvectors form a linearly independent set.

A careful reading of the previous paragraph suggests the question: what if we do not have enough eigenvectors to fill the columns of the second square matrix? When the geometric multiplicity does not equal the algebraic multiplicity, the deficit is met by inserting zero columns in the matrix of eigenvectors. Conversely, when the matrix is diagonalizable, by Theorem DMFE the geometric and algebraic multiplicities of each eigenvalue are equal, and the union of the bases of the eigenspaces provides a complete set of linearly independent vectors. So for a matrix \(A\text{,}\) Sage will output two matrices, \(D\) and \(S\) such that \(\inverse{S}AS=D\text{.}\)

We can rewrite the relation above as \(AS=SD\text{.}\) In the case of a non-diagonalizable matrix, the matrix of eigenvectors is singular (it has zero columns), but the relationship \(AS=SD\) still holds. Here are examples of the two scenarios, along with demonstrations of the matrix method is_diagonalizable().

Now for a matrix that is far from diagonalizable.

Theorem DED. Distinct Eigenvalues implies Diagonalizable.

Suppose \(A\) is a square matrix of size \(n\) with \(n\) distinct eigenvalues. Then \(A\) is diagonalizable.

Proof.

If we collect a single eigenvector of \(A\) for each eigenvalue, then we will have a set of \(n\) vectors that Theorem EDELI guarantees is a linearly independent set. Then Theorem DC implies that \(A\) is diagonalizable.

Example DEHD. Distinct eigenvalues, hence diagonalizable.

In [cross-reference to target(s) "example-DEMS5" missing or not unique] the matrix

\begin{equation*} H= \begin{bmatrix} 15 & 18 & -8 & 6 & -5\\ 5 & 3 & 1 & -1 & -3\\ 0 & -4 & 5 & -4 & -2\\ -43 & -46 & 17 & -14 & 15\\ 26 & 30 & -12 & 8 & -10 \end{bmatrix} \end{equation*}

has characteristic polynomial

\begin{equation*} \charpoly{H}{x}=x(x-2)(x-1)(x+1)(x+3) \end{equation*}

and so is a \(5\times 5\) matrix with 5 distinct eigenvalues.

By Theorem DED we know \(H\) must be diagonalizable. But just for practice, we exhibit a diagonalization. The matrix \(S\) contains eigenvectors of \(H\) as columns, one from each eigenspace, guaranteeing linear independent columns and thus the nonsingularity of \(S\text{.}\) Notice that we are using the versions of the eigenvectors from [cross-reference to target(s) "example-DEMS5" missing or not unique] that have integer entries. The diagonal matrix has the eigenvalues of \(H\) in the same order that their respective eigenvectors appear as the columns of \(S\text{.}\) With these matrices, verify computationally that \(\similar{H}{S}=D\text{.}\)

\begin{align*} S&= \begin{bmatrix} 2 & 1 & -1 & 1 & 1\\ -1 & 0 & 2 & 0 & -1\\ -2 & 0 & 2 & -1 & -2\\ -4 & -1 & 0 & -2 & -1\\ 2 & 2 & 1 & 2 & 1 \end{bmatrix} &D&= \begin{bmatrix} -3 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}\text{.} \end{align*}

Note that there are many different ways to diagonalize \(H\text{.}\) We could replace eigenvectors by nonzero scalar multiples, or we could rearrange the order of the eigenvectors as the columns of \(S\) (which would subsequently reorder the eigenvalues along the diagonal of \(D\)).

Archetype B is another example of a matrix that has as many distinct eigenvalues as its size, and is hence diagonalizable by Theorem DED.

Powers of a diagonal matrix are easy to compute, and when a matrix is diagonalizable, it is almost as easy. We could state a theorem here perhaps, but we will settle instead for an example that makes the point just as well.

Example HPDM. High power of a diagonalizable matrix.

Suppose that

\begin{equation*} A=\begin{bmatrix} 19 & 0 & 6 & 13 \\ -33 & -1 & -9 & -21 \\ 21 & -4 & 12 & 21 \\ -36 & 2 & -14 & -28 \end{bmatrix} \end{equation*}

and we wish to compute \(A^{20}\text{.}\) Normally this would require 19 matrix multiplications, but since \(A\) is diagonalizable, we can simplify the computations substantially.

First, we diagonalize \(A\text{.}\) With

\begin{equation*} S=\begin{bmatrix} 1 & -1 & 2 & -1 \\ -2 & 3 & -3 & 3 \\ 1 & 1 & 3 & 3 \\ -2 & 1 & -4 & 0 \end{bmatrix} \end{equation*}

we find

\begin{align*} D&=\similar{A}{S}\\ &= \begin{bmatrix} -6 & 1 & -3 & -6 \\ 0 & 2 & -2 & -3 \\ 3 & 0 & 1 & 2 \\ -1 & -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 19 & 0 & 6 & 13 \\ -33 & -1 & -9 & -21 \\ 21 & -4 & 12 & 21 \\ -36 & 2 & -14 & -28 \end{bmatrix} \begin{bmatrix} 1 & -1 & 2 & -1 \\ -2 & 3 & -3 & 3 \\ 1 & 1 & 3 & 3 \\ -2 & 1 & -4 & 0 \end{bmatrix}\\ &= \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\text{.} \end{align*}

Now we find an alternate expression for \(A^{20}\text{,}\)

\begin{align*} A^{20} &=AAA\ldots A\\ &=I_nAI_nAI_nAI_n\ldots I_nAI_n\\ &=\left(S\inverse{S}\right)A\left(S\inverse{S}\right)A\left(S\inverse{S}\right)A\left(S\inverse{S}\right)\ldots \left(S\inverse{S}\right)A\left(S\inverse{S}\right)\\ &=S\left(\inverse{S}AS\right)\left(\inverse{S}AS\right)\left(\inverse{S}AS\right)\ldots \left(\inverse{S}AS\right)\inverse{S}\\ &=SDDD\ldots D\inverse{S}\\ &=SD^{20}\inverse{S}\\ \end{align*}

and since \(D\) is a diagonal matrix, powers are much easier to compute,

\begin{align*} &= S \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{20} \inverse{S}\\ &= S \begin{bmatrix} (-1)^{20} & 0 & 0 & 0 \\ 0 & (0)^{20} & 0 & 0 \\ 0 & 0 & (2)^{20} & 0 \\ 0 & 0 & 0 & (1)^{20} \end{bmatrix} \inverse{S}\\ &= \begin{bmatrix} 1 & -1 & 2 & -1 \\ -2 & 3 & -3 & 3 \\ 1 & 1 & 3 & 3 \\ -2 & 1 & -4 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1048576 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} -6 & 1 & -3 & -6 \\ 0 & 2 & -2 & -3 \\ 3 & 0 & 1 & 2 \\ -1 & -1 & 1 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 6291451 & 2 & 2097148 & 4194297 \\ -9437175 & -5 & -3145719 & -6291441 \\ 9437175 & -2 & 3145728 & 6291453 \\ -12582900 & -2 & -4194298 & -8388596 \end{bmatrix}\text{.} \end{align*}

Notice how we effectively replaced the twentieth power of \(A\) by the twentieth power of \(D\text{,}\) and how a high power of a diagonal matrix is just a collection of powers of scalars on the diagonal. The price we pay for this simplification is the need to diagonalize the matrix (by computing eigenvalues and eigenvectors) and finding the inverse of the matrix of eigenvectors. And we still need to do two matrix products. But the higher the power, the greater the savings.

Subsection UT Upper Triangular

When a matrix is diagonalizable the proof of Theorem DC shows that the diagonal entries are the eigenvalues of the matrix. We cannot always diagonalize a matrix, but our next theorem says that we can always find a similarity transformation which creates an upper triangular matrix that has the eigenvalues on the diagonal. You could construe many advanced topics in linear algebra as being how to limit the number of nonzero entries above the diagonal.

Theorem SUT. Similar to Upper Triangular.

Suppose that \(A\) is a square matrix. Then \(A\) is similar to an upper triangular matrix, where each diagonal entry is an eigenvalue of \(A\text{.}\)

Proof.

Suppose that \(A\) has size \(n\text{.}\) We will use induction (Proof Technique I) to establish a basis for \(\complex{n}\) that will form the columns of the matrix used for the similarity transformation. So the upper triangular matrix will not appear until the very end of the proof.

Our induction hypothesis is that every \(A\)-invariant subspace of dimension \(m\) has a basis \(B = \set{\vectorlist{x}{m}}\) such that \(A\vect{x}_k = \vect{x} + \lambda_k\vect{x}_k\text{,}\) \(1\leq k\leq m\text{,}\) where

\(\lambda_k\) is an eigenvalue of \(A\)
\(\displaystyle \vect{x}\in\spn{\set{\vectorlist{x}{k-1}}}\)

Induction proceeds on \(m\text{,}\) the dimension of the \(A\)-invariant subspace.

Base Case.

Let \(U\) be an \(A\)-invariant subspace of dimension \(m=1\text{.}\) Then there is a nonzero vector \(\vect{x}_1\in U\) such that \(B=\set{\vect{x}_1}\) is a basis of \(U\text{.}\) Because \(U\) is \(A\)-invariant, \(A\vect{x}_1\in U = \spn{\set{\vect{x}_1}}\) and we can write

\begin{equation*} A\vect{x}_1 = a\vect{x}_1 = \zerovector + a\vect{x}_1\text{.} \end{equation*}

Thus \(a\) is an eigenvalue of \(A\) and because the span of an empty set is the trivial subspace containing only the zero vector, setting \(\vect{x}=\zerovector\) shows that \(B\) is the required basis.

Induction Step.

Suppose that \(U\) is an \(A\)-invariant subspace of dimension \(m\text{.}\) With slight modifications to Theorem EMHE we can assert the existence of an eigenvector of \(A\) that is in \(U\text{.}\) So let \(\vect{u}_1\in U\) be a vector such that \(A\vect{u}_1=\lambda\vect{u}_1\text{.}\) Now, define a subspace

\begin{equation*} V=\setparts{\left(A-\lambda I\right)\vect{u}}{\vect{u}\in U} \end{equation*}

and determine several properties of \(V\text{.}\)

First, a typical element of \(V\) is

\begin{equation*} \left(A-\lambda I\right)\vect{u} = A\vect{u}-\lambda\vect{u} \end{equation*}

and both terms of this difference are elements of \(U\text{,}\) so \(V\) is a subspace of \(U\text{.}\)

Second, we show that \(V\) is \(A\)-invariant. Grab \(\vect{v}\in V\text{.}\) Then

\begin{equation*} A\vect{v} = \left(A-\lambda I\right)\vect{v} + \lambda\vect{v}\text{.} \end{equation*}

The first term of this sum qualifies as an element of \(V\text{,}\) as does the second, so the sum is in \(V\) and thus \(V\) is \(A\)-invariant.

Third, we consider the dimension of \(V\text{.}\) Extend the linearly independent set \(\set{\vect{u}_1}\) with repeated applications of Theorem ELIS until the set has \(m\) vectors and is a basis of \(U\text{,}\) \(C=\set{\vect{u}_1, \vect{u}_2, \dots, \vect{u}_m}\text{.}\) Then the set

\begin{equation*} D = \set{\left(A-\lambda I\right)\vect{u}_1, \left(A-\lambda I\right)\vect{u}_2, \dots, \left(A-\lambda I\right)\vect{u}_m} \end{equation*}

will span \(V\text{.}\) But the first element of this set is the zero vector, so \(V\) is spanned by a set of \(m-1\) vectors, and by Theorem SSLD the dimension of \(V\) is at most \(m-1\text{.}\)

So \(V\) is an \(A\)-invariant subspace of \(U\) with dimension less than \(m\text{,}\) and we can apply our induction hypothesis. Let \(\vectorlist{v}{\ell}\) be the basis of \(V\) guaranteed by the induction hypothesis. Again, employing Theorem ELIS, extend this basis of \(V\) to a basis of \(U\) with the addition of vectors \(\vectorlist{w}{m-\ell}\) from \(U\text{.}\) We have

\begin{equation*} A\vect{w}_k = \left(A-\lambda I\right)\vect{w}_k + \lambda\vect{w}_k\text{,} \end{equation*}

with \(\left(A-\lambda I\right)\vect{w}_k\in V\text{,}\) where

\begin{equation*} V = \spn{\set{\vectorlist{v}{\ell}}}\subseteq\spn{\set{\vectorlist{v}{\ell}, \vectorlist{w}{k-1}}}\text{.} \end{equation*}

So the basis \(B = \set{\vectorlist{v}{\ell}, \vectorlist{w}{m-\ell}}\) of \(U\) satisfies the conclusion of the induction hypothesis.

Trivially, \(\complex{n}\) is an \(A\)-invariant subspace of \(A\text{.}\) Let \(S\) be the nonsingular matrix whose columns are the vectors \(\vectorlist{x}{n}\) guaranteed by our induction proof. The product \(AS\) will have columns \(A\vect{x}_k\text{,}\) which are then each a linear combination of \(\vectorlist{x}{k-1}\) plus a multiple of \(\vect{x}_k\) by an eigenvalue of \(A\text{.}\) This means \(AS = ST\) where \(T\) is upper triangular and the diagonal entries of \(T\) are eigenvalues of \(A\text{.}\)

Example TBM. Two block matrices.

So sometimes we can diagonalize a matrix and sometimes we cannot. Generalized eigenspaces can get us to a similarity transformation that produces a block diagonal matrix. In the proof of the upcoming Theorem GEB we will see that individual bases for generalized eigenspaces collectively provide a basis for all of \(\complex{n}\text{.}\) This is a forward-looking example, so not everything here is explained by the theorems we have so far.

Continue with the matrix from Example TIS,

\begin{equation*} A=\begin{bmatrix} -8 & 6 & -15 & 9 \\ -8 & 14 & -10 & 18 \\ 1 & 1 & 3 & 0 \\ 3 & -8 & 2 & -11 \end{bmatrix}\text{.} \end{equation*}

In Example TGE we found bases for the generalized eigenspaces for the two eigenvalues,

\begin{align*} X = \geneigenspace{A}{-2} &= \spn{\set{\colvector{-3 \\ -1 \\ 1 \\ 0},\,\colvector{0 \\ -1 \\ 0 \\ 1}}} & W = \geneigenspace{A}{1} &= \spn{\set{\colvector{-7 \\ -2 \\ 3 \\ 0},\,\colvector{-1 \\ -2 \\ 0 \\ 1}}}\text{.} \end{align*}

Without much motivation, and no results yet, use these four basis vectors as the columns of a matrix

\begin{equation*} S=\begin{bmatrix} -3 & 0 & -7 & -1 \\ -1 & -1 & -2 & -2 \\ 1 & 0 & 3 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}\text{.} \end{equation*}

In Example TGE we suggested that you check that these four vectors form a basis of \(\complex{4}\text{.}\) So either you already know \(S\) is nonsingular, or you should check that now. We will use \(S\) to effect a similarity transformation of \(A\)

\begin{equation*} \similar{A}{S}= \begin{bmatrix} -1 & -1 & 0 & 0 \\ 1 & -3 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{bmatrix}\text{.} \end{equation*}

Nice! Not diagonal, but all the non-zero elements are confined to \(2\times 2\) submatrices astride the diagonal. This is a direct consequence of having two generalized eigenspaces (two blocks), each of dimension \(2\) (\(2\times 2\) blocks). See Exercise SD.M60 for an extension of this idea and Exercise SD.M61 for a slightly larger example.

Subsection FS Fibonacci Sequences

Example FSCF. Fibonacci sequence, closed form.

The Fibonacci sequence is a sequence of integers defined recursively by

\begin{align*} a_0&=0 & a_1&=1 & a_{n+1}&=a_n+a_{n-1},\quad n\geq 1\text{.} \end{align*}

So the initial portion of the sequence is \(0,\,1,\,1,\,2,\,3,\,5,\,8,\,13,\,21,\,\ldots\text{.}\) In this subsection we will illustrate an application of eigenvalues and diagonalization through the determination of a closed-form expression for an arbitrary term of this sequence.

To begin, verify that for any \(n\geq 1\) the recursive statement above establishes the truth of the statement

\begin{align*} \colvector{a_n\\a_{n+1}} &= \begin{bmatrix}0&1\\1&1\end{bmatrix} \colvector{a_{n-1}\\a_n}\text{.} \end{align*}

Let \(A\) denote this \(2\times 2\) matrix. Through repeated applications of the statement above we have

\begin{align*} \colvector{a_n\\a_{n+1}} & =A\colvector{a_{n-1}\\a_n} =A^2\colvector{a_{n-2}\\a_{n-1}} =A^3\colvector{a_{n-3}\\a_{n-2}} =\cdots =A^n\colvector{a_{0}\\a_{1}}\text{.} \end{align*}

In preparation for working with this high power of \(A\text{,}\) not unlike in Example HPDM, we will diagonalize \(A\text{.}\) The two distinct eigenvalues of \(A\) arise as roots of the polynomial \(x^2-x-1\text{,}\) and are

\begin{align*} \rho&=\frac{1+\sqrt{5}}{2} & \delta&=\frac{1-\sqrt{5}}{2}\text{.} \end{align*}

With two distinct eigenvalues, Theorem DED implies that \(A\) is diagonalizable. It will be easier to compute with these eigenvalues once you confirm the following properties (all but the last can be derived from the fact that \(\rho\) and \(\delta\) are roots of the polynomial, in a factored or unfactored form)

\begin{align*} \rho+\delta&=1 & \rho\delta&=-1 & 1+\rho&=\rho^2 & 1+\delta&=\delta^2 & \rho-\delta&=\sqrt{5}\text{.} \end{align*}

Then eigenvectors of \(A\) (for \(\rho\) and \(\delta\text{,}\) respectively) are

\begin{align*} &\colvector{1\\\rho} & &\colvector{1\\\delta} \end{align*}

which can be easily confirmed, as we demonstrate for the eigenvector for \(\rho\text{,}\)

\begin{align*} \begin{bmatrix}0&1\\1&1\end{bmatrix}\colvector{1\\\rho} & =\colvector{\rho\\1+\rho} =\colvector{\rho\\\rho^2} =\rho\colvector{1\\\rho}\text{.} \end{align*}

From the proof of Theorem DC we know \(A\) can be diagonalized by a matrix \(S\) with these eigenvectors as columns, giving \(D=\inverse{S}AS\text{.}\) We list \(S\text{,}\) \(\inverse{S}\) and the diagonal matrix \(D\text{,}\)

\begin{align*} S&=\begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} & \inverse{S}&=\frac{1}{\rho-\delta}\begin{bmatrix}-\delta&1\\\rho&-1\end{bmatrix} & D&=\begin{bmatrix}\rho&0\\0&\delta\end{bmatrix}\text{.} \end{align*}

OK, we have everything in place now. The main step in the following is to replace \(A\) by \(SD\inverse{S}\text{.}\) Here we go,

\begin{align*} \colvector{a_n\\a_{n+1}} &=A^n\colvector{a_{0}\\a_{1}}\\ &=\left(SD\inverse{S}\right)^n\colvector{a_{0}\\a_{1}}\\ &=SD\inverse{S}SD\inverse{S}SD\inverse{S}\cdots SD\inverse{S}\colvector{a_{0}\\a_{1}}\\ &=SDDD\cdots D\inverse{S}\colvector{a_{0}\\a_{1}}\\ &=SD^n\inverse{S}\colvector{a_{0}\\a_{1}}\\ &= \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \begin{bmatrix}\rho&0\\0&\delta\end{bmatrix}^n \frac{1}{\rho-\delta}\begin{bmatrix}-\delta&1\\\rho&-1\end{bmatrix} \colvector{a_{0}\\a_{1}}\\ &= \frac{1}{\rho-\delta} \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \begin{bmatrix}\rho^n&0\\0&\delta^n\end{bmatrix} \begin{bmatrix}-\delta&1\\\rho&-1\end{bmatrix} \colvector{0\\1}\\ &= \frac{1}{\rho-\delta} \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \begin{bmatrix}\rho^n&0\\0&\delta^n\end{bmatrix} \colvector{1\\-1}\\ &= \frac{1}{\rho-\delta} \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \colvector{\rho^n\\-\delta^n}\\ &= \frac{1}{\rho-\delta} \colvector{\rho^n-\delta^n\\\rho^{n+1}-\delta^{n+1}}\text{.} \end{align*}

Performing the scalar multiplication and equating the first entries of the two vectors, we arrive at the closed form expression

\begin{align*} a_n&=\frac{1}{\rho-\delta}\left(\rho^n-\delta^n\right)\\ &=\frac{1}{\sqrt{5}} \left(\left(\frac{1+\sqrt{5}}{2}\right)^n-\left(\frac{1-\sqrt{5}}{2}\right)^n\right)\\ &=\frac{1}{2^n\sqrt{5}} \left(\left(1+\sqrt{5}\right)^n-\left(1-\sqrt{5}\right)^n\right)\text{.} \end{align*}

Notice that it does not matter whether we use the equality of the first or second entries of the vectors, we will arrive at the same formula, once in terms of \(n\) and again in terms of \(n+1\text{.}\) Also, our definition clearly describes a sequence that will only contain integers, yet the presence of the irrational number \(\sqrt{5}\) might make us suspicious. But no, our expression for \(a^n\) will always yield an integer!

The Fibonacci sequence, and generalizations of it, have been extensively studied (Fibonacci lived in the 12th and 13th centuries). There are many ways to derive the closed-form expression we just found, and our approach may not be the most efficient route. But it is a nice demonstration of how diagonalization can be used to solve a problem outside the field of linear algebra.

We close this section with a comment about an important upcoming theorem that we prove in Chapter R. A consequence of Theorem OD is that every Hermitian matrix (Definition HM) is diagonalizable (Definition DZM), and the similarity transformation that accomplishes the diagonalization uses a unitary matrix (Definition UM). This means that for every Hermitian matrix of size \(n\) there is a basis of \(\complex{n}\) that is composed entirely of eigenvectors for the matrix and also forms an orthonormal set (Definition ONS). Notice that for matrices with only real entries, we only need the hypothesis that the matrix is symmetric (Definition SYM) to reach this conclusion ([cross-reference to target(s) "example-ESMS4" missing or not unique]). Can you imagine a prettier basis for use with a matrix? I cannot.

These results in Section OD explain much of our recurring interest in orthogonality, and make the section a high point in your study of linear algebra. A precise statement of this diagonalization result applies to a slightly broader class of matrices, known as “normal” matrices (Definition NRML), which are matrices that commute with their adjoints. With this expanded category of matrices, the result becomes an equivalence (Proof Technique E). See Theorem OD and Theorem OBNM in Section OD for all the details.

Reading Questions SD Reading Questions

1.

What is an equivalence relation?

2.

State a condition that is equivalent to a matrix being diagonalizable, but is not the definition.

3.

Find a diagonal matrix similar to

\begin{equation*} A=\begin{bmatrix} -5 & 8\\-4 & 7 \end{bmatrix}\text{.} \end{equation*}

Exercises SD Exercises

C20.

Consider the matrix \(A\) below. First, show that \(A\) is diagonalizable by computing the geometric multiplicities of the eigenvalues and quoting the relevant theorem. Second, find a diagonal matrix \(D\) and a nonsingular matrix \(S\) so that \(\similar{A}{S}=D\text{.}\) (See Exercise EE.C21 for some of the necessary computations.)

\begin{equation*} A= \begin{bmatrix} 18 & -15 & 33 & -15\\ -4 & 8 & -6 & 6\\ -9 & 9 & -16 & 9\\ 5 & -6 & 9 & -4 \end{bmatrix} \end{equation*}

Solution

Using a calculator, we find that \(A\) has three distinct eigenvalues, \(\lambda=3,\,2,\,-1\text{,}\) with \(\lambda=2\) having algebraic multiplicity two, \(\algmult{A}{2}=2\text{.}\) The eigenvalues \(\lambda=3,\,-1\) have algebraic multiplicity one, and so by Theorem ME we can conclude that their geometric multiplicities are one as well. Together with the computation of the geometric multiplicity of \(\lambda=2\) from Exercise EE.C21, we know

\begin{align*} \geomult{A}{3}&=\algmult{A}{3}=1& \geomult{A}{2}&=\algmult{A}{2}=2& \geomult{A}{-1}&=\algmult{A}{-1}=1\text{.} \end{align*}

This satisfies the hypotheses of Theorem DMFE, and so we can conclude that \(A\) is diagonalizable.

A calculator will give us four eigenvectors of \(A\text{,}\) the two for \(\lambda=2\) being linearly independent presumably. Or, by hand, we could find basis vectors for the three eigenspaces. For \(\lambda=3,\,-1\) the eigenspaces have dimension one, and so any eigenvector for these eigenvalues will be multiples of the ones we use below. For \(\lambda=2\) there are many different bases for the eigenspace, so your answer could vary. Our eigenvectors are the basis vectors we would have obtained if we had actually constructed a basis in Exercise EE.C21 rather than just computing the dimension.

By the construction in the proof of Theorem DC, the required matrix \(S\) has columns that are four linearly independent eigenvectors of \(A\) and the diagonal matrix has the eigenvalues on the diagonal (in the same order as the eigenvectors in \(S\)). Here are the pieces, “doing” the diagonalization,

\begin{equation*} \inverse{ \begin{bmatrix} -1 & 0 & -3 & 6\\ -2 & -1 & -1 & 0\\ 0 & 0 & 1 & -3\\ 1 & 1 & 0 & 1 \end{bmatrix} } \begin{bmatrix} 18 & -15 & 33 & -15\\ -4 & 8 & -6 & 6\\ -9 & 9 & -16 & 9\\ 5 & -6 & 9 & -4 \end{bmatrix} \begin{bmatrix} -1 & 0 & -3 & 6\\ -2 & -1 & -1 & 0\\ 0 & 0 & 1 & -3\\ 1 & 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & -1 \end{bmatrix}\text{.} \end{equation*}

C21.

Determine if the matrix \(A\) below is diagonalizable. If the matrix is diagonalizable, then find a diagonal matrix \(D\) that is similar to \(A\text{,}\) and provide the invertible matrix \(S\) that performs the similarity transformation. You should use your calculator to find the eigenvalues of the matrix, but try only using the row-reducing function of your calculator to assist with finding eigenvectors.

\begin{equation*} A= \begin{bmatrix} 1 & 9 & 9 & 24 \\ -3 & -27 & -29 & -68 \\ 1 & 11 & 13 & 26 \\ 1 & 7 & 7 & 18 \end{bmatrix} \end{equation*}

Solution

A calculator will provide the eigenvalues \(\lambda=2,\,2,\,1,\,0\text{,}\) so we can reconstruct the characteristic polynomial as

\begin{equation*} \charpoly{A}{x}=(x-2)^2(x-1)x \end{equation*}

so the algebraic multiplicities of the eigenvalues are

\begin{align*} \algmult{A}{2}&=2& \algmult{A}{1}&=1& \algmult{A}{0}&=1\text{.} \end{align*}

Now compute eigenspaces by hand, obtaining null spaces for each of the three eigenvalues by constructing the correct singular matrix (Theorem EMNS),

\begin{align*} A-2I_4&= \begin{bmatrix} -1 & 9 & 9 & 24 \\ -3 & -29 & -29 & -68 \\ 1 & 11 & 11 & 26 \\ 1 & 7 & 7 & 16 \end{bmatrix} \rref \begin{bmatrix} 1 & 0 & 0 & -\frac{3}{2} \\ 0 & 1 & 1 & \frac{5}{2} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{2}&=\nsp{A-2I_4} =\spn{\set{\colvector{\frac{3}{2}\\-\frac{5}{2}\\0\\1},\,\colvector{0\\-1\\1\\0}}} =\spn{\set{\colvector{3\\-5\\0\\2},\,\colvector{0\\-1\\1\\0}}}\\ A-1I_4&= \begin{bmatrix} 0 & 9 & 9 & 24 \\ -3 & -28 & -29 & -68 \\ 1 & 11 & 12 & 26 \\ 1 & 7 & 7 & 17 \end{bmatrix} \rref \begin{bmatrix} 1 & 0 & 0 & -\frac{5}{3} \\ 0 & 1 & 0 & \frac{13}{3} \\ 0 & 0 & 1 & -\frac{5}{3} \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{1}&=\nsp{A-I_4} =\spn{\set{\colvector{\frac{5}{3}\\-\frac{13}{3}\\\frac{5}{3}\\1}}} =\spn{\set{\colvector{5\\-13\\5\\3}}}\\ A-0I_4&= \begin{bmatrix} 1 & 9 & 9 & 24 \\ -3 & -27 & -29 & -68 \\ 1 & 11 & 13 & 26 \\ 1 & 7 & 7 & 18 \end{bmatrix} \rref \begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{0}&=\nsp{A}=\spn{\set{\colvector{3\\-5\\2\\1}}}\text{.} \end{align*}

From this we can compute the dimensions of the eigenspaces to obtain the geometric multiplicities,

\begin{align*} \geomult{A}{2}&=2& \geomult{A}{1}&=1& \geomult{A}{0}&=1\text{.} \end{align*}

For each eigenvalue, the algebraic and geometric multiplicities are equal and so by Theorem DMFE we now know that \(A\) is diagonalizable. The construction in Theorem DC suggests we form a matrix whose columns are eigenvectors of \(A\)

\begin{equation*} S= \begin{bmatrix} 3 & 0 & 5 & 3 \\ -5 & -1 & -13 & -5 \\ 0 & 1 & 5 & 2 \\ 2 & 0 & 3 & 1 \end{bmatrix}\text{.} \end{equation*}

Since \(\detname{S}=-1\neq 0\text{,}\) we know that \(S\) is nonsingular (Theorem SMZD), so the columns of \(S\) are a set of 4 linearly independent eigenvectors of \(A\text{.}\) By the proof of Theorem SMZD we know

\begin{equation*} \similar{A}{S}= \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \end{equation*}

is a diagonal matrix with the eigenvalues of \(A\) along the diagonal, in the same order as the associated eigenvectors appear as columns of \(S\text{.}\)

C22.

Consider the matrix \(A\) below. Find the eigenvalues of \(A\) using a calculator and use these to construct the characteristic polynomial of \(A\text{,}\) \(\charpoly{A}{x}\text{.}\) State the algebraic multiplicity of each eigenvalue. Find all of the eigenspaces for \(A\) by computing expressions for null spaces, only using your calculator to row-reduce matrices. State the geometric multiplicity of each eigenvalue. Is \(A\) diagonalizable? If not, explain why. If so, find a diagonal matrix \(D\) that is similar to \(A\text{.}\)

\begin{equation*} A= \begin{bmatrix} 19 & 25 & 30 & 5 \\ -23 & -30 & -35 & -5 \\ 7 & 9 & 10 & 1 \\ -3 & -4 & -5 & -1 \end{bmatrix} \end{equation*}

Solution

A calculator will report \(\lambda=0\) as an eigenvalue of algebraic multiplicity of 2, and \(\lambda=-1\) as an eigenvalue of algebraic multiplicity 2 as well. Since eigenvalues are roots of the characteristic polynomial (Theorem EMRCP) we have the factored version

\begin{equation*} \charpoly{A}{x}=(x-0)^2(x-(-1))^2=x^2(x^2+2x+1)=x^4+2x^3+x^2\text{.} \end{equation*}

The eigenspaces are then

\begin{align*} \lambda&=0\\ A-(0)I_4&= \begin{bmatrix} 19 & 25 & 30 & 5 \\ -23 & -30 & -35 & -5 \\ 7 & 9 & 10 & 1 \\ -3 & -4 & -5 & -1 \end{bmatrix} \rref \begin{bmatrix} \leading{1} & 0 & -5 & -5 \\ 0 & \leading{1} & 5 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{0}&=\nsp{C-(0)I_4}= \spn{\set{\colvector{5\\-5\\1\\0},\,\colvector{5\\-4\\0\\1}}}\\ \lambda&=-1\\ A-(-1)I_4&= \begin{bmatrix} 20 & 25 & 30 & 5 \\ -23 & -29 & -35 & -5 \\ 7 & 9 & 11 & 1 \\ -3 & -4 & -5 & 0 \end{bmatrix} \rref \begin{bmatrix} \leading{1} & 0 & -1 & 4 \\ 0 & \leading{1} & 2 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{-1}&=\nsp{C-(-1)I_4}= \spn{\set{\colvector{1\\-2\\1\\0},\,\colvector{-4\\3\\0\\1}}}\text{.} \end{align*}

Each eigenspace above is described by a spanning set obtained through an application of Theorem BNS and so is a basis for the eigenspace. In each case the dimension, and therefore the geometric multiplicity, is 2.

For each of the two eigenvalues, the algebraic and geometric multiplicities are equal. Theorem DMFE says that in this situation the matrix is diagonalizable. We know from Theorem DC that when we diagonalize \(A\) the diagonal matrix will have the eigenvalues of \(A\) on the diagonal (in some order). So we can claim that

\begin{equation*} D= \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}\text{.} \end{equation*}

M60.

This exercise will guide you through an even better analysis of the matrix used in Example TIS, Example TGE, and Example TBM.

(a)

Choose almost any vector from \(X = \geneigenspace{A}{-2}\) and call it \(\vect{x}_2\text{,}\) and choose almost any vector from \(W = \geneigenspace{A}{1}\) and call it \(\vect{x}_4\text{.}\) The only restriction is that these vectors should not be traditional eigenvectors. (What might be an easy way to check that?) Yes, the subscripts look odd, stay with us.

(b)

Define and compute

\begin{align*} \vect{x}_1 &= \left(A-(-2)I_4\right)\vect{x}_2 & \vect{x}_3 &= \left(A-1I_4\right)\vect{x}_4\text{.} \end{align*}

(c)

Verify, both symbolically and computationally that \(\vect{x}_1\) and \(\vect{x}_3\) are eigenvectors of \(A\text{.}\) So \(A\vect{x}_1\) and \(A\vect{x}_3\) will just be scalar multiples of \(\vect{x}_1\) and \(\vect{x}_3\) (respectively).

Solution

Symbolically, since \(\vect{x_2}\) is a generalized eigenvector,

\begin{equation*} \left(A-(-2)I_4\right)\vect{x}_1 = \left(A-(-2)I_4\right)\left(A-(-2)I_4\right)\vect{x}_2 = \left(A-(-2)I_4\right)^2\vect{x}_2 = \zerovector \end{equation*}

which by Theorem EMNS shows that \(\vect{x}_1\) is an eigenvector.

(d)

Find a simple expression (symbolically) for \(A\vect{x}_2\) and \(A\vect{x}_4\text{.}\)

Solution

Symbolically, we have,

\begin{equation*} \vect{x}_1 = \left(A-(-2)I_4\right)\vect{x}_2 = A\vect{x}_2 - (-2)\vect{x}_2 \end{equation*}

which can be rearranged as

\begin{equation*} A\vect{x}_2 = \vect{x}_1 + (-2)\vect{x}_2\text{.} \end{equation*}

So \(A\vect{x}_2\) and \(A\vect{x}_4\) are almost, almost, almost, scalar multiples of \(\vect{x}_2\) and \(\vect{x}_4\) (respectively).

(e)

Define \(S\) to be the matrix whose columns are \(\vect{x}_1\text{,}\) \(\vect{x}_2\text{,}\) \(\vect{x}_3\text{,}\) and \(\vect{x}_4\text{.}\) Check that \(S\) is nonsingular and use \(S\) in a similarity transformation of \(A\text{.}\)

No matter what vectors you started with, you should get an upper triangular matrix, with the eigenvalues on the diagonal (see Theorem SUT). Above the diagonal there should be only two nonzero entries, which should both be \(1\text{,}\) and both should be just above the diagonal:

\begin{equation*} \similar{A}{S} = \begin{bmatrix} -2 & 1 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}\text{.} \end{equation*}

(f)

Explain the various points in this exercise when the key features of this simple matrix became apparent. For example, this is very close to being a diagonal matrix, but where did those pesky ones come from? The matrix you created is known as the Jordan canonical form of a matrix.

M61.

Use ideas from the previous exercise (Exercise SD.M60) to find a matrix similar to \(A\) that is very nearly diagonal. Ideally you will have just three non-zero entries in the ten entries above the diagonal. You will need to guess and experiment about how to slightly generalize the previous exercise for a \(4\times 4\) matrix in the case of this \(5\times 5\) matrix.

\begin{equation*} A = \begin{bmatrix} 4 & 0 & 0 & 2 & -2 \\ 8 & 6 & -7 & -6 & 13 \\ 7 & 2 & -2 & -2 & 3 \\ -4 & -2 & 3 & 5 & -9 \\ -1 & 0 & 1 & 1 & -3 \end{bmatrix} \end{equation*}

T10.

Provide the proof by induction (Proof Technique I) that is missing from the very beginning of the proof of Theorem PSMS.

T15.

Suppose that \(A\) and \(B\) are similar matrices of size \(n\text{.}\) Prove that \(A^3\) and \(B^3\) are similar matrices. Generalize.

Solution

By Definition SIM we know that there is a nonsingular matrix \(S\) so that \(AS=SB\text{.}\) Then

\begin{equation*} A^3S = A^2AS = A^2SB = AASB = ASBB = ASB^2 = SBB^2 = SB^3 \end{equation*}

So \(A^3\) is similar to \(B^3\) (via the matrix \(S\)).

More generally, if \(A\) is similar to \(B\text{,}\) and \(m\) is a non-negative integer, then \(A^m\) is similar to \(B^m\text{.}\) This can be proved carefully using induction (Proof Technique I).

T16.

Suppose that \(A\) and \(B\) are similar matrices, with \(A\) nonsingular. Prove that \(B\) is nonsingular, and that \(\inverse{A}\) is similar to \(\inverse{B}\text{.}\)

Solution

There is a nonsingular matrix \(S\) such that \(AS=SB\text{.}\) With our hypothesis that \(A\) is nonsingular, Theorem NPNF says \(AS\) is nonsingular. Then \(SB\) is nonsingular, and the “other half” of Theorem NPNF says \(B\) is nonsingular.

With \(B\) nonsingular, Theorem NI allows us to employ \(\inverse{B}\text{.}\) Use Theorem SS twice to see

\begin{equation*} \inverse{B}\inverse{S} = \inverse{(SB)} = \inverse{(AS)} = \inverse{S}\inverse{A} \end{equation*}

So by Definition SIM, \(\inverse{A}\) is similar to \(\inverse{B}\text{.}\)

T17.

Suppose that \(B\) is a nonsingular matrix. Prove that \(AB\) is similar to \(BA\text{.}\)

Solution

The nonsingular matrix \(B\) will provide the desired similarity transformation,

\begin{equation*} \left(BA\right)B = B\left(AB\right)\text{.} \end{equation*}

Done. That was almost too easy!