FCLA Multiplicity of an Eigenvalue

Section ME Multiplicity of an Eigenvalue

We have defined two types of multiplicities, the geometric and the algebraic, as dimensions of null spaces. We will see in this section that the algebraic multiplicity is the more interesting of the two, and the more important. When you hear someone say simply “the multiplicity of an eigenvalue,” they almost surely intend for multiplicity to mean the algebraic multiplicity.

Subsection AM Algebraic Multiplicity

Theorem SUT said that any square matrix is similar to some matrix that is upper triangular and has eigenvalues on the diagonal. The next theorem says (among other things) that when a matrix is similar to an upper triangular matrix then each eigenvalue must appear on the diagonal. And by Theorem TME each diagonal entry is an eigenvalue.

Theorem UTEC. Upper Triangular Eigenvalue Count.

Suppose that \(A\) is a square matrix of size \(n\) that is similar to an upper triangular matrix \(T\text{.}\) Then the number of times \(\lambda\) occurs on the diagonal of \(T\) is the algebraic multiplicity of \(\lambda\text{,}\) \(\algmult{A}{\lambda}\text{.}\)

Proof.

We will prove a simpler version of the theorem, which will then allow us to obtain the more general result easily. We will show that the number of times zero occurs on the diagonal of \(T\) is equal to the dimension of \(\nsp{A^n}\text{.}\) We proceed by induction on \(n\) (Proof Technique I).

Base Case.

When \(A\) is a \(1\times 1\) matrix, the number zero is on the diagonal zero times or one time. In the first case, the matrix is nonsingular, so \(\nsp{A^1}\) is trivial and its dimension is zero. In the second case, the matrix is the zero matrix and \(\nsp{A^1}=\complex{1}\) with dimension one—equal to the number of zeros on the diagonal of \(A\text{.}\)

Induction Step.

The similarity of \(A\) and \(T\) implies there is a nonsingular matrix \(S\) such that \(AS=ST\text{.}\) Let \(\vectorlist{x}{n}\) be the columns of \(S\text{.}\) In order to apply the induction step, we adjust this matrix equality by removing the last column from the matrix on each side of the equality. Specifically, define \(\widehat{S}\) to be the \(n\times n-1\) matrix whose columns are \(\vectorlist{x}{n-1}\text{.}\) Define \(\widehat{T}\) to be the \(n-1\times n-1\) matrix obtained from \(T\) be removing the last column and the last row. \(T\) is upper triangular so all but the last entry of the last row is zero, so the \(n-1\) columns of \(\widehat{S}\widehat{T}\) equal the first \(n-1\) columns of \(ST\text{,}\) and we can write a key equation,

\begin{equation*} A\widehat{S}=\widehat{S}\widehat{T}\text{.} \end{equation*}

And a key consequence is,

\begin{equation*} A^n\widehat{S} = A^{n-1}A\widehat{S} = A^{n-1}\widehat{S}\widehat{T} = A^{n-2}A\widehat{S}\widehat{T} = A^{n-2}\widehat{S}\widehat{T}\widehat{T} = A^{n-2}\widehat{S}\widehat{T}^2 =\cdots = \widehat{S}\widehat{T}^n\text{.} \end{equation*}

Note that \(\widehat{T}\) is an \(n-1\times n-1\) upper triangular matrix which is similar to itself (Theorem SER). So applying the induction hypothesis, we see that the number of zeros on the diagonal of \(\widehat{T}\) is equal to the dimension of \(\nsp{\widehat{T}^{n-1}}\text{.}\) By Theorem NSPM, and since \(\widehat{T}\) has size \(n-1\) we also have \(\dimension{\nsp{\widehat{T}^{n-1}}} = \dimension{\nsp{\widehat{T}^{n}}}\text{.}\)

Let \(U=\spn{\vectorlist{x}{n-1}}\text{,}\) which is the column space of \(\widehat{S}\) (A-invariant noted?). We claim that

\begin{equation*} \dimension{U\cap\nsp{A^n}} = \dimension{\nsp{\widehat{T}^{n}}}\text{.} \end{equation*}

Let \(\set{\vectorlist{w}{k}}\) be a basis for the subspace \(U\cap\nsp{A^n}\) of \(\complex{n}\text{.}\) Because \(\vect{w}_i\in U\) there is a vector \(\vect{v}_i\in\complex{n-1}\) such that \(\vect{w}_i=\widehat{S}\vect{v}_i\text{.}\) It is an exercise to prove that \(\set{\vectorlist{v}{k}}\) is a linearly independent set. Now,

\begin{equation*} \zerovector = A^n\vect{w}_i = A^n\widehat{S}\vect{v}_i = \widehat{S}\widehat{T}^n\vect{v}_i\text{.} \end{equation*}

Because the columns of \(\widehat{S}\) are linearly inependent, we conclude that \(\widehat{T}^n\vect{v}_i = \zerovector\) and thus \(\vect{v}_i\in\nsp{\widehat{T}^n}\text{.}\) So we have found at least \(k\) linearly independent vectors in \(\nsp{\widehat{T}^n}\) and thus

\begin{equation*} \dimension{U\cap\nsp{A^n}} = k \leq \dimension{\nsp{\widehat{T}^{n}}}\text{.} \end{equation*}

For the opposite inequality, let \(\set{\vectorlist{y}{\ell}}\) be a basis for the subspace \(\nsp{\widehat{T}^{n}}\) of \(\complex{n-1}\text{.}\) By a very similar exercise to the previous one, \(\set{\widehat{S}\vect{y}_1,\,\widehat{S}\vect{y}_2,\,\widehat{S}\vect{y}_3,\dots,\widehat{S}\vect{y}_\ell}\) is a linearly independent set. Note that each vector is a linear combination of the columns of \(\widehat{S}\) and so an element of \(U\text{.}\) Also

\begin{equation*} A^n\left(\widehat{S}\vect{y}_i\right) = A^n\widehat{S}\vect{y}_i = \widehat{S}\widehat{T}^n\vect{y}_i = \widehat{S}\zerovector = \zerovector\text{.} \end{equation*}

So each vector is also in \(\nsp{A^n}\text{.}\) Therefore we have found \(\ell\) linearly independent vectors in the subspace \(U\cap\nsp{A^n}\text{,}\) so

\begin{equation*} \dimension{U\cap\nsp{A^n}} \geq \ell = \dimension{\nsp{\widehat{T}^{n}}}\text{.} \end{equation*}

\begin{align*} \dimension{\nsp{A^n}}&=\dimension{U} + \dimension{\nsp{A^n}} - \dimension{U}\\ &=\dimension{U\cap\nsp{A^n}} + \dimension{U + \nsp{A^n}} - \dimension{U}&&\knowl{./knowl/theorem-SID.html}{\text{Theorem SID}}\\ &=\dimension{\nsp{\widehat{T}^{n}}} + \dimension{U + \nsp{A^n}} - \dimension{U} \end{align*}

So we are close! Let's take stock of where we are. We know \(U\) has dimension \(n-1\text{.}\) We do not know much yet about the subspace \(U + \nsp{A^n}\)—however its dimension is limited to two possibilities: \(n-1\) or \(n\text{.}\) So \(\dimension{U + \nsp{A^n}} - \dimension{U}\) is \(0\) or \(1\text{.}\) We know \(\dimension{\nsp{\widehat{T}^{n}}}\) counts zeros on the diagonal of \(\widehat{T}\text{.}\) There is only the final diagonal entry of \(T\) to consider: the entry is nonzero and the count increases by \(0\text{,}\) or the entry is zero and the count increases by \(1\text{.}\) We finish the proof by showing that these two zero-one scenarios are the same.

First, suppose the final entry of the diagonal of \(T\) is nonzero and equal to \(a\text{.}\) We claim that \(\nsp{A^n}\subseteq U\text{.}\) The final diagonal entry of \(T^n\) is \(a^n\text{,}\) and since \(A^nS = ST^n\text{,}\) there is a \(\vect{u}\in U\) such that

\begin{equation*} A^n\vect{x}_n = \vect{u} + a^n\vect{x}_n\text{.} \end{equation*}

Grab an arbitrary \(\vect{z}\in\nsp{A^n}\) and since the columns of \(S\) are a basis, we can write \(\vect{z}=\widehat{\vect{u}} + b\vect{x}_n\text{,}\) where \(\widehat{\vect{u}}\in U\text{.}\) Now

\begin{equation*} \zerovector = A^n\vect{z} = A^n\widehat{\vect{u}} + bA^n\vect{x}_n = A^n\widehat{\vect{u}} + \vect{u} + ba^n\vect{x}_n\text{.} \end{equation*}

Because \(U\) is an \(A\)-invariant subspace, the first term is a vector in \(U\text{,}\) and so is the second. Additive closure for \(U\text{,}\) plus a rearrangement, makes it clear that \(ba^n\vect{x}_n\in U\text{.}\) This is only possible if this vector is the zero vector, and since we are assuming \(a\neq 0\text{,}\) we conclude that \(b=0\text{.}\) Thus \(\vect{z}=\widehat{\vect{u}}\in U\text{,}\) establishing our claim, and we see that \(U + \nsp{A^n} = U\text{.}\) So,

\begin{align*} \dimension{\nsp{A^n}} &= \dimension{\nsp{\widehat{T}^{n}}} + \dimension{U + \nsp{A^n}} - \dimension{U}\\ &= \dimension{\nsp{\widehat{T}^{n}}} + \dimension{U} - \dimension{U}\\ &= \dimension{\nsp{\widehat{T}^{n}}}\\ &= \text{number of zeros on diagonal of }\widehat{T}\\ &= \text{number of zeros on diagonal of }T\text{.} \end{align*}

Second, suppose the final entry of the diagonal of \(T\) is zero. We claim that \(U + \nsp{A^n} = \complex{n}\text{,}\) which will be true if we can find a vector in \(\nsp{A^n}\) that lies outside of \(U\text{.}\) The last column of the equality \(AS = ST\text{,}\) together with the form of the last column of \(T\) in this case, implies that \(A\vect{x}_n\) is a linear combination of the first \(n-1\) columns of \(S\text{.}\) So there is a vector \(\vect{y}\in\complex{n-1}\) such that \(A\vect{x}_n = \widehat{S}\vect{y}\text{.}\) \(\nullity{\widehat{T}^{n-1}} = \nullity{\widehat{T}^n}\text{,}\) so \(\rank{\widehat{T}^{n-1}} = \rank{\widehat{T}^n}\) by Theorem RPNC. Null spaces of powers of a matrix increase in size, while column spaces of powers decrease in the same manner (see Exercise IS.T31), and since \(\widehat{T}\) is a matrix of size \(n-1\text{,}\) we have \(\csp{\widehat{T}^{n-1}}=\csp{\widehat{T}^n}\text{.}\) \(\widehat{T}^{n-1}\vect{y}\in\csp{\widehat{T}^{n-1}}=\csp{\widehat{T}^n}\text{.}\) So there is a vector \(\vect{w}\in\complex{n-1}\) such that \(\widehat{T}^{n-1}\vect{y} = \widehat{T}^n\vect{w}\text{.}\) The vector we are after is \(\vect{z}=\widehat{S}\vect{w} - \vect{x}_n\text{,}\) as we will now show. We see that \(\widehat{S}\vect{w}\) is in \(U\) and \(\vect{x}_n\) is not in \(U\text{,}\) so it would be a contradiction if \(\vect{z}\) were in \(U\text{,}\) so \(\vect{z}\not\in U\text{.}\) Consider

\begin{align*} A^n\vect{z} &= A^n\widehat{S}\vect{w} - A^n\vect{x}_n = \widehat{S}\widehat{T}^n\vect{w} - A^n\vect{x}_n = \widehat{S}\widehat{T}^{n-1}\vect{y} - A^n\vect{x}_n\\ &= A^{n-1}\widehat{S}\vect{y} - A^n\vect{x}_n = A^{n-1}A\vect{x_n} - A^n\vect{x}_n = A^n\vect{x_n} - A^n\vect{x}_n = \zerovector\text{.} \end{align*}

So there is an element of \(\nsp{A^n}\) outside of \(U\text{,}\) so the dimension of \(U + \nsp{A^n}\) must be strictly greater than the dimension of \(U\text{,}\) which only leaves \(\dimension{U + \nsp{A^n}} = n\) and so \(U + \nsp{A^n} = \complex{n}\text{.}\) We have,

\begin{align*} \dimension{\nsp{A^n}} &= \dimension{\nsp{\widehat{T}^{n}}} + \dimension{U + \nsp{A^n}} - \dimension{U}\\ &= \dimension{\nsp{\widehat{T}^{n}}} + n - (n-1)\\ &= \dimension{\nsp{\widehat{T}^{n}}} + 1\\ &= \text{number of zeros on diagonal of }\widehat{T} + 1\\ &= \text{number of zeros on diagonal of }T\text{.} \end{align*}

So for the two possibilities for \(\dimension{U + \nsp{A^n}}\text{,}\) \(n-1\) and \(n\text{,}\) we have the same conclusion: the number of zeros on the diagonal of \(T\) is equal to the dimension of the null space of \(A^n\text{.}\)

We have been simply counting zeros on the diagonal, without considering whether or not they are even eigenvalues. It is a simple matter to get back to eigenvalues. For an eigenvalue \(\lambda\) of \(A\text{,}\) the upper triangular matrix \(T - \lambda I_n\) is similar to \(A-\lambda I_n\) by Theorem PSMS with the polynomial \(p(x)=x-\lambda\text{.}\) So we can apply the above results to \(A-\lambda I_n\) and \(T-\lambda I_n\text{.}\) The number of times an eigenvalue of \(A\) appears on the diagonal of \(T\) is the number of times zero appears on the diagonal of \(T - \lambda I_n\text{,}\) which is the dimension of the null space of \(\left(A - \lambda I_n\right)^n\text{,}\) the algebraic multiplicity of \(\lambda\) for \(A\text{.}\)

Theorem NEM. Number of Eigenvalues of a Matrix.

Suppose that \(\scalarlist{\lambda}{k}\) are the distinct eigenvalues of a square matrix \(A\) of size \(n\text{.}\) Then

\begin{equation*} \sum_{i=1}^{k}\algmult{A}{\lambda_i}=n\text{.} \end{equation*}

Proof.

By Theorem SUT, \(A\) is similar to an upper triangular matrix \(T\text{.}\) By Theorem SMEE, the eigenvalues of \(A\) and \(T\) are the same. The eigenvalues of \(T\) (and thus \(A\)) are the diagonal entries of \(T\) by Theorem TME, and each distinct eigenvalue is repeated as many times as its algebraic multiplicity for \(A\text{,}\) according to Theorem UTEC. So \(\sum_{i=1}^{k}\algmult{A}{\lambda_i}\) counts the number of entries on the diagonal of \(T\text{,}\) which is \(n\) for an \(n\times n\) matrix.

Theorem NEM is often stated as “The number of eigenvalues of a matrix, including multiplicities, is the size of the matrix.” When we want to ignore the multiplicity of the eigenvalues, we tend to qualify a statement with “distinct.”

Subsection GEB Generalized Eigenvector Basis

Generalized eigenspaces provide a basis of \(\complex{n}\text{,}\) which can be used to provide a similar matrix which is “close” to being diagonal. This is where we think of generalized eigenvectors as being more general than traditional eigenvectors. For any matrix there is always a basis of generalized eigenvectors. We first saw this in Example TGE.

Theorem GEB. Generalized Eigenvector Basis.

Suppose that \(A\) is a square matrix of size \(n\text{.}\) Then there is a basis of \(\complex{n}\) composed entirely of generalized eigenvectors of \(A\text{.}\)

Proof.

The proof has three distinct parts. The first two are generalizations of previous results, and could be split out as theorems of their own.

Generalized eigenspaces are disjoint.

Informally, we claim a generalized eigenvector cannot serve two eigenvalues. More precisely, suppose \(\lambda\) and \(\rho\) are two different eigenvalues of \(A\text{.}\) Then \(\geneigenspace{A}{\lambda}\cap\geneigenspace{A}{\rho} = \set{\zerovector}\text{.}\) The more specific result for traditional eigenvectors is Exercise PEE.T20.

For a proof by contradiction (Proof Technique CD), suppose that

\begin{equation*} \vect{v}\in\geneigenspace{A}{\lambda}\cap\geneigenspace{A}{\rho},\quad\vect{v}\neq\zerovector\text{.} \end{equation*}

Then there is a minimal \(k\) such that \(\left(A - \lambda I_n\right)^k\vect{v} = \zerovector\text{.}\) Since \(\vect{v}\neq\zerovector\text{,}\) \(k\geq 1\text{.}\) Define

\begin{equation*} \vect{w} = \left(A - \lambda I_n\right)^{k-1}\vect{v}\text{.} \end{equation*}

The minimality of \(k\) implies that \(\vect{w}\neq\zerovector\text{.}\)

Now, since

\begin{equation*} \left(A - \lambda I_n\right)\vect{w} = \left(A - \lambda I_n\right)\left(A - \lambda I_n\right)^{k-1}\vect{v} = \left(A - \lambda I_n\right)^k\vect{v} = \zerovector \end{equation*}

we see that \(\vect{w}\) is a traditional eigenvector of \(A\) for \(\lambda\text{.}\)

We claim \(\vect{w}\) is a generalized eigenvector of \(A\) for \(\rho\text{.}\) We establish by induction a more general claim:

\begin{equation*} \vect{w}_i = \left(A - \lambda I_n\right)^i\vect{v}\in\geneigenspace{A}{\rho}, \quad i\geq 0\text{.} \end{equation*}

For the base case, when \(i=0\text{,}\) \(\vect{v}\in\geneigenspace{A}{\rho}\) follows from our choice of \(\vect{v}\text{.}\) More generally,

\begin{align*} \vect{w}_i &= \left(A - \lambda I_n\right)^i\vect{v}\\ &= \left(A - \lambda I_n\right)\left(A - \lambda I_n\right)^{i-1}\vect{v}\\ &= \left(A - \lambda I_n\right)\vect{w}_{i-1}\\ &= A\vect{w}_{i-1} - \lambda\vect{w}_{i-1} \end{align*}

By induction, \(\vect{w}_{i-1}\in\geneigenspace{A}{\rho}\text{,}\) which is an \(A\)-invariant subspace (Theorem GEIS), and so \(A\vect{w}_{i-1}\in\geneigenspace{A}{\rho}\text{.}\) By scalar closure, \(\lambda\vect{w}_{i-1}\in\geneigenspace{A}{\rho}\text{.}\) So additive closure then shows that \(\vect{w}_i\in\geneigenspace{A}{\rho}\text{,}\) as desired.

Since \(\vect{w}\) is a generalized eigenvector of \(A\) for \(\rho\text{,}\)

\begin{align*} \zerovector &= \left(A - \rho I_n\right)^n\vect{w}\\ &= \left(A - \rho I_n\right)^{n-1}\left(A - \rho I_n\right)\vect{w}\\ &= \left(A - \rho I_n\right)^{n-1}\left(A\vect{w} - \rho\vect{w}\right)&&\knowl{./knowl/theorem-MMDAA.html}{\text{Theorem MMDAA}}\\ &= \left(A - \rho I_n\right)^{n-1}\left(\lambda\vect{w} - \rho\vect{w}\right)&&\knowl{./knowl/definition-EEM.html}{\text{Definition EEM}}\\ &= \left(A - \rho I_n\right)^{n-1}\left(\lambda - \rho\right)\vect{w}&&\knowl{./knowl/property-DSA.html}{\text{Property DSA}}\\ &= \left(\lambda - \rho\right)\left(A - \rho I_n\right)^{n-1}\vect{w}&&\knowl{./knowl/theorem-MMSMM.html}{\text{Theorem MMSMM}}\\ &= \left(\lambda - \rho\right)^2\left(A - \rho I_n\right)^{n-2}\vect{w}\\ &= \left(\lambda - \rho\right)^3\left(A - \rho I_n\right)^{n-3}\vect{w}\\ &\quad\quad\vdots\\ &= \left(\lambda - \rho\right)^n\vect{w} \end{align*}

Since \(\lambda\) and \(\rho\) are by hypothesis different, \(\left(\lambda - \rho\right)^n\neq 0\text{,}\) and we argued above that \(\vect{w}\neq\zerovector\text{.}\) So the previous equation and Theorem SMEZV gives the desired contradiction.

Generalized eigenvectors for distinct eigenvalues are linearly independent.

This is a generalization of Theorem EDELI. Suppose that \(S=\set{\vectorlist{x}{p}}\) is a set of generalized eigenvectors of \(A\) with eigenvalues \(\scalarlist{\lambda}{p}\) such that \(\lambda_i\neq\lambda_j\) whenever \(i\neq j\text{.}\) Then we claim that \(S\) is a linearly independent set.

We argue by induction on \(k\) (Proof Technique I) with the induction hypothesis that any set of \(k\) generalized eigenvectors of \(A\text{,}\) \(\set{\vectorlist{x}{k}}\text{,}\) for different eigenvalues \(\scalarlist{\lambda}{k}\text{,}\) is linearly independent.

When \(k=1\text{,}\) the set \(\set{\vect{x}_1}\) is linearly independent since eigenvectors are nonzero (Definition EEM).

For the induction step begin with a relation of linear dependence

\begin{equation*} \lincombo{a}{x}{k}=\zerovector \end{equation*}

and then

\begin{align*} \zerovector &= \left(A - \lambda_k I_n\right)^n\zerovector\\ &= \left(A - \lambda_k I_n\right)^n\left(\lincombo{a}{x}{k}\right)\\ &= \left(A - \lambda_k I_n\right)^n a_1\vect{x}_1 + \cdots + \left(A - \lambda_k I_n\right)^n a_k\vect{x}_k\\ &= a_1\left(A - \lambda_k I_n\right)^n \vect{x}_1 + \cdots + a_k\left(A - \lambda_k I_n\right)^n \vect{x}_k\\ &= a_1\left(A - \lambda_k I_n\right)^n \vect{x}_1 + \cdots + a_{k-1}\left(A - \lambda_k I_n\right)^n \vect{x}_{k-1} + a_k\zerovector\\ &= a_1\left(A - \lambda_k I_n\right)^n \vect{x}_1 + \cdots + a_{k-1}\left(A - \lambda_k I_n\right)^n \vect{x}_{k-1} \end{align*}

Since \(\vect{x}_i\text{,}\) \(1\leq i\leq k-1\text{,}\) is not a generalized eigenvector for \(\lambda_k\text{,}\) we have \(\left(A - \lambda_k I_n\right)^n \vect{x}_i\neq\zerovector\text{.}\) Furthermore, we claim each of these nonzero vectors is a generalized eigenvector for \(\lambda_i\text{.}\) To wit,

\begin{align*} &\left(A - \lambda_i I_n\right)^n \left(A - \lambda_k I_n\right)^n \vect{x}_i\\ &\quad\quad = \left(A - \lambda_k I_n\right)^n \left(A - \lambda_i I_n\right)^n \vect{x}_i&&\knowl{./knowl/theorem-PMC.html}{\text{Theorem PMC}}\\ &\quad\quad = \left(A - \lambda_k I_n\right)^n \zerovector\\ &\quad\quad = \zerovector \end{align*}

and so \(\left(A - \lambda_k I_n\right)^n \vect{x}_i\in\geneigenspace{A}{\lambda_i}\text{.}\) So above we have a relation of linear dependence on the set \(\set{\left(A - \lambda_k I_n\right)^n \vect{x}_1, \ldots, \left(A - \lambda_k I_n\right)^n \vect{x}_{k-1}}\text{,}\) which is a set of \(k-1\) generalized eigenvectors for distinct eigenvalues, and hence by the induction hypothesis is linearly independent. By Definition LI, \(a_1=a_2=\cdots=a_{k-1}=0\text{.}\) The original relation of linear dependence then simplifies to \(a_k\vect{x}_k = \zerovector\text{.}\) Because \(\vect{x}_k\neq\zerovector\text{,}\) Theorem SMEZV implies \(a_k=0\text{.}\) So the only relation of linear dependence on the set of \(k\) generalized eigenvectors is trivial, so by Definition LI, the set is linearly independent.

A basis of generalized eigenvectors.

Let \(\scalarlist{\lambda}{k}\) be a complete list of the distinct eigenvalues of \(A\text{.}\) For each generalized eigenvalue, and for convenience, let \(m_i = \algmult{A}{\lambda_i}\text{.}\) For each generalized eigenspace choose a basis

\begin{equation*} B_i = \set{\vect{x}_{i1},\,\vect{x}_{i2},\,\vect{x}_{i3},\,\ldots,\,\vect{x}_{im_{i}}}, \quad 1\leq i\leq k \end{equation*}

and define our putative basis of \(\complex{n}\) as

\begin{equation*} B = B_1 \cup B_2 \cup B_3 \cup \cdots \cup B_k\text{.} \end{equation*}

Because a vector cannot simultaneously be a generalized eigenvector for two different eigenvalues, and employing Theorem NEM, the number of vectors in \(B\) is given by

\begin{equation*} \card{B} = \sum_{i=1}^k\card{B_i} = \sum_{i=1}^k\algmult{A}{\lambda_i} = n\text{.} \end{equation*}

So we have a set of \(n\) vectors from \(\complex{n}\) and Theorem G tells us \(B\) will be a basis of \(\complex{n}\) if we can show that \(B\) is linearly independent.

So we form a relation of linear dependence on \(B\text{,}\) using doubly-subscripted scalars and generalized eigenvectors,

\begin{align*} \zerovector= &\left(a_{11}\vect{x}_{11}+a_{12}\vect{x}_{12}+a_{13}\vect{x}_{13}+\cdots+a_{1{m_1}}\vect{x}_{1{m_1}}\right)+\\ &\left(a_{21}\vect{x}_{21}+a_{22}\vect{x}_{22}+a_{23}\vect{x}_{23}+\cdots+a_{2{m_2}}\vect{x}_{2{m_2}}\right)+\\ &\left(a_{31}\vect{x}_{31}+a_{32}\vect{x}_{32}+a_{33}\vect{x}_{33}+\cdots+a_{3{m_3}}\vect{x}_{3{m_3}}\right)+\\ &\quad\quad\vdots\\ &\left(a_{k1}\vect{x}_{k1}+a_{k2}\vect{x}_{k2}+a_{k3}\vect{x}_{k3}+\cdots+a_{k{m_k}}\vect{x}_{k{m_k}}\right)\text{.} \end{align*}

Define the vectors \(\vect{y}_i\text{,}\) \(1\leq i\leq k\) by

\begin{align*} \vect{y}_1&=\left(a_{11}\vect{x}_{11}+a_{12}\vect{x}_{12}+a_{13}\vect{x}_{13}+\cdots+a_{1{m_1}}\vect{x}_{1{m_1}}\right)\\ \vect{y}_2&=\left(a_{21}\vect{x}_{21}+a_{22}\vect{x}_{22}+a_{23}\vect{x}_{23}+\cdots+a_{2{m_2}}\vect{x}_{2{m_2}}\right)\\ \vect{y}_3&=\left(a_{31}\vect{x}_{31}+a_{32}\vect{x}_{32}+a_{33}\vect{x}_{33}+\cdots+a_{3{m_3}}\vect{x}_{3{m_3}}\right)\\ &\quad\quad\vdots\\ \vect{y}_k&=\left(a_{k1}\vect{x}_{k1}+a_{k2}\vect{x}_{k2}+a_{k3}\vect{x}_{k3}+\cdots+a_{k{m_k}}\vect{x}_{k{m_k}}\right)\text{.} \end{align*}

Then the relation of linear dependence becomes

\begin{align*} \zerovector&=\vect{y}_1+\vect{y}_2+\vect{y}_3+\cdots+\vect{y}_k\text{.} \end{align*}

Since the generalized eigenspace \(\geneigenspace{A}{\lambda_i}\) is closed under vector addition and scalar multiplication, \(\vect{y}_i\in\geneigenspace{A}{\lambda_i}\text{,}\) \(1\leq i\leq k\text{.}\) Thus, for each \(i\text{,}\) the vector \(\vect{y}_i\) is a generalized eigenvector of \(A\) for \(\lambda_i\text{,}\) or is the zero vector. We proved above that sets of generalized eigenvectors whose eigenvalues are distinct form a linearly independent set. Should any (or some) \(\vect{y}_i\) be nonzero, the previous equation would provide a nontrivial relation of linear dependence on a set of generalized eigenvectors with distinct eigenvalues, contradicting the result in our second step. Thus \(\vect{y}_i=\zerovector\text{,}\) \(1\leq i\leq k\text{.}\)

Each of the \(k\) equations, \(\vect{y}_i=\zerovector\text{,}\) is a relation of linear dependence on the corresponding set \(B_i\text{,}\) a set of basis vectors for the generalized eigenspace \(\eigenspace{A}{\lambda_i}\text{,}\) which is therefore linearly independent. From these relations of linear dependence on linearly independent sets we conclude that the scalars are all zero, more precisely, \(a_{ij}=0\text{,}\) \(1\leq j\leq m_i\text{,}\) \(1\leq i\leq k\text{.}\) This establishes that our original relation of linear dependence on \(B\) has only the trivial relation of linear dependence, and hence \(B\) is a linearly independent set, and a basis of \(\complex{n}\text{.}\)

The values of the geometric multiplicities and the algebraic multiplicities, collectively, determine if a matrix is diagonalizable or not, and can even be viewed as an explanation of why some matrices are diagonalizable and some are not.

Theorem DMFE. Diagonalizable Matrices have Full Eigenspaces.

Suppose \(A\) is a square matrix. Then \(A\) is diagonalizable if and only if \(\geomult{A}{\lambda}=\algmult{A}{\lambda}\) for every eigenvalue \(\lambda\) of \(A\text{.}\)

Proof.

Let \(\scalarlist{\lambda}{t}\) be the distinct eigenvalues of \(A\text{.}\)

(⇒)

If \(A\) is diagonalizable then by Theorem DC there is a set of \(n\) linearly independent eigenvectors of \(A\text{.}\) Let \(n_i\) be the number of eigenvectors from this set that are associated to eigenvalue \(\lambda_i\text{.}\) Each individual group of eigenvectors will be linearly independent, so by Theorem G its size cannot exceed the dimension of the eigenspace they belong to. So \(n_i\leq\geomult{A}{\lambda_i}\text{.}\)

Then, application of Theorem ME and Theorem NEM gives

\begin{equation*} n = n_1+n_2+\cdots+n_t \leq \geomult{A}{\lambda_1}+\geomult{A}{\lambda_2}+\cdots+\geomult{A}{\lambda_t} \leq \algmult{A}{\lambda_1}+\algmult{A}{\lambda_2}+\cdots+\algmult{A}{\lambda_t} = n\text{.} \end{equation*}

With \(n\) at each end of this string of inequalities, each expression must be equal to the others. In particular, the application of Theorem ME implies that \(\geomult{A}{\lambda_i}=\algmult{A}{\lambda_i}\) for \(1\leq i\leq t\text{.}\)

(⇐)

If \(\geomult{A}{\lambda_i}=\algmult{A}{\lambda_i}\text{,}\) then by Theorem EDYES \(\eigenspace{A}{\lambda_i}=\geneigenspace{A}{\lambda_i}\text{,}\) so every generalized eigenvector is a traditional eigenvector. Theorem GEB guarantees a basis of \(\complex{n}\) composed of generalized eigenvectors of \(A\text{.}\) This is a set of \(n\) linearly independent eigenvectors for \(A\text{,}\) so by Theorem DC, \(A\) is diagonalizable.

Subsection CP Characteristic Polynomial

In this “late determinant” version of the text, we have consciously chosen to develop as much of the theory of eigenvalues and eigenvectors as possible with the use of the determinant (Chapter D). An important consequence is that we have defined the algebraic multiplicity of an eigenvalue as the dimension of the generalized eigenspace, and we will now use this to define the characteristic polynomial. In the “early determinant” version, the determinant is used to define the characteristic polynomial and then the algebraic multiplicity is determined by factoring this polynomial.

Definition CP. Characteristic Polynomial.

Suppose that \(A\) is a square matrix of size \(n\) with distinct eigenvalues \(\scalarlist{\lambda}{t}\text{.}\) Then the characteristic polynomial of \(A\) is the polynomial \(\charpoly{A}{x}\) defined by

\begin{equation*} \charpoly{A}{x} = (x-\lambda_1)^{\algmult{A}{\lambda_1}} (x-\lambda_2)^{\algmult{A}{\lambda_2}} (x-\lambda_3)^{\algmult{A}{\lambda_3}} \cdots (x-\lambda_t)^{\algmult{A}{\lambda_t}}\text{.} \end{equation*}

Example CPMS3. Characteristic polynomial of a matrix, size 3.

Consider

\begin{equation*} F= \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix}\text{.} \end{equation*}

Then

\begin{align*} \charpoly{F}{x}&=\detname{F-xI_3}\\ &= \begin{vmatrix} -13-x & -8 & -4\\ 12 & 7-x & 4\\ 24 & 16 & 7-x \end{vmatrix}&& \knowl{./knowl/definition-CP.html}{\text{Definition CP}}\\ &= (-13-x) \begin{vmatrix} 7-x & 4\\ 16 & 7-x \end{vmatrix} +(-8)(-1) \begin{vmatrix} 12 & 4\\ 24 & 7-x \end{vmatrix}&& \knowl{./knowl/definition-DM.html}{\text{Definition DM}}\\ &\quad\quad +(-4) \begin{vmatrix} 12 & 7-x\\ 24 & 16 \end{vmatrix}\\ &=(-13-x)((7-x)(7-x)-4(16))&& \knowl{./knowl/theorem-DMST.html}{\text{Theorem DMST}}\\ &\quad\quad +(-8)(-1)(12(7-x)-4(24))\\ &\quad\quad +(-4)(12(16)-(7-x)(24))\\ &=3+5x+x^2-x^3\\ &=-(x-3)(x+1)^2\text{.} \end{align*}

As a consequence of its definition, the characteristic polynomial has roots that are the eigenvalues of its matrix.

Theorem EMRCP. Eigenvalues of a Matrix are Roots of Characteristic Polynomials.

Suppose \(A\) is a square matrix. Then \(\lambda\) is an eigenvalue of \(A\) if and only if \(\charpoly{A}{\lambda}=0\text{.}\)

Proof.

In Definition CP the charateristic polynomial is defined in a factored form, where every factor provides a solution of the polynomial equation \(\charpoly{A}{x}=0\) that is an eigenvalue of \(A\text{.}\) So every root is an eigenvalue. Furthermore, there is a factor for each eigenvalue of \(A\text{.}\) so every eigenvalue is a root.

Example EMS3. Eigenvalues of a matrix, size 3.

In Example CPMS3 we found the characteristic polynomial of

\begin{equation*} F= \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \end{equation*}

to be \(\charpoly{F}{x}=-(x-3)(x+1)^2\text{.}\) Factored, we can find all of its roots easily, they are \(x=3\) and \(x=-1\text{.}\) By Theorem EMRCP, \(\lambda=3\) and \(\lambda=-1\) are both eigenvalues of \(F\text{,}\) and these are the only eigenvalues of \(F\text{.}\) We have found them all.

Theorem DCP. Degree of the Characteristic Polynomial.

Suppose that \(A\) is a square matrix of size \(n\text{.}\) Then the characteristic polynomial of \(A\text{,}\) \(\charpoly{A}{x}\text{,}\) has degree \(n\text{.}\)

Proof.

From Definition CP, we see that the degree of \(\charpoly{A}{x}\) is \(\sum_{i=1}^{k}\algmult{A}{\lambda_i}\text{.}\) But by Theorem NEM this sum is \(n\text{.}\)

Sometimes we say that a root of a polynomial equation satisfies the equation. The next theorem is often paraphrased to say “A matrix satisfies its own characteristic equation,” where characteristic equation means to set the characteristic polynomial to zero (a construction motivated by Theorem EMRCP).

Theorem CHT. Cayley-Hamilton Theorem.

Suppose that \(A\) is a square matrix with characteristic polynomial \(\charpoly{A}{x}\text{.}\) Then \(\charpoly{A}{A}=\zeromatrix\text{,}\) the zero matrix.

Proof.

Let \(n\) denote the size of \(A\text{.}\) By Theorem UTEC, \(A\) is similar to an upper triangular matrix \(T\text{,}\) where the diagonal entries of \(T\) are the eigenvalues of \(A\) repeated as many times as their algebraic multiplicity. Let \(\scalarlist{\lambda}{n}\) be these diagonal entries of \(T\text{.}\) By Definition CP we have

\begin{equation*} \charpoly{A}{x} = (x-\lambda_1)(x-\lambda_2)(x-\lambda_3) \cdots (x-\lambda_n)\text{.} \end{equation*}

We prove the following claim by induction on \(k\text{.}\) For \(1\leq k\leq n\text{,}\)

\begin{equation*} (T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_k I_n)\vect{e}_i = \zerovector, \quad 1\leq i\leq k\text{.} \end{equation*}

Base Case.

When \(k=1\text{,}\) we need only show consider \((T-\lambda_1 I_n)\vect{e}_1\text{.}\) Owing to the upper triangular nature of \(T\text{,}\) the first column of \(T-\lambda_1 I_n\) is the zero vector and the matrix-vector product with \(\vect{e}_1\) is just this column.

Induction Step.

For \(1\leq i\leq k-1\text{,}\)

\begin{align*} &(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_{k-1} I_n)(T-\lambda_k I_n)\vect{e}_i\\ &\quad=(T-\lambda_k I_n)(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_{k-1} I_n)\vect{e}_i&&\knowl{./knowl/theorem-PMC.html}{\text{Theorem PMC}}\\ &\quad=(T-\lambda_k I_n)\zerovector&&\text{Induction Hypothesis}\\ &\quad=\zerovector&&\knowl{./knowl/theorem-MMZM.html}{\text{Theorem MMZM}}\text{.} \end{align*}

For \(i=k\text{,}\) notice that the upper triangular nature of \(T\) implies that column \(k\) of \(T-\lambda_k I_n\) begins with \(k-1\) entries of any variety, followed by \(n-k+1\) entries that are zero. So there are scalars \(\scalarlist{b}{k-1}\) such that

\begin{equation*} \left(T-\lambda_k I_n\right)\vect{e}_k = b_1\vect{e}_1 + b_2\vect{e}_2 + b_3\vect{e}_3 + b_{k-1}\vect{e}_{k-1} = \sum_{\ell=1}^{k-1}b_\ell\vect{e}_\ell\text{.} \end{equation*}

Then

\begin{align*} &(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_{k-1} I_n)(T-\lambda_k I_n)\vect{e}_k\\ &\quad=(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_{k-1} I_n)\sum_{\ell=1}^{k-1}b_\ell\vect{e}_\ell&&\text{Form of }T-\lambda_k I_n\\ &\quad=\sum_{\ell=1}^{k-1}(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_{k-1} I_n)b_\ell\vect{e}_\ell&&\knowl{./knowl/theorem-MMDAA.html}{\text{Theorem MMDAA}}\\ &\quad=\sum_{\ell=1}^{k-1}b_\ell(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_{k-1} I_n)\vect{e}_\ell&&\knowl{./knowl/theorem-MMSMM.html}{\text{Theorem MMSMM}}\\ &\quad=\sum_{\ell=1}^{k-1}b_\ell\zerovector&&\text{Induction Hypothesis}\\ &\quad=\sum_{\ell=1}^{k-1}\zerovector&&\knowl{./knowl/theorem-ZVSM.html}{\text{Theorem ZVSM}}\\ &\quad=\zerovector&&\knowl{./knowl/property-Z.html}{\text{Property Z}}\text{.} \end{align*}

So we have established the claim for all \(1\leq i\leq k\text{.}\)

Now

\begin{align*} p(T)&=p(T)I_n&&\knowl{./knowl/theorem-MMIM.html}{\text{Theorem MMIM}}\\ &=p(T)\matrixcolumns{e}{n}&&\knowl{./knowl/theorem-MMIM.html}{\text{Theorem MMIM}}\\ &=\left\lbrack p(T)\vect{e}_1|p(T)\vect{e}_2|p(T)\vect{e}_3|\ldots|p(T)\vect{e}_n\right\rbrack&&\knowl{./knowl/definition-MM.html}{\text{Definition MM}} \end{align*}

and then for \(1\leq k\leq n\text{,}\)

\begin{align*} p(T)\vect{e}_k&=(T-\lambda_1 I_n)(T-\lambda_2 I_n) \cdots (T-\lambda_n I_n)\vect{e}_k&&\knowl{./knowl/definition-CP.html}{\text{Definition CP}}\\ &=(T-\lambda_{k+1} I_n) \cdots (T-\lambda_{n} I_n) (T-\lambda_1 I_n) \cdots (T-\lambda_k I_n)\vect{e}_k&&\knowl{./knowl/theorem-PMC.html}{\text{Theorem PMC}}\\ &=(T-\lambda_{k+1} I_n) (T-\lambda_{k+2} I_n) \cdots (T-\lambda_{n} I_n)\zerovector&&\text{Induction above}\\ &=\zerovector&&\knowl{./knowl/theorem-MMZM.html}{\text{Theorem MMZM}}\text{.} \end{align*}

So we see that each column of \(p(T)\) is the zero vector and so \(p(T)=\zeromatrix\text{.}\) Because \(A\) and \(T\) are similar, Theorem PSMS says \(p(A)\) is similar to \(p(T)\) (the zero matrix), and thus \(p(A)\) must also be the zero matrix, as desired.

Reading Questions ME Reading Questions

1.

In the reading questions for Section IS, Reading Questions IS.IS.IS., we saw that one of the eigenvalues of the matrix \(A\) is \(\lambda = 2\text{.}\) Another eigenvalue of \(A\) is \(\lambda=-1\text{.}\) Now suppose that \(A\) is similar to an upper triangular matrix \(T\text{.}\) Describe the diagonal of \(T\text{.}\)

\begin{equation*} A = \begin{bmatrix} 20 & -69 & 54 \\ 6 & -22 & 18 \\ 1 & -5 & 5 \end{bmatrix}\text{.} \end{equation*}

2.

Compute the characteristic polynomial of \(A\text{,}\) \(\charpoly{A}{x}\text{.}\)

3.

Evaluate the characteristic polynomial, \(\charpoly{A}{x}\text{,}\) with the matrix \(A\text{.}\) Why is the answer both surprising and not surprising?