## Section 3.3 Jordan Canonical Form

¶Nilpotent matrices and generalized eigenspaces are the essential ingredients for a canonical form applicable to any square matrix. In this section will progress from the specialized case of a nilpotent matrix to the totally general case of any square matrix.

### Subsection 3.3.1 Canonical Form for Nilpotent Linear Transformations

¶Our main purpose in this section is to find a basis so that a nilpotent linear transformation will have a pleasing, nearly-diagonal matrix representation. Of course, we will not have a definition for “pleasing,” nor for “nearly-diagonal.” But the short answer is that our preferred matrix representation will be built up from Jordan blocks, \(\jordan{k}{0}\text{.}\) Here's the theorem. You will find Example (((example-canonical-form-nilpotent-size-6-index-4))), just following, helpful as you study this proof, since it uses the same notation, and is large enough to (barely) illustrate the full generality of the theorem.

###### Theorem 3.3.1. Canonical Form for Nilpotent Linear Transformations.

Suppose that \(\ltdefn{T}{V}{V}\) is a nilpotent linear transformation of index \(p\text{.}\) Then there is a basis for \(V\) so that the matrix representation, \(\matrixrep{T}{B}{B}\text{,}\) is block diagonal with each block being a Jordan block, \(\jordan{k}{0}\text{.}\) The size of the largest block is the index \(p\text{,}\) and the total number of blocks is the nullity of \(T\text{,}\) \(\nullity{T}\text{.}\)

###### Proof.

The proof is constructive, as we will explicitly manufacture the basis, and so can be used in practice. As we begin, the basis vectors will not be in the proper order, but we will rearrange them at the end of the proof. For convenience, define \(n_i=\nullity{T^i}\text{,}\) so for example, \(n_0=0\text{,}\) \(n_1=\nullity{T}\) and \(n_p=\nullity{T^p}=\dimension{V}\text{.}\) Define \(s_i=n_i-n_{i-1}\text{,}\) for \(1\leq i\leq p\text{,}\) so we can think of \(s_i\) as “how much bigger” \(\krn{T^i}\) is than \(\krn{T^{i-1}}\text{.}\) In particular, Theorem Theorem 3.2.6 implies that \(s_i\gt 0\) for \(1\leq i\leq p\text{.}\)

We build a set of vectors \(\vect{z}_{i,j}\text{,}\) \(1\leq i\leq p\text{,}\) \(1\leq j\leq s_i\text{.}\) Each \(\vect{z}_{i,j}\) will be an element of \(\krn{T^i}\) and not an element of \(\krn{T^{i-1}}\text{.}\) In total, we will obtain a linearly independent set of \(\sum_{i=1}^{p}s_i=\sum_{i=1}^{p}n_i-n_{i-1}=n_p-n_0=\dimension{V}\) vectors that form a basis of \(V\text{.}\) We construct this set in pieces, starting at the “wrong” end. Our procedure will build a series of subspaces, \(Z_i\text{,}\) each lying in between \(\krn{T^{i-1}}\) and \(\krn{T^i}\text{,}\) having bases \(\vect{z}_{i,j}\text{,}\) \(1\leq j\leq s_i\text{,}\) and which together equal \(V\) as a direct sum. Now would be a good time to review the results on direct sums collected in Section Section 1.2.

We build the subspace \(Z_p\) first (this is what we meant by “starting at the wrong end”). \(\krn{T^{p-1}}\) is a proper subspace of \(\krn{T^p}=V\) (Theorem Theorem 3.2.6). Theorem Theorem 1.2.4 says that there is a subspace of \(V\) that will pair with the subspace \(\krn{T^{p-1}}\) to form a direct sum of \(V\text{.}\) Call this subspace \(Z_p\text{,}\) and choose vectors \(\vect{z}_{p,j}\text{,}\) \(1\leq j\leq s_p\) as a basis of \(Z_p\text{,}\) which we will denote as \(B_p\text{.}\) Note that we have a fair amount of freedom in how to choose these first basis vectors. Several observations will be useful in the next step. First \(V=\krn{T^{p-1}}\ds Z_p\text{.}\) The basis \(B_p=\set{\vect{z}_{p,1},\,\vect{z}_{p,2},\,\vect{z}_{p,3},\,\dots,\,\vect{z}_{p,s_p}}\) is linearly independent. For \(1\leq j\leq s_p\text{,}\) \(\vect{z}_{p,j}\in\krn{T^p}=V\text{.}\) Since the two subspaces of a direct sum have no nonzero vectors in common, for \(1\leq j\leq s_p\text{,}\) \(\vect{z}_{p,j}\not\in\krn{T^{p-1}}\text{.}\) That was comparably easy.

If obtaining \(Z_p\) was easy, getting \(Z_{p-1}\) will be harder. We will repeat the next step \(p-1\) times, and so will do it carefully the first time. Eventually, \(Z_{p-1}\) will have dimension \(s_{p-1}\text{.}\) However, obtaining the first \(s_p\) vectors of a basis for \(Z_{p-1}\) are straightforward. Define \(\vect{z}_{p-1,j}=\lteval{T}{\vect{z}_{p,j}}\text{,}\) \(1\leq j\leq s_p\text{.}\) Notice that we have no choice in creating these vectors, they are a consequence of our choices for \(\vect{z}_{p,j}\text{.}\) In retrospect (i.e. on a second reading of this proof), you will recognize this as the key step in realizing a matrix representation of a nilpotent linear transformation with Jordan blocks. We need to know that this set of vectors is linearly independent. We consider a relation of linear dependence on \(\vect{z}_{p-1,j}\text{,}\) \(1\leq j\leq s_p\text{,}\) and massage it,

Define

The statement just above means that \(\vect{x}\in\krn{T}\subseteq\krn{T^{p-1}}\) (Theorem Theorem 3.2.6). As defined, \(\vect{x}\) is a linear combination of the basis vectors \(B_p\text{,}\) and therefore \(\vect{x}\in Z_p\text{.}\) Thus \(\vect{x}\in\krn{T^{p-1}}\cap Z_p\text{.}\) Because \(V=\krn{T^{p-1}}\ds Z_p\text{,}\) Theorem Theorem 1.2.6 tells us that \(\vect{x}=\zerovector\text{.}\) Now we recognize the definition of \(\vect{x}\) as a relation of linear dependence on the linearly independent set \(B_p\text{,}\) and therefore conclude that \(a_1=a_2=\cdots=a_{s_p}=0\text{.}\) This establishes the linear independence of \(\vect{z}_{p-1,j}\text{,}\) \(1\leq j\leq s_p\text{.}\)

We also need to know where the vectors \(\vect{z}_{p-1,j}\text{,}\) \(1\leq j\leq s_p\) live. First we demonstrate that they are members of \(\krn{T^{p-1}}\text{.}\)

So \(\vect{z}_{p-1,j}\in\krn{T^{p-1}}\text{,}\) \(1\leq j\leq s_p\text{.}\)

Moreover, these vectors are not elements of \(\krn{T^{p-2}}\text{.}\) Suppose to the contrary that \(\vect{z}_{p-1,j}\in\krn{T^{p-2}}\text{.}\) Then

which contradicts the earlier statement that \(\vect{z}_{p,j}\not\in\krn{T^{p-1}}\text{.}\) So \(\vect{z}_{p-1,j}\not\in\krn{T^{p-2}}\text{,}\) \(1\leq j\leq s_p\text{.}\)

Now choose any basis \(C_{p-2}=\set{\vectorlist{u}{n_{p-2}}}\) for \(\krn{T^{p-2}}\text{.}\) We want to extend this basis by adding in the \(\vect{z}_{p-1,j}\) to span a subspace of \(\krn{T^{p-1}}\text{.}\) But first we establish that this set is linearly independent. Let \(a_k\text{,}\) \(1\leq k\leq n_{p-2}\) and \(b_j\text{,}\) \(1\leq j\leq s_p\) be the scalars in a relation of linear dependence,

Then,

Define

The statement just above means that \(\vect{y}\in\krn{T^{p-1}}\text{.}\) As defined, \(\vect{y}\) is a linear combination of the basis vectors \(B_p\text{,}\) and therefore \(\vect{y}\in Z_p\text{.}\) Thus \(\vect{y}\in\krn{T^{p-1}}\cap Z_p\text{.}\) Because \(V=\krn{T^{p-1}}\ds Z_p\text{,}\) Theorem Theorem 1.2.6 tells us that \(\vect{y}=\zerovector\text{.}\) Now we recognize the definition of \(\vect{y}\) as a relation of linear dependence on the linearly independent set \(B_p\text{,}\) and therefore \(b_1=b_2=\cdots=b_{s_p}=0\text{.}\) Return to the full relation of linear dependence with both sets of scalars (the \(a_i\) and \(b_j\)). Now that we know that \(b_j=0\) for \(1\leq j\leq s_p\text{,}\) this relation of linear dependence simplifies to a relation of linear dependence on just the basis \(C_{p-1}\text{.}\) Therefore, \(a_i=0\text{,}\) \(1\leq a_i\leq n_{p-1}\) and we have the desired linear independence.

Define a new subspace of \(\krn{T^{p-1}}\) by

By Theorem Theorem 1.2.4 there exists a subspace of \(\krn{T^{p-1}}\) which will pair with \(Q_{p-1}\) to form a direct sum. Call this subspace \(R_{p-1}\text{,}\) so by definition, \(\krn{T^{p-1}}=Q_{p-1}\ds R_{p-1}\text{.}\) We are interested in the dimension of \(R_{p-1}\text{.}\) Note first, that since the spanning set of \(Q_{p-1}\) is linearly independent, \(\dimension{Q_{p-1}}=n_{p-2}+s_p\text{.}\) Then

Notice that if \(s_{p-1}=s_p\text{,}\) then \(R_{p-1}\) is trivial. Now choose a basis of \(R_{p-1}\text{,}\) and denote these \(s_{p-1}-s_p\) vectors as \(\vect{z}_{p-1,s_p+1}\text{,}\) \(\vect{z}_{p-1,s_p+2}\text{,}\) \(\vect{z}_{p-1,s_p+3}\text{,}\) \dots, \(\vect{z}_{p-1,s_{p-1}}\text{.}\) This is another occassion to notice that we have some freedom in this choice.

We now have \(\krn{T^{p-1}}=Q_{p-1}\ds R_{p-1}\text{,}\) and we have bases for each of the two subspaces. The union of these two bases will therefore be a linearly independent set in \(\krn{T^{p-1}}\) with size

So with the proper size the following set is a basis of \(\krn{T^{p-1}}\text{,}\)

We built up this basis in three parts, we will now split it in half. Define the subspace \(Z_{p-1}\) by

where we have implicitly denoted the basis as \(B_{p-1}\text{.}\) Then Theorem 1.2.3 allows us to split up the basis for \(\krn{T^{p-1}}\) as \(C_{p-1}\cup B_{p-1}\) and write

Whew! This is a good place to recap what we have achieved. The vectors \(\vect{z}_{i,j}\) form bases for the subspaces \(Z_i\) and right now

The key feature of this decomposition of \(V\) is that the first \(s_p\) vectors in the basis for \(Z_{p-1}\) are outputs of the linear transformation \(T\) using the basis vectors of \(Z_p\) as inputs.

Now we want to further decompose \(\krn{T^{p-2}}\text{,}\) into \(\krn{T^{p-3}}\) and \(Z_{p-2}\text{.}\) The procedure is the same as above, so we will only sketch the key steps. Checking the details proceeds in the same manner as above. Technically, we could have set up the preceding as the induction step in a proof by induction (Proof Technique I), but this probably would make the proof harder to understand.

Apply \(T\) to each element of \(B_{p-1}\text{,}\) to create vectors \(\vect{z}_{p-2,j}\text{,}\) \(1\leq j\leq s_{p-1}\text{.}\) These vectors form a linearly independent set, and each is an element of \(\krn{T^{p-2}}\text{,}\) but not an element of \(\krn{T^{p-3}}\text{.}\) Grab a basis \(C_{p-3}\) of \(\krn{T^{p-3}}\) and add on the newly-created vectors \(\vect{z}_{p-2,j}\text{,}\) \(1\leq j\leq s_{p-1}\text{.}\) This expanded set is linearly independent, and we can define a subspace \(Q_{p-2}\) with this set as its basis. Theorem Theorem 1.2.4 gives us a subspace \(R_{p-2}\) such that \(\krn{T^{p-2}}=Q_{p-2}\ds R_{p-2}\text{.}\) Vectors \(\vect{z}_{p-2,j}\text{,}\) \(s_{p-1}+1\leq j\leq s_{p-2}\) are chosen as a basis for \(R_{p-2}\) once the relevant dimensions have been verified. The union of \(C_{p-3}\) and \(\vect{z}_{p-2,j}\text{,}\) \(1\leq j\leq s_{p-2}\) then form a basis of \(\krn{T^{p-2}}\text{,}\) which can be split into two parts to yield the decomposition

Here \(Z_{p-2}\) is the subspace of \(\krn{T^{p-2}}\) with basis \(B_{p-2}=\setparts{\vect{z}_{p-2,j}}{1\leq j\leq s_{p-2}}\text{.}\) Finally,

Again, the key feature of this decomposition is that the first vectors in the basis of \(Z_{p-2}\) are outputs of \(T\) using vectors from the basis \(Z_{p-1}\) as inputs (and in turn, some of these inputs are outputs of \(T\) derived from inputs in \(Z_p\)).

Now assume we repeat this procedure until we decompose \(\krn{T^{2}}\) into subspaces \(\krn{T}\) and \(Z_2\text{.}\) Finally, decompose \(\krn{T}\) into subspaces \(\krn{T^0}=\krn{I_n}=\set{\zerovector}\) and \(Z_1\text{,}\) so that we recognize the vectors \(\vect{z}_{1,j}\text{,}\) \(1\leq j\leq s_1=n_1\) as elements of \(\krn{T}\text{.}\) The set

is linearly independent by Theorem Theorem 1.2.7 and has size

So \(B\) is a basis of \(V\text{.}\)

We desire a matrix representation of \(T\) relative to \(B\text{,}\) but first we will reorder the elements of \(B\text{.}\) The following display lists the elements of \(B\) in the desired order, when read across the rows left-to-right in the usual way. Notice that we established the existence of these vectors column-by-column, and beginning on the right.

It is difficult to layout this table with the notation we have been using, but it would not be especially useful to invent some notation to overcome the difficulty. (One approach would be to define something like the inverse of the nonincreasing function, \(i\rightarrow s_i\text{.}\)) Do notice that there are \(s_1=n_1\) rows, and \(p\) columns. Column \(i\) is the basis \(B_i\text{.}\) The vectors in the first column are elements of \(\krn{T}\text{.}\) Each row is the same length, or shorter, than the one above it. If we apply \(T\) to any vector in the table, other than those in the first column, the output is the preceding vector in the row.

Now contemplate the matrix representation of \(T\) relative to \(B\) as we read across the rows of the table above. In the first row, \(\lteval{T}{\vect{z}_{1,1}}=\zerovector\text{,}\) so the first column of the representation is the zero column. Next, \(\lteval{T}{\vect{z}_{2,1}}=\vect{z}_{1,1}\text{,}\) so the second column of the representation is a vector with a single one in the first entry, and zeros elsewhere. Next, \(\lteval{T}{\vect{z}_{3,1}}=\vect{z}_{2,1}\text{,}\) so column 3 of the representation is a zero, then a one, then all zeros. Continuing in this vein, we obtain the first \(p\) columns of the representation, which is the Jordan block \(\jordan{p}{0}\) followed by rows of zeros.

When we apply \(T\) to the basis vectors of the second row, what happens? Applying \(T\) to the first vector, the result is the zero vector, so the representation gets a zero column. Applying \(T\) to the second vector in the row, the output is simply the first vector in that row, making the next column of the representation all zeros plus a lone one, sitting just above the diagonal. Continuing, we create a Jordan block, sitting on the diagonal of the matrix representation. It is not possible in general to state the size of this block, but since the second row is no longer than the first, it cannot have size larger than \(p\text{.}\)

Since there are as many rows as the dimension of \(\krn{T}\text{,}\) the representation contains as many Jordan blocks as the nullity of \(T\text{,}\) \(\nullity{T}\text{.}\) Each successive block is smaller than the preceding one, with the first, and largest, having size \(p\text{.}\) The blocks are Jordan blocks since the basis vectors \(\vect{z}_{i,j}\) were often defined as the result of applying \(T\) to other elements of the basis already determined, and then we rearranged the basis into an order that placed outputs of \(T\) just before their inputs, excepting the start of each row, which was an element of \(\krn{T}\text{.}\)

The proof of Theorem Theorem 3.3.1 is constructive, so we can use it to create bases of nilpotent linear transformations with pleasing matrix representations. Recall that Theorem Theorem 3.2.5 told us that nilpotent linear transformations are almost never diagonalizable, so this is progress. As we have hinted before, with a nice representation of nilpotent matrices, it will not be difficult to build up representations of other non-diagonalizable matrices. Here is the promised example which illustrates the previous theorem. It is a useful companion to your study of the proof of Theorem Theorem 3.3.1.

###### Example 3.3.2. Canonical Form for a Nilpotent Linear Transformation.

The \(6\times 6\) matrix, \(A\text{,}\) of Example Example 3.2.2 is nilpotent. If we define the linear transformation \(\ltdefn{T}{\complex{6}}{\complex{6}}\) by \(\lteval{T}{\vect{x}}=A\vect{x}\text{,}\) then \(T\) is nilpotent and we can seek a basis of \(\complex{6}\) that yields a matrix representation with Jordan blocks on the diagonal. Since \(T\) has index \(4\) and nullity \(2\text{,}\) from Theorem Theorem 3.3.1 we can expect the largest Jordan block to be \(\jordan{4}{0}\text{,}\) and there will be just \(2\) blocks. This only leaves enough room for the second block to have size \(2\text{.}\)

To determine nullities, we will recycle the bases for the null spaces of the powers of \(A\) from Example Example 3.2.7, rather than recomputing them. We will also use the same notation used in the proof of Theorem Theorem 3.3.1.

To begin, \(s_4=n_4-n_3=6-5=1\text{,}\) so we need one vector of \(\krn{T^4}=\complex{6}\text{,}\) that is not in \(\krn{T^3}\text{,}\) to be a basis for \(Z_4\text{.}\) We have a lot of latitude in this choice, and we have not described any sure-fire method for constructing a vector *outside* of a subspace. Looking at the basis for \(\krn{T^3}\) we see that if a vector is in this subspace, and has a nonzero value in the first entry, then it must also have a nonzero value in the fourth entry. So the vector

will not be an element of \(\krn{T^3}\text{.}\) (Notice that many other choices could be made here, so our basis will not be unique.) This completes the determination of \(Z_p=Z_4\text{.}\)

Next, \(s_3=n_3-n_2=5-4=1\text{,}\) so we again need just a single basis vector for \(Z_3\text{.}\) We start by evaluating \(T\) with each basis vector of \(Z_4\text{,}\)

Since \(s_3=s_4\text{,}\) the subspace \(R_3\) is trivial, and there is nothing left to do, \(\vect{z}_{3,1}\) is the lone basis vector of \(Z_3\text{.}\)

Now \(s_2=n_2-n_1=4-2=2\text{,}\) so the construction of \(Z_2\) will not be as simple as the construction of \(Z_3\text{.}\) We first apply \(T\) to the basis vector of \(Z_2\text{,}\)

The two basis vectors of \(\krn{T^1}\text{,}\) together with \(\vect{z}_{2,1}\text{,}\) form a basis for \(Q_2\text{.}\) Because \(\dimension{\krn{T^2}}-\dimension{Q_2}=4-3=1\) we need only find a single basis vector for \(R_2\text{.}\) This vector must be an element of \(\krn{T^2}\text{,}\) but not an element of \(Q_2\text{.}\) Again, there is a variety of vectors that fit this description, and we have no precise algorithm for finding them. Since they are plentiful, they are not too hard to find. We add up the four basis vectors of \(\krn{T^2}\text{,}\) ensuring an element of \(\krn{T^2}\text{.}\) Then we check to see if the vector is a linear combination of three vectors: the two basis vectors of \(\krn{T^1}\) and \(\vect{z}_{2,1}\text{.}\) Having passed the tests, we have chosen

Thus, \(Z_2=\spn{\set{\vect{z}_{2,1},\,\vect{z}_{2,2}}}\text{.}\)

Lastly, \(s_1=n_1-n_0=2-0=2\text{.}\) Since \(s_2=s_1\text{,}\) we again have a trivial \(R_1\) and need only complete our basis by evaluating the basis vectors of \(Z_2\) with \(T\text{,}\)

Now we reorder these vectors as the desired basis,

We now apply Definition MR to build a matrix representation of \(T\) relative to \(B\text{,}\)

Installing these vectors as the columns of the matrix representation we have

which is a block diagonal matrix with Jordan blocks \(\jordan{4}{0}\) and \(\jordan{2}{0}\text{.}\)

If we construct the matrix \(S\) having the vectors of \(B\) as columns, then Theorem SCB tells us that a similarity transformation by \(S\) relates the original matrix representation of \(T\) with the matrix representation consisting of Jordan blocks, in other words, \(\similar{A}{S}=\matrixrep{T}{B}{B}\text{.}\)

Notice that constructing interesting examples of matrix representations requires domains with dimensions bigger than just two or three. Going forward our examples will get even bigger.

### Subsection 3.3.2 Restrictions to Generalized Eigenspaces

We now know how to make canonical matrix representations of a what seems to be a narrow class of linear transformations—the nilpotent ones (Theorem Theorem 3.3.1). However, since the restriction of any linear transformation to one of its generalized eigenspace is only a small adjustment away from being a nilpotent linear transformation (Theorem Theorem 3.2.8) we can extend the utility of our previous representation easily.

###### Theorem 3.3.3. Matrix Representation of a Restriction to a Generalized Eigenspace.

Suppose that \(\ltdefn{T}{V}{V}\) is a linear transformation with eigenvalue \(\lambda\text{.}\) Then there is a basis of the the generalized eigenspace \(\geneigenspace{T}{\lambda}\) such that the restriction \(\ltdefn{\restrict{T}{\geneigenspace{T}{\lambda}}}{\geneigenspace{T}{\lambda}}{\geneigenspace{T}{\lambda}}\) has a matrix representation that is block diagonal where each block is a Jordan block of the form \(\jordan{k}{\lambda}\) (with varying values of \(k\)).

###### Proof.

Theorem Theorem 3.2.8 tells us that \(\restrict{T}{\geneigenspace{T}{\lambda}}-\lambda I_{\geneigenspace{T}{\lambda}}\) is a nilpotent linear transformation. Theorem Theorem 3.3.1 tells us that a nilpotent linear transformation has a basis for its domain that yields a matrix representation that is block diagonal where the blocks are Jordan blocks of the form \(\jordan{k}{0}\text{.}\) So let \(B\) be a basis of \(\geneigenspace{T}{\lambda}\) that yields such a matrix representation for \(\restrict{T}{\geneigenspace{T}{\lambda}}-\lambda I_{\geneigenspace{T}{\lambda}}\text{.}\)

We can write

Then the matrix representation of \(\lambda I_{\geneigenspace{T}{\lambda}}\) relative to the basis \(B\) is then simply the diagonal matrix \(\lambda I_m\text{,}\) where \(m=\dimension{\geneigenspace{T}{\lambda}}\text{.}\) So we have the rather unwieldy expression,

The first representation in the final experssion has Jordan blocks with zero in every diagonal entry by Theorem Theorem 3.3.1, while the second representation has \(\lambda\) in every diagonal entry of the matrix. The result of adding the two representations is to convert the Jordan blocks from the form \(\jordan{k}{0}\) to the form \(\jordan{k}{\lambda}\text{.}\)

Of course, Theorem Theorem 3.3.1 provides some extra information on the sizes of the Jordan blocks in a representation and we could carry over this information to Theorem Theorem 3.2.8, but we will save this description and incorporate it into our final major result in the next section.

### Subsection 3.3.3 Jordan Canonical Form

¶Begin with any linear transformation that has an identical domain and codomain. Build a block diagonal representation from a direct sum decomposition into (invariant) generalized eigenspaces. For each generalized eigenspace, further refine the block into a sequence of Jordan blocks (with common diagonal elements) from the restriction to the generalized eigenspace, which is very nearly nilpotent. Then you have Jordan canonical form. Other than cosmetic reorderings, it is a unique representative of the equivalence class of similar matrices.

We remove the ambiguity from trivial reorderings of eigenvalues and Jordan blocks with a careful definition.

###### Definition 3.3.4. Jordan Canonical Form.

A square matrix is in Jordan canonical form if it meets the following requirements:

- The matrix is block diagonal.
- Each block is a Jordan block.
- If \(\rho\lt\lambda\) then the block \(\jordan{k}{\rho}\) occupies rows with indices greater than the indices of the rows occupied by \(\jordan{\ell}{\lambda}\text{.}\)
- If \(\rho=\lambda\) and \(\ell\lt k\text{,}\) then the block \(\jordan{\ell}{\lambda}\) occupies rows with indices greater than the indices of the rows occupied by \(\jordan{k}{\lambda}\text{.}\)

###### Theorem 3.3.5. Jordan Canonical Form for a Linear Transformation.

Suppose \(\ltdefn{T}{V}{V}\) is a linear transformation. Then there is a basis \(B\) for \(V\) such that the matrix representation of \(T\) with the following properties:

- The matrix representation is in Jordan canonical form.
- If \(\jordan{k}{\lambda}\) is one of the Jordan blocks, then \(\lambda\) is an eigenvalue of \(T\text{.}\)
- For each eigenvalue \(\lambda\text{,}\) the largest block of the form \(\jordan{k}{\lambda}\) has size equal to the index of \(\lambda\text{,}\) \(\indx{T}{\lambda}\text{.}\)
- For each eigenvalue \(\lambda\text{,}\) the number of blocks of the form \(\jordan{k}{\lambda}\) is the geometric multiplicity of \(\lambda\text{,}\) \(\geomult{T}{\lambda}\text{.}\)
- For each eigenvalue \(\lambda\text{,}\) the number of rows occupied by blocks of the form \(\jordan{k}{\lambda}\) is the algebraic multiplicity of \(\lambda\text{,}\) \(\algmult{T}{\lambda}\text{.}\)

###### Proof.

This theorem is really just the consequence of applying to \(T\text{,}\) consecutively, Theorem Theorem 3.1.10, Theorem Theorem 3.3.3 and Theorem Theorem 3.3.1.

Theorem Theorem 3.1.10 gives us a decomposition of \(V\) into generalized eigenspaces, one for each distinct eigenvalue. Since these generalized eigenspaces are invariant relative to \(T\text{,}\) this provides a block diagonal matrix representation where each block is the matrix representation of the restriction of \(T\) to the generalized eigenspace.

Restricting \(T\) to a generalized eigenspace results in a “nearly nilpotent” linear transformation, as stated more precisely in Theorem Theorem 3.2.8. We unravel Theorem Theorem 3.2.8 in the proof of Theorem Theorem 3.3.3 so that we can apply Theorem Theorem 3.3.1 about representations of nilpotent linear transformation.

We know the dimension of a generalized eigenspace is the algebraic multiplicity of the eigenvalue (Corollary Corollary 3.1.11), so the blocks associated with the generalized eigenspaces are square with a size equal to the algebraic multiplicity. In refining the basis for a block associatred with a single eigenvalue, and producing Jordan blocks, the results of Theorem Theorem 3.3.1 apply. The total number of blocks will be the nullity of \(\restrict{T}{\geneigenspace{T}{\lambda}}-\lambda I_{\geneigenspace{T}{\lambda}}\text{,}\) which is the geometric multiplicity of \(\lambda\) as an eigenvalue of \(T\) (Definition GME). The largest of the Jordan blocks will have size equal to the index of the nilpotent linear transformation \(\restrict{T}{\geneigenspace{T}{\lambda}}-\lambda I_{\geneigenspace{T}{\lambda}}\text{,}\) which is exactly the definition of the index of the eigenvalue \(\lambda\) (Definition Definition 3.2.9).

Before we do some examples of this result, notice how close Jordan canonical form is to a diagonal matrix. Or, equivalently, notice how close we have come to diagonalizing a matrix (Definition DZM). We have a matrix representation which has diagonal entries that are the eigenvalues of a matrix. Each occurs on the diagonal as many times as the algebraic multiplicity. However, when the geometric multiplicity is strictly less than the algebraic multiplicity, we have some entries in the representation just above the diagonal (the “superdiagonal”). Furthermore, we have some idea how often this happens if we know the geometric multiplicity and the index of the eigenvalue.

We now recognize just how plain a diagonalizable linear transformation really is. For each eigenvalue, the generalized eigenspace is just the regular eigenspace, and it decomposes into a direct sum of one-dimensional subspaces, each spanned by a different eigenvector chosen from a basis of eigenvectors for the eigenspace.

Some authors create matrix representations of nilpotent linear transformations where the Jordan block has the ones just below the diagonal (the “subdiagonal”). No matter, it is really the same, just different. We have also defined Jordan canonical form to place blocks for the larger eigenvalues earlier, and for blocks with the same eigenvalue, we place the larger sized blocks earlier. This is fairly standard, but there is no reason we could not order the blocks differently. It would be the same, just different. The reason for choosing *some* ordering is to be assured that there is just *one* canonical matrix representation for each linear transformation.

###### Example 3.3.6. Jordan Canonical Form, Size 10.

Suppose that \(\ltdefn{T}{\complex{10}}{\complex{10}}\) is the linear transformation defined by \(\lteval{T}{\vect{x}}=A\vect{x}\) where

We will find a basis for \(\complex{10}\) that will yield a matrix representation of \(T\) in Jordan canonical form. First we find the eigenvalues, and their multiplicities, with the techniques of Chapter E.

For each eigenvalue, we can compute a generalized eigenspace. By Theorem Theorem 3.1.10 we know that \(\complex{10}\) will decompose into a direct sum of these invariant eigenspaces, and we can restrict \(T\) to each part of this decomposition. At this stage we know that the Jordan canonical form will be block diagonal with blocks of size \(2\text{,}\) \(3\) and \(5\text{,}\) since the dimensions of the generalized eigenspaces are equal to the algebraic multiplicities of the eigenvalues (Theorem Corollary 3.1.11). The geometric multiplicities tell us how many Jordan blocks occupy each of the three larger blocks, but we will discuss this as we analyze each eigenvalue. We do not yet know the index of each eigenvalue (though we can easily infer it for \(\lambda=2\)) and even if we did have this information, it only determines the size of the largest Jordan block (per eigenvalue). We will press ahead, considering each eigenvalue one at a time.

The eigenvalue \(\lambda=2\) has “full” geometric multiplicity, and is not an impediment to diagonalizing \(T\text{.}\) We will treat it in full generality anyway. First we compute the generalized eigenspace. Since Theorem Theorem 3.1.6 says that \(\geneigenspace{T}{2}=\krn{\left(T-2I_{\complex{10}}\right)^{10}}\) we can compute this generalized eigenspace as a null space derived from the matrix \(A\text{,}\)

The restriction of \(T\) to \(\geneigenspace{T}{2}\) relative to the two basis vectors above has a matrix representation that is a \(2\times 2\) diagonal matrix with the eigenvalue \(\lambda=2\) as the diagonal entries. So these two vectors will be the first two vectors in our basis for \(\complex{10}\text{,}\)

Notice that it was not strictly necessary to compute the 10-th power of \(A-2I_{10}\text{.}\) With \(\algmult{T}{2}=\geomult{T}{2}\) the null space of the matrix \(A-2I_{10}\) contains *all* of the generalized eigenvectors of \(T\) for the eigenvalue \(\lambda=2\text{.}\) But there was no harm in computing the 10-th power either. This discussion is equivalent to the observation that the linear transformation \(\ltdefn{\restrict{T}{\geneigenspace{T}{2}}}{\geneigenspace{T}{2}}{\geneigenspace{T}{2}}\) is nilpotent of index \(1\text{.}\) In other words, \(\indx{T}{2}=1\text{.}\)

The eigenvalue \(\lambda=0\) will not be quite as simple, since the geometric multiplicity is strictly less than the geometric multiplicity. As before, we first compute the generalized eigenspace. Since Theorem Theorem 3.1.6 says that \(\geneigenspace{T}{0}=\krn{\left(T-0I_{\complex{10}}\right)^{10}}\) we can compute this generalized eigenspace as a null space derived from the matrix \(A\text{,}\)

So \(\dimension{\geneigenspace{T}{0}}=3=\algmult{T}{0}\text{,}\) as expected. We will use these three basis vectors for the generalized eigenspace to construct a matrix representation of \(\restrict{T}{\geneigenspace{T}{0}}\text{,}\) where \(F\) is being defined implicitly as the basis of \(\geneigenspace{T}{0}\text{.}\) We construct this representation as usual, applying Definition MR,

So we have the matrix representation

By Theorem Theorem 3.2.8 we can obtain a nilpotent matrix from this matrix representation by subtracting the eigenvalue from the diagonal elements, and then we can apply Theorem Theorem 3.3.1 to \(M-(0)I_3\text{.}\) First check that \(\left(M-(0)I_3\right)^2=\zeromatrix\text{,}\) so we know that the index of \(M-(0)I_3\) as a nilpotent matrix, and that therefore \(\lambda=0\) is an eigenvalue of \(T\) with index \(2\text{,}\) \(\indx{T}{0}=2\text{.}\) To determine a basis of \(\complex{3}\) that converts \(M-(0)I_3\) to canonical form, we need the null spaces of the powers of \(M-(0)I_3\text{.}\) For convenience, set \(N=M-(0)I_3\text{.}\)

Then we choose a vector from \(\nsp{N^2}\) that is not an element of \(\nsp{N^1}\text{.}\) Any vector with unequal first two entries will fit the bill, say

where we are employing the notation in Theorem Theorem 3.3.1. The next step is to multiply this vector by \(N\) to get part of the basis for \(\nsp{N^1}\text{,}\)

We need a vector to pair with \(\vect{z}_{1,1}\) that will make a basis for the two-dimensional subspace \(\nsp{N^1}\text{.}\) Examining the basis for \(\nsp{N^1}\) we see that a vector with its first two entries equal will do the job.

Reordering, we find the basis,

From this basis, we can get a matrix representation of \(N\) (when viewed as a linear transformation) relative to the basis \(C\) for \(\complex{3}\text{,}\)

Now we add back the eigenvalue \(\lambda=0\) to the representation of \(N\) to obtain a representation for \(M\text{.}\) Of course, with an eigenvalue of zero, the change is not apparent, so we will not display the same matrix again. This is the second block of the Jordan canonical form for \(T\text{.}\) However, the three vectors in \(C\) will not suffice as basis vectors for the domain of \(T\) — they have the wrong size! The vectors in \(C\) are vectors in the domain of a linear transformation defined by the matrix \(M\text{.}\) But \(M\) was a matrix representation of \(\restrict{T}{\geneigenspace{T}{0}}-0I_{\geneigenspace{T}{0}}\) relative to the basis \(F\) for \(\geneigenspace{T}{0}\text{.}\) We need to “uncoordinatize” each of the basis vectors in \(C\) to produce a linear combination of vectors in \(F\) that will be an element of the generalized eigenspace \(\geneigenspace{T}{0}\text{.}\) These will be the next three vectors of our final answer, a basis for \(\complex{10}\) that has a pleasing matrix representation.

Five down, five to go. Basis vectors, that is. \(\lambda=-1\) is the smallest eigenvalue, but it will require the most computation. First we compute the generalized eigenspace. Since Theorem Theorem 3.1.6 says that \(\geneigenspace{T}{-1}=\krn{\left(T-(-1)I_{\complex{10}}\right)^{10}}\) we can compute this generalized eigenspace as a null space derived from the matrix \(A\text{,}\)

So \(\dimension{\geneigenspace{T}{-1}}=5=\algmult{T}{-1}\text{,}\) as expected. We will use these five basis vectors for the generalized eigenspace to construct a matrix representation of \(\restrict{T}{\geneigenspace{T}{-1}}\text{,}\) where \(F\) is being recycled and defined now implicitly as the basis of \(\geneigenspace{T}{-1}\text{.}\)

We construct this representation as usual, applying Definition MR,

So we have the matrix representation of the restriction of \(T\) (again recycling and redefining the matrix \(M\))

Theorem Theorem 3.2.8 says we can obtain a nilpotent matrix from this matrix representation by subtracting the eigenvalue from the diagonal elements, and then we can apply Theorem Theorem 3.3.1 to \(M-(-1)I_5\text{.}\) First check that \(\left(M-(-1)I_5\right)^3=\zeromatrix\text{,}\) so we know that the index of \(M-(-1)I_5\) as a nilpotent matrix, and that therefore \(\lambda=-1\) is an eigenvalue of \(T\) with index \(3\text{,}\) \(\indx{T}{-1}=3\text{.}\) To determine a basis of \(\complex{5}\) that converts \(M-(-1)I_5\) to canonical form, we need the null spaces of the powers of \(M-(-1)I_5\text{.}\) Again, for convenience, set \(N=M-(-1)I_5\text{.}\)

Then we choose a vector from \(\nsp{N^3}\) that is not an element of \(\nsp{N^2}\text{.}\) The sum of the four basis vectors for \(\nsp{N^2}\) sum to a vector with all five entries equal to \(1\text{.}\) We will adjust with the first entry to create a vector not in \(\nsp{N^2}\text{,}\)

where we are employing the notation in Theorem Theorem 3.3.1. The next step is to multiply this vector by \(N\) to get a portion of the basis for \(\nsp{N^2}\text{,}\)

We have a basis for the two-dimensional subspace \(\nsp{N^1}\) and we can add to that the vector \(\vect{z}_{2,1}\) and we have three of four basis vectors for \(\nsp{N^2}\text{.}\) These three vectors span the subspace we call \(Q_2\text{.}\) We need a fourth vector outside of \(Q_2\) to complete a basis of the four-dimensional subspace \(\nsp{N^2}\text{.}\) Check that the vector

is an element of \(\nsp{N^2}\) that lies outside of the subspace \(Q_2\text{.}\) This vector was constructed by getting a nice basis for \(Q_2\) and forming a linear combination of this basis that specifies three of the five entries of the result. Of the remaining two entries, one was changed to move the vector outside of \(Q_2\) and this was followed by a change to the remaining entry to place the vector into \(\nsp{N^2}\text{.}\) The vector \(\vect{z}_{2,2}\) is the lone basis vector for the subspace we call \(R_2\text{.}\)

The remaining two basis vectors are easy to come by. They are the result of applying \(N\) to each of the two most recently determined basis vectors,

Now we reorder these basis vectors, to arrive at the basis

A matrix representation of \(N\) relative to \(C\) is

To obtain a matrix representation of \(M\text{,}\) we add back in the matrix \((-1)I_5\text{,}\) placing the eigenvalue back along the diagonal, and slightly modifying the Jordan blocks,

The basis \(C\) yields a pleasant matrix representation for the *restriction* of the linear transformation \(T-(-1)I\) to the generalized eigenspace \(\geneigenspace{T}{-1}\text{.}\) However, we must remember that these vectors in \(\complex{5}\) are representations of vectors in \(\complex{10}\) relative to the basis \(F\text{.}\) Each needs to be “un-coordinatized” before joining our final basis. Here we go,

To summarize, we list the entire basis \(B=\set{\vectorlist{v}{10}}\text{,}\)

The resulting matrix representation is

If you are not inclined to check all of these computations, here are a few that should convince you of the amazing properties of the basis \(B\text{.}\) Compute the matrix-vector products \(A\vect{v}_i\text{,}\) \(1\leq i\leq 10\text{.}\) In each case the result will be a vector of the form \(\lambda\vect{v}_i+\delta\vect{v}_{i-1}\text{,}\) where \(\lambda\) is one of the eigenvalues (you should be able to predict ahead of time *which* one) and \(\delta\in\set{0,1}\text{.}\)

Alternatively, if we can write inputs to the linear transformation \(T\) as linear combinations of the vectors in \(B\text{,}\) then the “action” of \(T\) is reduced to a matrix-vector product with the exceedingly simple matrix that is the Jordan canonical form. Wow!