Section O Orthogonality

From A First Course in Linear Algebra
Version 2.20
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/

In this section we define a couple more operations with vectors, and prove a few theorems. At first blush these definitions and results will not appear central to what follows, but we will make use of them at key points in the remainder of the course (such as Section MINM, Section OD). Because we have chosen to use ℂ as our set of scalars, this subsection is a bit more, uh, … complex than it would be for the real numbers. We’ll explain as we go along how things get easier for the real numbers ℝ. If you haven’t already, now would be a good time to review some of the basic properties of arithmetic with complex numbers described in Section CNO. With that done, we can extend the basics of complex number arithmetic to our study of vectors in {ℂ}^{m}.

Subsection CAV: Complex Arithmetic and Vectors

We know how the addition and multiplication of complex numbers are employed in defining the operations for vectors in {ℂ}^{m} (Definition CVA and Definition CVSM). We can also extend the idea of the conjugate to vectors.

Definition CCCV
Complex Conjugate of a Column Vector
Suppose that u is a vector from {ℂ}^{m}. Then the conjugate of the vector, \overline{u}, is defined by

\eqalignno{ {\left [\overline{u}\right ]}_{i} & = \overline{{\left [u\right ]}_{i}} & &\text{$1 ≤ i ≤ m$} & & & & }

With this definition we can show that the conjugate of a column vector behaves as we would expect with regard to vector addition and scalar multiplication.

\eqalignno{ {\left [\overline{x + y}\right ]}_{i} & = \overline{{\left [x + y\right ]}_{i}} & &\text{Definition CCCV} \cr & = \overline{{\left [x\right ]}_{i} +{ \left [y\right ]}_{i}} & &\text{Definition CVA} \cr & = \overline{{\left [x\right ]}_{i}} + \overline{{\left [y\right ]}_{i}} & &\text{Theorem CCRA} \cr & ={ \left [\overline{x}\right ]}_{i} +{ \left [\overline{y}\right ]}_{i} & &\text{Definition CCCV} \cr & ={ \left [\overline{x} + \overline{y}\right ]}_{i} & &\text{Definition CVA} }

Then by Definition CVE we have \overline{x + y} = \overline{x} + \overline{y}. ■

\eqalignno{ {\left [\overline{αx}\right ]}_{i} & = \overline{{\left [αx\right ]}_{i}} & &\text{Definition CCCV} \cr & = \overline{α{\left [x\right ]}_{i}} & &\text{Definition CVSM} \cr & = \overline{α}\kern 1.95872pt \overline{{\left [x\right ]}_{i}} & &\text{Theorem CCRM} \cr & = \overline{α}\kern 1.95872pt {\left [\overline{x}\right ]}_{i} & &\text{Definition CCCV} \cr & ={ \left [\overline{α}\kern 1.95872pt \overline{x}\right ]}_{i} & &\text{Definition CVSM} }

Then by Definition CVE we have \overline{αx} = \overline{α}\kern 1.95872pt \overline{x}. ■

These two theorems together tell us how we can “push” complex conjugation through linear combinations.
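As a quick numerical sanity check of how conjugation pushes through a linear combination, here is a minimal Python sketch. The helper names `conj_vec` and `lin_comb`, and the particular vectors and scalars, are illustrative assumptions and not part of the text.

```python
# Vectors are plain Python lists of complex numbers (an illustrative choice).

def conj_vec(x):
    """Entrywise complex conjugate of a column vector (Definition CCCV)."""
    return [entry.conjugate() for entry in x]

def lin_comb(alpha, x, beta, y):
    """The linear combination alpha*x + beta*y, computed entry by entry."""
    return [alpha * a + beta * b for a, b in zip(x, y)]

x = [2 + 3j, 5 + 2j, -3 + 1j]
y = [1 + 2j, -4 + 5j, 0 + 5j]
alpha, beta = 2 - 1j, 3j

# Conjugate of the combination ...
lhs = conj_vec(lin_comb(alpha, x, beta, y))
# ... versus the combination of conjugated scalars and conjugated vectors.
rhs = lin_comb(alpha.conjugate(), conj_vec(x), beta.conjugate(), conj_vec(y))

print(lhs == rhs)
```

The entries here are small integers, so the floating-point arithmetic is exact and a direct equality test is safe; for general data a tolerance would be the more careful choice.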

Subsection IP: Inner products

Definition IP
Inner Product
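Given the vectors u,\kern 1.95872pt v ∈ {ℂ}^{m} the inner product of u and v is the scalar quantity in ℂ,

\eqalignno{ \left \langle u,\kern 1.95872pt v\right \rangle & ={ \left [u\right ]}_{1}\overline{{\left [v\right ]}_{1}} +{ \left [u\right ]}_{2}\overline{{\left [v\right ]}_{2}} + \mathrel{⋯} +{ \left [u\right ]}_{m}\overline{{\left [v\right ]}_{m}} ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [v\right ]}_{i}} & & }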

This operation is a bit different in that we begin with two vectors but produce a scalar. Computing one is straightforward.

\eqalignno{ u = \left [\array{ 2 + 3i \cr 5 + 2i \cr −3 + i } \right ] & &\text{and} & &v = \left [\array{ 1 + 2i \cr −4 + 5i \cr 0 + 5i } \right ] & & & & & & }

\eqalignno{ \left \langle u,\kern 1.95872pt v\right \rangle & = (2 + 3i)(\overline{1 + 2i}) + (5 + 2i)(\overline{ − 4 + 5i}) + (−3 + i)(\overline{0 + 5i}) & & \cr & = (2 + 3i)(1 − 2i) + (5 + 2i)(−4 − 5i) + (−3 + i)(0 − 5i) & & \cr & = (8 − i) + (−10 − 33i) + (5 + 15i) & & \cr & = 3 − 19i & & }

\eqalignno{ w = \left [\array{ 2\cr 4 \cr −3\cr 2 \cr 8 } \right ] & &\text{and} & &x = \left [\array{ 3\cr 1 \cr 0\cr −1 \cr −2 } \right ] & & & & & & }
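The two computations of this example can be reproduced with a few lines of Python. This is only a sketch: the function name `inner` is a hypothetical helper, written to follow the convention used here of conjugating the entries of the second vector.

```python
def inner(u, v):
    """Inner product that conjugates the entries of the SECOND vector."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

# Complex vectors u and v from the example.
u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
print(inner(u, v))          # agrees with the hand computation 3 - 19i

# Real vectors w and x: conjugation does nothing, so this is the dot product
# 2(3) + 4(1) + (-3)(0) + 2(-1) + 8(-2) = -8.
w = [2, 4, -3, 2, 8]
x = [3, 1, 0, -1, -2]
print(inner(w, x).real)
```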

In the case where the entries of our vectors are all real numbers (as in the second part of Example CSIP), the computation of the inner product may look familiar and be known to you as a dot product or scalar product. So you can view the inner product as a generalization of the scalar product to vectors from {ℂ}^{m} (rather than {ℝ}^{m}).

Also, note that we have chosen to conjugate the entries of the second vector listed in the inner product, while many authors choose to conjugate entries from the first vector. It really makes no difference which choice is made; the only requirement is that subsequent definitions and theorems are consistent with the choice. You can study the conclusion of Theorem IPAC as an explanation of the magnitude of the difference that results from this choice. But be careful as you read other treatments of the inner product or its use in applications, and be sure you know ahead of time which choice has been made.

There are several quick theorems we can now prove, and they will each be useful later.

Theorem IPVA
Inner Product and Vector Addition
Suppose that u,\kern 1.95872pt v,\kern 1.95872pt w ∈ {ℂ}^{m}. Then

\eqalignno{ \text{1.}\quad \left \langle u + v,\kern 1.95872pt w\right \rangle & = \left \langle u,\kern 1.95872pt w\right \rangle + \left \langle v,\kern 1.95872pt w\right \rangle & & \cr \text{2.}\quad \left \langle u,\kern 1.95872pt v + w\right \rangle & = \left \langle u,\kern 1.95872pt v\right \rangle + \left \langle u,\kern 1.95872pt w\right \rangle & & }

Proof The proofs of the two parts are very similar, with the second one requiring just a bit more effort due to the conjugation that occurs. We will prove part 2 and you can prove part 1 (Exercise O.T10).

\eqalignno{ \left \langle u,\kern 1.95872pt v + w\right \rangle & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [v + w\right ]}_{i}} & &\text{Definition IP} \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}(\overline{{\left [v\right ]}_{i} +{ \left [w\right ]}_{i}}) & &\text{Definition CVA} \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}(\overline{{\left [v\right ]}_{i}} + \overline{{\left [w\right ]}_{i}}) & &\text{Theorem CCRA} \cr & ={ \mathop{∑ }}_{i=1}^{m}\left ({\left [u\right ]}_{ i}\overline{{\left [v\right ]}_{i}} +{ \left [u\right ]}_{i}\overline{{\left [w\right ]}_{i}}\right ) & &\text{Property DCN} \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [v\right ]}_{i}} +{ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [w\right ]}_{i}} & &\text{Property CACN} \cr & = \left \langle u,\kern 1.95872pt v\right \rangle + \left \langle u,\kern 1.95872pt w\right \rangle & &\text{Definition IP} }

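Theorem IPSM
Inner Product and Scalar Multiplication
Suppose that u,\kern 1.95872pt v ∈ {ℂ}^{m} and α ∈ ℂ. Then

\eqalignno{ \text{1.}\quad \left \langle αu,\kern 1.95872pt v\right \rangle & = α\left \langle u,\kern 1.95872pt v\right \rangle & & \cr \text{2.}\quad \left \langle u,\kern 1.95872pt αv\right \rangle & = \overline{α}\left \langle u,\kern 1.95872pt v\right \rangle & & }

Proof We prove part 2; the proof of part 1 is similar, but without the conjugation.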

\eqalignno{ \left \langle u,\kern 1.95872pt αv\right \rangle & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [αv\right ]}_{i}} & &\text{Definition IP} \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{α{\left [v\right ]}_{i}} & &\text{Definition CVSM} \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{α}\kern 1.95872pt \overline{{\left [v\right ]}_{i}} & &\text{Theorem CCRM} \cr & ={ \mathop{∑ }}_{i=1}^{m}\overline{α}{\left [u\right ]}_{ i}\overline{{\left [v\right ]}_{i}} & &\text{Property CMCN} \cr & = \overline{α}{\mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [v\right ]}_{i}} & &\text{Property DCN} \cr & = \overline{α}\left \langle u,\kern 1.95872pt v\right \rangle & &\text{Definition IP} }
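The behavior of the inner product under vector addition and scalar multiplication can be spot-checked numerically. The sketch below is not a proof, only a check on one set of sample data; the helpers `inner`, `add` and `scale`, the vector w and the scalar alpha are all assumptions made for illustration.

```python
def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def add(x, y):
    """Entrywise vector addition."""
    return [a + b for a, b in zip(x, y)]

def scale(alpha, x):
    """Scalar multiple alpha * x."""
    return [alpha * a for a in x]

u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
w = [1j, 2 + 0j, -1 - 1j]
alpha = 2 - 3j

# Additivity in each argument.
additive_first = inner(add(u, v), w) == inner(u, w) + inner(v, w)
additive_second = inner(u, add(v, w)) == inner(u, v) + inner(u, w)

# A scalar comes out of the first slot unchanged,
# but comes out of the second slot conjugated.
scalar_first = inner(scale(alpha, u), v) == alpha * inner(u, v)
scalar_second = inner(u, scale(alpha, v)) == alpha.conjugate() * inner(u, v)

print(additive_first, additive_second, scalar_first, scalar_second)
```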

Theorem IPAC
Inner Product is Anti-Commutative
Suppose that u and v are vectors in {ℂ}^{m}. Then \left \langle u,\kern 1.95872pt v\right \rangle = \overline{\left \langle v,\kern 1.95872pt u\right \rangle }. □

\eqalignno{ \left \langle u,\kern 1.95872pt v\right \rangle & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [v\right ]}_{i}} & &\text{Definition IP} \cr & ={ \mathop{∑ }}_{i=1}^{m}\overline{\overline{{\left [u\right ]}_{ i}}}\kern 1.95872pt \overline{{\left [v\right ]}_{i}} & &\text{Theorem CCT} \cr & ={ \mathop{∑ }}_{i=1}^{m}\overline{\overline{{\left [u\right ]}_{ i}}{\left [v\right ]}_{i}} & &\text{Theorem CCRM} \cr & = \overline{\left ({\mathop{∑ }}_{i=1}^{m}\overline{{\left [u\right ]}_{ i}}{\left [v\right ]}_{i}\right )} & &\text{Theorem CCRA} \cr & = \overline{\left ({\mathop{∑ }}_{i=1}^{m}{\left [v\right ]}_{ i}\overline{{\left [u\right ]}_{i}}\right )} & &\text{Property CMCN} \cr & = \overline{\left \langle v,\kern 1.95872pt u\right \rangle } & &\text{Definition IP} }
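Theorem IPAC also quantifies the effect of the choice of which vector to conjugate: the two conventions differ exactly by a complex conjugate. A small Python sketch, with the hypothetical names `inner` (this text's convention) and `inner_first_conjugated` (the other common convention), illustrates both facts on sample vectors.

```python
def inner(u, v):
    """Convention of this text: conjugate the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def inner_first_conjugated(u, v):
    """Alternative convention: conjugate the entries of the first vector."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]

# Swapping the arguments conjugates the result (Theorem IPAC).
print(inner(u, v), inner(v, u))

# The two conventions also differ by a conjugate.
print(inner_first_conjugated(u, v) == inner(u, v).conjugate())
```

When calling a library routine for a complex inner product, it is worth checking its documentation to see which of these two conventions it implements before relying on the result.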

Subsection N: Norm

If treating linear algebra in a more geometric fashion, the length of a vector occurs naturally, and is what you would expect from its name. With complex numbers, we will define a similar function. Recall that if c is a complex number, then \left \vert c\right \vert denotes its modulus (Definition MCN).

Definition NV
Norm of a Vector
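The norm of the vector u is the scalar quantity in ℂ

\eqalignno{ \left \Vert u\right \Vert & = \sqrt{{\mathop{∑ }}_{i=1}^{m}{\left \vert {\left [u\right ]}_{i}\right \vert }^{2}} & & }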

Notice how the norm of a vector with real number entries is just the length of the vector. Inner products and norms are related by the following theorem.

Theorem IPN
Inner Products and Norms
Suppose that u is a vector in {ℂ}^{m}. Then {\left \Vert u\right \Vert }^{2} = \left \langle u,\kern 1.95872pt u\right \rangle . □

\eqalignno{ {\left \Vert u\right \Vert }^{2} & ={ \left (\sqrt{{\mathop{∑ }}_{i=1}^{m}{\left \vert {\left [u\right ]}_{i}\right \vert }^{2}}\right )}^{2} & &\text{Definition NV} \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left \vert {\left [u\right ]}_{ i}\right \vert }^{2} & & \cr & ={ \mathop{∑ }}_{i=1}^{m}{\left [u\right ]}_{ i}\overline{{\left [u\right ]}_{i}} & &\text{Definition MCN} \cr & = \left \langle u,\kern 1.95872pt u\right \rangle & &\text{Definition IP} }

When our vectors have entries only from the real numbers, Theorem IPN says that the dot product of a vector with itself is equal to the length of the vector squared.
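To see Definition NV and Theorem IPN side by side, here is a short Python sketch. The vector u is an illustrative choice (not taken from the text), and `norm` and `inner` are hypothetical helper names.

```python
import math

def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm(u):
    """Norm: square root of the sum of the squared moduli of the entries."""
    return math.sqrt(sum(abs(entry) ** 2 for entry in u))

u = [3 + 4j, 0j, 12j]            # moduli 5, 0, 12, so the norm is 13

print(norm(u))
print(norm(u) ** 2, inner(u, u))  # Theorem IPN: these agree

# With real entries the norm is the familiar length of the vector.
r = [3, 4]
print(norm(r))
```

Note that `inner(u, u)` is returned as a complex number whose imaginary part is zero, which matches the remark that the inner product of a vector with itself is real.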

Theorem PIP
Positive Inner Products
Suppose that u is a vector in {ℂ}^{m}. Then \left \langle u,\kern 1.95872pt u\right \rangle ≥ 0 with equality if and only if u = 0. □

Since each modulus is squared, every term is nonnegative, and the sum must also be nonnegative. (Notice that in general the inner product is a complex number and cannot be compared with zero, but in the special case of \left \langle u,\kern 1.95872pt u\right \rangle the result is a real number.) The phrase, “with equality if and only if” means that we want to show that the statement \left \langle u,\kern 1.95872pt u\right \rangle = 0 (i.e. with equality) is equivalent (“if and only if”) to the statement u = 0.

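If u = 0, then it is a straightforward computation to see that \left \langle u,\kern 1.95872pt u\right \rangle = 0. In the other direction, assume that \left \langle u,\kern 1.95872pt u\right \rangle = 0. As before, \left \langle u,\kern 1.95872pt u\right \rangle is a sum of squared moduli. So we have

\eqalignno{ 0 = \left \langle u,\kern 1.95872pt u\right \rangle & ={ \left \vert {\left [u\right ]}_{1}\right \vert }^{2} +{ \left \vert {\left [u\right ]}_{2}\right \vert }^{2} +{ \left \vert {\left [u\right ]}_{3}\right \vert }^{2} + \mathrel{⋯} +{ \left \vert {\left [u\right ]}_{m}\right \vert }^{2} & & }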

Now we have a sum of squares equaling zero, so each term must be zero. Then by similar logic, \left \vert {\left [u\right ]}_{i}\right \vert = 0 will imply that {\left [u\right ]}_{i} = 0, since 0 + 0i is the only complex number with zero modulus. Thus every entry of u is zero and so u = 0, as desired. ■

\eqalignno{ u ∈ {ℂ}^{m} & ⇒\left \langle u,\kern 1.95872pt u\right \rangle ≥ 0 & & \cr u = 0 & ⇒\left \langle u,\kern 1.95872pt u\right \rangle = 0 & & \cr \left \langle u,\kern 1.95872pt u\right \rangle = 0 & ⇒ u = 0 & & }

The results contained in Theorem PIP are summarized by saying “the inner product is positive definite.”

Subsection OV: Orthogonal Vectors

“Orthogonal” is a generalization of “perpendicular.” You may have used mutually perpendicular vectors in a physics class, or you may recall from a calculus class that perpendicular vectors have a zero dot product. We will now extend these ideas into the realm of higher dimensions and complex scalars.

Definition OV
Orthogonal Vectors
A pair of vectors, u and v, from {ℂ}^{m} are orthogonal if their inner product is zero, that is, \left \langle u,\kern 1.95872pt v\right \rangle = 0. △

\eqalignno{ u & = \left [\array{ 2 + 3i \cr 4 − 2i \cr 1 + i \cr 1 + i } \right ] &v & = \left [\array{ 1 − i \cr 2 + 3i \cr 4 − 6i\cr 1 } \right ] & & & & }

\eqalignno{ \left \langle u,\kern 1.95872pt v\right \rangle & = (2 + 3i)(1 + i) + (4 − 2i)(2 − 3i) + (1 + i)(4 + 6i) + (1 + i)(1) & & \cr & = (−1 + 5i) + (2 − 16i) + (−2 + 10i) + (1 + i) & & \cr & = 0 + 0i. & & }
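The orthogonality test of Definition OV is easy to automate. The sketch below uses hypothetical helpers `inner` and `are_orthogonal`; the optional tolerance `tol` is an added assumption, useful when entries are not exact integers and rounding error could leave a tiny nonzero result.

```python
def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def are_orthogonal(u, v, tol=0.0):
    """True when the modulus of <u, v> is at most tol (exactly zero by default)."""
    return abs(inner(u, v)) <= tol

# The two vectors of the example above.
u = [2 + 3j, 4 - 2j, 1 + 1j, 1 + 1j]
v = [1 - 1j, 2 + 3j, 4 - 6j, 1 + 0j]
print(are_orthogonal(u, v))

# A pair that is not orthogonal: the vectors whose inner product was 3 - 19i.
a = [2 + 3j, 5 + 2j, -3 + 1j]
b = [1 + 2j, -4 + 5j, 0 + 5j]
print(are_orthogonal(a, b))
```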

We extend this definition to whole sets by requiring vectors to be pairwise orthogonal. Although the same word is used for pairs and for sets, careful thought about which objects you are working with will eliminate any source of confusion.

Definition OSV
Orthogonal Set of Vectors
Suppose that S = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{n}\right \} is a set of vectors from {ℂ}^{m}. Then S is an orthogonal set if every pair of different vectors from S is orthogonal, that is \left \langle {u}_{i},\kern 1.95872pt {u}_{j}\right \rangle = 0 whenever i\mathrel{≠}j. △

We now define the prototypical orthogonal set, which we will reference repeatedly.

Definition SUV
Standard Unit Vectors
Let {e}_{j} ∈ {ℂ}^{m}, 1 ≤ j ≤ m denote the column vectors defined by

\eqalignno{ {\left [{e}_{j}\right ]}_{i} & = \left \{\array{ 0\quad &\text{if $i\mathrel{≠}j$}\cr 1\quad &\text{if $i = j$} } \right . & & }

\eqalignno{ \left \{{e}_{1},\kern 1.95872pt {e}_{2},\kern 1.95872pt {e}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {e}_{m}\right \} & = \left \{{e}_{j}\mathrel{∣}1 ≤ j ≤ m\right \} & & }

Notice that {e}_{j} is identical to column j of the m × m identity matrix {I}_{m} (Definition IM). This observation will often be useful. It is not hard to see that the set of standard unit vectors is an orthogonal set. We will reserve the notation {e}_{i} for these vectors.

\eqalignno{ \left \langle {e}_{i},\kern 1.95872pt {e}_{j}\right \rangle & = 0\overline{0} + 0\overline{0} + \mathrel{⋯} + 1\overline{0} + \mathrel{⋯} + 0\overline{0} + \mathrel{⋯} + 0\overline{1} + \mathrel{⋯} + 0\overline{0} + 0\overline{0} & & \cr & = 0(0) + 0(0) + \mathrel{⋯} + 1(0) + \mathrel{⋯} + 0(1) + \mathrel{⋯} + 0(0) + 0(0) & & \cr & = 0 & & }

So the set \left \{{e}_{1},\kern 1.95872pt {e}_{2},\kern 1.95872pt {e}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {e}_{m}\right \} is an orthogonal set. ⊠
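The standard unit vectors are simple to generate, and their pairwise inner products can be tabulated directly. In this sketch `standard_unit_vector` is a hypothetical helper that uses the 1-based index j of Definition SUV, and m = 4 is an arbitrary choice.

```python
def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def standard_unit_vector(j, m):
    """e_j in C^m: entry i is 1 when i == j and 0 otherwise (1-based indices)."""
    return [1 + 0j if i == j else 0j for i in range(1, m + 1)]

m = 4
E = [standard_unit_vector(j, m) for j in range(1, m + 1)]

# Table of all inner products <e_i, e_j>; off-diagonal entries should be zero.
gram = [[inner(E[i], E[j]) for j in range(m)] for i in range(m)]
for row in gram:
    print(row)
```

The diagonal entries of this table are all 1, so each standard unit vector also has norm 1.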

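The set

\eqalignno{ \left \{{x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt {x}_{4}\right \} & = \left \{\left [\array{ 1 + i\cr 1 \cr 1 − i\cr i } \right ],\kern 1.95872pt \left [\array{ 1 + 5i\cr 6 + 5i \cr −7 − i\cr 1 − 6i } \right ],\kern 1.95872pt \left [\array{ −7 + 34i\cr −8 − 23i \cr −10 + 22i\cr 30 + 13i } \right ],\kern 1.95872pt \left [\array{ −2 − 4i\cr 6 + i \cr 4 + 3i\cr 6 − i } \right ]\right \} & & }

is an orthogonal set. Since the inner product is anti-commutative (Theorem IPAC) we can test pairs of different vectors in any order. If the result is zero, then it will also be zero if the inner product is computed in the opposite order. This means there are six pairs of different vectors to use in an inner product computation. We’ll do two and you can practice your inner products on the other four.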

\eqalignno{ \left \langle {x}_{1},\kern 1.95872pt {x}_{3}\right \rangle & = (1 + i)(−7 − 34i) + (1)(−8 + 23i) + (1 − i)(−10 − 22i) + (i)(30 − 13i)&& \cr & = (27 − 41i) + (−8 + 23i) + (−32 − 12i) + (13 + 30i) && \cr & = 0 + 0i && \text{and} \cr \left \langle {x}_{2},\kern 1.95872pt {x}_{4}\right \rangle & = (1 + 5i)(−2 + 4i) + (6 + 5i)(6 − i) + (−7 − i)(4 − 3i) + (1 − 6i)(6 + i)&& \cr & = (−22 − 6i) + (41 + 24i) + (−31 + 17i) + (12 − 35i) && \cr & = 0 + 0i && }
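All six pairs can be checked at once by machine. In the sketch below the four vectors are read off from the two displayed computations (the entries of the second vector in each product appear conjugated there); `inner` is a hypothetical helper and `itertools.combinations` enumerates the pairs of different vectors.

```python
from itertools import combinations

def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

vectors = {
    "x1": [1 + 1j, 1 + 0j, 1 - 1j, 1j],
    "x2": [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
    "x3": [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
    "x4": [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j],
}

# Every unordered pair of different vectors: 4 choose 2 = 6 pairs.
results = {
    (p, q): inner(vectors[p], vectors[q])
    for p, q in combinations(vectors, 2)
}

for pair, value in results.items():
    print(pair, value)
```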

So far, this section has seen lots of definitions, and lots of theorems establishing un-surprising consequences of those definitions. But here is our first theorem that suggests that inner products and orthogonal vectors have some utility. It is also one of our first illustrations of how to arrive at linear independence as the conclusion of a theorem.

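Theorem OSLI
Orthogonal Sets are Linearly Independent
Suppose that S is an orthogonal set of nonzero vectors. Then S is linearly independent. □

Proof Let S = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{n}\right \} be an orthogonal set of nonzero vectors. To prove the linear independence of S, we can appeal to the definition (Definition LICV) and begin with an arbitrary relation of linear dependence (Definition RLDCV),

\eqalignno{ {α}_{1}{u}_{1} + {α}_{2}{u}_{2} + {α}_{3}{u}_{3} + \mathrel{⋯} + {α}_{n}{u}_{n} & = 0. & & }

Then, for every 1 ≤ i ≤ n, we have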

\eqalignno{ {α}_{i}& = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left ({α}_{i}\left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle \right ) &&\text{Theorem PIP} \cr & = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left ({α}_{1}(0) + {α}_{2}(0) + \mathrel{⋯} + {α}_{i}\left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle + \mathrel{⋯} + {α}_{n}(0)\right )&&\text{Property ZCN} \cr & = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left ({α}_{1}\left \langle {u}_{1},\kern 1.95872pt {u}_{i}\right \rangle + \mathrel{⋯} + {α}_{i}\left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle + \mathrel{⋯} + {α}_{n}\left \langle {u}_{n},\kern 1.95872pt {u}_{i}\right \rangle \right ) &&\text{Definition OSV} \cr & = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left (\left \langle {α}_{1}{u}_{1},\kern 1.95872pt {u}_{i}\right \rangle + \left \langle {α}_{2}{u}_{2},\kern 1.95872pt {u}_{i}\right \rangle + \mathrel{⋯} + \left \langle {α}_{n}{u}_{n},\kern 1.95872pt {u}_{i}\right \rangle \right ) &&\text{Theorem IPSM} \cr & = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left \langle {α}_{1}{u}_{1} + {α}_{2}{u}_{2} + {α}_{3}{u}_{3} + \mathrel{⋯} + {α}_{n}{u}_{n},\kern 1.95872pt {u}_{i}\right \rangle &&\text{Theorem IPVA} \cr & = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left \langle 0,\kern 1.95872pt {u}_{i}\right \rangle &&\text{Definition RLDCV} \cr & = {1\over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\kern 1.95872pt 0 &&\text{Definition IP} \cr & = 0 &&\text{Property ZCN} }

So we conclude that {α}_{i} = 0 for all 1 ≤ i ≤ n in any relation of linear dependence on S. But this says that S is a linearly independent set since the only way to form a relation of linear dependence is the trivial way (Definition LICV). Boom! ■

Subsection GSP: Gram-Schmidt Procedure

The Gram-Schmidt Procedure is really a theorem. It says that if we begin with a linearly independent set of p vectors, S, then we can do a number of calculations with these vectors and produce an orthogonal set of p vectors, T, so that \left \langle S\right \rangle = \left \langle T\right \rangle . Given the large number of computations involved, it is indeed a procedure to do all the necessary computations, and it is best employed on a computer. However, it also has value in proofs where we may on occasion wish to replace a linearly independent set by an orthogonal set.

This is our first occasion to use the technique of “mathematical induction” for a proof, a technique we will see again several times, especially in Chapter D. So study the simple example described in Technique I first.

Theorem GSP
Gram-Schmidt Procedure
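Suppose that S = \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{p}\right \} is a linearly independent set of vectors in {ℂ}^{m}. Define the vectors {u}_{i}, 1 ≤ i ≤ p by

\eqalignno{ {u}_{i} & = {v}_{i} −{\left \langle {v}_{i},\kern 1.95872pt {u}_{1}\right \rangle \over \left \langle {u}_{1},\kern 1.95872pt {u}_{1}\right \rangle }{u}_{1} −{\left \langle {v}_{i},\kern 1.95872pt {u}_{2}\right \rangle \over \left \langle {u}_{2},\kern 1.95872pt {u}_{2}\right \rangle }{u}_{2} −{\left \langle {v}_{i},\kern 1.95872pt {u}_{3}\right \rangle \over \left \langle {u}_{3},\kern 1.95872pt {u}_{3}\right \rangle }{u}_{3} −\mathrel{⋯} − {\left \langle {v}_{i},\kern 1.95872pt {u}_{i−1}\right \rangle \over \left \langle {u}_{i−1},\kern 1.95872pt {u}_{i−1}\right \rangle }{u}_{i−1} & & }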

Let T = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{p}\right \}. Then T is an orthogonal set of nonzero vectors, and \left \langle T\right \rangle = \left \langle S\right \rangle . □

Proof We will prove the result by using induction on p (Technique I). To begin, we prove that T has the desired properties when p = 1. In this case {u}_{1} = {v}_{1} and T = \left \{{u}_{1}\right \} = \left \{{v}_{1}\right \} = S. Because S and T are equal, \left \langle S\right \rangle = \left \langle T\right \rangle . Equally trivial, T is an orthogonal set. If {u}_{1} = 0, then S would be a linearly dependent set, a contradiction.

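Suppose that the theorem is true for any set of p − 1 linearly independent vectors. Let S = \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{p}\right \} be a linearly independent set of p vectors. Then {S}^{′} = \left \{{v}_{ 1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{p−1}\right \} is also linearly independent. So we can apply the theorem to {S}^{′} and construct the vectors {T}^{′} = \left \{{u}_{ 1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{p−1}\right \}. {T}^{′} is therefore an orthogonal set of nonzero vectors and \left \langle {S}^{′}\right \rangle = \left \langle {T}^{′}\right \rangle . Define

\eqalignno{ {u}_{p} & = {v}_{p} −{\left \langle {v}_{p},\kern 1.95872pt {u}_{1}\right \rangle \over \left \langle {u}_{1},\kern 1.95872pt {u}_{1}\right \rangle }{u}_{1} −{\left \langle {v}_{p},\kern 1.95872pt {u}_{2}\right \rangle \over \left \langle {u}_{2},\kern 1.95872pt {u}_{2}\right \rangle }{u}_{2} −{\left \langle {v}_{p},\kern 1.95872pt {u}_{3}\right \rangle \over \left \langle {u}_{3},\kern 1.95872pt {u}_{3}\right \rangle }{u}_{3} −\mathrel{⋯} − {\left \langle {v}_{p},\kern 1.95872pt {u}_{p−1}\right \rangle \over \left \langle {u}_{p−1},\kern 1.95872pt {u}_{p−1}\right \rangle }{u}_{p−1} & & }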

and let T = {T}^{′}∪\left \{{u}_{ p}\right \}. We need to now show that T has several properties by building on what we know about {T}^{′}. But first notice that the above equation has no problems with the denominators (\left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle ) being zero, since the {u}_{i} are from {T}^{′}, which is composed of nonzero vectors.

We show that \left \langle T\right \rangle = \left \langle S\right \rangle , by first establishing that \left \langle T\right \rangle ⊆\left \langle S\right \rangle . Suppose x ∈\left \langle T\right \rangle , so

\eqalignno{ x & = {a}_{1}{u}_{1} + {a}_{2}{u}_{2} + {a}_{3}{u}_{3} + \mathrel{⋯} + {a}_{p}{u}_{p} & & }

The term {a}_{p}{u}_{p} is a linear combination of vectors from {T}^{′} and the vector {v}_{p}, while the remaining terms are a linear combination of vectors from {T}^{′}. Since \left \langle {T}^{′}\right \rangle = \left \langle {S}^{′}\right \rangle , any term that is a multiple of a vector from {T}^{′} can be rewritten as a linear combination of vectors from {S}^{′}. The remaining term {a}_{p}{v}_{p} is a multiple of a vector in S. So we see that x can be rewritten as a linear combination of vectors from S, i.e. x ∈\left \langle S\right \rangle .

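To show that \left \langle S\right \rangle ⊆\left \langle T\right \rangle , begin with y ∈\left \langle S\right \rangle , so

\eqalignno{ y & = {a}_{1}{v}_{1} + {a}_{2}{v}_{2} + {a}_{3}{v}_{3} + \mathrel{⋯} + {a}_{p}{v}_{p} & & }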

Rearrange our defining equation for {u}_{p} by solving for {v}_{p}. Then the term {a}_{p}{v}_{p} is a multiple of a linear combination of elements of T. The remaining terms are a linear combination of {v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{p−1}, hence an element of \left \langle {S}^{′}\right \rangle = \left \langle {T}^{′}\right \rangle . Thus these remaining terms can be written as a linear combination of the vectors in {T}^{′}. So y is a linear combination of vectors from T, i.e. y ∈\left \langle T\right \rangle .

The elements of {T}^{′} are nonzero, but what about {u}_{p}? Suppose to the contrary that {u}_{p} = 0,

\eqalignno{ 0 & = {u}_{p} = {v}_{p} −{\left \langle {v}_{p},\kern 1.95872pt {u}_{1}\right \rangle \over \left \langle {u}_{1},\kern 1.95872pt {u}_{1}\right \rangle }{u}_{1} −{\left \langle {v}_{p},\kern 1.95872pt {u}_{2}\right \rangle \over \left \langle {u}_{2},\kern 1.95872pt {u}_{2}\right \rangle }{u}_{2} −{\left \langle {v}_{p},\kern 1.95872pt {u}_{3}\right \rangle \over \left \langle {u}_{3},\kern 1.95872pt {u}_{3}\right \rangle }{u}_{3} −\mathrel{⋯} − {\left \langle {v}_{p},\kern 1.95872pt {u}_{p−1}\right \rangle \over \left \langle {u}_{p−1},\kern 1.95872pt {u}_{p−1}\right \rangle }{u}_{p−1} & & \cr &{v}_{p} = {\left \langle {v}_{p},\kern 1.95872pt {u}_{1}\right \rangle \over \left \langle {u}_{1},\kern 1.95872pt {u}_{1}\right \rangle }{u}_{1} + {\left \langle {v}_{p},\kern 1.95872pt {u}_{2}\right \rangle \over \left \langle {u}_{2},\kern 1.95872pt {u}_{2}\right \rangle }{u}_{2} + {\left \langle {v}_{p},\kern 1.95872pt {u}_{3}\right \rangle \over \left \langle {u}_{3},\kern 1.95872pt {u}_{3}\right \rangle }{u}_{3} + \mathrel{⋯} + {\left \langle {v}_{p},\kern 1.95872pt {u}_{p−1}\right \rangle \over \left \langle {u}_{p−1},\kern 1.95872pt {u}_{p−1}\right \rangle }{u}_{p−1} & & }

Since \left \langle {S}^{′}\right \rangle = \left \langle {T}^{′}\right \rangle we can write the vectors {u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{p−1} on the right side of this equation in terms of the vectors {v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{p−1} and we then have the vector {v}_{p} expressed as a linear combination of the other p − 1 vectors in S, implying that S is a linearly dependent set (Theorem DLDS), contrary to our lone hypothesis about S.

Finally, it is a simple matter to establish that T is an orthogonal set, though it will not appear so simple looking. Think about your objects as you work through the following — what is a vector and what is a scalar. Since {T}^{′} is an orthogonal set by induction, most pairs of elements in T are already known to be orthogonal. We just need to test “new” inner products, between {u}_{p} and {u}_{i}, for 1 ≤ i ≤ p − 1. Here we go, using summation notation,

\eqalignno{ \left \langle {u}_{p},\kern 1.95872pt {u}_{i}\right \rangle & = \left \langle {v}_{p} −{\mathop{∑ }}_{k=1}^{p−1} {\left \langle {v}_{p},\kern 1.95872pt {u}_{k}\right \rangle \over \left \langle {u}_{k},\kern 1.95872pt {u}_{k}\right \rangle }{u}_{k},\kern 1.95872pt {u}_{i}\right \rangle & & \cr & = \left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle −\left \langle {\mathop{∑ }}_{k=1}^{p−1} {\left \langle {v}_{p},\kern 1.95872pt {u}_{k}\right \rangle \over \left \langle {u}_{k},\kern 1.95872pt {u}_{k}\right \rangle }{u}_{k},\kern 1.95872pt {u}_{i}\right \rangle & &\text{Theorem IPVA} \cr & = \left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle −{\mathop{∑ }}_{k=1}^{p−1}\left \langle {\left \langle {v}_{p},\kern 1.95872pt {u}_{k}\right \rangle \over \left \langle {u}_{k},\kern 1.95872pt {u}_{k}\right \rangle }{u}_{k},\kern 1.95872pt {u}_{i}\right \rangle & &\text{Theorem IPVA} \cr & = \left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle −{\mathop{∑ }}_{k=1}^{p−1} {\left \langle {v}_{p},\kern 1.95872pt {u}_{k}\right \rangle \over \left \langle {u}_{k},\kern 1.95872pt {u}_{k}\right \rangle }\left \langle {u}_{k},\kern 1.95872pt {u}_{i}\right \rangle & &\text{Theorem IPSM} \cr & = \left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle −{\left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle \over \left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle }\left \langle {u}_{i},\kern 1.95872pt {u}_{i}\right \rangle −{\mathop{∑ }}_{k\mathrel{≠}i}{\left \langle {v}_{p},\kern 1.95872pt {u}_{k}\right \rangle \over \left \langle {u}_{k},\kern 1.95872pt {u}_{k}\right \rangle }(0) & &\text{Induction Hypothesis} \cr & = \left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle −\left \langle {v}_{p},\kern 1.95872pt {u}_{i}\right \rangle −{\mathop{∑ }}_{k\mathrel{≠}i}0 & & \cr & = 0 & & }

Example GSTV
Gram-Schmidt of three vectors
We will illustrate the Gram-Schmidt process with three vectors. Begin with the linearly independent (check this!) set

\eqalignno{ {u}_{1} & = {v}_{1} = \left [\array{ 1\cr 1 + i \cr 1 } \right ] & & \cr {u}_{2} & = {v}_{2} −{\left \langle {v}_{2},\kern 1.95872pt {u}_{1}\right \rangle \over \left \langle {u}_{1},\kern 1.95872pt {u}_{1}\right \rangle }{u}_{1} = {1\over 4}\left [\array{ −2 − 3i \cr 1 − i \cr 2 + 5i } \right ] & & \cr {u}_{3} & = {v}_{3} −{\left \langle {v}_{3},\kern 1.95872pt {u}_{1}\right \rangle \over \left \langle {u}_{1},\kern 1.95872pt {u}_{1}\right \rangle }{u}_{1} −{\left \langle {v}_{3},\kern 1.95872pt {u}_{2}\right \rangle \over \left \langle {u}_{2},\kern 1.95872pt {u}_{2}\right \rangle }{u}_{2} = {1\over 11}\left [\array{ −3 − i \cr 1 + 3i \cr −1 − i } \right ] & & }

Then T = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3}\right \} is an orthogonal set (which you can check) of nonzero vectors and \left \langle T\right \rangle = \left \langle S\right \rangle (all by Theorem GSP). Of course, as a by-product of orthogonality, the set T is also linearly independent (Theorem OSLI). ⊠
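The formula of Theorem GSP translates almost line for line into code. The following Python function is a minimal sketch, not a numerically robust implementation: it assumes the input is linearly independent (so no denominator is zero) and does nothing to control rounding error. The names `inner` and `gram_schmidt` are hypothetical. The first input vector is the v_1 shown in the example; the second and third input vectors are an assumption, chosen because they reproduce the displayed u_2 and u_3.

```python
def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return u_1, ..., u_p built from v_1, ..., v_p as in Theorem GSP.

    Assumes the input list is linearly independent, so that every
    <u_k, u_k> used as a denominator is nonzero.
    """
    orthogonal = []
    for v in vectors:
        u = list(v)
        for prev in orthogonal:
            # Subtract the multiple <v, u_k> / <u_k, u_k> of each earlier u_k.
            coeff = inner(v, prev) / inner(prev, prev)
            u = [a - coeff * b for a, b in zip(u, prev)]
        orthogonal.append(u)
    return orthogonal

S = [
    [1 + 0j, 1 + 1j, 1 + 0j],   # v_1, as in the example
    [-1j, 1 + 0j, 1 + 1j],      # v_2 (assumed input)
    [0j, 1j, 1j],               # v_3 (assumed input)
]
T = gram_schmidt(S)

for u in T:
    print(u)
```

Up to rounding, the second and third outputs should match (1/4)(−2 − 3i, 1 − i, 2 + 5i) and (1/11)(−3 − i, 1 + 3i, −1 − i).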

Definition ONS
OrthoNormal Set
Suppose S = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{n}\right \} is an orthogonal set of vectors such that \left \Vert {u}_{i}\right \Vert = 1 for all 1 ≤ i ≤ n. Then S is an orthonormal set of vectors. △

Once you have an orthogonal set, it is easy to convert it to an orthonormal set — multiply each vector by the reciprocal of its norm, and the resulting vector will have norm 1. This scaling of each vector will not affect the orthogonality properties (apply Theorem IPSM).

\eqalignno{ \left \Vert {u}_{1}\right \Vert = 2 & &\left \Vert {u}_{2}\right \Vert = {1\over 2}\sqrt{11} & &\left \Vert {u}_{3}\right \Vert = {\sqrt{2}\over \sqrt{11}} & & & & & & }

\eqalignno{ {w}_{1} & = {1\over 2}\left [\array{ 1\cr 1 + i \cr 1 } \right ] & & \cr {w}_{2} & = {1\over {1\over 2}\sqrt{11}} {1\over 4}\left [\array{ −2 − 3i \cr 1 − i \cr 2 + 5i } \right ] = {1\over 2\sqrt{11}}\left [\array{ −2 − 3i \cr 1 − i \cr 2 + 5i } \right ] & & \cr {w}_{3} & = {1\over {\sqrt{2}\over \sqrt{11}}} {1\over 11}\left [\array{ −3 − i \cr 1 + 3i \cr −1 − i } \right ] = {1\over \sqrt{22}}\left [\array{ −3 − i \cr 1 + 3i \cr −1 − i } \right ] & & }
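Scaling an orthogonal set to an orthonormal one is a one-line operation per vector. The sketch below starts from the three orthogonal vectors displayed above; `norm` and `normalize` are hypothetical helper names, and `norm` is computed through Theorem IPN as the square root of the inner product of a vector with itself.

```python
import math

def inner(u, v):
    """Inner product, conjugating entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(u):
    """Norm via Theorem IPN: the square root of <u, u>, which is real."""
    return math.sqrt(inner(u, u).real)

def normalize(u):
    """Multiply u by the reciprocal of its norm (u must be nonzero)."""
    n = norm(u)
    return [a / n for a in u]

u1 = [1 + 0j, 1 + 1j, 1 + 0j]
u2 = [c / 4 for c in (-2 - 3j, 1 - 1j, 2 + 5j)]
u3 = [c / 11 for c in (-3 - 1j, 1 + 3j, -1 - 1j)]

print(norm(u1), norm(u2), norm(u3))   # compare with 2, sqrt(11)/2, sqrt(2)/sqrt(11)

W = [normalize(u) for u in (u1, u2, u3)]
print([norm(w) for w in W])           # each should be 1 up to rounding
```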

to an orthogonal set via the Gram-Schmidt Process (Theorem GSP) and then scale the vectors to norm 1 to create an orthonormal set. You should get the same set you would if you scaled the orthogonal set of Example AOS to become an orthonormal set. ⊠

It is crazy to do all but the simplest and smallest instances of the Gram-Schmidt procedure by hand. Well, OK, maybe just once or twice to get a good understanding of Theorem GSP. After that, let a machine do the work for you. That’s what they are for. See: Computation GSP.MMA

We will see orthonormal sets again in Subsection MINM.UM. They are intimately related to unitary matrices (Definition UM) through Theorem CUMOS. Some of the utility of orthonormal sets is captured by Theorem COB in Subsection B.OBC. Orthonormal sets appear once again in Section OD where they are key in orthonormal diagonalization.

Subsection READ: Reading Questions

Subsection EXC: Exercises

C20 Complete Example AOS by verifying that the four remaining inner products are zero.

C21 Verify that the set T created in Example GSTV by the Gram-Schmidt Procedure is an orthogonal set.
Contributed by Robert Beezer

M60 Suppose that \left \{u,\kern 1.95872pt v,\kern 1.95872pt w\right \} ⊆ {ℂ}^{n} is an orthonormal set. Prove that u + v is not orthogonal to v + w.
Contributed by Manley Perkel

T20 Suppose that u,\kern 1.95872pt v,\kern 1.95872pt w ∈ {ℂ}^{n}, α,\kern 1.95872pt β ∈ ℂ and u is orthogonal to both v and w. Prove that u is orthogonal to αv + βw.
Contributed by Robert Beezer

T30 Suppose that the set S in the hypothesis of Theorem GSP is not just linearly independent, but is also orthogonal. Prove that the set T created by the Gram-Schmidt procedure is equal to S. (Note that we are getting a stronger conclusion than \left \langle T\right \rangle = \left \langle S\right \rangle — the conclusion is that T = S.) In other words, it is pointless to apply the Gram-Schmidt procedure to a set that is already orthogonal.
Contributed by Steve Canfield

Subsection SOL: Solutions

T20 Vectors are orthogonal if their inner product is zero (Definition OV), so we compute,

\eqalignno{ \left \langle αv + βw,\kern 1.95872pt u\right \rangle & = \left \langle αv,\kern 1.95872pt u\right \rangle + \left \langle βw,\kern 1.95872pt u\right \rangle & &\text{Theorem IPVA} \cr & = α\left \langle v,\kern 1.95872pt u\right \rangle + β\left \langle w,\kern 1.95872pt u\right \rangle & &\text{Theorem IPSM} \cr & = α\left (0\right ) + β\left (0\right ) & &\text{Definition OV} \cr & = 0 & & }

Annotated Acronyms V: Vectors

Theorem VSPCV
These are the fundamental rules for working with the addition, and scalar multiplication, of column vectors. We will see something very similar in the next chapter (Theorem VSPM) and then this will be generalized into what is arguably our most important definition, Definition VS.

Theorem SLSLC
Vector addition and scalar multiplication are the two fundamental operations on vectors, and linear combinations roll them both into one. Theorem SLSLC connects linear combinations with systems of equations. This one we will see often enough that it is worth memorizing.

Theorem PSPHS
This theorem is interesting in its own right, and sometimes the vagueness surrounding the choice of z can seem mysterious. But we list it here because we will see an important theorem in Section ILT which will generalize this result (Theorem KPI).

Theorem LIVRN
If you have a set of column vectors, this is the fastest computational approach to determine if the set is linearly independent. Make the vectors the columns of a matrix, row-reduce, compare r and n. That’s it — and you always get an answer. Put this one in your toolkit.

Theorem BNS
We will have several theorems (all listed in these “Annotated Acronyms” sections) whose conclusions will provide a linearly independent set of vectors whose span equals some set of interest (the null space here). While the notation in this theorem might appear gruesome, in practice it can become very routine to apply. So practice this one — we’ll be using it all through the book.

Theorem BS
As promised, another theorem that provides a linearly independent set of vectors whose span equals some set of interest (a span now). You can use this one to clean up any span.