From A First Course in Linear Algebra
Version 2.30
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/
In this section we collect many short essays designed to help you understand
how to read, understand and construct proofs. Some are very factual, while others
consist of advice. They appear in the order that they are first needed (or
advisable) in the text, and are meant to be self-contained. So you should not
think of reading through this section in one sitting as you begin this course. But
be sure to head back here for a first reading whenever the text suggests
it. Also think about returning to browse at various points during the
course, and especially as you struggle with becoming an accomplished
mathematician who is comfortable with the difficult process of designing new
proofs.
A definition is a made-up term, used as a kind of shortcut for some typically more complicated idea. For example, we say a whole number is even as a shortcut for saying that when we divide the number by two we get a remainder of zero. With a precise definition, we can answer certain questions unambiguously. For example, did you ever wonder if zero was an even number? Now the answer should be clear since we have a precise definition of what we mean by the term even.
A single term might have several possible definitions. For example, we could say that the whole number n is even if there is another whole number k such that n = 2k. We say this is an equivalent definition since it categorizes even numbers the same way our first definition does.
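The claim that the two definitions categorize whole numbers the same way can be spot-checked on a finite range. The sketch below is only an illustration: the function names are hypothetical, and a finite check is not a proof of equivalence.

```python
def is_even_by_remainder(n: int) -> bool:
    """First definition: dividing n by two leaves a remainder of zero."""
    return n % 2 == 0


def is_even_by_factor(n: int) -> bool:
    """Second definition: there is a whole number k with n == 2 * k."""
    # For a whole number n >= 0, any such k must lie between 0 and n.
    return any(n == 2 * k for k in range(0, n + 1))


# Compare the two definitions on the whole numbers 0, 1, ..., 199.
agree = all(is_even_by_remainder(n) == is_even_by_factor(n) for n in range(200))
print(agree)

# Both definitions settle the question about zero the same way.
print(is_even_by_remainder(0), is_even_by_factor(0))
```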
Definitions are like two-way streets — we can use a definition to replace something rather complicated by its definition (if it fits) and we can replace a definition by its more complicated description. A definition is usually written as some form of an implication, such as “If something-nice-happens, then blatzo.” However, this also means that “If blatzo, then something-nice-happens,” even though this may not be formally stated. This is what we mean when we say a definition is a two-way street — it is really two implications, going in opposite “directions.”
Anybody (including you) can make up a definition, so long as it is unambiguous, but the real test of a definition’s utility is whether or not it is useful for describing interesting or frequent situations.
We will talk about theorems later (and especially equivalences). For now, be sure not to confuse the notion of a definition with that of a theorem.
In this book, we will display every new definition carefully set-off from the text, and the term being defined will be written thus: definition. Additionally, there is a full list of all the definitions, in order of their appearance, located at the front of the book (Definitions). Finally, the acronym for each definition can be found in the index (Index). Definitions are critical to doing mathematics and proving theorems, so we’ve given you lots of ways to locate a definition should you forget its…uh, well, …definition.
Can you formulate a precise definition for what it means for a number to be odd? (Don’t just say it is the opposite of even. Act as if you don’t have a definition for even yet.) Can you formulate your definition a second, equivalent, way? Can you employ your definition to test an odd and an even number for “odd-ness”?
Higher mathematics is about understanding theorems. Reading them, understanding them, applying them, proving them. Every theorem is a shortcut — we prove something in general, and then whenever we find a specific instance covered by the theorem we can immediately say that we know something else about the situation by applying the theorem. In many cases, this new information can be gained with much less effort than if we did not know the theorem.
The first step in understanding a theorem is to realize that the statement of every theorem can be rewritten using statements of the form “If something-happens, then something-else-happens.” The “something-happens” part is the hypothesis and the “something-else-happens” is the conclusion. To understand a theorem, it helps to rewrite its statement using this construction. To apply a theorem, we verify that “something-happens” in a particular instance and immediately conclude that “something-else-happens.” To prove a theorem, we must argue based on the assumption that the hypothesis is true, and arrive through the process of logic that the conclusion must then also be true.
Like any science, the language of math must be understood before further study can continue.
Erin Wilson, Student
September, 2004
Mathematics is a language. It is a way to express complicated ideas clearly, precisely, and unambiguously. Because of this, it can be difficult to read. Read slowly, and have pencil and paper at hand. It will usually be necessary to read something several times. While reading can be difficult, it is even harder to speak mathematics, and so that is the topic of this technique.
“Natural” language, in the present case English, is fraught with ambiguity. Consider the possible meanings of the sentence: The fish is ready to eat. One fish, or two fish? Are the fish hungry, or will the fish be eaten? (See Exercise SSLE.M10, Exercise SSLE.M11, Exercise SSLE.M12, Exercise SSLE.M13.) In your daily interactions with others, give some thought to how many misunderstandings arise from the ambiguity of pronouns, modifiers and objects.
I am going to suggest a simple modification to the way you use language that will make it much, much easier to become proficient at speaking mathematics; eventually it will become second nature. Think of it as a training aid or practice drill you might use when learning to become skilled at a sport.
First, eliminate pronouns from your vocabulary when discussing linear algebra, in class or with your colleagues. Do not use: it, that, those, their or similar sources of confusion. This is the single easiest step you can take to make your oral expression of mathematics clearer to others, and in turn, it will greatly help your own understanding.
Second, rid yourself of the word “thing” (or variants like “something”). When you are tempted to use this word realize that there is some object you want to discuss, and we likely have a definition for that object (see the discussion at Technique D). Always “think about your objects” and many aspects of the study of mathematics will get easier. Ask yourself: “Am I working with a set, a number, a function, an operation, a differential equation, or what?” Knowing what an object is will allow you to narrow down the procedures you may apply to it. If you have studied an object-oriented computer programming language, then you will already have experience identifying objects and thinking carefully about what procedures are allowed to be applied to them.
Third, eliminate the verb “works” (as in “the equation works”) from your vocabulary. This term is used as a substitute when we are not sure just what we are trying to accomplish. Usually we are trying to say that some object fulfills some condition. The condition might even have a definition associated with it, making it even easier to describe.
Last, speak slooooowly and thoughtfully as you try to get by without all these lazy words. It is hard at first, but you will get better with practice. Especially in class, when the pressure is on and all eyes are on you, don’t succumb to the temptation to use these weak words. Slow down; we’d all rather wait for a slow, well-formed question or answer than a fast, sloppy, incomprehensible one.
You will find the improvement in your ability to speak clearly about complicated ideas will greatly improve your ability to think clearly about complicated ideas. And I believe that you cannot think clearly about complicated ideas if you cannot formulate questions or answers clearly in the correct language. This is as applicable to the study of law, economics or philosophy as it is to the study of science or mathematics.
In this spirit, Dupont Hubert has contributed the following quotation, which is widely used in French mathematics courses (and which might be construed as the contrapositive of Technique CP)
Ce que l’on conçoit bien s’énonce clairement,
Et les mots pour le dire arrivent aisément.
— Nicolas Boileau, L’art poétique, Chant I, 1674
which translates as
Whatever is well conceived is clearly said,
And the words to say it flow with ease.
So when you come to class, check your pronouns at the door, along with other weak words. And when studying with friends, you might make a game of catching one another using pronouns, “thing,” or “works.” I know I’ll be calling you on it!
“I don’t know how to get started!” is often the lament of the novice proof-builder. Here are a few pieces of advice.
Conclusions of proofs come in a variety of types. Often a theorem will simply assert that something exists. The best way, but not the only way, to show something exists is to actually build it. Such a proof is called constructive. The thing to realize about constructive proofs is that the proof itself will contain a procedure that might be used computationally to construct the desired object. If the procedure is not too cumbersome, then the proof itself is as useful as the statement of the theorem.
When a theorem uses the phrase “if and only if” (or the abbreviation “iff”) it is a shorthand way of saying that two if-then statements are true. So if a theorem says “P if and only if Q,” then it is true that “if P, then Q” while it is also true that “if Q, then P.” For example, it may be a theorem that “I wear bright yellow knee-high plastic boots if and only if it is raining.” This means that I never forget to wear my super-duper yellow boots when it is raining and I wouldn’t be seen in such silly boots unless it was raining. You never have one without the other. I’ve got my boots on and it is raining or I don’t have my boots on and it is dry.
The upshot for proving such theorems is that it is like a 2-for-1 sale: we get to do two proofs. Assume P and conclude Q, then start over and assume Q and conclude P. For this reason, “if and only if” is sometimes abbreviated by ⇔, while proofs indicate which of the two implications is being proved by prefacing each with ⇒ or ⇐. A carefully written proof will remind the reader which statement is being used as the hypothesis, while a quicker version will let the reader deduce it from the direction of the arrow. Tradition dictates we do the “easy” half first, but that’s hard for a student to know until you’ve finished doing both halves! Oh well, if you rewrite your proofs (a good habit), you can then choose to put the easy half first.
Theorems of this type are called “equivalences” or “characterizations,” and they are some of the most pleasing results in mathematics. They say that two objects, or two situations, are really the same. You don’t have one without the other, like rain and my yellow boots. The more different P and Q seem to be, the more pleasing it is to discover they are really equivalent. And if P describes a very mysterious solution or involves a tough computation, while Q is transparent or involves easy computations, then we’ve found a great shortcut for better understanding or faster computation. Remember that every theorem really is a shortcut in some form. You will also discover that if proving P ⇒ Q is very easy, then proving Q ⇒ P is likely to be proportionately harder. Sometimes the two halves are about equally hard. And in rare cases, you can string together a whole sequence of other equivalences to form the one you’re after and you don’t even need to do two halves. In this case, the argument of one half is just the argument of the other half, but in reverse.
One last thing about equivalences. If you see a statement of a theorem that says two things are “equivalent,” translate it first into an “if and only if” statement.
When we construct the contrapositive of a theorem (Technique CP), we need to negate the two statements in the implication. And when we construct a proof by contradiction (Technique CD), we need to negate the conclusion of the theorem. The statement obtained by simultaneously negating the hypothesis and conclusion of an implication is logically equivalent to its converse (Technique CV), so remember that it is not guaranteed to be a true statement. So we often have the need to negate statements, and in some situations it can be tricky.
If a statement says that a set is empty, then its negation is the statement that the set is nonempty. That’s straightforward. Suppose a statement says “something-happens” for all i, or every i, or any i. Then the negation is that “something-doesn’t-happen” for at least one value of i. If a statement says that there exists at least one “thing,” then the negation is the statement that there is no “thing.” If a statement says that a “thing” is unique, then the negation is that there is zero, or more than one, of the “thing.”
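These negation rules can be tried out on a small finite list, where “for all” and “there exists” correspond to Python's built-in `all` and `any`. This is a minimal sketch; the list `values` and the helper `is_even` are hypothetical choices made only for illustration.

```python
values = [2, 4, 7, 10]  # hypothetical sample data


def is_even(i: int) -> bool:
    return i % 2 == 0


# Negating "is_even(i) for all i" gives "not is_even(i) for at least one i".
not_for_all = not all(is_even(i) for i in values)
exists_failure = any(not is_even(i) for i in values)

# Negating "there exists i with i > 100" gives "for every i, not (i > 100)".
not_exists = not any(i > 100 for i in values)
for_all_fail = all(not (i > 100) for i in values)

# Negating "there is a unique odd entry" gives "zero, or more than one, odd entry".
count_odd = sum(1 for i in values if not is_even(i))
unique_odd = count_odd == 1
zero_or_many_odd = count_odd == 0 or count_odd > 1

print(not_for_all == exists_failure)
print(not_exists == for_all_fail)
print(unique_odd != zero_or_many_odd)
```

In each pair, the two sides always carry the same truth value (or, in the last pair, opposite truth values), whatever list is supplied.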
We are not covering all of the possibilities, but we wish to make the point that logical quantifiers like “there exists” or “for every” must be handled with care when negating statements. Studying the proofs which employ contradiction (as listed in Technique CD) is a good first step towards understanding the range of possibilities.
The contrapositive of an implication P ⇒ Q is the implication not(Q) ⇒ not(P), where “not” means the logical negation, or opposite. An implication is true if and only if its contrapositive is true. In symbols, (P ⇒ Q) ⇔ (not(Q) ⇒ not(P)) is a theorem. Such statements about logic, that are always true, are known as tautologies.
For example, it is a theorem that “if a vehicle is a fire truck, then it has big tires and has a siren.” (Yes, I’m sure you can conjure up a counterexample, but play along with me anyway.) The contrapositive is “if a vehicle does not have big tires or does not have a siren, then it is not a fire truck.” Notice how the “and” became an “or” when we negated the conclusion of the original theorem.
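Because P and Q can each only be true or false, a tautology of this kind can be confirmed by listing all four cases. The sketch below does so for the contrapositive, and also for the rule that turned “and” into “or” in the fire truck example. The helper `implies` is a hypothetical name encoding the usual truth table of an implication.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of "if p, then q": false only when p is true and q is false."""
    return (not p) or q


rows = list(product([True, False], repeat=2))

# (P => Q) has the same truth value as (not(Q) => not(P)) in every row.
contrapositive_matches = all(
    implies(p, q) == implies(not q, not p) for p, q in rows
)

# Negating "a and b" yields "not(a) or not(b)" in every row.
and_becomes_or = all(
    (not (a and b)) == ((not a) or (not b)) for a, b in rows
)

print(contrapositive_matches, and_becomes_or)
```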
It will frequently happen that it is easier to construct a proof of the contrapositive than of the original implication. If you are having difficulty formulating a proof of some implication, see if the contrapositive is easier for you. The trick is to construct the negation of complicated statements accurately. More on that later.
The converse of the implication P ⇒ Q is the implication Q ⇒ P. There is no guarantee that the truth of these two statements is related. In particular, if an implication has been proven to be a theorem, then do not try to use its converse too, as if it were a theorem. Sometimes the converse is true (and we have an equivalence, see Technique E). But more likely the converse is false, especially if it wasn’t included in the statement of the original theorem.
For example, we have the theorem, “if a vehicle is a fire truck, then it has big tires and has a siren.” The converse is false. The statement that “if a vehicle has big tires and a siren, then it is a fire truck” is false. A police vehicle for use on a sandy public beach would have big tires and a siren, yet is not equipped to fight fires.
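The same four-case listing used for tautologies shows exactly where an implication and its converse part ways. In this sketch, P stands for “is a fire truck” and Q for “has big tires and a siren”; the helper `implies` is a hypothetical name for the truth table of an implication.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of "if p, then q"."""
    return (not p) or q


# Collect the cases where P => Q holds but the converse Q => P fails.
counterexamples = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if implies(p, q) and not implies(q, p)
]
print(counterexamples)
```

The only such case has P false and Q true, which is the beach police vehicle: not a fire truck, yet equipped with big tires and a siren.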
We bring this up now, because Theorem CSRN has a tempting converse. Does this theorem say that if r < n, then the system is consistent? Definitely not, as Archetype E has r = 3 < 4 = n, yet is inconsistent. This example is then said to be a counterexample to the converse. Whenever you think a theorem that is an implication might actually be an equivalence, it is good to hunt around for a counterexample that shows the converse to be false (the archetypes, Appendix A, can be a good hunting ground).
Another proof technique is known as “proof by contradiction” and it can be a powerful (and satisfying) approach. Simply put, suppose you wish to prove the implication, “If A, then B.” As usual, we assume that A is true, but we also make the additional assumption that B is false. If our original implication is true, then these twin assumptions should lead us to a logical inconsistency. In practice we assume the negation of B to be true (see Technique N). So we argue from the assumptions A and not(B) looking for some obviously false conclusion such as 1 = 6, or a set is simultaneously empty and nonempty, or a matrix is both nonsingular and singular.
You should be careful about formulating proofs that look like proofs by contradiction, but really aren’t. This happens when you assume A and not(B) and proceed to give a “normal” and direct proof that B is true by only using the assumption that A is true. Your last step is to then claim that B is true and you then appeal to the assumption that not(B) is true, thus getting the desired contradiction. Instead, you could have avoided the overhead of a proof by contradiction and just run with the direct proof. This stylistic flaw is known, quite graphically, as “setting up the straw man to knock him down.”
Here is a simple example of a proof by contradiction. There are direct proofs that are just about as easy, but this will demonstrate the point, while narrowly avoiding knocking down the straw man.
Theorem: If a and b are odd integers, then their product, ab, is odd.
Proof: To begin a proof by contradiction, assume the hypothesis, that a and b are odd. Also assume the negation of the conclusion, in this case, that ab is even. Then there are integers, j, k, ℓ so that a = 2j + 1, b = 2k + 1, ab = 2ℓ. Then
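A chain of equalities consistent with the remarks that follow can be sketched as below. It is a reconstruction under the assumption that the chain starts from 0 = ab − ab, with the substitutions a = 2j + 1, b = 2k + 1 and ab = 2ℓ made in the second line.

```latex
\begin{align*}
0 &= ab - ab \\
  &= (2j + 1)(2k + 1) - 2\ell \\
  &= 4jk + 2j + 2k + 1 - 2\ell \\
  &= 2(2jk + j + k - \ell) + 1
\end{align*}
```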
Notice how we used both our hypothesis and the negation of the conclusion in the second line. Now divide the integer on each end of this string of equalities by 2. On the left we get a remainder of 0, while on the right we see that the remainder will be 1. Both remainders cannot be correct, so this is our desired contradiction. Thus, the conclusion (that ab is odd) is true.
Again, we do not offer this example as the best proof of this fact about even and odd numbers, but rather it is a simple illustration of a proof by contradiction. You can find examples of proofs by contradiction in Theorem RREFU, Theorem NMUS, Theorem NPNT, Theorem TTMI, Theorem GSP, Theorem ELIS, Theorem EDYES, Theorem EMHE, Theorem EDELI, and Theorem DMFE, in addition to several examples and solutions to exercises.
A theorem will sometimes claim that some object, having some desirable property, is unique. In other words, there should be only one such object. To prove this, a standard technique is to assume there are two such objects and proceed to analyze the consequences. The end result may be a contradiction (Technique CD), or the conclusion that the two allegedly different objects really are equal.
A very specialized form of a theorem begins with the statement “The following are equivalent…,” which is then followed by a list of statements. Informally, this lead-in sometimes gets abbreviated by “TFAE.” This formulation means that any two of the statements on the list can be connected with an “if and only if” to form a theorem. So if the list has n statements, then there are n(n−1)/2 possible equivalences that can be constructed (and are claimed to be true).
Suppose a theorem of this form has statements denoted as A, B, C, …, Z. To prove the entire theorem, we can prove A ⇒ B, B ⇒ C, C ⇒ D, …, Y ⇒ Z and finally, Z ⇒ A. This circular chain of n implications would allow us, logically, if not practically, to form any one of the n(n−1)/2 possible equivalences by chasing the implications around the circle as far as required.
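The idea of chasing around the circle can be made concrete with a short sketch. Here five hypothetical statement labels are linked in a circular chain, and the code checks that every one of the n(n−1)/2 pairs can be connected in both directions. The names `chain` and `follows` are illustrative assumptions, not notation from the text.

```python
from itertools import combinations

statements = ["A", "B", "C", "D", "E"]  # hypothetical labels
n = len(statements)

# The circular chain A => B, B => C, ..., E => A, stored as "next statement" links.
chain = {statements[i]: statements[(i + 1) % n] for i in range(n)}


def follows(p: str, q: str) -> bool:
    """Return True if q is reached from p by chasing implications around the circle."""
    current = p
    for _ in range(n):
        if current == q:
            return True
        current = chain[current]
    return False


pairs = list(combinations(statements, 2))
all_equivalent = all(follows(p, q) and follows(q, p) for p, q in pairs)

print(len(pairs), n * (n - 1) // 2, all_equivalent)
```

Only n implications are proved directly, yet each pair of statements ends up connected in both directions.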
Many theorems have conclusions that say two objects are equal. Perhaps one object is hard to compute or understand, while the other is easy to compute or understand. This would make for a pleasing theorem. Whether the result is pleasing or not, we take the same approach to formulate a proof. Sometimes we need to employ specialized notions of equality, such as Definition SE or Definition CVE, but in other cases we can string together a list of equalities.
The wrong way to prove an identity is to begin by writing it down and then beating on it until it reduces to an obvious identity. The first flaw is that you would be writing down the statement you wish to prove, as if you already believed it to be true. But more dangerous is the possibility that some of your maneuvers are not reversible. Here’s an example. Let’s prove that 3 = −3.
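One faulty argument of this kind, offered here as an illustrative sketch and assuming the irreversible maneuver is squaring both sides, runs as follows.

```latex
\begin{align*}
3 &= -3 \\
3^2 &= (-3)^2 && \text{(square both sides)} \\
9 &= 9 \\
0 &= 0 && \text{(subtract 9 from both sides)}
\end{align*}
```

Squaring cannot be undone: from 9 = 9 there is no way to recover 3 = −3, so the chain cannot be read from bottom to top.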
So because 0 = 0 is a true statement, does it follow that 3 = −3 is a true statement? Nope. Of course, we didn’t really expect a legitimate proof of 3 = −3, but this attempt should illustrate the dangers of this (incorrect) approach.
What you have just seen in the proof of Theorem VSPCV, and what you will see consistently throughout this text, is proofs of the following form. To prove that A = D we write
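A sketch of that form, with B and C standing for hypothetical intermediate expressions and each step carrying its own justification:

```latex
\begin{align*}
A &= B && \text{(theorem, definition or hypothesis justifying } A = B\text{)} \\
  &= C && \text{(theorem, definition or hypothesis justifying } B = C\text{)} \\
  &= D && \text{(theorem, definition or hypothesis justifying } C = D\text{)}
\end{align*}
```

Read from top to bottom, every equality rests on something already known, and the statement A = D appears only as the two ends of the chain.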
In your scratch work exploring possible approaches to proving a theorem you may massage a variety of expressions, sometimes making connections to various bits and pieces, while some parts get abandoned. Once you see a line of attack, rewrite your proof carefully mimicking this style.
Much of your mathematical upbringing, especially once you began a study of algebra, revolved around simplifying expressions — combining like terms, obtaining common denominators so as to add fractions, factoring in order to solve polynomial equations. However, as often as not, we will do the opposite. Many theorems and techniques will revolve around taking some object and “decomposing” it into some combination of other objects, ostensibly in a more complicated fashion. When we say something can “be written as” something else, we mean that the one object can be decomposed into some combination of other objects. This may seem unnatural at first, but results of this type will give us insight into the structure of the original object by exposing its inner workings. An appropriate analogy might be stripping the wallboards away from the interior of a building to expose the structural members supporting the whole building.
Perhaps you have studied integral calculus, or a pre-calculus course, where you learned about partial fractions. This is a technique where a fraction of two polynomials is decomposed into (written as, expressed as) a sum of simpler fractions. The purpose in calculus is to make finding an antiderivative simpler. For example, you can verify the truth of the expression
In an early course in algebra, you might be expected to combine the four terms on the right over a common denominator to create the “simpler” expression on the left. Going the other way, the partial fraction technique would allow you to systematically decompose the fraction of polynomials on the left into the sum of the four (arguably) simpler fractions of polynomials on the right.
This is a major shift in thinking, so come back here often, especially when we say “can be written as”, or “can be expressed as,” or “can be decomposed as.”
“Induction” or “mathematical induction” is a framework for proving statements that are indexed by integers. In other words, suppose you have a statement to prove that is really multiple statements, one for n = 1, another for n = 2, a third for n = 3, and so on. If there is enough similarity between the statements, then you can use a script (the framework) to prove them all at once.
For example, consider the theorem
Theorem 1 + 2 + 3 + ⋯ + n = n(n + 1)/2 for n ≥ 1.
This is shorthand for the many statements 1 = 1(1+1)/2, 1 + 2 = 2(2+1)/2, 1 + 2 + 3 = 3(3+1)/2, 1 + 2 + 3 + 4 = 4(4+1)/2, and so on. Forever. You can do the calculations in each of these statements and verify that all four are true. We might not be surprised to learn that the fifth statement is true as well (go ahead and check). However, do we think the theorem is true for n = 872? Or n = 1,234,529?
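Individual cases such as these are easy to check by machine, although no finite number of checks proves the theorem. A minimal sketch, in which the helper name `formula_holds` is hypothetical:

```python
def formula_holds(n: int) -> bool:
    """Compare the sum 1 + 2 + ... + n, added term by term, with n(n + 1)/2."""
    return sum(range(1, n + 1)) == n * (n + 1) // 2


# The first five statements, then the two larger cases.
first_five = [formula_holds(n) for n in range(1, 6)]
large_cases = [formula_holds(872), formula_holds(1_234_529)]

print(first_five)
print(large_cases)
```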
To see that these questions are not so ridiculous, consider the following example from Rotman’s Journey into Mathematics. The statement “n² − n + 41 is prime” is true for integers 1 ≤ n ≤ 40 (check a few). However, when we check n = 41 we find 41² − 41 + 41 = 41², which is not prime.
So how do we prove infinitely many statements all at once? More formally, let’s denote our statements as P(n). Then, if we can prove the two assertions
1. P(1) is true,
2. if P(k) is true, then P(k + 1) is true,
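The pattern and its eventual failure can be confirmed with a short computation. This is a sketch only: the helper names `is_prime` and `f` are hypothetical, and the primality test is plain trial division.

```python
def is_prime(m: int) -> bool:
    """Trial division: m is prime if m >= 2 and no d with d*d <= m divides it."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def f(n: int) -> int:
    return n * n - n + 41


# Forty cases in a row come out prime ...
all_prime_up_to_40 = all(is_prime(f(n)) for n in range(1, 41))

# ... and then the pattern breaks.
first_failure = next(n for n in range(1, 100) if not is_prime(f(n)))

print(all_prime_up_to_40, first_failure, f(first_failure) == 41 * 41)
```

Forty successful checks in a row still say nothing about the forty-first, which is why a proof is needed.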
then it follows that P(n) is true for all n ≥ 1. To understand this, I liken the process to climbing an infinitely long ladder with equally spaced rungs. Confronted with such a ladder, suppose I tell you that you are able to step up onto the first rung, and if you are on any particular rung, then you are capable of stepping up to the next rung. It follows that you can climb the ladder as far up as you wish. The first formal assertion above is akin to stepping onto the first rung, and the second formal assertion is akin to assuming that if you are on any one rung then you can always reach the next rung.
In practice, establishing that P(1) is true is called the “base case” and in most cases is straightforward. Establishing that P(k) ⇒ P(k + 1) is referred to as the “induction step,” and in this book (and elsewhere) we will typically refer to the assumption of P(k) as the “induction hypothesis.” This is perhaps the most mysterious part of a proof by induction, since it looks like you are assuming (P(k)) what you are trying to prove (P(n)). Sometimes it is even worse, since as you get more comfortable with induction, we often don’t bother to use a different letter (k) for the index (n) in the induction step. Notice that the second formal assertion never says that P(k) is true; it simply says what would logically follow if P(k) were true. We can establish statements like “If I lived on the moon, then I could pole-vault over a bar 12 meters high.” This may be a true statement, but it does not say we live on the moon, and indeed we may never live there.
Enough generalities. Let’s work an example and prove the theorem above about sums of integers. Formally, our statement is P(n): 1 + 2 + 3 + ⋯ + n = n(n + 1)/2.
Proof: Base Case. P(1) is the statement 1 = 1(1+1)/2, which we see simplifies to the true statement 1 = 1.
Induction Step: We will assume P(k) is true, and will try to prove P(k + 1). Given what we want to accomplish, it is natural to begin by examining the sum of the first k + 1 integers.
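One way to carry out this computation, sketched here under the assumption that the induction hypothesis P(k) is applied in the second line, is the chain

```latex
\begin{align*}
1 + 2 + 3 + \cdots + (k + 1)
  &= (1 + 2 + 3 + \cdots + k) + (k + 1) \\
  &= \frac{k(k + 1)}{2} + (k + 1) && \text{(induction hypothesis } P(k)\text{)} \\
  &= \frac{k^2 + k}{2} + \frac{2k + 2}{2} \\
  &= \frac{k^2 + 3k + 2}{2} \\
  &= \frac{(k + 1)(k + 2)}{2} \\
  &= \frac{(k + 1)((k + 1) + 1)}{2}
\end{align*}
```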
We then recognize the two ends of this chain of equalities as P(k + 1). So, by mathematical induction, the theorem is true for all n.
How do you recognize when to use induction? The first clue is a statement that is really many statements, one for each integer. The second clue would be that you begin a more standard proof and you find yourself using words like “and so on” (as above!) or lots of ellipses (dots) to establish patterns that you are convinced continue on and on forever. However, there are many minor instances where induction might be warranted but we don’t bother.
Induction is important enough, and used often enough, that it appears in several variations. The base case sometimes begins with n = 0, or perhaps an integer greater than 1. Some formulate the induction step as P(k − 1) ⇒ P(k). There is also a “strong form” of induction where we assume all of P(1), P(2), P(3), …, P(k) as a hypothesis for showing the conclusion P(k + 1).
You can find examples of induction in the proofs of Theorem GSP, Theorem DER,
Theorem DT, Theorem DIM, Theorem EOMP, Theorem DCP, and
Theorem KPLT.
Here is a technique used by many practicing mathematicians when they are teaching themselves new mathematics. As they read a textbook, monograph or research article, they attempt to prove each new theorem themselves, before reading the proof. Often the proofs can be very difficult, so it is wise not to spend too much time on each. Maybe limit your losses and try each proof for 10 or 15 minutes. Even if the proof is not found, it is time well-spent. You become more familiar with the definitions involved, and the hypothesis and conclusion of the theorem. When you do work through the proof, it might make more sense, and you will gain added insight about just how to construct a proof.
Theorems often go by different titles. Two of the most popular are “lemma” and “corollary.” Before we describe the fine distinctions, be aware that lemmas, corollaries, propositions, claims and facts are all just theorems. And every theorem can be rephrased as an “if-then” statement, or perhaps a pair of “if-then” statements expressed as an equivalence (Technique E).
A lemma is a theorem that is not too interesting in its own right, but is important for proving other theorems. It might be a generalization or abstraction of a key step of several different proofs. For this reason you often hear the phrase “technical lemma” though some might argue that the adjective “technical” is redundant.
A corollary is a theorem that follows very easily from another theorem. For this reason, corollaries frequently do not have proofs. You are expected to easily and quickly see how a previous theorem implies the corollary.
A proposition or fact is really just a codeword for a theorem. A claim might be similar, but some authors like to use claims within a proof to organize key steps. In a similar manner, some long proofs are organized as a series of lemmas.
In order to not confuse the novice, we have just called all our theorems
theorems. It is also an organizational convenience. With only theorems and
definitions, the theoretical backbone of the course is laid bare in the two lists of
Definitions and Theorems.