Section 4.1 Least Squares
Subsection 4.1.1 Theory of Least Squares
Solving a linear system of equations is a fundamental use of the tools of linear algebra. You know from introductory linear algebra that a linear system may have no solution. In an applied situation there could be many reasons for this, which raises the question: what to do next? We often construct mathematical models of practical situations, frequently in an effort to measure various parameters of the model. Suppose we think that interest rates \(R\text{,}\) as measured by the rate on one-year government bonds, are a linear function of construction activity \(C\text{,}\) as measured by the number of permits issued for new construction in the last 30 days. The “hotter” the construction market, the greater the demand for loans, and so the greater the cost of money (the interest rate). With a good model, we might be able to predict interest rates by examining public records for changes in the number of construction permits issued. So we have a mathematical model, say \(R=a+bC\text{,}\) whose parameters \(a\) and \(b\) we would like to estimate from observed data. With more observations than parameters, the resulting linear system \(A\vect{x}=\vect{b}\) will typically have no solution, so we instead seek a vector \(\hat{\vect{x}}\) that makes \(A\vect{x}-\vect{b}\) as small as possible, and such a vector can be found by solving the normal equations, \(\adjoint{A}A\vect{x}=\adjoint{A}\vect{b}\) (Theorem 4.1.6).
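As an illustration of how such a model becomes a matrix equation (our notation; the observed values \(C_i\) and \(R_i\) are hypothetical placeholders, not data from the text), \(n\) observations of the pair \((C,R)\) give one equation \(a+bC_i=R_i\) apiece, and collectively the system \(A\vect{x}=\vect{b}\) with
\begin{align*}
A&=\begin{bmatrix}
1 & C_1\\
1 & C_2\\
\vdots & \vdots\\
1 & C_n
\end{bmatrix}
&
\vect{x}&=\begin{bmatrix} a\\ b\end{bmatrix}
&
\vect{b}&=\begin{bmatrix} R_1\\ R_2\\ \vdots\\ R_n\end{bmatrix}\text{,}
\end{align*}
which is overdetermined, and usually inconsistent, as soon as \(n>2\text{.}\)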
Example 4.1.1. Trivial Least Squares.
Consider the extremely trivial system of equations in one variable, \(A\vect{x}=\vect{b}\) with
\begin{align*}
A&=\begin{bmatrix}1\\2\end{bmatrix} & \vect{b}&=\begin{bmatrix}5\\15\end{bmatrix}\text{.}
\end{align*}
Quite clearly this system has no solution. Forming the normal equations we get the silly system
\begin{equation*}
5x_1=35
\end{equation*}
which has the unique solution \(x_1=7\text{.}\) What is more interesting is that
\begin{equation*}
A\hat{\vect{x}}=\begin{bmatrix}1\\2\end{bmatrix}\begin{bmatrix}7\end{bmatrix}=\begin{bmatrix}7\\14\end{bmatrix}\text{.}
\end{equation*}
This is a vector in \(\csp{A}\) and the difference
\begin{equation*}
\vect{r}=\vect{b}-A\hat{\vect{x}}=\begin{bmatrix}5\\15\end{bmatrix}-\begin{bmatrix}7\\14\end{bmatrix}=\begin{bmatrix}-2\\1\end{bmatrix}
\end{equation*}
is orthogonal to the column of \(A\text{,}\) in other words \(\vect{r}\) is in the orthogonal complement of the column space of \(A\text{.}\) Geometrically, the point \((5,15)\) is not on the line \(y=2x\text{,}\) but \((7,14)\) is. The line has slope \(2\text{,}\) while the line segment joining \((5,15)\) to \((7,14)\) has slope \(-\frac{1}{2}\text{,}\) making it perpendicular to the line. So \((7,14)\) is the point on the line closest to \((5,15)\) and leads to the solution \(x_1=7\) of the resulting system.
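As a quick numerical check (a NumPy sketch of our own, not part of the text), we can solve the normal equations for this trivial system and confirm the geometry.

import numpy as np

# The 2x1 system from Example 4.1.1.
A = np.array([[1.0], [2.0]])
b = np.array([5.0, 15.0])

# Normal equations: (A^* A) x = A^* b, here the 1x1 system 5 x_1 = 35.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)          # [7.]

# The projection A x_hat is the point (7, 14) on the line y = 2x.
print(A @ x_hat)      # [ 7. 14.]

# The residual b - A x_hat is orthogonal to the column of A.
r = b - A @ x_hat
print(A.T @ r)        # [0.] up to roundoff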
Example 4.1.2. Simple Least Squares.
Consider the simple system of equations in two variables, \(A\vect{x}=\vect{b}\) with
\begin{align*}
A&=\begin{bmatrix}1&-1\\2&3\\3&0\end{bmatrix} & \vect{b}&=\begin{bmatrix}1\\5\\-2\end{bmatrix}\text{.}
\end{align*}
If you try to solve this system, say by row-reducing an augmented matrix, you will discover that there is no solution. The vector \(\vect{b}\) is not in the column space of \(A\text{.}\) The normal equations give the system
\begin{equation*}
\begin{bmatrix}14&5\\5&10\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}5\\14\end{bmatrix}
\end{equation*}
which has the unique solution \(x_1=-\frac{4}{23}\text{,}\) \(x_2=\frac{171}{115}\text{.}\) What is more interesting is that
\begin{equation*}
A\hat{\vect{x}}=-\frac{4}{23}\begin{bmatrix}1\\2\\3\end{bmatrix}+\frac{171}{115}\begin{bmatrix}-1\\3\\0\end{bmatrix}=\begin{bmatrix}-\frac{191}{115}\\\frac{473}{115}\\-\frac{12}{23}\end{bmatrix}\text{.}
\end{equation*}
This is a vector in \(\csp{A}\) and the difference
\begin{equation*}
\vect{r}=\vect{b}-A\hat{\vect{x}}=\begin{bmatrix}1\\5\\-2\end{bmatrix}-\begin{bmatrix}-\frac{191}{115}\\\frac{473}{115}\\-\frac{12}{23}\end{bmatrix}=\begin{bmatrix}\frac{306}{115}\\\frac{102}{115}\\-\frac{34}{23}\end{bmatrix}
\end{equation*}
is orthogonal to both of the columns of \(A\text{,}\) in other words \(\vect{r}\) is in the orthogonal complement of the column space of \(A\text{.}\) Geometrically, the point \((1,5,-2)\) is not on the plane spanned by the columns of \(A\text{,}\) which has equation \(9x+3y-5z=0\text{,}\) but \((-\frac{191}{115},\,\frac{473}{115},\,-\frac{12}{23})\) is on the plane. The plane has a normal vector \(9\vec{i}+3\vec{j}-5\vec{k}\text{,}\) while the vector joining \((-\frac{191}{115},\,\frac{473}{115},\,-\frac{12}{23})\) to \((1,5,-2)\) is \(\frac{306}{115}\vec{i}+\frac{102}{115}\vec{j}-\frac{34}{23}\vec{k}\text{,}\) which is a scalar multiple of the normal vector to the plane. So \((-\frac{191}{115},\,\frac{473}{115},\,-\frac{12}{23})\) is the point on the plane closest to \((1,5,-2)\) and leads to the solution \(x_1=-\frac{4}{23}\text{,}\) \(x_2=\frac{171}{115}\) of the resulting system.
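Again as a numerical sketch (NumPy, our code rather than the text's), we can confirm the least-squares solution of this example and check that the residual is orthogonal to both columns of \(A\text{.}\)

import numpy as np

# The 3x2 system from Example 4.1.2.
A = np.array([[1.0, -1.0],
              [2.0,  3.0],
              [3.0,  0.0]])
b = np.array([1.0, 5.0, -2.0])

# Solve the normal equations (A^* A) x = A^* b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)          # approximately [-0.17391  1.48696], that is, -4/23 and 171/115

# The residual lies in the orthogonal complement of the column space.
r = b - A @ x_hat
print(A.T @ r)        # approximately [0. 0.]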
Checkpoint 4.1.3.
Suppose we have \(n\) pairs \(\left(x_i,y_i\right)\text{,}\) \(1\leq i\leq n\text{,}\) of observations of a dependent variable \(y\) that varies linearly with an independent variable \(x\text{.}\) Model the data by the linear equation \(y=a + bx\text{.}\) Solve the normal equations to find expressions that estimate the parameters \(a\) and \(b\text{.}\)
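As a hint toward the setup (our notation, stated in general rather than quoted from the text, and assuming real data so the adjoint is just the transpose), if \(A\) has a first column of all ones and a second column containing the \(x_i\text{,}\) then the pieces of the normal equations are
\begin{align*}
\adjoint{A}A&=\begin{bmatrix}
n & \sum_{i=1}^{n}x_i\\
\sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2
\end{bmatrix}
&
\adjoint{A}\vect{b}&=\begin{bmatrix}
\sum_{i=1}^{n}y_i\\
\sum_{i=1}^{n}x_iy_i
\end{bmatrix}
\end{align*}
and solving this \(2\times 2\) system produces the desired expressions for \(a\) and \(b\text{.}\) The same pattern, with a column removed or added, applies to Checkpoint 4.1.4 and Checkpoint 4.1.5.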
Checkpoint 4.1.4.
Find the least-squares estimate of the parameter \(b\) from data modeled by the linear equation \(y=bx\) (a situation where we know there is no \(y\)-intercept).
Checkpoint 4.1.5.
Suppose you have data modeled by a single quantity, \(z\text{,}\) that depends on two independent variables, \(x\) and \(y\text{,}\) linearly according to the model \(z = a+bx+cy\text{.}\) So your data points are triples \(\left(x_i,y_i,z_i\right)\text{,}\) \(1\leq i\leq n\text{.}\) Can you solve the normal equations to obtain expressions estimating \(a,b,c\text{?}\) This might not be a very instructive exercise, but perhaps determine \(\adjoint{A}A\) and \(\adjoint{A}\vect{b}\) before letting the matrix inverse dissuade you. What does the geometric picture of the data and your resulting estimates look like?
Theorem 4.1.6.
Suppose that \(A\) is an \(m\times n\) matrix and \(\vect{b}\in\complex{m}\text{.}\) Then \(r\left(\vect{x}\right)=\norm{A\vect{x}-\vect{b}}\) is minimized by a solution, \(\hat{\vect{x}}\text{,}\) to the normal equations, \(\adjoint{A}A\vect{x}=\adjoint{A}\vect{b}\text{.}\)
Proof.
For any vector \(\vect{x}\) we can write
\begin{align*}
A\vect{x}-\vect{b}&=A\vect{x}-A\hat{\vect{x}}+A\hat{\vect{x}}-\vect{b}\\
&=A\left(\vect{x}-\hat{\vect{x}}\right)+\left(A\hat{\vect{x}}-\vect{b}\right)\text{.}
\end{align*}
Clearly the first term is a vector in the column space of \(A\text{.}\) The second vector, by virtue of \(\hat{\vect{x}}\) being a solution to the normal equations, satisfies \(\adjoint{A}\left(A\hat{\vect{x}}-\vect{b}\right)=\adjoint{A}A\hat{\vect{x}}-\adjoint{A}\vect{b}=\vect{0}\text{,}\) so it is an element of \(\nsp{\adjoint{A}}\text{,}\) the orthogonal complement of the column space (Theorem (((result on orthogonal complement of column space)))). So this is the promised decomposition of \(A\vect{x}-\vect{b}\) into an element of a subspace (the column space of \(A\) here) plus an element of the orthogonal complement of the subspace. Since these two vectors are orthogonal, we can apply the generalized version of the Pythagorean Theorem ((((Pythagorean theorem, perhaps into FCLA))))
\begin{align*}
\norm{A\vect{x}-\vect{b}}^2&=\norm{A\left(\vect{x}-\hat{\vect{x}}\right)}^2+\norm{A\hat{\vect{x}}-\vect{b}}^2\geq\norm{A\hat{\vect{x}}-\vect{b}}^2\text{.}
\end{align*}
This inequality establishes that \(\hat{\vect{x}}\) minimizes \(r\left(\vect{x}\right)\text{.}\)
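To see the conclusion of the theorem numerically, here is a small experiment (a NumPy sketch of our own, reusing the system from Example 4.1.2): no randomly perturbed vector produces a smaller residual norm than the solution of the normal equations.

import numpy as np

rng = np.random.default_rng(0)

# The system from Example 4.1.2.
A = np.array([[1.0, -1.0],
              [2.0,  3.0],
              [3.0,  0.0]])
b = np.array([1.0, 5.0, -2.0])

# Solution of the normal equations and its residual norm.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
best = np.linalg.norm(A @ x_hat - b)

# Compare against 1000 random perturbations of x_hat.
perturbed = x_hat + rng.normal(size=(1000, 2))
smallest_perturbed = min(np.linalg.norm(A @ x - b) for x in perturbed)
print(best <= smallest_perturbed)   # True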
Subsection 4.1.2 Computing Least-Squares Solutions
In Checkpoint 4.1.5 we suggest formulating general expressions for least-squares estimates of three parameters of a model. This begins to get a bit absurd when you must invert a matrix larger than \(2\times 2\text{.}\) Instead it makes sense to formulate from the data the matrix \(\adjoint{A}A\) and the vector \(\adjoint{A}\vect{b}\text{,}\) and then proceed to a numerical solution. As \(\adjoint{A}A\) is a Hermitian positive definite matrix (Theorem 1.8.2), it can be decomposed into a Cholesky factorization twice as fast as we can decompose an arbitrary matrix into an LU factorization (((cost to form Cholesky))). The Cholesky factorization allows a round of forward-solving followed by a round of back-solving to find a solution to the normal equations. But it gets better. Consider solving the system \(A\vect{x}=\vect{b}\text{.}\) Suppose \(A\) has rank \(n\) and we have a thin QR decomposition of \(A\text{,}\) \(A=QR\text{,}\) where \(Q\) has orthonormal columns and \(R\) is upper triangular with positive diagonal entries. Notice in particular that \(R\) is invertible. Then, since \(\adjoint{Q}Q=I_n\) and \(\adjoint{R}\) is invertible, the normal equations collapse:
\begin{align*}
\adjoint{A}A\vect{x}&=\adjoint{A}\vect{b}\\
\adjoint{R}\adjoint{Q}QR\vect{x}&=\adjoint{R}\adjoint{Q}\vect{b}\\
\adjoint{R}R\vect{x}&=\adjoint{R}\adjoint{Q}\vect{b}\\
R\vect{x}&=\adjoint{Q}\vect{b}\text{.}
\end{align*}
So a single round of back-solving with the triangular matrix \(R\) produces the least-squares solution, and we never need to form \(\adjoint{A}A\) explicitly.
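Here is a sketch of this QR approach in code (NumPy and SciPy, our own illustration rather than anything from the text; the function name least_squares_qr is ours). Note that NumPy's qr routine does not promise positive diagonal entries in \(R\text{,}\) but \(R\) is still invertible when \(A\) has full column rank, which is all the back-solve requires.

import numpy as np
from scipy.linalg import solve_triangular

def least_squares_qr(A, b):
    """Solve min ||Ax - b|| via a thin QR decomposition of A.

    Assumes A has full column rank, so R is invertible.
    """
    Q, R = np.linalg.qr(A, mode='reduced')      # thin QR: Q is m x n, R is n x n
    # The normal equations collapse to R x = Q^* b; back-solve the triangular system.
    return solve_triangular(R, Q.conj().T @ b)

# Reusing the system from Example 4.1.2 as a quick test.
A = np.array([[1.0, -1.0],
              [2.0,  3.0],
              [3.0,  0.0]])
b = np.array([1.0, 5.0, -2.0])
print(least_squares_qr(A, b))                    # approximately [-4/23, 171/115]
print(np.linalg.lstsq(A, b, rcond=None)[0])      # agrees with the library routine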
Checkpoint 4.1.7.
Compute the least-squares solution to the system \(A\vect{x}=\vect{b}\). Compute the residual vector associated with your solution.
Solution.
We compute the pieces of the normal equations, \(\adjoint{A}A\) and \(\adjoint{A}\vect{b}\text{.}\) Solving \(\adjoint{A}A\vect{x}=\adjoint{A}\vect{b}\) gives the least-squares solution \(\hat{\vect{x}}\text{.}\) The residual is the difference \(\vect{r}=\vect{b}-A\hat{\vect{x}}\text{.}\) Notice that the residual is indeed orthogonal to the column space of \(A\text{.}\)