# Least Squares

## Table of Contents

(ns morelinear.leastsquares (:require [clojure.core.matrix :refer :all] [clojure.core.matrix.linear :as linear] [morelinear.gauss :as gauss] [morelinear.householder :as householder])) (set-current-implementation :vectorz)

## The A^{T}Ax=A^{T}b form

Now that we have many ways to solve square linear systems we need to extend **Ax=b** to the overdefined case where we have more linear equations than parameters. This situation will come up constantly, generally in situations where you only have a few inputs you want to solve for, but you have a lot of redundant measurements.

If the measurements were ideal then the rows of **A** will not be independent and you will be able to find an explicit solution though Gaussian Elimination. But in the general case (with the addition of noise and rounding errors) no explicit solution will exist for **A _{skinny}x=b** . You could try to find a set of independent equations and throw away the extra equation to try to make a square matrix, but this is throwing information away.

Since there is no **x** for which **A _{skinny}x=b** holds true we instead solve for a "close" system

**A**which does have a solution. In other words we want to find an

_{skinny}x=b_{close}**x**that will give us a

**b**which is as close as possible to

_{close}**b**. When we say "close" what we mean mathematically is that we want to minimize the sum of the difference between

**b**and

_{close}**b**- ie.

**sum**.

_{of}_{all}_{values}(b_{close}-b)
The problem is that sums don't express very easily in matrix form. Fortunately we do have a mechanism which is really close - the **inner product**. The inner product of a vector **x** with itself - **x ^{T}x** - is the sum of the squares of the values of

**x**. While this is not equivalent, it actually does not change the solution b/c the minimum stays the minimum. So instead of

**sum**. we can work with

_{of}_{all}_{values}(b_{close}-b)**(b**and guarantee the same solution

_{close}-b)^{T}(b_{close}-b)
If we plug in our equation for **b _{close}** for we get

**(A**.

_{skinny}x-b)^{T}(A_{skinny}x-b)
As we learn in calculus, minimizing a function is done by take its derivative with respect to the parameter we are changing (here that's `x`

), setting it equal to zero and then solving for that parameter (b/c the minimum point has zero slope). What *page 226-227* shows is that the derivative of our difference equation give us the equation **A ^{T}Ax=A^{T}b**. The right hand side

**A**is a vector, and in-fact the whole equation is in the form

^{T}b**Ax=b**which we know how to solve (again, solving for

`x`

here). Also note that **A**is always square and singlular - and that b/c

^{T}A**A**was skinny it's actually smaller than

**A**.

## The LU case

### Direct method

We take the product **A ^{T}A** and the product

**A**and solve

^{T}b**A**just like we solve an

^{T}Ax=A^{T}b**Ax=b**system

(defn lu-direct "Solve ATA=ATb directly" [linear-system output-vector] (let [AT (transpose linear-system) ATA (mmul AT linear-system) ATb (mmul AT output-vector)] (gauss/solve-with-lu ATA ATb)))

;; missile problem from page 232 (let [A [[1.0 0.0 0.0] [1.0 0.25 0.0625] [1.0 0.50 0.25] [1.0 0.75 0.5625] [1.0 1.00 1.0]] b [0.000 0.008 0.015 0.019 0.02] x (lu-direct A b)] (pm x) (pm (mmul A x)))

[-0.000 0.040 -0.019] [-0.000 0.009 0.015 0.019 0.020]

### The numerically stable solution

;; [-0.000 0.040 -0.019] ;; [-2.2857142857142797E-4 ;; 0.008514285714285717 ;; 0.01482857142857143 ;; 0.018714285714285718 ;; 0.020171428571428573]

Unfortunately while the previous solutions is very simple, it does involve a matrix multiplication and therefore has some numerical issues. A nice trick is presented in **Exercise 4.6.9** with the equations

This special block has the property that it'll always be square no matter what shape **A** is. If you multiply out the block matrices you will get two equations

Note how the second equation tell us **x _{1}** is in the

**N(A**

^{T})
Now we multiple both sides of our first equation by **A ^{T}**

We just used the fact that **x _{1}** was in the nullspace of

**A**to drop the first term and now we see how

^{T}**x**is also a solution to the

_{2}*least squares*equation

So we can use any method that works on solving square matrices, use it on our jumbo matrix and get a solution for the least squares problem. All without computing **A ^{T}A**.

(defn- jumbo-matrix "Use the big block matrix method to compute the least squares solution" [input-matrix] (let [num-rows (row-count input-matrix) num-columns (column-count input-matrix) identity (identity-matrix num-rows) AT (transpose input-matrix) zeroes (zero-matrix num-columns num-columns)] (join-along 0 (join-along 1 identity input-matrix) (join-along 1 AT zeroes)))) (defn- pad-with-zeroes "Takes the INPUT-VECTOR and make it longer with some zeroes padded on the end" [input-vector number-of-zeroes] (join-along 0 input-vector (zero-vector number-of-zeroes))) (defn lu-jumbo "" [input-matrix output-vector] (subvector (gauss/solve-with-lu (jumbo-matrix input-matrix) (pad-with-zeroes output-vector (column-count input-matrix))) (row-count input-matrix) (column-count input-matrix)))

;; missile problem from page 232 but redone with the big matrix method (let [A [[1.0 0.0 0.0] [1.0 0.25 0.0625] [1.0 0.50 0.25] [1.0 0.75 0.5625] [1.0 1.00 1.0]] b [0.000 0.008 0.015 0.019 0.02] x (lu-jumbo A b)] (pm x) (pm (mmul A x)))

[-0.000 0.040 -0.019] [-0.000 0.009 0.015 0.019 0.020]

(let [A [[1.000 0.000 0.000] [1.000 250.000 62500.000] [1.000 500.000 250000.000] [1.000 750.000 562500.000] [1.000 1000.000 1000000.000]] b [0.0 8.0 15.0 19.0 20.0] x (lu-jumbo A b)] (pm x) (pm (mmul A x)))

[-7.456 0.048 -0.000] [-7.456 2.643 9.071 11.828 10.914]

**TODO** This should be numerically stable but it's not. Need to double check everything again.. Maybe compare results to `elinear`

## The QR case

With the **QR** decomposition we similarly get to avoid computing the **A ^{T}A** product. Since

**A=QR**we can write rewrite

**A**:

^{T}Ax=A^{T}b
Now since **Q ^{T}** is equivalent to

**Q**(see: Householder transform) this simplifies further to:

^{-1}
The **R** matrix is upper triangular so the left side solves directly through backsubsitution, while the right side evaluates to some column vector.

(defn householder-qr-long "" [A b] (let [R (mutable A) Q (householder/qr! R) QT (transpose Q) QTb (mmul QT b)] (gauss/backward-substitution R QTb)))

However when we test the result…

;; missile problem from page 232 (let [A [[1.0 0.0 0.0] [1.0 0.25 0.0625] [1.0 0.50 0.25] [1.0 0.75 0.5625] [1.0 1.00 1.0]] b [0.000 0.008 0.015 0.019 0.02] x (householder-qr-long A b)] (pm x) (pm (mmul A x)))

[0.003 0.016 0.005] [0.003 0.007 0.012 0.017 0.023]

The values are .. in the right direction but clearly much more off the mark than in the previous solutions. The trick, as mentioned in the Householder transform section, is to reduce **A** and **b** simultaneously with reflectors and skip forming **Q** entirely.

(defn householder-qr "" [A b] (householder/reduction A b) (gauss/backward-substitution A b))

;; missile problem from page 232 (let [A [[1.0 0.0 0.0] [1.0 0.25 0.0625] [1.0 0.50 0.25] [1.0 0.75 0.5625] [1.0 1.00 1.0]] R (mutable A) b (mutable [0.000 0.008 0.015 0.019 0.02]) x (householder-qr R b)] (pm x) (pm (mmul A x)))

[-0.000 0.040 -0.019] [-0.000 0.009 0.015 0.019 0.020]

And now the results match closely to what we got before