GS 2006/Math - Collins

Lab 15
Least-Squares Fit


A common task in science and engineering is to analyze and quantify relationships among several quantities. The easiest example of this and one you've probably done is when you have a set of data point (xi,yi), i = 1, ..., n and you want to find the equation of the line that best fits the data. You've either used some well-known formulas or just entered the data in your calculator and had it compute the slope and intercept. This lab will show another way to solve this problem and extend it to other curves besides lines.

Math Stuff

Start with a set of data points: (xi,yi), i = 1, ..., n. Now, suppose we have the equation of a candidate line to fit our data: y = mx + b. Then for each xi the line predicts the value mxi+b. This is supposed to be the same as our observed value yi, but it may differ. This difference is called the residual and we write ri = yi - mxi - b. Let r be a vector of all the residual values, then our goal is to find values for m and b so that ||r|| is minimized (this is the normal Euclidean norm).

We can reformulate this problem into a matrix problem as follows: Let A be the n by 2 matrix like this:

A = [ x1  1
      x2  1
      x3  1
      x4  1
      xn  1]
and let y be the column vector of all the yi values. Then if we set c = [m; b] (column vector of the coefficients) then r = y - Ac, and thus we are trying to find the vector c which minimizes ||y - Ac||.

As discussed in class and in the text (pg. 333) this is called a least-squares problem and the solution comes from realizing that the residual must be orthogonal to the column space of A or that ATr = 0. This works out to form the normal equations:

    ATAc = ATy.
If A has full rank (in this case, 2, which means that we have at least 2 distinct x values), then ATA is invertible, so this has a unique solution.

Example with MATLAB

Suppose we have the following data points (2,1), (5,2), (7,3), (8,3) and we want to find the best-fit line to this data set. Then in MATLAB we'd do the following:
x = [2 5 7 8]';      % column of x-values
y = [1 2 3 3]';      % column of y-values
A = [x, ones(4,1)];  % form matrix A, add column of 4 1s
ATA = A'*A;          % form ATA for normal equations and
b = A'*y;            % form ATy
c = inv(ATA)*b;
then c(1) would be the slope (should be 0.3571) and c(2) the intercept (0.2857).

Once we have A and y, we can do this more efficiently with

c = (A'*A)\(A'*y);
If you want to see the data and the results you can plot the points and the curve as follows:
predy = A*c;           % compute predicted values
plot(x,y,'o',x,predy)  % plot data as circles and draw best-fit line

Besides Lines

This approach works with any curve for which is linear in its coefficients. For example, you can fit a quadratic curve y = ax2 + bx + c to a set of data or any other polynomial. You can even do crazy curves that combine polynomials, exponentials, logs and trig functions like this: y = ax + b + c sin(x) + d cos(2x) + e log(x) + f ex. Anything can work as long as it is linear in the coefficients.

Here's how it works for a quadratic (and, in general, for the others). The residual is still observed - predicted and it is still orthogonal to the column space of a matrix, so we first look at the residual: r = y - Ac where y is the column vector of y-values, c is the column vector of coefficients we're solving for (for the quadratic, c = [a b c]'), and A is the n by 3 matrix of the form

A = [ x12  x1  1
      x22  x2  1
      x32  x3  1
      xn2  xn  1]
Note that if you take a row of A and multiply it by the column c you get axi2 + bxi + c.
Since the residual is orthogonal to A, we again get the normal equations:
   ATA c = ATy

How do we do a fit to other types of equations? The key thing to note is that the columns in A correspond to the functions in the equation we are trying to fit. For example, for the line the functions are x and 1 and the first column of A is xi and the 2nd column are all 1s. Thus if we wanted to fit the crazy function above to some data, we'd form the matrix A with columns corresponding to the functions x, 1, sin(x), cos(2x), log(x) and ex.
Once we have the matrix A, we just use the normal equations to solve for the coefficients.


1. Find the equation of the line of best-fit to the data:
   x = [0 1 2 3 4 5 6 7 8]'
   y = [1 1 2 2 3 3 4 5 7]'

2. Take the same data and find the best-fit quadratic curve to it.

3. One application of these curves is in approximating other functions. For the data take 11 equally spaced points from to 0 to 2pi and the value of cosine at each point:

x = 2*pi*[0:10]'/10;     % 11 equally spaced points
y = cos(x);
Find the polynomials of degree 2 (quadratic) and 4 (quartic) that best fit this data.

4. Sales models are usually linear functions, but it makes more sense to include some seasonal fluctuations. Suppose over a year, monthly sales amounts (in $1K) are recorded as follows:

 s = [20 15 12 8 13 22 28 33 35 45 56 62]
and we think the data follows the function s = a t + b + c sin (2*pi*t/12) where t is measured in months. Find the values of a, b and c so that this curve is the best fit to the data.