Least squares in a non-ordinary sense

Simple ordinary least squares regression (SOLSR) means the following. Given data (x_i,y_i)\in \mathbb{R}^2, i = 1, \ldots,N, find a line in \mathbb{R}^2 represented by y = mx+c that fits the data in the following sense. The loss of each data point (x_i,y_i) relative to the line is the vertical deviation

|y_i - (mx_i + c) |          for every i,

so we find (m,c) that minimizes the loss function

\displaystyle \sum_{i=1}^N [y_i - (mx_i + c)]^2.
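For reference, here is a minimal sketch of the SOLSR minimizer. It assumes NumPy and uses the textbook closed form m = \text{Cov}(X,Y)/\text{Var}(X) and c = \overline{Y} - m\overline{X}, with averages taken over the N data points (so the x_i must not all be equal). The helper name solsr_fit and the data are hypothetical, and np.polyfit is used only as an independent cross-check.

```python
import numpy as np

def solsr_fit(x, y):
    """Hypothetical helper: closed-form minimizer (m, c) of sum_i [y_i - (m x_i + c)]^2.

    Uses m = Cov(X,Y) / Var(X) and c = mean(Y) - m * mean(X) with 1/N averages.
    Requires the x_i not to be all equal (otherwise Var(X) = 0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    var_x = np.mean(x**2) - x_bar**2
    cov_xy = np.mean(x * y) - x_bar * y_bar
    m = cov_xy / var_x
    c = y_bar - m * x_bar
    return m, c

# Made-up data scattered around y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
m, c = solsr_fit(x, y)
# np.polyfit with degree 1 solves the same problem and returns [slope, intercept].
print(m, c, np.polyfit(x, y, 1))
```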

See here for the closed form of the minimizer (m,c). Instead of the vertical deviation used in SOLSR, one can take the distance between a data point and the line as the loss. Notice that |y_i - (mx_i + c) | is not the distance from (x_i,y_i) to the line y = mx+c unless m = 0. The resulting least squares problem can be formulated as follows.
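To make the discrepancy explicit, write the line as mx - y + c = 0; the standard point-to-line distance formula then gives

\displaystyle \text{dist}\big((x_i,y_i),\ \{y = mx+c\}\big) = \frac{|y_i - (mx_i + c)|}{\sqrt{1+m^2}},

so the squared vertical deviation exceeds the squared distance by the factor 1+m^2, and the two losses agree for every data set only when m = 0.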

A general line in \mathbb{R}^2 can be expressed as \sin(\theta)x-\cos(\theta)y+a=0 where (\theta,a)\in \mathbb{R}^2. Since (\sin(\theta), -\cos(\theta)) is a unit normal vector of this line, the distance between (x_i,y_i) and this line is

|\sin(\theta) x_i - \cos(\theta) y_i + a|          for every i.
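A quick numerical sanity check of this distance formula, assuming NumPy: the sketch compares |\sin(\theta) x - \cos(\theta) y + a| with the distance obtained by orthogonally projecting the point onto the line. The helper names and the sample values of \theta, a and the points are hypothetical.

```python
import numpy as np

def line_distance(theta, a, px, py):
    """Distance from (px, py) to sin(theta) x - cos(theta) y + a = 0, by the formula in the text."""
    return abs(np.sin(theta) * px - np.cos(theta) * py + a)

def projection_distance(theta, a, px, py):
    """The same distance computed geometrically via the foot of the perpendicular."""
    n = np.array([np.sin(theta), -np.cos(theta)])  # unit normal of the line
    d = np.array([np.cos(theta), np.sin(theta)])   # unit direction of the line
    p0 = -a * n                                    # a point on the line: n . p0 + a = 0
    p = np.array([px, py])
    foot = p0 + np.dot(p - p0, d) * d              # orthogonal projection of p onto the line
    return np.linalg.norm(p - foot)

theta, a = 0.7, -1.3
for px, py in [(0.0, 0.0), (2.0, -1.0), (3.5, 4.2)]:
    print(line_distance(theta, a, px, py), projection_distance(theta, a, px, py))
```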

Hence we want to find (\theta,a) that minimizes the loss function

\displaystyle L(\theta,a) :=\sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a)^2.
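The loss is straightforward to evaluate. The following sketch, assuming NumPy, hypothetical helper names and made-up data, also illustrates how it relates to the SOLSR loss: a non-vertical line y = mx + c corresponds to \theta = \arctan(m) and a = c\cos(\theta), and for that line L(\theta,a) equals the SOLSR loss divided by 1+m^2.

```python
import numpy as np

def orthogonal_loss(theta, a, x, y):
    """L(theta, a) = sum_i (sin(theta) x_i - cos(theta) y_i + a)^2."""
    r = np.sin(theta) * x - np.cos(theta) * y + a
    return np.sum(r**2)

def vertical_loss(m, c, x, y):
    """SOLSR loss sum_i [y_i - (m x_i + c)]^2."""
    return np.sum((y - (m * x + c)) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# The line y = 2x + 1 rewritten in the (theta, a) form; cos(theta) > 0 here.
m, c = 2.0, 1.0
theta = np.arctan(m)
a = c * np.cos(theta)
print(orthogonal_loss(theta, a, x, y), vertical_loss(m, c, x, y) / (1 + m**2))
```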

Like SOLSR, this problem has many real-life applications. It turns out that (\theta,a) still has a closed form, and we will derive it. The answer can very likely be found somewhere, but we have not found it so far. It is also a good exercise for Hong Kong students who know Additional Maths, although the subject has since been discontinued.

Setting the partial derivative L_a = 2\sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a) to zero gives

\displaystyle \sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a) = 0

which implies

\displaystyle a = \frac{\cos(\theta)}{N} \sum_{i=1}^N y_i - \frac{\sin(\theta)}{N} \sum_{i=1}^N x_i = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X},

where \overline{X} := \frac{1}{N}\sum_{i=1}^N x_i and \overline{Y} := \frac{1}{N}\sum_{i=1}^N y_i.
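For any fixed \theta, L is a quadratic in a with leading coefficient N > 0, so this critical point is the minimizer over a. A small check, assuming NumPy, with hypothetical helper names and made-up data:

```python
import numpy as np

def optimal_a(theta, x, y):
    """a = cos(theta) * mean(y) - sin(theta) * mean(x), the solution of L_a = 0."""
    return np.cos(theta) * np.mean(y) - np.sin(theta) * np.mean(x)

def dL_da(theta, a, x, y):
    """Partial derivative L_a = 2 * sum_i (sin(theta) x_i - cos(theta) y_i + a)."""
    return 2.0 * np.sum(np.sin(theta) * x - np.cos(theta) * y + a)

def L(theta, a, x, y):
    """Loss L(theta, a) from the text."""
    return np.sum((np.sin(theta) * x - np.cos(theta) * y + a) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
theta = 1.1  # any fixed angle
a_star = optimal_a(theta, x, y)
print(dL_da(theta, a_star, x, y))  # zero up to rounding
# Moving a away from a_star increases L by exactly N * h^2 (up to rounding).
print([L(theta, a_star + h, x, y) - L(theta, a_star, x, y) for h in (-0.1, 0.1)])
```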

Setting the partial derivative L_\theta to zero as well, dividing by 2N, and using the formulae \sin(2\theta) = 2\sin(\theta) \cos(\theta) and \cos(2\theta) = \cos^2 (\theta) - \sin^2 (\theta), one has

\displaystyle \frac{L_\theta}{2N} = \frac{\sin(2\theta)}{2}(\overline{X^2} - \overline{Y^2}) - \cos(2\theta) \overline{XY} + a (\cos(\theta) \overline{X} + \sin(\theta) \overline{Y}) = 0

where

\displaystyle \overline{X^2} := \frac{1}{N}\sum_{i=1}^N x_i^2, \ \overline{Y^2} := \frac{1}{N}\sum_{i=1}^N y_i^2 \ \text{ and } \ \overline{XY}:= \frac{1}{N}\sum_{i=1}^N x_iy_i.
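The computation behind this step can be spelled out as follows. Since \frac{\partial}{\partial\theta}(\sin(\theta) x_i - \cos(\theta) y_i + a) = \cos(\theta) x_i + \sin(\theta) y_i when a is held fixed,

\displaystyle L_\theta = 2\sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a)(\cos(\theta) x_i + \sin(\theta) y_i),

and each product expands as

\displaystyle (\sin(\theta) x_i - \cos(\theta) y_i + a)(\cos(\theta) x_i + \sin(\theta) y_i) = \frac{\sin(2\theta)}{2}(x_i^2 - y_i^2) - \cos(2\theta)\, x_iy_i + a(\cos(\theta) x_i + \sin(\theta) y_i).

Summing over i and dividing by 2N expresses L_\theta/(2N) in terms of \overline{X^2}, \overline{Y^2}, \overline{XY}, \overline{X} and \overline{Y}.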

Substituting the above expression for a into this equation, we obtain

\sin(2\theta) (\text{Var}(X)-\text{Var}(Y)) = 2\cos(2\theta) \text{Cov}(X,Y)

where

\displaystyle \text{Var}(X) := \overline{X^2} - \overline{X}^2, \ \text{Var}(Y) := \overline{Y^2} - \overline{Y}^2 \  \text{ and } \  \text{Cov}(X,Y):= \overline{XY} - \overline{X} \ \overline{Y}.
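To check the substitution: with a = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X}, only the last term of the equation for L_\theta changes, and

\displaystyle a(\cos(\theta)\overline{X} + \sin(\theta)\overline{Y}) = (\cos^2(\theta) - \sin^2(\theta))\,\overline{X}\ \overline{Y} - \sin(\theta)\cos(\theta)(\overline{X}^2 - \overline{Y}^2) = \cos(2\theta)\,\overline{X}\ \overline{Y} - \frac{\sin(2\theta)}{2}(\overline{X}^2 - \overline{Y}^2).

Collecting terms, the condition L_\theta = 0 becomes

\displaystyle \frac{\sin(2\theta)}{2}\big[(\overline{X^2} - \overline{X}^2) - (\overline{Y^2} - \overline{Y}^2)\big] - \cos(2\theta)(\overline{XY} - \overline{X}\ \overline{Y}) = 0,

which is the identity \sin(2\theta) (\text{Var}(X)-\text{Var}(Y)) = 2\cos(2\theta) \text{Cov}(X,Y) after multiplying by 2.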

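Provided \text{Var}(X) \neq \text{Var}(Y), we must have \cos(2\theta) \neq 0 (otherwise \sin(2\theta) = \pm 1, so the left-hand side is nonzero while the right-hand side vanishes). Dividing by \cos(2\theta)(\text{Var}(X) - \text{Var}(Y)) then gives the formula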

\displaystyle \tan(2\theta) = \frac{2 \ \text{Cov}(X,Y)}{\text{Var}(X) - \text{Var}(Y)},
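A numerical check, assuming NumPy and a small made-up data set: since \tan(2\theta) has period \pi/2 in \theta, the formula yields two candidate angles modulo \pi. Both satisfy the critical-point equation, but only one minimizes L; for this data the principal value returned by np.arctan turns out to be the other one, which is why both candidates are compared. The helper names are hypothetical.

```python
import numpy as np

# Made-up example data scattered around y = 2x + 1 (not from the post).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Population (1/N) moments, as defined in the text.
x_bar, y_bar = x.mean(), y.mean()
var_x = np.mean(x**2) - x_bar**2
var_y = np.mean(y**2) - y_bar**2
cov_xy = np.mean(x * y) - x_bar * y_bar

def loss(theta):
    """L(theta, a) with a = cos(theta) * Ybar - sin(theta) * Xbar substituted."""
    a = np.cos(theta) * y_bar - np.sin(theta) * x_bar
    return np.sum((np.sin(theta) * x - np.cos(theta) * y + a) ** 2)

def critical_residual(theta):
    """sin(2 theta)(VarX - VarY) - 2 cos(2 theta) Cov; zero at critical points."""
    return np.sin(2 * theta) * (var_x - var_y) - 2 * np.cos(2 * theta) * cov_xy

# tan(2 theta) = 2 Cov / (VarX - VarY) (requires VarX != VarY) gives two angles modulo pi.
theta1 = 0.5 * np.arctan(2 * cov_xy / (var_x - var_y))
candidates = [theta1, theta1 + np.pi / 2]
for t in candidates:
    print(f"theta={t:.4f}  residual={critical_residual(t):.1e}  L={loss(t):.4f}")

best = min(candidates, key=loss)
print("minimizing theta:", best, "slope tan(theta):", np.tan(best))
```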

which proves the following result:

Theorem 1.         Suppose \text{Var}(X) \neq \text{Var}(Y). Then the minimizer of the loss function L(\theta,a) satisfies

\displaystyle \tan(2\theta) = \frac{2 \ \text{Cov}(X,Y)}{\text{Var}(X) - \text{Var}(Y)}     and     \displaystyle a = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X}.

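In particular, the point (\overline{X},\overline{Y}) lies on the best-fitting line \sin(\theta)x - \cos(\theta) y + a = 0. Note that the equation for \tan(2\theta) determines \theta only up to an integer multiple of \pi/2, so modulo \pi it yields two perpendicular candidate lines through (\overline{X},\overline{Y}); comparing their values of L identifies the minimizer.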
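A compact way to pick the minimizing root directly: with a eliminated, L/N = \frac{\text{Var}(X)+\text{Var}(Y)}{2} - \frac{1}{2}\big[\cos(2\theta)(\text{Var}(X)-\text{Var}(Y)) + 2\sin(2\theta)\text{Cov}(X,Y)\big], and the bracket is largest when (\cos(2\theta), \sin(2\theta)) points along (\text{Var}(X)-\text{Var}(Y), 2\text{Cov}(X,Y)), that is, 2\theta = \text{atan2}(2\text{Cov}(X,Y), \text{Var}(X)-\text{Var}(Y)). This also covers \text{Var}(X) = \text{Var}(Y). The sketch below assumes NumPy; fit_orthogonal_line and the data are hypothetical. As an independent check, it uses the standard fact that the fitted direction (\cos(\theta), \sin(\theta)) is parallel to the eigenvector of the covariance matrix with the largest eigenvalue.

```python
import numpy as np

def fit_orthogonal_line(x, y):
    """Hypothetical helper: (theta, a) minimizing sum_i (sin(theta) x_i - cos(theta) y_i + a)^2.

    theta = 0.5 * atan2(2 Cov, VarX - VarY) selects the minimizing root of
    tan(2 theta) = 2 Cov / (VarX - VarY). If VarX == VarY and Cov == 0,
    every direction is optimal and this simply returns theta = 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Population (1/N) moments, matching Var and Cov in the text.
    var_x = np.mean(x**2) - x_bar**2
    var_y = np.mean(y**2) - y_bar**2
    cov_xy = np.mean(x * y) - x_bar * y_bar
    theta = 0.5 * np.arctan2(2 * cov_xy, var_x - var_y)
    a = np.cos(theta) * y_bar - np.sin(theta) * x_bar  # second formula of Theorem 1
    return theta, a

# Made-up example data scattered around y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
theta, a = fit_orthogonal_line(x, y)
print("slope tan(theta):", np.tan(theta))

# The centroid lies on the fitted line, so this is zero up to rounding.
centroid_residual = np.sin(theta) * x.mean() - np.cos(theta) * y.mean() + a
print("centroid residual:", centroid_residual)

# Cross-check against the leading eigenvector of the 1/N covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(x, y, bias=True))
v = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order
cross = np.cos(theta) * v[1] - np.sin(theta) * v[0]
print("cross product with principal direction:", cross)
```

Using atan2 rather than evaluating both roots of the \tan(2\theta) equation avoids dividing by \text{Var}(X) - \text{Var}(Y) and needs no loss comparison.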

We leave the high-dimensional case, and the case of weighted data, to the reader.

This entry was posted in Applied mathematics, Calculus, Optimization, Statistics.