## Least squares in a non-ordinary sense

Simple ordinary least squares regression (SOLSR) refers to the following problem. Given data $(x_i,y_i)\in \mathbb{R}^2$, $i = 1, \ldots,N$, find a line in $\mathbb{R}^2$, represented by $y = mx+c$, that best fits the data in the following sense. The loss of each data point $(x_i,y_i)$ with respect to the line is

$|y_i - (mx_i + c) |$          for every $i$,

so we find $(m,c)$ that minimizes the loss function

$\displaystyle \sum_{i=1}^N [y_i - (mx_i + c)]^2.$

The minimizer $(m,c)$ admits a well-known closed form. Instead of SOLSR, one can take the distance between a data point and the line in $\mathbb{R}^2$ as the loss. Notice that $|y_i - (mx_i + c) |$ is the vertical offset, not the distance, from $(x_i,y_i)$ to the line $y = mx+c$ unless $m = 0$. The new least squares problem can then be formulated as follows.
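For reference, the SOLSR closed form can be sketched in a few lines of Python (a minimal sketch using NumPy; the function name and sample data are ours):

```python
import numpy as np

def solsr(x, y):
    """Closed-form SOLSR minimizer: m = Cov(X, Y) / Var(X), c = Ybar - m * Xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = (np.mean(x * y) - x.mean() * y.mean()) / np.var(x)
    c = y.mean() - m * x.mean()
    return m, c

m, c = solsr([0, 1, 2, 3], [1, 3, 5, 7])  # data lying exactly on y = 2x + 1
# m = 2.0, c = 1.0
```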

A general line in $\mathbb{R}^2$ can be expressed as $\sin(\theta)x-\cos(\theta)y+a=0$, where $(\theta,a)\in \mathbb{R}^2$. Since $(\sin(\theta),-\cos(\theta))$ is a unit normal vector of this line, the distance between $(x_i,y_i)$ and the line is

$|\sin(\theta) x_i - \cos(\theta) y_i + a|$          for every $i$.

Hence we want to find $(\theta,a)$ that minimizes the loss function

$\displaystyle L(\theta,a) :=\sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a)^2$.
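As a sanity check, the distance formula needs no normalization because the normal vector $(\sin(\theta),-\cos(\theta))$ has unit length; this can be verified numerically against an independent projection-based computation (a minimal sketch; the line and point are arbitrary):

```python
import numpy as np

theta, a = 0.7, -1.3                           # an arbitrary line sin(t)x - cos(t)y + a = 0
p = np.array([2.0, 5.0])                       # an arbitrary data point

# Formula from the text: (sin(t), -cos(t)) is a unit normal, so no division is needed.
d_formula = abs(np.sin(theta) * p[0] - np.cos(theta) * p[1] + a)

# Independent check: project p onto the line and measure the orthogonal residual.
p0 = np.array([0.0, a / np.cos(theta)])        # a point on the line (valid when cos(t) != 0)
d = np.array([np.cos(theta), np.sin(theta)])   # unit direction vector of the line
r = (p - p0) - np.dot(p - p0, d) * d           # component of p - p0 orthogonal to the line
assert np.isclose(d_formula, np.linalg.norm(r))
```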

Like SOLSR, this problem has many real-life applications; it is commonly known as orthogonal regression or total least squares. It turns out that there is still a closed form for $(\theta,a)$, and we will derive it. There is a high chance that the answer can be found somewhere in the literature, but we have not located it so far. It is also a good exercise for Hong Kong students who learned Additional Mathematics, although that subject has since been discontinued.

Setting the partial derivative $L_a$ to zero, one has

$\displaystyle \sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a) = 0$

which implies

$\displaystyle a = \frac{\cos(\theta)}{N} \sum_{i=1}^N y_i - \frac{\sin(\theta)}{N} \sum_{i=1}^N x_i = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X}$, where $\displaystyle \overline{X} := \frac{1}{N}\sum_{i=1}^N x_i$ and $\displaystyle \overline{Y} := \frac{1}{N}\sum_{i=1}^N y_i$.

Differentiating $L$ with respect to $\theta$ and using the formulae $\sin(2\theta) = 2\sin(\theta) \cos(\theta)$ and $\cos(2\theta) = \cos^2 (\theta) - \sin^2 (\theta)$, the condition $L_\theta = 0$ becomes, after dividing by $2N$,

$\displaystyle \frac{\sin(2\theta)}{2}(\overline{X^2} - \overline{Y^2}) - \cos(2\theta) \overline{XY} + a (\cos(\theta) \overline{X} + \sin(\theta) \overline{Y}) = 0$

where

$\displaystyle \overline{X^2} := \frac{1}{N}\sum_{i=1}^N x_i^2, \ \overline{Y^2} := \frac{1}{N}\sum_{i=1}^N y_i^2 \ \text{ and } \ \overline{XY}:= \frac{1}{N}\sum_{i=1}^N x_iy_i$.
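This expression can be sanity-checked numerically: up to the positive factor $2N$ (which does not affect the root), it agrees with a finite-difference derivative of $L$ in $\theta$ (a minimal sketch with made-up random data and an arbitrary point $(\theta,a)$):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=20), rng.normal(size=20)
N = len(x)
theta, a = 0.4, 0.2                      # an arbitrary point (theta, a)

L = lambda t: np.sum((np.sin(t) * x - np.cos(t) * y + a) ** 2)

# Analytic expression from the text (equal to L_theta / (2N)).
expr = (np.sin(2 * theta) / 2 * (np.mean(x**2) - np.mean(y**2))
        - np.cos(2 * theta) * np.mean(x * y)
        + a * (np.cos(theta) * np.mean(x) + np.sin(theta) * np.mean(y)))

h = 1e-6
L_theta = (L(theta + h) - L(theta - h)) / (2 * h)  # central finite difference
assert np.isclose(L_theta, 2 * N * expr, atol=1e-4)
```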

Plugging the above expression for $a$ into this formula and simplifying, we arrive at

$\sin(2\theta) (\text{Var}(X)-\text{Var}(Y)) = 2\cos(2\theta) \text{Cov}(X,Y)$

where

$\text{Var}(X) := \overline{X^2} - \overline{X}^2, \ \text{Var}(Y) := \overline{Y^2} - \overline{Y}^2 \ \text{ and } \ \text{Cov}(X,Y):= \overline{XY} - \overline{X} \ \overline{Y}$.

This implies the formula

$\displaystyle \tan(2\theta) = \frac{2 \ \text{Cov}(X,Y)}{\text{Var}(X) - \text{Var}(Y)}$,

provided $\text{Var}(X) \neq \text{Var}(Y)$. Note that this equation determines $2\theta$ only up to a multiple of $\pi$; among the resulting critical points, the minimizer is the one with the smaller loss. This concludes the result:

**Theorem 1.** The minimizer of the loss function $L(\theta,a)$ satisfies

$\displaystyle \tan(2\theta) = \frac{2 \ \text{Cov}(X,Y)}{\text{Var}(X) - \text{Var}(Y)}$     and     $\displaystyle a = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X}$.

In particular, the point $(\overline{X},\overline{Y})$ lies on the best-fit line $\sin(\theta)x - \cos(\theta) y + a = 0$.
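Theorem 1 translates directly into code. A minimal sketch (the function name is ours; `arctan2` picks one solution of the $\tan(2\theta)$ equation, and the two critical points in $[0,\pi)$ are compared by loss):

```python
import numpy as np

def orthogonal_fit(x, y):
    """Best-fit line sin(t)x - cos(t)y + a = 0 via Theorem 1 (sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    vx, vy = np.var(x), np.var(y)
    cxy = np.mean(x * y) - x.mean() * y.mean()
    t0 = 0.5 * np.arctan2(2 * cxy, vx - vy)  # one solution of tan(2t) = 2Cov/(VarX - VarY)
    best = None
    for t in (t0, t0 + np.pi / 2):           # the other critical point is t0 + pi/2
        a = np.cos(t) * y.mean() - np.sin(t) * x.mean()
        loss = np.sum((np.sin(t) * x - np.cos(t) * y + a) ** 2)
        if best is None or loss < best[2]:
            best = (t, a, loss)
    return best

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                                  # exactly collinear data
t, a, loss = orthogonal_fit(x, y)
assert np.isclose(loss, 0.0, atol=1e-12)       # zero orthogonal residual
# The mean point lies on the fitted line, as the theorem asserts.
assert np.isclose(np.sin(t) * x.mean() - np.cos(t) * y.mean() + a, 0.0)
```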

We leave it to the reader to work out the higher-dimensional case and the case of weighted data.