Least squares in a non-ordinary sense

Simple ordinary least squares regression (SOLSR) is the following problem. Given data $(x_i,y_i)\in \mathbb{R}^2$ , $i = 1, \ldots,N$ , find a line $y = mx+c$ in $\mathbb{R}^2$ that fits the data in the following sense. The error of the line at each data point $(x_i,y_i)$ is the vertical residual

$|y_i - (mx_i + c) |$ for every $i$ ,

so we find $(m,c)$ that minimizes the loss function

$\displaystyle \sum_{i=1}^N [y_i - (mx_i + c)]^2.$
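For concreteness, here is a minimal Python sketch of the standard closed-form solution of this problem, namely the slope is the covariance of the data divided by the variance of the $x_i$ , and the line passes through the point of means. These are the textbook formulas rather than something derived in this post, and the function name `ols_fit` and the sample data are illustrative assumptions.

```python
def ols_fit(xs, ys):
    """Return (m, c) minimizing sum((y - (m*x + c))**2) over the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population covariance and variance; the 1/N factors cancel in the ratio.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    if var_x == 0:
        raise ValueError("all x values coincide; the slope is undefined")
    m = cov_xy / var_x
    c = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, c


# Hypothetical data lying exactly on y = 2x + 1.
m, c = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(m, c)
```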

The minimizer $(m,c)$ has a well-known closed form. Instead of SOLSR, one can take the distance between a data point and the line in $\mathbb{R}^2$ as the loss. Notice that $|y_i - (mx_i + c) |$ is not the distance from $(x_i,y_i)$ to the line $y = mx+c$ unless $m = 0$ or the point lies on the line. The new least squares problem can then be formulated as follows.

A general line in $\mathbb{R}^2$ can be expressed as $\sin(\theta)x-\cos(\theta)y+a=0$ where $(\theta,a)\in \mathbb{R}^2$ . Since the normal vector $(\sin(\theta),-\cos(\theta))$ has unit length, the distance between $(x_i,y_i)$ and this line is

$|\sin(\theta) x_i - \cos(\theta) y_i + a|$ for every $i$ .
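The distance formula above can be checked numerically against the usual point-to-line distance for $y = mx + c$ . The sketch below assumes the conversion $\theta = \arctan(m)$ , $a = c\cos(\theta)$ , which follows from dividing $mx - y + c = 0$ by $\sqrt{1+m^2}$ . All function names and the sample point are hypothetical.

```python
import math


def normal_form_distance(theta, a, x, y):
    """Distance from (x, y) to the line sin(theta)*x - cos(theta)*y + a = 0."""
    return abs(math.sin(theta) * x - math.cos(theta) * y + a)


def slope_intercept_distance(m, c, x, y):
    """Standard point-to-line distance for the line m*x - y + c = 0."""
    return abs(m * x - y + c) / math.hypot(m, 1.0)


def vertical_residual(m, c, x, y):
    """The SOLSR error |y - (m*x + c)|, measured along the y-axis."""
    return abs(y - (m * x + c))


# Hypothetical example: the line y = x and the point (0, 1).
m, c = 1.0, 0.0
theta = math.atan(m)      # assumed conversion to the (theta, a) form
a = c * math.cos(theta)

print(normal_form_distance(theta, a, 0.0, 1.0))   # perpendicular distance
print(slope_intercept_distance(m, c, 0.0, 1.0))   # same value, other formula
print(vertical_residual(m, c, 0.0, 1.0))          # larger, since m != 0
```

The first two values agree, while the vertical residual is larger by the factor $\sqrt{1+m^2}$ .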

Hence we want to find $(\theta,a)$ that minimizes the loss function

$\displaystyle L(\theta,a) :=\sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a)^2$ .
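As a minimal sketch, the loss $L(\theta,a)$ can be evaluated directly in Python. The function name `orthogonal_loss` and the sample data are assumptions made for illustration.

```python
import math


def orthogonal_loss(theta, a, xs, ys):
    """Sum of squared distances from the points to sin(t)*x - cos(t)*y + a = 0."""
    s, c = math.sin(theta), math.cos(theta)
    return sum((s * x - c * y + a) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on the line y = x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]

print(orthogonal_loss(math.pi / 4, 0.0, xs, ys))  # the line y = x: essentially zero
print(orthogonal_loss(0.0, 0.0, xs, ys))          # the line y = 0: 0 + 1 + 4
```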

Like SOLSR, this problem has many real-life applications. It turns out that $(\theta,a)$ still has a closed form, and we will derive it. The answer can very likely be found somewhere, but we have not found it so far. It is also a good exercise for Hong Kong students who know Additional Maths, although the subject has since disappeared.

Setting the partial derivative $L_a$ to zero, one has

$\displaystyle \sum_{i=1}^N (\sin(\theta) x_i - \cos(\theta) y_i + a) = 0$

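which implies

$\displaystyle a = \frac{\cos(\theta)}{N} \sum_{i=1}^N y_i - \frac{\sin(\theta)}{N} \sum_{i=1}^N x_i = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X}$ , where $\displaystyle \overline{X} := \frac{1}{N}\sum_{i=1}^N x_i$ and $\displaystyle \overline{Y} := \frac{1}{N}\sum_{i=1}^N y_i$ .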
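The formula for $a$ says that, for any fixed $\theta$ , the best offset is the one that makes the line pass through the point of means. A small numerical check, with hypothetical function names and made-up data, might look like this:

```python
import math


def orthogonal_loss(theta, a, xs, ys):
    """Sum of squared distances from the points to sin(t)*x - cos(t)*y + a = 0."""
    s, c = math.sin(theta), math.cos(theta)
    return sum((s * x - c * y + a) ** 2 for x, y in zip(xs, ys))


def best_offset(theta, xs, ys):
    """The a solving L_a = 0 for a fixed theta: cos(theta)*Ybar - sin(theta)*Xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return math.cos(theta) * y_bar - math.sin(theta) * x_bar


# Hypothetical data and an arbitrary fixed angle.
xs = [0.0, 1.0, 2.0, 4.0]
ys = [1.0, 0.5, 3.0, 2.5]
theta = 0.3

a_star = best_offset(theta, xs, ys)
# Moving a away from a_star in either direction should increase the loss.
print(orthogonal_loss(theta, a_star, xs, ys))
print(orthogonal_loss(theta, a_star + 0.1, xs, ys))
print(orthogonal_loss(theta, a_star - 0.1, xs, ys))
```

Since $L$ is a quadratic in $a$ with leading coefficient $N$ , shifting $a$ by $\delta$ away from the optimum raises the loss by exactly $N\delta^2$ .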

Setting the partial derivative $L_\theta$ to zero, dividing by $2N$ , and using the formulae $\sin(2\theta) = 2\sin(\theta) \cos(\theta)$ and $\cos(2\theta) = \cos^2 (\theta) - \sin^2 (\theta)$ , one has

$\displaystyle \frac{\sin(2\theta)}{2}(\overline{X^2} - \overline{Y^2}) - \cos(2\theta) \overline{XY} + a (\cos(\theta) \overline{X} + \sin(\theta) \overline{Y}) = 0$

where

$\displaystyle \overline{X^2} := \frac{1}{N}\sum_{i=1}^N x_i^2, \ \overline{Y^2} := \frac{1}{N}\sum_{i=1}^N y_i^2 \ \text{ and } \ \overline{XY}:= \frac{1}{N}\sum_{i=1}^N x_iy_i$ .

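Substituting the above expression for $a$ into this equation, and noting that $\displaystyle a (\cos(\theta) \overline{X} + \sin(\theta) \overline{Y}) = \cos(2\theta) \overline{X} \ \overline{Y} - \frac{\sin(2\theta)}{2} (\overline{X}^2 - \overline{Y}^2)$ , we reach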

$\sin(2\theta) (\text{Var}(X)-\text{Var}(Y)) = 2\cos(2\theta) \text{Cov}(X,Y)$

where

$\text{Var}(X) := \overline{X^2} - \overline{X}^2, \ \text{Var}(Y) := \overline{Y^2} - \overline{Y}^2 \ \text{ and } \ \text{Cov}(X,Y):= \overline{XY} - \overline{X} \ \overline{Y}$ .

When $\text{Var}(X) \neq \text{Var}(Y)$ , this implies the formula

$\displaystyle \tan(2\theta) = \frac{2 \ \text{Cov}(X,Y)}{\text{Var}(X) - \text{Var}(Y)}$ ,

which gives the result:

Theorem 1. If $\text{Var}(X) \neq \text{Var}(Y)$ , every minimizer of the loss function $L(\theta,a)$ satisfies

$\displaystyle \tan(2\theta) = \frac{2 \ \text{Cov}(X,Y)}{\text{Var}(X) - \text{Var}(Y)}$ and $\displaystyle a = \cos(\theta) \overline{Y} - \sin(\theta) \overline{X}$ .

In particular, the point $(\overline{X},\overline{Y})$ lies on the best-fit line $\sin(\theta)x - \cos(\theta) y + a = 0$ .
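One practical caveat: the condition on $\tan(2\theta)$ determines $\theta$ only up to multiples of $\pi/2$ , and of two perpendicular candidate lines one minimizes $L$ while the other maximizes it. Substituting the optimal $a$ gives $L/N = \frac{1}{2}(\text{Var}(X)+\text{Var}(Y)) - \frac{1}{2}[(\text{Var}(X)-\text{Var}(Y))\cos(2\theta) + 2\,\text{Cov}(X,Y)\sin(2\theta)]$ , so the minimizing branch is $2\theta = \text{atan2}(2\,\text{Cov}(X,Y), \text{Var}(X)-\text{Var}(Y))$ . The Python sketch below uses that choice; the function name `orthogonal_fit` and the sample data are assumptions made for illustration.

```python
import math


def orthogonal_fit(xs, ys):
    """Return (theta, a) for the line sin(theta)*x - cos(theta)*y + a = 0
    minimizing the sum of squared perpendicular distances to the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # atan2 selects the minimizing branch of tan(2*theta) and also handles
    # var_x == var_y. If cov_xy is zero as well, every theta gives the same
    # loss and atan2(0, 0) simply returns 0.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    a = math.cos(theta) * y_bar - math.sin(theta) * x_bar
    return theta, a


# Hypothetical data on the line y = x: expect theta close to pi/4 and a close to 0.
theta, a = orthogonal_fit([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
print(theta, a)

# Hypothetical data on the vertical line x = 3, which y = m*x + c cannot represent.
theta_v, a_v = orthogonal_fit([3.0, 3.0, 3.0], [0.0, 1.0, 2.0])
print(theta_v, a_v)
```

Unlike the slope-intercept form, the $(\theta,a)$ parametrization handles vertical lines without any special case.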

We leave the higher-dimensional cases, and the case of weighted data, to the reader.
