This page was written so I could practise using LaTeX/TeX on GitHub Pages. Thanks to Vincent Tam's page for showing me how to do it!
Question
Suppose we have some data $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ ($x_i, y_i \in \mathbb{R}$).
Then we can calculate the (ordinary least squares) regression line of $y$ on $x$.
Let's say that we get $y=mx+c$ and, assuming $m \neq 0$, rearrange to get $x=\frac{1}{m}y - \frac{c}{m}$.
Is this the regression line of $x$ on $y$?
Answer
In general, no. For simplicity, assume that $\sum x_i = \sum y_i = 0$ (we can always translate the data so that this holds, and translating changes neither slope). One can show that a least-squares regression line
always passes through the mean point $(\frac{\sum x_i}{n}, \frac{\sum y_i}{n})=(0,0)$: for any slope $a$, the optimal intercept is $\bar{y} - a\bar{x}$, which here is $0$. So the regression line of
$y$ on $x$ must be of the form $y = m x$ for some $m \in \mathbb{R}$. We get
\[ m = \operatorname{argmin}_a \sum_i (y_i - a x_i)^2 = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
= \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2} \sqrt{\sum_i y_i^2}} \sqrt{\frac{\sum_i y_i^2}{\sum_i x_i^2}}
= r(x,y) \sqrt{\frac{\sum_i y_i^2}{\sum_i x_i^2}} \]
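(The second equality is the standard least-squares calculation: differentiating $\sum_i (y_i - a x_i)^2$ with respect to $a$ and setting the derivative to zero gives $-2 \sum_i x_i (y_i - a x_i) = 0$, i.e. $a = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$; the remaining equalities just multiply and divide by $\sqrt{\sum_i y_i^2}$.)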
Swapping $x$ and $y$ around, we get the regression line of $x$ on $y$ in the form $x = m' y$, where
\[ m' = r(y,x) \sqrt{\frac{\sum_i x_i^2}{\sum_i y_i^2}} \]
Since the correlation is symmetric, $r(y,x) = r(x,y)$, and multiplying the two slopes gives $m m' = r(x,y)^2$. So $m'$ is the reciprocal of $m$
if and only if $r(x,y) = \pm 1$, i.e. the points lie exactly on a line, which in general does not hold.
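As a quick numerical sanity check, here is a short Python sketch (using NumPy; the generated data and variable names are my own illustration, not part of the argument above) that fits both slopes on noisy linear data and confirms that $m m' = r(x,y)^2$, so $m' \neq \frac{1}{m}$ whenever $|r| < 1$:

```python
import numpy as np

# Generate noisy linear data, so |r(x, y)| < 1.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

# Centre the data so both regression lines pass through the origin.
x = x - x.mean()
y = y - y.mean()

m = np.sum(x * y) / np.sum(x * x)        # slope of y on x
m_prime = np.sum(x * y) / np.sum(y * y)  # slope of x on y
r = np.sum(x * y) / np.sqrt(np.sum(x * x) * np.sum(y * y))

print(m * m_prime, r**2)  # equal: m * m' = r(x, y)^2
print(m_prime, 1.0 / m)   # different, since |r| < 1 here
```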
To Think About
What is wrong with the following argument for why $m'$ and $m$ should always be reciprocal?
"Comparing the scatterplots of $x$ vs $y$ and $y$ vs $x$, we see that they are
related by reflecting in the line $y=x$. This is an isometry, and so preserves all lengths. Since regression is about
minimising a sum of squares of lengths, the best line for regressing $y$ on $x$ will be
transformed to the best line for regressing $x$ on $y$. Reflection transforms $y=mx$ to $x=my$, or $y=\frac{1}{m}x$. This
appears to prove that $m$ and $m'$ are always reciprocal."
Hint: are the line segments whose squared lengths we are minimising horizontal or vertical?