Covariance measures the strength of the linear relationship between two random variables $X$ and $Y$. If both variables tend to deviate from their means in the same direction, the covariance is positive; if they tend to deviate in opposite directions, it is negative. When the two variables are not strongly (linearly) related, the covariance will be near 0.

When $X$ and $Y$ are [[independent]], $Cov(X,Y) = 0$. Note, however, that a covariance of 0 does *not* prove that the two variables are independent. To demonstrate that two variables are in fact independent, you must show that $P(X=x, Y=y) = P(X=x)P(Y=y)$ for all $x, y$ pairs. Because covariance only captures linear association, it can be misleading for variables related in a nonlinear way: it is possible for $X$ and $Y$ to have a strong relationship yet a covariance close to 0 when that relationship is strongly nonlinear (a worked example appears at the end of this note).

The definitional formula for covariance is

$Cov(X,Y) = E[(X-\mu_x)(Y-\mu_y)]$

The computational formula for covariance is

$Cov(X,Y) = E(XY) - E(X)E(Y)$

## R

Covariance is `cov()` in R. Note that R uses $n-1$ rather than $n$ as the denominator when calculating the sample covariance of two vectors of length $n$ (see the example at the end of this note).

## Discrete Joint Random Variables

For discrete random variables, covariance can be expressed as

$Cov(X,Y) = \sum_x \sum_y (x - \mu_x)(y - \mu_y)P(X = x, Y = y)$

When the random variables $X$ and $Y$ are independent, the joint pmf factors as $P(X=x)P(Y=y)$, so the double sum splits into $\left[\sum_x (x - \mu_x)P(X=x)\right]\left[\sum_y (y - \mu_y)P(Y=y)\right] = 0 \cdot 0 = 0$.

## Continuous Joint Random Variables

For continuous random variables, covariance can be expressed as

$Cov(X,Y) = \int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}(x-\mu_x)(y-\mu_y)f(x,y)\,dx\,dy$
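As a quick numerical check of this double integral, here is a minimal R sketch. It assumes the textbook joint density $f(x,y) = x + y$ on the unit square (a choice made purely for illustration), and `E_g` is a hypothetical helper, not a standard function.

```r
# Assumed joint density f(x, y) = x + y on [0, 1] x [0, 1], zero elsewhere.
fxy <- function(x, y) x + y

# E_g: hypothetical helper that approximates E[g(X, Y)] by iterated
# numerical integration over the unit square.
E_g <- function(g) {
  integrate(function(y) sapply(y, function(yi)
    integrate(function(x) g(x, yi) * fxy(x, yi), 0, 1)$value
  ), 0, 1)$value
}

mu_x <- E_g(function(x, y) x)                # E(X) = 7/12
mu_y <- E_g(function(x, y) y)                # E(Y) = 7/12
E_g(function(x, y) (x - mu_x) * (y - mu_y))  # Cov(X, Y) = -1/144
```

The same value falls out of the computational formula, since here $E(XY) = 1/3$ and $E(X)E(Y) = 49/144$.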
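The discrete double-sum formula also makes it easy to see why zero covariance does not imply independence. A standard example, assumed here for illustration: let $X$ be uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so the joint pmf is concentrated on the pairs $(x, x^2)$.

```r
# X uniform on {-1, 0, 1}, Y = X^2: perfectly (but nonlinearly) dependent.
x <- c(-1, 0, 1)
y <- x^2
p <- rep(1/3, 3)       # P(X = x) at each support point

EX  <- sum(x * p)      # E(X)  = 0
EY  <- sum(y * p)      # E(Y)  = 2/3
EXY <- sum(x * y * p)  # E(XY) = E(X^3) = 0
EXY - EX * EY          # covariance is exactly 0
```

Even though $Y$ is completely determined by $X$, the covariance is 0 because the relationship is not linear.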
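Finally, a small sketch of the denominator point from the R section above, using made-up data:

```r
x <- c(1, 2, 4, 7)
y <- c(1, 3, 3, 8)
n <- length(x)

cov(x, y)                                     # sample covariance, denominator n - 1
sum((x - mean(x)) * (y - mean(y))) / (n - 1)  # matches cov()
mean(x * y) - mean(x) * mean(y)               # computational formula, denominator n
```

Multiplying the last value by $n/(n-1)$ recovers `cov(x, y)`.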