Given a joint probability table of $(x, y)$ pairs, covariance and correlation can be calculated as follows:

- calculate the expected value of $x$, $\mu_x$
- calculate the expected value of $y$, $\mu_y$
- build a table like the one below, with one row per $(x, y)$ pair:

  | $x$ | $y$ | $x-\mu_x$ | $y-\mu_y$ | $P(X=x,Y=y)$ | contribution |
  |:---:|:---:|:---------:|:---------:|:------------:|:------------:|
  | 100 | 0 | -75 | -125 | 0.2 | 1875 |

  where each row's contribution to the covariance is $(x-\mu_x)(y-\mu_y)\,P(X=x,Y=y)$; for the row shown, $(-75)(-125)(0.2) = 1875$
- sum the contribution column over all rows to get the covariance $\mathrm{Cov}(X,Y)$.
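Written as formulas, each step above is a sum over every $(x, y)$ pair in the table. The last line is the standard shortcut $\mathrm{Cov}(X,Y) = E[XY] - \mu_x\mu_y$, which is algebraically equivalent and handy as a cross-check on the table arithmetic.

$$
\begin{aligned}
\mu_x &= \sum_{x,y} x\,P(X=x,Y=y), \qquad \mu_y = \sum_{x,y} y\,P(X=x,Y=y) \\
\mathrm{Cov}(X,Y) &= \sum_{x,y} (x-\mu_x)(y-\mu_y)\,P(X=x,Y=y) \\
&= \sum_{x,y} xy\,P(X=x,Y=y) - \mu_x\mu_y
\end{aligned}
$$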
To get the correlation coefficient, compute the variance of $x$ by squaring each $x-\mu_x$, multiplying it by $P(X=x,Y=y)$, and summing over all rows; do the same for $y$. Take the square root of each variance to get the standard deviations $\sigma_x$ and $\sigma_y$, then divide the covariance by their product.
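In the same notation, each variance weights the squared deviations by their probabilities. The resulting correlation coefficient always lies between $-1$ and $1$, which gives a quick sanity check on the arithmetic.

$$
\begin{aligned}
\sigma_x^2 &= \sum_{x,y} (x-\mu_x)^2\,P(X=x,Y=y), \qquad
\sigma_y^2 = \sum_{x,y} (y-\mu_y)^2\,P(X=x,Y=y) \\
\rho_{XY} &= \frac{\mathrm{Cov}(X,Y)}{\sigma_x\,\sigma_y}, \qquad -1 \le \rho_{XY} \le 1
\end{aligned}
$$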
The whole calculation takes only a few lines in R if the table is stored in long form, with one row per $(x, y, P)$ triple.
```R
# Joint distribution in long form: one row per (x, y, P(X = x, Y = y)) triple
A <- matrix(
  c(
    1, 1, 0.20,
    1, 4, 0.25,
    1, 16, 0.05,
    5, 1, 0.10,
    5, 4, 0.15,
    5, 16, 0.25
  ),
  ncol = 3,
  byrow = TRUE
)
# The probabilities of a joint distribution must sum to 1
stopifnot(isTRUE(all.equal(sum(A[, 3]), 1)))

# Expected values of x and y
mu.x <- sum(A[, 1] * A[, 3])
mu.y <- sum(A[, 2] * A[, 3])

# Covariance: probability-weighted sum of products of deviations
cov.XY <- sum((A[, 1] - mu.x) * (A[, 2] - mu.y) * A[, 3])

# Variances (probability-weighted squared deviations) and standard deviations
var.X <- sum((A[, 1] - mu.x)^2 * A[, 3])
sd.X <- sqrt(var.X)
var.Y <- sum((A[, 2] - mu.y)^2 * A[, 3])
sd.Y <- sqrt(var.Y)

# Correlation coefficient
cor.XY <- cov.XY / (sd.X * sd.Y)

cov.XY  # 5.4
cor.XY  # approximately 0.435
```
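As a cross-check, the same covariance can be obtained two other ways: with the shortcut $E[XY] - \mu_x\mu_y$, and with base R's `stats::cov.wt()`, which accepts the probabilities as weights. Passing `method = "ML"` makes `cov.wt()` use the normalized weights directly, matching the probability-weighted definition. The default `method = "unbiased"` rescales by $1/(1-\sum w^2)$ and would give a different number. This is a sketch, and the variable names (`e.xy`, `cov.shortcut`, `w`) are illustrative.

```r
# Same joint distribution: columns are x, y, P(X = x, Y = y)
A <- matrix(
  c(1, 1, 0.20,
    1, 4, 0.25,
    1, 16, 0.05,
    5, 1, 0.10,
    5, 4, 0.15,
    5, 16, 0.25),
  ncol = 3, byrow = TRUE
)

# Shortcut form: Cov(X, Y) = E[XY] - E[X] * E[Y]
mu.x <- sum(A[, 1] * A[, 3])
mu.y <- sum(A[, 2] * A[, 3])
e.xy <- sum(A[, 1] * A[, 2] * A[, 3])
cov.shortcut <- e.xy - mu.x * mu.y

# stats::cov.wt treats wt as weights; method = "ML" gives the
# probability-weighted covariance with no small-sample correction
w <- cov.wt(A[, 1:2], wt = A[, 3], cor = TRUE, method = "ML")

cov.shortcut  # 5.4
w$cov[1, 2]   # 5.4
w$cor[1, 2]   # approximately 0.435
```

Both routes should agree with the step-by-step calculation. If they disagree, that usually points to a probability column that does not sum to 1 or a mistyped row.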