The Bayesian regression model is given by
$
\beta \ \Big | \ Y, X, \Sigma_n \sim N \Big (S \cdot(\Sigma_p^{-1} \mu + X^T \Sigma_n^{-1} Y), \ S \Big)
$
where $S=(X^T \Sigma_n^{-1} X + \Sigma_p^{-1})^{-1}$, $\mu$ and $\Sigma_p$ are the prior mean and covariance of $\beta$, and $\Sigma_n$ is the noise covariance.
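As a minimal sketch, this formula can be computed directly (the function name `bayes_posterior` is illustrative, not from the original):
```R
# Posterior mean and covariance of beta given y, X, the prior N(mu, Sigma_p),
# and noise covariance Sigma_n, following the formula above.
bayes_posterior = function(X, y, mu, Sigma_p, Sigma_n) {
  S = solve(t(X) %*% solve(Sigma_n) %*% X + solve(Sigma_p))           # posterior covariance
  m = S %*% (solve(Sigma_p) %*% mu + t(X) %*% solve(Sigma_n) %*% y)  # posterior mean
  list(mean = m, cov = S)
}
```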
Note that if $\Sigma_p \to \infty$, $\mu = 0$, and $\Sigma_n$ is proportional to the identity matrix (any constant of proportionality cancels out), then the Bayesian posterior mean approaches the frequentist [[least squares estimator]] $\hat \beta = (X^T X)^{-1} X^T Y$. This corresponds to a very uninformative prior (high variance and zero mean) under the standard assumptions of linear regression.
Because the LSE is unbiased, under these conditions the Bayesian estimator would also be unbiased. Strictly speaking, though, the prior variance can never actually be infinite, so the Bayesian estimator always carries some bias. (In the frequentist paradigm, this shrinkage is analogous to [[ridge regression]]; see the sketch below.)
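A quick numerical illustration of both points on simulated data (the names `sigma2`, `tau2`, and `lambda` are illustrative, not from the original): with $\mu = 0$, $\Sigma_n = \sigma^2 I$, and $\Sigma_p = \tau^2 I$, the posterior mean coincides with the ridge estimator with penalty $\lambda = \sigma^2/\tau^2$, and it tends to the LSE as $\tau^2 \to \infty$.
```R
set.seed(1)
n = 100; p = 3
X = matrix(rnorm(n * p), n, p)
beta_true = c(1, -2, 0.5)
sigma2 = 4
y = X %*% beta_true + rnorm(n, sd = sqrt(sigma2))

tau2 = 10                # prior variance per coefficient
lambda = sigma2 / tau2   # implied ridge penalty

# Posterior mean with mu = 0, Sigma_n = sigma2 * I, Sigma_p = tau2 * I
S = solve(t(X) %*% X / sigma2 + diag(1 / tau2, p))
beta_bayes = S %*% (t(X) %*% y / sigma2)

# Ridge estimator with the matching penalty: identical up to rounding
beta_ridge = solve(t(X) %*% X + lambda * diag(p)) %*% t(X) %*% y
max(abs(beta_bayes - beta_ridge))    # numerically ~0

# As tau2 grows, the posterior mean approaches the least squares estimate
beta_lse = solve(t(X) %*% X) %*% t(X) %*% y
beta_diffuse = solve(t(X) %*% X / sigma2 + diag(1e-12, p)) %*% (t(X) %*% y / sigma2)
max(abs(beta_diffuse - beta_lse))    # numerically ~0
```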
The [[expectation]] of the Bayesian estimator (taking $\Sigma_n = I$ for simplicity, so the noise covariance drops out of the formulas) is
$E(\hat \beta) = \Big [ X^T X + \Sigma_p^{-1} \Big ]^{-1} \Big(X^TX \beta + \Sigma_p^{-1} \mu \Big)$
Notice again that if $\Sigma_p^{-1}$ drops out (i.e., the prior variance grows without bound), the expectation of $\hat \beta$ is $\beta$, and hence the estimator is unbiased in the limit.
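This follows in one line (a sketch, using $E(Y) = X\beta$, i.e., $Y = X\beta + \varepsilon$ with zero-mean noise):
$
E(\hat \beta) = \Big [ X^T X + \Sigma_p^{-1} \Big ]^{-1} \Big ( X^T E(Y) + \Sigma_p^{-1} \mu \Big ) = \Big [ X^T X + \Sigma_p^{-1} \Big ]^{-1} \Big ( X^T X \beta + \Sigma_p^{-1} \mu \Big )
$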
## R
Bayesian regression in [[R]] can follow this procedure:
1. Center all variables at 0 and scale them to unit variance.
2. Generate a random symmetric positive definite matrix (e.g., $A^T A$ for a random matrix $A$) to serve as $\Sigma_p$.
3. Assume that $Y | X, \beta \sim N(X \beta, \Sigma_n)$ where $\Sigma_n \propto I_n$ and $\beta \sim N(\beta_{ML}, \Sigma_p)$, then compute the posterior mean and variance-covariance matrix from the formulas above.
```R
# Read the marketing data (sales regressed on three advertising channels)
marketing = read.table(
  url(
    "https://raw.githubusercontent.com/bzaharatos/-Statistical-Modeling-for-Data-Science-Applications/master/Modern%20Regression%20Analysis%20/Datasets/marketing.txt"
  ), sep = "")
n = nrow(marketing); p = ncol(marketing) - 1
# Step 1: center all variables at 0 and scale to unit variance
marketing_normalized = scale(marketing)
X = as.matrix(marketing_normalized[, 1:3])
y = as.matrix(marketing_normalized[, 4])
# Frequentist quantities: least squares estimate and residual variance
MLE = solve(t(X) %*% X) %*% t(X) %*% y
y_pred = X %*% MLE
RSS = sum((y_pred - y)^2)
sig2_hat = RSS / (n - p)
# Step 2: random symmetric positive definite matrix for Sigma_p
set.seed(42)  # for reproducibility
A = matrix(runif(p^2, 0, 10), ncol = p)
Sigma_p = t(A) %*% A
# Step 3: Sigma_n proportional to the identity, scaled by sig2_hat
Sigma_n = diag(sig2_hat, n)
# Posterior covariance and mean, from the formulas above
Sigma_beta = solve(t(X) %*% solve(Sigma_n) %*% X + solve(Sigma_p))
mu_1 = Sigma_beta %*% (t(X) %*% solve(Sigma_n) %*% y + solve(Sigma_p) %*% MLE)
```
Note that `solve` solves systems of linear equations in R; called with a single matrix argument, as here, it returns the inverse of a square matrix.
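Continuing from the snippet above, a quick sanity check might look like this (the comparison itself is an addition, not from the original):
```R
# Compare the posterior mean to the MLE; with the prior centered at the MLE,
# the two should be close, and Sigma_beta gives the posterior uncertainty.
data.frame(MLE = as.vector(MLE), posterior_mean = as.vector(mu_1))
sqrt(diag(Sigma_beta))  # posterior standard deviations of the coefficients

# Since Sigma_n = sig2_hat * I, inverting the n x n matrix can be avoided:
Sigma_beta_fast = solve(t(X) %*% X / sig2_hat + solve(Sigma_p))
max(abs(Sigma_beta_fast - Sigma_beta))  # numerically ~0
```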