The normal (or Gaussian) distribution is probably the most important and widely used distribution in probability. Many populations have distributions that can be modeled very well by a normal distribution. In general, anytime data are generated by the summation of many random processes (multiple random variables of unknown distribution), the result is approximately a normal distribution (e.g., height is the sum of genetic and environmental factors; fruit production is the sum of shading, soils, and moisture). This is also the reason for the [[Central Limit Theorem]].

Properties of the normal distribution include
- $f(x)$ is symmetric about the line $x=\mu$
- $f(x) > 0$ and $\int\limits_{-\infty}^{\infty}f(x)dx=1$
- $\mu+\sigma$ and $\mu - \sigma$ are the inflection points of $f(x)$

## Notation

$X \sim N(\mu, \sigma^2)$

## Probability Density Function

$f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \Big \{ -\frac{(x-\mu)^2}{2\sigma^2} \Big \}$

Note that the core function is $e^{-x^2}$: $\sigma$ controls the spread while $\mu$ controls the center. The scalar out front ensures the density integrates to $1$. For an intuition on where $\pi$ comes from, see [[Herschel-Maxwell]].

## Expected Value

$E(X) = \mu$

## Variance

$V(X) = \sigma^2$ where $\sigma$ is the standard deviation and is given when specifying the distribution.

## PDF of the Sample Mean

For $n$ independent observations from $N(\mu, \sigma^2)$, the sample mean $\bar X$ is itself normal with mean $\mu$ and variance $\sigma^2/n$:

$
f(\bar x) = \frac{1}{\sqrt{2 \pi \frac{\sigma^2}{n}}} \exp \Big \{ -\frac{n (\bar x - \mu)^2}{2 \sigma^2} \Big \}
$

## Multivariate form

The multivariate form arises for example in the context of [[base/Linear Regression/linear regression|linear regression]].

$
f(\vec \beta) = \frac{1}{\sqrt{(2 \pi)^p \ |\Sigma_p|}} \exp \Big \{ -\frac12 (\vec \beta - \vec \mu)^T \ \Sigma_p^{-1} (\vec \beta - \vec \mu) \Big \}
$

for $p$ parameters (the dimension of vector $\vec \beta$), where $|\Sigma_p|$ is the determinant of the covariance matrix $\Sigma_p$.

## Bivariate form

Assume that $(X,Y)$ follow a bivariate normal distribution with $E(X) = E(Y) = 0$ and $Var(X) = Var(Y) = 1$.
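The listed properties can be checked numerically in R. This is a minimal sketch using arbitrary parameter values ($\mu = 2$, $\sigma = 1.5$), the built-in `dnorm`, and a finite-difference second derivative to locate the inflection points:

```r
# Arbitrary example parameters
mu <- 2
sigma <- 1.5
f <- function(x) dnorm(x, mu, sigma)

# The density integrates to 1 over the real line
total <- integrate(f, -Inf, Inf)$value

# Symmetry about x = mu: f(mu + d) equals f(mu - d)
d <- 0.7
symmetric <- abs(f(mu + d) - f(mu - d)) < 1e-12

# Second derivative via finite differences; it changes sign at the
# inflection points mu - sigma and mu + sigma
f2 <- function(x, h = 1e-4) (f(x + h) - 2 * f(x) + f(x - h)) / h^2
inside  <- f2(mu + sigma - 0.1) < 0  # concave just inside mu + sigma
outside <- f2(mu + sigma + 0.1) > 0  # convex just outside mu + sigma
```

The same sign-change check applies at $\mu - \sigma$ by symmetry.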
The pdf of $(X,Y)$ is given as

$
f(x,y|\rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right)
$

where $\rho$ is the correlation coefficient for $X$ and $Y$.

## R

```R
# Probability density function, written by hand
# (the normal is continuous, so it has a density, not a mass function)
pdf.norm <- function(x, mu, sigma){
  f.x <- (1 / (sqrt(2 * pi) * sigma)) * exp(-(x - mu)**2 / (2 * sigma**2))
  return(f.x)
}

# Built-in functions are parameterized with the standard deviation, not the variance

# Probability density function
prob <- dnorm(x, mu, sqrt(var))

# Cumulative distribution function
cum_prob <- pnorm(x, mu, sqrt(var))

# Quantile function (takes a probability p, not a value x)
quantile_val <- qnorm(p, mu, sqrt(var))

# Random number generation (takes a sample size n, not a value x)
random_values <- rnorm(n, mu, sqrt(var))
```

> [!warning]
> The normal distribution is parameterized with the standard deviation $\sigma$, not the variance $\sigma^2$, in R!
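The bivariate case above can be simulated in base R without any extra packages: if $Z_1, Z_2$ are independent standard normals, then $X = Z_1$ and $Y = \rho Z_1 + \sqrt{1-\rho^2}\,Z_2$ have the standard bivariate normal distribution with correlation $\rho$. A minimal sketch with an arbitrary $\rho = 0.6$:

```r
set.seed(42)  # arbitrary seed for reproducibility
rho <- 0.6    # arbitrary example correlation
n <- 1e5

z1 <- rnorm(n)
z2 <- rnorm(n)

x <- z1
y <- rho * z1 + sqrt(1 - rho^2) * z2  # Cor(X, Y) = rho by construction

# Both marginals are standard normal, and the empirical
# correlation should be close to rho
emp_rho <- cor(x, y)
```

This construction is a two-dimensional special case of sampling a multivariate normal via a Cholesky factor of the covariance matrix.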