Solving problems with unknown distributions numerically

If you are given a pmf/pdf, or when you encounter a new distribution, take these steps to understand it.

1. List the sample space. What are all of the possible outcomes (i.e., the "support" of the distribution)? You may want to note the cardinality of the space; however, if the outcomes are not equally likely or the random variable is continuous, you will not use the cardinality in calculating probabilities.
2. Consider the probabilities of the first few outcomes. You should see a pattern emerge that will orient you to the likely distribution and, if necessary, help you derive the pmf/pdf.
3. If a pmf is provided, apply it to the sample space. Otherwise, derive the general formula for the pmf and then apply it to the sample space.
4. Ensure the pmf sums to 1 (or the pdf integrates to 1); otherwise it is not correct.
5. Calculate the expected value: multiply each possible value by its probability and sum the resulting vector.
6. Calculate the variance: calculate the second moment $E[X^2]$ and subtract the square of the expected value, $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.

## R

You can use R to numerically check a candidate pmf/pdf by confirming that it sums (or integrates) to 1.

For a density on a bounded interval, we'll sum a sequence of narrower and narrower rectangles as a "poor man's" integral. Note that we need to weight the density values by the width of each rectangle. Increase N by 10x to confirm that the sum does approach 1.

```R
# setup for a function bounded on [a, b]
N <- 100
a <- 10
b <- 20
width <- (b - a)/N
# evaluate at the midpoint of each of the N rectangles
x <- seq(a + width/2, b - width/2, length.out = N)
f.x <- function(x){
  # specify pdf here
}
probabilities <- f.x(x)

# confirm the pdf integrates to 1
sum(probabilities * width)
# > 1.000...
# Note that this should approach 1 as N grows
```

For a pmf on the unbounded support $[0, \infty)$, we'll grow the x space towards the limit by increasing N.

```R
# setup for a pmf on the unbounded support 0, 1, 2, ...
N <- 100
x <- seq(0, N, 1)
f.x <- function(x){
  # specify pmf here
}
probabilities <- f.x(x)

# confirm the pmf sums to 1 (outcomes are 1 apart, so no width weighting is needed)
sum(probabilities)
# > 1.000...
# Note that this should approach 1 as N grows
```

Using the values calculated, you can now approximate the expected value, variance and standard deviation. For the unbounded discrete setup, set `width <- 1` first.

```R
# calculate the expected value
exp.X <- sum(x * probabilities * width)

# calculate the variance
var.X <- sum((x - exp.X)^2 * probabilities * width)

# calculate the standard deviation
sd.X <- sqrt(var.X)
```
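
As a worked illustration of the recipe above, here is a minimal sketch that fills in the template with a concrete density. The density $f(x) = 2(x - a)/(b - a)^2$ on $[a, b]$ is a hypothetical choice made only for this example, as is the name `second.moment`. It is convenient because its exact mean, $a + 2(b - a)/3$, and variance, $(b - a)^2/18$, are known, so the numerical results can be compared against them.

```R
# Hypothetical example density (an assumption for illustration only):
# f(x) = 2 * (x - a) / (b - a)^2 on [a, b]
N <- 1000
a <- 10
b <- 20
width <- (b - a) / N

# evaluate the density at the midpoint of each of the N rectangles
x <- seq(a + width / 2, b - width / 2, length.out = N)
f.x <- function(x) 2 * (x - a) / (b - a)^2
probabilities <- f.x(x)

# total area under the density; should be close to 1
total <- sum(probabilities * width)

# expected value; exact answer is a + 2 * (b - a) / 3
exp.X <- sum(x * probabilities * width)

# variance as second moment minus squared mean; exact answer is (b - a)^2 / 18
second.moment <- sum(x^2 * probabilities * width)
var.X <- second.moment - exp.X^2

# standard deviation
sd.X <- sqrt(var.X)
```

If `total` is not close to 1, the candidate density (or its support) is wrong, and the mean and variance computed from it should not be trusted.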