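Softmax is used in multi-class classification problems, where each input belongs to exactly one of $K$ classes. It is given by $ P(y = j | x) = \frac{e^{x^T w_j}}{\sum^K_{k=1}e^{x^Tw_k}} $ where $w_j$ is the weight vector for class $j$. Softmax gives a probability for each class; the probabilities sum to 1, and the prediction is typically resolved to the single most probable class. Because the classes share one unit of probability mass, softmax suits multi-class problems rather than multi-label ones, where several labels can be true at once. Writing $z_c = x^T w_c$ for the score of class $c$, the same formula reads $ P(y=c|x;w) = \frac{e^{z_c}}{\sum_{j=1}^K e^{z_j}} $ Softmax is analogous to the [[sigmoid]], which is used for binary classification problems; with $K = 2$ it reduces to a sigmoid of the difference between the two scores.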
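
As a rough illustration of the formula above, here is a minimal pure-Python sketch. The weight rows `W`, the input `x`, and the helper name `softmax` are made-up values and names for illustration only, not from any particular library. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is assumed here; it leaves the result unchanged because the factor cancels between numerator and denominator.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of scores z."""
    # Shift by the max score so exp() never overflows; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: K = 3 classes, 2 input features.
W = [[1.0, 0.0],    # w_1
     [0.0, 1.0],    # w_2
     [-1.0, -1.0]]  # w_3
x = [2.0, 1.0]

# z_c = x^T w_c for each class c
z = [sum(w_i * x_i for w_i, x_i in zip(w_c, x)) for w_c in W]

probs = softmax(z)

# Resolve to a single class by taking the most probable one (0-indexed).
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Here the scores are `[2.0, 1.0, -3.0]`, so the first class receives the largest probability and `predicted_class` is `0`.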