Softmax is useful in multi-class classification problems and is given by
$
P(y = j | x) = \frac{e^{x^T w_j}}{\sum^K_{k=1}e^{x^Tw_k}}
$
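Softmax gives a probability for each of the $K$ classes. The probabilities sum to 1, and the prediction is typically resolved to the single most probable class.
Because the classes are treated as mutually exclusive, softmax suits multi-class rather than multi-label classification. Written in terms of the per-class scores (logits):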
$
P(y=c|x;w) = \frac{e^{z_c}}{\sum_{j=1}^K e^{z_j}}
$
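where $z_c = W_c x = x^T w_c$ is the logit for class $c$, with $W_c$ the row of the weight matrix $W$ for class $c$. This is the same expression as the first formula.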
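
As a rough illustration of the formula, here is a minimal NumPy sketch. The weight matrix `W`, the input `x`, and the function name `softmax` are made-up values and names for this example, not taken from any particular model. Subtracting the maximum logit before exponentiating is a common trick to avoid overflow; it does not change the result, because the shift cancels between numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the max logit for numerical stability."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # same output, avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical example: K = 3 classes, 2 input features.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])  # row c plays the role of W_c
x = np.array([2.0, 1.0])

z = W @ x                            # logits z_c = W_c x, here [2, 1, -3]
p = softmax(z)                       # one probability per class, summing to 1
predicted_class = int(np.argmax(p))  # resolve to the single most probable class
```

Taking the `argmax` of the probabilities is the "resolve to only one class" step; the probabilities themselves can be kept when a confidence estimate is needed.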
Softmax is analogous to the [[sigmoid]] used for binary classification: with $K = 2$ classes it reduces to the sigmoid of the difference between the two logits.
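
To make the connection to the sigmoid concrete, consider the two-class case with logits $z_1$ and $z_2$ (here $\sigma$ denotes the sigmoid function, a symbol introduced for this sketch). Dividing numerator and denominator by $e^{z_1}$ gives
$
P(y=1|x) = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \sigma(z_1 - z_2)
$
so only the difference between the two logits matters, and a single sigmoid output is enough for a two-class problem.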