Softmax regression is a generalization of logistic regression to multi-class classification problems in which the classes are mutually exclusive. The handwritten digit dataset used in this tutorial is a good example: a softmax regression classifier trained on the handwritten digits outputs a separate probability for each of the ten digits, and those probabilities add up to 1.

Softmax regression consists of ten linear classifiers of the form:
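
In the notation commonly used for softmax regression, the hypothesis can be written as follows. The symbols are assumptions, since the text here does not fix them: $\theta_j$ is the parameter vector of the linear classifier for class $j$, and there are ten classes.

$$
h_\theta(x) =
\begin{bmatrix}
P(y = 1 \mid x; \theta) \\
P(y = 2 \mid x; \theta) \\
\vdots \\
P(y = 10 \mid x; \theta)
\end{bmatrix}
=
\frac{1}{\sum_{j=1}^{10} e^{\theta_j^\top x}}
\begin{bmatrix}
e^{\theta_1^\top x} \\
e^{\theta_2^\top x} \\
\vdots \\
e^{\theta_{10}^\top x}
\end{bmatrix}
$$

Each entry exponentiates one linear score $\theta_j^\top x$, and the shared denominator normalizes the entries so that they sum to 1.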

The output of this equation is a vector, with one value for each hand-written digit. The first component is the probability that an input image of a digit, x, is the number “1” (belongs to class y = 1).
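
As a rough illustration of that output vector, here is a minimal NumPy sketch. It assumes the standard softmax form (exponentiate ten linear scores, then normalize). The names `softmax_probabilities` and `theta`, the 784-pixel input size, and the random parameters are hypothetical stand-ins, not values from the tutorial.

```python
import numpy as np

def softmax_probabilities(theta, x):
    """Return one probability per class for a single input.

    theta: (k, n) array, one row of parameters per class (hypothetical name).
    x:     (n,) input vector, e.g. a flattened digit image.
    """
    scores = theta @ x                # k linear scores, one per class
    scores = scores - scores.max()    # shift for numerical stability; probabilities are unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Illustrative random data only: 10 classes, 784 "pixels" (assumed image size).
rng = np.random.default_rng(0)
theta = rng.normal(size=(10, 784)) * 0.01
x = rng.random(784)

probs = softmax_probabilities(theta, x)
print(probs.shape, probs.sum())
```

The returned vector has ten strictly positive entries that sum to 1 (up to floating-point rounding), so each entry can be read as the probability of one digit class.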

The tutorial notes don’t show how this classifier is derived, and the relationship between this equation and the original binary logistic regression classifier is certainly not obvious.

One bit of intuition you can derive about it, though…
