Machine Learning Notes 03

Multi-Class Classification

With binary classification, you can only distinguish two outcomes: an instance either belongs to the class (positive) or it does not (negative). For example, classifying images into $images\_with\_dogs$ and not-$images\_with\_dogs$. In this case the dataset would contain dog images and images of anything else.

$$ dataset \in \{ dog\_images, cat\_images, tree\_images, any\_other\_image\} $$

A binary classifier is unable to classify images into $images\_with\_dogs$ and $images\_with\_cats$ at the same time. However, if we constrain our dataset such that

$$ dataset \in \{ dog\_images, cat\_images\} $$

then the negative outcome equates to $cat\_images$. In multi-class classification we have two strategies for classifying items into $k$ classes using binary classifiers:

  • One vs Rest (OVR)
  • One vs One (OVO)

One vs Rest

Here one binary classifier is trained for each class $C_i\quad where\quad i\in \{1..k\}$, yielding $k$ binary classifiers. Each classifier treats $C_i$ as the positive class and all remaining classes as the negative class. If we assign, say, 50 instances per label, this scheme requires $50\times k$ data instances per training set to train one classifier, since the full dataset is used each time.
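The relabeling step above can be sketched in plain Python; the toy dataset, class names, and the helper `one_vs_rest_sets` are illustrative inventions, not part of any library:

```python
# One-vs-Rest sketch: build k binary training sets from one multi-class
# dataset by relabeling. Toy data with 50 instances per class.
classes = ["dog", "cat", "tree"]  # k = 3
dataset = [(c, f"{c}_img_{i}") for c in classes for i in range(50)]

def one_vs_rest_sets(dataset, classes):
    """For each class C_i, relabel the whole dataset: C_i -> positive (True),
    every other class -> negative (False). Returns k binary training sets."""
    return {c: [(label == c, x) for label, x in dataset] for c in classes}

binary_sets = one_vs_rest_sets(dataset, classes)

# One training set per class, and each uses all 50 * k = 150 instances.
assert len(binary_sets) == 3
assert all(len(s) == 150 for s in binary_sets.values())
```

Note that every classifier sees the whole dataset, only with different labels; this is why the per-classifier data requirement grows with $k$.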

One vs One

Here one binary classifier is used to classify two classes (by using a training set containing only those two classes). In order to classify $k$ classes, we would need $k(k-1)/2$ classifiers. Again, with 50 instances per label, the data requirement would be $50\times 2$ data instances per training set to train one classifier, since only the two relevant classes are included.
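The pairing scheme can be sketched as follows; the toy dataset, class names, and the helper `ovo_training_set` are illustrative, not part of any library:

```python
from itertools import combinations

# One-vs-One sketch: one binary classifier per unordered pair of classes.
# Toy data with 50 instances per class.
classes = ["dog", "cat", "tree"]  # k = 3
dataset = [(c, f"{c}_img_{i}") for c in classes for i in range(50)]

# k(k-1)/2 pairwise classifiers, one per unordered pair of classes
pairs = list(combinations(classes, 2))
assert len(pairs) == len(classes) * (len(classes) - 1) // 2  # 3 for k = 3

def ovo_training_set(dataset, a, b):
    """Training set for the (a vs b) classifier: only those two classes."""
    return [(label, x) for label, x in dataset if label in (a, b)]

# Each pairwise training set contains only 50 * 2 = 100 instances.
assert all(len(ovo_training_set(dataset, a, b)) == 100 for a, b in pairs)
```

So OvO needs more classifiers than OvR, but each one trains on a smaller, fixed-size subset of the data.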

Testing One vs One classifier

We have 3 classes, therefore we need $3\times(3-1)/2 = 3$ classifiers. The following table shows an example where a data instance from the validation set was evaluated to have the classifier outputs shown.

         | Classifier 1 | Classifier 2 | Classifier 3 | Votes
Class 1  |      +       |      -       |              |   1
Class 2  |      -       |              |      +       |   1
Class 3  |              |      +       |      -       |   1

At this point, all three classes are equally probable, so we would have to pick one at random. If each classifier associates a probability with its outcome, we can use that probability as a weight applied to the votes. This may give an outcome where one class emerges as the most probable.
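A minimal sketch of probability-weighted voting; the classifier outputs and probability values below are invented for illustration:

```python
# Each OvO classifier reports (predicted_class, probability).
# The probabilities here are made-up example numbers.
outputs = [
    ("class1", 0.55),  # classifier 1: class1 vs class2 -> class1
    ("class3", 0.90),  # classifier 2: class1 vs class3 -> class3
    ("class2", 0.60),  # classifier 3: class2 vs class3 -> class2
]

plain_votes = {}
weighted_votes = {}
for winner, p in outputs:
    plain_votes[winner] = plain_votes.get(winner, 0) + 1
    weighted_votes[winner] = weighted_votes.get(winner, 0.0) + p

# Plain voting is a three-way tie (one vote each) ...
assert set(plain_votes.values()) == {1}
# ... but weighting by probability breaks the tie in favour of class3.
assert max(weighted_votes, key=weighted_votes.get) == "class3"
```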

So far we haven't discussed any specific classifier algorithm. However, some classifiers, such as Decision Trees, handle multi-class classification naturally.