Multilevel classification, Cohen's kappa and Krippendorff's alpha
I was facing an interesting problem last week. Playing with data from The Cancer Genome Atlas (full genetic and clinical data for thousands of patients), I was building a classifier that predicts the type of cancer based on sets of genetic signatures.
In the PANCAN33 subset there are samples for 33 different types of cancer, so the classifier should be able to assign a new sample to one of these 33 classes.
I tried different methods like random forest, SVM, bgmm and a few others, and ended up with a collection of classifiers. How to choose the best one?
We need a method that measures the agreement between classifier predictions and the true labels (the cancer types). For binary classifiers there are many commonly used metrics, like precision, recall or accuracy. But here we have 33 classes, so the confusion matrix has 33×33 cells, a lot of numbers to compare.
Of course there are straightforward solutions, like the fraction of samples for which the classifier guesses the true label correctly. But such easy solutions suffer a lot when the class distribution is unequal (quite common in practice). They may score high even for a dummy classifier that always votes for the most common class, so it is better to avoid them.
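To see how misleading that can be, here is a toy sketch (with made-up data, not the TCGA samples): with a 90/10 class imbalance, the majority-class dummy classifier already scores about 90% accuracy.

set.seed(1)
truth <- sample(c("BRCA", "OV"), 1000, replace = TRUE, prob = c(0.9, 0.1))
dummy <- rep("BRCA", 1000)  # dummy classifier: always vote for the most common class
mean(dummy == truth)        # ~0.9 accuracy, although nothing was learned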
Are there other measures of agreement that we can use?
Actually I used two interesting ones: Cohen's kappa and Krippendorff's alpha. They take into account the distribution of votes for each rater. Moreover, Krippendorff's alpha handles missing data (find more information here).
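For the record, unweighted Cohen's kappa is simply the observed agreement corrected for the agreement expected by chance from each rater's marginal distribution of votes: kappa = (p_o - p_e) / (1 - p_e). A minimal hand-rolled sketch (the cohen_kappa helper below is mine, for illustration only; in practice use kappa2 from the irr package):

cohen_kappa <- function(r1, r2) {
  lev <- union(r1, r2)
  tab <- table(factor(r1, lev), factor(r2, lev)) / length(r1)
  p_o <- sum(diag(tab))                    # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab))  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}
cohen_kappa(dummy, truth)  # ~0 for the dummy classifier above, despite its ~0.9 accuracy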
Both coefficients are widely used by psychometricians (e.g. to assess how well two psychiatrists agree on a diagnosis). Here we use them to estimate the performance of a classifier. Both coefficients are implemented in the irr package.
Below you will find an example application.
library(irr)

kappa2(cbind(predictions, trueLabels))
# Cohen's Kappa for 2 Raters (Weights: unweighted)
#
#  Subjects = 3599
#    Raters = 2
#     Kappa = 0.941
#
#         z = 160
#   p-value = 0

kripp.alpha(rbind(predictions, trueLabels))
# Krippendorff's alpha
#
#  Subjects = 3599
#    Raters = 2
#     alpha = 0.941
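The predictions and trueLabels above come from my TCGA experiment, so they are not reproducible here. To see the missing-data handling of Krippendorff's alpha in action, here is a small self-contained sketch with hypothetical ratings (the rater1/rater2 values are made up):

ratings <- rbind(rater1 = c(1, 2, 3, 3, 2, 1, NA, 1),
                 rater2 = c(1, 2, 3, 2, 2, 1,  3, 1))
kripp.alpha(ratings)  # nominal level by default; the subject with an NA has only
                      # one rating left, so it contributes no coincidence pair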