PRSD Studio Documentation development version 2.2.3 (29-July-2010)

kb13: How to find samples with a specific type of error in a confusion matrix?

Keywords: confusion matrices

Problem: Find out which samples suffer from a specific type of error, that is, which samples fall into a specific cell of a confusion matrix.

Solution: Use the sdconfmatind function to find the indices of the samples in a specific cell of a confusion matrix.

Let us assume a two-class dataset, containing the apple and banana classes of the fruit set, split into a training set and a test set:

>> load fruit; a=a(:,:,[1 2])

'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)

>> [tr,ts]=randsubset(a,0.5)
Banana Set, 50 by 2 dataset with 2 classes: [25  25]
Banana Set, 50 by 2 dataset with 2 classes: [25  25]
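
The idea of such a random split can be sketched in plain MATLAB. This is only an illustration, not the randsubset implementation: it assumes the split is drawn separately within each class, which the equal class counts in the output above suggest. The label vector lab and the names frac, tr_ind and ts_ind are hypothetical.

```matlab
% Hypothetical labels: 4 samples of class 1 and 4 samples of class 2.
lab  = [1 1 1 1 2 2 2 2]';
frac = 0.5;                          % fraction of each class used for training

tr_ind  = [];
classes = unique(lab);
for c = classes'                     % loop over the class indices
    idx = find(lab == c);            % samples of the current class
    idx = idx(randperm(numel(idx))); % shuffle them
    n   = round(frac * numel(idx));  % number of training samples in this class
    tr_ind = [tr_ind; idx(1:n)];
end

% All remaining samples form the test set.
ts_ind = setdiff((1:numel(lab))', tr_ind);
```

Because the selection is done per class, both subsets keep the class proportions of the original data.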

We train a Gaussian model, apply the model to the test set, and obtain the classifier decisions (dec) at the default operating point.

>> p=sdgauss(tr);
>> pd=p*sddecide;
>> dec=ts*pd;
>> +dec(1:10)

apple 
apple 
apple 
apple 
apple 
apple 
apple 
banana
apple 
apple 

The confusion matrix compares the ground-truth labels, stored in the test dataset ts, to the decisions dec:

>> sdconfmat(ts.lab,dec)

ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     |    22      3   |    25
 banana    |     0     25   |    25
-------------------------------------
 Totals    |    22     28   |    50
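
To make the meaning of each cell concrete, the counting behind a confusion matrix can be sketched in plain MATLAB. This is a minimal illustration on made-up data, not the sdconfmat implementation: the cell arrays true_lab and decision, and the variable cm, are hypothetical.

```matlab
% Hypothetical toy data: true labels and decisions as cell arrays of strings.
true_lab = {'apple'; 'apple'; 'apple'; 'banana'; 'banana'; 'apple'};
decision = {'apple'; 'banana'; 'apple'; 'banana'; 'banana'; 'banana'};
classes  = {'apple', 'banana'};

% cm(i,j) counts the samples whose true class is classes{i}
% and whose decision is classes{j}.
cm = zeros(numel(classes));
for i = 1:numel(classes)
    for j = 1:numel(classes)
        cm(i,j) = sum(strcmp(true_lab, classes{i}) & strcmp(decision, classes{j}));
    end
end
disp(cm)
```

Rows correspond to true labels and columns to decisions, as in the table above, so the off-diagonal cells hold the errors.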

We would now like to find out which 3 apple samples are misclassified as banana by our classifier. We call the sdconfmatind function with the ground-truth labels, the decisions, and the true and estimated classes that define the confusion matrix cell (here 'apple' and 'banana'):

>> ind=sdconfmatind(ts.lab,dec,'apple','banana')

ind =

     8
    17
    23

>> +dec(ind)

ans =

banana
banana
banana
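
The selection of samples in one confusion matrix cell can also be sketched with logical indexing in plain MATLAB. This is not the sdconfmatind implementation, only the same idea on made-up data: the cell arrays true_lab and decision are hypothetical.

```matlab
% Hypothetical toy data: true labels and decisions as cell arrays of strings.
true_lab = {'apple'; 'apple'; 'apple'; 'banana'; 'banana'; 'apple'};
decision = {'apple'; 'banana'; 'apple'; 'banana'; 'banana'; 'banana'};

% Indices of samples that are truly 'apple' but were decided as 'banana'.
ind = find(strcmp(true_lab, 'apple') & strcmp(decision, 'banana'));

% All selected samples carry the 'banana' decision and the 'apple' label.
disp(decision(ind))
disp(true_lab(ind))
```

The returned indices refer to positions in the test set, so they can be used to look up the corresponding samples for closer inspection.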