PRSD Studio Documentation development version 2.2.3 (29-July-2010)

Chapter 4: Labels

This chapter describes how to handle labels and indexed properties.

4.1. Introduction

A label assigns an object to a specific category. For example, an image of a road sign may be assigned to the class 'no stopping'. In pattern recognition, we usually work with sets of labels corresponding to sets of data samples. In addition to the common class label per data sample, PRSD Studio also uses labels for any indexable meta-data such as patient identifiers or video frames. Labels help us categorize observations and define concepts.

In PRSD Studio, a set of labels is represented by an object of class sdlab. In this manual, we refer to one sdlab object by the plural 'labels'.

The sdlab object is composed of two components, namely the label list and the string vector representation.

The label list is a table containing all information on the concepts represented by the label set. These may be individual classes, patients, or video frames. Each concept has a unique string name. Each entry of the label set represents one object. For example, if a label set describes sample class labels, the user sees one string per sample. Internally, a numerical representation is used to speed up operations.
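The split into a list of unique names and a numerical per-sample representation can be illustrated in plain MATLAB. This is only a sketch using the built-in unique function; it is not the sdlab implementation, and the variable names are chosen for illustration.

```matlab
% Plain-MATLAB sketch (not the sdlab implementation): split per-sample
% names into a list of unique names plus one numeric index per sample.
names = {'banana','banana','apple','apple','banana'};

% unique returns the sorted distinct names and, as its third output,
% the position of each sample's name within that sorted list.
[list, ~, idx] = unique(names);
list = list(:);   % force column orientation
idx  = idx(:);

% The string view is recovered by indexing the list with the indices.
restored = list(idx);
```

Comparisons on idx are numeric, which is cheaper than comparing strings sample by sample.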

4.2. Basic handling of labels

A set of labels may be constructed by providing the class name of each sample:

>> lab=sdlab({'banana','banana','apple','apple','banana'})
sdlab with 5 entries, 2 groups: 'apple'(2) 'banana'(3) 

We have created a label object lab holding information on 5 data samples. The samples are labeled into two classes, 'apple' and 'banana'. Label objects may be created in many different ways; see Section 4.3, Creating labels.

A class name in PRSD Studio is always a string and is unique within the label set.

To view per-sample labels, we may use the getnames method or the unary plus operator:

>> +lab
ans =
banana
banana
apple 
apple 
banana

The classes are described in the list object stored within the label set:

>> lab.list
sdlist (2 entries)
 ind name
   1 apple 
   2 banana

As we can see, the list has two entries, one for each concept described.

In PRSD Studio, classes may always be accessed by name or by index in the list. For example, we may use the find function to obtain the indices of label entries for a specific class:

>> find(lab=='apple')
ans =
 3
 4

>> find(lab==1)
ans =
 3
 4
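For comparison, the same two lookups can be written with MATLAB built-ins on a plain cell array. The sketch below is an illustration under our own variable names, not the code behind the sdlab comparison operators.

```matlab
% Plain-MATLAB sketch: locate samples of one class by name or by list index.
names = {'banana';'banana';'apple';'apple';'banana'};
[list, ~, idx] = unique(names);   % list = {'apple';'banana'}
idx = idx(:);

byName   = find(strcmp(names, 'apple'));    % compare strings
byIndex  = find(idx == 1);                  % compare numeric indices
notApple = find(~strcmp(names, 'apple'));   % complement of the class
```

Both byName and byIndex select samples 3 and 4, because 'apple' is entry 1 of the list.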

4.3. Creating labels

4.3.1. Providing per-sample class names

Label objects may be constructed by providing per-sample string labels in a cell array:

>> t={'apple','banana','apple'};
>> lab=sdlab(t)
sdlab with 3 entries, 2 groups: 'apple'(2) 'banana'(1) 

or in a character array:

>> t=strvcat({'apple','banana','apple'})
t =
apple 
banana
apple 

>> lab=sdlab(t)
sdlab with 3 entries, 2 groups: 'apple'(2) 'banana'(1) 

We may also pass per-sample labels directly to sdlab:

>> lab=sdlab('apple','banana','apple')
sdlab with 3 entries, 2 groups: 'apple'(2) 'banana'(1) 

We created a label set for three observations. The set contains two classes (in general called groups), namely 'apple' and 'banana'. Note that for a small number of classes, the display also shows the number of samples available per group.

When given a numerical vector, sdlab converts the numbers into strings:

>> lab=sdlab([15 10 10 20 15 15])
sdlab with 6 entries, 3 groups: '10'(2) '15'(3) '20'(1) 
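The effect of such a conversion can be reproduced with built-in functions. The sketch below assumes that each number is simply turned into its decimal string; how sdlab performs the conversion internally is not stated here.

```matlab
% Plain-MATLAB sketch: turn numeric labels into string class names.
v = [15 10 10 20 15 15];

% One string per sample, e.g. 15 -> '15'.
names = arrayfun(@num2str, v, 'UniformOutput', false);

% Group the strings and count the samples per group.
[list, ~, idx] = unique(names);
sizes = accumarray(idx(:), 1)';
```

Note that unique sorts the names as strings, so '10' would come before '9' in this sketch.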

4.3.2. Consecutive labeling

Often, we need to create labels for sets of observations grouped by class. For example, we know that the first 3 observations are apples, followed by 2 lemons and 2 bananas. We may create an sdlab object by providing the name and count for each class:

>> lab=sdlab('apple',3,'lemon',2,'banana',2)
sdlab with 7 entries, 3 groups: 'apple'(3) 'lemon'(2) 'banana'(2) 

To display the content of the sdlab object, we may use the unary plus operator or the getnames function:

>> +lab
ans =
apple 
apple 
apple 
lemon 
lemon 
banana
banana

When constructing the consecutive labeling inside Matlab functions, we may directly provide a cell array with names and a vector with class sizes:

>> lab=sdlab({'apple','lemon','banana'},[3 2 2])
sdlab with 7 entries, 3 groups: 'apple'(3) 'lemon'(2) 'banana'(2) 
>> +lab
ans =
apple 
apple 
apple 
lemon 
lemon 
banana
banana
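The expansion of class names and class sizes into per-sample labels may be sketched with a simple loop. This is a plain-MATLAB illustration of the idea under our own variable names, not the sdlab constructor itself.

```matlab
% Plain-MATLAB sketch: expand class names and counts into per-sample labels.
names  = {'apple','lemon','banana'};
counts = [3 2 2];

idx = zeros(sum(counts), 1);   % one class index per sample
pos = 0;                       % number of samples filled so far
for k = 1:numel(counts)
    idx(pos+1 : pos+counts(k)) = k;
    pos = pos + counts(k);
end

labels = names(idx);           % per-sample class names
labels = labels(:);            % force column orientation
```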

4.3.3. One entry per class

Sometimes, we need to create labels for a set of concepts with a single entry per concept. For example, we need to construct labels for 5 clusters. We may provide the base name and the indices to be appended:

>> lab=sdlab('Cluster ',1:5)
sdlab with 5 entries, 5 groups: 
'Cluster 1'(1) 'Cluster 2'(1) 'Cluster 3'(1) 'Cluster 4'(1) 'Cluster 5'(1) 

We may, of course, provide directly the cluster identifiers, if needed:

>> lab=sdlab('Cluster ',[123 152 182])
sdlab with 3 entries, 3 groups: 'Cluster 123'(1) 'Cluster 152'(1) 'Cluster 182'(1) 
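Such names are the base name with the number appended. A minimal plain-MATLAB sketch of this naming scheme, with variable names chosen for illustration, is:

```matlab
% Plain-MATLAB sketch: build one name per identifier from a common prefix.
prefix = 'Cluster ';
ids    = [123 152 182];

% Square-bracket concatenation keeps the trailing space of the prefix.
names = arrayfun(@(k) [prefix num2str(k)], ids, 'UniformOutput', false);
```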

4.3.4. Using label list

The list object sdlist in PRSD Studio describes individual concepts such as classes. Labels may also be created from any sdlist object:

>> ll=sdlist('apple','lemon','banana')
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

For detailed information on creating and using sdlist objects, see Section 4.5, Creating label lists, and Section 4.6, Operations on label lists. The simplest possibility is to construct a label set with one entry per list entry:

>> lab=sdlab(ll)
sdlab with 3 entries, 3 groups: 'apple'(1) 'lemon'(1) 'banana'(1) 

We can also construct labels by supplying the sdlist object and a vector of indices into the list:

>> lab=sdlab(ll,[1 1 3 2 2 2 2])
sdlab with 7 entries, 3 groups: 'apple'(2) 'lemon'(4) 'banana'(1) 

>> +lab
ans =
apple 
apple 
banana
lemon 
lemon 
lemon 
lemon 

Note that sdlab discards list entries that do not appear in the index vector. This illustrates the basic principle that an sdlab object always contains only classes with at least one sample:

>> lab=sdlab(ll,[3 2 2])
sdlab with 3 entries, 2 groups: 'lemon'(2) 'banana'(1) 

Constructing a label set from indices is useful when working with classifier decisions. Decisions may cover only a subset of the trained classes, so the label list of the decisions is reduced automatically.
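The reduction of the list can be pictured as renumbering the indices so that they refer only to the classes actually used. The following plain-MATLAB sketch shows one way to do this with unique and ismember; it is an illustration, not the sdlab code.

```matlab
% Plain-MATLAB sketch: drop unused list entries and renumber the indices.
listNames = {'apple','lemon','banana'};
idx = [3 2 2];                       % indices into the full list

used    = unique(idx);               % list entries that occur at least once
newList = listNames(used);           % reduced list

% Position of each old index within the reduced list.
[~, newIdx] = ismember(idx, used);
```

Here 'apple' is never referenced, so the reduced list holds only 'lemon' and 'banana'.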

4.4. Operations on labels

4.4.1. Accessing label information

Per-sample class names may be accessed using the unary plus operator or the getnames method:

>> lab=sdlab('apple',3,'lemon',2,'banana',2)
sdlab with 7 entries, 3 groups: 'apple'(3) 'lemon'(2) 'banana'(2) 

>> getnames(lab)
ans =
apple 
apple 
apple 
lemon 
lemon 
banana
banana

The number of samples equals the length of the label object:

>> length(lab)
ans =
 7

The number of classes present in the label set may be retrieved as the length of the label list:

>> length(lab.list)
ans =
 3

Per-sample class indices may be accessed using the getindices method or the unary minus operator:

>> getindices(lab)
ans =
     1
     1
     1
     2
     2
     3
     3

>> -lab(1:4)
ans =
     1
     1
     1
     2

A label object contains only entries for classes with at least one sample. Empty classes are removed from the list.

4.4.2. Retrieving class sizes and priors

The sdlab object keeps track of the number of samples in each class. Use the getsizes method to retrieve the vector of class sizes:

>> getsizes(lab)
ans =
 3     2     2

Sizes are presented in the class order defined in the label list:

>> lab.list
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

Related to class sizes are the class prior probabilities accessible by the getpriors method:

>> getpriors(lab)
ans =
0.4286    0.2857    0.2857

A handy shortcut to list the classes, their sizes and priors is the transpose operator:

>> lab'
 ind name               size percentage
   1 apple                 3 (42.9%)
   2 lemon                 2 (28.6%)
   3 banana                2 (28.6%)
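The numbers above are consistent with priors computed as relative class frequencies. Under that assumption, sizes and priors follow directly from the per-sample indices, as in this plain-MATLAB sketch (not the getsizes or getpriors code):

```matlab
% Plain-MATLAB sketch: class sizes and priors from per-sample class indices.
idx = [1 1 1 2 2 3 3]';           % apple x3, lemon x2, banana x2

sizes  = accumarray(idx, 1)';     % number of samples per class
priors = sizes / sum(sizes);      % relative frequency of each class
```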

4.4.3. Searching for samples with specific labels

The find method helps us identify label entries assigned to a specific class. We may use both the equal and the not-equal operators on labels.

>> find(lab=='banana')
ans =
 6
 7

% all samples that are not labeled as 'lemon'
>> find(lab~='lemon')
ans =
 1
 2
 3
 6
 7

In addition to the class name, we may also use the index of the class in the list.

4.4.4. Subsets of labels

Given a list of sample indices, we can get a subset of the labels:

>> lab
sdlab with 7 entries, 3 groups: 'apple'(3) 'lemon'(2) 'banana'(2) 

>> lab(2:5)
sdlab with 4 entries, 2 groups: 'apple'(2) 'lemon'(2) 

>> +lab(2:5)
ans =
apple 
apple 
lemon 
lemon 

4.4.5. Relabeling: Changing class names and defining meta-classes

The sdrelab function allows us to redefine the labeling by changing the class names or defining meta-classes. It accepts a label object and a cell array with relabeling rules. Each rule is composed of a source specifier, which defines the classes to be changed, and a new class name.

For example, let's say we want to handle lemons and bananas together as 'yellow fruit'.

>> lab
sdlab with 7 entries, 3 groups: 'apple'(3) 'lemon'(2) 'banana'(2) 
>> lab.list
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

>> lab2=sdrelab(lab,{[2 3] 'yellow fruit'})
  1: apple  -> apple       
  2: lemon  -> yellow fruit
  3: banana -> yellow fruit
sdlab with 7 entries, 2 groups: 'apple'(3) 'yellow fruit'(4) 

We have used the indices of the 'lemon' and 'banana' classes as the source and 'yellow fruit' as the new class name. Because a class name is always unique, we ended up with a single 'yellow fruit' class with four samples.

Multiple rules may be specified in one sdrelab command. Source specifiers always refer to the original classes of the input label set:

>> lab2=sdrelab(lab,{[2 3] 'yellow fruit' 'apple' 'red fruit'})
  1: apple  -> red fruit   
  2: lemon  -> yellow fruit
  3: banana -> yellow fruit
sdlab with 7 entries, 2 groups: 'red fruit'(3) 'yellow fruit'(4) 

We are often interested in a particular class and need to relabel all remaining classes. To achieve that, we may use the tilde ~ character at the beginning of the source class name. Such a rule is applied to all the other classes:

>> lab2=sdrelab(lab,{'~apple' 'yellow fruit' })
  1: apple  -> apple       
  2: lemon  -> yellow fruit
  3: banana -> yellow fruit
sdlab with 7 entries, 2 groups: 'apple'(3) 'yellow fruit'(4) 

By default, sdrelab displays the translation table. When calling it from inside our own routines, we may want to suppress this extra output by adding the 'nodisplay' option:

>> lab2=sdrelab(lab,{'~apple' 'yellow fruit' },'nodisplay')
sdlab with 7 entries, 2 groups: 'apple'(3) 'yellow fruit'(4) 
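Conceptually, a relabeling rule renames entries of the label list, after which equal names are merged into one class and the per-sample indices are remapped. The plain-MATLAB sketch below mimics the '~apple' rule under these assumptions; it does not show how sdrelab is implemented.

```matlab
% Plain-MATLAB sketch: rename all classes except 'apple' and merge them.
list = {'apple','lemon','banana'};
idx  = [1 1 1 2 2 3 3]';              % per-sample indices into list

% Source classes of the rule: every class whose name is not 'apple'.
src = find(~strcmp(list, 'apple'));

% Rename the source classes in a copy of the list.
renamed = list;
renamed(src) = {'yellow fruit'};

% Equal names collapse into one class; map(k) is the new index of old class k.
[newList, ~, map] = unique(renamed);
map    = map(:);
newIdx = map(idx);
sizes  = accumarray(newIdx, 1)';
```

The merged list has two classes, with three and four samples respectively.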

4.4.6. Concatenating label sets

Labels may be concatenated vertically or horizontally.

Vertical concatenation appends the labels of additional objects:

>> lab1=sdlab('apple',5,'banana',6)
sdlab with 11 entries, 2 groups: 'apple'(5) 'banana'(6) 

>> lab2=sdlab('lemon',2,'apple',10)
sdlab with 12 entries, 2 groups: 'lemon'(2) 'apple'(10) 

>> L=[lab1; lab2]
sdlab with 23 entries, 3 groups: 'apple'(15) 'banana'(6) 'lemon'(2) 

The resulting label set now represents 23 objects of three classes.

Horizontal concatenation appends the class names for the same set of objects. This is useful for constructing unique labels. Let's say we have a data set describing pixels from multiple images. We added an image label to distinguish pixels from different images. We then clustered the pixels in each image and stored the cluster identifier for each pixel. Therefore, the label 'Cluster 1' is assigned to some pixels in every image. To construct a unique cluster label, we use horizontal concatenation:

% image labels for 20 pixels:
>> lab1=sdlab('Image 1',20)
sdlab with 20 entries from 'Image 1'

% clustering result for the same 20 pixels:
>> lab2=sdlab('Cluster 1',8,'Cluster 2',7,'Cluster 3',5)
sdlab with 20 entries, 3 groups: 'Cluster 1'(8) 'Cluster 2'(7) 'Cluster 3'(5) 

>> L=[lab1 lab2]
sdlab with 20 entries, 3 groups: 'Image 1Cluster 1'(8) 'Image 1Cluster 2'(7) 'Image 1Cluster 3'(5) 

We may also provide a string separator between the image name and cluster name during the concatenation:

>> L=[lab1 '-' lab2]
sdlab with 20 entries, 3 groups: 'Image 1-Cluster 1'(8) 'Image 1-Cluster 2'(7) 'Image 1-Cluster 3'(5) 

The label set L now uniquely identifies the cluster in a specific image.
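Both kinds of concatenation have direct analogues on plain cell arrays of names. The sketch below uses only MATLAB built-ins and our own variable names; it illustrates the two ideas and is not the sdlab concatenation code.

```matlab
% Plain-MATLAB sketch of the two concatenations on cell arrays of names.

% Vertical: stack the labels of two sets of objects, then regroup.
a = [repmat({'apple'}, 5, 1); repmat({'banana'}, 6, 1)];
b = [repmat({'lemon'}, 2, 1); repmat({'apple'}, 10, 1)];
stacked = [a; b];
[groups, ~, idx] = unique(stacked);
sizes = accumarray(idx(:), 1)';       % samples per merged group

% Horizontal: join two names of the same objects, with a separator.
img = repmat({'Image 1'}, 20, 1);
clu = [repmat({'Cluster 1'}, 8, 1); repmat({'Cluster 2'}, 7, 1); ...
       repmat({'Cluster 3'}, 5, 1)];
joined = strcat(img, '-', clu);       % one combined name per pixel
```

Vertical stacking changes the number of objects, while horizontal joining keeps it and only makes the names more specific.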

4.5. Creating label lists

The list object sdlist describes a set of concepts. Each concept has a string name.

4.5.1. From list of class names

A list object may be created by providing the class names directly as parameters:

>> ll=sdlist('apple','lemon','banana')
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

or by providing a cell array with names:

>> ll=sdlist({'apple','lemon','banana'})
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

An alternative is to use a character array:

>> ll=sdlist(strvcat({'apple','lemon','banana'}))
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

Note that the user-defined order of classes in the list is preserved.

4.5.2. Using string prefix

An sdlist may also be created from a prefix string and the numbers to be appended. For example, we want to define a list representing five features:

>> sdlist('Feature ',1:5)
sdlist (5 entries)
 ind name
   1 Feature 1
   2 Feature 2
   3 Feature 3
   4 Feature 4
   5 Feature 5

The prefix string may also be omitted altogether:

>> sdlist(1:4)
sdlist (4 entries)
 ind name
   1 1
   2 2
   3 3
   4 4

4.6. Operations on label lists

4.6.1. Accessing list content

List entries may be retrieved using the getnames method or the unary plus operator:

>> ll
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

>> getnames(ll)
ans =
apple 
lemon 
banana

>> +ll
ans =
apple 
lemon 
banana

4.6.2. Converting between names and indices

The sdlist class provides two methods for converting between class names and indices in the list. The name2ind method converts names into list indices, and the ind2name method converts per-sample indices into names:

>> ll=lab.list
sdlist (3 entries)
 ind name
   1 apple 
   2 lemon 
   3 banana

>> name2ind(ll,{'banana','lemon'})
ans =
           3
           2   

>> ind2name(ll,[1 2 2 2 1 3])
ans =
apple 
lemon 
lemon 
lemon 
apple 
banana

If a non-existent name is used, the conversion routine returns the empty matrix []. If an index is out of bounds, an error is raised.
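For readers who want to reproduce the two conversions on a plain cell array, the built-in ismember function and ordinary indexing are sufficient. This sketch is an analogue under our own variable names, not the name2ind or ind2name code, and its handling of unknown names differs, as noted in the comments.

```matlab
% Plain-MATLAB sketch: convert between names and list indices.
list = {'apple','lemon','banana'};

% Names to indices: the second output of ismember is the list position.
[found, ind] = ismember({'banana','lemon'}, list);

% Indices to names: plain indexing into the list.
names = list([1 2 2 2 1 3]);

% Unlike the behaviour described above, ismember reports an unknown
% name with index 0 rather than returning an empty matrix.
[~, missing] = ismember({'cherry'}, list);
```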