PRSD Studio Documentation development version 2.2.3 (29-July-2010)

Chapter 7: Models and decisions

This chapter explains how to train classifiers and convert their output into decisions.


7.1. Introduction

Statistical classifiers are trained from labeled examples to discriminate between user-defined classes. When applied to a new observation, a trained classifier returns a crisp decision. In PRSD Studio, we consider a classifier to be composed of two components, as illustrated in the image below. The first component is a statistical model estimating the confidence that a data sample belongs to each of the considered classes. The second is a decision function which converts the estimated confidences into a crisp decision.

The distinction between the statistical model and the decision function is important in practical applications, as it enables fine-tuning of trained models to specific performance requirements. We will discuss this process in the chapter on ROC analysis.

We use the term soft output for the real-valued result of the statistical model. Depending on the type of model, the soft output may take different forms: for a probabilistic classifier, it may be the posterior probability of class membership; for a nearest neighbor classifier, the distance to the nearest neighbor; for a neural network, simply the network output.

7.1.1. Training a statistical model

Let us illustrate the training of a statistical classifier with a simple example. We load an artificial "fruit" data set with two features and three classes: apple, banana and stone.

In this example, we train a linear classifier assuming normal densities using the sdlinear function. It estimates a Gaussian model for each of the classes, assuming that all classes share the same covariance matrix:

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

>> p=sdlinear(a)
sequential pipeline     2x3 'Linear discriminant'
 1  Gauss eq.cov.           2x3  3 classes, 3 components (sdp_normal)
 2  Output normalization    3x3  (sdp_norm)

The result of the training is a pipeline object p. In PRSD Studio, pipelines serve as a basis for training and execution of pre-defined pattern recognition algorithms.

The pipeline p has two inputs corresponding to the two input features in our fruit problem and three outputs representing the three classes.

The pipeline comprises two steps. The first is a Gaussian model and the second is an output normalization ensuring that the pipeline output is an estimate of the posterior probability of class membership.
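The two steps can be sketched outside PRSD Studio. The following Python/NumPy code is a minimal illustration under our own assumptions, not the sdlinear implementation: it estimates one mean per class and a pooled covariance matrix shared by all classes, evaluates the class-conditional Gaussian densities, weights them by class priors taken from the class frequencies, and normalizes each row so that the outputs sum to one. Whether sdlinear and sdp_norm use priors in exactly this way is an assumption, and the function names are hypothetical.

```python
import numpy as np

def train_gauss_eqcov(X, y):
    """Fit one Gaussian per class with a shared (pooled) covariance matrix.

    X: (n, d) feature matrix, y: (n,) integer class labels.
    Returns class means (C, d), pooled covariance (d, d) and class priors (C,).
    """
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled covariance: within-class scatter divided by (n - C)
    scatter = sum((X[y == c] - means[i]).T @ (X[y == c] - means[i])
                  for i, c in enumerate(classes))
    cov = scatter / (len(X) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return means, cov, priors

def gauss_densities(X, means, cov):
    """Step 1: class-conditional Gaussian densities, one column per class."""
    d = X.shape[1]
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    diffs = X[:, None, :] - means[None, :, :]            # (n, C, d)
    maha = np.einsum("ncd,de,nce->nc", diffs, inv, diffs)  # squared Mahalanobis distances
    return norm * np.exp(-0.5 * maha)

def normalize_outputs(dens, priors):
    """Step 2 (assumed): weight by priors and rescale each row to sum to one."""
    joint = dens * priors
    return joint / joint.sum(axis=1, keepdims=True)

# Synthetic three-class problem with two features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(50, 2)) for m in ([0, 0], [4, 0], [0, 4])])
y = np.repeat([0, 1, 2], 50)
means, cov, priors = train_gauss_eqcov(X, y)
post = normalize_outputs(gauss_densities(X, means, cov), priors)
```

Each row of post then plays the role of one row of the soft output out shown below.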

When applied to the data, this pipeline will produce soft outputs of the model.

>> out=a*p
260 by 3 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

Looking inside the output data set out, we find one soft output per class:

>> +out(1:4)
ans =
0.9860    0.0140    0.0000
0.9947    0.0053    0.0000
0.8437    0.1562    0.0001
0.9933    0.0067    0.0000

To return crisp decisions, we need to add a decision step to the pipeline. The simplest way is to call the sddecide function:

>> pd=sddecide(p)
sequential pipeline     2x1 'Gauss eq.cov.+Output normalization+Decision'
 1  Gauss eq.cov.           2x3  3 classes, 3 components (sdp_normal)
 2  Output normalization    3x3  (sdp_norm)
 3  Decision                3x1  weighting, 3 classes, 1 ops at op 1 (sdp_decide)
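A decision step can be thought of as a function mapping the soft outputs to a class name. The Python/NumPy sketch below assumes that the "weighting" decision shown in the display multiplies each soft output by a per-class weight and picks the largest; with equal weights this is the maximum-posterior rule. The function decide and its weights argument are hypothetical and not part of PRSD Studio.

```python
import numpy as np

def decide(soft, names, weights=None):
    """Hypothetical decision step: weight the per-class soft outputs and
    return the name of the class with the largest weighted output."""
    soft = np.asarray(soft, dtype=float)
    w = np.ones(soft.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    idx = np.argmax(soft * w, axis=1)
    return [names[i] for i in idx]

names = ["apple", "banana", "stone"]
# Soft outputs copied from the first four rows of +out(1:4)
soft = [[0.9860, 0.0140, 0.0000],
        [0.9947, 0.0053, 0.0000],
        [0.8437, 0.1562, 0.0001],
        [0.9933, 0.0067, 0.0000]]
print(decide(soft, names))                                    # ['apple', 'apple', 'apple', 'apple']
print(decide([[0.6167, 0.3640, 0.0193]], names))              # ['apple']
print(decide([[0.6167, 0.3640, 0.0193]], names, [1, 2, 1]))   # ['banana']
```

With equal weights, the four rows are all assigned to apple. Doubling the banana weight moves a borderline sample with outputs 0.6167, 0.3640, 0.0193 to banana, which is the kind of fine-tuning the separation of model and decision function makes possible.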

Executed on an input data set, the pipeline pd returns decisions as an sdlab object.

>> dec=a*pd
sdlab with 260 entries, 3 groups: 'apple'(105) 'banana'(99) 'stone'(56) 

>> +dec(1:7)
ans =
apple 
apple 
apple 
apple 
banana
banana
apple 

Pipelines are the PRSD Studio vehicle for delivering trained classifiers to external applications. They are always executed through the C library libPRSD, which is linked into Matlab as a MEX library.

Pipelines are fully self-contained and do not need any further information for their execution. We may, therefore, apply a pipeline directly to a matrix of measurements:

>> [0 0; -8 -5; -3 5; 5 10]*p
ans =
0.6167    0.3640    0.0193
0.0070    0.9849    0.0081
0.0039    0.1592    0.8369
0.1438    0.0245    0.8317

>> dec=double(a(1:7))*pd
dec =
  -101
  -101
  -101
  -101
  -102
  -102
  -101

7.2. Working with pipelines

7.2.1. Pipeline output types

Pipelines may provide different types of output. We may query the output type using the getoutput function:

>> getoutput(p)
ans =
class similarity

>> getoutput(pd)
ans =
decision

The complete list of pipeline outputs includes:

Distinguishing between similarity and distance outputs allows the polarity of decision functions to be set automatically.
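The role of polarity can be illustrated with a small Python/NumPy sketch: for similarity outputs such as posteriors the best class has the largest value, while for distance outputs such as the nearest neighbor distance it has the smallest. The function and the output type strings are hypothetical; PRSD Studio handles this inside its own decision step.

```python
import numpy as np

def decide_with_polarity(outputs, output_type):
    """Pick a class index per row; the rule depends on the output type.

    'similarity' (e.g. posteriors): larger means more likely, so use argmax.
    'distance' (e.g. nearest neighbor distance): smaller means more likely, so use argmin.
    """
    outputs = np.asarray(outputs, dtype=float)
    if output_type == "similarity":
        return np.argmax(outputs, axis=1)
    if output_type == "distance":
        return np.argmin(outputs, axis=1)
    raise ValueError(f"unknown output type: {output_type}")

# Hypothetical nearest neighbor distances from one sample to each of three classes
dist = np.array([[0.4, 2.5, 3.1]])
print(decide_with_polarity(dist, "distance"))     # [0]
print(decide_with_polarity(-dist, "similarity"))  # [0]
print(decide_with_polarity(dist, "similarity"))   # [2]  (wrong polarity, wrong class)
```

The last line shows what goes wrong if a distance output is treated as a similarity: the farthest class is chosen.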

7.2.2. Pipeline labels

When a pipeline that does not return decisions is applied to a data set, it sets the feature names of the resulting data set object. This helps us interpret the pipeline output.

For example, the labels of the pipeline p trained above are:

>> getlab(p)
sdlab with 3 entries, 3 groups: 'apple'(1) 'banana'(1) 'stone'(1) 

>> +getlab(p)
ans =
apple 
banana
stone 

Alternatively, we may just access the pipeline lab field:

>> p.lab
sdlab with 3 entries, 3 groups: 'apple'(1) 'banana'(1) 'stone'(1) 

For pipelines returning decisions, the lab field is empty:

>> pd.lab
ans =
 []

7.2.3. Pipeline list

Pipelines returning decisions provide detailed information about the decisions they can make in the list field:

>> pd.list
sdlist (3 entries)
 ind name
   1 apple 
   2 banana
   3 stone 

For pipelines not returning decisions, the list field is empty:

>> p.list
ans =
 []

7.2.4. Executing the trained model on new data

The pipeline p may be executed on a 2D feature vector or matrix with two columns:

>> [0 0]*p
ans =
0.6167    0.3640    0.0193

Note that the pipeline outputs are posteriors and thus sum to one. Each output corresponds to one of the classes; the labels assigned to the outputs are returned by the getlab method or the lab field, as described in Section 7.2.2.

Pipelines may be executed on matrices with rows corresponding to samples and columns to features:

>> [0 0; -8 -5; -3 5; 5 10]*p
ans =
0.6167    0.3640    0.0193
0.0070    0.9849    0.0081
0.0039    0.1592    0.8369
0.1438    0.0245    0.8317

7.2.5. Visualizing model outputs using sdscatter

In order to visualize the pipeline output, we may pass it together with the data to the sdscatter function:

>> p=sdgauss(a)
Gaussian model pipeline 2x3  3 classes, 3 components (sdp_normal)
>> sdscatter(a,p)
Warning: rendering only the default first output feature. Use 'out' option to visualize other outputs.

The scatter plot now contains a backdrop image showing the pipeline output computed in a grid over our 2D feature space. We can see the shape of the Gaussian probability density function estimated for the first class in the problem.

To visualize the soft output for the second class, use the out option:

>> sdscatter(a,p,'out',2)

Note that the visualization of model output is supported only for 2D feature spaces.
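The backdrop image can be understood as the model output evaluated on a regular grid over the 2D feature space. The Python/NumPy sketch below assumes a single Gaussian model and a grid extent and resolution of our own choosing; how sdscatter picks the grid is not described here. The resulting matrix could be shown as an image with any plotting library.

```python
import numpy as np

def gauss_pdf_2d(P, mean, cov):
    """2D Gaussian density evaluated at the rows of P."""
    inv = np.linalg.inv(cov)
    diff = P - mean
    maha = np.einsum("nd,de,ne->n", diff, inv, diff)
    return np.exp(-0.5 * maha) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

def output_on_grid(model, xlim, ylim, steps=100):
    """Evaluate a model with 2D input on a regular grid (one output only).

    model: function mapping an (n, 2) matrix to an (n,) vector.
    Returns grid coordinates xs, ys and an image of shape (len(ys), len(xs)).
    """
    xs = np.linspace(*xlim, steps)
    ys = np.linspace(*ylim, steps)
    gx, gy = np.meshgrid(xs, ys)                 # gx[i, j] = xs[j], gy[i, j] = ys[i]
    points = np.column_stack([gx.ravel(), gy.ravel()])
    image = model(points).reshape(gx.shape)
    return xs, ys, image

# Output of a hypothetical single-class Gaussian model on a 101 x 101 grid
mean, cov = np.array([1.0, -2.0]), np.eye(2)
xs, ys, img = output_on_grid(lambda P: gauss_pdf_2d(P, mean, cov), (-5, 5), (-5, 5), 101)
```

Selecting a different output, as the out option does, corresponds to passing a model function that returns another column of the soft output.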

7.3. Pipeline objects

The concept of a pipeline is fundamental both for the design of classifiers in PRSD Studio and for their deployment in custom applications.

Pipelines represent pattern recognition algorithms. They may be trained using built-in routines such as sdquadratic or sdmixture. They may also be created by converting algorithms trained elsewhere, for example in PRTools or with LIBSVM. Pipelines allow composition of complex chains of algorithms such as sequences, classifier combiners or even hierarchical classifiers.

Pipelines enable a fast transition from algorithm design in Matlab to production deployment. This is because pipeline execution is always performed by the libPRSD library written in C. Under Matlab, libPRSD is available through the MEX interface; outside Matlab, it is available as a DLL without any dependency on Matlab or external libraries. The benefit of this solution is that identical execution code is used during algorithm design, testing and in production.

7.3.1. Training a pipeline

We will illustrate pipeline training on the example of a general Gaussian mixture model implemented by sdmixture.

>> a=sddata(gendatf(1000))
1000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334)    

Used without additional arguments, sdmixture optimizes the number of Gaussian components per class:

>> p=sdmixture(a)
[class 'apple' initialization:......................... 4 clusters  EM:.............................. 4 comp] 
[class 'banana' initialization:......................... 2 clusters  EM:.............................. 2 comp] 
[class 'stone' initialization:......................... 1 cluster  EM:.............................. 1 comp] 
Mixture of Gaussians pipeline 2x3  3 classes, 7 components (sdp_normal)

We may also specify the number of components per class:

>> p=sdmixture(a,'comp',5)
[class 'apple' EM:.............................. 5 comp] 
[class 'banana' EM:.............................. 5 comp] 
[class 'stone' EM:.............................. 5 comp] 
Mixture of Gaussians pipeline 2x3  3 classes, 15 components (sdp_normal)

>> sdscatter(a,p)

The training is implemented using the Expectation-Maximization (EM) algorithm, which maximizes the model likelihood. By default, it stops when the change in likelihood falls below a specific limit. We can also stop it after a given number of iterations. Note that the number of components may also be given as a vector with one entry per class:

>> p=sdmixture(a,'comp',[5 5 1],'iter',10)
[class 'apple' EM:.......... 5 comp] 
[class 'banana' EM:.......... 5 comp] 
[class 'stone' EM:.......... 1 comp] 
Mixture of Gaussians pipeline 2x3  3 classes, 11 components (sdp_normal)
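EM training with its two stopping rules can be sketched for the mixture of a single class. The Python/NumPy code below is a minimal illustration, not the sdmixture implementation: it initializes the component means with randomly chosen samples (sdmixture reports its own cluster-based initialization), uses full covariance matrices with a small regularization term reg, and stops when the log-likelihood improves by less than tol or after max_iter iterations. The names em_gmm, tol, max_iter and reg are our own. A classifier would fit one such mixture per class.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of a multivariate Gaussian at the rows of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("nd,de,ne->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_gmm(X, k, tol=1e-6, max_iter=100, reg=1e-6, seed=0):
    """Fit a k-component Gaussian mixture to the samples of one class with EM.

    Stops when the log-likelihood improves by less than tol or after max_iter
    iterations. Returns weights, means, covariances and the log-likelihood history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]          # init: random samples
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    history = []
    for _ in range(max_iter):
        # E-step: responsibilities, computed in the log domain for stability
        logp = np.column_stack([np.log(weights[j]) + log_gauss(X, means[j], covs[j])
                                for j in range(k)])
        lse = np.logaddexp.reduce(logp, axis=1)
        resp = np.exp(logp - lse[:, None])
        history.append(lse.sum())                            # log-likelihood of current model
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: re-estimate weights, means and covariances
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs, history

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, size=(200, 2)),
               rng.normal([3, 3], 0.5, size=(200, 2))])
w, m, c, hist = em_gmm(X, k=2, max_iter=10)   # max_iter plays the role of 'iter',10
```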

Pipelines may be combined. The example below uses a different data set a, with 1024 features and 17 classes, and constructs a trained pipeline composed of two steps, namely a PCA followed by a quadratic classifier.

>> a
381 by 1024 sddata, 17 classes: [31  28  24  33  19  21  57  26  21   9  13  15  14   1  14  29  26]
>> p1=sdpca(a,6)  %  PCA projection on the first 6 eigenvectors                    
PCA pipeline            1024x6  75% of variance (sdp_affine)     
>> p2=sdquadratic(a*p1)               
sequential pipeline     6x17 'Quadratic discr.'
 1  Gauss full cov.         6x17 17 classes, 17 components (sdp_normal)
 2  Output normalization   17x17 (sdp_norm)
>> p=p1*p2
sequential pipeline     1024x17 'PCA+Quadratic discr.'
 1  PCA                  1024x6  75% of variance (sdp_affine)
 2  Gauss full cov.         6x17 17 classes, 17 components (sdp_normal)
 3  Output normalization   17x17 (sdp_norm)
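Sequential composition can be sketched as a list of steps executed in order. In the hypothetical Python/NumPy class below, p1 * p2 concatenates the steps of both pipelines, mirroring p=p1*p2 above. To keep the example short, the second step is a standardization standing in for the quadratic classifier; like p2=sdquadratic(a*p1), it is trained on data already mapped by p1. This illustrates the idea only and is not PRSD Studio's pipeline implementation.

```python
import numpy as np

class Pipeline:
    """Hypothetical sequential pipeline: an ordered list of (name, function) steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __mul__(self, other):
        # p1 * p2: run all steps of p1, then all steps of p2
        return Pipeline(self.steps + other.steps)

    def __call__(self, X):
        for _, step in self.steps:
            X = step(X)
        return X

    def names(self):
        return [name for name, _ in self.steps]

def train_pca(X, k):
    """Return a one-step pipeline projecting onto the first k eigenvectors."""
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
    W = evecs[:, np.argsort(evals)[::-1][:k]]   # eigenvectors with the largest eigenvalues
    return Pipeline([("PCA", lambda Z: (Z - mu) @ W)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
p1 = train_pca(X, 6)
Z = p1(X)                                       # like a*p1: training data in the PCA space
mu_z, sd_z = Z.mean(axis=0), Z.std(axis=0)
p2 = Pipeline([("Standardize", lambda V: (V - mu_z) / sd_z)])  # stand-in for sdquadratic
p = p1 * p2
print(p.names())    # ['PCA', 'Standardize']
print(p(X).shape)   # (100, 6)
```

Training the second step on the output of the first is what makes the composed pipeline consistent: at execution time, every new sample passes through the same PCA before reaching the classifier.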

7.3.2. Inspecting pipeline structure and parameters

A pipeline object behaves as a sequence of actions. Each action is accessible by the index shown on the left of the pipeline display. We can access pipeline actions using parentheses (). In our example above, we can inspect the second step of the pipeline p by:

>> p(2)
Gauss full cov. pipeline 6x17  17 classes, 17 components (sdp_normal)

To look inside a pipeline action and access its parameters, we use curly brackets {}:

>> p{2}

ans = 

 mean: [17x6x17 sddata]
  cov: [6x6x17 double]
prior: [1x17 double]

The parameters of pipeline actions are returned in a structure. This structure contains all the information needed to execute the pipeline on new data. For the quadratic classifier, it contains the means and covariances of the classes, and class priors. Accessing the value is straightforward:

>> +p{2}.mean(1:2)   %  gives the mean value for the first 2 classes 

ans =

  888.0094  490.5106   87.9525 -314.7964 -241.2580   54.5189
 -139.8726 -397.0727 -559.8701 -283.7282  -89.5977  234.9844

7.3.3. Constructing pipelines manually

Pipelines may be constructed manually using the functions with the sdp_* prefix. This gives us the freedom to train a model in an arbitrary toolbox or library. The only requirement is that we are able to extract the classifier parameters.

For example, we may construct a Parzen classifier by supplying the smoothing parameter and the matrix of prototypes. We might use this approach to model the two fruit classes (apple and banana) in the data set a and thereby protect them from outliers. A proper smoothing could be selected, for example, by a grid search minimizing the detector error on the existing outliers (the stone class). We will discuss the construction of detectors in detail in Chapter 8.

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p=sdp_parzen('gauss',2,+a(:,:,{'apple','banana'}))
sequential pipeline     2x1 ''
 1  sdp_parzen          2x1  one class, 200 prototypes
>> sdscatter(a,p)
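A Parzen density estimate is the average of kernels centred on the prototypes. The Python/NumPy sketch below assumes an isotropic Gaussian kernel whose standard deviation equals the smoothing parameter h; whether sdp_parzen normalizes its kernel in the same way, and whether its second argument is exactly this h, are assumptions. The grid search for the smoothing mentioned above is not shown.

```python
import numpy as np

def parzen_gauss(X, prototypes, h):
    """Parzen density estimate with an isotropic Gaussian kernel of width h.

    Every prototype contributes a normalized Gaussian centred on it;
    the output is the average of these kernels at each row of X.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d = prototypes.shape[1]
    sq = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)   # (n, m) squared distances
    kernels = np.exp(-0.5 * sq / h**2) / ((2 * np.pi * h**2) ** (d / 2))
    return kernels.mean(axis=1)

prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
dens = parzen_gauss([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]], prototypes, h=1.0)
```

The density is high near the prototypes and falls towards zero far from them, so thresholding it gives a detector that rejects samples unlike the modelled classes.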

Another example of manual pipeline construction is given in the knowledge base article on training a support vector classifier with LIBSVM.