PRSD Studio Documentation development version 2.2.3 (29-July-2010)

Chapter 11: From models to classifiers

Table of contents

PRSD Studio provides several tools for training detectors, multi-class classifiers (discriminants) and hierarchical classifiers.

11.1. Detectors ↩

A detector is a classifier that focuses only at one class of interest, the target class. Detectors may be constructed using the sddetector command. It takes a data set, the target class and an untrained model as parameters and returns the pipeline object providing decisions.

pd = sddetector( data, target_class, model )

In the example below a Gaussian detector is constructed for the class 'apple' of the fruit data:

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> pd=sddetector(a,'apple',sdgauss)
 1: apple  -> apple    
 2: banana -> non-apple
 3: stone  -> non-apple
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x1  one class, 1 component (sdp_normal)
 2  Decision                1x1  thresholding on apple at op 32 (sdp_decide)

 >> sdscatter(a,pd)

Two approaches may be used to set the operating point (threshold) of sddetector:

If the user specifies an existing class as the target class, the sddetector will use the remaining classes as non-targets and set the operating points by the ROC analysis. Non-existing name of target class results in building a detector for all data using the one-class strategy. The reject parameter must then specify the fraction of samples to be rejected. If used when building detector for an existing class, sddetector simply fixes the threshold based on the target class samples only, skipping the ROC analysis.

11.1.1. Setting the detector operating point via thresholding ↩

The threshold is set by fixing the reject fraction of the samples in the target class. In this examples the detector rejects 10% of the banana class:

>> pd=sddetector(a,'banana',sdgauss,'reject',0.1)
  1: apple  -> non-banana
  2: banana -> banana    
  3: stone  -> non-banana
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x1  one class, 1 component (sdp_normal)
 2  Decision                1x1  thresholding on banana at op 1 (sdp_decide)    
 >> sdscatter(a,pd)

11.1.2. Detector for the all data ↩

A detector can be build for the complete data set. In the example below the target class is named 'all'. Since this class name is not defined in the data set a, all samples are used as target:

>> pd=sddetector(a,'all',sdgauss,'reject',0.1)
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x1  one class, 1 component (sdp_normal)
 2  Decision                1x1  thresholding on all at op 1 (sdp_decide)    
 >> sdscatter(a,pd)

11.1.3. Setting the detector operating point via ROC analysis ↩

sddetector may returns also the ROC object r with a set of alternative operating points. We can then change the operating point directly from the ROC curve plotted with the sddrawroc function.

>> load fruit;
>> [pd,r]=sddetector(a,'apple',sdmixture([],'comp',5,'iter',10))  
  1: apple  -> apple    
  2: banana -> non-apple
  3: stone  -> non-apple
[class 'apple' EM:.......... 5 comp] 
sequential pipeline     2x1 'Mixture of Gaussians+Decision'
 1  Mixture of Gaussians    2x1  one class, 5 components (sdp_normal)
 2  Decision                1x1  thresholding on apple at op 34 (sdp_decide)
ROC (52 thr-based op.points, 3 measures), curop: 34
est: 1:err(apple)=0.05, 2:err(non-apple)=0.00, 3:mean-error=0.03
>> sdscatter(a,pd,'roc',r) 

The scatter plot shows the corresponding boundary of the two-Gaussians model trained for the apple class.

11.1.4. Visualizing detector decisions on the image data ↩

sdimage may visualize decisions of any classifier pipeline trained in the feature space spanned by the image bands. We may also inspect decisions in an image at different operating points.

We save the hand-painted road labels into data2 data set:

>> data2
412160 by 3 sddata, 2 classes: 'unknown'(399848) 'road'(12312) 

Now we may train the road detector. We use a subset of 500 pixels for road sign background classes and train the detector:

>> b=randsubset(data2,500)
1000 by 3 sddata, 2 classes: 'unknown'(500) 'road'(500) 

>> [pd,r]=sddetector(b,'road',sdgauss)
  1: unknown -> non-road
  2: road   -> road
  1: road   -> road
  2: non-road -> non-road
sequential pipeline     3x1 'Gaussian model+Decision'
 1  Gaussian model          3x1  one class, 1 component (sdp_normal)
 2  Decision                1x1  thresholding on road at op 80 (sdp_decide)
ROC (192 thr-based op.points, 3 measures), curop: 80
est: 1:err(road)=0.00, 2:err(non-road)=0.19, 3:mean-error [0.50,0.50]=0.10

We can now visualize detector decisions on another image:

>> im2=imread('roadsign11.bmp');
>> sdimage(im2,pd)
ans =
 3

sdimage may visualize both the decisions and the ROC of the detector:

>> sdimage(im2,pd,'roc',r)
ans =
 1

This allows us to interactively analyze detector decisions at different operating points.

11.2. Discriminants ↩

PRSD Studio brings number of tools for designing two- or multi-class discriminants.

11.2.1. Nearest mean classifier ↩

sdnmean implements the nearest mean classifier. It uses normal model with identical covariance matrix for all classes. Its output is a similarity.

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p=sdnmean(a)
sequential pipeline     2x3 'Nearest mean'
 1  sdp_normal          2x3  3 classes, 3 components

>> sdscatter(a,sddecide(p))

11.2.2. Linear classifier assuming normal densities ↩

sdlinear is a linear discriminant based on assumption of normal densities. Covariance matrix is averaged using apparent class priors.

>> load fruit;
>> a=a(:,:,[1 2])
260 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)  

>> p=sdlinear(a)
sequential pipeline     2x2 'Linear discriminant'
 1  Gauss eq.cov.           2x2  2 classes, 2 components (sdp_normal)
 2  Output normalization    2x2  (sdp_norm)

Confusion matrix estimated on the training set at default operating point:

>> sdconfmat(a.lab,a*sddecide(p),'norm')

ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     | 0.890  0.110   | 1.00
 banana    | 0.150  0.850   | 1.00
-------------------------------------

In order to use specific priors, provide them using the priors option:

>> p=sdlinear(a,'priors',[0.8 0.2])
sequential pipeline     2x2 'Linear discriminant'
 1  Gauss eq.cov.           2x2  2 classes, 2 components (sdp_normal)
 2  Output normalization    2x2  (sdp_norm)

>> sdconfmat(a.lab,a*sddecide(p),'norm')

ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     | 0.950  0.050   | 1.00
 banana    | 0.200  0.800   | 1.00
-------------------------------------

Note that by increasing the apple class prior we lower the apple error. However, the banana error will increase accordingly.

11.2.3. Quadratic classifier assuming normal densities ↩

sdquadratic implements quadratic discriminant based on assumption of normal densities. Specific covariance matrix is estimated for each class.

>> p=sdquadratic(a)
sequential pipeline     2x2 'Quadratic discr.'
 1  Gauss full cov.         2x2  2 classes, 2 components (sdp_normal)
 2  Output normalization    2x2  (sdp_norm)
>> sdscatter(a,sddecide(p))

11.2.4. Gaussian mixture models ↩

sdmixture implements training of Gaussian mixture models using EM algorithm. By default, sdmixture estimates the number of components automatically from the data using the approach proposed in (J. Grim, J. Novovicova, P. Pudil, P. Somol, F.J. Ferri, Initializing Normal Mixtures of Densities, Prof.of ICPR 1998.).

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p=sdmixture(a)
[class 'apple' initialization:....................... 3 clusters  
EM:.............................. 3 comp] 
[class 'banana' initialization:....................... 3 clusters  
EM:.............................. 3 comp] 
[class 'stone' initialization:....................... 1 cluster  
EM:.............................. 1 comp] 
Mixture of Gaussians pipeline 2x3  3 classes, 7 components (sdp_normal)

>> sdscatter(a,p) %  visualize the output on 1st class ('apple')

We can visualize the mixture decisions by adding default operating point using sddecide:

The number of mixture components may be also specified manually using the comp option. If a scalar number is specified, it is used for each class. Alternatively, we may provide the vector with number of components for each class (in the lab.list order):

>> p2=sdmixture(a,'comp',[4 4 1])
[class 'apple' EM:.............................. 4 comp] 
[class 'banana' EM:.............................. 4 comp] 
[class 'stone' EM:.............................. 1 comp] 
Mixture of Gaussians pipeline 2x3  3 classes, 9 components (sdp_normal)

By default, the EM algorithm is terminated after 30 iterations. The iteration count may be changed using the iter option. If iter is set to [], the EM algorithm stops when the likelihood difference becomes lower that delta (by default 1e-4).

The EM algorithm may be initialized using init option by providing a sdp_normal pipeline. In this example, we estimate the mixture starting from a simple two-component Gaussian model. Note that the provided model initializes each per-class EM algorithm. Therefore, we eventually obtain a six-component mixture model.

>> pinit=sdp_normal(sddata([0 0; 1 1]), {eye(2) eye(2)},[0.5 0.5])
Gauss pipeline          2x1  one class, 2 components (sdp_normal)
>> p3=sdmixture(a,'initmodel',pinit)
[class 'apple' EM:.............................. 2 comp] 
[class 'banana' EM:.............................. 2 comp] 
[class 'stone' EM:.............................. 2 comp] 
Mixture of Gaussians pipeline 2x3  3 classes, 6 components (sdp_normal)

11.2.5. k-NN classifier ↩

k-NN (k-th nearest neighbor) is a non-parametric classifier. This means that instead of building a model from training examples it uses them directly as evidence for computing its output on new observations. The stored training examples are called "prototypes".

By default sdknn implements the first nearest neighbor rule (k=1) using all provided training examples as prototypes:

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p=sdknn(a)
1-NN pipeline           2x3  3 classes, 260 prototypes (sdp_1nn) 
>> sdscatter(a,sddecide(p))

Using larger neighborhoods (k>1), the nearest neighbor classifier becomes more robust against noise in the areas of overlap. sdknn implements two approaches:

The kappa method is used by default for any k>1:

>> p1=sdknn(a,'k',10) %  by default 'kappa'
10-NN classifier (dist) pipeline 2x3   (sdp_stack)    
>> sdscatter(a,sddecide(p1))

Note that the decision boundary becomes much more stable between the overlapping classes.

The nearest neighbor computing class fractions is created by setting the method to classfrac:

>> p2=sdknn(a,'k',10,'method','classfrac')
10-NN pipeline          2x3  k=10, 260 prototypes (sdp_knnmc)

Although both methods may look similar at the first sight, different way of computing the per-class output has important practical consequences. The kappa method returns distance while the classfrac the similarity (fraction):

>> getoutput(p1)
ans =
class distance

>> getoutput(p2)
ans =
class similarity

Because classfrac is computing class fractions between k neighbors, it provides a discriminant applicable to two or more classes. This means that it splits the feature space into open sub-spaces. It separates classes but cannot be used to detect one class protecting it from all directions.

The kappa method, on the other hand, computes distance to k-th nearest neighbor for each class separately.

Finally, let us compare the outputs of both k-NN methods on a small test set:

>> ts=sddata(gendatf([3 3 3]))
'Fruit set' 9 by 2 sddata, 3 classes: 'apple'(3) 'banana'(3) 'stone'(3) 

>> out1=ts*p1 %  kappa
'Fruit set' 9 by 2 sddata, 3 classes: 'apple'(3) 'banana'(3) 'stone'(3) 
>> out2=ts*p2 %  classfrac
'Fruit set' 9 by 2 sddata, 3 classes: 'apple'(3) 'banana'(3) 'stone'(3) 

>> [+out1 +out2]

ans =

   1.4888   44.7181  115.8823    0.8462    0.0769    0.0769
   1.9581   26.9135   71.0346    0.8462    0.0769    0.0769
   1.4120   40.8463  109.4723    0.8462    0.0769    0.0769
  48.1386    3.0872    5.4234    0.0769    0.6154    0.3077
  26.9787    6.6342    6.6689    0.0769    0.5385    0.3846
  26.4837    2.0395   14.6985    0.0769    0.6154    0.3077
  40.3134   34.9477    4.5834    0.0769    0.0769    0.8462
  49.8063   20.2838    2.3588    0.0769    0.0769    0.8462
  23.6764   35.0027   12.8724    0.2308    0.0769    0.6923

In the first three columns, we can see distances returned by kappa k-NN. In the last three columns, we get the fractions of classfrac. Note that the later ones attain only handful of fixed values (fractions). This results in sparse ROC plots for the classfrac classifier.

11.2.5.1. Prototype selection ↩

By default sdknn uses all training examples as prototypes. It may, however, also select a smaller subset. Two prototype selection strategies are used:

In this example, we randomly select 40 prototypes per class:

>> p=sdknn(a,'k',5,'proto',40)
5-NN classifier (dist) pipeline 2x3   (sdp_stack)

Number of prototypes may be adjusted per class. Simpler classes may be described by smaller prototype sets. This increases execution speed of the k-NN classifier:

>> p=sdknn(a,'k',5,'proto',[40 40 10])
5-NN classifier (dist) pipeline 2x3   (sdp_stack)

To select prototypes using kcentres, use protosel option:

>> p=sdknn(a,'k',5,'proto',40,'protosel','kcentres')
5-NN classifier (dist) pipeline 2x3   (sdp_stack)

k-centres algorithm computes dissimilarity matrix. For large data sets, sdknn uses only randomly drawn 2000 examples to compute k-centres.

11.2.5.2. Using large data sets ↩

sdknn can handle large data sets with tens of thousands of samples. Here we train 10-NN classifier on 100 000 samples and test its execution speed on 1000 test objects:

>> a=sddata(gendatf(100000))
'Fruit set' 100000 by 2 sddata, 3 classes: 'apple'(33333) 'banana'(33333) 'stone'(33334)

>> p=sdknn(a,'k',10)
10-NN classifier (dist) pipeline 2x3   (sdp_stack)

>> ts=sddata(gendatf(1000))
'Fruit set' 100000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334)

>> tic; out=ts*p; toc
Elapsed time is 1.338342 seconds. % using 2.33 GHz laptop

11.2.6. Parzen classifier ↩

Parzen classifier estimates probability density for each class using a non-parametric approach based on stored training examples. When computing output for a new observation, the contribution of each training example is integrated. The contribution is modeled by a kernel function and is influenced by the smoothing parameter (kernel width).

By default, sdparzen trains a Parzen classifier with Laplace kernel function which is less computationally demanding than frequently-adopted Gaussian kernel. Smoothing parameter is optimized using EM algorithm optimizing cross-validated log-likelihood.

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

>> p=sdparzen(a)
.....
Parzen pipeline         2x3  3 classes, 260 prototypes (sdp_parzen)    
>> sdscatter(a,sddecide(p))

The default Parzen classifier uses scalar smoothing i.e. equal kernel width for each feature. Smoothing parameter is stored in h field of parzen pipeline:

>> p{1}

ans = 

kernel: 'laplace'
     h: 0.6482
 proto: [260x2 double]
 prior: [0.3846 0.3846 0.2308]

Smoothing may be fixed manually. Here we use very small kernel with:

>> p=sdparzen(a,'h',0.04)
Parzen pipeline         2x3  3 classes, 260 prototypes (sdp_parzen)    
>> sdscatter(a,sddecide(p))

Note that the decision boundary becomes very complicated emphasizing very local changes of the class distributions.

Smoothing parameter may be also estimated for each dimension, specifying h as vector:

>> p=sdparzen(a,'h','vector')
.........
Parzen pipeline         2x3  3 classes, 260 prototypes (sdp_parzen)

>> p{1}

ans = 

kernel: 'laplace'
     h: [0.5782 0.7266]
 proto: [260x2 double]
 prior: [0.3846 0.3846 0.2308]

Vector smoothing requires extra multiplication for each dimension of each training sample. Alternative strategy is to scale the data to unit variance so that scalar smoothing is sufficient.

11.2.7. Neural network ↩

Feed-forward neural network may be trained using sdneural function. By default, 10 hidden units is used and the optimization runs for 1000 iterations (epochs):

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p=sdneural(a)
epochs (*100):..........
Neural network pipeline 2x3   (sdp_neural)

>> sdscatter(a,sddecide(p))

The provided data set is split into training and validation subsets (75%/25% by default). The training subset is used in optimization and the validation subset to estimate the generalization error. The validation fraction may be changed using valfrac option. Eventually, sdneural returns the network with lowest mean square error (MSE) on the validation set. Thanks to this approach the sdneural does not overfit training data when trained for a large number of epochs.

11.2.8. Naive Bayes classifier ↩

Naive Bayes classifier is implemented by the sdnbayes function. For each feature, it estimates a class-conditional distribution using a histogram. Assuming independence of features, the per-class output is computed as a product of per-feature class conditional densities.

sdnbayes estimates the number of histogram bins from the data using non-parametric density estimation approach.

>> a=sddata(gendatf(3000))
3000 by 2 sddata, 3 classes: 'apple'(1000) 'banana'(1000) 'stone'(1000)     
>> p=sdnbayes(a)
 class 'apple':.. class 'banana':.. class 'stone':..
Naive Bayes pipeline    2x3   (sdp_nbayes) 
>> sdscatter(a,sddecide(p))

The number of histogram bins may be fixed manually using the bins option:

>> p=sdnbayes(a,'bins',20)
  class 'apple': class 'banana': class 'stone':
Naive Bayes pipeline    2x3   (sdp_nbayes)

11.2.9. Decision tree classifier ↩

sdtree implements a decision tree algorithm training in a feature per feature manner. It uses validation set to find the most generalizing tree.

>> a=sddata(gendatl(5000))
Lithuanian Classes, 10000 by 2 sddata, 2 classes: '1'(5000) '2'(5000)
>> p=sdtree(a)
Decision tree pipeline  2x2   (sdp_tree)
>> pd=sddecide(p)  %  we add the default operating point
sequential pipeline     2x1 'Decision tree+Decision'
 1  Decision tree           2x2  (sdp_tree)
 2  Decision                2x1  weighting, 2 classes, 1 ops at op 1 (sdp_decide)

Let's visualize the decisions at default operating point on a subset of the training set:

>> sdscatter(gendat(a,500),pd)

We may estimate the mean test error using sdtest:

>> b=sddata(gendatl(5000))
'Lithuanian Classes' 10000 by 2 sddata, 2 classes: '1'(5000) '2'(5000) 
>> sdtest(b,pd)

ans =

0.0346

Execution speed on 100 000 samples is:

>> data=rand(100000,2);
>> tic; dec=data.*pd; toc
Elapsed time is 0.014774 seconds.

Given enough training examples, decision tree may yield a high performance classifier which is very fast in execution.

11.2.10. Support Vector Machine ↩

PRSD Studio sdsvc command trains a support vector machine classifier using libSVM library. At the moment, sdsvc supports only RBF kernel. The trained support vector machines are executed through the libPRSD execution library.

By default, sigma and C parameters are selected automatically based on grid search minimizing the mean error on a validation set (25% of the training set).

>> b
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100) 

>> p=sdsvc(b)
....................sigma=1.85511 C=54.6 err=0.000 SVs=15
sequential pipeline     2x1 'Scaling+Support Vector Machine'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x1  (sdp_svc)

sdsvc displays the sigma and C parameters together with the error on validation set and number of support vectors. Keep in mind that the number of support vectors is important indicator of well-trained support vector machine. Large number of SVs may mean that we use wrong sigma or C or that the libsvm optimizer did not find good solution.

Note that sdsvc performs scaling of input data by default. This may be switched off using the 'noscale' option.

The two-class support vector machine yields one soft output which needs to be thresholded in order to make a decision. We may add a default operating point with zero threshold using sddecide:

>> pd=sddecide(p)
sequential pipeline     2x1 'Scaling+Support Vector Machine+Decision'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x1  (sdp_svc)
 3  Decision                1x1  thresholding on apple at op 1 (sdp_decide)

>> sdscatter(b,pd)

11.2.10.1. Grid search for sigma and C parameters ↩

sdsvc allows us to specify sigma and C parameters explicitly:

>> p=sdsvc(b,'sigma',1.5,'C',10)
SVs=21
sequential pipeline     2x1 'Scaling+Support Vector Machine'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x1  (sdp_svc)

We may also specify the vector of sigmas and Cs. Second output of sdsvc is a structure with estimated errors and numbers of support vectors:

>> [p,E]=sdsvc(b,'sigma',0.1:0.1:5,'C',[0.01 0.1 1 3 5 10 20])
..................................................sigma=1.10000 C=20 err=0.020 SVs=21
sequential pipeline     2x1 'Scaling+Support Vector Machine'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x1  (sdp_svc)

E = 

sigmas: [1x50 double]
    Cs: [0.0100 0.1000 1 3 5 10 20]
   err: [50x7 double]
   svs: [50x7 double]

We may visualize the errors and number of suport vectors in a 3D surface plot:

>> figure; surf(E.Cs,E.sigmas,E.err)
>> xlabel('C'); ylabel('sigma'); zlabel('error')
>> figure; surf(E.Cs,E.sigmas,E.svs)
>> xlabel('C'); ylabel('sigma'); zlabel('SVs')

11.2.10.2. Multi-class support vector machines ↩

For multi-class data sets, sdsvc trains one support vector machine per class in one-against-all manner optimizing the parameters specifically for each sub-problem.

>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

>> p=sdsvc(a)
one-against-all: ['apple' ....................sigma=0.90884 C=0.695 err=0.062 SVs=58] 
['banana' ....................sigma=0.98847 C=0.336 err=0.113 SVs=96] 
['stone' ....................sigma=1.37335 C=1.44 err=0.043 SVs=45] 
sequential pipeline     2x3 'Scaling+Support Vector Machine+Output normalization'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x3  (sdp_stack)
 3  Output normalization    3x3  (sdp_norm)

>> pd=sddecide(p)
sequential pipeline     2x1 'Scaling+Support Vector Machine+Output normalization+Decision'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x3  (sdp_stack)
 3  Output normalization    3x3  (sdp_norm)
 4  Decision                3x1  weighting, 3 classes, 1 ops at op 1 (sdp_decide)

>> sdscatter(a,pd)

Grid-search estimated errors in multi-class problems are returned in a cell array with one E structure per class:

>> [p,E]=sdsvc(a,'sigma',0.1:0.1:5,'C',[0.01 0.1 1 3 5 10 20])
one-against-all: ['apple' ..................................................sigma=0.80000 C=1 err=0.050 SVs=51] 
['banana' ..................................................sigma=0.80000 C=0.1 err=0.125 SVs=150] 
['stone' ..................................................sigma=1.10000 C=10 err=0.010 SVs=33] 
sequential pipeline     2x3 'Scaling+Support Vector Machine+Output normalization'
 1  Scaling                 2x2  (sdp_affine)
 2  Support Vector Machine    2x3  (sdp_stack)
 3  Output normalization    3x3  (sdp_norm)

E = 

[1x1 struct]    [1x1 struct]    [1x1 struct]

>> E{2}

ans = 

sigmas: [1x50 double]
    Cs: [0.0100 0.1000 1 3 5 10 20]
   err: [50x7 double]
   svs: [50x7 double]

11.2.10.3. Accessing support vectors ↩

Support vector objects are stored in the proto sddata set inside the trained sdsvc pipeline.

>> b
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100) 

>> p2=sdsvc(b)
....................sigma=0.92990 C=26.4 err=0.000 SVs=19
sequential pipeline     2x1 'standardization+Support Vector Machine'
 1  standardization         2x2  (sdp_affine)
 2  Support Vector Machine    2x1  (sdp_svc)

In case of this two-class SVC, we look for support vectors in the second step of the resulting pipeline (the first step performs data scaling).

>> p2{2}

ans = 

   type: 'rbf'
    par: 0.9299
  proto: [19x2x2 sddata]
weights: [19x1 double]
 offset: -0.0757
   mean: [0 0]

>> p2{2}.proto
19 by 2 sddata, 2 classes: 'apple'(9) 'banana'(10) 

Multi-class SVC contains for each one-against-others problem one SVC model in the stacked pipeline.

>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

>> p=sdsvc(a)
one-against-all: ['apple' ....................sigma=0.97084 C=26.4 err=0.025 SVs=21] 
['banana' ....................sigma=0.93841 C=0.336 err=0.050 SVs=96] 
['stone' ....................sigma=0.94301 C=12.7 err=0.000 SVs=36] 
sequential pipeline     2x3 'standardization+Support Vector Machine+Output normalization'
 1  standardization         2x2  (sdp_affine)
 2  Support Vector Machine    2x3  (sdp_stack)
 3  Output normalization    3x3  (sdp_norm)

>> p{2}
stacked pipeline        2x3 '++'
 1  Support Vector Machine    2x1  (pipeline)
 2  Support Vector Machine    2x1  (pipeline)
 3  Support Vector Machine    2x1  (pipeline)

>> p{2}{1}
Support Vector Machine pipeline 2x1   (sdp_svc)

>> p{2}{1}{1}

ans = 

   type: 'rbf'
    par: 0.9708
  proto: [21x2x2 sddata]
weights: [21x1 double]
 offset: 0.6474
   mean: [0 0]

>> p{2}{1}{1}.proto
21 by 2 sddata, 2 classes: 'apple'(8) 'others'(13) 

In order to access indices of the support vectors in the original data set, use the original property of the proto data set.

>> proto=p{2}{1}{1}.proto
21 by 2 sddata, 2 classes: 'apple'(8) 'others'(13) 

>> proto'
21 by 2 sddata, 2 classes: 'apple'(8) 'others'(13) 
sample props: 'lab'->'class' 'class'(L) 'original'(N)
feature props: 'featlab'->'featname' 'featname'(L)
data props:  'data'(N)

>> proto(1:3).original

ans =

14
21
22

The first three support vectors are:

>> +proto(1:3)

ans =

0.5759   -0.6914
1.0284    1.1254
1.1145    0.1547

Support vectors in the original (unscaled) feature space are:

>> +b( proto(1:3).original )

ans =

0.1362   -3.2968
2.2507    4.8269
2.6530    0.4864

Scaling with the first pipeline step, we receive identical numbers:

>> +b( proto(1:3).original ) * p(1)

ans =

0.5759   -0.6914
1.0284    1.1254
1.1145    0.1547

11.3. Classifier combining ↩

PRSD Studio supports both major approaches to classifier combining, namely fixed and trained combiners. Fixed combiners are rules chosen on the basis of our assumptions on the problem. For example, if we know that classifiers trained on different data representations are independent, we might choose product rule. The second type of classifier combiner is trained. Instead of assuming fusion strategy, we learn it from examples. Trained combiners simply use per-class outputs of base classifiers as new features.

11.3.1. Fixed combiners ↩

In order to build fixed combiner, we need to train the base classifiers, stack them into a single pipeline and add the fixed combiner stem.

>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p1=sdquadratic(a)
sequential pipeline     2x3 'Quadratic discr.'
 1  Gauss full cov.         2x3  3 classes, 3 components (sdp_normal)
 2  Output normalization    3x3  (sdp_norm)
>> p2=sdparzen(a)
....
Parzen pipeline         2x3  3 classes, 260 prototypes (sdp_parzen)

>> P=sdp_stack({p1,p2})
stack pipeline          2x6   (sdp_stack)

Note that the stacked pipeline has two inputs (2D feature space) and 6 outputs corresponding to 3 per-class outputs for the p1 and 3 per-class outputs for p2.

Fixed combiner using mean rule is constructed by supplying also the number of classes, number of classifiers and class names:

>> pcomb=sdp_combine('mean',3,2,a.lab)
sequential pipeline     6x3 ''
 1  sdp_combine         6x3  Fixed combiner

Let us now compose the full pipeline delivering decisions of the fixed combiner at default operating point:

>> Pfull=[P pcomb sddecide(pcomb)]
sequential pipeline     2x1 ''
 1  sdp_stack           2x6  
 2  sdp_combine         6x3  Fixed combiner
 3  sdp_decide          3x1  Weight-based decision (3 classes)
>> sdscatter(a,Pfull)

11.4. Hierarchical classifiers ↩

PRSD Studio provides tools for designing hierarchies of classifiers, based on apriori decision-level rules. Let us, for example, consider a problem separating 'apple', 'bananas' and 'lemons' on the conveyor belt. We want to build a detector separating all fruit examples from outliers on the conveyor belt (rocks, dirt, mice,...). Only the samples classified as "fruit" will pass on to the next stage where different types of fruit get separated.

% preparing simple artificial data set:
>> rand('state',42)
>> data=[gauss(1000,[-2 3],[1 0.4; 0.1 1]); gauss(1000,[0 2],[1 0; 0 1]); gauss(1000,[1 4],[2 0.4; 0.1 2]); gauss(100, [-3 4],[6 0; 0 6])];
>> lab=sdlab({'apple','banana','lemon','outlier'},[1000 1000 1000 100])
sdlab with 3100 entries, 4 groups: 'apple'(1000) 'banana'(1000) 'lemon'(1000) 'outlier'(100) 
>> a=sddata(+data,lab)
3100 by 2 sddata, 4 classes: 'apple'(1000) 'banana'(1000) 'lemon'(1000) 'outlier'(100) 

We will split the data into training and testing parts:

>> [tr,ts]=randsubset(a,0.5)
1550 by 2 sddata, 4 classes: 'apple'(500) 'banana'(500) 'lemon'(500) 'outlier'(50) 
1550 by 2 sddata, 4 classes: 'apple'(500) 'banana'(500) 'lemon'(500) 'outlier'(50) 

To train the detector, we need to re-label the data so it reflects our fruit/outlier problem:

>> tr_det=sdrelab(tr,{'~outlier','fruit'})
1: apple   -> fruit  
2: banana  -> fruit  
3: lemon   -> fruit  
4: outlier -> outlier
1550 by 2 sddata, 2 classes: 'outlier'(50) 'fruit'(1500) 

Now, we train a detector using a Gaussian mixture with three components. We fix the operating point using the reject option rejecting 1% of fruit samples:

>> pd=sddetector(tr_det,'fruit',sdmixture([],'n',3,'iter',20),'reject',0.01)    
  1: outlier -> non-fruit
  2: fruit  -> fruit    
[class 'fruit' EM:.................... 3 comp] 
sequential pipeline     2x1 'Mixture of Gaussians+Decision'
 1  Mixture of Gaussians    2x1  one class, 3 components (sdp_normal)
 2  Decision                1x1  thresholding on fruit at op 1 (sdp_decide)

Our detector may be now tested. We will print the confusion matrix:

>> sdconfmat(ts.lab,ts*pd)

ans =

 True      | Decisions
 Labels    | fruit  non-fr  | Totals
-------------------------------------
 apple     |   494      6   |   500
 banana    |   494      6   |   500
 lemon     |   489     11   |   500
 outlier   |    25     25   |    50
-------------------------------------
 Totals    |  1502     48   |  1550

In the second step, we want to train the fruit discriminant:

>> tr_clf=tr(:,:,{'apple','banana','lemon'})
1500 by 2 sddata, 3 classes: 'apple'(500) 'banana'(500) 'lemon'(500) 

>> p=sdgauss(tr_clf)
Gaussian model pipeline 2x3  3 classes, 3 components (sdp_normal)

We use a default operating point in this example:

>> pclf=sddecide(p)
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x3  3 classes, 3 components (sdp_normal)
 2  Decision                3x1  weighting, 3 classes, 1 ops at op 

With both stages trained, we can construct the classifier cascade using the sdcascade command. It takes the detector pipeline, the name of the decision to "pass through" and the discriminant pipeline:

>> pc=sdcascade(pd,'fruit',pclf)
2-stage cascade pipeline 2x1   (sdp_cascade)

Let's visualize the decisions on the test set and the confusion matrix of the two-stage system:

>> sdscatter(ts,pc)
>> sdconfmat(ts.lab,ts*pc)

ans =

 True      | Decisions
 Labels    | non-fr  apple banana  lemon  | Totals
---------------------------------------------------
 apple     |     2    425     38     35   |   500
 banana    |     5     57    375     63   |   500
 lemon     |    20     57     41    382   |   500
 outlier   |    30     16      1      3   |    50
---------------------------------------------------
 Totals    |    57    555    455    483   |  1550