- 11.1. Detectors
- 11.1.1. Setting the detector operating point via thresholding
- 11.1.2. Detector for all data
- 11.1.3. Setting the detector operating point via ROC analysis
- 11.1.4. Visualizing detector decisions on the image data
- 11.2. Discriminants
- 11.2.1. Nearest mean classifier
- 11.2.2. Linear classifier assuming normal densities
- 11.2.3. Quadratic classifier assuming normal densities
- 11.2.4. Gaussian mixture models
- 11.2.5. k-NN classifier
- 11.2.5.1. Prototype selection
- 11.2.5.2. Using large data sets
- 11.2.6. Parzen classifier
- 11.2.7. Neural network
- 11.2.8. Naive Bayes classifier
- 11.2.9. Decision tree classifier
- 11.2.10. Support Vector Machine
- 11.2.10.1. Grid search for sigma and C parameters
- 11.2.10.2. Multi-class support vector machines
- 11.2.10.3. Accessing support vectors
- 11.3. Classifier combining
- 11.3.1. Fixed combiners
- 11.4. Hierarchical classifiers
PRSD Studio provides several tools for training detectors, multi-class classifiers (discriminants) and hierarchical classifiers.
11.1. Detectors
A detector is a classifier that focuses on only one class of interest, the target class.
Detectors may be constructed using the sddetector command. It takes a data set, the name of the target class and an untrained model, and returns a pipeline that provides decisions:
pd = sddetector( data, target_class, model )
In the example below a Gaussian detector is constructed for the class 'apple' of the fruit data:
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> pd=sddetector(a,'apple',sdgauss)
1: apple -> apple
2: banana -> non-apple
3: stone -> non-apple
sequential pipeline 2x1 'Gaussian model+Decision'
1 Gaussian model 2x1 one class, 1 component (sdp_normal)
2 Decision 1x1 thresholding on apple at op 32 (sdp_decide)
>> sdscatter(a,pd)

Two approaches may be used to set the operating point (threshold) of sddetector:
- One-class scenario, where the threshold is fixed by specifying the fraction of target class objects to be rejected.
- Two-class scenario, where the model is trained on a subset of the data and the possible thresholds (operating points) are derived from ROC analysis of the remaining validation set. By default, the operating point is chosen to minimize the mean of the target and non-target errors.
If the target class exists in the data set, sddetector uses the remaining
classes as non-targets and sets the operating point by ROC analysis. If the
target class name does not exist in the data set, sddetector builds a
detector for all data using the one-class strategy; the reject option must
then specify the fraction of samples to be rejected. If reject is also given
when building a detector for an existing class, sddetector simply fixes the
threshold using the target class samples only, skipping the ROC analysis.
11.1.1. Setting the detector operating point via thresholding
The threshold is set by fixing the fraction of target class samples to be rejected. In this example, the detector rejects 10% of the banana class:
>> pd=sddetector(a,'banana',sdgauss,'reject',0.1)
1: apple -> non-banana
2: banana -> banana
3: stone -> non-banana
sequential pipeline 2x1 'Gaussian model+Decision'
1 Gaussian model 2x1 one class, 1 component (sdp_normal)
2 Decision 1x1 thresholding on banana at op 1 (sdp_decide)
>> sdscatter(a,pd)
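The effect of the reject option can be pictured as picking a quantile of the target-class scores. The plain MATLAB sketch below does not call PRSD Studio and is not the sddetector implementation; it assumes that the model output is a similarity (higher means more target-like) and that a sample is accepted when its score reaches the threshold. The scores are made-up numbers.

```matlab
% Hypothetical target-class scores (higher = more target-like)
scores = [0.91 0.85 0.40 0.77 0.66 0.95 0.52 0.88 0.70 0.60];
reject = 0.1;                          % fraction of target samples to reject

s    = sort(scores, 'ascend');         % the lowest scores are rejected first
nrej = floor(reject * numel(s));       % number of target samples to reject
thr  = s(nrej + 1);                    % smallest accepted score becomes the threshold

accepted = scores >= thr;              % decisions on the target samples
fprintf('threshold = %.2f, rejected %d of %d\n', thr, sum(~accepted), numel(scores));
```

Only target samples are needed to fix this threshold, which is why the one-class strategy works even when no non-target examples are available.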

11.1.2. Detector for all data
A detector can be built for the complete data set. In the example below, the target class is named 'all'. Since no class of that name exists in the data set a, all samples are used as targets:
>> pd=sddetector(a,'all',sdgauss,'reject',0.1)
sequential pipeline 2x1 'Gaussian model+Decision'
1 Gaussian model 2x1 one class, 1 component (sdp_normal)
2 Decision 1x1 thresholding on all at op 1 (sdp_decide)
>> sdscatter(a,pd)
11.1.3. Setting the detector operating point via ROC analysis
sddetector may also return an ROC object r holding a set of alternative
operating points. We can then change the operating point directly on the
ROC curve plotted with the sddrawroc function.
>> load fruit;
>> [pd,r]=sddetector(a,'apple',sdmixture([],'comp',5,'iter',10))
1: apple -> apple
2: banana -> non-apple
3: stone -> non-apple
[class 'apple' EM:.......... 5 comp]
sequential pipeline 2x1 'Mixture of Gaussians+Decision'
1 Mixture of Gaussians 2x1 one class, 5 components (sdp_normal)
2 Decision 1x1 thresholding on apple at op 34 (sdp_decide)
ROC (52 thr-based op.points, 3 measures), curop: 34
est: 1:err(apple)=0.05, 2:err(non-apple)=0.00, 3:mean-error=0.03
>> sdscatter(a,pd,'roc',r)
The scatter plot shows the corresponding boundary of the five-component
mixture model trained for the apple class.
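A rough sketch of how an operating point minimizing the mean error can be picked from validation scores: every observed score is tried as a threshold, and the one with the lowest average of the target and non-target errors is kept. This is plain MATLAB with hypothetical scores, assuming higher outputs mean more target-like; it is not the sddetector code, whose ROC object additionally stores all alternative operating points.

```matlab
% Hypothetical validation scores of a detector (higher = more target-like)
tgt = [0.9 0.8 0.75 0.6 0.3];          % target-class validation samples
non = [0.1 0.2 0.35 0.5 0.65];         % non-target validation samples

thr   = unique([tgt non]);             % each observed score is a candidate operating point
err_t = zeros(size(thr));
err_n = zeros(size(thr));
for i = 1:numel(thr)
    err_t(i) = mean(tgt <  thr(i));    % targets rejected at this threshold
    err_n(i) = mean(non >= thr(i));    % non-targets accepted at this threshold
end
mean_err = (err_t + err_n) / 2;        % equal weighting of both error types
[best_err, k] = min(mean_err);         % first threshold with the lowest mean error
fprintf('chosen threshold %.2f, mean error %.2f\n', thr(k), best_err);
```

The vectors err_t and err_n together form the ROC curve; moving along them is what changing the operating point interactively amounts to.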
11.1.4. Visualizing detector decisions on the image data
sdimage may visualize decisions of any classifier pipeline trained in the
feature space spanned by the image bands. We may also inspect decisions in
an image at different operating points.
We saved the hand-painted road labels in the data set data2:
>> data2
412160 by 3 sddata, 2 classes: 'unknown'(399848) 'road'(12312)
Now we can train the road detector. We use a subset of 500 pixels from each class (road and the 'unknown' background) and train the detector:
>> b=randsubset(data2,500)
1000 by 3 sddata, 2 classes: 'unknown'(500) 'road'(500)
>> [pd,r]=sddetector(b,'road',sdgauss)
1: unknown -> non-road
2: road -> road
1: road -> road
2: non-road -> non-road
sequential pipeline 3x1 'Gaussian model+Decision'
1 Gaussian model 3x1 one class, 1 component (sdp_normal)
2 Decision 1x1 thresholding on road at op 80 (sdp_decide)
ROC (192 thr-based op.points, 3 measures), curop: 80
est: 1:err(road)=0.00, 2:err(non-road)=0.19, 3:mean-error [0.50,0.50]=0.10
We can now visualize detector decisions on another image:
>> im2=imread('roadsign11.bmp');
>> sdimage(im2,pd)
ans =
3
sdimage may visualize both the decisions and the ROC of the detector:
>> sdimage(im2,pd,'roc',r)
ans =
1
This allows us to interactively analyze detector decisions at different operating points.
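sdimage treats every pixel as one sample in the feature space spanned by the image bands. The sketch below shows that idea in plain MATLAB: an H-by-W-by-3 image is reshaped into one row per pixel, a per-pixel decision is made, and the result is reshaped back into a mask. The tiny image and the distance-to-mean-colour "road model" are hypothetical stand-ins for pd, not the behaviour of sdimage itself.

```matlab
% Tiny hypothetical 2-by-3 RGB image, built band by band
R  = [0 0.5 0; 0 0 0.45];
G  = [0 0.5 0; 0 0 0.50];
Bd = [0 0.5 0; 0 0 0.55];
im = cat(3, R, G, Bd);

[H, W, nb] = size(im);
F = reshape(im, H*W, nb);              % one row per pixel, one column per band

road_mean = [0.5 0.5 0.5];             % hypothetical "road" colour model
isroad = sqrt(sum((F - road_mean).^2, 2)) < 0.2;   % per-pixel decision

mask = reshape(isroad, H, W);          % decisions back in image layout
disp(mask)
```

Because reshape keeps MATLAB's column-major pixel order in both directions, each decision lands back on the pixel it was computed for.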
11.2. Discriminants
PRSD Studio provides a number of tools for designing two-class and multi-class discriminants.
11.2.1. Nearest mean classifier
sdnmean implements the nearest mean classifier. It uses a normal model with
an identical covariance matrix for all classes, and its output is a similarity.
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdnmean(a)
sequential pipeline 2x3 'Nearest mean'
1 sdp_normal 2x3 3 classes, 3 components
>> sdscatter(a,sddecide(p))
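To make the idea concrete, here is a plain MATLAB sketch of a nearest mean rule. It assumes unit spherical covariance for every class and equal priors, in which case the Gaussian similarity exp(-d^2/2) is largest for the closest class mean. The toy data and variable names are hypothetical, and sdnmean's own covariance estimate and output scaling are not reproduced.

```matlab
% Toy 2-D training data: rows are samples, lab holds class indices 1..3
X   = [0 0; 1 0; 0 1;  5 5; 6 5; 5 6;  0 6; 1 6; 0 7];
lab = [1 1 1 2 2 2 3 3 3]';
C   = max(lab);

M = zeros(C, size(X, 2));
for c = 1:C
    M(c, :) = mean(X(lab == c, :), 1);     % class mean: the only stored parameter
end

x = [4.5 4.0; 0.2 5.5];                    % new observations
% squared Euclidean distance of every observation to every class mean
D = sum(x.^2, 2) + sum(M.^2, 2)' - 2 * x * M';
simil = exp(-D / 2);                       % similarity: higher for closer means
[~, dec] = max(simil, [], 2);              % the nearest mean wins
disp(dec')
```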

11.2.2. Linear classifier assuming normal densities
sdlinear is a linear discriminant based on the assumption of normal
densities. The covariance matrix is averaged over the classes using the apparent class priors (the class frequencies in the training set).
>> load fruit;
>> a=a(:,:,[1 2])
200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)
>> p=sdlinear(a)
sequential pipeline 2x2 'Linear discriminant'
1 Gauss eq.cov. 2x2 2 classes, 2 components (sdp_normal)
2 Output normalization 2x2 (sdp_norm)
The confusion matrix estimated on the training set at the default operating point:
>> sdconfmat(a.lab,a*sddecide(p),'norm')
ans =
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 0.890 0.110 | 1.00
banana | 0.150 0.850 | 1.00
-------------------------------------
In order to use specific priors, provide them using the priors option:
>> p=sdlinear(a,'priors',[0.8 0.2])
sequential pipeline 2x2 'Linear discriminant'
1 Gauss eq.cov. 2x2 2 classes, 2 components (sdp_normal)
2 Output normalization 2x2 (sdp_norm)
>> sdconfmat(a.lab,a*sddecide(p),'norm')
ans =
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 0.950 0.050 | 1.00
banana | 0.200 0.800 | 1.00
-------------------------------------
Note that by increasing the apple class prior we lower the apple error. However, the banana error will increase accordingly.
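The prior effect can be reproduced with a hand-written linear discriminant. The plain MATLAB sketch below uses toy data, averages the per-class covariance matrices with the apparent priors as described above, and adds log(prior) to each class score. It assumes the priors enter only through that log term, which may differ from what sdlinear does internally.

```matlab
% Two toy classes (rows are samples)
X1 = [0 0; 1 0; 0 1; 1 1];
X2 = [2 2; 3 2; 2 3; 3 3];
n1 = size(X1, 1);  n2 = size(X2, 1);

mu1 = mean(X1, 1);  mu2 = mean(X2, 1);
S1 = (X1 - mu1)' * (X1 - mu1) / (n1 - 1);    % per-class covariance
S2 = (X2 - mu2)' * (X2 - mu2) / (n2 - 1);
apparent = [n1 n2] / (n1 + n2);              % apparent priors = class frequencies
S = apparent(1) * S1 + apparent(2) * S2;     % shared (averaged) covariance

% linear score of a class: x*inv(S)*mu' - 0.5*mu*inv(S)*mu' + log(prior)
g = @(x, mu, prior) x * (S \ mu') - 0.5 * mu * (S \ mu') + log(prior);

x = [1.6 1.6];                               % sample slightly closer to class 2
P = [0.5 0.5; 0.8 0.2];                      % equal priors, then favour class 1
dec = zeros(1, size(P, 1));
for i = 1:size(P, 1)
    scores = [g(x, mu1, P(i, 1)), g(x, mu2, P(i, 2))];
    [~, k] = max(scores);
    dec(i) = k;
end
disp(dec)
```

With equal priors the sample at [1.6 1.6] goes to the second class; raising the first class prior to 0.8 moves it to the first class, mirroring the shift seen in the confusion matrices above.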
11.2.3. Quadratic classifier assuming normal densities
sdquadratic implements a quadratic discriminant based on the assumption of
normal densities. A separate covariance matrix is estimated for each class.
>> p=sdquadratic(a)
sequential pipeline 2x2 'Quadratic discr.'
1 Gauss full cov. 2x2 2 classes, 2 components (sdp_normal)
2 Output normalization 2x2 (sdp_norm)
>> sdscatter(a,sddecide(p))
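A quadratic discriminant differs from the linear one in that each class keeps its own covariance matrix, which adds a -0.5*log(det(S)) term to the class score and makes the boundary quadratic. The plain MATLAB sketch below uses a hypothetical compact class surrounded by a wide one; it illustrates the Gaussian log-density score, not sdquadratic's internals.

```matlab
Xc    = {[0 0; 1 0; 0 1; 1 1], [-3 -3; 3 -3; -3 3; 3 3]};  % compact class 1, wide class 2
prior = [0.5 0.5];
x     = [0.5 0.5; 4 4; -1.5 0.5];          % observations to classify

scores = zeros(size(x, 1), numel(Xc));
for c = 1:numel(Xc)
    Xi = Xc{c};
    mu = mean(Xi, 1);
    S  = (Xi - mu)' * (Xi - mu) / (size(Xi, 1) - 1);   % class-specific covariance
    d  = x - mu;                                         % deviations from the class mean
    maha = sum((d / S) .* d, 2);                         % squared Mahalanobis distance per row
    scores(:, c) = -0.5 * log(det(S)) - 0.5 * maha + log(prior(c));
end
[~, dec] = max(scores, [], 2);
disp(dec')
```

The wide class wins on both sides of the compact one ([4 4] and [-1.5 0.5]), a closed boundary that a linear discriminant cannot produce.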

11.2.4. Gaussian mixture models
sdmixture trains Gaussian mixture models using the EM
algorithm. By default, sdmixture estimates the number of components
automatically from the data using the approach proposed in (J. Grim,
J. Novovicova, P. Pudil, P. Somol, F.J. Ferri, Initializing Normal Mixtures
of Densities, Proc. of ICPR 1998).
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdmixture(a)
[class 'apple' initialization:....................... 3 clusters
EM:.............................. 3 comp]
[class 'banana' initialization:....................... 3 clusters
EM:.............................. 3 comp]
[class 'stone' initialization:....................... 1 cluster
EM:.............................. 1 comp]
Mixture of Gaussians pipeline 2x3 3 classes, 7 components (sdp_normal)
>> sdscatter(a,p) % visualize the output on 1st class ('apple')
We can visualize the mixture decisions by adding the default operating point using sddecide:
>> sdscatter(a,sddecide(p))
The number of mixture components may also be specified manually using the
comp option. A scalar is used for every class. Alternatively, we may provide
a vector with the number of components for each class (in the lab.list order):
>> p2=sdmixture(a,'comp',[4 4 1])
[class 'apple' EM:.............................. 4 comp]
[class 'banana' EM:.............................. 4 comp]
[class 'stone' EM:.............................. 1 comp]
Mixture of Gaussians pipeline 2x3 3 classes, 9 components (sdp_normal)
By default, the EM algorithm is terminated after 30 iterations. The
iteration count may be changed using the iter option. If iter is set
to [], the EM algorithm stops when the likelihood difference becomes
lower than delta (1e-4 by default).
The EM algorithm may be initialized with the initmodel option by providing
an sdp_normal pipeline. In this example, we estimate the mixture starting
from a simple two-component Gaussian model. Note that the provided model
initializes each per-class EM algorithm; therefore, we eventually obtain a
six-component mixture model (two components for each of the three classes).
>> pinit=sdp_normal(sddata([0 0; 1 1]), {eye(2) eye(2)},[0.5 0.5])
Gauss pipeline 2x1 one class, 2 components (sdp_normal)
>> p3=sdmixture(a,'initmodel',pinit)
[class 'apple' EM:.............................. 2 comp]
[class 'banana' EM:.............................. 2 comp]
[class 'stone' EM:.............................. 2 comp]
Mixture of Gaussians pipeline 2x3 3 classes, 6 components (sdp_normal)
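For readers unfamiliar with EM, the sketch below runs the textbook algorithm on a one-dimensional, two-component mixture in plain MATLAB, using the stopping rule described above (at most 30 iterations, or a log-likelihood gain below 1e-4). The starting model plays the role of the initmodel argument. It is an illustration under these assumptions only: the data are made up, sdmixture's automatic choice of the number of components and its multivariate estimation are not reproduced, and the small variance floor is an addition of this sketch that keeps components from collapsing.

```matlab
x  = [-2.2 -2.0 -1.8 -2.1 -1.9  1.8 2.0 2.2 1.9 2.1]';   % 1-D toy data, two clusters
mu = [-1 1];  v = [1 1];  w = [0.5 0.5];  % initial model: means, variances, weights
maxiter = 30;  delta = 1e-4;               % defaults quoted in the text above
prevL = -Inf;
for it = 1:maxiter
    % E-step: weighted component densities (N x K) and responsibilities
    dens = w .* exp(-(x - mu).^2 ./ (2 * v)) ./ sqrt(2 * pi * v);
    L = sum(log(sum(dens, 2)));            % log-likelihood of the current model
    Rsp = dens ./ sum(dens, 2);            % responsibility of each component
    % M-step: re-estimate weights, means and variances
    Nk = sum(Rsp, 1);
    w  = Nk / numel(x);
    mu = sum(Rsp .* x, 1) ./ Nk;
    v  = sum(Rsp .* (x - mu).^2, 1) ./ Nk + 1e-6;   % variance floor
    if L - prevL < delta                   % likelihood no longer improves
        break;
    end
    prevL = L;
end
fprintf('%d iterations, means %.3f %.3f\n', it, mu(1), mu(2));
```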
11.2.5. k-NN classifier
k-NN (k-nearest neighbor) is a non-parametric classifier. This means that instead of building a model from the training examples, it uses them directly as evidence for computing its output on new observations. The stored training examples are called "prototypes".
By default sdknn implements the first nearest neighbor rule (k=1) using all
provided training examples as prototypes:
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdknn(a)
1-NN pipeline 2x3 3 classes, 260 prototypes (sdp_1nn)
>> sdscatter(a,sddecide(p))

Using larger neighborhoods (k>1), the nearest neighbor classifier becomes
more robust against noise in the areas of overlap. sdknn implements two
approaches:
- kappa - computes the distance to the k-th nearest neighbor within each class.
- classfrac - returns the fraction of samples of each class among the
k prototypes closest to the new observation.
The kappa method is used by default for any k>1:
>> p1=sdknn(a,'k',10) % by default 'kappa'
10-NN classifier (dist) pipeline 2x3 (sdp_stack)
>> sdscatter(a,sddecide(p1))

Note that the decision boundary becomes much more stable between the overlapping classes.
A k-NN classifier computing class fractions is created by setting the method option to classfrac:
>> p2=sdknn(a,'k',10,'method','classfrac')
10-NN pipeline 2x3 k=10, 260 prototypes (sdp_knnmc)
Although both methods may look similar at first sight, the different ways of computing the per-class output have important practical consequences. The kappa method returns a distance, while classfrac returns a similarity (fraction):
>> getoutput(p1)
ans =
class distance
>> getoutput(p2)
ans =
class similarity
Because classfrac computes class fractions among the k neighbors, it
is a discriminant that needs two or more classes: it splits the feature
space into open sub-spaces. It separates classes but cannot be used to
detect one class, protecting it from all directions.
The kappa method, on the other hand, computes the distance to the k-th nearest neighbor for each class separately, so each per-class output describes its class on its own.
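The difference between the two outputs can be checked by hand. The plain MATLAB sketch below computes, for a single observation, the kappa output (distance to the k-th nearest prototype of each class) and the classfrac output. For classfrac it assumes the smoothed fraction (n_c+1)/(k+C), where n_c is the number of class-c prototypes among the k nearest; this is inferred from the printed values in the table below (for k=10 and three classes, 0.8462 = 11/13 and 0.0769 = 1/13) and is not confirmed by the sdknn documentation. Prototypes and labels are hypothetical.

```matlab
P   = [0 0; 0.5 0; 1 0; 0 1;  4 4; 5 4; 4 5];   % prototypes
lab = [1 1 1 1 2 2 2]';                          % their class indices
C = 2;  k = 3;
x = [0.8 0.6];                                   % new observation

d = sqrt(sum((P - x).^2, 2));                    % distance to every prototype

% kappa: distance to the k-th nearest prototype within each class (lower = closer)
kappa = zeros(1, C);
for c = 1:C
    dc = sort(d(lab == c));
    kappa(c) = dc(k);
end

% classfrac: class counts among the k nearest prototypes overall, smoothed
[~, order] = sort(d);
n = accumarray(lab(order(1:k)), 1, [C 1])';
classfrac = (n + 1) / (k + C);
disp([kappa; classfrac])
```

kappa varies continuously with the position of x, while classfrac can only take the few values (n+1)/(k+C), which is the origin of the sparse ROC plots mentioned below.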
Finally, let us compare the outputs of both k-NN methods on a small test set:
>> ts=sddata(gendatf([3 3 3]))
'Fruit set' 9 by 2 sddata, 3 classes: 'apple'(3) 'banana'(3) 'stone'(3)
>> out1=ts*p1 % kappa
'Fruit set' 9 by 2 sddata, 3 classes: 'apple'(3) 'banana'(3) 'stone'(3)
>> out2=ts*p2 % classfrac
'Fruit set' 9 by 2 sddata, 3 classes: 'apple'(3) 'banana'(3) 'stone'(3)
>> [+out1 +out2]
ans =
1.4888 44.7181 115.8823 0.8462 0.0769 0.0769
1.9581 26.9135 71.0346 0.8462 0.0769 0.0769
1.4120 40.8463 109.4723 0.8462 0.0769 0.0769
48.1386 3.0872 5.4234 0.0769 0.6154 0.3077
26.9787 6.6342 6.6689 0.0769 0.5385 0.3846
26.4837 2.0395 14.6985 0.0769 0.6154 0.3077
40.3134 34.9477 4.5834 0.0769 0.0769 0.8462
49.8063 20.2838 2.3588 0.0769 0.0769 0.8462
23.6764 35.0027 12.8724 0.2308 0.0769 0.6923
In the first three columns, we can see the distances returned by kappa k-NN; the last three columns contain the classfrac fractions. Note that the latter attain only a handful of fixed values (fractions), which results in sparse ROC plots for the classfrac classifier.
11.2.5.1. Prototype selection
By default, sdknn uses all training examples as prototypes. It may,
however, also select a smaller subset. Two prototype selection strategies
are available:
- random - selecting a specified number of prototypes per class
- kcentres - using the PRTools kcentres clustering algorithm to systematically select prototypes.
In this example, we randomly select 40 prototypes per class:
>> p=sdknn(a,'k',5,'proto',40)
5-NN classifier (dist) pipeline 2x3 (sdp_stack)
The number of prototypes may be set per class. Simpler classes may be described by smaller prototype sets, which increases the execution speed of the k-NN classifier:
>> p=sdknn(a,'k',5,'proto',[40 40 10])
5-NN classifier (dist) pipeline 2x3 (sdp_stack)
To select prototypes using kcentres, use the protosel option:
>> p=sdknn(a,'k',5,'proto',40,'protosel','kcentres')
5-NN classifier (dist) pipeline 2x3 (sdp_stack)
The k-centres algorithm computes a dissimilarity matrix. For large data sets,
sdknn therefore uses only 2000 randomly drawn examples to compute k-centres.
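Prototype selection by k-centres aims to cover each class with a few prototypes so that no training sample is far from its nearest prototype. The sketch below uses greedy farthest-first traversal, a simple approximation of that k-centre objective, in plain MATLAB. It is a swapped-in technique, not the PRTools kcentres algorithm, and the data are hypothetical.

```matlab
X = [0 0; 0.1 0; 0 0.1; 5 5; 5.1 5; 10 0; 10.1 0.1];   % one class, three tight groups
m = 3;                                     % number of prototypes to keep

sel  = 1;                                  % start from the first sample (arbitrary choice)
dmin = sqrt(sum((X - X(1, :)).^2, 2));     % distance of each sample to the selected set
for j = 2:m
    [~, far] = max(dmin);                  % sample worst covered by current prototypes
    sel(end + 1) = far;                    %#ok<AGROW>
    dmin = min(dmin, sqrt(sum((X - X(far, :)).^2, 2)));
end
protos = X(sel, :);
radius = max(dmin);                        % largest distance from a sample to its prototype
disp(sel)
```

Each step only needs the distances to the newest prototype, whereas a full dissimilarity matrix grows quadratically with the number of samples.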
11.2.5.2. Using large data sets
sdknn can handle large data sets with tens of thousands of samples. Here
we train a 10-NN classifier on 100 000 samples and measure its execution time on
1000 test objects:
>> a=sddata(gendatf(100000))
'Fruit set' 100000 by 2 sddata, 3 classes: 'apple'(33333) 'banana'(33333) 'stone'(33334)
>> p=sdknn(a,'k',10)
10-NN classifier (dist) pipeline 2x3 (sdp_stack)
>> ts=sddata(gendatf(1000))
'Fruit set' 1000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334)
>> tic; out=ts*p; toc
Elapsed time is 1.338342 seconds. % using 2.33 GHz laptop
11.2.6. Parzen classifier
The Parzen classifier estimates the probability density of each class non-parametrically from the stored training examples. When computing the output for a new observation, the contributions of all training examples are summed. Each contribution is modeled by a kernel function whose width is given by the smoothing parameter.
By default, sdparzen trains a Parzen classifier with a Laplace kernel
function, which is less computationally demanding than the frequently adopted
Gaussian kernel. The smoothing parameter is optimized by an EM algorithm
maximizing the cross-validated log-likelihood.
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdparzen(a)
.....
Parzen pipeline 2x3 3 classes, 260 prototypes (sdp_parzen)
>> sdscatter(a,sddecide(p))

The default Parzen classifier uses scalar smoothing, i.e. an equal kernel width
for each feature. The smoothing parameter is stored in the h field of the Parzen pipeline:
>> p{1}
ans =
kernel: 'laplace'
h: 0.6482
proto: [260x2 double]
prior: [0.3846 0.3846 0.2308]
Smoothing may also be fixed manually. Here we use a very small kernel width:
>> p=sdparzen(a,'h',0.04)
Parzen pipeline 2x3 3 classes, 260 prototypes (sdp_parzen)
>> sdscatter(a,sddecide(p))

Note that the decision boundary becomes very complicated emphasizing very local changes of the class distributions.
The smoothing parameter may also be estimated for each dimension by setting h to 'vector':
>> p=sdparzen(a,'h','vector')
.........
Parzen pipeline 2x3 3 classes, 260 prototypes (sdp_parzen)
>> p{1}
ans =
kernel: 'laplace'
h: [0.5782 0.7266]
proto: [260x2 double]
prior: [0.3846 0.3846 0.2308]
Vector smoothing requires an extra multiplication for each dimension of each training sample. An alternative strategy is to scale the data to unit variance so that scalar smoothing is sufficient.
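The following plain MATLAB sketch shows a Parzen classifier with a Laplace kernel and both scalar and per-feature widths. It assumes a product of one-dimensional Laplace kernels, K(u) = prod_j exp(-|u_j|/h_j)/(2 h_j), which may differ from the kernel form used by sdparzen. Instead of the EM optimization described above, the scalar width is chosen by a plain grid search over the leave-one-out log-likelihood of one class; the data and grid values are hypothetical.

```matlab
% Hypothetical prototypes (training samples) of two classes
P     = [0 0; 0.5 0.2; 0.2 0.6;  3 3; 3.4 2.8; 2.7 3.5];
lab   = [1 1 1 2 2 2]';
prior = [0.5 0.5];

% Parzen density at x from prototypes Pc with a product Laplace kernel;
% h is a scalar (one width for all features) or a 1-by-D vector of widths
parzen = @(x, Pc, h) mean(prod(exp(-abs(Pc - x) ./ h) ./ (2 .* h), 2));

% Choose a scalar width for class 1 by leave-one-out log-likelihood
Pc    = P(lab == 1, :);
hgrid = [0.05 0.2 0.5 1 2];
ll    = zeros(size(hgrid));
for g = 1:numel(hgrid)
    for i = 1:size(Pc, 1)
        others = Pc([1:i-1, i+1:end], :);            % leave sample i out
        ll(g)  = ll(g) + log(parzen(Pc(i, :), others, hgrid(g)));
    end
end
[~, best] = max(ll);
hbest = hgrid(best);

% Classify a new observation with scalar and with per-feature smoothing
x    = [1.2 1.0];
out  = [prior(1) * parzen(x, P(lab == 1, :), hbest), ...
        prior(2) * parzen(x, P(lab == 2, :), hbest)];
outv = [prior(1) * parzen(x, P(lab == 1, :), [0.4 0.8]), ...
        prior(2) * parzen(x, P(lab == 2, :), [0.4 0.8])];
[~, dec] = max(out);
```

A very small width (0.05 here) scores worst in the leave-one-out criterion, which is the quantitative counterpart of the over-complicated boundary shown above for h=0.04.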
11.2.7. Neural network
A feed-forward neural network may be trained using the sdneural function. By
default, 10 hidden units are used and the optimization runs for 1000
iterations (epochs):
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdneural(a)
epochs (*100):..........
Neural network pipeline 2x3 (sdp_neural)
>> sdscatter(a,sddecide(p))

The provided data set is split into training and validation subsets (75%/25%
by default). The training subset is used in the optimization and the validation
subset to estimate the generalization error. The validation fraction may
be changed using the valfrac option. Finally, sdneural returns the
network with the lowest mean square error (MSE) on the validation set. Thanks
to this approach, sdneural limits overfitting to the training data even when
trained for a large number of epochs.
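The model selection described above can be sketched with a single logistic unit standing in for the network, a simplification with no hidden layer. The plain MATLAB code splits toy data 75%/25%, trains by gradient descent on the squared error and remembers the weights with the lowest validation MSE seen so far; all names and values are hypothetical.

```matlab
% Toy two-class data: one feature plus a constant bias column, targets 0/1
X = [linspace(-2, -0.2, 8)', ones(8, 1); linspace(0.2, 2, 8)', ones(8, 1)];
t = [zeros(8, 1); ones(8, 1)];
isval = false(16, 1);  isval(4:4:16) = true;   % every 4th sample -> 25% validation
Xtr = X(~isval, :);  ttr = t(~isval);
Xva = X(isval, :);   tva = t(isval);

sigm = @(z) 1 ./ (1 + exp(-z));
w = [0; 0];  lr = 0.5;
best_w = w;  best_mse = Inf;  best_epoch = 0;
for epoch = 1:200
    y = sigm(Xtr * w);
    grad = Xtr' * ((y - ttr) .* y .* (1 - y)) / numel(ttr);  % gradient of half the training MSE
    w = w - lr * grad;
    mse_va = mean((sigm(Xva * w) - tva).^2);                  % validation error
    if mse_va < best_mse                                       % remember the best model
        best_mse = mse_va;  best_w = w;  best_epoch = epoch;
    end
end
fprintf('best validation MSE %.4f at epoch %d\n', best_mse, best_epoch);
```

Returning best_w rather than the final w is what keeps long training runs from degrading the delivered model.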
11.2.8. Naive Bayes classifier
The naive Bayes classifier is implemented by the sdnbayes function. For each
feature, it estimates a class-conditional distribution using a
histogram. Assuming independence of the features, the per-class output is
computed as the product of the per-feature class-conditional densities.
sdnbayes estimates the number of histogram bins from the data using a
non-parametric density estimation approach.
>> a=sddata(gendatf(3000))
3000 by 2 sddata, 3 classes: 'apple'(1000) 'banana'(1000) 'stone'(1000)
>> p=sdnbayes(a)
class 'apple':.. class 'banana':.. class 'stone':..
Naive Bayes pipeline 2x3 (sdp_nbayes)
>> sdscatter(a,sddecide(p))

The number of histogram bins may be fixed manually using the bins option:
>> p=sdnbayes(a,'bins',20)
class 'apple': class 'banana': class 'stone':
Naive Bayes pipeline 2x3 (sdp_nbayes)
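A minimal plain MATLAB sketch of the naive Bayes rule: one histogram per class and feature, and a class score equal to the prior times the product of the per-feature bin probabilities. It assumes equal-width bins over the training range and add-one smoothing of the counts, so that an empty bin does not zero the whole product; sdnbayes chooses the number of bins from the data, which is not reproduced here. Data and names are hypothetical.

```matlab
X   = [0.1 0.2; 0.3 0.1; 0.2 0.4; 0.4 0.3;  0.7 0.8; 0.9 0.6; 0.8 0.9; 0.6 0.7];
lab = [1 1 1 1 2 2 2 2]';
C = 2;  D = size(X, 2);  bins = 4;
lo = min(X, [], 1);  hi = max(X, [], 1);
% bin index of value(s) v for feature d, clamped to 1..bins
binof = @(v, d) min(bins, max(1, floor((v - lo(d)) ./ (hi(d) - lo(d)) * bins) + 1));

H = zeros(C, bins, D);                      % per-class, per-feature bin probabilities
for c = 1:C
    for d = 1:D
        counts = accumarray(binof(X(lab == c, d), d), 1, [bins 1])';
        H(c, :, d) = (counts + 1) / (sum(counts) + bins);   % add-one smoothing
    end
end
prior = accumarray(lab, 1)' / numel(lab);

x = [0.25 0.35];                            % new observation
out = prior;
for d = 1:D
    out = out .* H(:, binof(x(d), d), d)';  % product over features (independence)
end
[~, dec] = max(out);
```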
11.2.9. Decision tree classifier
sdtree implements a decision tree trained in a feature-by-feature
manner. It uses a validation set to find the tree that generalizes best.
>> a=sddata(gendatl(5000))
Lithuanian Classes, 10000 by 2 sddata, 2 classes: '1'(5000) '2'(5000)
>> p=sdtree(a)
Decision tree pipeline 2x2 (sdp_tree)
>> pd=sddecide(p) % we add the default operating point
sequential pipeline 2x1 'Decision tree+Decision'
1 Decision tree 2x2 (sdp_tree)
2 Decision 2x1 weighting, 2 classes, 1 ops at op 1 (sdp_decide)
Let's visualize the decisions at default operating point on a subset of the training set:
>> sdscatter(gendat(a,500),pd)

We may estimate the mean test error using sdtest:
>> b=sddata(gendatl(5000))
'Lithuanian Classes' 10000 by 2 sddata, 2 classes: '1'(5000) '2'(5000)
>> sdtest(b,pd)
ans =
0.0346
The execution time on 100 000 samples is:
>> data=rand(100000,2);
>> tic; dec=data*pd; toc
Elapsed time is 0.014774 seconds.
Given enough training examples, a decision tree may yield a high-performance classifier that is very fast to execute.
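A decision tree is grown from single-feature splits. The plain MATLAB sketch below searches every feature and every midpoint threshold for the split with the lowest training error, which is the step a tree repeats recursively at each node. It assumes the left branch is assigned to class 1 and omits the recursion and the validation-based selection used by sdtree; the data are hypothetical.

```matlab
X   = [1 5; 2 4; 3 6; 4 5;  6 5; 7 4; 8 6; 9 5];
lab = [1 1 1 1 2 2 2 2]';

best_err = Inf;
for f = 1:size(X, 2)                       % one feature at a time
    v   = unique(X(:, f));
    thr = (v(1:end-1) + v(2:end)) / 2;     % midpoints between consecutive values
    for i = 1:numel(thr)
        left = X(:, f) <= thr(i);
        pred = 2 * ones(size(lab));
        pred(left) = 1;                    % left branch -> class 1
        err = mean(pred ~= lab);
        if err < best_err
            best_err = err;  best_f = f;  best_thr = thr(i);
        end
    end
end
fprintf('split on feature %d at %.1f, training error %.2f\n', best_f, best_thr, best_err);
```

Evaluating such a split is a single comparison, which is why the execution of a trained tree is so fast.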
11.2.10. Support Vector Machine
The PRSD Studio sdsvc command trains a support vector machine classifier
using the libSVM library. At the moment, sdsvc supports only the RBF kernel. The
trained support vector machines are executed through the libPRSD execution
library.
By default, sigma and C parameters are selected automatically based on grid search minimizing the mean error on a validation set (25% of the training set).
>> b
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)
>> p=sdsvc(b)
....................sigma=1.85511 C=54.6 err=0.000 SVs=15
sequential pipeline 2x1 'Scaling+Support Vector Machine'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x1 (sdp_svc)
sdsvc displays the sigma and C parameters together with the error on the
validation set and the number of support vectors. Keep in mind that the number
of support vectors is an important indicator of a well-trained support vector
machine. A large number of SVs may mean that we use a wrong sigma or C, or that
the libSVM optimizer did not find a good solution.
Note that sdsvc performs scaling of input data by default. This may be
switched off using the 'noscale' option.
The two-class support vector machine yields one soft output which needs to
be thresholded in order to make a decision. We may add a default operating
point with zero threshold using sddecide:
>> pd=sddecide(p)
sequential pipeline 2x1 'Scaling+Support Vector Machine+Decision'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x1 (sdp_svc)
3 Decision 1x1 thresholding on apple at op 1 (sdp_decide)
>> sdscatter(b,pd)

11.2.10.1. Grid search for sigma and C parameters
sdsvc allows us to specify sigma and C parameters explicitly:
>> p=sdsvc(b,'sigma',1.5,'C',10)
SVs=21
sequential pipeline 2x1 'Scaling+Support Vector Machine'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x1 (sdp_svc)
We may also specify vectors of sigmas and Cs. The second output of sdsvc
is a structure with the estimated errors and numbers of support vectors for every (sigma, C) pair:
>> [p,E]=sdsvc(b,'sigma',0.1:0.1:5,'C',[0.01 0.1 1 3 5 10 20])
..................................................sigma=1.10000 C=20 err=0.020 SVs=21
sequential pipeline 2x1 'Scaling+Support Vector Machine'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x1 (sdp_svc)
E =
sigmas: [1x50 double]
Cs: [0.0100 0.1000 1 3 5 10 20]
err: [50x7 double]
svs: [50x7 double]
We may visualize the errors and the number of support vectors in 3D surface plots:
>> figure; surf(E.Cs,E.sigmas,E.err)
>> xlabel('C'); ylabel('sigma'); zlabel('error')
>> figure; surf(E.Cs,E.sigmas,E.svs)
>> xlabel('C'); ylabel('sigma'); zlabel('SVs')
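The E structure can also be searched programmatically. The sketch below uses a small synthetic structure with the same fields (sigmas, Cs, err, svs; rows of err correspond to sigmas, columns to Cs) and picks the lowest error, breaking ties by the smaller number of support vectors. The tie-breaking rule is a choice of this sketch, motivated by the remark above on the number of support vectors, and not necessarily what sdsvc does.

```matlab
% Synthetic stand-in for the E structure returned by sdsvc (same field names)
E.sigmas = [0.5 1 1.5];
E.Cs     = [0.1 1 10];
E.err    = [0.10 0.05 0.04; 0.06 0.02 0.02; 0.08 0.03 0.05];   % rows: sigma, columns: C
E.svs    = [80 60 55; 40 25 21; 50 30 28];

minerr = min(E.err(:));                   % lowest validation error
ties   = find(E.err == minerr);           % all (sigma, C) pairs reaching it
[~, t] = min(E.svs(ties));                % prefer the model with fewer support vectors
[i, j] = ind2sub(size(E.err), ties(t));
fprintf('sigma=%g C=%g err=%.3f SVs=%d\n', E.sigmas(i), E.Cs(j), E.err(i, j), E.svs(i, j));
```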

11.2.10.2. Multi-class support vector machines
For multi-class data sets, sdsvc trains one support vector machine per
class in a one-against-all manner, optimizing the parameters separately for
each sub-problem.
>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdsvc(a)
one-against-all: ['apple' ....................sigma=0.90884 C=0.695 err=0.062 SVs=58]
['banana' ....................sigma=0.98847 C=0.336 err=0.113 SVs=96]
['stone' ....................sigma=1.37335 C=1.44 err=0.043 SVs=45]
sequential pipeline 2x3 'Scaling+Support Vector Machine+Output normalization'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x3 (sdp_stack)
3 Output normalization 3x3 (sdp_norm)
>> pd=sddecide(p)
sequential pipeline 2x1 'Scaling+Support Vector Machine+Output normalization+Decision'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x3 (sdp_stack)
3 Output normalization 3x3 (sdp_norm)
4 Decision 3x1 weighting, 3 classes, 1 ops at op 1 (sdp_decide)
>> sdscatter(a,pd)

Grid-search estimated errors in multi-class problems are returned in a cell
array with one E structure per class:
>> [p,E]=sdsvc(a,'sigma',0.1:0.1:5,'C',[0.01 0.1 1 3 5 10 20])
one-against-all: ['apple' ..................................................sigma=0.80000 C=1 err=0.050 SVs=51]
['banana' ..................................................sigma=0.80000 C=0.1 err=0.125 SVs=150]
['stone' ..................................................sigma=1.10000 C=10 err=0.010 SVs=33]
sequential pipeline 2x3 'Scaling+Support Vector Machine+Output normalization'
1 Scaling 2x2 (sdp_affine)
2 Support Vector Machine 2x3 (sdp_stack)
3 Output normalization 3x3 (sdp_norm)
E =
[1x1 struct] [1x1 struct] [1x1 struct]
>> E{2}
ans =
sigmas: [1x50 double]
Cs: [0.0100 0.1000 1 3 5 10 20]
err: [50x7 double]
svs: [50x7 double]
11.2.10.3. Accessing support vectors
Support vectors are stored in the proto sddata set
inside the trained sdsvc pipeline.
>> b
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)
>> p2=sdsvc(b)
....................sigma=0.92990 C=26.4 err=0.000 SVs=19
sequential pipeline 2x1 'standardization+Support Vector Machine'
1 standardization 2x2 (sdp_affine)
2 Support Vector Machine 2x1 (sdp_svc)
In the case of this two-class SVC, we find the support vectors in the second step of the resulting pipeline (the first step scales the data).
>> p2{2}
ans =
type: 'rbf'
par: 0.9299
proto: [19x2x2 sddata]
weights: [19x1 double]
offset: -0.0757
mean: [0 0]
>> p2{2}.proto
19 by 2 sddata, 2 classes: 'apple'(9) 'banana'(10)
A multi-class SVC contains one SVC model per one-against-others problem in its stacked pipeline:
>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdsvc(a)
one-against-all: ['apple' ....................sigma=0.97084 C=26.4 err=0.025 SVs=21]
['banana' ....................sigma=0.93841 C=0.336 err=0.050 SVs=96]
['stone' ....................sigma=0.94301 C=12.7 err=0.000 SVs=36]
sequential pipeline 2x3 'standardization+Support Vector Machine+Output normalization'
1 standardization 2x2 (sdp_affine)
2 Support Vector Machine 2x3 (sdp_stack)
3 Output normalization 3x3 (sdp_norm)
>> p{2}
stacked pipeline 2x3 '++'
1 Support Vector Machine 2x1 (pipeline)
2 Support Vector Machine 2x1 (pipeline)
3 Support Vector Machine 2x1 (pipeline)
>> p{2}{1}
Support Vector Machine pipeline 2x1 (sdp_svc)
>> p{2}{1}{1}
ans =
type: 'rbf'
par: 0.9708
proto: [21x2x2 sddata]
weights: [21x1 double]
offset: 0.6474
mean: [0 0]
>> p{2}{1}{1}.proto
21 by 2 sddata, 2 classes: 'apple'(8) 'others'(13)
In order to access indices of the support vectors in the original data set,
use the original property of the proto data set.
>> proto=p{2}{1}{1}.proto
21 by 2 sddata, 2 classes: 'apple'(8) 'others'(13)
>> proto'
21 by 2 sddata, 2 classes: 'apple'(8) 'others'(13)
sample props: 'lab'->'class' 'class'(L) 'original'(N)
feature props: 'featlab'->'featname' 'featname'(L)
data props: 'data'(N)
>> proto(1:3).original
ans =
14
21
22
The first three support vectors are:
>> +proto(1:3)
ans =
0.5759 -0.6914
1.0284 1.1254
1.1145 0.1547
Support vectors in the original (unscaled) feature space are:
>> +b( proto(1:3).original )
ans =
0.1362 -3.2968
2.2507 4.8269
2.6530 0.4864
Applying the scaling in the first pipeline step, we obtain identical numbers:
>> +b( proto(1:3).original ) * p(1)
ans =
0.5759 -0.6914
1.0284 1.1254
1.1145 0.1547
11.3. Classifier combining
PRSD Studio supports both major approaches to classifier combining, namely fixed and trained combiners. Fixed combiners are rules chosen on the basis of our assumptions about the problem. For example, if we know that classifiers trained on different data representations are independent, we might choose the product rule. The second type of combiner is trained: instead of assuming a fusion strategy, we learn it from examples. Trained combiners simply use the per-class outputs of the base classifiers as new features.
11.3.1. Fixed combiners
In order to build a fixed combiner, we need to train the base classifiers, stack them into a single pipeline and add the fixed combiner step.
>> load fruit
260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p1=sdquadratic(a)
sequential pipeline 2x3 'Quadratic discr.'
1 Gauss full cov. 2x3 3 classes, 3 components (sdp_normal)
2 Output normalization 3x3 (sdp_norm)
>> p2=sdparzen(a)
....
Parzen pipeline 2x3 3 classes, 260 prototypes (sdp_parzen)
>> P=sdp_stack({p1,p2})
stack pipeline 2x6 (sdp_stack)
Note that the stacked pipeline has two inputs (2D feature space) and 6
outputs corresponding to 3 per-class outputs for the p1 and 3 per-class
outputs for p2.
A fixed combiner using the mean rule is constructed by also supplying the number of classes, the number of classifiers and the class names:
>> pcomb=sdp_combine('mean',3,2,a.lab)
sequential pipeline 6x3 ''
1 sdp_combine 6x3 Fixed combiner
Let us now compose the full pipeline delivering decisions of the fixed combiner at default operating point:
>> Pfull=[P pcomb sddecide(pcomb)]
sequential pipeline 2x1 ''
1 sdp_stack 2x6
2 sdp_combine 6x3 Fixed combiner
3 sdp_decide 3x1 Weight-based decision (3 classes)
>> sdscatter(a,Pfull)
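What the mean rule does with the stacked outputs can be sketched in plain MATLAB. Assuming the column layout described above (three per-class outputs of the first classifier followed by three of the second) and outputs on a comparable scale, the per-class outputs are averaged over the classifiers and the largest one is selected; the product rule only replaces mean with prod. The output values are made up, and sdp_combine's internals are not reproduced.

```matlab
% Stacked soft outputs for 2 samples: columns 1-3 from classifier 1, 4-6 from classifier 2
S = [0.7 0.2 0.1   0.4 0.5 0.1;
     0.1 0.3 0.6   0.2 0.2 0.6];
nclass = 3;  nclf = 2;

R = reshape(S, size(S, 1), nclass, nclf);   % samples x classes x classifiers
comb  = mean(R, 3);                         % mean rule over classifiers
combp = prod(R, 3);                         % product rule, for comparison
[~, dec]  = max(comb,  [], 2);
[~, decp] = max(combp, [], 2);
disp([dec decp])
```

The reshape relies on the stacked outputs being grouped per classifier; a different column order would require permuting the columns first.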

11.4. Hierarchical classifiers
PRSD Studio provides tools for designing hierarchies of classifiers based on a priori decision-level rules. Consider, for example, the problem of separating 'apple', 'banana' and 'lemon' on a conveyor belt. We want to build a detector separating all fruit examples from outliers on the conveyor belt (rocks, dirt, mice,...). Only the samples classified as "fruit" pass on to the next stage, where the different types of fruit get separated.
% preparing simple artificial data set:
>> rand('state',42)
>> data=[gauss(1000,[-2 3],[1 0.4; 0.1 1]); gauss(1000,[0 2],[1 0; 0 1]); gauss(1000,[1 4],[2 0.4; 0.1 2]); gauss(100, [-3 4],[6 0; 0 6])];
>> lab=sdlab({'apple','banana','lemon','outlier'},[1000 1000 1000 100])
sdlab with 3100 entries, 4 groups: 'apple'(1000) 'banana'(1000) 'lemon'(1000) 'outlier'(100)
>> a=sddata(+data,lab)
3100 by 2 sddata, 4 classes: 'apple'(1000) 'banana'(1000) 'lemon'(1000) 'outlier'(100)
We will split the data into training and testing parts:
>> [tr,ts]=randsubset(a,0.5)
1550 by 2 sddata, 4 classes: 'apple'(500) 'banana'(500) 'lemon'(500) 'outlier'(50)
1550 by 2 sddata, 4 classes: 'apple'(500) 'banana'(500) 'lemon'(500) 'outlier'(50)
To train the detector, we need to re-label the data so it reflects our fruit/outlier problem:
>> tr_det=sdrelab(tr,{'~outlier','fruit'})
1: apple -> fruit
2: banana -> fruit
3: lemon -> fruit
4: outlier -> outlier
1550 by 2 sddata, 2 classes: 'outlier'(50) 'fruit'(1500)
Now, we train a detector using a Gaussian mixture with three components. We fix the operating point using the reject option rejecting 1% of fruit samples:
>> pd=sddetector(tr_det,'fruit',sdmixture([],'n',3,'iter',20),'reject',0.01)
1: outlier -> non-fruit
2: fruit -> fruit
[class 'fruit' EM:.................... 3 comp]
sequential pipeline 2x1 'Mixture of Gaussians+Decision'
1 Mixture of Gaussians 2x1 one class, 3 components (sdp_normal)
2 Decision 1x1 thresholding on fruit at op 1 (sdp_decide)
Our detector can now be tested. We print the confusion matrix on the test set:
>> sdconfmat(ts.lab,ts*pd)
ans =
True | Decisions
Labels | fruit non-fr | Totals
-------------------------------------
apple | 494 6 | 500
banana | 494 6 | 500
lemon | 489 11 | 500
outlier | 25 25 | 50
-------------------------------------
Totals | 1502 48 | 1550
In the second step, we want to train the fruit discriminant:
>> tr_clf=tr(:,:,{'apple','banana','lemon'})
1500 by 2 sddata, 3 classes: 'apple'(500) 'banana'(500) 'lemon'(500)
>> p=sdgauss(tr_clf)
Gaussian model pipeline 2x3 3 classes, 3 components (sdp_normal)
We use a default operating point in this example:
>> pclf=sddecide(p)
sequential pipeline 2x1 'Gaussian model+Decision'
1 Gaussian model 2x3 3 classes, 3 components (sdp_normal)
2 Decision 3x1 weighting, 3 classes, 1 ops at op
With both stages trained, we can construct the classifier cascade using the sdcascade command. It takes the detector pipeline, the name of the decision to "pass through" and the discriminant pipeline:
>> pc=sdcascade(pd,'fruit',pclf)
2-stage cascade pipeline 2x1 (sdp_cascade)
Let's visualize the decisions on the test set and the confusion matrix of the two-stage system:
>> sdscatter(ts,pc)
>> sdconfmat(ts.lab,ts*pc)
ans =
True | Decisions
Labels | non-fr apple banana lemon | Totals
---------------------------------------------------
apple | 2 425 38 35 | 500
banana | 5 57 375 63 | 500
lemon | 20 57 41 382 | 500
outlier | 30 16 1 3 | 50
---------------------------------------------------
Totals | 57 555 455 483 | 1550
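The decision logic of a two-stage cascade can be sketched in a few lines of plain MATLAB: samples rejected by the detector keep the 'non-fruit' label, and only the accepted ones are passed to the second-stage classifier. Both stages below are hypothetical stand-ins (a distance-to-origin detector and a nearest-mean discriminant with made-up means), not the pipelines trained above or the sdcascade implementation.

```matlab
X     = [-1.8 0.2; 0.3 -0.1; 2.2 0.5; 6 6];  % observations reaching the cascade
names = {'apple', 'banana', 'lemon'};
M     = [-2 0; 0 0; 2 0];                    % stage-2 class means (hypothetical)

dec = repmat({'non-fruit'}, size(X, 1), 1);  % default decision for rejected samples
isfruit = sqrt(sum(X.^2, 2)) <= 3;           % stage 1: hypothetical detector
Xf = X(isfruit, :);                          % only accepted samples reach stage 2
D  = sum(Xf.^2, 2) + sum(M.^2, 2)' - 2 * Xf * M';
[~, k] = min(D, [], 2);                      % stage 2: nearest class mean
dec(isfruit) = names(k);
disp(dec')
```

Because the second stage only sees accepted samples, its errors add to the detector's errors, as the confusion matrix above shows for the outliers passed on as fruit.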
