Local Support Vector Machine for Noise Reduction
Author: Nicola Segata <segata@disi.unitn.it>
Version: 0.9
Local Support Vector Machine Noise Reduction (LSVM-nr) [Segata, Blanzieri, Delany, Cunningham, 2008] is a novel approach to noise reduction based on Local Support Vector Machines (LSVM) [Blanzieri, Melgani, 2006, 2008], which brings the benefits of maximal-margin classifiers to bear on noise reduction. This provides a more robust alternative to the majority rule on which almost all existing noise reduction techniques are based. Roughly speaking, for each training sample an SVM is trained on its neighbourhood; if the SVM classification of the central sample disagrees with its actual class, there is evidence in favour of removing it from the training set. There is empirical evidence of improved generalisation accuracy of nearest-neighbour-based classifiers on a number of real datasets when the training data are edited with LSVM-nr. In particular, some experiments suggest that LSVM-nr is especially effective in the spam filtering application domain, on datasets affected by Gaussian noise, and in the presence of uneven class densities.
[Segata, Blanzieri, Delany, Cunningham, 2008]
N. Segata, E. Blanzieri, S.J. Delany, P. Cunningham, Noise Reduction for Instance-Based Learning with a Local Maximal Margin Approach. Technical Report.
LSVM-nr is based on a probabilistic variant of Local Support Vector Machines (LSVM). The main references for LSVM are:
[Blanzieri, Melgani, 2006]
E. Blanzieri, F. Melgani, An Adaptive SVM Nearest Neighbor Classifier for Remotely Sensed Imagery. IEEE International Conference on Geoscience and Remote Sensing Symposium, 2006, pp. 3931-3934.
[Blanzieri, Melgani, 2008]
E. Blanzieri, F. Melgani, Nearest Neighbor Classification of Remote Sensing Images With the Maximal Margin Principle. IEEE Transactions on Geoscience and Remote Sensing, 46(6):1804-1811, June 2008.
[Segata, Blanzieri, 2008]
N. Segata, E. Blanzieri, Empirical Assessment of Classification Accuracy of Local SVM. DISI Technical Report. |
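The per-sample editing rule described above can be sketched in a few lines. The following is an illustrative sketch only, not the released implementation: it uses scikit-learn, a linear kernel, and plain label agreement of the local SVM in place of the probabilistic output threshold used by LSVM-nr; the function name `lsvm_nr` and all parameter choices are assumptions made for the example.

```python
# Illustrative sketch of the LSVM-nr editing rule (not the authors' code).
# For each training sample, fit an SVM on its k nearest neighbours and
# drop the sample if the local SVM disagrees with its recorded label.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def lsvm_nr(X, y, k=10):
    """Return the indices of the samples kept after editing."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, neigh_idx = nn.kneighbors(X)   # each row includes the sample itself
    keep = []
    for i, neigh in enumerate(neigh_idx):
        Xn, yn = X[neigh], y[neigh]
        if len(np.unique(yn)) < 2:    # single-class neighbourhood: keep
            keep.append(i)
            continue
        local_svm = SVC(kernel="linear").fit(Xn, yn)
        if local_svm.predict(X[i:i + 1])[0] == y[i]:
            keep.append(i)            # local SVM agrees: no noise evidence
    return np.array(keep)
```

On a training set with one flipped label inside a well-separated cluster, the mislabelled sample's local SVM is dominated by its correctly labelled neighbours, so the sample is flagged and removed while clean samples are kept.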
LSVM-nr is called with the following parameters:
LSVM-nr [options] input_unedited_set_file_name output_edited_file_name
Available options are:
-k k : set the LSVM neighborhood size (default: 1/10 of the input set cardinality)
-l l : set the LSVM probabilistic output threshold for noise removal (default 0.5)
-u u : set the LSVM probabilistic output threshold for redundancy reduction (default 1.0, i.e. no redundancy reduction)
-t kernel_type : set type of kernel function (default 0)
    0 -- linear: u'*v
    1 -- polynomial: (gamma*u'*v + coef0)^degree
    2 -- radial basis function: exp(-gamma*|u-v|^2)
-d degree : set degree in kernel function (default 2)
-g gamma : set gamma in kernel function (default 1)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
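For example, an invocation with an RBF kernel might look as follows; the file names and all parameter values are illustrative, not recommended settings:

```shell
# Edit train.dat with local SVMs over 100-neighbourhoods, using an RBF
# kernel (gamma 0.5) and removing samples whose probabilistic output for
# their own class falls below 0.7; file names and values are illustrative.
LSVM-nr -k 100 -t 2 -g 0.5 -l 0.7 train.dat train_edited.dat
```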
The companion tool LSVM-cv estimates classification accuracy by cross validation and is called with the following parameters:

LSVM-cv [options] training_set_file_name
Available options are:
-k k : set the neighborhood size (default: 1/10 of the input set cardinality)
-t kernel_type : set type of kernel function (default 0)
    0 -- linear: u'*v
    1 -- polynomial: (gamma*u'*v + coef0)^degree
    2 -- radial basis function: exp(-gamma*|u-v|^2)
-d degree : set degree in kernel function (default 2)
-g gamma : set gamma in kernel function (default 1)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-f folds : number of folds of cross validation (default 10)
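For example, accuracy could be estimated with 5-fold cross validation as follows; again, the file name and parameter values are purely illustrative:

```shell
# 5-fold cross validation on train.dat with a degree-3 polynomial kernel
# and 50-neighbourhoods; file name and values are illustrative.
LSVM-cv -k 50 -t 1 -d 3 -f 5 train.dat
```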
Last modified October 6, 2008 by Nicola Segata <segata@disi.unitn.it>