Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Resources and Further Reading

An issue with support vector machines is that parameter tuning is not easy for users new to them. One computationally expensive approach is to build multiple models with different parameter settings and to choose the model with the lowest expected error. Unless the search is quite extensive, though, this can lead to suboptimal results. Research has explored using the past performance of different parameter settings to predict their relative performance on new tasks.
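The brute-force approach can be sketched as a small grid search using kernlab's ksvm() (a sketch only: it assumes the kernlab package is installed, and the grid values here are illustrative, not recommendations):

```r
library(kernlab)

set.seed(42)
## Candidate parameter settings: kernel width (sigma) and cost (C).
grid <- expand.grid(sigma = c(0.01, 0.1, 1), C = c(1, 10, 100))

## Fit one model per combination and record the 5-fold
## cross-validation error reported by ksvm().
grid$error <- apply(grid, 1, function(p)
  cross(ksvm(Species ~ ., data = iris,
             kernel = "rbfdot", kpar = list(sigma = p["sigma"]),
             C = p["C"], cross = 5)))

## Keep the parameter pair with the lowest estimated error.
best <- grid[which.min(grid$error), ]
best
```

Each candidate requires a full model build (here, nine models plus cross-validation folds), which is why this approach becomes expensive as the grid grows.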

It has been recognised that learning is often difficult from the specific collection of available variables. Artificial intelligence research, and knowledge representation research in particular, has often noted that changing the representation can significantly affect the ability to reason and learn. Kernel learning projects entities into a higher-dimensional space in which learning is often easier. It does so implicitly, by computing dot products between entities in that space through a kernel function, without ever constructing the projection explicitly. Different kernels yield different projections, and therefore different distances between entities in the higher-dimensional space. A support vector machine is one kind of kernel learner.

Kernel methods are like k-nearest neighbours, except that all data points contribute to a prediction through a weighted sum, with the kernel measuring the distance (or similarity) between points. Kernel methods have demonstrated excellent performance on many machine learning and pattern recognition tasks. However, they are sensitive to the choice of kernel, may be intolerant of noise, and cannot directly deal with missing data or data of mixed types.
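The kernel-weighted view can be illustrated with a tiny base-R regressor (a sketch, not kernlab code): every training point contributes to the prediction, weighted by an RBF kernel of its distance to the query point.

```r
## RBF kernel: similarity decays with squared distance.
rbf <- function(xi, xj, sigma = 1)
  exp(-sigma * sum((xi - xj)^2))

## Prediction is a kernel-weighted average over ALL training points,
## unlike k-nearest neighbours, which uses only the k closest.
kernel_predict <- function(x, X, y, sigma = 1) {
  w <- apply(X, 1, rbf, xj = x, sigma = sigma)
  sum(w * y) / sum(w)
}

## Toy data: target is the sum of the two inputs.
X <- cbind(c(0, 1, 2, 3), c(0, 1, 2, 3))
y <- rowSums(X)
kernel_predict(c(1.5, 1.5), X, y)   # 3, by symmetry of the weights
```

Nearby points (1,1) and (2,2) dominate the weighted sum, while the distant points still contribute a little; this is the "all data points contribute" behaviour described above.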

A kernel is a function $k(x_i,x_j)$ which takes two entities ($x_i$ and $x_j$) and computes a scalar.
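For example, the Gaussian (RBF) kernel $k(x_i,x_j)=\exp(-\sigma\|x_i-x_j\|^2)$ maps two numeric vectors to a single scalar (a base-R sketch; sigma is an illustrative bandwidth parameter):

```r
## Gaussian (RBF) kernel: takes two entities, returns one scalar.
rbf_kernel <- function(xi, xj, sigma = 1)
  exp(-sigma * sum((xi - xj)^2))

xi <- c(1, 2, 3)
xj <- c(1, 2, 4)
rbf_kernel(xi, xj)                         # exp(-1), about 0.368
rbf_kernel(xi, xi)                         # identical entities give 1
rbf_kernel(xi, xj) == rbf_kernel(xj, xi)   # kernels are symmetric: TRUE
```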

The kernlab package provides ksvm() for kernel learning and is well integrated into R, so that different kernels can easily be explored. It introduces a new class called kernel, and kernel functions are objects of this class.
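A short sketch of how these kernel objects are used (assuming kernlab is installed):

```r
library(kernlab)

## Kernel generators return objects of class "kernel".
rbf  <- rbfdot(sigma = 0.1)     # Gaussian radial basis kernel
poly <- polydot(degree = 2)     # polynomial kernel

## A kernel object can be called directly on two entities ...
rbf(c(1, 2), c(2, 3))

## ... used to compute a full kernel matrix ...
K <- kernelMatrix(rbf, as.matrix(iris[1:5, 1:4]))

## ... or passed to ksvm() to swap kernels with no other changes.
model <- ksvm(Species ~ ., data = iris, kernel = poly)
```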

A support vector machine (SVM) searches for so-called support vectors, which are data points found to lie at the edge of an area in space that forms the boundary between one class of points and another. In SVM terminology, the space between the regions containing data points of different classes is called the margin between those classes. The support vectors are used to identify a hyperplane (or a line, in the case of two-dimensional data) that separates the classes. Essentially, the maximum margin between the separable classes is found. An advantage of the method is that the modelling deals only with these support vectors rather than with the whole training dataset, so the size of the training set is not usually an issue. If the data are not linearly separable, kernels are used to map the data into higher dimensions in which the classes are linearly separable. Support vector machines have also been found to perform well on problems that are non-linear, sparse, and high-dimensional. A disadvantage is that the algorithm is sensitive to the choice of parameter settings, making it harder to use and time consuming to identify the best settings.
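The role of the support vectors can be seen with a linear SVM on two well-separated clusters (a sketch assuming kernlab is installed; the simulated data are illustrative):

```r
library(kernlab)

## Two well-separated Gaussian clusters, 20 points each.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))
y <- factor(rep(c("a", "b"), each = 20))

## Linear kernel ("vanilladot"): the maximum-margin hyperplane.
model <- ksvm(x, y, kernel = "vanilladot", C = 1)

nSV(model)        # number of support vectors: far fewer than 40
SVindex(model)    # which training points lie on the margin
```

Only the points indexed by SVindex() determine the separating hyperplane; the remaining training points could be discarded without changing the model.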




Copyright © 2004-2010 Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010