Classification problems are often solved using supervised learning algorithms such as Random Forest classifiers, Support Vector Machines, and Logistic Regression (for binary classification).
A specific type of binary classification problem in which the training examples come from a single class is called One-Class Classification (OCC). One-Class Classification is solved using unsupervised or semi-supervised learning algorithms such as One-Class Support Vector Machines (1-SVM) and Support Vector Data Description (SVDD). One of the popular applications of One-Class Classification is Anomaly Detection (AD), i.e., outlier detection and novelty detection.
One-Class Classification
One-Class Classification (OCC) aims to distinguish samples of one particular class by learning only from samples of that class during training. It is one of the most commonly used approaches to Anomaly Detection (AD), a subfield of machine learning that deals with identifying anomalous cases that the model has never seen before. OCC is also called unary classification or class-modelling.
Overview of Support Vector Machines
Support Vector Machines (SVMs) are among the most robust statistical algorithms for classification and regression problems. SVMs (also known as Support Vector Networks) are commonly used for classification because they generalize well to unseen data.
To understand SVMs more intuitively, let's consider a dataset of positive examples (green balls) and negative examples (blue squares). As shown in the figure below, the aim is to draw the line that best separates the positive examples from the negative examples. Unlike other linear classifiers, the line drawn by an SVM is chosen so that its distance to the closest points of each class (the support vectors) is equal on both sides and as large as possible. This idea of learning the widest possible street (or gutter) between the classes is what makes SVMs robust when tested on unseen data. scikit-learn provides an SVM implementation built on libsvm; its SVC class can be imported and applied to any classification problem.
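As a minimal sketch of how SVC can be applied, the snippet below fits a linear SVM to a tiny hand-made dataset. The data points, the labels (+1 for the green balls, -1 for the blue squares) and the choice of a linear kernel are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three negative and three positive examples in 2-D.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A linear kernel keeps the decision boundary a straight line.
clf = SVC(kernel="linear")
clf.fit(X, y)

# Classify two unseen points, one on each side of the street.
new_points = np.array([[0.0, 0.0], [6.0, 6.0]])
predictions = clf.predict(new_points)

print("Predictions:", predictions)
print("Support vectors:")
print(clf.support_vectors_)
```

The support vectors printed at the end are the training points closest to the separating line, i.e., the ones that define the edges of the street.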
The quantity that is maximized to obtain the widest street is 2/|w|, where w is the weight vector perpendicular to the separating line, so maximizing it widens the street between the support vectors. Maximizing 2/|w| is equivalent to minimizing 1/2*(|w|^2). To find the extremum of 1/2*(|w|^2) under constraints, i.e., with a penalty whenever a sample is misclassified, Lagrange multipliers are applied. This gives rise to a more complex expression which, when differentiated w.r.t. w and set to zero, gives the following relation

$$w = \sum_i \alpha_i \, y_i \, x_i$$

where w is the vector of weights, alpha_i is the Lagrange multiplier of the i-th sample, y_i is the class of that sample (either +1 or -1) and x_i is the sample itself.
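The relation between w and the Lagrange multipliers can be checked numerically. The sketch below assumes a small hand-made dataset and a very large C to approximate the hard-margin case; it relies on scikit-learn's `dual_coef_` attribute, which stores the product alpha * y for each support vector.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three negative and three positive examples in 2-D.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin SVM (illustrative choice).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for every support vector, so summing
# (alpha_i * y_i) * x_i over the support vectors rebuilds w.
w_from_alphas = (clf.dual_coef_ @ clf.support_vectors_).ravel()

# The weight vector that scikit-learn exposes for linear kernels.
w = clf.coef_.ravel()

# Width of the street between the two classes.
street_width = 2.0 / np.linalg.norm(w)

print("w from multipliers:", w_from_alphas)
print("w from coef_:      ", w)
print("street width:      ", street_width)
```

For this toy data the closest points of the two classes are 5/sqrt(2) apart along the direction (1, 1), so the printed street width should come out close to 3.54.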
One-Class Support Vector Machines
One-Class SVM is an unsupervised learning technique that learns to distinguish test samples of a particular class from those of other classes. 1-SVM is one of the most convenient methods for approaching OCC problems, including AD. It works on the basic idea of finding the smallest hypersphere that encloses the single class of examples in the training data, and it considers all samples outside the hypersphere to be outliers, i.e., outside the training data distribution. The figure below shows the hypersphere formed by a 1-SVM and how it is used to classify data that falls outside the training distribution.
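The mathematical expression to compute a hypersphere with centre c and radius r is

$$\min_{r,\,c} \; r^2 \quad \text{subject to} \quad \lVert \Phi(x_i) - c \rVert^2 \le r^2 \quad \forall\, i = 1, \dots, n$$

The expression above minimizes the radius of the hypersphere while requiring every training sample to lie inside it. However, this formulation is very restrictive when the data contains outliers, so a more flexible formulation that tolerates outliers to an extent is commonly written as

$$\min_{r,\,c,\,\zeta} \; r^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \zeta_i \quad \text{subject to} \quad \lVert \Phi(x_i) - c \rVert^2 \le r^2 + \zeta_i, \;\; \zeta_i \ge 0 \quad \forall\, i = 1, \dots, n$$

Here the function phi is the transformation applied to the samples x, n is the number of training samples, the zeta_i are slack variables that allow individual samples to fall outside the hypersphere, and nu controls the trade-off between the size of the hypersphere and the number of samples left outside it. The figure below shows how this formulation forms a hypersphere by minimizing the radius r around the centre c.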
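To make the roles of the centre c, the radius r and the tolerated outliers more concrete, here is a deliberately simplified sketch. It does not solve the hypersphere optimization itself: it assumes phi is the identity, takes c to be the mean of the training samples, and picks r as a quantile of the distances so that roughly a chosen fraction (called nu here) of the training samples is left outside. The data, the value of nu and the helper name `hypersphere_label` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-class training data: 200 points around the origin.
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Fraction of training samples we are willing to leave outside the sphere.
nu = 0.1

# Simplification: use the mean as the centre instead of optimizing it.
c = X_train.mean(axis=0)

# Choose the radius so that about (1 - nu) of the training data is inside.
train_dist = np.linalg.norm(X_train - c, axis=1)
r = np.quantile(train_dist, 1.0 - nu)


def hypersphere_label(points):
    """Return +1 for points inside the hypersphere and -1 for points outside."""
    dist = np.linalg.norm(np.atleast_2d(points) - c, axis=1)
    return np.where(dist <= r, 1, -1)


inside_fraction = np.mean(hypersphere_label(X_train) == 1)
labels = hypersphere_label([[0.0, 0.0], [10.0, 10.0]])

print("centre:", c, "radius:", r)
print("fraction of training data inside:", inside_fraction)
print("labels for a near and a far point:", labels)
```

A real 1-SVM or SVDD solver finds c and r jointly and usually works in a kernel-induced feature space, so its boundary can be far more flexible than this plain sphere.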
1-SVM can be used for both kinds of anomaly detection applications i.e., outlier detection and novelty detection.
Implementation
Implementing a 1-SVM is made very easy by scikit-learn, whose implementation is based on libsvm. scikit-learn provides a class called OneClassSVM that internally implements the mathematical modelling of minimizing the hypersphere by training on data samples. To understand more about the mathematical reasoning behind the function that minimizes the hypersphere, refer to the page.
In the example below, we train a One-Class SVM on random positive integers from 1 to 10. The model is then tested on a set of random positive and negative integers, and it successfully separates the integers into the positive and negative classes.
Note: The One-Class SVM model labels the samples it considers inliers as +1 and the samples it considers outliers as -1; here the positive integers form class +1 and the negative integers form class -1.
The samples in the training dataset are always treated as the positive class by One-Class SVM during training.
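A minimal sketch of such an experiment is given below. The sample sizes, the random seed, the values of `nu` and `gamma`, and the range used for the negative test integers (-100 to -10) are assumptions chosen for illustration, not values from the original experiment.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Training data: random positive integers from 1 to 10 (a single class only).
X_train = rng.integers(1, 11, size=(200, 1)).astype(float)

# nu bounds the fraction of training samples treated as outliers;
# both nu and gamma are illustrative choices.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
model.fit(X_train)

# Test data: positive integers from the training range and negative
# integers that lie well outside it.
X_pos = rng.integers(1, 11, size=(10, 1)).astype(float)
X_neg = rng.integers(-100, -9, size=(10, 1)).astype(float)

pred_pos = model.predict(X_pos)  # +1 = inlier, -1 = outlier
pred_neg = model.predict(X_neg)

print("positive test integers:", X_pos.ravel().astype(int))
print("predictions:           ", pred_pos)
print("negative test integers:", X_neg.ravel().astype(int))
print("predictions:           ", pred_neg)
```

The negative integers lie far from every training sample, so they should all be labelled -1. Most positive integers should be labelled +1, although a value near the edge of the training range may occasionally be flagged as an outlier, since `nu` allows a small fraction of the training data to fall outside the learned boundary.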