What is a Support Vector Machine?
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression tasks. Its primary objective is to find a hyperplane in a high-dimensional space that effectively separates data points belonging to different classes.
SVMs can handle complex problems where models such as logistic regression typically fall short, and they have been used extensively for classification tasks such as image recognition and voice detection.
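To make this concrete, here is a minimal sketch of training an SVM classifier with scikit-learn. The synthetic dataset and parameter values are illustrative assumptions, not drawn from any particular application:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative synthetic dataset (assumed for this sketch)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit an SVM classifier and evaluate it on held-out data
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```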
Here are the key concepts and features of Support Vector Machines:
- Hyperplane: In a two-dimensional space, a hyperplane is simply a line that separates the data into two classes. In higher-dimensional spaces, the hyperplane becomes a decision boundary that separates the data into different classes.
[Figure: a hyperplane separating two classes of data points. Image source: www.datacamp.com]
- Support Vectors: These are the data points that lie closest to the decision boundary or hyperplane. Support vectors play a crucial role in determining the optimal hyperplane because they influence its position and orientation.
- Margin: The margin is the distance between the hyperplane and the nearest data points (the support vectors) from each class; it can be thought of as a "safety gap" between the classes. SVM aims to find the hyperplane with the maximum margin, as a wider margin tends to generalize better to unseen data.
- Kernel Trick: SVM can handle non-linear decision boundaries by mapping the input data into a higher-dimensional space using a kernel function. This allows the algorithm to find a linear hyperplane in the transformed space, effectively capturing complex relationships in the original feature space.
- C Parameter: The C parameter in SVM is the regularization parameter. It controls the trade-off between achieving a smooth decision boundary and classifying the training points correctly: a smaller C encourages a wider margin but may misclassify some points, while a larger C results in a narrower margin but fewer misclassifications. The sketch after this list illustrates both the kernel choice and the effect of C.
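The sketch below, assuming a synthetic two-moons dataset, illustrates the last two concepts: an RBF kernel captures a curved boundary that a linear kernel cannot, and varying C changes how many points end up as support vectors. All parameter values here are arbitrary demonstration choices:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Non-linearly separable synthetic data (assumed for this sketch)
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Kernel trick: the RBF kernel fits the curved boundary far better
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(f"{kernel:6s} kernel, training accuracy: {clf.score(X, y):.2f}")

# C parameter: a smaller C tolerates more margin violations, so more
# training points become support vectors
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C:<6} support vectors: {len(clf.support_)}")
```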
Why is SVM called a Maximum Margin Classifier?
SVM is sometimes referred to as a "Maximum Margin Classifier" because it emphasizes maximizing the margin between classes when constructing the decision boundary, which leads to improved robustness, better generalization, and a clean separation of the classes. For a linear SVM with weight vector w, the margin width is 2/||w||, so maximizing the margin amounts to minimizing ||w||.
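As a small illustration of that formula, assuming well-separated synthetic data so that a (near) hard margin is achievable, the margin width can be read off a fitted linear SVM:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Well-separated synthetic blobs (assumed), so a hard margin exists
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

# A very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# For a linear SVM, the margin width is 2 / ||w||
w = clf.coef_[0]
print(f"margin width: {2.0 / np.linalg.norm(w):.3f}")
```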
How does a Support Vector Machine (SVM) function?
SVM's main goal is to split data effectively. It does this by finding a line (hyperplane) that separates groups of data points. The distance between this line and the nearest data points from each group is called the margin. SVM looks for the line that maximizes this margin. It does this by:
- Trying out different lines to see how well they separate the groups; some lines misclassify more points than others. The left-hand figure shows three hyperplanes in black, blue, and orange: the blue and orange hyperplanes exhibit higher classification error, while the black hyperplane separates the two classes accurately.
- Picking the line with the biggest margin from the nearest data points while still separating the groups accurately, as depicted in the right-hand figure.
[Figure: the left panel shows three candidate hyperplanes in black, blue, and orange; the right panel shows the maximum-margin hyperplane. Image source: www.datacamp.com]
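The same max-margin behavior can be checked in code. In this sketch (again on assumed separable synthetic data), the support vectors of a near hard-margin linear SVM sit exactly on the margin edges, where the decision function evaluates to ±1:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Separable synthetic data (assumed); a huge C approximates a hard margin
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=1)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Support vectors lie on the margin boundaries: |decision_function| ≈ 1
print(clf.decision_function(clf.support_vectors_).round(3))
```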
Advantages of SVM
SVM offers several advantages, including but not limited to the following:
- SVM can handle different types of data and problems, such as linear and non-linear classification, regression, and outlier detection, thanks to its support for various kernel functions.
- SVM's regularization parameter prevents overfitting, ensuring it can generalize well to new data.
- SVM is memory efficient because its decision function uses only a subset of the training points (the support vectors), which is especially helpful for large datasets; the sketch after this list shows this in practice.
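A quick way to see the last point, on an assumed synthetic dataset: after fitting, the model's decision function depends only on the stored support vectors, typically a small fraction of the training set:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Illustrative dataset (assumed for this sketch)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Prediction depends only on the stored support vectors
print(f"support vectors kept: {len(clf.support_)} of {len(X)} training points")
```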
Limitations/Shortcomings of SVM
SVM has limitations related to hyperparameter selection, computational complexity, memory requirements, handling of noisy data, interpretability, and the lack of a probabilistic interpretation, all of which should be considered when choosing this model.
Below are some major limitations:
- It requires careful selection of hyperparameters: performance can be sensitive to the choice of kernel type, regularization parameter, and kernel parameters, which may require extensive tuning (see the sketch after this list).
- SVM stores its support vectors in memory, which can be costly for datasets that yield many support vectors.
- SVM can be sensitive to noise in the dataset, which may result in overfitting.
- SVM does not naturally provide probabilities for class membership, making it less suitable for tasks where probability estimates are required.
- SVMs do not perform feature selection or handle irrelevant features well.
- SVM can be biased towards the majority class in imbalanced datasets.
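Several of these limitations have standard mitigations in scikit-learn. The sketch below, with an assumed imbalanced synthetic dataset and an arbitrary parameter grid, shows hyperparameter tuning via cross-validation, class weighting for imbalance, and the probability=True option for (Platt-calibrated) probability estimates:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Imbalanced synthetic dataset (assumed for this sketch)
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Tune kernel, C, and gamma with cross-validation; class_weight="balanced"
# reweights errors to counter the class imbalance
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
search = GridSearchCV(SVC(class_weight="balanced"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)

# probability=True adds calibrated class probabilities, at extra training cost
prob_clf = SVC(**search.best_params_, class_weight="balanced", probability=True).fit(X, y)
print("class probabilities for one sample:", prob_clf.predict_proba(X[:1]).round(3))
```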