Overview
The Two-Class Support Vector Machine module creates a model that is based on the support vector machine algorithm. The classifier that this module initializes is useful for predicting two possible outcomes that depend on continuous or categorical predictor variables.
This model is a supervised learning method and therefore requires a dataset that includes a label column. You train it by providing the untrained model and the tagged dataset as inputs to Train Model or Tune Model Hyperparameters. The trained model can then be used to predict values for new input examples.
Understanding SVMs
Support Vector Machines (SVMs) are supervised learning models that analyze data and recognize patterns. They can be used for classification and regression tasks.
Given a set of training examples labeled as belonging to one of two classes, the SVM algorithm assigns new examples into one category or the other. The examples are represented as points in space, and they are mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
The boundary that separates the two categories is called a hyperplane. The feature space that contains the training examples may have many dimensions.
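The rule for assigning a new example to a category, based on which side of the gap it falls on, can be sketched for the linear case in a few lines of Python. This is only an illustration: the function names, weights, and bias below are hypothetical and are not learned from any dataset or taken from the module.

```python
def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign tells which side of the hyperplane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def predict(weights, bias, point):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1


def geometric_margin(weights, bias, point, label):
    """Distance from the point to the hyperplane; positive when the point is classified correctly."""
    norm = sum(w * w for w in weights) ** 0.5
    return label * decision_value(weights, bias, point) / norm


# Illustrative values only: the hyperplane here is the line x1 = x2.
weights = [1.0, -1.0]
bias = 0.0
print(predict(weights, bias, [3.0, 1.0]))
print(predict(weights, bias, [1.0, 3.0]))
```

Training an SVM amounts to choosing the weights and bias so that the smallest margin over the training examples is as large as possible.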
SVM models have been used in many applications, from information retrieval to text and image classification. Although recent research has developed algorithms that have higher accuracy, this algorithm can work well on simple data sets when your goal is speed over accuracy.
Configuration of the Two-Class SVM Model
For this model type, it is recommended that you normalize the dataset before using it to train the classifier.
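As a rough illustration of what normalizing a dataset means here, the sketch below centers each feature column at its mean and scales it to one unit of standard deviation. The function name and the sample rows are hypothetical, and the choice of population standard deviation is an assumption rather than something the module documents.

```python
import math


def standardize_columns(rows):
    """Center each column at its mean and scale it to unit standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        variance = sum((v - mean) ** 2 for v in col) / n  # population variance (assumption)
        # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(variance) or 1.0)
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical two-feature dataset with very different scales per column.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
for row in standardize_columns(rows):
    print(row)
```

After this step both columns have the same scale, so neither feature dominates the distance to the separating boundary simply because of its units.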
- Add the Two-Class Support Vector Machine module to the experiment.
- Specify how the model is trained by setting the Create trainer mode option:
- Single Parameter. If you know how you want to configure the model, you can provide a specific set of values as arguments.
- Parameter Range. If you are not sure of the best parameters, you can find the optimal parameters by specifying multiple values and using the Tune Model Hyperparameters module to find the optimal configuration. The trainer will iterate over multiple combinations of the settings you provided and determine the combination of values that produces the best model.
- Number of iterations - set this value to control the trade-off between training speed and accuracy.
- Lambda - type a value to use as the weight for L1 regularization.
This regularization coefficient can be used to tune the model. Larger values penalize more complex models.
- Normalize features - if you select this option, data points are centered at the mean and scaled to have one unit of standard deviation before training.
- Project to the unit sphere - select this option to normalize coefficients.
- Random number seed - type an integer value to use as a seed if you want to ensure reproducibility across runs.
- Allow unknown category - select this option to create a group for unknown values in the training or validation sets. In this case, the model might be less precise for known values, but it can provide better predictions for new (unknown) values.
If you deselect it, the model can accept only the values that are contained in the training data.
- Train the Model
- If Create trainer mode is set to Single Parameter, connect a tagged dataset and the Train Model module.
- If Create trainer mode is set to Parameter Range, connect a tagged dataset and train the model by using Tune Model Hyperparameters.
- When the model is trained, right-click the output of the Train Model module (or Tune Model Hyperparameters module) and select Visualize to see a summary of the model's parameters, together with the feature weights learned from training.
- Pass the trained model to the Score Model module to make predictions. Alternatively, the untrained model can be passed to Cross-Validate Model for cross-validation against a labeled data set.
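To make the roles of the number of iterations, Lambda, and the random number seed more concrete, here is a minimal sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss with an L1 penalty. It is not the algorithm the module implements, which the text above does not specify. The function name `train_linear_svm`, the `learning_rate` parameter, the default values, and the toy data are all assumptions made for illustration.

```python
import random


def train_linear_svm(rows, labels, iterations=50, lam=0.001, learning_rate=0.1, seed=0):
    """Stochastic subgradient descent on hinge loss plus an L1 penalty.

    Labels must be +1 or -1. Returns (weights, bias).
    """
    rng = random.Random(seed)  # fixed seed: same shuffles, so the same model on every run
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(iterations):  # more passes over the data: slower, usually a closer fit
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            score = sum(w * v for w, v in zip(weights, x)) + bias
            violated = y * score < 1  # example is inside the margin or misclassified
            for j in range(len(weights)):
                sign = (weights[j] > 0) - (weights[j] < 0)
                grad = lam * sign  # subgradient of lam * |w_j|: larger lam shrinks weights harder
                if violated:
                    grad -= y * x[j]  # subgradient of the hinge loss
                weights[j] -= learning_rate * grad
            if violated:
                bias += learning_rate * y
    return weights, bias


# Hypothetical, linearly separable toy data.
rows = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
labels = [1, 1, -1, -1]
weights, bias = train_linear_svm(rows, labels, iterations=50, lam=0.001, seed=42)
print(weights, bias)
```

Running the function twice with the same seed gives identical weights, which is the kind of reproducibility the seed setting is meant to provide. A parameter sweep would call the same function over several `lam` and `iterations` values and keep the best-scoring combination.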
Example Experiment
- Dataset - Iris Dataset
- Model
- Configuration
- Score Model
- Evaluate Model
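The Iris dataset is well suited to this example: the Two-Class SVM model classifies its classes with an accuracy of 100%. In the formulas below, TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.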
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 1
Type 1 Error = FP / (FP + TN) = 0
Type 2 Error = FN / (FN + TP) = 0
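The three formulas above can be computed directly from the confusion-matrix counts. In the sketch below, the function name is hypothetical and the counts are placeholder values for a perfect classifier, not the actual confusion matrix from the experiment.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, Type 1 error rate, and Type 2 error rate from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Type 1 error: share of actual negatives that were predicted positive.
        "type_1_error": fp / (fp + tn) if (fp + tn) else 0.0,
        # Type 2 error: share of actual positives that were predicted negative.
        "type_2_error": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Placeholder counts with no misclassifications (illustrative only).
print(evaluation_metrics(tp=50, tn=50, fp=0, fn=0))
# prints {'accuracy': 1.0, 'type_1_error': 0.0, 'type_2_error': 0.0}
```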