SVM


Code, images, and data splits: Rishekesan3012/ML_WR

Support Vector Machines (SVMs) are powerful supervised learning models used for classification and regression. SVMs are particularly known for their ability to find the optimal linear separator (hyperplane) between classes in a dataset.

Why are SVMs linear separators?

At their core, SVMs attempt to find the hyperplane that best separates the classes by maximizing the margin between the closest points (support vectors) of each class. In two dimensions, this is simply a line; in higher dimensions, it's a plane or hyperplane.

The Role of Kernels

Many real-world datasets are not linearly separable. SVMs use kernel functions to implicitly map data into higher-dimensional spaces where a linear separator can be found. The kernel trick allows SVMs to compute the dot product in this high-dimensional space without explicitly transforming the data, making computations efficient.

Dot Product and Kernels

The dot product measures similarity between vectors. Kernels are, at their heart, functions that compute dot products in transformed feature spaces. This is what enables SVMs to find complex, nonlinear boundaries.
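To make the kernel trick concrete, the short sketch below (a standalone toy example in Python with NumPy, unrelated to our restaurant data) checks numerically that the degree-2 polynomial kernel K(x, z) = (x . z)^2 is just an ordinary dot product taken after an explicit feature map into a higher-dimensional space.

    import numpy as np

    def poly2_kernel(x, z):
        # Degree-2 polynomial kernel, evaluated directly in the original 2-D space
        return np.dot(x, z) ** 2

    def feature_map(x):
        # Explicit map into the 3-D space where the kernel becomes a plain dot product
        x1, x2 = x
        return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, -1.0])

    print(poly2_kernel(x, z))               # 1.0, computed without leaving 2-D
    print(feature_map(x) @ feature_map(z))  # 1.0, the same value via the explicit 3-D map

In practice the SVM only ever evaluates the kernel on the left; the mapping on the right is never computed explicitly, which is what keeps the trick cheap even when the implicit space is very large.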


Data Prep

Supervised modeling requires labeled data. Only data with known class labels can be used to train and evaluate SVMs.

Data Loading: 

We began by loading the dataset from a CSV file containing information about restaurants, their locations, cuisines, weather, and whether delivery was available.
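A minimal loading sketch is shown below; the file name restaurants.csv is a placeholder, and the actual CSV and notebook are in the linked repository.

    import pandas as pd

    # Placeholder file name -- the real CSV lives in the linked repository
    df = pd.read_csv("restaurants.csv")
    print(df.shape)
    print(df.head())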

Encoding Categorical Variables:

Since machine learning models require numeric input, we converted categorical columns (restaurant_name, city, cuisine) into numeric values using label encoding. This process assigns a unique integer to each category in a column.
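A sketch of this step, continuing with the DataFrame df from the loading sketch above and using scikit-learn's LabelEncoder:

    from sklearn.preprocessing import LabelEncoder

    # Assign a unique integer to every category in each text column
    for col in ["restaurant_name", "city", "cuisine"]:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))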

Imputing Missing Values:

To ensure that our models could handle incomplete data, we used mean imputation for any missing values in the feature columns. This means that any missing value was replaced with the average value of that column.
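Continuing the sketch, mean imputation can be written directly with pandas (the project code may use a different helper, such as scikit-learn's SimpleImputer, but the effect is the same):

    # Replace each missing value with the mean of its column
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())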

Feature and Target Selection:

We defined our features (X) as all columns except the target (has_delivery) and the identifier (restaurant_id). The target variable (y) was set as the has_delivery column, indicating whether a restaurant offers delivery.
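In the sketch this corresponds to:

    # Features: everything except the identifier and the target; target: has_delivery
    X = df.drop(columns=["has_delivery", "restaurant_id"])
    y = df["has_delivery"]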


Cleaning the Data:

After encoding and imputation, we combined the features and target into a single DataFrame and dropped any remaining rows with missing values to ensure data integrity.
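Continuing the sketch, that cleanup step is roughly:

    # Recombine, drop any remaining incomplete rows, then separate again
    cleaned = pd.concat([X, y], axis=1).dropna()
    X = cleaned.drop(columns=["has_delivery"])
    y = cleaned["has_delivery"]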

Splitting into Training and Testing Sets:

We split our cleaned data into a training set (80%) and a testing set (20%) using stratified sampling. Stratification ensures that the proportion of classes (delivery/no delivery) is preserved in both sets, which is important for fair evaluation.
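With scikit-learn, the stratified 80/20 split looks roughly like this (the random_state value is an arbitrary choice for illustration):

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )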

Balancing the Training Set (for Classification):

Since our dataset was imbalanced (more restaurants without delivery than with delivery), we applied SMOTE (Synthetic Minority Over-sampling Technique) to the training set. SMOTE generates synthetic examples of the minority class, resulting in a balanced training set and improving the model’s ability to learn from both classes.
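Using the imbalanced-learn package, this step is roughly (again with an illustrative random_state):

    from imblearn.over_sampling import SMOTE

    # Oversample only the training data; the test set keeps its original class balance
    X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)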


Snapshots of the cleaned data, the training set, and the test set are available in the linked repository.



Results:

Kernel and Cost Performance




Linear Kernel:
The linear kernel consistently achieved the highest accuracy among all tested configurations, with a peak accuracy of 0.932 at C=10. This indicates that the relationship between the features and the target variable in our dataset is largely linear, allowing a single hyperplane to separate the classes effectively. The model's performance was stable across the different cost values (C=0.1, 1, 10), suggesting that the data is well suited to linear separation and not overly sensitive to the regularization parameter.


Polynomial and RBF Kernels:


Both the polynomial and RBF kernels produced lower, identical accuracies (0.883) across all tested cost values. This suggests that introducing nonlinearity through these kernels provided no advantage for our data: the extra flexibility was unnecessary for a relationship that is essentially linear, and the added complexity may simply have produced a slightly worse decision boundary.
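The kernel-and-cost comparison described above can be reproduced with a simple grid like the sketch below. This is an illustrative outline that reuses the names from the preparation sketches (X_train_bal, y_train_bal, X_test, y_test); the exact code and plots are in the linked repository.

    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    # Train one SVM per (kernel, cost) combination on the balanced training set
    for kernel in ["linear", "poly", "rbf"]:
        for C in [0.1, 1, 10]:
            model = SVC(kernel=kernel, C=C)
            model.fit(X_train_bal, y_train_bal)
            acc = accuracy_score(y_test, model.predict(X_test))
            print(f"kernel={kernel:6s} C={C:<4} test accuracy={acc:.3f}")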