Naïve Bayes -- Link to the code: Rishekesan3012/ML_WR
Overview of Naïve Bayes
Naïve Bayes is a family of probabilistic classifiers based on Bayes' Theorem, which calculates the probability of a class label given certain feature values. Despite its simplicity, Naïve Bayes has proven to be highly effective in many practical applications, particularly in text classification and binary classification problems.
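For reference, the theorem as applied to classification looks like the following, where y is a class label and x_1, …, x_n are the feature values; the "naïve" part is the assumption that the features are conditionally independent given the class:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y)\prod_{i=1}^{n} P(x_i \mid y)
```

The classifier then predicts the class y that maximizes this product.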
There are four primary types of Naïve Bayes classifiers:
- Multinomial Naïve Bayes (MNB): This variant is suited to discrete count data, such as word counts in text classification.
- Gaussian Naïve Bayes (GNB): Suitable for continuous data that is assumed to follow a Gaussian (normal) distribution.
- Bernoulli Naïve Bayes (BNB): Designed for binary features (e.g., yes/no or true/false).
- Categorical Naïve Bayes (CNB): Best for categorical features with more than two categories.
Each variant of Naïve Bayes has its own strengths and is suitable for different types of data. For this project, we’ll work with the Multinomial, Bernoulli, and Gaussian versions.
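All four variants are available in scikit-learn (assuming that is the library used here, which the code repository suggests), under matching class names:

```python
# Each Naïve Bayes variant maps to a scikit-learn class
from sklearn.naive_bayes import (
    MultinomialNB,   # discrete count features
    GaussianNB,      # continuous, normally distributed features
    BernoulliNB,     # binary features
    CategoricalNB,   # multi-category features
)
```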
Data Preparation for Naïve Bayes
Before we apply the Naïve Bayes classifiers, it’s important to properly preprocess the data. Here are the steps we followed for data preparation:
- Handling Missing Values: The dataset contained missing values in the has_delivery and has_takeaway columns. Since most restaurants did not offer delivery or takeaway, we imputed these missing values with zeros.
- Categorical Encoding: The cuisine column was categorical and needed to be converted into numerical values for model processing. We used simple integer encoding to assign a unique integer to each cuisine type.
- Feature Selection: We selected the following features for model training: cuisine_encoded (integer encoding of the cuisine type), temperature_F (temperature in Fahrenheit), precipitation (rainfall data), humidity (humidity percentage), and wind_speed (wind speed in mph).
- Train-Test Split: We split the data into training (75%) and testing (25%) sets, ensuring that the splits were stratified to maintain the distribution of the target variable (has_delivery). A code sketch of these steps follows this list.
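A minimal pandas/scikit-learn sketch of the preparation steps above. The DataFrame name df, the file name, and the random seed are assumptions on our part; only the column names come from the write-up:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file name; the actual dataset lives in the linked repository
df = pd.read_csv("restaurants_weather.csv")

# Impute missing delivery/takeaway flags with 0 (most restaurants offer neither)
df[["has_delivery", "has_takeaway"]] = df[["has_delivery", "has_takeaway"]].fillna(0)

# Simple integer encoding of the cuisine column
df["cuisine_encoded"] = df["cuisine"].astype("category").cat.codes

features = ["cuisine_encoded", "temperature_F", "precipitation", "humidity", "wind_speed"]
X, y = df[features], df["has_delivery"]

# 75/25 split, stratified on the target to preserve class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```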
Initial Model Training (Normal)
We initially trained three Naïve Bayes models (a training sketch follows this list):
- Multinomial Naïve Bayes (MNB)
- Bernoulli Naïve Bayes (BNB)
- Gaussian Naïve Bayes (GNB)
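Continuing from the split sketched earlier, the three models can be trained and then scored with accuracy and confusion matrices, as reported below. Note that MultinomialNB requires non-negative feature values, so this assumes the selected features satisfy that constraint:

```python
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

models = {
    "Multinomial NB": MultinomialNB(),  # requires non-negative features
    "Bernoulli NB": BernoulliNB(),
    "Gaussian NB": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, preds))
    print(confusion_matrix(y_test, preds))
```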
Results (Normal)
After training the models, we evaluated their performance based on accuracy and confusion matrices:
- Multinomial Naïve Bayes Accuracy: 0.5578
- Bernoulli Naïve Bayes Accuracy: 0.8826
- Gaussian Naïve Bayes Accuracy: 0.8814
The results show that the Bernoulli and Gaussian models performed far better than the Multinomial model. This is unsurprising: Multinomial Naïve Bayes assumes count-like features, whereas our features are mostly continuous (a better fit for the Gaussian variant) or effectively binary (a better fit for the Bernoulli variant). The high accuracies should also be read in light of the class imbalance discussed next, since a model can score well here simply by favoring the majority class.
Handling Class Imbalance (Minority Class Focus)
Upon inspecting the target variable, we noticed that there was a significant imbalance between the classes (the majority class being 0, indicating no delivery, and the minority class being 1, indicating delivery). To address this, we used upsampling for the minority class (has_delivery = 1). This technique involves replicating samples from the minority class to create a more balanced dataset.
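One common way to implement this, sketched with scikit-learn's resample. Applying the upsampling only to the training split (so that duplicated rows do not leak into the test set) is our assumption, as the write-up does not specify:

```python
import pandas as pd
from sklearn.utils import resample

train = pd.concat([X_train, y_train], axis=1)
majority = train[train["has_delivery"] == 0]
minority = train[train["has_delivery"] == 1]

# Replicate minority-class rows (sampling with replacement) up to the majority count
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced = pd.concat([majority, minority_up])

X_bal = balanced.drop(columns="has_delivery")
y_bal = balanced["has_delivery"]
```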
Tuned Model Training
After balancing the dataset, we retrained the Naïve Bayes models using the upsampled data. We also tuned the hyperparameters, such as the alpha parameter for smoothing in the Naïve Bayes models.
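The write-up does not say exactly how the tuning was done; a grid search over alpha is one plausible approach, sketched below for the Bernoulli variant. (GaussianNB has no alpha parameter; its analogous smoothing knob is var_smoothing.)

```python
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

# Search over smoothing strengths on the balanced training data
grid = GridSearchCV(
    BernoulliNB(),
    param_grid={"alpha": [0.01, 0.1, 0.5, 1.0, 2.0]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_bal, y_bal)
print("Best alpha:", grid.best_params_["alpha"])
```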
Results (Balanced & Tuned)
The tuned models performed as follows:
- Multinomial Naïve Bayes (Balanced) Accuracy: 0.5706
- Bernoulli Naïve Bayes (Balanced) Accuracy: 0.5903
- Gaussian Naïve Bayes (Balanced) Accuracy: 0.6271
Raw accuracy for the Bernoulli and Gaussian models dropped relative to the imbalanced setting, but these figures are more meaningful: the models can no longer score well by simply favoring the majority class. Among the balanced models, Gaussian Naïve Bayes achieved the highest accuracy.
Conclusions
Balancing the dataset and tuning the models yielded improvements in model performance, with the Decision Tree showing the most significant gains in accuracy, precision, and recall. Gaussian Naïve Bayes (Balanced) also performed strongly, especially in recall, making it a good choice for detecting the minority class (restaurants offering delivery). Multinomial Naïve Bayes, on the other hand, performed poorly under both the normal and balanced conditions, and its utility for this specific task appears limited.