Ensemble Methods
Code, images, and data splits: Rishekesan3012/ML_WR
Ensemble learning combines multiple machine learning models to improve predictive performance, increase robustness, and reduce overfitting. Common ensemble methods include:
Bagging (Bootstrap Aggregating):
Trains multiple models on different random subsets of the data and averages their predictions to reduce variance.
Boosting:
Builds models sequentially, where each new model focuses on correcting the errors of the previous ones. This can reduce both bias and variance.
Random Forest:
A type of bagging that uses many decision trees and averages their predictions. It also adds randomness in feature selection for each split.
AdaBoost:
A boosting method that adjusts the weights of incorrectly classified instances so that subsequent models focus more on difficult cases.
Voting:
Combines the predictions of multiple different models (e.g., SVM, Random Forest, Logistic Regression) by majority or average vote.
Stacking:
Trains multiple models and then uses another model (meta-learner) to combine their outputs.
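As a quick illustration of the voting approach described above, scikit-learn's VotingClassifier can combine heterogeneous models such as SVM, Random Forest, and Logistic Regression by majority vote. This is a minimal sketch on synthetic data, not the project's actual pipeline; model choices and parameters here are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy binary classification problem (stand-in for real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hard voting: each fitted model casts one vote and the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(f"Voting accuracy: {ensemble.score(X_test, y_test):.3f}")
```

Soft voting (`voting="soft"`) averages predicted probabilities instead, which can help when the base models are well calibrated.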
Methods Used in Our Analysis
1. Random Forest
Random Forest is an ensemble of decision trees, each trained on a random subset of the data and features. The final prediction is made by averaging (regression) or majority vote (classification).
Strengths: Handles high-dimensional data, reduces overfitting, provides feature importance.
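A minimal Random Forest sketch on synthetic data, showing the `feature_importances_` attribute that the feature-importance analysis below relies on (dataset and hyperparameters here are illustrative, not the project's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the delivery dataset.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)

rf = RandomForestClassifier(n_estimators=200, random_state=1)
rf.fit(X_train, y_train)
print("accuracy:", round(rf.score(X_test, y_test), 3))

# Impurity-based importances (sum to 1.0), used to rank predictors.
for i in np.argsort(rf.feature_importances_)[::-1][:3]:
    print(f"feature {i}: {rf.feature_importances_[i]:.3f}")
```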
2. AdaBoost
AdaBoost builds a sequence of weak learners (usually shallow trees), each focusing more on the errors of the previous ones.
Strengths: Often improves performance on difficult datasets, robust to overfitting if not too many estimators are used.
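The sequential re-weighting idea can be sketched with scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree (a stump). Again a hedged illustration on synthetic data, not the project's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=2
)

# Each boosting round up-weights the samples the previous stumps
# misclassified, so later learners concentrate on the hard cases.
ada = AdaBoostClassifier(n_estimators=100, random_state=2)
ada.fit(X_train, y_train)
print("accuracy:", round(ada.score(X_test, y_test), 3))
```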
Data Preparation
Data Source and Cleaning
Dataset: master_delivery_data.csv containing restaurant and delivery information.
Sampling: 10,000 rows randomly sampled for analysis.
Feature Engineering:
- Categorical features (restaurant_name, city, cuisine) encoded using LabelEncoder.
- Missing values in numeric features imputed using SimpleImputer (mean strategy).
- Target Variable: has_delivery (whether a restaurant offers delivery).
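The encoding and imputation steps above can be sketched as follows. The tiny DataFrame is a hypothetical stand-in; the real features come from master_delivery_data.csv:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Illustrative rows only; column names mirror the report's features.
df = pd.DataFrame({
    "restaurant_name": ["A", "B", "C", "A"],
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "cuisine": ["indian", "cafe", "indian", "pizza"],
    "rating": [4.1, np.nan, 3.8, 4.5],
    "has_delivery": [1, 0, 0, 1],
})

# Label-encode each categorical column in place.
for col in ["restaurant_name", "city", "cuisine"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Mean-impute missing numeric values.
df[["rating"]] = SimpleImputer(strategy="mean").fit_transform(df[["rating"]])

X = df.drop(columns="has_delivery")
y = df["has_delivery"]
print(df)
```

One caveat worth noting: LabelEncoder imposes an arbitrary ordering on categories, which tree-based models like Random Forest tolerate better than linear models do.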
Splitting and Balancing
Train/Test Split:
Data split into 80% training and 20% testing using train_test_split with stratification to preserve class distribution.
Class Imbalance:
Only ~11% of restaurants offer delivery.
SMOTE (Synthetic Minority Over-sampling Technique) was applied to the training set to balance classes.
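The project used SMOTE from the imbalanced-learn library; the sketch below reproduces its core idea (synthesizing minority points by interpolating between a minority sample and one of its nearest minority neighbors) with scikit-learn only, on synthetic data with roughly the same ~10% positive rate. The `smote_like` helper is a simplified illustration, not the library implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

# Imbalanced toy data, mirroring the ~11% delivery rate.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=3
)

def smote_like(X, y, minority=1, k=5, rng=np.random.default_rng(3)):
    """Minimal SMOTE: interpolate between a minority sample and one
    of its k nearest minority-class neighbors until classes balance."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)
    nbr = idx[base, rng.integers(1, k + 1, n_new)]  # column 0 is self
    gap = rng.random((n_new, 1))
    X_syn = X_min[base] + gap * (X_min[nbr] - X_min[base])
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority)])

X_bal, y_bal = smote_like(X_train, y_train)
print("class counts after balancing:", np.bincount(y_bal))
```

Balancing is applied to the training set only, as in the report; the test set keeps its natural class distribution so evaluation reflects real-world conditions.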
Random Forest Results:
Feature Importance:
The most important feature was has_takeaway, followed by longitude, latitude, restaurant_name, and cuisine.
Performance:
Random Forest achieved high accuracy and strong recall for both classes, especially after SMOTE balancing. The model is particularly strong at identifying restaurants that do not offer delivery, and it also performs well on the minority class, correctly identifying most delivery-offering restaurants.
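Per-class precision and recall figures like those above come from scikit-learn's classification report. The snippet below shows the evaluation pattern on a synthetic imbalanced set; the printed numbers are illustrative and are not the results reported here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (~10% positives) as a stand-in.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=4
)

rf = RandomForestClassifier(n_estimators=200, random_state=4).fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Per-class precision/recall/F1; the minority class (1) is the one to watch.
print(classification_report(y_test, y_pred, digits=3))
print("minority recall:", round(recall_score(y_test, y_pred, pos_label=1), 3))
```

On imbalanced data, overall accuracy alone is misleading (a model predicting "no delivery" everywhere would score ~89%), which is why the report emphasizes minority-class recall.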
AdaBoost Results:
Performance:
AdaBoost achieved a recall of 81% for the minority class, correctly identifying most restaurants offering delivery. Precision was 69%, and overall accuracy was 93.3%. AdaBoost produced more false positives than Random Forest but slightly higher recall for delivery restaurants.
Comparison
Random Forest had higher overall accuracy and precision for the minority class, while AdaBoost achieved slightly higher recall for delivery restaurants.
Both ensemble methods significantly outperform single classifiers, especially for the minority class (has_delivery = 1.0).
Feature importance analysis revealed that has_takeaway, location, and cuisine are the most influential predictors of delivery availability.
(e) Conclusions
Ensemble methods such as Random Forest and AdaBoost, especially when combined with SMOTE for class balancing, significantly improve model performance on imbalanced datasets.
Both models were able to detect restaurants offering delivery with much higher recall and precision than single classifiers, making them valuable for real-world applications.
The most important features for predicting delivery were has_takeaway, location, and cuisine, suggesting that both restaurant characteristics and geographic factors play a key role.
Practical Implication:
These results can help food delivery platforms or restaurant aggregators better identify and target restaurants likely to offer delivery, improving operational planning and user experience. The ensemble approach is robust, generalizes well, and is recommended for similar predictive tasks in the food service domain.