Decision Tree


In our project, Decision Trees are used to predict whether a restaurant offers delivery services based on various features, such as cuisine type, temperature, precipitation, humidity, and wind speed. The goal is to leverage the interpretability and flexibility of Decision Trees to create a model that can efficiently classify restaurants into two categories: those that offer delivery and those that do not.

Decision Trees were particularly useful in this context due to their ability to handle both categorical and continuous variables. They also provide clear decision rules, which makes them easy to interpret and understand. By training the Decision Tree on the dataset and applying techniques such as upsampling to address the class imbalance, we ensured that the model was exposed equally to both classes (delivery and no delivery).

After balancing the data and tuning the model's parameters (such as tree depth and minimum samples per split), the Decision Tree model improved in both accuracy and recall, especially for the minority class (restaurants offering delivery). This makes Decision Trees well suited to this classification task: they provide good predictive performance while also letting us visualize how decisions are made from the input features, offering valuable insight into the factors that influence a restaurant's decision to offer delivery services.








Decision Trees use various criteria for splitting the data. The most commonly used are listed below (a small worked sketch follows the list):

  • Gini impurity: the probability that a randomly chosen element would be incorrectly classified if it were labeled according to the class distribution at the node. A lower Gini value indicates a better split.
  • Entropy: a measure of the disorder or randomness at a node. A split that reduces entropy results in a more ordered decision-making process.
  • Information gain: the reduction in entropy after a split, which helps in choosing the best feature to split on.
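To make these measures concrete, here is a small, self-contained sketch (not taken from the project notebook) that computes Gini impurity, entropy, and information gain for a toy label array; the split shown is purely illustrative.

```python
import numpy as np

def gini(labels):
    """Gini impurity: chance a random sample is mislabeled if labeled by the node's class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Toy example: 8 restaurants, 1 = delivery, 0 = no delivery
parent = np.array([0, 0, 0, 0, 0, 1, 1, 1])
left, right = parent[:5], parent[5:]   # a hypothetical split that separates the classes perfectly
print(gini(parent), entropy(parent), information_gain(parent, left, right))
```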

Data Preparation for Decision Trees

Just like in Naïve Bayes, the data was preprocessed by handling missing values, encoding categorical features, and splitting the dataset into training and testing sets. For Decision Trees, we used the same features (cuisine_encoded, temperature_F, precipitation, humidity, wind_speed) and the target variable (has_delivery).
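A minimal sketch of this preparation step is shown below; the file name restaurants_weather.csv, the raw cuisine column, and the median imputation are assumptions for illustration, and the project notebook may handle these details differently.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Assumed input: one row per restaurant/weather record as described above
df = pd.read_csv("restaurants_weather.csv")   # hypothetical file name

# Encode the categorical cuisine column into integer codes
df["cuisine_encoded"] = LabelEncoder().fit_transform(df["cuisine"].astype(str))

features = ["cuisine_encoded", "temperature_F", "precipitation", "humidity", "wind_speed"]
X = df[features].fillna(df[features].median())   # simple per-column imputation for missing values
y = df["has_delivery"]

# Hold out a test set; stratify so the class ratio is comparable in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```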

The initial dataset was imbalanced, with the majority of restaurants not offering delivery. This imbalance could lead to biased predictions, so the dataset was balanced by upsampling the minority class (has_delivery = 1). This ensured that the model was equally trained on both classes, which is critical for good model performance.


Code: ML_WR/3_module.ipynb in the Rishekesan3012/ML_WR repository



Initial Decision Tree Training (Normal)

For the initial Decision Tree model, we used the Entropy criterion for splitting the data. The tree was trained on the imbalanced dataset and tested on the testing set.
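A sketch of this baseline training step, assuming the X_train/X_test split from the data-preparation step above and scikit-learn's DecisionTreeClassifier:

```python
from sklearn.tree import DecisionTreeClassifier

# Baseline tree on the original (imbalanced) training split; split quality measured by entropy
dt_entropy = DecisionTreeClassifier(criterion="entropy", random_state=42)
dt_entropy.fit(X_train, y_train)

y_pred = dt_entropy.predict(X_test)
print("Test accuracy:", dt_entropy.score(X_test, y_test))
```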






Train data

Test data



Decision Tree Results (Normal)


Confusion Matrix (Entropy Criterion)

The confusion matrix for the Decision Tree model trained with the Entropy criterion shows the model's performance when predicting the target variable has_delivery. The matrix reveals that the model predicted the majority class (no delivery) almost exclusively: 234,689 no-delivery instances were correctly classified as 0, while 31,216 delivery instances were misclassified as 0, and essentially no instances were predicted as 1. So although the model achieved a high accuracy score of 0.8826, it was biased toward predicting the majority class and failed to capture the minority class (restaurants offering delivery).
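A sketch of how such a confusion matrix and the accompanying per-class metrics can be produced with scikit-learn, assuming the y_pred predictions from the baseline model above:

```python
from sklearn.metrics import confusion_matrix, classification_report

# Rows are the true classes, columns the predicted classes (0 = no delivery, 1 = delivery)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1 make the minority-class failure visible
print(classification_report(y_test, y_pred, digits=4))
```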


Decision Tree Visualization (Entropy Criterion)

The visualization of the Decision Tree trained with the Entropy criterion shows a tree with several decision nodes based on various features, such as cuisine_encoded and temperature_F. The tree is relatively shallow with clear decisions at each level based on the values of these features. For example, the root node is based on cuisine_encoded, and subsequent splits are made using other features like humidity and temperature.

Although the tree structure is easy to interpret, the model's inability to correctly classify the minority class (delivery) shows that the class imbalance in the data biased the tree toward the majority class.
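A sketch of how the tree can be rendered with scikit-learn's plot_tree; the depth cutoff and figure size below are arbitrary choices for readability, not values taken from the project:

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Show only the top levels so the decision rules stay readable
plt.figure(figsize=(16, 8))
plot_tree(
    dt_entropy,
    max_depth=3,
    feature_names=features,
    class_names=["no delivery", "delivery"],
    filled=True,
    fontsize=8,
)
plt.show()
```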




Decision Tree Visualization (Gini Criterion)

The Decision Tree trained using the Gini criterion showed a deeper tree, with splits made on features like temperature_F, cuisine_encoded, and humidity. The Gini index values for each split indicate that the tree was trying to find the best possible separation of the data, but similar to the entropy model, it still heavily favored predicting the majority class.
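For comparison, a sketch of training the same model with the Gini criterion and inspecting its depth and feature importances (again assuming the variables defined in the earlier sketches):

```python
# Same pipeline, but with the Gini criterion; compare depth and which features drive the splits
dt_gini = DecisionTreeClassifier(criterion="gini", random_state=42)
dt_gini.fit(X_train, y_train)

print("Entropy tree depth:", dt_entropy.get_depth(), "| Gini tree depth:", dt_gini.get_depth())
for name, importance in zip(features, dt_gini.feature_importances_):
    print(f"{name}: {importance:.3f}")
```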

Despite having more splits and being trained with a different criterion, the tree still faced the same class-imbalance challenge, making it less effective at predicting the minority class.









Handling Class Imbalance (Minority Class Focus)

To address the class imbalance, we applied upsampling to the minority class (has_delivery = 1) so that the model would be equally exposed to both classes during training. This helps the model better generalize and accurately predict the minority class.
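A minimal sketch of this upsampling step using sklearn.utils.resample, assuming the X_train/y_train variables from earlier; resampling is applied only to the training split so the test set stays untouched:

```python
import pandas as pd
from sklearn.utils import resample

train = pd.concat([X_train, y_train], axis=1)
majority = train[train["has_delivery"] == 0]
minority = train[train["has_delivery"] == 1]

# Sample the minority class with replacement until both classes are the same size
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])

X_train_bal = balanced[features]
y_train_bal = balanced["has_delivery"]
```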

Tuning the Decision Tree

After balancing the dataset, we fine-tuned the Decision Tree by adjusting key hyperparameters such as the maximum depth of the tree and the minimum samples per split. This tuning helps prevent overfitting by limiting the complexity of the tree.
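One way to perform this tuning is a small grid search; the parameter grid and recall scoring below are illustrative choices, not the exact values used in the project:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "max_depth": [5, 10, 15, None],
    "min_samples_split": [2, 10, 50],
}

# Scoring on recall pushes the search toward models that actually find the delivery class
search = GridSearchCV(
    DecisionTreeClassifier(criterion="entropy", random_state=42),
    param_grid,
    scoring="recall",
    cv=5,
)
search.fit(X_train_bal, y_train_bal)
print(search.best_params_, search.best_score_)
```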


Results and Conclusion



In the initial, untuned Decision Tree model (using the Entropy and Gini criteria), we observed that the model was heavily biased toward predicting the majority class (no delivery). The confusion matrix indicated that no instances of the minority class (restaurants offering delivery) were predicted, leading to low recall for the minority class. This demonstrates the importance of class balancing and hyperparameter tuning to address the issues arising from imbalanced datasets.

However, after balancing the dataset and tuning the Decision Tree model, its ability to predict both classes improved significantly, with better recall for the minority class (restaurants offering delivery).