Explanation of the Implementation Steps
Dataset Loading and Preparation:
- The dataset (assumed to be "Heart.csv") is loaded using pandas.
- Categorical columns are one-hot encoded and rows with missing values are dropped.
- Features are separated from the target variable ("AHD").
- Data is split into training (70%) and testing (30%) sets.
Bagging Classifier:
- A BaggingClassifier is created using a DecisionTreeClassifier as its base estimator.
- The model is trained and evaluated on the test set.
- Accuracy and a detailed classification report are printed.
Individual Classifiers:
- Logistic Regression, SVC, and Random Forest models are trained individually.
- For SVC, probability estimates are enabled (probability=True) so that it can later be used in soft voting.
- Each model is evaluated separately, with its accuracy score printed.
Voting Classifier Ensemble:
- A VotingClassifier is built by combining the three individual classifiers.
- Soft voting is used, meaning the ensemble predicts the class with the highest average predicted probability. For example, if the three models assign probabilities of 0.6, 0.4, and 0.7 to "Yes", the average is 0.57 and the ensemble predicts "Yes".
- The ensemble is evaluated on the test set, and its performance is reported with accuracy and a classification report.
Model Comparison:
- The accuracies of the Bagging Classifier, the individual classifiers, and the Voting Classifier are printed side by side for easy comparison.
This example highlights how ensemble methods like Bagging and Voting can improve performance over individual classifiers. The complete script is shown below.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
# ------------------------------
# 1. Load and Prepare the Dataset
# ------------------------------
# Assuming the CSV file "Heart.csv" is in the current working directory.
data = pd.read_csv('Heart.csv')
# The dataset contains clinical features and an 'AHD' column indicating a heart disease diagnosis.
# Identify categorical columns
categorical_cols = ['Sex', 'ChestPain', 'Fbs', 'RestECG', 'ExAng', 'Slope', 'Thal']
# Convert categorical columns to dummy variables using one-hot encoding
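# drop_first=True keeps k-1 dummies per category to avoid perfectly
# collinear (redundant) indicator columns.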
data = pd.get_dummies(data, columns=categorical_cols, drop_first=True)
# Drop any rows that still contain missing values.
data.dropna(inplace=True)
X = data.drop('AHD', axis=1)
y = data['AHD']
# Split the data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
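# Note: passing stratify=y would preserve the class balance in both splits;
# it is omitted here to match the original example.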
# ------------------------------
# 2. Experiment with Bagging Classifier
# ------------------------------
# Use DecisionTreeClassifier as the base estimator for bagging.
bagging_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named 'base_estimator' before scikit-learn 1.2
    n_estimators=50,
    random_state=42
)
bagging_clf.fit(X_train, y_train)
y_pred_bagging = bagging_clf.predict(X_test)
bagging_acc = accuracy_score(y_test, y_pred_bagging)
print("Bagging Classifier Accuracy: {:.2f}%".format(bagging_acc * 100))
print("Bagging Classifier Report:")
print(classification_report(y_test, y_pred_bagging))
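# Optional sketch (not in the original walkthrough): bagging can also
# estimate generalization accuracy from the out-of-bag (OOB) samples that
# each bootstrap draw leaves unused, without touching the test set.
bagging_oob = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    oob_score=True,
    random_state=42
)
bagging_oob.fit(X_train, y_train)
print("Bagging OOB Score: {:.2f}%".format(bagging_oob.oob_score_ * 100))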
# ------------------------------
# 3. Train Individual Classifiers
# ------------------------------
# Logistic Regression
log_reg = LogisticRegression(max_iter=1000, random_state=42)
log_reg.fit(X_train, y_train)
y_pred_lr = log_reg.predict(X_test)
lr_acc = accuracy_score(y_test, y_pred_lr)
print("Logistic Regression Accuracy: {:.2f}%".format(lr_acc * 100))
# Support Vector Machine (SVC)
# Enable probability estimates for soft voting.
svc = SVC(probability=True, random_state=42)
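# Note: SVC is sensitive to feature scaling; wrapping it in a Pipeline with
# StandardScaler would usually help. Scaling is omitted here to match the
# original example.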
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
svc_acc = accuracy_score(y_test, y_pred_svc)
print("SVC Accuracy: {:.2f}%".format(svc_acc * 100))
# Random Forest
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
rf_acc = accuracy_score(y_test, y_pred_rf)
print("Random Forest Accuracy: {:.2f}%".format(rf_acc * 100))
# ------------------------------
# 4. Train a Voting Classifier Ensemble
# ------------------------------
# Combine Logistic Regression, SVC, and Random Forest with soft voting.
voting_clf = VotingClassifier(
    estimators=[('lr', log_reg), ('svc', svc), ('rf', rf)],
    voting='soft'
)
voting_clf.fit(X_train, y_train)
y_pred_voting = voting_clf.predict(X_test)
voting_acc = accuracy_score(y_test, y_pred_voting)
print("Voting Classifier Accuracy: {:.2f}%".format(voting_acc * 100))
print("Voting Classifier Report:")
print(classification_report(y_test, y_pred_voting))
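# To see what soft voting does under the hood: with voting='soft' the
# ensemble averages predict_proba across the three estimators and predicts
# the class with the highest mean probability.
avg_proba = voting_clf.predict_proba(X_test[:3])
print("Averaged class probabilities for the first 3 test rows:")
print(np.round(avg_proba, 3))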
# ------------------------------
# 5. Compare Ensemble Models with Individual Classifiers
# ------------------------------
print("\n--- Model Performance Comparison ---")
print("Bagging Classifier Accuracy: {:.2f}%".format(bagging_acc * 100))
print("Logistic Regression Accuracy: {:.2f}%".format(lr_acc * 100))
print("SVC Accuracy: {:.2f}%".format(svc_acc * 100))
print("Random Forest Accuracy: {:.2f}%".format(rf_acc * 100))
print("Voting Classifier Accuracy: {:.2f}%".format(voting_acc * 100))