Monday, 24 February 2025

Implementing and Comparing Gradient Boosting, XGBoost, and Random Forest

Explanation of Implementation Steps

  1. Data Loading and Splitting:

    • The California Housing dataset is loaded using fetch_california_housing(), providing eight numeric features and the target values (median house value per block group, expressed in units of $100,000).
    • The dataset is split into a training set (70%) and a testing set (30%) for evaluation.
  2. Experiment 1: Gradient Boosting:

    • A GradientBoostingRegressor is instantiated and trained on the training data.
    • The model’s performance is evaluated on the test set using Mean Squared Error (MSE) and the R² score.
  3. Experiment 2: XGBoost:

    • An XGBoost model is created with XGBRegressor. The objective is set to 'reg:squarederror' for regression.
    • After training, its performance is measured on the test data using the same metrics (MSE and R² score).
    • XGBoost is a regularized, parallelized implementation of gradient boosting and is widely used for its training speed and accuracy, especially on large tabular datasets.
  4. Experiment 3: Model Comparison:

    • A RandomForestRegressor is trained on the same dataset.
    • All three models (Gradient Boosting, XGBoost, and Random Forest) are compared by printing their MSE and R² score side by side.
    • A scatter plot visualizes predicted versus actual house prices for a direct comparison of model performance.
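
The two metrics used in every experiment can be written out by hand. The following is a minimal sketch rather than part of the script itself: the helper names mse and r2 and the four-point sample are assumptions made for illustration. It uses the standard definitions, MSE as the mean of squared residuals and R² as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """R² score: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Tiny illustrative sample (made-up numbers, not from the housing data)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

print("MSE: {:.4f}".format(mse(y_true, y_pred)))  # MSE: 0.1250
print("R²: {:.4f}".format(r2(y_true, y_pred)))    # R²: 0.9000
```

A lower MSE is better, while R² is at most 1 and reaches 1 only for perfect predictions. A model that always predicts the mean of the test targets scores R² = 0.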

This workflow shows how three tree-ensemble methods (Gradient Boosting, XGBoost, and Random Forest) compare on a house price prediction task, both in test-set error (MSE and R²) and in how closely their predictions track the actual values.

 

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
import xgboost as xgb

# ------------------------------
# Data Loading and Splitting
# ------------------------------
# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# ------------------------------
# Experiment 1: Gradient Boosting
# ------------------------------
gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)
y_pred_gbr = gbr.predict(X_test)
mse_gbr = mean_squared_error(y_test, y_pred_gbr)
r2_gbr = r2_score(y_test, y_pred_gbr)
print("Gradient Boosting Regressor:")
print("MSE: {:.4f}".format(mse_gbr))
print("R²: {:.4f}".format(r2_gbr))

# ------------------------------
# Experiment 2: XGBoost
# ------------------------------
# Note: Set the objective to 'reg:squarederror' for regression tasks.
xgb_reg = xgb.XGBRegressor(random_state=42, objective='reg:squarederror')
xgb_reg.fit(X_train, y_train)
y_pred_xgb = xgb_reg.predict(X_test)
mse_xgb = mean_squared_error(y_test, y_pred_xgb)
r2_xgb = r2_score(y_test, y_pred_xgb)
print("\nXGBoost Regressor:")
print("MSE: {:.4f}".format(mse_xgb))
print("R²: {:.4f}".format(r2_xgb))

# ------------------------------
# Experiment 3: Compare Models
# ------------------------------
# Train a Random Forest Regressor
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print("\nComparison of Models:")
print("Model\t\t\tMSE\t\tR²")
print("Gradient Boosting\t{:.4f}\t{:.4f}".format(mse_gbr, r2_gbr))
print("XGBoost\t\t\t{:.4f}\t{:.4f}".format(mse_xgb, r2_xgb))
print("Random Forest\t\t{:.4f}\t{:.4f}".format(mse_rf, r2_rf))

# Optional: Visual Comparison of Predictions vs. Actual Values
plt.figure(figsize=(12, 6))
plt.scatter(y_test, y_pred_gbr, alpha=0.5, label='Gradient Boosting')
plt.scatter(y_test, y_pred_xgb, alpha=0.5, label='XGBoost')
plt.scatter(y_test, y_pred_rf, alpha=0.5, label='Random Forest')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
# The target is the median house value in units of $100,000
plt.xlabel('Actual Median House Value ($100,000s)')
plt.ylabel('Predicted Median House Value ($100,000s)')
plt.title('Model Comparison: Actual vs Predicted House Prices')
plt.legend()
plt.show()
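
The three experiments repeat the same fit, predict, and score steps, so they can also be driven from one loop. The sketch below is one possible way to do that, not the script's own approach: evaluate_models is a hypothetical helper, and a small synthetic dataset from make_regression stands in for the California Housing data so the example runs without a download. Only the two scikit-learn models are included here; an xgb.XGBRegressor could presumably be added to the same dictionary, since it exposes the same fit and predict methods.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return {name: (mse, r2)} measured on the test set.

    Hypothetical helper for illustration; not part of any library.
    """
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = (
            mean_squared_error(y_test, y_pred),
            r2_score(y_test, y_pred),
        )
    return results


# Synthetic stand-in data (assumption): 300 samples, 8 features
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Smaller ensembles than the defaults, only to keep the example fast
models = {
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=50, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = evaluate_models(models, X_train, X_test, y_train, y_test)
for name, (mse, r2) in results.items():
    print("{:<20}MSE: {:.4f}  R²: {:.4f}".format(name, mse, r2))
```

Keeping the models in a dictionary means that adding or removing a model changes one line, and every model is guaranteed to be scored on the same split with the same metrics.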
