Implementing and Comparing Gradient Boosting, XGBoost, and Random Forest ~ TUTORIALTPOINT

Explanation of Implementation Steps

Data Loading and Splitting:
- The California Housing dataset is loaded using fetch_california_housing(). It provides eight numeric features per district and the target: the median house value, expressed in units of 100,000 US dollars.
- The dataset is split into a training set (70%) and a testing set (30%) for evaluation.
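To see what the 70/30 split does without downloading the dataset, here is a small sketch on synthetic data. The random arrays are an assumption standing in for the housing data (which also has eight features); it shows that test_size=0.3 reserves 30% of the rows and that a fixed random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data: 100 rows, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.normal(size=100)

# test_size=0.3 keeps 30% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (70, 8) (30, 8)

# Calling again with the same random_state yields an identical split.
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
print(np.array_equal(X_train, X_train_again))  # True
```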
Experiment 1: Gradient Boosting:
- A GradientBoostingRegressor is instantiated and trained on the training data.
- The model’s performance is evaluated on the test set using Mean Squared Error (MSE) and $R^2$ Score.
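The idea behind gradient boosting can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small tree to the current residuals (the negative gradient of squared error) and add a shrunken copy of it. This is a minimal illustration, not scikit-learn's internal implementation (GradientBoostingRegressor, for example, uses the friedman_mse split criterion by default). The helper names fit_gradient_boosting and predict_gradient_boosting are hypothetical, and the data is synthetic; the defaults mirror GradientBoostingRegressor's (100 trees, learning rate 0.1, depth 3).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Hypothetical minimal gradient boosting for squared-error loss."""
    init = y.mean()                   # constant that minimizes squared error
    pred = np.full(len(y), init)      # current ensemble prediction F_0(x)
    trees = []
    for m in range(n_estimators):
        residuals = y - pred          # negative gradient of 0.5 * (y - F(x))**2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=m)
        tree.fit(X, residuals)        # new tree models what is still unexplained
        pred += learning_rate * tree.predict(X)  # shrunken step toward residuals
        trees.append(tree)
    return {"init": init, "learning_rate": learning_rate, "trees": trees}


def predict_gradient_boosting(model, X):
    """F_M(x) = F_0 + learning_rate * (sum of all tree outputs)."""
    pred = np.full(X.shape[0], model["init"])
    for tree in model["trees"]:
        pred += model["learning_rate"] * tree.predict(X)
    return pred


# Usage on synthetic data (not the California Housing set).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
model_5 = fit_gradient_boosting(X, y, n_estimators=5)
model_50 = fit_gradient_boosting(X, y, n_estimators=50)

mse_baseline = np.mean((y - y.mean()) ** 2)
mse_5 = np.mean((y - predict_gradient_boosting(model_5, X)) ** 2)
mse_50 = np.mean((y - predict_gradient_boosting(model_50, X)) ** 2)
print(mse_baseline, mse_5, mse_50)  # training error shrinks as trees are added
```

With squared error, each tree's leaves hold mean residuals, so every step can only lower the training error; the learning rate trades faster fitting for less overfitting.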
Experiment 2: XGBoost:
- An XGBoost model is created with XGBRegressor. The objective is set explicitly to 'reg:squarederror' (squared-error regression), which is also XGBRegressor's default in recent XGBoost releases.
- After training, its performance is measured using the same metrics (MSE and $R^2$ Score) on the test data.
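- XGBoost is widely used because it is often both fast and accurate, especially on large datasets: it adds regularization of the trees' leaf weights to the boosting objective and supports fast histogram-based split finding (tree_method='hist').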
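One concrete piece of that regularization is how XGBoost sets a leaf's value. For a leaf with summed gradients G and summed Hessians H, the optimal weight is w* = -G / (H + λ), where λ is the L2 penalty (reg_lambda in XGBRegressor, default 1). For squared error, each sample has gradient (prediction - y) and Hessian 1, so λ shrinks the leaf toward zero compared with the plain mean residual. The sketch below is a hypothetical helper that evaluates this formula with NumPy; it does not call XGBoost itself.

```python
import numpy as np


def xgb_leaf_weight(y, pred, reg_lambda=1.0):
    """Optimal leaf weight -G / (H + lambda) for squared-error loss (sketch)."""
    grad = pred - y            # first derivative of 0.5 * (y - pred)**2 w.r.t. pred
    hess = np.ones_like(y)     # second derivative is 1 for every sample
    return -grad.sum() / (hess.sum() + reg_lambda)


# Three samples in one leaf, current prediction 0: residuals sum to 12.
y = np.array([3.0, 4.0, 5.0])
pred = np.zeros(3)
for lam in (0.0, 1.0, 10.0):
    print(lam, xgb_leaf_weight(y, pred, lam))
# lam=0  -> 12 / 3  = 4.0   (plain mean residual)
# lam=1  -> 12 / 4  = 3.0
# lam=10 -> 12 / 13 ≈ 0.923
```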
Experiment 3: Model Comparison:
- A RandomForestRegressor is trained on the same dataset.
- All three models—Gradient Boosting, XGBoost, and Random Forest—are compared by printing their MSE and $R^2$ Score.
- A scatter plot visualizes predicted versus actual house prices for a direct comparison of model performance.
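Unlike the two boosting models, a Random Forest trains its trees independently (by default each on a bootstrap sample of the rows) and, for regression, predicts the plain average of the trees' outputs. The sketch below checks that on synthetic data by averaging the fitted trees in rf.estimators_ by hand; the dataset and the smaller n_estimators=50 are assumptions chosen to keep it quick.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data; the tutorial itself uses the California Housing set.
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

# One row of predictions per tree, then the average over trees.
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
manual_avg = per_tree.mean(axis=0)

print(np.allclose(manual_avg, rf.predict(X)))  # True: forest output = tree average
```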

This workflow shows how three tree-ensemble methods compare on a house-price regression task: Random Forest (bagging) and two gradient-boosting implementations (scikit-learn's GradientBoostingRegressor and XGBoost), judged both numerically (MSE and $R^2$) and visually.
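Both metrics used in the comparison are simple to write out. MSE is the mean of the squared errors, so it is in squared target units (here, squared units of 100,000 US dollars); $R^2$ is 1 minus the ratio of the residual sum of squares to the total sum of squares, so 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values are possible. The small arrays below are made-up illustration values, and mse and r2 are hypothetical helper names checked against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
print(mse(y_true, y_pred))  # 0.125  (0.5 / 4)
print(r2(y_true, y_pred))   # 0.9    (1 - 0.5 / 5)
print(np.isclose(mse(y_true, y_pred), mean_squared_error(y_true, y_pred)),
      np.isclose(r2(y_true, y_pred), r2_score(y_true, y_pred)))  # True True
```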

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
import xgboost as xgb

# ------------------------------
# Data Loading and Splitting
# ------------------------------
# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# ------------------------------
# Experiment 1: Gradient Boosting
# ------------------------------
gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)
y_pred_gbr = gbr.predict(X_test)
mse_gbr = mean_squared_error(y_test, y_pred_gbr)
r2_gbr = r2_score(y_test, y_pred_gbr)
print("Gradient Boosting Regressor:")
print("MSE: {:.4f}".format(mse_gbr))
print("R²: {:.4f}".format(r2_gbr))

# ------------------------------
# Experiment 2: XGBoost
# ------------------------------
# Note: Set the objective to 'reg:squarederror' for regression tasks.
xgb_reg = xgb.XGBRegressor(random_state=42, objective='reg:squarederror')
xgb_reg.fit(X_train, y_train)
y_pred_xgb = xgb_reg.predict(X_test)
mse_xgb = mean_squared_error(y_test, y_pred_xgb)
r2_xgb = r2_score(y_test, y_pred_xgb)
print("\nXGBoost Regressor:")
print("MSE: {:.4f}".format(mse_xgb))
print("R²: {:.4f}".format(r2_xgb))

# ------------------------------
# Experiment 3: Compare Models
# ------------------------------
# Train a Random Forest Regressor
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print("\nComparison of Models:")
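# Fixed-width columns keep the header aligned with the numbers
print("{:<20}{:>10}{:>10}".format("Model", "MSE", "R²"))
print("{:<20}{:>10.4f}{:>10.4f}".format("Gradient Boosting", mse_gbr, r2_gbr))
print("{:<20}{:>10.4f}{:>10.4f}".format("XGBoost", mse_xgb, r2_xgb))
print("{:<20}{:>10.4f}{:>10.4f}".format("Random Forest", mse_rf, r2_rf))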

# Optional: Visual Comparison of Predictions vs. Actual Values
plt.figure(figsize=(12, 6))
plt.scatter(y_test, y_pred_gbr, alpha=0.5, label='Gradient Boosting')
plt.scatter(y_test, y_pred_xgb, alpha=0.5, label='XGBoost')
plt.scatter(y_test, y_pred_rf, alpha=0.5, label='Random Forest')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('Actual median house value (units of 100,000 USD)')
plt.ylabel('Predicted median house value (units of 100,000 USD)')
plt.title('Model Comparison: Actual vs Predicted House Prices')
plt.legend()
plt.show()
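Overlaying three scatter plots on one axes lets the later series hide the earlier ones. A variant that may be easier to read draws one panel per model with shared axes. The sketch below assumes a hypothetical helper, plot_actual_vs_predicted, and uses random illustration data with identical noise for every model (not real results); in the tutorial you would pass y_test and the three y_pred_* arrays instead.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(y_true, predictions):
    """One panel per model, shared axes, each with the y = x reference line."""
    n = len(predictions)
    fig, axes = plt.subplots(1, n, figsize=(5 * n, 5), sharex=True, sharey=True)
    axes = np.atleast_1d(axes)  # a single model returns one Axes, not an array
    lo, hi = y_true.min(), y_true.max()
    for ax, (name, y_pred) in zip(axes, predictions.items()):
        ax.scatter(y_true, y_pred, alpha=0.3, s=10)
        ax.plot([lo, hi], [lo, hi], 'k--', lw=2)  # perfect-prediction line
        ax.set_title(name)
        ax.set_xlabel('Actual value')
    axes[0].set_ylabel('Predicted value')
    fig.tight_layout()
    return fig


# Illustration data only: same noise level for every "model".
rng = np.random.default_rng(0)
y_true = rng.uniform(0.5, 5.0, size=200)
preds = {name: y_true + rng.normal(scale=0.5, size=200)
         for name in ['Gradient Boosting', 'XGBoost', 'Random Forest']}
fig = plot_actual_vs_predicted(y_true, preds)
plt.show()
```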

Monday, 24 February 2025