Monday, 24 February 2025

Avoiding overfitting in regression models with Ridge and Lasso regularization.

Implementation Steps Explained

  1. Dataset Loading and Splitting:

    • The California Housing dataset is loaded using fetch_california_housing().
    • The dataset is split into training and testing sets (70% training, 30% testing) to evaluate model performance on unseen data.
  2. Ordinary Linear Regression:

    • A basic LinearRegression model is trained on the training data.
    • Predictions are made on the test set, and performance is evaluated using Mean Squared Error (MSE) and R² Score.
  3. Defining a Range of Regularization Parameters (α):

    • A set of alpha values is defined with np.logspace, spanning from very low (0.001) to very high (1000).
    • These values control the strength of the regularization.
  4. Training Ridge and Lasso Regression Models:

    • For each alpha value, both a Ridge and a Lasso regression model are trained on the training data.
    • Their performance on the test set is evaluated using MSE and R² Score, and the results are stored.
  5. Identifying the Optimal α:

    • The alpha value that results in the lowest MSE is selected as the optimal parameter for each model (Ridge and Lasso).
  6. Visualization:

    • MSE versus alpha is plotted on a logarithmic scale for both Ridge and Lasso regression.
    • Vertical lines indicate the optimal alpha values, illustrating how regularization impacts model performance.
  7. Comparing Models:

    • The best Ridge and Lasso models (using the optimal α values) are retrained and compared against the ordinary Linear Regression model.
    • The comparison uses MSE and R² Score to assess whether regularization improves generalization.
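The seven steps above can be sketched end to end as follows. This is a minimal sketch, not the original code: the 70/30 split, random_state=42, the 50-point alpha grid, and max_iter=10_000 for Lasso are assumptions, so the exact optimal alphas and scores may differ slightly from the output shown below.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Load the California Housing data and hold out 30% for testing
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 2. Baseline: ordinary least-squares linear regression
lin = LinearRegression().fit(X_train, y_train)
lin_mse = mean_squared_error(y_test, lin.predict(X_test))
lin_r2 = r2_score(y_test, lin.predict(X_test))
print(f"Linear Regression MSE: {lin_mse:.2f}, R²: {lin_r2:.2f}")

# 3. Regularization strengths from 0.001 to 1000 on a log scale
alphas = np.logspace(-3, 3, 50)

# 4. Fit Ridge and Lasso at each alpha and record test-set MSE
ridge_mse = [mean_squared_error(
    y_test, Ridge(alpha=a).fit(X_train, y_train).predict(X_test))
    for a in alphas]
lasso_mse = [mean_squared_error(
    y_test, Lasso(alpha=a, max_iter=10_000).fit(X_train, y_train).predict(X_test))
    for a in alphas]

# 5. Optimal alpha = the one with the lowest test MSE
best_ridge_alpha = alphas[int(np.argmin(ridge_mse))]
best_lasso_alpha = alphas[int(np.argmin(lasso_mse))]
print(f"Optimal Ridge alpha: {best_ridge_alpha:.4f}")
print(f"Optimal Lasso alpha: {best_lasso_alpha:.4f}")

# 6. Plot MSE vs. alpha on a log x-axis; skipped if matplotlib is absent
try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend
    import matplotlib.pyplot as plt
    plt.semilogx(alphas, ridge_mse, label="Ridge")
    plt.semilogx(alphas, lasso_mse, label="Lasso")
    plt.axvline(best_ridge_alpha, ls="--", color="gray")
    plt.axvline(best_lasso_alpha, ls=":", color="gray")
    plt.xlabel("alpha")
    plt.ylabel("Test MSE")
    plt.legend()
    plt.savefig("mse_vs_alpha.png")
except ImportError:
    pass

# 7. Retrain the best models and compare against the baseline
best_ridge = Ridge(alpha=best_ridge_alpha).fit(X_train, y_train)
best_lasso = Lasso(alpha=best_lasso_alpha, max_iter=10_000).fit(X_train, y_train)
for name, model in [("Linear", lin), ("Ridge", best_ridge), ("Lasso", best_lasso)]:
    p = model.predict(X_test)
    print(f"{name:6s} - MSE: {mean_squared_error(y_test, p):.2f}, "
          f"R²: {r2_score(y_test, p):.2f}")
```

Note that the optimal alpha is chosen here on the test set for simplicity, mirroring the steps above; in practice a validation split or cross-validation (e.g. RidgeCV/LassoCV) would be preferable.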

 

Output:

Linear Regression MSE: 0.53
Linear Regression R²: 0.60

Optimal Ridge alpha: 323.7458
Optimal Lasso alpha: 0.0095


 

Comparison of Models on Test Data:
Linear Regression - MSE: 0.53, R²: 0.60
Ridge Regression  - MSE: 0.52, R²: 0.60
Lasso Regression  - MSE: 0.53, R²: 0.60

Analysis of Regularization Impact

  • Low α Values:
    When α is very low, the regularization effect is minimal, and the models behave similarly to ordinary linear regression. This can lead to complex models that may overfit the training data.

  • High α Values:
    As α increases, the regularization effect becomes stronger. This forces the model coefficients to shrink toward zero, potentially reducing overfitting but also risking underfitting if α is too high.

  • Optimal α:
    The optimal value of α balances the trade-off between bias and variance, leading to a model that generalizes well to unseen data. The plots help visualize how MSE changes with α, providing insight into the best regularization strength for the given dataset.
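The shrinkage behavior described above can be observed directly on synthetic data. This sketch (the make_regression setup and the alpha values 0.001, 1.0, and 100.0 are assumptions for illustration) shows how Lasso drives more coefficients exactly to zero as α grows, while Ridge only shrinks coefficient magnitudes:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only 3 of 10 features actually carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

for a in (0.001, 1.0, 100.0):
    lasso = Lasso(alpha=a, max_iter=10_000).fit(X, y)
    ridge = Ridge(alpha=a).fit(X, y)
    n_zero = int(np.sum(np.abs(lasso.coef_) < 1e-8))  # exact zeros from L1 penalty
    print(f"alpha={a:>7}: Lasso zeroed {n_zero}/10 coefficients, "
          f"Ridge coefficient norm = {np.linalg.norm(ridge.coef_):.1f}")
```

At low α both models keep all ten coefficients close to the unregularized fit; at high α Lasso tends to retain only the informative features, which is why it is often used for feature selection.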

 
