Monday, 24 February 2025

This post extends Linear Regression to handle non-linear relationships using Polynomial Regression.


1. Data Generation

  • Objective: Create a dataset that simulates a non-linear relationship, such as advertising expenditure vs. sales.
  • Process:
    • Use numpy.linspace to generate 100 evenly spaced values between 0 and 10 for the independent variable X.
    • Simulate the dependent variable y using a quadratic function (e.g., y = 2 + 3X - 0.5X^2) and add Gaussian noise with np.random.randn to mimic real-world variability.
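  • One detail worth noting (also visible in the full script below): scikit-learn expects the feature matrix to be 2-D, so the 1-D output of linspace is reshaped into a column vector:

import numpy as np

X = np.linspace(0, 10, 100)   # shape (100,)
X = X.reshape(-1, 1)          # shape (100, 1): one column = one feature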

2. Simple Linear Regression

  • Objective: Fit a basic linear model to the data.
  • Process:
    • Create an instance of LinearRegression from scikit-learn.
    • Fit the model using the original X and y values.
    • Use the model to predict y values (y_lin_pred) on the same X data.
  • Note: This model assumes a straight-line relationship and may not capture the non-linear patterns in the data.

3. Polynomial Regression

  • Objective: Capture the non-linear relationship by introducing polynomial terms.
  • Process:
    • Transforming Features:
      • Use PolynomialFeatures to transform X into polynomial features. For example, setting the degree to 2 generates three features: the constant term (X^0), X, and X^2 (a quick check of this expansion is sketched after this list).
    • Fitting the Model:
      • Use another LinearRegression instance and fit it on these transformed features.
      • Predict the target variable (y_poly_pred) using the polynomial model.
  • Benefit: This approach allows the regression model to fit curves rather than straight lines, improving the fit for non-linear data.
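
To see the expansion concretely, here is a minimal sketch applied to a tiny input (assuming scikit-learn 1.0+ for get_feature_names_out):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Three sample values of X, as a column vector.
X_small = np.array([[1.0], [2.0], [3.0]])
poly = PolynomialFeatures(degree=2)

# Each row becomes [1, x, x^2].
print(poly.fit_transform(X_small))
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
print(poly.get_feature_names_out())  # ['1' 'x0' 'x0^2']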

4. Model Performance Comparison

  • Objective: Evaluate and compare the performance of the simple linear model against the polynomial model.
  • Process:
    • Compute the Mean Squared Error (MSE) for both models using mean_squared_error.
    • Calculate the R^2 score using r2_score to assess how well each model explains the variance in the data (both metrics are easy to compute by hand, as sketched after this list).
  • Interpretation:
    • Lower MSE and higher R^2 indicate a better fit. The polynomial model is expected to outperform the linear model when the true relationship is non-linear.
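
For reference, this plain-NumPy sketch mirrors what mean_squared_error and r2_score return:

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: the average squared residual.
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot: the fraction of variance explained.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot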

5. Visualization

  • Objective: Visualize how well each model fits the data.
  • Process:
    • Use Matplotlib to create a scatter plot of the original data points.
    • Plot the predictions from the simple linear regression (a straight line) and the polynomial regression (a curve).
    • Label axes, add a title, and include a legend to differentiate between the models.
  • Benefit: Visual inspection provides an intuitive understanding of how the models compare, highlighting the polynomial model’s improved ability to capture the non-linear trend.

 

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data: advertising vs. sales with a non-linear relationship.
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
# Simulate a quadratic relationship with noise:
y = 2 + 3 * X - 0.5 * X**2 + np.random.randn(100, 1) * 2

# --------------------------
# Simple Linear Regression
# --------------------------
lin_reg = LinearRegression()
lin_reg.fit(X, y)
y_lin_pred = lin_reg.predict(X)

# --------------------------
# Polynomial Regression (Degree = 2)
# --------------------------
poly_degree = 2
poly_features = PolynomialFeatures(degree=poly_degree)
X_poly = poly_features.fit_transform(X)

poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)
y_poly_pred = poly_reg.predict(X_poly)

# --------------------------
# Performance Comparison
# --------------------------
mse_lin = mean_squared_error(y, y_lin_pred)
mse_poly = mean_squared_error(y, y_poly_pred)
r2_lin = r2_score(y, y_lin_pred)
r2_poly = r2_score(y, y_poly_pred)

print("Linear Regression MSE:", mse_lin)
print("Polynomial Regression MSE:", mse_poly)
print("Linear Regression R2:", r2_lin)
print("Polynomial Regression R2:", r2_poly)

# --------------------------
# Plotting the results
# --------------------------
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label="Data", color="black")
plt.plot(X, y_lin_pred, label="Linear Regression", color="blue", linewidth=2)
plt.plot(X, y_poly_pred, label=f"Polynomial Regression (Degree {poly_degree})", color="red", linewidth=2)
plt.xlabel("Advertising Expenditure")
plt.ylabel("Sales")
plt.title("Linear vs Polynomial Regression")
plt.legend()
plt.show()

Output:

Linear Regression MSE: 16.86354989653036
Polynomial Regression MSE: 3.2471947501886613
Linear Regression R2: 0.66231449906867
Polynomial Regression R2: 0.9349762895376701
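
Since the data was generated from y = 2 + 3X - 0.5X^2, the fitted polynomial model should approximately recover those coefficients. A quick check, run after the script above (exact values depend on the noise):

print(poly_reg.intercept_)  # close to the true intercept, 2
print(poly_reg.coef_)       # roughly [0, 3, -0.5]; the leading 0 belongs to the
                            # bias column, whose effect is absorbed by the intercept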

Interpretation:

The script prints both metrics and draws a scatter plot of the original data together with the predictions from the linear and polynomial models. The numbers tell the same story as the plot: the polynomial model cuts the MSE from roughly 16.9 to 3.2 and lifts R^2 from 0.66 to 0.93. The quadratic curve follows the non-linear relationship between advertising expenditure and sales far more closely than the straight line fitted by simple linear regression.
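
As a side note, scikit-learn's Pipeline offers a more compact way to express the same two-step model (feature expansion, then regression); a minimal sketch:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Chain the transform and the model into a single estimator.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)                 # X, y as generated above
y_poly_pred = poly_model.predict(X)

The pipeline applies the same feature expansion automatically whenever predict is called on new data, which avoids transforming the inputs by hand.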
