1. Data Generation
- Objective: Create a dataset that simulates a non-linear relationship, such as advertising expenditure vs. sales.
- Process:
- Use numpy.linspace to generate 100 evenly spaced values between 0 and 10 for the independent variable X.
- Simulate the dependent variable y with a quadratic function (e.g., y = 2 + 3x - 0.5x^2) and add Gaussian noise with np.random.randn to mimic real-world variability (a short note on array shapes follows this list).
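A quick aside on the reshape(-1, 1) call used in the full script below: scikit-learn estimators expect the feature matrix as a 2-D array of shape (n_samples, n_features), so the 1-D vector returned by numpy.linspace is reshaped into a single column first:

import numpy as np

x = np.linspace(0, 10, 100)   # shape (100,): a 1-D vector
X = x.reshape(-1, 1)          # shape (100, 1): one sample per row, one feature column
print(x.shape, X.shape)       # prints (100,) (100, 1)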
2. Simple Linear Regression
- Objective: Fit a basic linear model to the data.
- Process:
- Create an instance of LinearRegression from scikit-learn.
- Fit the model using the original X and y values.
- Use the model to predict values (y_lin_pred) on the same data.
- Note: This model assumes a straight-line relationship and may not capture the non-linear patterns in the data.
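As an aside, the straight line a fitted LinearRegression learns can be read back from its intercept_ and coef_ attributes. A tiny self-contained illustration (the demo points are made up and exactly linear, so the fit is perfect):

import numpy as np
from sklearn.linear_model import LinearRegression

X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature column
y_demo = 1 + 2 * X_demo.ravel()                  # exactly y = 1 + 2x

lin_demo = LinearRegression().fit(X_demo, y_demo)
print(lin_demo.intercept_)  # ~1.0: the fitted intercept
print(lin_demo.coef_)       # [~2.0]: one slope per feature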
3. Polynomial Regression
- Objective: Capture the non-linear relationship by introducing polynomial terms.
- Process:
- Transforming Features:
- Use PolynomialFeatures to transform X into polynomial features. For example, setting the degree to 2 generates three features: the constant term (1), x, and x^2 (see the concrete example after this list).
- Fitting the Model:
- Use another LinearRegression instance and fit it on these transformed features.
- Predict the target variable (y_poly_pred) using the polynomial model.
- Benefit: This approach allows the regression model to fit curves rather than straight lines, improving the fit for non-linear data.
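To make the feature expansion concrete, here is a small self-contained check of what PolynomialFeatures(degree=2) produces (the two sample values are made up for illustration):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_small = np.array([[2.0], [3.0]])  # two sample x values
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X_small))
# [[1. 2. 4.]
#  [1. 3. 9.]]  -> columns are the constant term 1, x, and x^2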
4. Model Performance Comparison
- Objective: Evaluate and compare the performance of the simple linear model against the polynomial model.
- Process:
- Compute the Mean Squared Error (MSE) for both models using mean_squared_error.
- Calculate the R^2 score using r2_score to assess how well each model explains the variance in the data.
- Interpretation:
- Lower MSE and higher R^2 indicate a better fit. The polynomial model is expected to outperform the linear model when the true relationship is non-linear (a hand computation of both metrics is sketched below).
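Both metrics are simple to compute by hand; this self-contained sketch (with made-up numbers) matches what mean_squared_error and r2_score return:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # made-up targets
y_pred = np.array([2.5, 5.0, 7.5, 9.0])  # made-up predictions

mse_manual = np.mean((y_true - y_pred) ** 2)  # average squared residual
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

assert np.isclose(mse_manual, mean_squared_error(y_true, y_pred))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))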
5. Visualization
- Objective: Visualize how well each model fits the data.
- Process:
- Use Matplotlib to create a scatter plot of the original data points.
- Plot the predictions from the simple linear regression (a straight line) and the polynomial regression (a curve).
- Label axes, add a title, and include a legend to differentiate between the models.
- Benefit: Visual inspection provides an intuitive understanding of how the models compare, highlighting the polynomial model’s improved ability to capture the non-linear trend.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Generate synthetic data: advertising vs. sales with a non-linear relationship.
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
# Simulate a quadratic relationship with noise:
y = 2 + 3 * X - 0.5 * X**2 + np.random.randn(100, 1) * 2
# --------------------------
# Simple Linear Regression
# --------------------------
lin_reg = LinearRegression()
lin_reg.fit(X, y)
y_lin_pred = lin_reg.predict(X)
# --------------------------
# Polynomial Regression (Degree = 2)
# --------------------------
poly_degree = 2
poly_features = PolynomialFeatures(degree=poly_degree)
X_poly = poly_features.fit_transform(X)
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)
y_poly_pred = poly_reg.predict(X_poly)
# --------------------------
# Performance Comparison
# --------------------------
mse_lin = mean_squared_error(y, y_lin_pred)
mse_poly = mean_squared_error(y, y_poly_pred)
r2_lin = r2_score(y, y_lin_pred)
r2_poly = r2_score(y, y_poly_pred)
print("Linear Regression MSE:", mse_lin)
print("Polynomial Regression MSE:", mse_poly)
print("Linear Regression R2:", r2_lin)
print("Polynomial Regression R2:", r2_poly)
# --------------------------
# Plotting the results
# --------------------------
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label="Data", color="black")
plt.plot(X, y_lin_pred, label="Linear Regression", color="blue", linewidth=2)
plt.plot(X, y_poly_pred, label=f"Polynomial Regression (Degree {poly_degree})", color="red", linewidth=2)
plt.xlabel("Advertising Expenditure")
plt.ylabel("Sales")
plt.title("Linear vs Polynomial Regression")
plt.legend()
plt.show()
Output:
Linear Regression MSE: 16.86354989653036
Polynomial Regression MSE: 3.2471947501886613
Linear Regression R2: 0.66231449906867
Polynomial Regression R2: 0.9349762895376701
Interpretation:
The script produces a scatter plot of the original data together with the predictions from both models. This visualization makes it easy to judge how well each model captures the relationship between advertising expenditure and sales: the polynomial regression (here, a quadratic curve) fits the non-linear data much better than the simple linear regression, consistent with its lower MSE and higher R^2.
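One practical caveat: to score the polynomial model on new inputs, those inputs must pass through the same PolynomialFeatures transformation first (poly_features.transform, not a fresh fit_transform). An sklearn Pipeline bundles the two steps so this cannot be forgotten; a minimal self-contained sketch using the same synthetic data:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2 + 3 * X - 0.5 * X**2 + np.random.randn(100, 1) * 2

# The pipeline applies PolynomialFeatures before LinearRegression on both fit and predict,
# so new inputs are transformed automatically.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict(np.array([[4.0], [7.5]])))  # predictions for unseen expenditure values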