Simply ML

What is Polynomial Regression?

Polynomial Regression extends linear regression by modeling the relationship between variables as an nth-degree polynomial. Instead of fitting a straight line, it fits a curve to the data, allowing you to capture non-linear patterns.

For example, a degree 2 (quadratic) polynomial includes terms like x, x², while degree 3 (cubic) includes x, x², x³. The model can fit U-shaped curves, S-curves, and other complex patterns.
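Bread generates these terms for you, but the expansion itself is easy to see with scikit-learn's `PolynomialFeatures` (used here as an illustration, not necessarily what Bread runs internally):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature x expanded to degree 2: columns are [1, x, x^2]
x = np.array([[1.0], [2.0], [3.0]])
X2 = PolynomialFeatures(degree=2).fit_transform(x)
print(X2)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]

# Degree 3 adds an x^3 column: [1, x, x^2, x^3]
X3 = PolynomialFeatures(degree=3).fit_transform(x)
print(X3.shape)  # (3, 4)
```

The model then fits an ordinary linear regression on these expanded columns, which is why polynomial regression is still "linear" in its coefficients.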

When to Use Polynomial Regression

  • Non-Linear Relationships: When data shows curved or non-linear patterns
  • U-Shaped Patterns: When relationship increases then decreases (or vice versa)
  • Growth Curves: Modeling accelerating or decelerating growth
  • Simple Curves: When relationship is smoothly curved but not highly complex
  • Feature Engineering: Creating polynomial features for other models

How to Use in Bread

  1. Load Your Data: Import a CSV file with your dataset
  2. Visualize First: Plot your data to identify non-linear patterns
  3. Select Target Variable: Choose the continuous variable you want to predict
  4. Choose Features: Select predictor variables (works best with fewer features)
  5. Set Polynomial Degree: Choose degree (typically 2-4; higher risks overfitting)
  6. Run Model: Click "Polynomial Regression" and review the fit
  7. Check Metrics: Review R² and RMSE on training and test data

Understanding the Output

  • R² Score: Proportion of variance explained (higher is better, but watch for overfitting)
  • RMSE: Root-mean-square prediction error in the target's original units (penalizes large errors more heavily)
  • MAE: Average absolute error (less sensitive to outliers than RMSE)
  • Coefficients: Show contribution of each polynomial term
  • Fitted Curve: Visual plot showing how well the curve fits the data
  • Residual Plot: Should show random scatter; patterns indicate model issues
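As a concrete check of how R², RMSE, MAE, and residuals relate, here is a small NumPy computation on made-up predictions (the numbers are purely illustrative):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

# Residuals: what the residual plot displays
resid = y_true - y_pred           # [0.2, -0.3, 0.1, -0.2]

# RMSE: square root of the mean squared residual
rmse = np.sqrt(np.mean(resid ** 2))   # ~0.212

# MAE: mean absolute residual (less sensitive to outliers)
mae = np.mean(np.abs(resid))          # 0.2

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot              # 0.991
```

Note that RMSE ≥ MAE always holds; a large gap between them suggests a few large errors dominate.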

Choosing Polynomial Degree

  • Degree 1: Linear regression (straight line)
  • Degree 2: Quadratic (U-shaped or inverted U-shaped curve)
  • Degree 3: Cubic (S-shaped curve, one inflection point)
  • Degree 4+: More complex curves (higher risk of overfitting)

Rule of Thumb: Start with degree 2, increase only if necessary. Compare train vs test R² to detect overfitting.
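The train-vs-test comparison in the rule of thumb can be sketched as a degree sweep (scikit-learn used here as an assumed stand-in for Bread's degree setting):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Quadratic ground truth with noise: the "right" degree is 2
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1.0, 60)
X_tr, X_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=1)

train_r2, test_r2 = {}, {}
for degree in (1, 2, 3, 6, 10):
    m = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    m.fit(X_tr, y_tr)
    train_r2[degree] = m.score(X_tr, y_tr)
    test_r2[degree] = m.score(X_te, y_te)
    print(f"degree {degree:2d}: train R2={train_r2[degree]:.3f}, "
          f"test R2={test_r2[degree]:.3f}")
```

Training R² never decreases as degree grows, but test R² typically peaks near the true degree and then degrades; a widening train/test gap is the overfitting signal.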

Best Practices

  • Start Simple: Begin with degree 2 or 3 before trying higher degrees
  • Scale Features: Use standardization to prevent numerical instability with high powers
  • Limit Features: Works best with 1-3 features; too many creates explosive feature count
  • Watch for Overfitting: Compare training vs test performance
  • Use Regularization: Consider Ridge/Lasso with polynomial features for stability
  • Cross-Validate: Use k-fold cross-validation to select optimal degree

Tips & Warnings

  • ⚠️ High-degree polynomials can wildly extrapolate beyond data range
  • ⚠️ Overfitting risk increases dramatically with polynomial degree
  • ⚠️ With multiple features, polynomial expansion creates many terms (e.g., 3 features at degree 2 already yields 10 terms including the constant; degree 3 yields 20)
  • ⚠️ Unstable at edges: predictions near data boundaries can be unreliable
  • 💡 Always visualize the fitted curve to ensure it makes sense
  • 💡 If degree > 4 needed, consider splines or non-parametric models instead
  • 💡 Combine with Ridge/Lasso to reduce overfitting

Example Use Cases

  • Modeling crop yield vs fertilizer amount (optimal point exists)
  • Temperature effects on chemical reactions (rate increases then plateaus)
  • Marketing: sales vs advertising spend (diminishing returns)
  • Physics: projectile motion (parabolic trajectory)
  • Economics: cost curves showing economies/diseconomies of scale

Common Pitfalls

  • Overfitting: Model fits training data perfectly but fails on new data
  • Extrapolation: Making predictions far outside training data range
  • Feature Explosion: Too many polynomial terms with multiple features
  • Numerical Instability: Very large/small coefficients with unscaled data
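The feature-explosion pitfall is easy to quantify with scikit-learn's `PolynomialFeatures` (counts below include the constant term and all cross-terms like x₁x₂):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.zeros((1, 3))  # 3 input features
counts = {}
for degree in (2, 3, 4):
    counts[degree] = PolynomialFeatures(degree).fit(X).n_output_features_
    print(f"3 features, degree {degree}: {counts[degree]} terms")
# 3 features, degree 2: 10 terms
# 3 features, degree 3: 20 terms
# 3 features, degree 4: 35 terms
```

The count grows combinatorially (it is C(n+d, d) for n features at degree d), which is why limiting features matters more as the degree rises.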