Simply ML

What is Polynomial Regression?

Polynomial Regression extends linear regression by modeling the relationship between variables as an nth-degree polynomial. Instead of fitting a straight line, it fits a curve to the data, allowing you to capture non-linear patterns.

For example, a degree 2 (quadratic) polynomial includes terms like x, x², while degree 3 (cubic) includes x, x², x³. The model can fit U-shaped curves, S-curves, and other complex patterns.
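Bread generates these terms for you, but the expansion itself is easy to see with scikit-learn's `PolynomialFeatures` (used here as an illustration, not necessarily what Bread runs internally):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature x expanded to degree 2: columns are [1, x, x^2]
x = np.array([[1.0], [2.0], [3.0]])
X2 = PolynomialFeatures(degree=2).fit_transform(x)
print(X2)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]

# Degree 3 adds an x^3 column: [1, x, x^2, x^3]
X3 = PolynomialFeatures(degree=3).fit_transform(x)
print(X3.shape)  # (3, 4)
```

The model then fits an ordinary linear regression on these expanded columns, which is why polynomial regression is still "linear" in its coefficients.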

When to Use Polynomial Regression

  • Non-Linear Relationships: When data shows curved or non-linear patterns
  • U-Shaped Patterns: When relationship increases then decreases (or vice versa)
  • Growth Curves: Modeling accelerating or decelerating growth
  • Simple Curves: When relationship is smoothly curved but not highly complex
  • Feature Engineering: Creating polynomial features for other models

How to Use in Bread

  1. Load Your Data: Import a CSV file with your dataset
  2. Visualize First: Plot your data to identify non-linear patterns
  3. Select Target Variable: Choose the continuous variable you want to predict
  4. Choose Features: Select predictor variables (works best with fewer features)
  5. Set Polynomial Degree: Choose degree (typically 2-4; higher risks overfitting)
  6. Run Model: Click "Polynomial Regression" and review the fit
  7. Check Metrics: Review R² and RMSE on training and test data

Understanding the Output

  • R² Score: Proportion of variance explained (higher is better, but watch for overfitting)
  • RMSE: Root-mean-square prediction error in the target's original units (penalizes large errors more heavily)
  • MAE: Average absolute error (less sensitive to outliers than RMSE)
  • Coefficients: Show contribution of each polynomial term
  • Fitted Curve: Visual plot showing how well the curve fits the data
  • Residual Plot: Should show random scatter; patterns indicate model issues
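As a concrete check of how R², RMSE, MAE, and residuals relate, here is a small NumPy computation on made-up predictions (the numbers are purely illustrative):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

# Residuals: what the residual plot displays
resid = y_true - y_pred           # [0.2, -0.3, 0.1, -0.2]

# RMSE: square root of the mean squared residual
rmse = np.sqrt(np.mean(resid ** 2))   # ~0.212

# MAE: mean absolute residual (less sensitive to outliers)
mae = np.mean(np.abs(resid))          # 0.2

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot              # 0.991
```

Note that RMSE ≥ MAE always holds; a large gap between them suggests a few large errors dominate.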

Choosing Polynomial Degree

  • Degree 1: Linear regression (straight line)
  • Degree 2: Quadratic (U-shaped or inverted U-shaped curve)
  • Degree 3: Cubic (S-shaped curve, one inflection point)
  • Degree 4+: More complex curves (higher risk of overfitting)

Rule of Thumb: Start with degree 2, increase only if necessary. Compare train vs test R² to detect overfitting.
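The train-vs-test comparison in the rule of thumb can be sketched as a degree sweep (scikit-learn used here as an assumed stand-in for Bread's degree setting):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Quadratic ground truth with noise: the "right" degree is 2
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1.0, 60)
X_tr, X_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=1)

train_r2, test_r2 = {}, {}
for degree in (1, 2, 3, 6, 10):
    m = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    m.fit(X_tr, y_tr)
    train_r2[degree] = m.score(X_tr, y_tr)
    test_r2[degree] = m.score(X_te, y_te)
    print(f"degree {degree:2d}: train R2={train_r2[degree]:.3f}, "
          f"test R2={test_r2[degree]:.3f}")
```

Training R² never decreases as degree grows, but test R² typically peaks near the true degree and then degrades; a widening train/test gap is the overfitting signal.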

Best Practices

  • Start Simple: Begin with degree 2 or 3 before trying higher degrees
  • Scale Features: Use standardization to prevent numerical instability with high powers
  • Limit Features: Works best with 1-3 features; too many creates explosive feature count
  • Watch for Overfitting: Compare training vs test performance
  • Use Regularization: Consider Ridge/Lasso with polynomial features for stability
  • Cross-Validate: Use k-fold cross-validation to select optimal degree

Tips & Warnings

  • ⚠️ High-degree polynomials can wildly extrapolate beyond data range
  • ⚠️ Overfitting risk increases dramatically with polynomial degree
  • ⚠️ With multiple features, polynomial expansion creates many terms (e.g., 3 features at degree 2 already yields 10 terms including the constant; degree 3 yields 20)
  • ⚠️ Unstable at edges: predictions near data boundaries can be unreliable
  • 💡 Always visualize the fitted curve to ensure it makes sense
  • 💡 If degree > 4 needed, consider splines or non-parametric models instead
  • 💡 Combine with Ridge/Lasso to reduce overfitting

Example Use Cases

  • Modeling crop yield vs fertilizer amount (optimal point exists)
  • Temperature effects on chemical reactions (rate increases then plateaus)
  • Marketing: sales vs advertising spend (diminishing returns)
  • Physics: projectile motion (parabolic trajectory)
  • Economics: cost curves showing economies/diseconomies of scale

Common Pitfalls

  • Overfitting: Model fits training data perfectly but fails on new data
  • Extrapolation: Making predictions far outside training data range
  • Feature Explosion: Too many polynomial terms with multiple features
  • Numerical Instability: Very large/small coefficients with unscaled data
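The feature-explosion pitfall is easy to quantify with scikit-learn's `PolynomialFeatures` (counts below include the constant term and all cross-terms like x₁x₂):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.zeros((1, 3))  # 3 input features
counts = {}
for degree in (2, 3, 4):
    counts[degree] = PolynomialFeatures(degree).fit(X).n_output_features_
    print(f"3 features, degree {degree}: {counts[degree]} terms")
# 3 features, degree 2: 10 terms
# 3 features, degree 3: 20 terms
# 3 features, degree 4: 35 terms
```

The count grows combinatorially (it is C(n+d, d) for n features at degree d), which is why limiting features matters more as the degree rises.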