What is Polynomial Regression?
Polynomial Regression extends linear regression by modeling the relationship between variables as an nth-degree polynomial. Instead of fitting a straight line, it fits a curve to the data, allowing you to capture non-linear patterns.
For example, a degree 2 (quadratic) polynomial includes terms like x, x², while degree 3 (cubic) includes x, x², x³. The model can fit U-shaped curves, S-curves, and other complex patterns.
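The idea above can be sketched in a few lines: expand x into polynomial columns, then fit an ordinary linear model on those columns. This is a minimal illustration using scikit-learn (Bread's internals may differ); the data is synthetic and noise-free so the coefficients are recovered exactly.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic U-shaped data: y = 2 + 0.5*x - 3*x^2 (no noise, for illustration)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 + 0.5 * x.ravel() - 3 * x.ravel() ** 2

# Expand x into the columns [x, x^2], then fit an ordinary linear model on them
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)

print(model.intercept_, model.coef_)  # recovers ~2 and ~[0.5, -3]
```

Note that the model is still *linear in its coefficients*; the non-linearity comes entirely from the expanded features.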
When to Use Polynomial Regression
- Non-Linear Relationships: When data shows curved or non-linear patterns
- U-Shaped Patterns: When relationship increases then decreases (or vice versa)
- Growth Curves: Modeling accelerating or decelerating growth
- Simple Curves: When relationship is smoothly curved but not highly complex
- Feature Engineering: Creating polynomial features for other models
How to Use in Bread
- Load Your Data: Import a CSV file with your dataset
- Visualize First: Plot your data to identify non-linear patterns
- Select Target Variable: Choose the continuous variable you want to predict
- Choose Features: Select predictor variables (works best with fewer features)
- Set Polynomial Degree: Choose degree (typically 2-4; higher risks overfitting)
- Run Model: Click "Polynomial Regression" and review the fit
- Check Metrics: Review R² and RMSE on training and test data
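The steps above (minus the UI clicks) can be sketched in code. This is a hypothetical scikit-learn workflow standing in for what Bread does internally; the synthetic array below takes the place of a loaded CSV, and the degree and split sizes are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
# Stand-in for a loaded CSV: one feature with a curved relationship plus noise
X = rng.uniform(0, 10, size=(200, 1))
y = 1.0 + 2.0 * X.ravel() - 0.2 * X.ravel() ** 2 + rng.normal(0, 0.3, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Expand features on the training set, apply the same expansion to the test set
poly = PolynomialFeatures(degree=2, include_bias=False)
Xtr = poly.fit_transform(X_train)
Xte = poly.transform(X_test)

model = LinearRegression().fit(Xtr, y_train)
r2_test = r2_score(y_test, model.predict(Xte))
rmse_test = mean_squared_error(y_test, model.predict(Xte)) ** 0.5
print(f"test R^2 = {r2_test:.3f}, test RMSE = {rmse_test:.3f}")
```

Fitting the expansion on the training set only, then reusing it on the test set, mirrors how the model would be applied to genuinely new data.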
Understanding the Output
- R² Score: Proportion of variance explained (higher is better, but watch for overfitting)
- RMSE: Root mean squared error, in the target's original units; penalizes large errors more heavily than MAE
- MAE: Average absolute error (less sensitive to outliers than RMSE)
- Coefficients: Show contribution of each polynomial term
- Fitted Curve: Visual plot showing how well the curve fits the data
- Residual Plot: Should show random scatter; patterns indicate model issues
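For reference, the metrics above can be computed by hand from true values and predictions. The numbers below are made up for illustration; the formulas are standard and match what scikit-learn's metric functions return.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Hypothetical true values and model predictions, purely for illustration
y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_pred = np.array([2.8, 5.4, 7.0, 10.3, 12.1])

r2 = r2_score(y_true, y_pred)                     # 1 - SS_res / SS_tot
rmse = mean_squared_error(y_true, y_pred) ** 0.5  # sqrt of mean squared error
mae = mean_absolute_error(y_true, y_pred)         # mean of |error|
residuals = y_true - y_pred  # plot these against y_pred; look for random scatter

print(f"R^2 = {r2:.3f}  RMSE = {rmse:.3f}  MAE = {mae:.3f}")
```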
Choosing Polynomial Degree
- Degree 1: Linear regression (straight line)
- Degree 2: Quadratic (U-shaped or inverted U-shaped curve)
- Degree 3: Cubic (S-shaped curve, one inflection point)
- Degree 4+: More complex curves (higher risk of overfitting)
Rule of Thumb: Start with degree 2, increase only if necessary. Compare train vs test R² to detect overfitting.
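The train-vs-test comparison suggested above can be sketched as follows. This assumed scikit-learn example fits several degrees to truly quadratic data: train R² can only go up as the degree grows, so the test R² is what actually identifies the right degree.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = X.ravel() ** 2 + rng.normal(0, 1.0, 60)  # truly quadratic, plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

scores = {}
for degree in (1, 2, 3, 6):
    model = make_pipeline(PolynomialFeatures(degree),
                          LinearRegression()).fit(X_tr, y_tr)
    scores[degree] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"degree {degree}: train R^2 = {scores[degree][0]:.3f}, "
          f"test R^2 = {scores[degree][1]:.3f}")
```

A degree-1 fit underperforms badly here, degree 2 captures the pattern, and higher degrees improve train R² without a matching test gain: the signature of overfitting.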
Best Practices
- Start Simple: Begin with degree 2 or 3 before trying higher degrees
- Scale Features: Standardize the expanded features (after the polynomial expansion) to prevent numerical instability from high powers
- Limit Features: Works best with 1-3 features; too many creates explosive feature count
- Watch for Overfitting: Compare training vs test performance
- Use Regularization: Consider Ridge/Lasso with polynomial features for stability
- Cross-Validate: Use k-fold cross-validation to select optimal degree
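Several of these practices combine naturally into one pipeline. The sketch below (scikit-learn assumed; step names and the parameter grid are illustrative) chains polynomial expansion, standardization, and Ridge regularization, then uses k-fold cross-validation to pick the degree and penalty strength together.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(120, 1))
y = 1 + X.ravel() - 2 * X.ravel() ** 2 + rng.normal(0, 0.3, 120)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),  # scale AFTER expansion so high powers are tamed
    ("ridge", Ridge()),
])
grid = {"poly__degree": [1, 2, 3, 4], "ridge__alpha": [0.01, 0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=5)  # 5-fold CV, scored by R^2
search.fit(X, y)
print(search.best_params_, f"CV R^2 = {search.best_score_:.3f}")
```

Keeping the scaler inside the pipeline ensures it is refit on each CV fold, so no test-fold information leaks into training.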
Tips & Warnings
- ⚠️ High-degree polynomials can wildly extrapolate beyond data range
- ⚠️ Overfitting risk increases dramatically with polynomial degree
- ⚠️ With multiple features, polynomial expansion creates many terms (e.g., 3 features at degree 3 expand to 20 terms, including the bias and interaction terms)
- ⚠️ Unstable at edges: predictions near data boundaries can be unreliable
- 💡 Always visualize the fitted curve to ensure it makes sense
- 💡 If degree > 4 needed, consider splines or non-parametric models instead
- 💡 Combine with Ridge/Lasso to reduce overfitting
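The feature-explosion warning is easy to quantify: a full expansion of n features to degree d produces C(n + d, d) terms (counting the bias and all interactions). A quick check, assuming scikit-learn's `PolynomialFeatures`:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Full polynomial expansion yields C(n_features + degree, degree) terms,
# including the bias column and all interaction terms.
counts = {}
for n_features, degree in [(1, 3), (3, 3), (10, 3), (10, 5)]:
    pf = PolynomialFeatures(degree=degree).fit(np.zeros((1, n_features)))
    counts[(n_features, degree)] = pf.n_output_features_
    print(f"{n_features} feature(s), degree {degree} -> "
          f"{pf.n_output_features_} terms")
```

Ten features at degree 5 already produce thousands of terms, which is why the guide recommends keeping the feature count small.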
Example Use Cases
- Modeling crop yield vs fertilizer amount (optimal point exists)
- Temperature effects on chemical reactions (rate increases then plateaus)
- Marketing: sales vs advertising spend (diminishing returns)
- Physics: projectile motion (parabolic trajectory)
- Economics: cost curves showing economies/diseconomies of scale
Common Pitfalls
- Overfitting: Model fits training data perfectly but fails on new data
- Extrapolation: Making predictions far outside training data range
- Feature Explosion: Too many polynomial terms with multiple features
- Numerical Instability: Very large/small coefficients with unscaled data