What is Multiple Regression?
Multiple linear regression extends simple linear regression by predicting a continuous outcome variable (Y) from several predictor variables (X₁, X₂, ..., Xₙ). The equation is: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε, where β₀ is the intercept, β₁ through βₙ are the coefficients of the predictors, and ε is the error term.
When to Use Multiple Regression
- You have multiple independent variables and one dependent variable
- You want to predict continuous numerical values
- You want to understand which variables are most important
- Example: Predicting house prices based on size, bedrooms, location, and age
How to Use in Simply ML
- Load Your Data: Click "Select File" or drag and drop your CSV/Excel file
- Select Model: Choose "Multiple Regression" from the ML Model dropdown
- Choose Y Variable: Select the variable you want to predict
- Choose X Variables: Check multiple predictor variables from the list
- Set Train/Test Split: Default is 80% training, 20% testing
- Click Run: The model will train and show diagnostic plots
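If you want to see what these steps amount to outside the app, the following is a minimal sketch in plain Python. It is not Simply ML's actual implementation, which is not described here. The helper names (solve_linear_system, fit_ols, predict, train_test_split) and the synthetic data are made up for illustration, and the fit uses the textbook normal equations rather than whatever solver the app uses.

```python
import random

def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bn] from the normal equations (XᵀX)b = Xᵀy."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Apply Y = b0 + b1*X1 + ... + bn*Xn to one row of predictors."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

def train_test_split(rows, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out test_fraction of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(rows) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [y[i] for i in train],
            [rows[i] for i in test], [y[i] for i in test])

# Synthetic, noise-free data generated from y = 5 + 2*x1 - 3*x2.
rows = [(float(i), float((i * 7) % 5)) for i in range(20)]
y = [5 + 2 * a - 3 * b for a, b in rows]

x_train, y_train, x_test, y_test = train_test_split(rows, y)  # 80/20 split
coefs = fit_ols(x_train, y_train)
test_predictions = [predict(coefs, r) for r in x_test]
print("coefficients:", [round(c, 4) for c in coefs])
```

Because the synthetic data has no noise, the fitted coefficients recover the generating values up to floating-point error. Real data will not fit this cleanly, which is why the diagnostics below matter.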
Understanding the Results
Visualizations:
- Actual vs Predicted: Plots predicted values against actual values; points close to the diagonal indicate accurate predictions
- Residuals vs Fitted: Checks for patterns in errors
- Normal Q-Q Plot: Checks whether the residuals are approximately normally distributed (points should follow the reference line)
- Residual Distribution: Histogram of prediction errors
- Leverage vs Residuals: Identifies influential observations
- VIF Analysis: Checks for multicollinearity between predictors
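- Coefficient Plot: Shows the direction (sign) and size of each coefficient; sizes are comparable across variables only when the predictors are on the same scale, for example after standardizing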
Key Metrics:
- R² Score: Proportion of variance explained (higher is better)
- Adjusted R²: R² penalized for the number of predictors, so it rises only when a new variable improves the fit more than would be expected by chance
- MSE/RMSE/MAE: Different error metrics (lower is better)
- F-statistic: Tests overall model significance
- VIF Scores: Variance Inflation Factor (VIF > 5 suggests notable multicollinearity; VIF > 10 indicates a serious problem)
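The metrics above follow standard formulas, sketched here in plain Python under the usual textbook definitions. The function names and the sample numbers are illustrative only, and the app may round or label its output differently. The VIF helper assumes you already have R²ⱼ, the R² obtained by regressing predictor j on the other predictors.

```python
import math

def regression_metrics(actual, predicted, n_predictors):
    """Compute R², adjusted R², MSE, RMSE and MAE from actual vs predicted values."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in actual)                # total variation
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    mse = ss_res / n
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return {"r2": r2, "adjusted_r2": adjusted_r2,
            "mse": mse, "rmse": math.sqrt(mse), "mae": mae}

def vif_from_r2(r2_j):
    """VIF for predictor j, given the R² of regressing it on the other predictors."""
    return 1.0 / (1.0 - r2_j)

# Illustrative values only, not output from the app.
actual = [3, 5, 7, 9, 11, 13]
predicted = [2.5, 5.5, 7, 9, 10.5, 13.5]
metrics = regression_metrics(actual, predicted, n_predictors=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note how the VIF thresholds map back to R²ⱼ: a VIF of 5 corresponds to R²ⱼ = 0.8, and a VIF of 10 to R²ⱼ = 0.9.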
Regression Coefficients Table
The coefficients table shows:
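- Coefficient: Estimated change in Y for a one-unit increase in the variable, holding the other variables constant
- Std Error: Uncertainty in the coefficient estimate
- t-value: The coefficient divided by its standard error; larger absolute values indicate stronger evidence of an effect
- P-value: Probability of seeing an effect at least this large if the variable truly had no effect (p < 0.05 is conventionally treated as significant)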
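To see how the columns of this table relate to each other, here is a small sketch. The t-value is simply the coefficient divided by its standard error. For the p-value, this sketch uses a normal approximation to the t distribution, which is reasonable only for large samples. Regression software normally uses the exact t distribution, so expect slightly different numbers from the app. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def coefficient_row(name, coefficient, std_error):
    """Build one row of a coefficients table from an estimate and its standard error."""
    t_value = coefficient / std_error
    # Two-sided p-value under a normal approximation (large-sample assumption).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_value)))
    return {"variable": name, "coefficient": coefficient,
            "std_error": std_error, "t_value": t_value, "p_value": p_value}

# Hypothetical estimates, not output from the app.
size_row = coefficient_row("size", 120.0, 40.0)  # estimate is 3 standard errors from zero
age_row = coefficient_row("age", 5.0, 10.0)      # estimate is 0.5 standard errors from zero

for row in (size_row, age_row):
    verdict = "significant" if row["p_value"] < 0.05 else "not significant"
    print(f'{row["variable"]}: t = {row["t_value"]:.2f}, p = {row["p_value"]:.4f} ({verdict})')
```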
Making Predictions
After running the model, enter comma-separated values for all your X variables in order (e.g., "1500, 3, 2, 10" for size, bedrooms, bathrooms, age) and click "Predict".
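Behind the "Predict" button, a prediction is just the regression equation applied to your inputs. The sketch below parses a comma-separated string like the one above and evaluates the equation. The intercept and coefficients are invented for illustration, and the function names are hypothetical rather than part of Simply ML.

```python
def parse_inputs(text, expected_count):
    """Turn a comma-separated string into a list of floats, one per X variable."""
    values = [float(part) for part in text.split(",")]
    if len(values) != expected_count:
        raise ValueError(f"expected {expected_count} values, got {len(values)}")
    return values

def predict(intercept, coefficients, values):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for one set of inputs."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Made-up model for size, bedrooms, bathrooms, age (same order as the input string).
intercept = 50000.0
coefficients = [100.0, 5000.0, 3000.0, -500.0]

values = parse_inputs("1500, 3, 2, 10", expected_count=len(coefficients))
price = predict(intercept, coefficients, values)
print(price)  # 50000 + 150000 + 15000 + 6000 - 5000 = 216000.0
```

The order of the values matters: swapping two inputs silently pairs them with the wrong coefficients, which is why the count check alone is not enough to catch mistakes.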
Tips & Best Practices
- Check VIF scores: Treat VIF > 5 as a warning sign; if VIF > 10, consider removing or combining highly correlated variables
- Start simple: Begin with fewer variables and add more if needed
- Standardize data: Use the "Standardize Data" button if variables have different scales
- Look for significance: Focus on variables with p-values < 0.05
- Watch for overfitting: If training R² is much higher than testing R², the model is likely overfitting; try using fewer variables
- Check assumptions: Residuals should be approximately normally distributed and show no pattern when plotted against fitted values
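As an illustration of the standardization tip, the sketch below applies z-score scaling to one column: subtract the mean, then divide by the standard deviation. This assumes the "Standardize Data" button performs z-score scaling, which is not confirmed here. The function name and the sample values are hypothetical.

```python
import statistics

def standardize(column):
    """Rescale a column to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1 denominator)
    return [(value - mean) / sd for value in column]

# Two predictors on very different scales (illustrative values).
size_sqft = [1200, 1500, 1800, 2400, 3000]
bedrooms = [2, 3, 3, 4, 5]

size_z = standardize(size_sqft)
bedrooms_z = standardize(bedrooms)
print([round(z, 2) for z in size_z])
print([round(z, 2) for z in bedrooms_z])
```

After standardizing, each coefficient describes the change in Y per one standard deviation of its predictor, which makes coefficient sizes easier to compare across variables.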