What is Simple Regression?
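Simple Linear Regression is a statistical method for predicting a continuous outcome variable (Y) from a single predictor variable (X). It models the relationship between the two variables by fitting a straight line: Y = β₀ + β₁X + ε, where β₀ is the intercept, β₁ is the slope (the expected change in Y for a one-unit increase in X), and ε is the error term. The coefficients are usually estimated by ordinary least squares, which chooses the line that minimizes the sum of squared prediction errors.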
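Ordinary least squares is the standard way to estimate β₀ and β₁. As an illustration, the minimal Python sketch below uses the textbook closed-form formulas β₁ = Sxy / Sxx and β₀ = ȳ − β₁x̄. It assumes ordinary least squares; Simply ML's internal implementation is not described here, and both the helper name `fit_simple_ols` and the house-price data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2 points")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared deviations of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    b1 = sxy / sxx             # slope
    b0 = y_mean - b1 * x_mean  # intercept: the line passes through (x_mean, y_mean)
    return b0, b1


# Hypothetical data: square footage vs. price in thousands
sqft = [1000, 1500, 2000, 2500]
price = [200, 260, 330, 380]
b0, b1 = fit_simple_ols(sqft, price)
print(f"price ≈ {b0:.1f} + {b1:.3f} * sqft")  # price ≈ 79.0 + 0.122 * sqft
```

Here the slope means each extra square foot adds about 0.122 thousand (roughly $122) to the predicted price in this made-up data.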
When to Use Simple Regression
- You have one independent variable (X) and one dependent variable (Y)
- You want to predict continuous numerical values
- You expect an approximately linear relationship between X and Y
- Example: Predicting house prices based on square footage
How to Use in Simply ML
- Load Your Data: Click "Select File" or drag and drop your CSV/Excel file
- Select Model: Choose "Simple Regression" from the ML Model dropdown
- Choose Y Variable: Select the variable you want to predict
- Choose X Variable: Select the single predictor variable
- Set Train/Test Split: The default is 80% of rows for training and 20% for testing
- Click Run: The model trains and displays diagnostic plots
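Simply ML's exact splitting procedure (for example, whether rows are shuffled or a fixed seed is used) is not described here. Purely as an illustration of what an 80/20 split means, this sketch assumes a seeded random shuffle followed by a cut at 80% of the rows; `split_train_test` is a hypothetical helper, not part of Simply ML.

```python
import random


def split_train_test(rows, train_frac=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


rows = list(range(10))  # stand-in for 10 rows of a CSV file
train, test = split_train_test(rows)
print(len(train), len(test))  # 8 2
```

The model is fitted only on the training rows; the test rows are held back so the metrics reflect how well it predicts data it has not seen.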
Understanding the Results
Visualizations:
- Actual vs Predicted: Shows how well predictions match actual values
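- Residuals vs Fitted: Checks for patterns in prediction errors (residuals should scatter randomly around zero)
- Normal Q-Q Plot: Checks whether residuals are approximately normally distributed (points should lie close to the diagonal line)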
- Residual Distribution: Shows the distribution of prediction errors
- Leverage vs Residuals: Identifies influential data points
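The quantities behind these plots can be computed directly. The sketch below, assuming coefficients already obtained from a least-squares fit, returns fitted values, residuals (actual minus fitted), and leverage using the simple-regression formula hᵢ = 1/n + (xᵢ − x̄)² / Sxx. The helper name, the data, and the coefficients b0 = 0, b1 = 2 are hypothetical.

```python
def residuals_and_leverage(x, y, b0, b1):
    """Return (fitted, residuals, leverage) for a fitted line y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    fitted = [b0 + b1 * xi for xi in x]
    # Residual = actual - fitted; residual and Q-Q plots display values like these
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Leverage in simple regression: h_i = 1/n + (x_i - x_mean)^2 / Sxx
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return fitted, residuals, leverage


# Hypothetical data with one X value far from the rest
x = [1, 2, 3, 4, 10]
y = [2.1, 3.9, 6.2, 7.8, 20.5]
fitted, residuals, leverage = residuals_and_leverage(x, y, b0=0.0, b1=2.0)
print([round(h, 2) for h in leverage])  # [0.38, 0.28, 0.22, 0.2, 0.92]
```

Leverage depends only on X: the point at X = 10 sits far from the mean and gets by far the highest value. In a one-predictor model with an intercept, the leverages always sum to 2, the number of coefficients.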
Key Metrics:
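- R² Score: Proportion of the variance in Y explained by the model (usually between 0 and 1, higher is better; on test data it can be negative if the model predicts worse than simply using the mean of Y)
- MSE: Mean Squared Error, the average of the squared differences between actual and predicted values (lower is better)
- RMSE: Root Mean Squared Error, the square root of MSE, expressed in the same units as Y (lower is better)
- MAE: Mean Absolute Error, the average absolute difference between actual and predicted values, in the same units as Y (lower is better)
- F-statistic: Tests whether the model explains significantly more variance than a model with no predictor; in simple regression this is equivalent to testing whether the slope β₁ differs from zero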
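To show how these metrics relate, here is a minimal Python sketch computing them from actual and predicted values. The F-statistic uses the one-predictor formula F = R² / ((1 − R²) / (n − 2)), which is only valid when R² comes from a least-squares fit on the same n training points. The helper names and the data are hypothetical; Simply ML may compute and report these values differently.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MSE, RMSE and MAE for paired actual and predicted values."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    y_mean = sum(y_true) / n
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot,  # can be negative on test data
        "mse": mse,
        "rmse": math.sqrt(mse),     # same units as Y
        "mae": sum(abs(e) for e in errors) / n,
    }


def f_statistic_simple(r2, n):
    """F-statistic for a one-predictor OLS fit, from the R² on its n training points."""
    if n < 3:
        raise ValueError("need at least 3 points (n - 2 residual degrees of freedom)")
    if r2 == 1:
        return math.inf
    # F = (R² / 1) / ((1 - R²) / (n - 2)), with 1 and n - 2 degrees of freedom
    return r2 / ((1 - r2) / (n - 2))


# Hypothetical training data; y = 0 + 1.9x is the least-squares line for these points
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
y_pred = [0.0 + 1.9 * xi for xi in x]
m = regression_metrics(y, y_pred)
f = f_statistic_simple(m["r2"], n=len(x))
print(round(m["r2"], 4), round(f, 2))  # 0.9627 51.57
```

RMSE and MAE are both in Y's units, but RMSE penalizes large errors more heavily because the errors are squared before averaging.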
Making Predictions
After running the model, enter a value for your X variable in the "Make a Prediction" section and click "Predict" to get the predicted Y value.
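The prediction is the fitted line evaluated at your X value, ŷ = β₀ + β₁X. A one-line sketch, using hypothetical coefficients (intercept 79.0, slope 0.122, with price in thousands) rather than output from Simply ML:

```python
def predict(b0, b1, x_new):
    """Predicted Y for a new X value from the fitted line: y_hat = b0 + b1 * x_new."""
    return b0 + b1 * x_new


# Hypothetical coefficients: 79.0 + 0.122 * 1800 = 298.6 (thousand)
y_hat = predict(79.0, 0.122, 1800)
print(round(y_hat, 1))  # 298.6
```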
Tips & Best Practices
- Check for outliers in your data before modeling
- Ensure your data shows a roughly linear relationship
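- As a rough rule of thumb, an R² above 0.7 suggests a strong fit, but acceptable values depend on the field and the data; compare R² on the test set with R² on the training set
- Check residual plots: random scatter around zero is good, while systematic patterns such as curves or a funnel shape indicate problems
- Standardizing does not change R² or the predictions in simple regression, but it rescales the slope, which can make it easier to interpret when X and Y have very different scales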
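To see what standardizing (converting to z-scores) does in a one-predictor model, this sketch fits the slope on raw and on standardized data. With both variables standardized, the slope equals Pearson's correlation r, and because R² = r² in simple regression, the fit quality is unchanged; only the units of the slope change. The helper names and house-price data are hypothetical.

```python
import math
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


def standardize(values):
    """Convert values to z-scores using the mean and sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


sqft = [1000, 1500, 2000, 2500]
price = [200, 260, 330, 380]
raw_slope = ols_slope(sqft, price)                            # thousands per sq ft
std_slope = ols_slope(standardize(sqft), standardize(price))  # SDs of Y per SD of X
r = pearson_r(sqft, price)
print(round(raw_slope, 3), round(std_slope, 4), round(r, 4))  # 0.122 0.9981 0.9981
```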