Simple Regression - Simply ML

What is Simple Regression?

Simple linear regression is a statistical method for predicting a continuous outcome variable (Y) from a single predictor variable (X). It models the relationship between the two variables by fitting a straight line: Y = β₀ + β₁X + ε, where β₀ is the intercept, β₁ is the slope (the expected change in Y for a one-unit increase in X), and ε is the error term.
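To make the equation concrete, here is a minimal sketch of ordinary least squares for one predictor, written in plain Python. It is an illustration of the standard closed-form solution, not Simply ML's actual implementation; the function name `fit_simple_regression` and the sample data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # estimate of β₁
    intercept = y_mean - slope * x_mean    # estimate of β₀
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
b0, b1 = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

The slope is the covariance of X and Y divided by the variance of X, and the intercept is chosen so the line passes through the point of means.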

When to Use Simple Regression

You have one independent variable (X) and one dependent variable (Y)
You want to predict continuous numerical values
You expect a linear relationship between variables
Example: Predicting house prices based on square footage
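One quick, rough way to check the linearity expectation before modeling is the Pearson correlation coefficient. The sketch below computes it in plain Python for the house price example; the square footage and price figures are made up for illustration, and `pearson_r` is a hypothetical helper name. A value near +1 or -1 is consistent with a linear relationship, but a scatter plot is still the better check, since curved relationships can also produce a high correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: square footage (X) and sale price in $1000s (Y)
square_feet = [850, 1200, 1500, 1800, 2400]
price_thousands = [155, 210, 250, 300, 390]

r = pearson_r(square_feet, price_thousands)
print(f"Pearson r = {r:.4f}")
```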

How to Use in Simply ML

Load Your Data: Click "Select File" or drag and drop your CSV/Excel file
Select Model: Choose "Simple Regression" from the ML Model dropdown
Choose Y Variable: Select the variable you want to predict
Choose X Variable: Select the single predictor variable
Set Train/Test Split: Default is 80% training, 20% testing
Click Run: The model will train and show diagnostic plots
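For readers who want to see what an 80/20 train/test workflow looks like in code, the steps above can be sketched roughly as follows. This is an assumption about the general technique, not a description of Simply ML's internals: the shuffling, the fixed seed, and the helper names `train_test_split` and `fit_line` are all hypothetical.

```python
import random


def train_test_split(x, y, train_fraction=0.8, seed=0):
    """Shuffle the row indices, then split them into training and test sets."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    cut = round(train_fraction * len(indices))
    train, test = indices[:cut], indices[cut:]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in test], [y[i] for i in test])


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data: ten points on the line y = 5 + 3x
x = list(range(1, 11))
y = [5 + 3 * xi for xi in x]

x_train, y_train, x_test, y_test = train_test_split(x, y)
intercept, slope = fit_line(x_train, y_train)          # train on 80%
test_predictions = [intercept + slope * xi for xi in x_test]  # evaluate on 20%
```

Holding out the test rows means the reported error reflects data the model did not see during fitting.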

Understanding the Results

Visualizations:

Actual vs Predicted: Shows how well predictions match actual values
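Residuals vs Fitted: Checks for patterns in prediction errors; residuals should scatter randomly around zero
Normal Q-Q Plot: Checks whether residuals are approximately normally distributed (points should fall close to the diagonal line)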
Residual Distribution: Shows the distribution of prediction errors
Leverage vs Residuals: Identifies influential data points
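The quantities behind these plots are straightforward to compute. The sketch below derives fitted values, residuals, and leverage for a one-predictor fit using the standard textbook formulas; how Simply ML computes or draws them is not documented here, so treat the function name and the sample data as hypothetical.

```python
def residuals_and_leverage(x, y):
    """Fit a least-squares line, then return fitted values, residuals, leverage."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean

    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # actual minus predicted
    # Leverage for one predictor: grows as x moves away from the mean of x
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return fitted, residuals, leverage


# Hypothetical data with one isolated x value (x = 10)
x = [1, 2, 3, 4, 10]
y = [2.1, 3.9, 6.2, 7.8, 20.5]

fitted, residuals, leverage = residuals_and_leverage(x, y)
most_isolated = max(range(len(x)), key=lambda i: leverage[i])
print(f"highest leverage at x = {x[most_isolated]}")
```

A point is most likely to be influential when it has both high leverage and a large residual.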

Key Metrics:

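R² Score: Proportion of variance in Y explained by the model (at most 1, higher is better; can be negative on test data if the model predicts worse than the mean of Y)
MSE: Mean Squared Error, the average of the squared prediction errors (lower is better)
RMSE: Root Mean Squared Error, the square root of MSE (in the original units of Y, lower is better)
MAE: Mean Absolute Error, the average absolute prediction error (in the original units of Y, lower is better)
F-statistic: Tests whether the model fits significantly better than an intercept-only model, i.e. whether the slope differs from zero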
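As a rough illustration of how these metrics relate to one another, the sketch below computes them from actual and predicted values using their standard definitions. Whether Simply ML reports them on the training or the test portion is not stated here, and the helper names are hypothetical. The F-statistic formula shown assumes one predictor and predictions from a least-squares fit to the same data points.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MSE, RMSE, and MAE for paired actual and predicted values."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    sse = sum(e ** 2 for e in errors)                    # unexplained variation
    y_mean = sum(y_true) / n
    sst = sum((yt - y_mean) ** 2 for yt in y_true)       # total variation
    mse = sse / n
    return {
        "r2": 1 - sse / sst,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
    }


def f_statistic(r2, n):
    """F-statistic for a one-predictor least-squares fit on n training points."""
    return (r2 / (1 - r2)) * (n - 2)


# Hypothetical example: fitted values from the least-squares line y = 2.4x
# for x = 1, 2, 3, 4
y_true = [2, 5, 8, 9]
y_pred = [2.4, 4.8, 7.2, 9.6]

metrics = regression_metrics(y_true, y_pred)
f_value = f_statistic(metrics["r2"], len(y_true))
```

RMSE and MAE share the units of Y, which makes them easier to interpret than MSE; RMSE penalizes large errors more heavily than MAE does.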

Making Predictions

After running the model, enter a value for your X variable in the "Make a Prediction" section and click "Predict" to get the predicted Y value.
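Under the hood, a prediction from a fitted line is just the intercept plus the slope times the new X value. The sketch below shows that calculation, with an optional check of whether the new value lies inside the range of X seen during training. The range check is an added illustration, not something Simply ML is known to do, and the coefficients are hypothetical.

```python
def predict(intercept, slope, x_new, x_min=None, x_max=None):
    """Return (predicted_y, within_range) for a fitted line.

    within_range is False when x_new falls outside [x_min, x_max], the range
    of X values used for training, where predictions are less trustworthy.
    """
    predicted = intercept + slope * x_new
    within_range = ((x_min is None or x_new >= x_min)
                    and (x_max is None or x_new <= x_max))
    return predicted, within_range


# Hypothetical coefficients: price (in $1000s) = 25 + 0.15 * square_feet,
# trained on homes between 850 and 2400 square feet
price, in_range = predict(25.0, 0.15, 2000, x_min=850, x_max=2400)
print(f"predicted price: {price:.1f}k, inside training range: {in_range}")
```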

Tips & Best Practices

Check for outliers in your data before modeling
Ensure your data shows a roughly linear relationship
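Treat an R² value above 0.7 as a rough rule of thumb for strong predictions; what counts as a good value depends on the field and on how noisy the data is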
Check residual plots for random scatter (good) vs systematic patterns such as curves or funnel shapes (problematic)
Standardizing is optional with a single predictor: it changes the scale of the slope and intercept but does not change the predictions or R²
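As one possible way to act on the outlier tip, the sketch below flags values using the common 1.5 × IQR rule with Python's standard `statistics` module. The rule, the multiplier, and the sample values are illustrative assumptions; Simply ML may not use this method, and flagged points should be inspected rather than removed automatically.

```python
import statistics


def iqr_outliers(values, multiplier=1.5):
    """Return the values lying outside the quartile-based fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical measurements with one suspicious value
data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(data)
print(f"possible outliers: {outliers}")
```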