What is Logistic Regression?
Logistic Regression is a classification algorithm used to predict binary outcomes (yes/no, true/false, 0/1). Despite its name, it's used for classification, not regression. It models the probability that an instance belongs to a particular class using the logistic (sigmoid) function.
The model outputs probabilities between 0 and 1, which can be converted to class predictions using a threshold (typically 0.5).
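The sigmoid and the 0.5 threshold can be sketched in a few lines; this is a minimal illustration, not Bread's internal code:

```python
# Minimal sketch: the logistic (sigmoid) function maps a real-valued
# score (the log-odds) to a probability, which a threshold turns into a class.
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A score of 0 corresponds to a probability of exactly 0.5,
# the typical decision threshold.
p = sigmoid(0.0)
label = 1 if p >= 0.5 else 0
```

Larger positive scores push the probability toward 1, larger negative scores toward 0.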
When to Use Logistic Regression
- Binary Classification: Predicting outcomes with two classes (spam/not spam, pass/fail)
- Probability Estimation: When you need probability estimates, not just class labels
- Linear Decision Boundary: When classes are linearly separable
- Interpretability: When you need to understand feature importance and coefficients
- Baseline Model: As a starting point before trying more complex classifiers
How to Use in Bread
- Load Your Data: Import a CSV file with your dataset
- Select Target Variable: Choose the binary column you want to predict (must have exactly 2 unique values)
- Choose Features: Select the predictor variables (independent variables)
- Run Model: Click "Logistic Regression" and the model will train automatically
- Review Results: Check accuracy, precision, recall, F1-score, and confusion matrix
- Examine Coefficients: Review feature coefficients to understand feature importance
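Bread runs these steps for you, but the same workflow can be sketched outside Bread with scikit-learn; the dataset below is synthetic and purely illustrative:

```python
# Hedged sketch of the train/review workflow above using scikit-learn
# on a tiny synthetic dataset (not Bread's actual API or data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                # two predictor features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # binary target (exactly 2 values)

# Hold out a test set so accuracy reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
print("coefficients: ", model.coef_[0])      # one coefficient per feature
```

The fitted coefficients play the same role as the coefficient table Bread shows after training.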
Understanding the Output
- Accuracy: Overall percentage of correct predictions
- Precision: Of the predicted positives, the fraction that were actually positive (high precision means few false positives)
- Recall: Of the actual positives, the fraction that were correctly predicted (high recall means few false negatives)
- F1-Score: Harmonic mean of precision and recall (balanced metric)
- Confusion Matrix: Shows true positives, true negatives, false positives, false negatives
- Coefficients: A positive coefficient increases the log-odds (and hence the probability) of the positive class; a negative coefficient decreases it
- ROC Curve: Visualizes model performance across different thresholds
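All of these metrics derive from the confusion matrix. As an illustration (the labels and predictions below are made up for the example):

```python
# How the metrics above relate to the confusion matrix counts.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

acc  = accuracy_score(y_true, y_pred)    # (tp + tn) / total
prec = precision_score(y_true, y_pred)   # tp / (tp + fp)
rec  = recall_score(y_true, y_pred)      # tp / (tp + fn)
f1   = f1_score(y_true, y_pred)          # 2 * prec * rec / (prec + rec)
```

Here 3 true positives, 3 true negatives, 1 false positive, and 1 false negative give accuracy, precision, recall, and F1 of 0.75 each.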
Best Practices
- Encode Target Variable: Ensure target is binary (0/1 or two distinct values)
- Handle Categorical Features: Use one-hot encoding for categorical predictors
- Scale Features: Use standardization for better convergence and coefficient interpretation
- Check Class Balance: If classes are imbalanced, consider precision/recall over accuracy
- Remove Multicollinearity: Highly correlated features can distort coefficient estimates and make them hard to interpret
- Validate Model: Use train-test split to check for overfitting
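Several of these practices (one-hot encoding, scaling, train-test split) fit together naturally in a preprocessing pipeline. A hedged sketch with scikit-learn, using a toy churn dataset invented for the example:

```python
# Sketch of the practices above: scale numeric features, one-hot encode
# categorical features, and validate on a held-out split.
# The data and column names are illustrative only.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":   [25, 40, 35, 50, 23, 60, 44, 31],
    "plan":  ["basic", "pro", "pro", "basic", "basic", "pro", "basic", "pro"],
    "churn": [1, 0, 0, 1, 1, 0, 1, 0],   # binary target, already encoded 0/1
})
X, y = df[["age", "plan"]], df["churn"]

pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),   # standardize numeric predictors
    ("cat", OneHotEncoder(), ["plan"]),   # one-hot encode categoricals
])
clf = Pipeline([("pre", pre), ("model", LogisticRegression())])

# Stratified split keeps the class balance similar in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf.fit(X_train, y_train)
```

Fitting the scaler and encoder inside the pipeline ensures they learn only from training data, avoiding leakage into the test set.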
Tips & Warnings
- ⚠️ Logistic regression assumes a linear relationship between the features and the log-odds of the outcome
- ⚠️ Outliers can significantly impact the model
- ⚠️ Works best when classes are roughly balanced
- 💡 Compare with Ridge Logistic Regression if you have many correlated features
- 💡 Use KNN or SVM if relationship is highly non-linear
- 💡 Consider polynomial features to capture non-linear patterns
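The polynomial-features tip can be seen on a deliberately non-linear problem; the circular boundary below is a contrived illustration:

```python
# Hedged sketch: a circular class boundary that a plain linear model
# cannot separate, but degree-2 polynomial features can.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)  # inside a circle → class 1

linear = LogisticRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LogisticRegression()).fit(X, y)

print("linear accuracy:", linear.score(X, y))  # capped by the straight boundary
print("poly accuracy:  ", poly.score(X, y))    # squared terms fit the circle
```

The degree-2 expansion adds x₁², x₂², and x₁x₂ as features, so the "linear" boundary in the expanded space can be a circle in the original one.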
Example Use Cases
- Medical diagnosis (disease present/absent based on symptoms)
- Email spam detection (spam/not spam)
- Customer churn prediction (will leave/will stay)
- Credit approval (approve/deny loan)
- Marketing response (clicked ad/didn't click)