Simply ML

What is Logistic Regression?

Logistic Regression is a classification algorithm used to predict binary outcomes (yes/no, true/false, 0/1). Despite its name, it's used for classification, not regression. It models the probability that an instance belongs to a particular class using the logistic (sigmoid) function.

The model outputs probabilities between 0 and 1, which can be converted to class predictions using a threshold (typically 0.5).
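The squashing-and-thresholding idea above can be sketched in a few lines of Python (NumPy used here purely for illustration; Bread does this internally):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear score z = w.x + b is squashed into a probability,
# then converted to a class label with a 0.5 threshold.
scores = np.array([-2.0, 0.0, 3.0])
probs = sigmoid(scores)             # sigmoid(0) is exactly 0.5
preds = (probs >= 0.5).astype(int)  # [0, 1, 1]
```

Raising the threshold above 0.5 makes the model more conservative about predicting the positive class; lowering it does the opposite.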

When to Use Logistic Regression

  • Binary Classification: Predicting outcomes with two classes (spam/not spam, pass/fail)
  • Probability Estimation: When you need probability estimates, not just class labels
  • Linear Decision Boundary: When a roughly linear boundary separates the classes well
  • Interpretability: When you need to understand feature importance and coefficients
  • Baseline Model: As a starting point before trying more complex classifiers

How to Use in Bread

  1. Load Your Data: Import a CSV file with your dataset
  2. Select Target Variable: Choose the binary column you want to predict (must have exactly 2 unique values)
  3. Choose Features: Select the predictor variables (independent variables)
  4. Run Model: Click "Logistic Regression" and the model will train automatically
  5. Review Results: Check accuracy, precision, recall, F1-score, and confusion matrix
  6. Examine Coefficients: Review feature coefficients to understand feature importance
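Bread runs these steps through its interface. For reference, the same workflow looks roughly like this in scikit-learn (synthetic data stands in for the CSV you would load in step 1):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Steps 1-3: load data, pick target y and features X
# (synthetic stand-in for an imported CSV)
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 4: train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 5: review results (accuracy on held-out data)
acc = model.score(X_test, y_test)

# Step 6: examine coefficients, one per feature
coefs = model.coef_[0]
```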

Understanding the Output

  • Accuracy: Overall percentage of correct predictions
  • Precision: Of predicted positives, how many were actually positive (high precision means few false positives)
  • Recall: Of actual positives, how many were correctly predicted (high recall means few false negatives)
  • F1-Score: Harmonic mean of precision and recall (balanced metric)
  • Confusion Matrix: Shows true positives, true negatives, false positives, false negatives
  • Coefficients: A positive coefficient increases the probability of the positive class; a negative one decreases it
  • ROC Curve: Visualizes model performance across different thresholds
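To see how these metrics relate, here is a small hand-checkable example using scikit-learn's metric functions (the same quantities Bread reports):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]  # one false negative, one false positive

acc  = accuracy_score(y_true, y_pred)    # 4 of 6 correct
prec = precision_score(y_true, y_pred)   # 2 of 3 predicted positives are right
rec  = recall_score(y_true, y_pred)      # 2 of 3 actual positives are found
f1   = f1_score(y_true, y_pred)          # harmonic mean of the two above
cm   = confusion_matrix(y_true, y_pred)  # rows = actual, cols = predicted
```

Here precision and recall both equal 2/3, so the F1-score is also 2/3; the confusion matrix is [[2, 1], [1, 2]].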

Best Practices

  • Encode Target Variable: Ensure target is binary (0/1 or two distinct values)
  • Handle Categorical Features: Use one-hot encoding for categorical predictors
  • Scale Features: Use standardization for better convergence and coefficient interpretation
  • Check Class Balance: If classes are imbalanced, consider precision/recall over accuracy
  • Remove Multicollinearity: Highly correlated features can affect coefficient interpretation
  • Validate Model: Use train-test split to check for overfitting
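Several of the practices above (scaling, class balance, train-test validation) can be combined in one short sketch; the scikit-learn pipeline shown here is an illustrative equivalent of what Bread does for you:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=5, random_state=1)

# stratify=y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Putting the scaler in a pipeline ensures it is fit on the
# training data only, avoiding leakage into the test set.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
test_acc = clf.score(X_test, y_test)
```

Comparing training and test accuracy is a quick overfitting check: a large gap suggests the model memorized the training data.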

Tips & Warnings

  • ⚠️ Logistic regression assumes a linear relationship between the features and the log-odds
  • ⚠️ Outliers can significantly impact the model
  • ⚠️ Works best when classes are roughly balanced
  • 💡 Compare with Ridge Logistic Regression if you have many correlated features
  • 💡 Use KNN or SVM if relationship is highly non-linear
  • 💡 Consider polynomial features to capture non-linear patterns
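The polynomial-features tip can be demonstrated on data a straight line cannot separate (concentric circles; scikit-learn used here as an illustrative stand-in):

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric rings: no straight line separates them
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

linear = LogisticRegression().fit(X, y)  # near chance-level accuracy

# Degree-2 terms (x1**2, x2**2, x1*x2) let the linear model
# learn a circular decision boundary in the expanded feature space.
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LogisticRegression()).fit(X, y)
```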

Example Use Cases

  • Medical diagnosis (disease present/absent based on symptoms)
  • Email spam detection (spam/not spam)
  • Customer churn prediction (will leave/will stay)
  • Credit approval (approve/deny loan)
  • Marketing response (clicked ad/didn't click)