🩺 Breast Cancer Classification with Logistic Regression

📌 Project Overview

This project applies logistic regression to classify breast cancer tumors as malignant or benign using the Breast Cancer Wisconsin dataset.

The main goals are to:

Explore and preprocess real-world medical data
Build and evaluate a logistic regression classifier
Interpret coefficients to understand key predictive features
Discuss limitations and ethical considerations of applying ML in healthcare

📊 Dataset

Source: Kaggle – Breast Cancer Wisconsin (Diagnostic)
Size: 569 rows, 30 numeric features (plus an ID column and the diagnosis label)
Target: Diagnosis (Malignant = 1, Benign = 0)
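
For a quick look at the data without downloading the Kaggle CSV, here is a minimal sketch. It assumes scikit-learn's bundled copy of the same dataset (`load_breast_cancer`) is an acceptable stand-in for the Kaggle file; the notebook itself may load the data differently. Note that scikit-learn labels malignant as 0, so the target is flipped to match the encoding above.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy stands in for the Kaggle CSV.
data = load_breast_cancer(as_frame=True)
X = data.data  # one row per tumor, one column per numeric feature

# scikit-learn encodes malignant as 0 and benign as 1, so flip the labels
# to match this README's convention (Malignant = 1, Benign = 0).
y = 1 - data.target

print(X.shape)           # (rows, features)
print(y.value_counts())  # class balance
```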

🛠️ Tools & Libraries

Python (pandas, numpy, matplotlib, seaborn)
scikit-learn (Logistic Regression, metrics, preprocessing)
Jupyter Notebook for documentation and analysis

🔎 Workflow

Exploratory Data Analysis (EDA): Visualize class balance and feature distributions.
Preprocessing: Standardize the numeric features so coefficients are comparable, and encode the categorical diagnosis label as 1/0.
Modeling: Train a logistic regression classifier.
Evaluation: Assess accuracy, precision, recall, F1, confusion matrix, and ROC-AUC.
Interpretation: Analyze model coefficients.
Discussion: Highlight ethical implications and limitations.
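
The preprocessing, modeling, and evaluation steps above can be sketched roughly as follows. This is an illustration, not the notebook's exact code: the 80/20 stratified split, `random_state=42`, `max_iter=1000`, and the use of scikit-learn's bundled dataset are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # flip so that Malignant = 1, Benign = 0

# Hold out a stratified test set (split size and seed are illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling inside the pipeline means the scaler is fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of malignant

cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print(cm)
print(classification_report(y_test, y_pred, target_names=["benign", "malignant"]))
print(f"ROC-AUC: {auc:.3f}")
```

Putting the scaler in the pipeline, rather than scaling the full dataset up front, keeps test-set statistics from leaking into training.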

✅ Results (Summary)

Logistic regression achieved near-perfect ROC-AUC on this dataset because the malignant and benign classes are almost linearly separable.
Features related to tumor size, texture, and shape (e.g., mean radius, concavity) were the strongest predictors.
The model is not intended for clinical use but demonstrates how ML methods can support healthcare analytics.
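
One way to inspect which features carry the most weight is to rank the standardized coefficients by absolute value. The sketch below is illustrative only: it fits on the full bundled dataset (an assumption made for brevity), so the ranking may differ from the notebook's.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the Kaggle CSV.
data = load_breast_cancer(as_frame=True)
X, y = data.data, 1 - data.target  # Malignant = 1, Benign = 0

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
coefs = pd.Series(model.named_steps["logisticregression"].coef_[0], index=X.columns)

# Rank by magnitude; the sign shows the direction (positive pushes toward malignant).
top = coefs.reindex(coefs.abs().sort_values(ascending=False).index).head(10)
print(top)
```

Many of these features are strongly correlated with one another (radius, perimeter, and area, for instance), so individual coefficients should be read as a rough guide rather than a precise importance score.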

⚖️ Ethical Considerations

Predictive models in healthcare require rigorous testing, validation, and expert oversight.
False negatives (missed malignant cases) are especially harmful and must be carefully addressed.
This project is a proof of concept for learning purposes only.
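
A common way to trade false negatives for false positives is to lower the decision threshold below the default 0.5. The sketch below illustrates the idea; the 0.3 threshold is a hypothetical value chosen for the example, not a recommendation, and the split settings and bundled dataset are likewise assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # Malignant = 1, Benign = 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of malignant

results = {}
for threshold in (0.5, 0.3):  # 0.3 is a hypothetical, more cautious cutoff
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results[threshold] = {"fn": int(fn), "fp": int(fp)}
    print(
        f"threshold={threshold}: false negatives={fn}, "
        f"false positives={fp}, recall={recall_score(y_test, y_pred):.3f}"
    )
```

Lowering the threshold can only keep or reduce the number of missed malignant cases, at the cost of flagging more benign tumors for follow-up. Where to set it is a clinical judgment, not a modeling one.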

🎯 Key Takeaways

Logistic regression is interpretable and, on well-separated data like this, effective for binary classification.
Standardizing features is critical for fair coefficient comparison.
Even when performance is high, transparency and ethics are essential in healthcare ML applications.
