DomDom268/Breast-Cancer-Classification

🩺 Breast Cancer Classification with Logistic Regression

📌 Project Overview

This project applies logistic regression to classify breast cancer tumors as malignant or benign using the Breast Cancer Wisconsin dataset.

The main goals are to:

  • Explore and preprocess real-world medical data
  • Build and evaluate a logistic regression classifier
  • Interpret coefficients to understand key predictive features
  • Discuss limitations and ethical considerations of applying ML in healthcare

📊 Dataset

  • Source: [Kaggle – Breast Cancer Wisconsin (Diagnostic)]
  • Size: 569 rows, 30 numeric features (plus an ID column and the diagnosis label)
  • Target: Diagnosis (Malignant = 1, Benign = 0)
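
A minimal sketch of loading the data and applying the target encoding listed above. It assumes scikit-learn's bundled copy of the Wisconsin Diagnostic dataset as a stand-in for the Kaggle CSV; the notebook itself may load the data differently. Note that scikit-learn labels malignant as 0, so the sketch flips the labels to match Malignant = 1.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy stands in for the Kaggle CSV.
data = load_breast_cancer()
X = data.data                      # feature matrix, one row per tumor
feature_names = data.feature_names

# scikit-learn encodes malignant as 0 and benign as 1; flip the labels so
# that malignant = 1 and benign = 0, as in the target convention above.
y = 1 - data.target

n_malignant = int(np.sum(y == 1))
n_benign = int(np.sum(y == 0))
print(f"rows={X.shape[0]}, features={X.shape[1]}")
print(f"malignant={n_malignant}, benign={n_benign}")
```

The class counts show that the dataset is moderately imbalanced toward benign cases, which is worth keeping in mind when reading accuracy figures.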

πŸ› οΈ Tools & Libraries

  • Python (pandas, numpy, matplotlib, seaborn)
  • scikit-learn (Logistic Regression, metrics, preprocessing)
  • Jupyter Notebook for documentation and analysis

🔎 Workflow

  1. Exploratory Data Analysis (EDA): Visualize class balance and feature distributions.
  2. Preprocessing: Encode the diagnosis label (M/B) as 1/0 and standardize the numeric features so that coefficients are comparable.
  3. Modeling: Train a logistic regression classifier.
  4. Evaluation: Assess accuracy, precision, recall, F1, confusion matrix, and ROC-AUC.
  5. Interpretation: Analyze model coefficients.
  6. Discussion: Highlight ethical implications and limitations.
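
The modeling and evaluation steps above can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's exact code: the 80/20 stratified split, `random_state=42`, `max_iter=1000`, and the use of scikit-learn's bundled dataset are all choices made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, labels flipped so that malignant = 1.
data = load_breast_cancer()
X, y = data.data, 1 - data.target

# Hold out a stratified test set (split ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Putting the scaler inside the pipeline means it is fitted on the
# training data only, which avoids leaking test statistics.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of malignant

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

ROC-AUC is computed from the predicted probabilities rather than the hard labels, so it reflects ranking quality across all thresholds.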

✅ Results (Summary)

  • Logistic regression achieved near-perfect ROC-AUC, because the two classes in this dataset are almost cleanly separable.
  • Features related to tumor size, texture and shape (e.g., mean radius, concavity) were the strongest predictors.
  • The model is not intended for clinical use but demonstrates how ML methods can support healthcare analytics.
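
One way to see which features carry the most weight is to rank the coefficients of a model fitted on standardized inputs. The sketch below is an assumption about how this could be done, using scikit-learn's bundled dataset and fitting on all rows purely for illustration; the exact ranking depends on the split and regularization used.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, labels flipped so that malignant = 1.
data = load_breast_cancer()
X, y = data.data, 1 - data.target

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
coefs = model.named_steps["logisticregression"].coef_[0]

# Sort features by absolute coefficient size, largest first.
order = np.argsort(np.abs(coefs))[::-1]
ranked = [(str(data.feature_names[i]), float(coefs[i])) for i in order]

for name, coef in ranked[:10]:
    # A positive coefficient pushes the prediction toward malignant (1).
    print(f"{name:25s} {coef:+.3f}")
```

Many of these features are strongly correlated (radius, perimeter, and area, for example), so individual coefficients should be read with caution.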

βš–οΈ Ethical Considerations

  • Predictive models in healthcare require rigorous testing, validation, and expert oversight.
  • False negatives (missed malignant cases) are especially harmful and must be carefully addressed.
  • This project is a proof of concept for learning purposes only.
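
To illustrate the point about false negatives, one common option is to lower the decision threshold so that more borderline cases are flagged as malignant, at the cost of more false positives. The sketch below is a hypothetical illustration, not part of the original analysis; the threshold of 0.3, the split, and the helper `count_errors` are assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, labels flipped so that malignant = 1.
data = load_breast_cancer()
X, y = data.data, 1 - data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of malignant


def count_errors(threshold):
    """Return (false_negatives, false_positives) at a given threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return int(fn), int(fp)


fn_default, fp_default = count_errors(0.5)  # scikit-learn's default cutoff
fn_low, fp_low = count_errors(0.3)          # hypothetical, more cautious cutoff

print(f"threshold 0.5: missed malignant={fn_default}, false alarms={fp_default}")
print(f"threshold 0.3: missed malignant={fn_low}, false alarms={fp_low}")
```

Lowering the threshold can never increase the number of missed malignant cases on a fixed test set, but choosing an actual operating point would require clinical input and validation on independent data.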

🎯 Key Takeaways

  • Logistic regression is both interpretable and effective for binary classification.
  • Standardizing features is critical for fair coefficient comparison.
  • Even when performance is high, transparency and ethics are essential in healthcare ML applications.
