This repository contains the resources, taxonomy, and data associated with our survey paper: "Bias in Large AI Models for Medicine and Healthcare: Survey and Challenges".
Large AI models (including LLMs, LVMs, and LMMs) are transforming healthcare, yet they risk perpetuating or amplifying medical biases. This project provides a comprehensive synthesis of 55 representative studies, organizing the literature into a clear taxonomy of bias, detection methods, and mitigation strategies.
Figure 1: An overview of bias in Large AI models for medicine and healthcare.
- Taxonomy: A dual taxonomy categorizing bias by Medical Scenarios (e.g., triage, education) and Clinical Specialties (e.g., cardiology, oncology).
- Resources: A structured index of Large AI Models and Datasets used in bias research.
- Methodology: A review of current techniques for Bias Detection (e.g., counterfactual testing) and Mitigation (pre-, in-, and post-processing).
- Future Directions: Identification of open problems such as the fairness-accuracy trade-off and global health inequities.
We categorize medical bias along two principal axes, Medical Scenarios and Clinical Specialties, to facilitate precise identification and mitigation. Along the first axis, we distinguish:
- Clinical Decision Support: Disparities in diagnostic reasoning or treatment planning.
- Patient Communication: Biased triage advice or health counseling via chatbots.
- Medical Documentation: Stereotypes or hallucinations in report generation and summarization.
- Medical Education: Misrepresentation in generated case vignettes or training materials.
Along the second axis, our survey covers biases identified in specific clinical specialties, including:
- 🫀 Cardiology
- 🫁 Pulmonology
- 🦀 Oncology
- 🦠 Infectious Disease
- 👁️ Ophthalmology
- 🧠 Mental Health & Psychiatry
Below are selected general-purpose models analyzed in the survey.
| Model Name | Family | Parameter Size | Open Source? |
|---|---|---|---|
| GPT-4 | GPT | ≥ 175B | No |
| GPT-3.5 | GPT | ≥ 175B | No |
| Claude-3.5 | Claude | ≥ 175B | No |
| Llama-3 | Llama | ≥ 175B | Yes |
| Qwen-2.5 | Qwen | ≥ 175B | Yes |
| DeepSeek-V3 | DeepSeek | ≥ 175B | Yes |
Below are selected medical-domain models analyzed in the survey.
| Model Name | Family | Parameter Size | Open Source? |
|---|---|---|---|
| Med-PaLM 2 | PaLM 2 | ≥ 175B | No |
| Meditron | Llama-2 | 70B-175B | Yes |
| PMC-LLaMA | LLaMA | 10B-70B | Yes |
| LLaVA-Med | LLaVA | 1B-10B | Yes |
| ClinicalBERT | BERT | < 1B | Yes |
We have compiled datasets across three modalities: Text, Image, and Multimodal.
- Text: MedQA, PubMedQA, MIMIC-IV, AMQA, BiasMD.
- Image: CheXpert, MIMIC-CXR, HAM10000, ODIR, Fitzpatrick17k.
- Multimodal: LLaVA-Med, ROCO, PMC-OA.
Current bias detection approaches pair controlled input generation with systematic evaluation:

- Input Generation: Creating synthetic patients or mutating existing clinical vignettes (e.g., changing "Male" to "Female").
- Evaluation Metrics:
  - Answer Consistency: Measuring robustness across demographic changes.
  - Fairness Metrics: Demographic Parity, Equalized Odds.
  - Human Expert Assessment: Physician review for complex scenarios.
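The detection pipeline above can be sketched end to end. The snippet below is a minimal illustration under stated assumptions, not any surveyed study's implementation: `model_triage` is a hypothetical stand-in for a real model API call, and the vignette template is invented for the example.

```python
import itertools

# Hypothetical stub standing in for a real LLM API call.
def model_triage(vignette: str) -> str:
    # Toy rule for illustration only; a real study would query a model here.
    return "urgent" if "chest pain" in vignette else "routine"

TEMPLATE = "A 54-year-old {sex} patient presents with chest pain and dyspnea."

def counterfactual_vignettes(template: str, values: list[str]) -> list[str]:
    """Mutate a single demographic slot to build counterfactual inputs."""
    return [template.format(sex=v) for v in values]

def answer_consistency(answers: list[str]) -> float:
    """Fraction of counterfactual pairs that receive the same answer."""
    pairs = list(itertools.combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

def demographic_parity_gap(group_answers: dict[str, list[str]], positive: str) -> float:
    """Largest difference in positive-outcome rate across demographic groups."""
    rates = [sum(a == positive for a in ans) / len(ans)
             for ans in group_answers.values()]
    return max(rates) - min(rates)

vignettes = counterfactual_vignettes(TEMPLATE, ["male", "female"])
answers = [model_triage(v) for v in vignettes]
print(answer_consistency(answers))  # 1.0 for this toy rule
```

Equalized Odds would additionally condition these rates on ground-truth outcomes, so it requires labeled cases rather than vignettes alone.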
Mitigation strategies span three stages of the model lifecycle:

- Pre-processing: Data augmentation and rebalancing before training.
- In-processing: Model fine-tuning (e.g., FairCLIP), loss function modification.
- Post-processing: Prompt engineering (Chain-of-Thought), output rewriting, and ensembling.
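As a concrete sketch of the pre-processing stage, the following oversamples under-represented groups so each value of a sensitive attribute appears equally often in the training set. This is a generic rebalancing sketch assuming tabular records with a demographic field; it is not tied to any dataset or method in the survey.

```python
import random

def rebalance(records: list[dict], attr: str, seed: int = 0) -> list[dict]:
    """Oversample minority groups so every value of `attr` is equally represented."""
    rng = random.Random(seed)
    groups: dict[str, list[dict]] = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r)
    target = max(len(g) for g in groups.values())
    balanced: list[dict] = []
    for g in groups.values():
        balanced.extend(g)  # keep every original record
        # draw extra samples (with replacement) to reach the target count
        balanced.extend(rng.choices(g, k=target - len(g)))
    return balanced

data = [{"sex": "M"}, {"sex": "M"}, {"sex": "M"}, {"sex": "F"}]
print(len(rebalance(data, "sex")))  # 6: three of each group
```

Random oversampling is the simplest option; the augmentation approaches cited above would instead synthesize new counterfactual records rather than duplicate existing ones.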
Based on our analysis, we highlight the following urgent research directions:
- Unified Foundations: Defining "medical fairness" distinct from general AI fairness.
- Standardized Benchmarks: Moving beyond ad-hoc testing to rigorous, scalable benchmarks.
- Real-World Validation: Continuous monitoring of models in deployed clinical settings.
- Global Health Equity: Addressing the lack of representation for non-Western populations and languages.
- Fairness-Accuracy Trade-off: Investigating how debiasing affects diagnostic performance.
If you find this survey or repository helpful, please cite our work:
@article{xiao2025bias,
  title={Bias in Large AI Models for Medicine and Healthcare: Survey and Challenges},
  author={Xiao, Ying and Chen, Zhenpeng and Huang, Jen-tse and Chen, Wenting and Liu, Yepang and Li, Kezhi and Mousavi, Mohammadreza and Dobson, Richard and Zhang, Jie},
  year={2025}
}
}