This project is a symptom-to-diagnosis prediction system that provides:
- Real-time prediction of the most probable disease condition(s)
- Intelligent specialist routing to the appropriate medical practitioner
The system accepts natural language symptom descriptions, maps them into structured binary features, and uses pre-trained XGBoost classification models to generate clinical predictions.
This project is designed for educational research, prototyping, and demonstration of ML web integration.
This system is built on a structured symptom dataset with binary feature encoding and balanced disease classes. It uses two independent XGBoost models: one for disease classification and one for specialist recommendation. XGBoost is chosen for its strong performance on tabular data, ability to capture non-linear symptom patterns, and built-in regularization for generalization on synthetic datasets.
At runtime, the pipeline ingests free-text symptoms, maps phrases to known symptom features, generates a binary feature vector, scores disease probabilities, and outputs the top result(s) with a recommended specialist.
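The text-to-feature step could look roughly like the sketch below. The matching rule (a column such as `stomach_pain` matches the phrase "stomach pain") and the four-column list are assumptions for illustration; the project's actual mapping logic and full symptom list are not shown here.

```python
import re

# Hypothetical feature order; the real one would come from symptom_columns.pkl.
SYMPTOM_COLUMNS = ["fever", "cough", "stomach_pain", "tiredness"]


def text_to_vector(text, symptom_columns):
    """Return a 0/1 list marking which known symptoms appear in the text."""
    # Lowercase and turn punctuation into spaces so whole phrases match cleanly.
    normalized = " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "
    vector = []
    for column in symptom_columns:
        # Assumed rule: underscores in column names stand for spaces.
        phrase = " " + column.replace("_", " ") + " "
        vector.append(1 if phrase in normalized else 0)
    return vector


print(text_to_vector("Fever, cough and tiredness since Monday", SYMPTOM_COLUMNS))
# [1, 1, 0, 1]
```

Matching on whole phrases means "feverish" does not trigger `fever`. A real mapper would likely also need synonyms and misspelling handling.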
The system is trained on a structured dataset representing common health conditions.
- Each row represents one patient
- Symptoms are encoded as binary values (0 = absent, 1 = present)
- Dataset is balanced across disease classes
- Symptom names use layman terminology
indian_symptom_dataset_layman_50plus_with_doctor.csv
| fever | cough | stomach_pain | tiredness | ... | disease | doctor_type |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | ... | pneumonia | pulmonologist |
- Input Features: Binary symptom vector
- Output Labels: disease, doctor_type
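As an illustration of this layout, the following sketch separates the symptom columns from the two label columns using only the standard library. The inline CSV is a stand-in built from the example row in the table, with the elided columns left out. The function name is hypothetical.

```python
import csv
import io

# Inline stand-in for the CSV file, using the example row from the table.
SAMPLE_CSV = """fever,cough,stomach_pain,tiredness,disease,doctor_type
1,1,0,1,pneumonia,pulmonologist
"""

LABEL_COLUMNS = ("disease", "doctor_type")


def split_features_and_labels(csv_file):
    """Read rows and separate symptom columns from the two label columns."""
    reader = csv.DictReader(csv_file)
    symptom_columns = [c for c in reader.fieldnames if c not in LABEL_COLUMNS]
    features, diseases, doctors = [], [], []
    for row in reader:
        # Symptom cells are "0"/"1" strings in the file; convert to integers.
        features.append([int(row[c]) for c in symptom_columns])
        diseases.append(row["disease"])
        doctors.append(row["doctor_type"])
    return symptom_columns, features, diseases, doctors


columns, X, y_disease, y_doctor = split_features_and_labels(io.StringIO(SAMPLE_CSV))
print(columns)    # ['fever', 'cough', 'stomach_pain', 'tiredness']
print(X[0])       # [1, 1, 0, 1]
print(y_disease)  # ['pneumonia']
print(y_doctor)   # ['pulmonologist']
```

The same feature matrix feeds both models, and only the label column differs.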
project-root/
│
├── indian_symptom_dataset_layman_50plus_with_doctor.csv
│
├── disease_model.pkl
├── doctor_model.pkl
├── disease_encoder.pkl
├── doctor_encoder.pkl
├── symptom_columns.pkl
│
├── xgbtestonterminal.py
├── xgbtest.py
│
└── README.md
- CLI-based testing interface (xgbtestonterminal.py)
- Accepts symptom input via terminal
- Runs predictions locally
- Displays ranked disease results and recommended doctor
- Used for quick model validation
- Flask API implementation (xgbtest.py)
- Exposes a /predict endpoint
- Accepts JSON symptom input
- Returns structured JSON response
- Designed for integration with PHP or web applications
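The request handling behind such an endpoint might be organised as in this framework-independent sketch. The `predictor` callable, the error messages, and the 400 status for bad input are assumptions, not behaviour confirmed by the project.

```python
def handle_predict(payload, predictor):
    """Validate a /predict request body and build the JSON-ready response.

    `predictor` is a hypothetical callable that takes the symptom text and
    returns (disease, confidence_percent, doctor_type).
    """
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}, 400
    symptoms = payload.get("symptoms")
    if not isinstance(symptoms, str) or not symptoms.strip():
        return {"error": "'symptoms' must be a non-empty string"}, 400

    disease, confidence, doctor = predictor(symptoms)
    body = {
        "disease": disease,
        "confidence": round(confidence, 2),
        "recommended_doctor": doctor,
    }
    return body, 200


# Stand-in predictor so the sketch runs without the trained models.
def fake_predictor(symptoms):
    return "pneumonia", 72.45, "pulmonologist"


print(handle_predict({"symptoms": "fever cough breathing problem"}, fake_predictor))
print(handle_predict({}, fake_predictor))
```

In a Flask route, the payload would typically come from `request.get_json(silent=True)`, and the returned pair could be sent back as `jsonify(body), status`.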
- disease_model.pkl - Trained XGBoost disease classifier
- doctor_model.pkl - Trained XGBoost specialist classifier
- disease_encoder.pkl - Label encoder for disease names
- doctor_encoder.pkl - Label encoder for specialist names
- symptom_columns.pkl - Feature ordering metadata
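A minimal loader for these five files might look like the sketch below. It assumes they were written with Python's `pickle` module, which the `.pkl` extension suggests but does not guarantee. The `load_artifacts` helper and its dictionary keys are hypothetical.

```python
import pickle
from pathlib import Path

ARTIFACT_FILES = {
    "disease_model": "disease_model.pkl",
    "doctor_model": "doctor_model.pkl",
    "disease_encoder": "disease_encoder.pkl",
    "doctor_encoder": "doctor_encoder.pkl",
    "symptom_columns": "symptom_columns.pkl",
}


def load_artifacts(directory="."):
    """Unpickle every artifact and return them in a dict keyed by role."""
    artifacts = {}
    for name, filename in ARTIFACT_FILES.items():
        path = Path(directory) / filename
        if not path.exists():
            raise FileNotFoundError(f"missing artifact: {path}")
        with path.open("rb") as handle:
            # Only unpickle files you trust; pickle can execute arbitrary code.
            artifacts[name] = pickle.load(handle)
    return artifacts
```

Calling `load_artifacts()` from the project root returns all five objects at once. Unpickling the models and encoders needs the libraries that created them installed in the same environment.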
Two independent models are used:
- Disease model: multi-class classification
- Input: Binary symptom vector
- Output: Probability distribution across disease classes
- Objective: Multi-class log loss
- Specialist model: multi-class classification
- Input: Same symptom vector
- Output: Recommended medical specialist
- Operates independently from disease prediction
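To make the multi-class log loss objective concrete, here is a small worked sketch with made-up probabilities over three hypothetical classes. In XGBoost this loss usually corresponds to the `multi:softprob` objective with the `mlogloss` metric, though the exact training settings used in this project are not stated.

```python
import math


def multiclass_log_loss(true_indices, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the correct class."""
    total = 0.0
    for true_index, row in zip(true_indices, probabilities):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(row[true_index], eps), 1.0)
        total += -math.log(p)
    return total / len(true_indices)


# Two samples over three hypothetical classes.
y_true = [0, 2]
y_prob = [
    [0.7, 0.2, 0.1],  # correct class 0 gets 0.7
    [0.1, 0.3, 0.6],  # correct class 2 gets 0.6
]
print(round(multiclass_log_loss(y_true, y_prob), 4))  # 0.4338
```

The loss falls toward 0 as the probability assigned to the correct class approaches 1, so it rewards confident correct predictions and penalises confident wrong ones heavily.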
- User enters symptom description
- Text is mapped to structured binary features
- Feature vector passed to disease model
- Probabilities computed across all diseases
- Top results extracted (or only the highest-probability disease)
- Specialist model predicts doctor type
- Results returned to CLI or API response
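The scoring and ranking steps above can be sketched as follows. The function assumes scikit-learn style objects: models with `predict_proba` and `predict`, and encoders with `inverse_transform`. The stub classes and the disease and specialist names are made-up stand-ins so the example runs without the trained files.

```python
def predict(vector, disease_model, disease_encoder, doctor_model, doctor_encoder, top_k=3):
    """Run both models on one feature vector and shape the result."""
    # predict_proba returns one row of class probabilities per input row.
    probabilities = list(disease_model.predict_proba([vector])[0])
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    top = ranked[:top_k]
    names = disease_encoder.inverse_transform(top)
    # The specialist model sees the same vector, independent of the disease result.
    doctor_index = doctor_model.predict([vector])[0]
    doctor = doctor_encoder.inverse_transform([doctor_index])[0]
    return {
        "ranked_diseases": [
            (str(name), round(float(probabilities[i]) * 100, 2)) for name, i in zip(names, top)
        ],
        "recommended_doctor": str(doctor),
    }


class StubModel:
    """Stand-in exposing predict/predict_proba; not a real classifier."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def predict_proba(self, rows):
        return [self.probabilities for _ in rows]

    def predict(self, rows):
        best = max(range(len(self.probabilities)), key=lambda i: self.probabilities[i])
        return [best for _ in rows]


class StubEncoder:
    """Stand-in mapping class indices back to names."""

    def __init__(self, classes):
        self.classes = classes

    def inverse_transform(self, indices):
        return [self.classes[i] for i in indices]


result = predict(
    [1, 1, 0, 1],
    StubModel([0.2, 0.7, 0.1]),
    StubEncoder(["asthma", "pneumonia", "flu"]),
    StubModel([0.1, 0.9]),
    StubEncoder(["general_physician", "pulmonologist"]),
    top_k=2,
)
print(result)
```

With the real pickled models, the input may need to be a NumPy array (for example `numpy.asarray([vector])`) rather than a nested list.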
Python 3.9+
Install dependencies using the requirements file:
pip install -r requirements.txt
python xgbtestonterminal.py
Used for:
- Model validation
- Local testing
- Demonstration without web integration
python xgbtest.py
API Endpoint:
POST http://127.0.0.1:5000/predict
Example Request:
{
  "symptoms": "fever cough breathing problem"
}
Example Response:
{
  "disease": "pneumonia",
  "confidence": 72.45,
  "recommended_doctor": "pulmonologist"
}
This API can be integrated with:
- PHP backend
- Web frontend
- Postman testing
- Other REST clients
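For a client written in Python, the request could be built with the standard library as in this sketch. The helper names are hypothetical, and `call_predict` only works while the Flask API is running locally.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"


def build_predict_request(symptoms, url=API_URL):
    """Build a POST request carrying the symptom text as a JSON body."""
    body = json.dumps({"symptoms": symptoms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(symptoms, url=API_URL, timeout=10):
    """Send the request and decode the JSON response (needs the API running)."""
    request = build_predict_request(symptoms, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request("fever cough breathing problem")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

A PHP backend would send the same JSON body and `Content-Type` header, for example through cURL.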
User -> PHP Website -> Flask API -> ML Models -> Flask -> PHP -> User
All components can run locally on the same machine.
- Not a medical diagnostic system
- No laboratory or imaging data included
- Based solely on symptom pattern recognition
- Dataset is synthetic and intended for educational use
- Academic projects
- ML web integration demonstrations
- Health AI prototyping
- Model deployment practice