This project is part of Data Science I - COSC3337 - Group 3 (https://github.com/thieny1991/DataScience).
The purpose of this project is to apply different classification techniques to a challenge dataset, compare the results, potentially improve the accuracy of the learned models by selecting better parameters, preprocessing, using kernels, or incorporating background knowledge, and summarize our findings in a report. The challenge dataset that our group will work on is the Contraceptive Method Choice dataset (https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice).
1. Data Exploration
2. Data Quality and Preprocessing
3. Neural Networks Classifier
4. SVM Classifier
5. KNN Classifier
6. Random Forest Classifier
7. Comparison
8. Conclusion
- Inferential Statistics
- Machine Learning
- Data Visualization
- Neural Networks
- Grid Search technique for hyperparameter tuning
- Support Vector Machines
- KNN
- Random Forest
- Python
- Pandas, Jupyter, sklearn
- Colab
- Git
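Grid search for hyperparameter tuning, listed among the methods above, can be sketched with sklearn as follows. This is a minimal illustration, not the project's actual tuning code: the data is a synthetic stand-in generated with `make_classification` (9 features, 3 classes, chosen only to resemble the shape of the problem), and the parameter grid values are assumptions picked for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the CMC data): 9 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using statistics from its own training portion only.
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Hypothetical grid: 2 kernels x 3 C values x 2 gamma values = 12 candidates.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))

# The held-out test set is scored once, after tuning is finished.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 3))
```

The same pattern applies to the other classifiers by swapping the final pipeline step and its grid, for example `n_neighbors` for KNN or `n_estimators` for Random Forest.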
This dataset is a subset of the 1987 National Indonesia Contraceptive
Prevalence Survey. The samples are married women who were either not
pregnant or did not know if they were at the time of the interview. The
problem is to predict the current contraceptive method choice
(no use, long-term methods, or short-term methods) of a woman based
on her demographic and socio-economic characteristics.
Based on the given dataset, our project will go through all necessary
steps to analyze the four listed classification methods and compare their results
in order to identify the best-fitting classifier.
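A comparison of the four classifiers could look roughly like the sketch below, which scores each model with 5-fold cross-validated accuracy. It is an illustration under stated assumptions rather than the project's notebook code: the data is synthetic (`make_classification`, 9 features, 3 classes), and the hyperparameters are untuned example values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the CMC data): 9 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# Example settings only; scale-sensitive models get a StandardScaler.
models = {
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)

# Print the models from highest to lowest mean accuracy.
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

Using the same folds and the same metric for every model keeps the comparison fair; accuracy alone may hide per-class differences, so a confusion matrix per model is a reasonable complement.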
- data exploration
- data processing/cleaning
- classification
- write up/reporting
- presentation
- Clone this repo (https://github.com/thieny1991/DataScience).
- Raw data is cmc.da within this repo. The data description is cmc.names.
- Data processing/transformation scripts are being kept [here](Repo folder containing data processing scripts/notebooks)
- Follow setup [instructions](Link to file)
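Once the repo is cloned, the raw data can be loaded with Pandas along the lines of the sketch below. It assumes the file is comma-separated with no header row and ten columns in the attribute order given in the UCI description; the short column names and the `load_cmc` helper are hypothetical labels introduced here, not names from the repo. The three rows in the demo are made up in the file's format so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical short names, following the attribute order in the
# UCI description (nine features, then the class label).
COLUMNS = [
    "wife_age",
    "wife_education",
    "husband_education",
    "num_children",
    "wife_religion",
    "wife_working",
    "husband_occupation",
    "living_standard",
    "media_exposure",
    "method",  # 1 = no use, 2 = long-term, 3 = short-term
]


def load_cmc(source):
    """Read the headerless CSV and split it into features X and label y.

    `source` may be a file path or any file-like object.
    """
    df = pd.read_csv(source, header=None, names=COLUMNS)
    X = df.drop(columns="method")
    y = df["method"]
    return X, y


# Made-up rows in the expected format, used only to demonstrate the call.
sample = io.StringIO(
    "30,3,4,2,1,0,1,4,0,2\n"
    "25,2,3,1,1,1,3,2,0,3\n"
    "41,1,2,6,1,1,2,1,1,1\n"
)
X, y = load_cmc(sample)
print(X.shape)
print(y.tolist())
```

In practice the `io.StringIO` object would be replaced by the path to the raw data file in the cloned repo.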
| Name | GitHub Profile |
|---|---|
| Y Nguyen | https://github.com/thieny1991 |
| Syed Alam | https://github.com/mubashiralam |
| GiaiTran | https://github.com/GiaiTran |
| Thuy Nguyen | https://github.com/milasido |
| Name | Responsibilities |
|---|---|
| Y Nguyen | Data quality, SVM and report, testing program, data visualization |
| Syed Alam | Random Forest and report, PPT design, testing program |
| GiaiTran | Team lead, research methods, PPT, preprocessing data, Neural Network and report |
| Thuy Nguyen | KNN and report, report design, data visualization, PPT |