CMC-Classification

This project is part of Data Science I (COSC3337), Group 3 (https://github.com/thieny1991/DataScience).

-- Project Status: Active

Project Intro/Objective

The purpose of this project is to apply different classification techniques to a challenge data set, compare the results, try to improve the accuracy of the learned models (by selecting better parameters, preprocessing, using kernels, or incorporating background knowledge), and summarize our findings in a report. The challenge data set our group works on is the Contraceptive Method Choice data set (https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice).

Steps:

1. Data Exploration
2. Data Quality and Preprocessing
3. Neural Networks Classifier
4. SVM Classifier
5. KNN Classifier
6. Random Forest Classifier
7. Comparison
8. Conclusion

Methods Used

  • Inferential Statistics
  • Machine Learning
  • Data Visualization
  • Neural Networks
  • Grid Search technique for hyperparameter tuning
  • Support Vector Machines
  • KNN
  • Random Forest

Technologies

  • Python
  • pandas, Jupyter, scikit-learn
  • Colab
  • Git

Project Description

This dataset is a subset of the 1987 National Indonesia Contraceptive
Prevalence Survey. The samples are married women who were either not
pregnant or did not know whether they were pregnant at the time of the
interview. The problem is to predict a woman's current contraceptive
method choice (no use, long-term methods, or short-term methods) from
her demographic and socio-economic characteristics.
Using this data set, our project works through the steps needed to
analyze the four classification methods listed above (neural networks,
SVM, KNN, and random forest) and compares their results to identify the
best-performing classifier.
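
The comparison described above could look roughly like the following sketch, which scores the four classifier families with 5-fold cross-validation. It is an illustration under stated assumptions, not the project's code: the data is synthetic (generated with `make_classification`), and all hyperparameters are scikit-learn defaults or arbitrary placeholder values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the CMC data): 9 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5,
    n_classes=3, random_state=0,
)

# One untuned model per method; scale-sensitive models get a scaler.
models = {
    "Neural Network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from highest to lowest mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Evaluating every model on the same folds and the same metric keeps the comparison fair; on the real data, each model would first be tuned before the final ranking.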

Needs of this project

  • data exploration
  • data processing/cleaning
  • classification
  • write up/reporting
  • presentation

Getting Started

  1. Clone this repo (https://github.com/thieny1991/DataScience).
  2. The raw data is in cmc.da within this repo. The data description is in cmc.names.
  3. Data processing/transformation scripts are being kept [here](Repo folder containing data processing scripts/notebooks)
  4. Follow setup [instructions](Link to file)
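
Once the repo is cloned, the raw data file can be loaded with pandas along the lines of the sketch below. It assumes the file is comma-separated with no header row and ten attributes in the order given by the UCI description; the column names are hypothetical labels chosen for this example and should be checked against cmc.names.

```python
import pandas as pd

# Hypothetical column labels, assuming the attribute order in the UCI
# description; verify against cmc.names before relying on them.
CMC_COLUMNS = [
    "wife_age", "wife_education", "husband_education", "num_children",
    "wife_religion", "wife_working", "husband_occupation",
    "living_standard", "media_exposure", "contraceptive_method",
]


def load_cmc(source):
    """Read the raw data (path or file-like object, no header row)."""
    return pd.read_csv(source, header=None, names=CMC_COLUMNS)


def split_features_target(df):
    """Separate the nine predictors from the class label column."""
    X = df.drop(columns="contraceptive_method")
    y = df["contraceptive_method"]
    return X, y
```

Pass the path of the raw data file in your clone to `load_cmc`, then call `split_features_target` on the result to get the inputs for the classifiers.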

Contributing DS Members

Name          GitHub Profile
Y Nguyen      https://github.com/thieny1991
Syed Alam     https://github.com/mubashiralam
GiaiTran      https://github.com/GiaiTran
Thuy Nguyen   https://github.com/milasido

Contribution detail:

Name          Contribution
Y Nguyen      Data quality, SVM and report, testing program, data visualization
Syed Alam     Random Forest and report, PPT design, testing program
GiaiTran      Team lead, research methods, PPT, data preprocessing, Neural Network and report
Thuy Nguyen   KNN and report, report design, data visualization, PPT

Contact
