T2D_Models

A reusable machine learning pipeline for binary, multiclass, or multilabel classification. The current models are trained on genetic data in which the features are genomic annotations and the target variable indicates the presence of a particular trait (e.g., T2D, lipids, CAD, BMI).
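The three problem types differ mainly in the shape of the target. The sketch below is an illustration, not code from this repository. It assumes scikit-learn and NumPy are installed, and the trait assignments are made up. It shows how scikit-learn's `type_of_target` tells the cases apart, and how `MultiLabelBinarizer` turns per-sample trait sets into the indicator matrix a multilabel model expects.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.multiclass import type_of_target

# Binary: each sample either has the trait (1) or does not (0).
y_binary = np.array([0, 1, 1, 0])

# Multiclass: each sample is assigned exactly one trait.
y_multiclass = np.array(["T2D", "CAD", "BMI", "T2D"])

# Multilabel: each sample may be linked to several traits, or none.
# These trait sets are illustrative only, not the project's data.
trait_sets = [{"T2D"}, {"Lipids", "CAD"}, set(), {"T2D", "BMI"}]
mlb = MultiLabelBinarizer()
y_multilabel = mlb.fit_transform(trait_sets)  # one 0/1 column per trait

print(type_of_target(y_binary))      # binary
print(type_of_target(y_multiclass))  # multiclass
print(type_of_target(y_multilabel))  # multilabel-indicator
print(mlb.classes_.tolist())         # ['BMI', 'CAD', 'Lipids', 'T2D']
print(y_multilabel)
```

Columns of the indicator matrix follow the sorted order in `mlb.classes_`, so the same binarizer should be reused when mapping predictions back to trait names.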

Setup

Clone the repository or download t2d_model.yml, and make sure Anaconda is installed. Then set up the environment from the .yml file on the command line.

Create the environment:

conda env create -f t2d_model.yml

Verify that all packages were installed:

conda list

Activate the Anaconda environment:

source activate t2d_model

(On conda 4.4 and later, conda activate t2d_model is the preferred command.)

Once the environment is activated, you can use the Jupyter notebooks to run through the workflow. Change into src/notebooks/ and start Jupyter from the command line:

cd src/notebooks/
jupyter notebook

Note: If you don't want to use Anaconda, make sure all packages listed in requirements.txt are installed, either in a virtual environment or globally, for example with:

pip install -r requirements.txt

Notebooks

Feature Selection: Configure the file path, then use this notebook to create and save feature-selected datasets. It is currently configured to store files in an S3 bucket; to store them in your own bucket or on a local drive, edit outpath and replace the write_to_S3 calls with write_to_local.
Clustering: Exploratory clustering of the data to detect patterns before classification.
One Vs Rest Model Evaluation: Trains multiple scikit-learn models on the training data and compares their performance on the test data. Configure the file paths for the input data before running.
Multilabel Model Evaluation: Similar to the One Vs Rest notebook, but for multilabel problems.
Trait vs Trait: Given an aggregated dataset, compares two distinct sets of traits (e.g., blood traits vs. metabolic traits) to determine whether a machine learning classifier can learn to distinguish between them.
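The evaluation notebooks follow a fit-on-train, score-on-test pattern. The following is a minimal sketch of that pattern with scikit-learn, not the notebooks' actual code. `make_classification` stands in for a feature-selected annotation dataset. The `compare_models` helper, `SelectKBest(k=10)`, the three candidate models, and macro-F1 as the comparison metric are all assumptions chosen for illustration. Putting feature selection inside the `Pipeline` means it is fit on the training split only, so the test data cannot leak into the selected features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, models, k=10, random_state=0):
    """Hypothetical helper: fit each model one-vs-rest and return test macro-F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    scores = {}
    for name, model in models.items():
        # Scaling and feature selection are fit on the training split only.
        pipe = make_pipeline(
            StandardScaler(),
            SelectKBest(f_classif, k=k),
            OneVsRestClassifier(model),  # one binary classifier per class
        )
        pipe.fit(X_train, y_train)
        scores[name] = f1_score(y_test, pipe.predict(X_test), average="macro")
    return scores


# Synthetic stand-in for an annotation matrix with three trait classes.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, n_classes=3, random_state=0
)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = compare_models(X, y, models)
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: macro-F1 = {score:.3f}")
```

For the multilabel case, `OneVsRestClassifier` also accepts a label indicator matrix as `y` and fits one binary classifier per label. The univariate `f_classif` scorer used here expects a 1-D target, though, so the feature-selection step would need to change.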

Classes:

Evaluator.py contains a helper class for plotting and evaluating model performance. Documentation for its methods is in the Python file.
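The methods in Evaluator.py are documented in that file and are not reproduced here. As a hypothetical illustration of the kind of summary such a helper can produce for multilabel predictions, the sketch below defines an invented `MultilabelSummary` class, which is not part of the repository, wrapping a few standard scikit-learn metrics. The trait names and predictions in the usage lines are made up.

```python
from dataclasses import dataclass

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss


@dataclass
class MultilabelSummary:
    """Hypothetical helper: summary metrics for multilabel indicator predictions."""

    label_names: list

    def summarize(self, y_true, y_pred):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        per_label = f1_score(y_true, y_pred, average=None, zero_division=0)
        return {
            # Fraction of samples whose full label set is predicted exactly.
            "subset_accuracy": accuracy_score(y_true, y_pred),
            # Fraction of individual sample-label assignments that are wrong.
            "hamming_loss": hamming_loss(y_true, y_pred),
            "f1_micro": f1_score(y_true, y_pred, average="micro", zero_division=0),
            "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
            # Per-label F1, keyed by trait name.
            "f1_per_label": dict(zip(self.label_names, per_label)),
        }


evaluator = MultilabelSummary(label_names=["CAD", "Lipids", "T2D"])
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
summary = evaluator.summarize(y_true, y_pred)
print(summary["subset_accuracy"])          # 0.5
print(round(summary["hamming_loss"], 4))   # 0.1667
```

Subset accuracy is strict (two of four samples match on every label here), while Hamming loss counts individual mistakes (2 of 12 assignments), so reporting both gives a fuller picture of multilabel performance.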

kaushik316/T2D_Models