100DaysOfMLCode

This repository was inspired by Siraj Raval's challenge to code machine learning for at least an hour every day for 100 days.

I nervously accepted this challenge in addition to working full time and taking 6 hours of graduate coursework in the 2018 summer semester. I will use this repository to store code, Jupyter notebook examples, and thought processes.

Topics Explored:

Day 1 - July 7 | Principal Component Analysis (PCA) and explained variance ratio

Day 2 - July 8 | SparsePCA -> CODE

Day 3 - July 9 | Bag of Words

Day 4 - July 10 | Tokenization & Vectorization time trials -> CODE

Day 5 - July 11 | Stemming and Lemmatizing with CountVectorizer, TfidfVectorizer, and HashingVectorizer -> CODE

Day 6 - July 12 | Development of visualization pipeline for ML -> CODE

Day 7 - July 13 | Big Data Visualization with Datashader

Day 8 - July 14 | t-SNE and Datashader Failure -> CODE

Day 9 - July 15 | Gene Expression - Getting Started -> FOLDER

Day 10 - July 16 | Gene Expression - Reading in Data

Day 11 - July 17 | Gene Expression - Preprocessing & Boxplot

Day 12 - July 18 | Intro to Data Splitting -> CODE

Day 13 - July 19 | Text Relationships with spaCy -> CODE

Day 14 - July 20 | Gene Expression - Cytoscape and Orange3

Day 15 - July 21 | Trial-and-error Data Splitting Research

Day 16 - July 22 | Trial-and-error Data Splitting Implementation -> CODE

Day 17 - July 23 | NMF -> CODE

Day 18 - July 24 | RFE -> CODE

Day 19 - July 25 | Exploring Variable Replacement

Day 20 - July 26 | Pipelines - Introduction

Day 21 - July 27 | A list of 10,000 dictionaries -> CODE

Day 22 - July 28 | Linear Regression - Simple in R -> FOLDER

Day 23 - July 29 | Data Visualization, Dimensionality Reduction, Feature Selection, and a handful of models -> CODE

Day 24 - July 30 | Linear Regression - Continue to draft description -> FOLDER

Day 25 - July 31 | Linear Regression - Simple in Python -> CODE

Day 26 - Aug 1 | Start of Pipeline Example -> CODE

Day 27 - Aug 2 | Ridge Regression for Pipeline Example -> CODE

Day 28 - Aug 3 | Select missing value column(s) with Pipeline -> CODE

Day 29 - Aug 4 |
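
As a small illustration of the Bag of Words topic from Day 3, here is a minimal sketch in plain Python. It is not the code stored in this repository: the function names `tokenize` and `bag_of_words` and the two sample sentences are made up for this example, and the tokenizer is deliberately naive (lowercase, alphanumeric runs only).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count_vectors) for a list of strings.

    The vocabulary is the sorted set of all tokens; each document becomes
    a list of counts aligned with that vocabulary.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # missing tokens count as 0
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical sample documents, used only for illustration.
docs = ["The cat sat on the mat.", "The dog sat."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Word order is discarded here and only the counts survive, which is the defining trade-off of a bag-of-words representation.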

List of Topics to Explore:

PCA on Genetic Data - Gene Expression
- Create Jupyter Notebook foundation
- Find Good Data
- Explain how to differentiate good data from bad data
GPU
Efficient Use of Data Structures
Write computationally expensive parts in C++
Make good use of memory & caching
Multithreading / multiprocessing in Python, Celery for parallel processing
Kernel PCA
Differences (pros/cons) between Stemming and Lemmatizing methods
PCA to display failure risk
- Lots / batches that take too long
- Determine coorinary value
- Adjust threshold & critical thresholds
Producing Production Quality code
How tokenized data is used for ML algorithms
Use of predeveloped vocabularies
- i2b2
Hypertools
- Visualizing high dimensional data: https://hypertools.readthedocs.io/en/latest/
MongoDB with Neo4j and Orient
AutoML
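
For the multithreading / multiprocessing topic in the list above, a minimal starting sketch using only the standard library might look like the following. The `count_tokens` helper and the sample strings are hypothetical and chosen only for illustration. In CPython, threads mainly help with I/O-bound work; for CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface with separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_tokens(text):
    """Count whitespace-separated tokens in one document."""
    return len(text.split())


# Hypothetical sample documents, used only for illustration.
documents = ["one two three", "four five", "six"]

# pool.map returns results in the same order as the inputs,
# regardless of which worker finishes first.
with ThreadPoolExecutor(max_workers=3) as pool:
    token_counts = list(pool.map(count_tokens, documents))

print(token_counts)  # [3, 2, 1]
```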

Folders and files:

- Bag of Words
- Dimensionality_Reduction
- Genetic_Expression
- ML_Algorithms
- Splitting Data
- Visualization
- README.md


anthonyduer/100DaysOfMLCode
