Classification decision tree written from scratch (well, using pandas). A random forest is implemented in there too.
Datasets:
- breast cancer diagnosis
- titanic
- mushroom (100% accuracy, just a bit of overfitting)
- iris
Features:
- can choose either best or random splits (for each feature: either loop through all unique values as candidate thresholds, or only a random subset of them, sqrt of the number of unique values to be precise)
- the tree can be visualized as a graph using visualize()
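
The difference between the best and random split choices can be sketched roughly as below. This is only an illustration, not the code from this repo: the function names (`gini`, `candidate_thresholds`, `best_split`), the `mode` argument, and the use of Gini impurity as the split criterion are all assumptions made for the example.

```python
import math
import random

import pandas as pd


def gini(labels: pd.Series) -> float:
    """Gini impurity of a label column (assumed criterion, not confirmed by the repo)."""
    probs = labels.value_counts(normalize=True)
    return 1.0 - float((probs ** 2).sum())


def candidate_thresholds(column: pd.Series, mode: str = "best", rng=None) -> list:
    """Return the thresholds to try for one feature (hypothetical helper).

    mode="best":   every unique value of the feature.
    mode="random": a random sample of about sqrt(n_unique) unique values.
    """
    values = sorted(column.unique())
    if mode == "best":
        return values
    rng = rng or random.Random()
    k = max(1, math.isqrt(len(values)))
    return rng.sample(values, k)


def best_split(df: pd.DataFrame, feature: str, target: str, mode: str = "best", rng=None):
    """Pick the candidate threshold with the lowest weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for t in candidate_thresholds(df[feature], mode, rng):
        left = df.loc[df[feature] <= t, target]
        right = df.loc[df[feature] > t, target]
        if left.empty or right.empty:
            continue  # a split that leaves one side empty separates nothing
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(df)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Toy data: "x" separates the two classes cleanly at x <= 3.
df = pd.DataFrame({"x": [1, 2, 3, 10, 11, 12], "y": ["a", "a", "a", "b", "b", "b"]})
threshold, score = best_split(df, "x", "y", mode="best")
```

With `mode="random"` the loop only looks at about sqrt(n) thresholds per feature, which is cheaper on columns with many unique values but may miss the best threshold.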
To do:
- optimize for big datasets / categorical data (limit classes / unique threshold splits)
- add more example datasets
- make regression tree
- make random forest (done)
- add example trees for each dataset