Conversation
sid-chava left a comment
This is definitely a really good start. The approach to optimizing features is interesting and shows promise. However, we have concerns about potential overfitting due to the oversampling technique used. Is there any way you could isolate the improvements without oversampling? I understand that the accuracy jump may not be as high, but we're interested in seeing whether the feature tweaks you made generate improvements on their own.
Yes, I double-checked. By repeating some code I caused data leakage, which meant the model was tested on data it had already seen. Now that I've changed the approach, I'll open a PR with another notebook.
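For reference, a minimal sketch of the leakage-free setup: split first, then oversample only the training split, so no duplicated row can end up in the test set. The data and shapes below are placeholders, not the notebook's real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.array([0] * 180 + [1] * 20)          # imbalanced labels (toy example)

# 1) split FIRST -- the test set is frozen before any resampling
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2) oversample the minority class in the TRAINING split only
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

# the test rows were never resampled, so no duplicate can leak into evaluation
assert len(set(map(tuple, X_te)) & set(map(tuple, X_tr[extra]))) == 0
```

Oversampling before the split is exactly what makes the model "test on data it has already seen": copies of a minority row land on both sides of the split.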
Now that I've added a new file: I read in a comment that the rows with label 1 are generated with a simulator, so I dropped those rows and continued working with the 3 remaining targets. I added more pre-processing functions and tried different modeling; so far the model averages 72% accuracy. I also tried a one-vs-all approach, predicting whether a sequence is generated by a specific class or not (binary classification); predicting class 4 vs. all averaged 77% accuracy. I'm currently working on lowering the model's misclassification rate. Any review would be appreciated.
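To make the one-vs-all framing concrete, here is a hedged sketch: relabel the data as "class 4 vs. everything else" and fit a plain binary classifier. The feature matrix, class ids, and classifier choice are placeholders, not the notebook's real data or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))
y = rng.integers(2, 6, size=400)             # placeholder multiclass labels 2..5

# collapse the multiclass problem to a single binary question:
# 1 = "sequence generated by class 4", 0 = "any other class"
y_bin = (y == 4).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y_bin, test_size=0.25, stratify=y_bin, random_state=1
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                  # "4 vs all" binary accuracy
```

One caveat worth checking with this framing: if class 4 is a minority, a high "4 vs all" accuracy can just reflect the base rate, so precision/recall per class may be more informative than accuracy alone.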
#3 provides