DataCentricAI_model_retraining

The case study: Suppose you have built an ML model from a limited set of annotated data. You have tuned the hyperparameters and the model has reached its limit, i.e., you can no longer improve its metrics just by changing the algorithm or its parameters. You now plan to add more data to the model (data augmentation), but the large data dump you have been given is not annotated. Annotation is costly in both time and money, so you want to annotate only the data that will improve the model. How should you choose which data to annotate?

In this solution you will learn:

Basics of data-centric AI
Error handling - how to know which data is causing problems?
Data augmentation - how to gather similar data to improve your model (using FAISS vector search)?
Active learning methodology
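
To make the topics above concrete, here is a minimal sketch of the error-driven selection loop: find the validation examples the model gets wrong, then retrieve the most similar unlabeled texts as annotation candidates. It is not the notebook's code. The toy messages, the LogisticRegression classifier, and the helper names `to_vectors` and `search_similar` are all assumptions made for illustration. The sketch uses `faiss.IndexFlatIP` when FAISS is installed and falls back to a brute-force NumPy search otherwise.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

try:
    import faiss  # optional dependency (e.g. the faiss-cpu package)
except ImportError:
    faiss = None

# Toy, made-up data standing in for the annotated set and the unlabeled dump.
train_texts = ["win cash now", "free entry to win a prize",
               "see you at lunch", "the meeting is at noon"]
train_labels = ["spam", "spam", "ham", "ham"]
val_texts = ["claim your reward today", "lunch at noon tomorrow"]
val_labels = ["spam", "ham"]
unlabeled_pool = ["claim your free prize today",
                  "are you coming to lunch tomorrow",
                  "urgent winner claim cash reward",
                  "see you at the meeting"]

# TF-IDF needs no labels, so the vocabulary can include the unlabeled pool.
vectorizer = TfidfVectorizer().fit(train_texts + unlabeled_pool)


def to_vectors(texts):
    """Dense float32 TF-IDF rows (L2-normalised by default, so dot = cosine)."""
    dense = vectorizer.transform(texts).toarray()
    return np.ascontiguousarray(dense, dtype="float32")


def search_similar(query_vecs, pool_vecs, k):
    """Indices of the k pool rows with the highest inner product per query."""
    k = min(k, pool_vecs.shape[0])
    if faiss is not None:
        index = faiss.IndexFlatIP(pool_vecs.shape[1])  # exact inner-product index
        index.add(pool_vecs)
        _, idx = index.search(query_vecs, k)
        return idx
    scores = query_vecs @ pool_vecs.T  # brute-force fallback without FAISS
    return np.argsort(-scores, axis=1)[:, :k]


# Step 1: train on the annotated data and collect misclassified validation texts.
model = LogisticRegression().fit(to_vectors(train_texts), train_labels)
val_pred = model.predict(to_vectors(val_texts))
error_texts = [t for t, y, p in zip(val_texts, val_labels, val_pred) if y != p]

# Step 2: for each error, pull its nearest unlabeled neighbours to annotate next.
pool_vecs = to_vectors(unlabeled_pool)
to_annotate = []
if error_texts:
    neighbours = search_similar(to_vectors(error_texts), pool_vecs, k=2)
    to_annotate = sorted({unlabeled_pool[j] for row in neighbours for j in row})

print("Misclassified validation texts:", error_texts)
print("Candidates to annotate:", to_annotate)
```

After the candidates are annotated and added to the training set, the model is retrained and the loop repeats. On a large pool, an exact flat index may be too slow, and an approximate FAISS index would be a natural swap, at the cost of some recall.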

The detailed article is published on Medium. If you like the article, please follow me on Medium for more such case studies and star the repository. Thank you!

Notebook: Medium_notebook_DataCentricAI_FAISS+TFIDF_ham_spam.ipynb