# DataCentricAI_model_retraining

The case study: Suppose you have built an ML model on a limited set of annotated data. You have finished hyperparameter tuning and the model has reached its limit, i.e., you can no longer improve your metrics just by changing the algorithm or its parameters. You now plan to add more data to the model (data augmentation), but the huge data dump you are given is not annotated. Annotation is costly in both time and money, so you want to be sure you annotate only the data that will actually improve the model. How should you choose that data?

In this solution you will learn:

- Basics of data-centric AI
- Error handling: how to find out which data points are causing problems
- Data augmentation: how to gather similar data to improve your model (using FAISS vector search)
- Active learning methodology

The detailed article is published on Medium. If you like the article, please follow me on Medium for more such case studies, and put a star on this repository. Thank you!

## About

How do you effectively choose the minimum data needed to retrain your model?
