The case study: Suppose you have built an ML model on a limited set of annotated data. You have finished hyperparameter tuning and the model has reached its limit, i.e., you can no longer improve its metrics just by changing the algorithm or its parameters. You now plan to add more data to the model (data augmentation), but the huge data dump you have been given is not annotated. Annotation is costly in both time and money, so you want to annotate only the data that will improve the model. How should you choose which data to annotate?
In this solution you will learn:
- Basics of data-centric AI
- Error handling - how to know which data is causing problems?
- Data augmentation - how to gather similar data to improve your model (using FAISS vector search)?
- Active learning methodology
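
To make the topics above concrete, here is a minimal sketch of how they can fit together. It is not the code from this repository: the function names, the toy embeddings, and the probabilities are all hypothetical. It assumes you already have embeddings for the validation examples the model got wrong, plus embeddings and predicted class probabilities for the unannotated pool. A brute-force cosine search stands in for FAISS so that the example runs with the standard library only; with real data, an index such as `faiss.IndexFlatIP` over normalized vectors would do the same neighbor lookup at scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def entropy(probs):
    """Predictive entropy: higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(error_embeddings, pool_embeddings, pool_probs,
                          k=2, budget=3):
    """Pick pool indices to annotate (hypothetical helper, for illustration).

    Step 1: for each misclassified example, collect its k most similar
            unannotated pool items (brute-force stand-in for vector search).
    Step 2: rank those candidates by predictive entropy, most uncertain
            first, and keep at most `budget` of them.
    """
    candidates = set()
    for err in error_embeddings:
        ranked = sorted(
            range(len(pool_embeddings)),
            key=lambda i: cosine(err, pool_embeddings[i]),
            reverse=True,
        )
        candidates.update(ranked[:k])
    ordered = sorted(candidates, key=lambda i: entropy(pool_probs[i]),
                     reverse=True)
    return ordered[:budget]


# Toy data (made up): one misclassified example and four pool items.
error_embeddings = [[1.0, 0.0]]
pool_embeddings = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3], [-1.0, 0.0]]
pool_probs = [[0.95, 0.05], [0.5, 0.5], [0.55, 0.45], [0.5, 0.5]]

selected = select_for_annotation(error_embeddings, pool_embeddings, pool_probs)
print(selected)
```

The selected indices are the pool items that both resemble known errors and leave the model uncertain, which is one reasonable way to spend a limited annotation budget. After annotating them, you would retrain and repeat the loop.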
The detailed article is published on Medium. If you like the article, please follow me on Medium for more such interesting case studies and star the repository. Thank you!