During my Master’s in Data Science and Engineering, Andrea van der Putten and I worked on an exciting project: predicting the age of speakers from an audio dataset.
We evaluated three regression models (Random Forest, Ridge, and Lasso) to determine the most effective approach, and preprocessing and feature extraction played a crucial role. Despite the challenges posed by data imbalance, particularly when predicting the age of older individuals, our methods demonstrated significant improvements over baseline models.
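To give a rough idea of what such a comparison can look like, here is a minimal sketch using scikit-learn. It is not our actual pipeline: the features are synthetic random numbers standing in for extracted audio features, and the hyperparameters, the mean-predictor baseline, and the RMSE metric are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for extracted audio features (NOT the project's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
true_weights = rng.normal(size=10)
# Hypothetical "age" target: a linear signal plus noise, centred around 40.
y = 40 + 5 * (X @ true_weights) + rng.normal(scale=3.0, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed hyperparameters; the linear models get standardized inputs.
models = {
    "baseline (mean)": DummyRegressor(strategy="mean"),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and record its root-mean-squared error on the held-out split.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Print the models from best (lowest RMSE) to worst.
for name, score in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name:>16}: RMSE = {score:.2f}")
```

Scoring every model on the same held-out split, next to a trivial baseline that always predicts the mean age, is what makes an "improvement over baseline" claim checkable. With imbalanced ages it may also be worth reporting the error per age group, since a single overall score can hide poor predictions for older speakers.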
I'm happy to share the report and the code, which describe the entire process, the challenges we faced, and the solutions we implemented.