A comparison of cluster analysis techniques and factor analysis.
The dataset comes from the EMI Music Data Science Hackathon competition on Kaggle; see https://www.kaggle.com/c/MusicHackathon/data. The file users.csv contains the answers of almost 50,000 people (referred to as users) to 19 questions about their attitudes to music (variables Q1–Q19). Each answer is a rating from 0 to 100 indicating how strongly the user agrees with the given statement. The statements themselves are listed in the file UserKey.csv. The dataset also contains basic personal information about the users.
The task is to determine whether these 19 questions have any underlying structure. That is: are there groups of questions to which many users give similar answers?
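
One possible starting point for this task is sketched below in Python. It is only an illustration, not a prescribed method: it assumes the answers are available as a pandas DataFrame with one column per question, named Q1 to Q19 as in the description above. The function names question_groups and n_factors_kaiser are hypothetical helpers introduced here. The first groups questions by average-linkage hierarchical clustering on the distance 1 - correlation, the second counts the eigenvalues of the correlation matrix that exceed 1 (the Kaiser rule), which is a rough heuristic for the number of factors and not a full factor analysis.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Assumed column names, following "variables Q1-Q19" in the task description.
QUESTION_COLS = [f"Q{i}" for i in range(1, 20)]


def question_groups(answers: pd.DataFrame, n_groups: int) -> dict:
    """Group question columns whose answers are positively correlated across users."""
    # Pairwise Pearson correlations; pandas skips missing answers pair by pair.
    corr = answers.corr()
    # Turn similarity into a distance: 0 for identical answers, 2 for opposite ones.
    dist = 1.0 - corr.to_numpy()
    dist = (dist + dist.T) / 2.0   # remove tiny floating-point asymmetries
    np.fill_diagonal(dist, 0.0)
    # Average-linkage clustering on the condensed distance matrix.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    groups = {}
    for col, label in zip(corr.columns, labels):
        groups.setdefault(int(label), []).append(col)
    return groups


def n_factors_kaiser(answers: pd.DataFrame) -> int:
    """Count eigenvalues of the correlation matrix above 1 (Kaiser rule heuristic)."""
    eigvals = np.linalg.eigvalsh(answers.corr().to_numpy())
    return int((eigvals > 1.0).sum())
```

With the real data, the input could be built with something like pd.read_csv("users.csv")[QUESTION_COLS], assuming the file uses exactly those column names. The number of groups passed to question_groups is a free choice; comparing the groupings obtained for several values, and against the factor count suggested by n_factors_kaiser, is one way to judge whether the structure is stable.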