Peer Review - Midterm Report #77

@posadaj

Description

The project has a lot of potential, as it seems to be one of the richer datasets and a very interesting topic. The preliminary analysis also shows that the dataset is relatively easy to work with: sample points are treated as all-or-nothing, with features either fully complete or missing and no corruption of the data. It also looks like decent progress has been made on the project, with the forest classifier achieving a 12% error rate.

Where I think the report is lacking is in the explanation of decisions made for the project. For example, it is unclear why the two models, One-vs-Rest and the Forest Classifier, were used. I don't believe they were covered in the course, and even so, there should be some explanation as to why each model is a good fit for this problem. Another example is the following: "we used a total of 158 features after preprocessing, which required dropping feature columns which represented one level of a particular categorical variable". Why did you decide to drop some columns of a specific category? And where do the 158 features come from? Each survey has 35 questions (with multiple parts), so do those amount to 158? If not, mention which features are dropped and why.
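For reference, dropping one column per categorical variable is the standard way to avoid the "dummy variable trap" when one-hot encoding, which is presumably what the authors did. A minimal sketch with pandas, using made-up survey column names (not from the report), showing how the column count shrinks by one per categorical variable:

```python
import pandas as pd

# Hypothetical survey responses; column names are illustrative only.
df = pd.DataFrame({
    "q1_mood": ["happy", "sad", "neutral", "happy"],
    "q2_sleep": ["good", "poor", "good", "good"],
})

# One-hot encode, dropping the first level of each categorical variable.
# With all levels kept, the dummy columns for a variable always sum to 1,
# making them perfectly collinear (the dummy variable trap).
encoded = pd.get_dummies(df, drop_first=True)

# q1_mood has 3 levels -> 2 columns; q2_sleep has 2 levels -> 1 column.
print(sorted(encoded.columns))
# -> ['q1_mood_neutral', 'q1_mood_sad', 'q2_sleep_poor']
```

If the 158 features arose this way, the report should state the original column count and how many levels each categorical question contributed.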
