Ziyi Chen's review on final report. #79

@changy12

About introduction:
It is an interesting and meaningful research question. I appreciate that you identified relevant areas such as voice recognition, which goes beyond the scope of this course, and I appreciate your courage in taking on the challenge.

About Dataset Description:
I appreciate your effort in merging the different datasets, as well as your insight into, and explanation of, the many 0’s.

About Problem Description:
The problem is clear and to the point.
Note that the confidence of an algorithm measures the variation of its predictions rather than their accuracy. Accuracy is more important; once accuracy is high, the algorithms can then be compared based on confidence.

About Exploratory Data Analysis:
The procedure is very clear, especially when you gave an example.
The classifiers with only one feature are interesting and useful for preliminary exploration.
You said “This average is considered as the segregation point between each of the genders in the test set.” The segregation point should be computed on the training dataset, and its performance should then be tested on the test dataset.
You may also consider other segregation methods, such as the mean of the two class means.
Please give a reference for pocket learning.
Logistic regression is in fact a method for classification problems, i.e., problems where the outcome variable is categorical. In binary classification, encode the outcome variable as 1 or -1 rather than log-transforming it.
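The suggestions above about the segregation point can be sketched in a few lines. This is only a minimal illustration, not the report's method: the feature values, the 1/-1 label encoding (1 for male, -1 for female), and the function names `fit_threshold`, `predict`, and `success_rate` are all hypothetical. The threshold is the mean of the two class means, computed on the training data only, and then evaluated on held-out test data.

```python
from statistics import mean

def fit_threshold(train_values, train_labels):
    """Mean of the two class means, computed from training data only."""
    male = [v for v, y in zip(train_values, train_labels) if y == 1]
    female = [v for v, y in zip(train_values, train_labels) if y == -1]
    mu_male, mu_female = mean(male), mean(female)
    threshold = (mu_male + mu_female) / 2
    # Record which side of the threshold the male class falls on.
    male_below = mu_male < mu_female
    return threshold, male_below

def predict(values, threshold, male_below):
    """Label each value 1 (male) or -1 (female) using the fitted threshold."""
    return [1 if (v < threshold) == male_below else -1 for v in values]

def success_rate(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical single-feature data, with labels encoded as 1 / -1.
train_x = [110.0, 120.0, 130.0, 200.0, 210.0, 220.0]
train_y = [1, 1, 1, -1, -1, -1]
test_x = [115.0, 125.0, 205.0, 160.0]
test_y = [1, 1, -1, -1]

threshold, male_below = fit_threshold(train_x, train_y)  # uses training data only
test_pred = predict(test_x, threshold, male_below)        # test data is only scored
rate = success_rate(test_pred, test_y)
print(threshold, test_pred, rate)
```

Because the test set plays no part in choosing the threshold, the resulting success rate is an honest estimate of how the one-feature classifier would perform on new voices.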

About Model Analysis:
The success rates are really high. However, if there is a significant imbalance between the two classes, the success rate can be misleading. For example, if 98% of the test dataset is male (female), then the success rate can be 98% even if the classifier predicts all voices to be male (female). In this case, you could try other measures such as the F1 score.
You said PCA improved the success rate of SVM. Did the improved success rate exceed 0.981? You could list the success rates with PCA as well. From the information you provided, I cannot infer that PCA is not helpful.
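The class-imbalance point above can be made concrete with a small sketch. The labels below are hypothetical (98 male and 2 female test samples, matching the 98% example), and the helper names `accuracy` and `f1_score` are illustrative, written by hand rather than taken from any library. A classifier that predicts "male" for every voice reaches a 98% success rate, while its F1 score for the female class is 0.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced test set and a classifier that always predicts "male".
y_true = ["male"] * 98 + ["female"] * 2
y_pred = ["male"] * 100

acc = accuracy(y_true, y_pred)
f1_female = f1_score(y_true, y_pred, positive="female")
f1_male = f1_score(y_true, y_pred, positive="male")
print(acc, f1_female, f1_male)
```

Reporting the F1 score per class (or a confusion matrix) alongside the success rate would show whether the high success rates reflect real discrimination between the two genders.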

In general:
This paper poses an interesting and meaningful research question, conducts extensive preliminary analysis, and tries a large number of classifiers. The writing is clear and straightforward.
You could deepen your understanding of some topics in classification, such as logistic regression and imbalanced classes.
