diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
new file mode 100644
index 0000000..2a0b6af
--- /dev/null
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -0,0 +1,23 @@
+# Methodology
+Upload your methodology description here
+
+How should we deal with the strongly right-skewed data?
+- After discussion with the professor, we decided to delete all cases with a grade of zero and all cases with unreasonably extreme values.
+- We will then change our research aim: to find which features are correlated with students’ grades in online courses, and how, given that the students finished their assignments.
+  - We have two candidate methodologies and decided to compare them.
+  - Method 1: A more traditional regression model with cross-validation. Based on our visualization, the regression would be linear.
+    - Assumptions for linear regression:
+      - Linear relationship
+      - Multivariate normality
+      - No or little multicollinearity
+      - No autocorrelation
+      - Homoscedasticity
+  - Method 2: Classification And Regression Tree (CART) analysis
+    - Assumption: no distributional assumption on the data.
+- Validation
+  - We'll use a cross-validation technique: fit a model on one class, then apply it to the other classes and see whether it also fits well.
+
+Next week:
+  Clean the data and do the deletion
+  Study how to use R to produce scatter plots and a correlation matrix
+
diff --git a/README.md b/README.md
deleted file mode 100644
index 9153d3e..0000000
--- a/README.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Methodology
-Upload your methodology description here
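
A minimal sketch of the cleaning and validation steps described in the README, assuming a single numeric feature per student and a simple least-squares line. The plan names R as the working tool; Python is used here only to illustrate the logic. The class names, the feature, and every number are made-up placeholders, not project data, and the helper names are hypothetical.

```python
from statistics import mean


def drop_zero_grades(rows):
    """Keep only (feature, grade) pairs whose grade is above zero."""
    return [(x, g) for x, g in rows if g > 0]


def fit_simple_ols(xs, ys):
    """Least-squares fit of grade = intercept + slope * feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variation; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(model, xs, ys):
    """R^2 of an already-fitted model on (possibly unseen) data."""
    intercept, slope = model
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def validate_across_classes(classes, train_name):
    """Fit on one class, then score that same model on every other class."""
    train = drop_zero_grades(classes[train_name])
    model = fit_simple_ols([x for x, _ in train], [g for _, g in train])
    scores = {}
    for name, rows in classes.items():
        if name == train_name:
            continue
        kept = drop_zero_grades(rows)
        scores[name] = r_squared(model, [x for x, _ in kept], [g for _, g in kept])
    return scores


# Placeholder data: (hypothetical feature value, grade) per student.
classes = {
    "class_a": [(1.0, 50.0), (2.0, 60.0), (3.0, 0.0), (4.0, 80.0)],
    "class_b": [(1.0, 52.0), (2.0, 58.0), (3.0, 71.0), (4.0, 0.0), (5.0, 90.0)],
}

scores = validate_across_classes(classes, "class_a")
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

A low R^2 on the held-out classes would suggest the model fitted on one class does not transfer. The same loop could score a CART model in place of the linear fit so the two methods are compared on equal terms; removal of extreme values is left out here because the README does not yet define a threshold.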