From 119ec328b3e43c24d2fbf3fde5a3a1debcab3dbf Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Thu, 27 Oct 2016 10:38:56 -0400
Subject: [PATCH 01/17] Update README.md

---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index 9153d3e..005446f 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,13 @@
 # Methodology
 Upload your methodology description here
+
+How should we deal with the strongly right skewed data?
+ After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
+ Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
+ We have two options of methodology. We decided to compare two methodologies.
+ Method 1: More traditional regression model with cross validation
+ Method 2: Classification And Regression Tree (CART) analysis or Bayesian
+Next week:
+Clean the data, do the deletion
+Study how to use R to produce Scatter Plots and Correlation Matrix
+

From eb62cf70b2c44be8a3cda63f54de31c29196c05f Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Thu, 27 Oct 2016 10:40:15 -0400
Subject: [PATCH 02/17] Tiantian's methodology and next week plan

---
 ...d => (Tiantian's methodology and next week plan) README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 rename README.md => (Tiantian's methodology and next week plan) README.md (86%)

diff --git a/README.md b/(Tiantian's methodology and next week plan) README.md
similarity index 86%
rename from README.md
rename to (Tiantian's methodology and next week plan) README.md
index 005446f..352cffb 100644
--- a/README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -8,6 +8,6 @@ How should we deal with the strongly right skewed data?
 Method 1: More traditional regression model with cross validation
 Method 2: Classification And Regression Tree (CART) analysis or Bayesian
 Next week:
-Clean the data, do the deletion
-Study how to use R to produce Scatter Plots and Correlation Matrix
+ Clean the data, do the deletion
+ Study how to use R to produce Scatter Plots and Correlation Matrix

From e7ee21260e7db9d86d448e3e20a8a1cca7fa386f Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:26:35 -0400
Subject: [PATCH 03/17] Update (Tiantian's methodology and next week plan) README.md

---
 ... methodology and next week plan) README.md | 20 ++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 352cffb..fafd2bd 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -2,11 +2,21 @@
 Upload your methodology description here

 How should we deal with the strongly right skewed data?
- After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
- Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
- We have two options of methodology. We decided to compare two methodologies.
- Method 1: More traditional regression model with cross validation
- Method 2: Classification And Regression Tree (CART) analysis or Bayesian
+-After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
+-Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
+-We have two options of methodology. We decided to compare two methodologies.
+ Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
+ -Assumption for linear regression:
+ -Linear relationship
+ -Multivariate normality
+ -No or little multicollinearity
+ -No auto-correlation
+ -Homoscedasticity
+ Method 2: Classification And Regression Tree (CART) analysis
+ -Assumption: there's no distributional assumption for data.
+-Validation
+ -we'll use cross validation techinique.We'll generate a model for one class. And then we'll apply it to other classes and see whether it can also fit well.
+
 Next week:
 Clean the data, do the deletion
 Study how to use R to produce Scatter Plots and Correlation Matrix

From ba2b9598c1efa822d1e68b74a849db9961f1817e Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:27:35 -0400
Subject: [PATCH 04/17] Update (Tiantian's methodology and next week plan) README.md

---
 ... methodology and next week plan) README.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index fafd2bd..3207f8b 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -2,16 +2,16 @@
 Upload your methodology description here

 How should we deal with the strongly right skewed data?
--After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
--Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
--We have two options of methodology. We decided to compare two methodologies.
- Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
- -Assumption for linear regression:
- -Linear relationship
- -Multivariate normality
- -No or little multicollinearity
- -No auto-correlation
- -Homoscedasticity
+ -After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
+ -Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
+ -We have two options of methodology. We decided to compare two methodologies.
+ Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
+ -Assumption for linear regression:
+ -Linear relationship
+ -Multivariate normality
+ -No or little multicollinearity
+ -No auto-correlation
+ -Homoscedasticity
 Method 2: Classification And Regression Tree (CART) analysis
 -Assumption: there's no distributional assumption for data.
 -Validation

From 4fba4077de68c5276ad45d225b0b7379fe84db9f Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:28:00 -0400
Subject: [PATCH 05/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 3207f8b..2bbcfca 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -2,8 +2,8 @@
 Upload your methodology description here

 How should we deal with the strongly right skewed data?
- -After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
- -Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
+ - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
+ - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 -We have two options of methodology. We decided to compare two methodologies.
 Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 -Assumption for linear regression:

From e66c9ad23419ff139da586bf8d39cafce35fe056 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:28:36 -0400
Subject: [PATCH 06/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 2bbcfca..2466ba8 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -2,8 +2,8 @@
 Upload your methodology description here

 How should we deal with the strongly right skewed data?
- - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
- - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
+- After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
+- Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 -We have two options of methodology. We decided to compare two methodologies.
 Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 -Assumption for linear regression:

From 98ab56f4d0e6c9d2d53e778798283a7b51b4d5c1 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:28:51 -0400
Subject: [PATCH 07/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 2466ba8..796b86f 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -4,7 +4,7 @@ Upload your methodology description here
 How should we deal with the strongly right skewed data?
 - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
-We have two options of methodology. We decided to compare two methodologies.
+ -We have two options of methodology. We decided to compare two methodologies.
 Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 -Assumption for linear regression:
 -Linear relationship

From 4d91e28f9d1e759ddfb5ee886f4bc46f3be08c87 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:29:28 -0400
Subject: [PATCH 08/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 796b86f..7bd104c 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -4,9 +4,9 @@ Upload your methodology description here
 How should we deal with the strongly right skewed data?
 - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
- -We have two options of methodology. We decided to compare two methodologies.
+ - We have two options of methodology. We decided to compare two methodologies.
 Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
- -Assumption for linear regression:
+ - Assumption for linear regression:
 -Linear relationship
 -Multivariate normality
 -No or little multicollinearity

From 302bfe181b0a8123bb1ac7db31c0efbfc5bd6801 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:31:02 -0400
Subject: [PATCH 09/17] Update (Tiantian's methodology and next week plan) README.md

---
 ... methodology and next week plan) README.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 7bd104c..7c986d4 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -5,17 +5,17 @@ How should we deal with the strongly right skewed data?
 - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 - We have two options of methodology. We decided to compare two methodologies.
- Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
- - Assumption for linear regression:
- -Linear relationship
- -Multivariate normality
- -No or little multicollinearity
- -No auto-correlation
- -Homoscedasticity
- Method 2: Classification And Regression Tree (CART) analysis
+ - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
+ - Assumption for linear regression:
+ - Linear relationship
+ - Multivariate normality
+ - No or little multicollinearity
+ - No auto-correlation
+ - Homoscedasticity
+ - Method 2: Classification And Regression Tree (CART) analysis
 -Assumption: there's no distributional assumption for data.
--Validation
- -we'll use cross validation techinique.We'll generate a model for one class. And then we'll apply it to other classes and see whether it can also fit well.
+- Validation
+ - we'll use cross validation techinique.We'll generate a model for one class. And then we'll apply it to other classes and see whether it can also fit well.

 Next week:
 Clean the data, do the deletion

From 7300e317d80c0a151876dfcb5d244c48d5d98b2a Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:31:58 -0400
Subject: [PATCH 10/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 7c986d4..09b4c17 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -12,7 +12,7 @@ How should we deal with the strongly right skewed data?
 - No or little multicollinearity
 - No auto-correlation
 - Homoscedasticity
- - Method 2: Classification And Regression Tree (CART) analysis
+ - Method 2: Classification And Regression Tree (CART) analysis
 -Assumption: there's no distributional assumption for data.
 - Validation
 - we'll use cross validation techinique.We'll generate a model for one class. And then we'll apply it to other classes and see whether it can also fit well.

From e6c4474b62730e0ccef61226b77f4e154058de2a Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:32:30 -0400
Subject: [PATCH 11/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 09b4c17..210575e 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -5,7 +5,7 @@ How should we deal with the strongly right skewed data?
 - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 - We have two options of methodology. We decided to compare two methodologies.
- - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
+ - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 - Assumption for linear regression:
 - Linear relationship
 - Multivariate normality

From 21d16f85204769a170d1c962fa0b43d326e2e776 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:32:51 -0400
Subject: [PATCH 12/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 210575e..aa6d7aa 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -12,7 +12,7 @@ How should we deal with the strongly right skewed data?
 - No or little multicollinearity
 - No auto-correlation
 - Homoscedasticity
- - Method 2: Classification And Regression Tree (CART) analysis
+ - Method 2: Classification And Regression Tree (CART) analysis
 -Assumption: there's no distributional assumption for data.
 - Validation
 - we'll use cross validation techinique.We'll generate a model for one class. And then we'll apply it to other classes and see whether it can also fit well.

From 63d84c8540ca2e8f60ebcc7dc3927b87e3ffd26e Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:33:31 -0400
Subject: [PATCH 13/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index aa6d7aa..e486b0b 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -5,7 +5,7 @@ How should we deal with the strongly right skewed data?
 - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 - We have two options of methodology. We decided to compare two methodologies.
- - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
+ - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 - Assumption for linear regression:
 - Linear relationship
 - Multivariate normality

From 962a865e67e72a96df54e95aa66caa4c9841d21e Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:34:04 -0400
Subject: [PATCH 14/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index e486b0b..ea77590 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -5,8 +5,8 @@ How should we deal with the strongly right skewed data?
 - After discussion with professor we decided to delete all the cases with grade of zero, and cases with unreasonably extreme data
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 - We have two options of methodology. We decided to compare two methodologies.
- - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
- - Assumption for linear regression:
+ - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
+ - Assumption for linear regression:
 - Linear relationship
 - Multivariate normality
 - No or little multicollinearity

From 19ccf925645fa25bdbacd42c2c1695ca4bbf795e Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:34:35 -0400
Subject: [PATCH 15/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index ea77590..80bd8f7 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -7,7 +7,7 @@ How should we deal with the strongly right skewed data?
 - We have two options of methodology. We decided to compare two methodologies.
 - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 - Assumption for linear regression:
- - Linear relationship
+ - Linear relationship
 - Multivariate normality
 - No or little multicollinearity
 - No auto-correlation

From dbb9562ec3822e6b6f4b94c273c7eb2ec1d87ef1 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:35:02 -0400
Subject: [PATCH 16/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 80bd8f7..1e03135 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -6,8 +6,8 @@ How should we deal with the strongly right skewed data?
 - Then we will change our research aim to find what and how features are correlated with students’ grades for online courses given they finished assignments.
 - We have two options of methodology. We decided to compare two methodologies.
 - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
- - Assumption for linear regression:
- - Linear relationship
+ - Assumption for linear regression:
+ - Linear relationship
 - Multivariate normality
 - No or little multicollinearity
 - No auto-correlation

From fa65a8c288ea32c416feff04e5df069879689795 Mon Sep 17 00:00:00 2001
From: tiantianjin
Date: Tue, 1 Nov 2016 10:35:39 -0400
Subject: [PATCH 17/17] Update (Tiantian's methodology and next week plan) README.md

---
 (Tiantian's methodology and next week plan) README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/(Tiantian's methodology and next week plan) README.md b/(Tiantian's methodology and next week plan) README.md
index 1e03135..2a0b6af 100644
--- a/(Tiantian's methodology and next week plan) README.md
+++ b/(Tiantian's methodology and next week plan) README.md
@@ -8,10 +8,10 @@ How should we deal with the strongly right skewed data?
 - Method 1: More traditional regression model with cross validation. Base on our visualization, the regression would be linear.
 - Assumption for linear regression:
 - Linear relationship
- - Multivariate normality
- - No or little multicollinearity
- - No auto-correlation
- - Homoscedasticity
+ - Multivariate normality
+ - No or little multicollinearity
+ - No auto-correlation
+ - Homoscedasticity
 - Method 2: Classification And Regression Tree (CART) analysis
 -Assumption: there's no distributional assumption for data.
 - Validation
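
Note on the validation step in the README text above: the plan is to fit a model on one class and then apply it to the other classes to see whether it still fits. A minimal sketch of that idea is given here in Python (the notes themselves plan to use R). Everything in it is an assumption made for illustration: the class names, the single predictor (hours of activity), the grade values, and the helper names `fit_line`, `rmse`, and `validate_across_classes`. It fits a one-predictor least-squares line, which is a simplification of the multi-feature regression the methodology describes.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(model, xs, ys):
    """Root mean squared prediction error of a fitted line on (xs, ys)."""
    intercept, slope = model
    return sqrt(mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)))


def validate_across_classes(classes, train_name):
    """Fit on one class, then score the same model on every other class."""
    model = fit_line(*classes[train_name])
    return {
        name: rmse(model, xs, ys)
        for name, (xs, ys) in classes.items()
        if name != train_name
    }


# Hypothetical data: class name -> (hours of activity, grades).
# Zero grades are assumed to have been removed already, as the notes decide.
classes = {
    "class_a": ([1.0, 2.0, 3.0, 4.0], [52.0, 61.0, 70.0, 79.0]),
    "class_b": ([1.5, 2.5, 3.5], [57.0, 66.0, 74.0]),
    "class_c": ([2.0, 3.0, 5.0], [60.0, 71.0, 88.0]),
}

errors = validate_across_classes(classes, "class_a")
for name in sorted(errors):
    print(f"{name}: RMSE = {errors[name]:.2f}")
```

A small error on the held-out classes relative to the spread of grades would suggest the model transfers; a large one would suggest the relationship is class-specific. The same loop could be repeated with each class taking a turn as the training set.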