From bf0ee2ef9b1d785f64b34c6673c1d16a1b7c19c7 Mon Sep 17 00:00:00 2001 From: Kayla Date: Mon, 5 Feb 2024 14:30:08 -0500 Subject: [PATCH 01/17] update --- GitHub.qmd | 169 +++++++++++++++++++++++++++++++++++++++ index.html | 209 +++++++++---------------------------------------- index.qmd | 144 ++++++++++++++++++++++++---------- references.bib | 86 ++++++++++++-------- 4 files changed, 361 insertions(+), 247 deletions(-) create mode 100644 GitHub.qmd diff --git a/GitHub.qmd b/GitHub.qmd new file mode 100644 index 000000000..cdecc319c --- /dev/null +++ b/GitHub.qmd @@ -0,0 +1,169 @@ +--- +title: "Bayesian Linear Regression" +author: "Kayla Mota" +date: '`r Sys.Date()`' +format: + html: + code-fold: true +course: STA 6257 - Advanced Statistical Modeling +bibliography: references.bib # file contains bibtex for references +#always_allow_html: true # this allows to get PDF with HTML features +self-contained: true +execute: + warning: false + message: false +editor: + markdown: + wrap: 72 +--- + +## Summary of Articles + +#### An introduction to using Bayesian linear regression with clinical data + +Within the realm of psychology, the tests utilized in research can be +deceptive. This article explains how opting for the Bayesian method +could yield more precise outcomes compared to frequentist methods. +Throughout the article they are using the data from an +electroencephalogram (EEG) and anxiety study to illustrate Bayesian +models. + +Specifically focusing on psychology, the article highlights the +substantial reliance on p-values. This emphasis weakens the attention to +precise predictions, failing to emphasize whether the results are true +or valid. The Bayesian approach requires that researchers make +predictions prior to data analysis. While concerns have been raised +about potential subjectivity in these predictions, the article contends +that research inherently involves subjective decisions throughout the +process. Whether rooted in scientific research or not, these decisions +are influenced by existing literature and chosen methods. + +The Bayesian linear method involves multiple steps—essentially, +predicting prior values, inputting collected data, and obtaining +probability in the form of a distribution (posterior distribution). This +distribution is typically different from commonly known probability +distributions. This is where the Markov Chain Monte Carlo method becomes +important, as it simulates random draws from the posterior distribution. + +In assessing the adequacy of a model, Bayesian methods typically rely on +two commonly used metrics: the widely applicable information criterion +and the Leave-one-out cross-validation. When examining data, various +assumptions, such as having a non-normal distribution, can be overlooked +with the application of the Bayesian method. Despite its advantages, +there are several challenges associated with using Bayesian methods, +primarily its complexity and the need for a deeper understanding of the +technology and methodologies involved. + +#### Linear regression model using Bayesian approach for energy performance of residential building + +The article discusses the two commonly used methods for estimating +regression model parameters: the frequentist method (OLS or MLE) and the +Bayesian approach. The Bayesian method is characterized by viewing +parameters as random variables, introducing the concept of prior, +likelihood, and posterior distributions. + +The research uses an energy efficiency dataset to predict cooling +equipment needs and utilizes linear regression modeling with Ordinary +Least Square (OLS) and Bayesian approaches. Correlation tests reveal +relationships between independent and dependent variables. The OLS +model, while showing significance, fails typical assumptions, prompting +consideration of the Bayesian approach. Bayesian modeling is conducted +using Gibbs Sampler and Markov Chain Monte Carlo (MCMC) methods. The +comparison of OLS and Bayesian models involves criteria such as RMSE, +MAD, and MAPE. The study concludes that Bayesian regression is more +suitable when standard assumptions are not met. + +#### What to believe: Bayesian methods for data analysis + +The article highlights the flexibility of Bayesian data analysis, +allowing researchers to adapt models to various data types, such as +dichotomous, metric, or those involving multiple treatment groups. +Unlike traditional null-hypothesis significance testing (NHST), Bayesian +analysis focuses on estimating parameters in a descriptive model without +committing to specific cognitive mechanisms. Bayesian methods eliminate +the need for p-values, offering richer information on parameters, +including correlations and trade-offs. The article encourages a shift to +Bayesian data analysis in cognitive science. The Bayesian model allows +flexible customization to data types, estimating parameters like mean +accuracy and certainty. Credible intervals and parameter correlations +are inherent, and the discussion covers coherent power and replication +probability computation in Bayesian analysis, providing a more realistic +estimate compared to traditional NHST power analysis. + +#### Bayesian Analysis Reporting Guidelines (BARG) + +In Bayesian analysis, even when utilizing representative or informed +priors, it is crucial to perform a sensitivity analysis to assess the +impact of different prior specifications on the posterior results. This +aims to ensure that the findings are not dependent on the choice of +prior. If results remain consistent across various reasonable priors, it +improves the robustness of the findings; however, if they are highly +sensitive to the prior  the outcomes heavily rely on the assumed prior +conditions. + +The widely employed Markov Chain Monte Carlo (MCMC) computational method +in Bayesian analysis involves generating samples from the posterior +distribution of parameters. Confirming the convergence of MCMC chains is +important for result reliability. Evidence of convergence, often +indicated by statistics like the Potential Scale Reduction Factor +(PSRF), provides confidence in the validity of the samples. High +Effective Sample Size (ESS) signifies precise estimates, while low ESS +may result in imprecise estimates. Reporting ESS for each parameter or +derived value aids in assessing result reliability. + +For decision-making based on continuous-parameter posterior +distribution, defining and justifying the limits of the Region of +Practical Equivalence (ROPE) is crucial. The ROPE determines the range +of parameter values considered practically equivalent and guides +decisions on the practical importance of parameter estimates. Providing +transparency about the computational approach guarantees that users are +informed about potential limitations. + +## Methods + +\~ Bayesian Linear Regression \~ + +## Analysis and Results + +### Data and Visualization + +A study was conducted to determine how... + +```{r, warning=FALSE, echo=T, message=FALSE} +# loading packages +library(tidyverse) +library(knitr) +library(ggthemes) +library(ggrepel) +library(dslabs) +``` + +```{r, warning=FALSE, echo=TRUE} +# Load Data +kable(head(murders)) + +ggplot1 = murders %>% ggplot(mapping = aes(x=population/10^6, y=total)) + + ggplot1 + geom_point(aes(col=region), size = 4) + + geom_text_repel(aes(label=abb)) + + scale_x_log10() + + scale_y_log10() + + geom_smooth(formula = "y~x", method=lm,se = F)+ + xlab("Populations in millions (log10 scale)") + + ylab("Total number of murders (log10 scale)") + + ggtitle("US Gun Murders in 2010") + + scale_color_discrete(name = "Region")+ + theme_bw() + + +``` + +### Statistical Modeling + +```{r} + +``` + +### Conclusion + +## References diff --git a/index.html b/index.html index 9e8c19192..1f0ad8cb4 100644 --- a/index.html +++ b/index.html @@ -2,14 +2,14 @@ - + - - + + -Sample Report - Data Science Capstone +Bayesian Linear Regression + - - - @@ -3156,7 +3025,7 @@
-

Sample Report - Data Science Capstone

+

Bayesian Linear Regression

@@ -3166,14 +3035,14 @@

Sample Report - Data Science Capstone

Author
-

Student name

+

Kayla Mota

Published
-

January 9, 2024

+

January 30, 2024

@@ -3183,28 +3052,34 @@

Sample Report - Data Science Capstone

-
-

Introduction

-
-

What is “mehtod”?

-

This is an introduction to Kernel regression, which is a non-parametric estimator that estimates the conditional expectation of two variables which is random. The goal of a kernel regression is to discover the non-linear relationship between two random variables. To discover the non-linear relationship, kernel estimator or kernel smoothing is the main method to estimate the curve for non-parametric statistics. In kernel estimator, weight function is known as kernel function (Efromovich 2008). Cite this paper (Bro and Smilde 2014). The GEE (Wang 2014).

-

This is my work and I want to add more work…

+
+

Summary of Articles

+
+

An introduction to using Bayesian linear regression with clinical data

+

Within the realm of psychology, the tests utilized in research can be deceptive. This article explains how opting for the Bayesian method could yield more precise outcomes compared to frequentist methods. Throughout the article they are using the data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models.

+

Specifically focusing on psychology, the article highlights the substantial reliance on p-values. This emphasis weakens the attention to precise predictions, failing to emphasize whether the results are true or valid. The Bayesian approach requires that researchers make predictions prior to data analysis. While concerns have been raised about potential subjectivity in these predictions, the article contends that research inherently involves subjective decisions throughout the process. Whether rooted in scientific research or not, these decisions are influenced by existing literature and chosen methods.

+

The Bayesian linear method involves multiple steps—essentially, predicting prior values, inputting collected data, and obtaining probability in the form of a distribution (posterior distribution). This distribution is typically different from commonly known probability distributions. This is where the Markov Chain Monte Carlo method becomes important, as it simulates random draws from the posterior distribution.

+

In assessing the adequacy of a model, Bayesian methods typically rely on two commonly used metrics: the widely applicable information criterion and the Leave-one-out cross-validation. When examining data, various assumptions, such as having a non-normal distribution, can be overlooked with the application of the Bayesian method. Despite its advantages, there are several challenges associated with using Bayesian methods, primarily its complexity and the need for a deeper understanding of the technology and methodologies involved.

-

Methods

-

The common non-parametric regression model is \(Y_i = m(X_i) + \varepsilon_i\), where \(Y_i\) can be defined as the sum of the regression function value \(m(x)\) for \(X_i\). Here \(m(x)\) is unknown and \(\varepsilon_i\) some errors. With the help of this definition, we can create the estimation for local averaging i.e. \(m(x)\) can be estimated with the product of \(Y_i\) average and \(X_i\) is near to \(x\). In other words, this means that we are discovering the line through the data points with the help of surrounding data points. The estimation formula is printed below (R Core Team 2019):

-

\[ -M_n(x) = \sum_{i=1}^{n} W_n (X_i) Y_i \tag{1} -\] \(W_n(x)\) is the sum of weights that belongs to all real numbers. Weights are positive numbers and small if \(X_i\) is far from \(x\).

-

Another equation:

-

\[ -y_i = \beta_0 + \beta_1 X_1 +\varepsilon_i -\]

+

~ Bayesian Linear Regression ~

Analysis and Results

@@ -3301,7 +3176,7 @@

Data and Visualizat theme_bw()
-

+

@@ -3312,25 +3187,11 @@

Statistical Modeling<

Conclusion

-
- - +
+

References

-

References

-
-Bro, Rasmus, and Age K Smilde. 2014. “Principal Component Analysis.” Analytical Methods 6 (9): 2812–31. -
-
-Efromovich, S. 2008. Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer Series in Statistics. Springer New York. https://books.google.com/books?id=mdoLBwAAQBAJ. -
-
-R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org. -
-
-Wang, Ming. 2014. “Generalized Estimating Equations in Longitudinal Data Analysis: A Review and Recent Developments.” Advances in Statistics 2014. -
-
+