---
title: "Statistical Inference Project - Part II"
subtitle: "Analysis of the ToothGrowth data"
author: "Loïc BERTHOU"
date: "August 19, 2015"
output: pdf_document
---
```{r, echo=FALSE, results='hide', message=FALSE}
set.seed(20150819)
library(ggplot2)
library(dplyr)
```
# Overview
This document is a basic analysis of the ToothGrowth dataset provided in the R datasets package.
# Introduction
Since we have no prior knowledge of this dataset, we will first perform an exploratory analysis to get a better sense of the type of data we will be dealing with.
Then we will compare tooth growth by supp and dose as requested.
# Exploratory data analysis
The first thing to do with the dataset is to read the documentation attached to it (through the _help_ function).
The _str_ function also provides a useful overview of the dataset:
```{r, echo=FALSE}
str(ToothGrowth)
```
Finally, we draw a first scatter plot of the data (Figure 1).
```{r, fig.width=7, fig.height=3.5, echo=FALSE}
g <- ggplot(data=ToothGrowth, aes(x=dose, y=len))
g <- g + geom_point(aes(color=supp), size=5, alpha=.5)
g <- g + ggtitle("figure 1 - Exploratory data analysis")
g
```
# Summary of the data
We now have a first idea of the data we are working with.
These numbers were produced by a study of the effect of vitamin C on guinea pigs, and more precisely on the length of their teeth.
The vitamin C was administered either in the form of orange juice (coded "OJ") or ascorbic acid (coded "VC"). Each form was administered at 3 dose levels (0.5, 1.0 and 2.0 mg/day), with 10 animals per combination of form and dose (60 different animals in total).
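The design described above can be checked directly from the data. The following is a minimal sketch in base R, assuming the `ToothGrowth` data frame from the `datasets` package; the names `design` and `group_means` are introduced here for illustration and are not part of the original analysis.

```{r, eval=FALSE}
# Count the animals in each supplement/dose combination
design <- table(ToothGrowth$supp, ToothGrowth$dose)
design

# Mean tooth length for each combination, as a quick numeric summary
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

A balanced table (the same count in every cell) is what makes the group-by-group comparisons in the next section straightforward.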
# Tooth growth comparison by supp and dose
We will compare the effects of orange juice and ascorbic acid for each dose. In this experiment, each measurement was taken on a different animal, so the observations cannot be paired by animal and the groups are treated as independent.
We will then compare the mean length between two doses and look at the confidence interval of the difference to draw conclusions.
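For reference, the intervals in this report come from `t.test` with `paired = FALSE` and its other defaults, which gives a Welch interval that does not assume equal variances in the two groups. As a sketch, writing $\bar{x}_1, \bar{x}_2$ for the sample means, $s_1, s_2$ for the sample standard deviations and $n_1, n_2$ for the group sizes (notation introduced here, not taken from the original analysis), the 95% interval for the difference in means is:

$$
\bar{x}_2 - \bar{x}_1 \pm t_{0.975,\,\nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
$$

Here $n_1 = n_2 = 10$ for every comparison. An interval that excludes zero indicates a difference between the two group means at the 5% level.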
To illustrate the analysis, we plot the individual points together with a smoothed curve for each form of vitamin C.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
my_ci <- function(x) data.frame(y=mean(x), ymin=mean(x) - sd(x), ymax=mean(x) + sd(x))
g <- ggplot(data=ToothGrowth, aes(x=dose, y=len))
g <- g + geom_point(aes(color=supp), size=5, alpha=.5, position = position_jitter(w = 0.05, h = 0))
g <- g + geom_smooth(aes(color=supp))
g <- g + facet_wrap(~supp)
g <- g + ggtitle("figure 2 - Tooth growth comparison")
g
```
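The comparisons in the following subsections all repeat the same steps: extract the lengths for two doses of one form, take the difference of the means, and read the 95% confidence interval from an unpaired t-test. As a minimal sketch, these steps could be wrapped in a single function. The function `compare_doses` and its argument names are hypothetical and are not part of the original analysis.

```{r, eval=FALSE}
# Hypothetical helper: compares tooth length between two dose levels
# for one form of vitamin C ("OJ" or "VC").
compare_doses <- function(supplement, low, high, data = ToothGrowth) {
  len_low  <- data$len[data$supp == supplement & data$dose == low]
  len_high <- data$len[data$supp == supplement & data$dose == high]
  # Unpaired Welch t-test (var.equal = FALSE is the default of t.test)
  test <- t.test(len_high, len_low, paired = FALSE)
  data.frame(supp = supplement, from = low, to = high,
             mean_diff = mean(len_high) - mean(len_low),
             ci_low = test$conf.int[1], ci_high = test$conf.int[2])
}

# The three dose comparisons for orange juice, stacked in one table
rbind(compare_doses("OJ", 0.5, 1.0),
      compare_doses("OJ", 1.0, 2.0),
      compare_doses("OJ", 0.5, 2.0))
```

Returning a one-row data frame keeps each comparison easy to stack with `rbind`, so both forms can be summarised in one table.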
## Orange Juice
### From dose 0.5 to 1.0
```{r, echo=FALSE}
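# Tooth lengths (numeric vectors) for the two doses being compared
g2 <- filter(ToothGrowth, supp=="OJ", dose==1.0)$len
g1 <- filter(ToothGrowth, supp=="OJ", dose==0.5)$len
# Difference between the group means (the groups are independent, not paired)
meanGrothOJ0510 <- mean(g2) - mean(g1)
# 95% confidence interval from an unpaired Welch t-test
confGroth <- t.test(g2, g1, paired = FALSE)$conf.int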
```
The average difference in growth is `r sprintf("%.2f", meanGrothOJ0510)`, with a 95% confidence interval from `r sprintf("%.2f", confGroth[1])` to `r sprintf("%.2f", confGroth[2])`.
### From dose 1.0 to 2.0
```{r, echo=FALSE}
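# Tooth lengths (numeric vectors) for the two doses being compared
g2 <- filter(ToothGrowth, supp=="OJ", dose==2.0)$len
g1 <- filter(ToothGrowth, supp=="OJ", dose==1.0)$len
# Difference between the group means (the groups are independent, not paired)
meanGrothOJ1020 <- mean(g2) - mean(g1)
# 95% confidence interval from an unpaired Welch t-test
confGroth <- t.test(g2, g1, paired = FALSE)$conf.int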
```
The average difference in growth is `r sprintf("%.2f", meanGrothOJ1020)`, with a 95% confidence interval from `r sprintf("%.2f", confGroth[1])` to `r sprintf("%.2f", confGroth[2])`.
We can say that there is a difference between the effects of the two doses, but it is not guaranteed to be large.
### From dose 0.5 to 2.0
```{r, echo=FALSE}
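# Tooth lengths (numeric vectors) for the two doses being compared
g2 <- filter(ToothGrowth, supp=="OJ", dose==2.0)$len
g1 <- filter(ToothGrowth, supp=="OJ", dose==0.5)$len
# Difference between the group means (the groups are independent, not paired)
meanGrothOJ0520 <- mean(g2) - mean(g1)
# 95% confidence interval from an unpaired Welch t-test
confGroth <- t.test(g2, g1, paired = FALSE)$conf.int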
```
The average difference in growth is `r sprintf("%.2f", meanGrothOJ0520)`, with a 95% confidence interval from `r sprintf("%.2f", confGroth[1])` to `r sprintf("%.2f", confGroth[2])`.
This confidence interval overlaps the one obtained in "From dose 0.5 to 1.0". See the Conclusion.
## Ascorbic Acid
### From dose 0.5 to 1.0
```{r, echo=FALSE}
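# Tooth lengths (numeric vectors) for the two doses being compared
g2 <- filter(ToothGrowth, supp=="VC", dose==1.0)$len
g1 <- filter(ToothGrowth, supp=="VC", dose==0.5)$len
# Difference between the group means (the groups are independent, not paired)
meanGrothVC0510 <- mean(g2) - mean(g1)
# 95% confidence interval from an unpaired Welch t-test
confGroth <- t.test(g2, g1, paired = FALSE)$conf.int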
```
The average difference in growth is `r sprintf("%.2f", meanGrothVC0510)`, with a 95% confidence interval from `r sprintf("%.2f", confGroth[1])` to `r sprintf("%.2f", confGroth[2])`.
### From dose 1.0 to 2.0
```{r, echo=FALSE}
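# Tooth lengths (numeric vectors) for the two doses being compared
g2 <- filter(ToothGrowth, supp=="VC", dose==2.0)$len
g1 <- filter(ToothGrowth, supp=="VC", dose==1.0)$len
# Difference between the group means (the groups are independent, not paired)
meanGrothVC1020 <- mean(g2) - mean(g1)
# 95% confidence interval from an unpaired Welch t-test
confGroth <- t.test(g2, g1, paired = FALSE)$conf.int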
```
The average difference in growth is `r sprintf("%.2f", meanGrothVC1020)`, with a 95% confidence interval from `r sprintf("%.2f", confGroth[1])` to `r sprintf("%.2f", confGroth[2])`.
### From dose 0.5 to 2.0
```{r, echo=FALSE}
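# Tooth lengths (numeric vectors) for the two doses being compared
g2 <- filter(ToothGrowth, supp=="VC", dose==2.0)$len
g1 <- filter(ToothGrowth, supp=="VC", dose==0.5)$len
# Difference between the group means (the groups are independent, not paired)
meanGrothVC0520 <- mean(g2) - mean(g1)
# 95% confidence interval from an unpaired Welch t-test
confGroth <- t.test(g2, g1, paired = FALSE)$conf.int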
```
The average difference in growth is `r sprintf("%.2f", meanGrothVC0520)`, with a 95% confidence interval from `r sprintf("%.2f", confGroth[1])` to `r sprintf("%.2f", confGroth[2])`.
# Conclusion
The effect of ascorbic acid seems fairly consistent. Compared with the weakest dose (0.5 mg/day), the intermediate dose (1.0 mg/day) gives an average difference in growth of `r sprintf("%.2f", meanGrothVC0510)`, and the strongest dose (2.0 mg/day) gives an average difference of `r sprintf("%.2f", meanGrothVC0520)`. Looking at the confidence intervals, we can say that the strongest dose is very likely to yield more growth.
On the other hand, the effect of orange juice increases less at the stronger dose. Compared with the weakest dose (0.5 mg/day), the intermediate dose (1.0 mg/day) gives an average difference in growth of `r sprintf("%.2f", meanGrothOJ0510)`, and the strongest dose (2.0 mg/day) gives an average difference of `r sprintf("%.2f", meanGrothOJ0520)`. Looking at the confidence intervals, we can say that the stronger dose has an effect, but it is not guaranteed to be much more effective than the intermediate dose.
Something interesting to note is that, because the average differences in growth do not behave the same way for the two forms, orange juice appears more effective than ascorbic acid at the smaller doses. So despite a steeper curve for ascorbic acid, the absolute effects on the guinea pigs are very similar at 2.0 mg/day. This is clearly visible in _figure 2_.
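The comparison between the two forms at each dose is read from figure 2 above. It could also be checked numerically with the same unpaired t-test used for the dose comparisons. The following is a minimal sketch in base R; the name `supp_by_dose` and the column names are introduced here for illustration and are not part of the original analysis.

```{r, eval=FALSE}
# For each dose level, compare orange juice (OJ) with ascorbic acid (VC)
# using an unpaired Welch t-test on tooth length.
supp_by_dose <- do.call(rbind, lapply(sort(unique(ToothGrowth$dose)), function(d) {
  at_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  oj <- at_dose$len[at_dose$supp == "OJ"]
  vc <- at_dose$len[at_dose$supp == "VC"]
  test <- t.test(oj, vc, paired = FALSE)
  data.frame(dose = d,
             mean_diff = mean(oj) - mean(vc),
             ci_low = test$conf.int[1],
             ci_high = test$conf.int[2])
}))
supp_by_dose
```

A row whose interval excludes zero would support a difference between the two forms at that dose, while an interval containing zero would not.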
\newpage
# Appendix
## Code for generating figure 1
```{r, eval=FALSE}
g <- ggplot(data=ToothGrowth, aes(x=dose, y=len))
g <- g + geom_point(aes(color=supp), size=5, alpha=.5)
g <- g + ggtitle("figure 1 - Exploratory data analysis")
g
```
## Code for generating figure 2
```{r, eval=FALSE}
my_ci <- function(x) data.frame(y=mean(x), ymin=mean(x) - sd(x), ymax=mean(x) + sd(x))
g <- ggplot(data=ToothGrowth, aes(x=dose, y=len))
g <- g + geom_point(aes(color=supp), size=5, alpha=.5, position = position_jitter(w = 0.05, h = 0))
g <- g + geom_smooth(aes(color=supp))
g <- g + facet_wrap(~supp)
g <- g + ggtitle("figure 2 - Tooth growth comparison")
g
```
### Comments on figure 2
The points have been jittered horizontally so that their concentration at each dose can be seen more clearly.
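The helper `my_ci` is defined in the code for figure 2 but is not used by the plot. One possible use, shown here only as a sketch under that assumption, is to pass it to `stat_summary` so that the mean and a band of one standard deviation on each side are drawn at each dose. Despite its name, `my_ci` returns the mean plus or minus one standard deviation, not a confidence interval.

```{r, eval=FALSE}
library(ggplot2)

# Same helper as in the figure 2 code: mean plus or minus one standard deviation
my_ci <- function(x) data.frame(y=mean(x), ymin=mean(x) - sd(x), ymax=mean(x) + sd(x))

g <- ggplot(data=ToothGrowth, aes(x=dose, y=len))
g <- g + geom_point(aes(color=supp), size=5, alpha=.5,
                    position = position_jitter(width = 0.05, height = 0))
# Assumed use of my_ci: one point range per dose, drawn on top of the points
g <- g + stat_summary(fun.data = my_ci, geom = "pointrange")
g <- g + facet_wrap(~supp)
g
```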