knitr::opts_chunk$set(error = TRUE)

ggplot tutorial

What is ggplot?

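ggplot2 is an R package for making plots. It is based on the grammar of graphics: a plot is described as data, a mapping from variables of the data to visual properties (the aesthetics), and one or more layers of geometric objects (the geoms) that draw them.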

Actually doing things -- first dive into ggplot2

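First step: install and import ggplot2 (surprising, right?)

ggplot2 is part of the tidyverse, so loading the tidyverse attaches ggplot2 together with related packages such as dplyr and tidyr. If it is not installed yet, run install.packages("tidyverse") once. If you only need plotting, library(ggplot2) is enough.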

# library(ggplot2)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.3.0

## Warning: package 'tibble' was built under R version 3.5.2

## Warning: package 'tidyr' was built under R version 3.5.2

## Warning: package 'purrr' was built under R version 3.5.2

## Warning: package 'dplyr' was built under R version 3.5.2

## Warning: package 'stringr' was built under R version 3.5.2

## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Second step: get data (and here you thought nothing could surprise you anymore)

If you ever want to try something out, you should know that there are built-in datasets for you to test things on.

## see all datasets available
# data()

## load one of them
data(iris)

## take a look at what it looks like
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
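
Before plotting, it can help to check which columns are numeric and which are categorical, since that decides what can go on an axis and what can be used for grouping. A minimal sketch using base R only (the variable names are made up for this example):

```r
## structure of the dataset: 4 numeric columns and 1 factor (Species)
str(iris)

## number of flowers per species
species_counts <- table(iris$Species)
species_counts

## names of the numeric columns (candidates for the x and y axes)
numeric_cols <- names(iris)[sapply(iris, is.numeric)]
numeric_cols
```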

And now, for the actual plotting...

ggplot()

Here we initialized the plot object. Basically, this tells R: hey, this is going to be a plot. For now it is empty, and we would like it to have something in it. To add layers to a plot, we use "+", so let's try to add points:

ggplot()+
  geom_point(data=iris)
## Error: geom_point requires the following missing aesthetics: x, y

Oh no, an error! Fortunately, R tells you what is wrong: geom_point needs aesthetics that we did not provide (here, x and y). Learning how to read the errors thrown by R is very useful!

So here we said "Hey R, display points!" and R answered "Ok, where?"
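
The error only appears when the plot is actually built, not when the layer is added. As a small sketch, the incomplete plot can be stored in a variable and the error caught with tryCatch() (base R) around ggplot_build(), the ggplot2 function that computes what has to be drawn. The variable names are only illustrative.

```r
library(ggplot2)

## adding the layer does not fail yet: nothing has been drawn
p_incomplete <- ggplot() +
  geom_point(data = iris)

## building the plot triggers the check for required aesthetics
result <- tryCatch(
  ggplot_build(p_incomplete),
  error = function(e) e
)

## the build failed; conditionMessage() gives the text of the error
inherits(result, "error")
conditionMessage(result)
```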

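What are aesthetics? They are the visual properties of what is drawn: position (x, y), color, size, shape, ... The aes() function maps variables of the data to these properties.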

Let's indicate to R in which positions we want points and plot the lengths of petals depending on the lengths of sepals:

ggplot()+
  geom_point(aes(x=Sepal.Length, y=Petal.Length), data=iris)

Other geoms work the same way. For example, histograms of the sepal and petal lengths, with one panel per species:

## alpha is a fixed value here, so it goes outside aes()
ggplot()+
  geom_histogram(aes(x=Sepal.Length, fill="sepal length"), alpha=0.5, data=iris)+
  geom_histogram(aes(x=Petal.Length, fill="petal length"), alpha=0.5, data=iris)+
  facet_wrap(~Species)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
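
The stat_bin() message above comes from geom_histogram() choosing 30 bins by default. A sketch of a histogram with an explicit bin width. The value 0.25 (in cm) is an arbitrary choice for illustration, not a recommendation:

```r
library(ggplot2)

## one histogram of sepal length per species, with bins 0.25 cm wide
p_hist <- ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~Species)

p_hist
```

With binwidth given, the message about bins = 30 should no longer be printed.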

What if we also want, say, a smoothed curve for this data?

ggplot()+
  geom_point(data=iris, aes(x=Sepal.Length, y=Petal.Length))+
  geom_smooth(data=iris, aes(x=Sepal.Length, y=Petal.Length))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Well, this is a bit repetitive, isn't it? Does this mean that we have to repeat the data, x and y every time we want to add a new layer?

Well, no! We can specify them only once, as arguments of the ggplot() function: anything written there is shared by all the layers:

ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## adding a new layer
ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth()+
  geom_rug()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
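
Since "+" returns a new plot object, a plot can also be stored in a variable and extended step by step. A minimal sketch; the names p_base and p_full are made up for this example:

```r
library(ggplot2)

## base plot: data and aesthetics shared by every layer
p_base <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length))

## each "+" returns a new plot with one more layer; p_base itself is unchanged
p_full <- p_base +
  geom_point() +
  geom_rug()

length(p_base$layers)  ## no layer yet
length(p_full$layers)  ## one per geom added

## typing the name of the object draws it
p_full
```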

Other properties of a layer can be specified as well: color, size, shape, fill, transparency, ... A fixed value is passed directly as an argument of the geom:

ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length))+
  geom_point(color="darkblue")+
  geom_smooth(color="green")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

If this information depends on the data, it should be specified inside aes():

ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length, color=Species))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

What?! No! I just wanted the points to be colored based on the species they belong to, not a separate smoothed curve for each species! No worries, we can do that too: just put that mapping in a layer-specific aes:

ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length))+
  geom_point(aes(color=Species))+
  geom_smooth(color="black")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length))+
  geom_point(aes(color=Species, size=Petal.Width), alpha=0.8)+
  geom_smooth(color="black", fill="yellow")+
  facet_grid(~Species)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
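
As the message says, geom_smooth() picked the loess method by default. If a straight regression line is wanted instead, the method can be given explicitly. A sketch, assuming a linear fit is of interest here (whether it describes this data well is a separate question):

```r
library(ggplot2)

## points colored by species, one linear fit over all the data
p_lm <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = Species)) +
  ## method = "lm" fits a linear model; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black")

p_lm
```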

Let's do some actual analysis

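Yes, the order matters: layers are drawn in the order in which they are added, so a later layer is drawn on top of the earlier ones: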

ggplot(iris, aes(x=Species, y=Petal.Length, color=Species, fill=Species)) +
  geom_boxplot(width = 0.6, alpha=0.5)+
  geom_jitter(width=0.3, height=0)+
  geom_violin(alpha = 0.5, fill="grey")
  # geom_boxplot(width = 0.6, alpha=0.5)

ggplot(iris, aes(x=Species, y=Petal.Length, color=Species, fill=Species)) +
  geom_jitter(width=0.1, alpha=0.5)+
  geom_violin(alpha = 0.5, fill="grey", position = position_nudge(x=0.6))+
  geom_boxplot(width = 0.3, alpha=0.5)+
  coord_flip()
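
To check in which order the layers of a plot will be drawn, the list of layers stored in the plot object can be inspected. A small sketch; it relies on the layers and geom fields of ggplot2 objects, which is closer to an implementation detail than to a documented interface, so take it as an assumption:

```r
library(ggplot2)

p_order <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_jitter(width = 0.1)

## geoms in drawing order: the first is at the bottom, the last on top
## (geom_jitter() is a point geom with a jittered position)
layer_geoms <- sapply(p_order$layers, function(l) class(l$geom)[1])
layer_geoms
```

Here the points are added last, so they end up on top of the violins and the boxplots.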