From cf78c38bef60193c1166a783b7052655ca91ea99 Mon Sep 17 00:00:00 2001
From: albhasan
Date: Sun, 19 Nov 2023 12:09:36 -0300
Subject: [PATCH 1/2] Solves #118

---
 episodes/04-data-structures-part2.Rmd | 63 +++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/episodes/04-data-structures-part2.Rmd b/episodes/04-data-structures-part2.Rmd
index 2ce4db0d..724dcdf2 100644
--- a/episodes/04-data-structures-part2.Rmd
+++ b/episodes/04-data-structures-part2.Rmd
@@ -37,10 +37,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
 
 ::::::::::::::::::::::::::::::::::::::::: instructor
 
-Pay attention to and explain the errors and warnings generated from the 
+Pay attention to and explain the errors and warnings generated from the
 examples in this episode.
 
-::::::::::::::::::::::::::::::::::::::::: 
+:::::::::::::::::::::::::::::::::::::::::
 
 ```{r, echo=TRUE}
 gapminder <- read.csv("data/gapminder_data.csv")
@@ -75,7 +75,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
 
 - You can read directly from excel spreadsheets without converting them to plain
   text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
-  
+
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -86,10 +86,12 @@ always do is check out what the data looks like with `str`:
 str(gapminder)
 ```
 
-We can also examine individual columns of the data frame with our `class` function:
+We can also examine individual columns of the data frame with the `class` or
+'typeof' functions.:
 
 ```{r}
 class(gapminder$year)
+typeof(gapminder$year)
 class(gapminder$country)
 str(gapminder$country)
 ```
@@ -281,6 +283,59 @@ tail(gapminder_norway)
 
 To understand why R is giving us a warning when we try to add this row, let's
 learn a little more about factors.
+
+## Removing columns and rows in data frames
+
+To remove columns from a data frame, we can use the 'subset' function.
+This function allows us to remove columns using their names:
+
+```{r}
+life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
+head(life_expectancy)
+```
+
+We can also use a logical vector to achieve the same result. Make sure the
+vector's length matches the number of columns in the data frame (to avoid vector
+recycling):
+
+```{r}
+life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
+head(life_expectancy)
+```
+
+Alternatively, we can use column positions:
+
+```{r}
+life_expectancy <- gapminder[-c(3, 4, 6)]
+head(life_expectancy)
+```
+
+Note that the easiest way to remove rows from a data frame is to select the
+rows we want to keep instead.
+However, we can also remove rows by their positions:
+
+```{r}
+# Filter data for Afghanistan during the 21st century:
+afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
+  gapminder$year > 2000, ]
+
+# Now remove data for 2002, that is, the first row:
+afghanistan_21c[-1, ]
+```
+
+
+An interesting case is removing rows containing NAs:
+
+```{r}
+# Turn some values into NAs:
+afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
+afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA
+
+# Remove rows containing NAs:
+na.omit(afghanistan_20c)
+```
+
+
 ## Factors
 
 Here is another thing to look out for: in a `factor`, each different value

From 8cb90a89e501130bc6bddf0d201cc3d040aca999 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alber=20S=C3=A1nchez?=
Date: Thu, 11 Jan 2024 11:02:18 -0300
Subject: [PATCH 2/2] Update episodes/04-data-structures-part2.Rmd

Co-authored-by: Michael Mahoney
---
 episodes/04-data-structures-part2.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/04-data-structures-part2.Rmd b/episodes/04-data-structures-part2.Rmd
index 724dcdf2..9a92cb27 100644
--- a/episodes/04-data-structures-part2.Rmd
+++ b/episodes/04-data-structures-part2.Rmd
@@ -87,7 +87,7 @@ str(gapminder)
 ```
 
 We can also examine individual columns of the data frame with the `class` or
-'typeof' functions.:
+'typeof' functions:
 
 ```{r}
 class(gapminder$year)