From cf78c38bef60193c1166a783b7052655ca91ea99 Mon Sep 17 00:00:00 2001
From: albhasan <albhasan@gmail.com>
Date: Sun, 19 Nov 2023 12:09:36 -0300
Subject: [PATCH 1/2] Solves #118

---
 episodes/04-data-structures-part2.Rmd | 63 +++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/episodes/04-data-structures-part2.Rmd b/episodes/04-data-structures-part2.Rmd
index 2ce4db0d..724dcdf2 100644
--- a/episodes/04-data-structures-part2.Rmd
+++ b/episodes/04-data-structures-part2.Rmd
@@ -37,10 +37,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
 
 :::::::::::::::::::::::::::::::::::::::::  instructor
 
-Pay attention to and explain the errors and warnings generated from the 
+Pay attention to and explain the errors and warnings generated from the
 examples in this episode.
 
-:::::::::::::::::::::::::::::::::::::::::  
+:::::::::::::::::::::::::::::::::::::::::
 
 ```{r, echo=TRUE}
 gapminder <- read.csv("data/gapminder_data.csv")
@@ -75,7 +75,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
 
 - You can read directly from excel spreadsheets without
   converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
-  
+
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -86,10 +86,12 @@ always do is check out what the data looks like with `str`:
 str(gapminder)
 ```
 
-We can also examine individual columns of the data frame with our `class` function:
+We can also examine individual columns of the data frame with the `class` or
+'typeof' functions.:
 
 ```{r}
 class(gapminder$year)
+typeof(gapminder$year)
 class(gapminder$country)
 str(gapminder$country)
 ```
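+
+`class` describes how R treats an object, while `typeof` describes how it is
+stored in memory. For `year` both report `"integer"`, but the two differ for
+factors and data frames. Here is a small sketch that uses made-up objects
+rather than the gapminder data:
+
+```{r}
+# A factor is stored as integer codes plus a set of levels
+f <- factor(c("low", "high", "low"))
+class(f)    # "factor"
+typeof(f)   # "integer"
+
+# A data frame is stored as a list of equal-length columns
+df <- data.frame(x = 1:3, y = c("a", "b", "c"))
+class(df)   # "data.frame"
+typeof(df)  # "list"
+```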
@@ -281,6 +283,59 @@ tail(gapminder_norway)
 
 To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
 
+
+## Removing columns and rows in data frames
+
+To remove columns from a data frame, we can use the `subset` function.
+Its `select` argument takes column names; a minus sign drops those columns:
+
+```{r}
+life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
+head(life_expectancy)
+```
+
+We can also use a logical vector, with one `TRUE` or `FALSE` per column, to
+achieve the same result. Make sure the vector's length matches the number of
+columns in the data frame, because a shorter vector is silently recycled:
+
+```{r}
+life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
+head(life_expectancy)
+```
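+
+To see what recycling does, here is a sketch using a hypothetical six-column
+data frame (not gapminder). A logical vector of length 2 is repeated to cover
+all six columns, so it keeps every other column without any warning:
+
+```{r}
+# Hypothetical six-column data frame, used only for illustration
+df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)
+
+# The length-2 index is recycled to c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
+names(df[c(TRUE, FALSE)])   # "a" "c" "e"
+```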
+
+Alternatively, we can drop columns by their positions, using negative indices:
+
+```{r}
+life_expectancy <- gapminder[-c(3, 4, 6)]
+head(life_expectancy)
+```
+
+Often the simplest way to remove rows from a data frame is to select the rows
+we want to keep instead. We can also drop rows directly, using negative
+positions as we did for columns:
+
+```{r}
+# Keep data for Afghanistan in the 21st century (years after 2000):
+afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
+                             gapminder$year > 2000, ]
+
+# Now remove the data for 2002, that is, the first row:
+afghanistan_21c[-1, ]
+```
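+
+Selecting the rows to keep usually means negating a condition. Here is a
+sketch with a small hypothetical data frame (not gapminder) that drops the
+2002 rows, first with `[` and then with `subset`:
+
+```{r}
+# Hypothetical stand-in for a few rows of gapminder
+df <- data.frame(country = c("A", "A", "B"), year = c(2002, 2007, 2002))
+
+# Keep only the rows whose year is not 2002
+df[df$year != 2002, ]
+
+# The same selection with subset(), which evaluates the condition inside df
+subset(df, year != 2002)
+```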
+
+
+A common case is removing rows that contain missing values (`NA`):
+
+```{r}
+# Copy all Afghanistan rows and set the year to NA before 2007:
+afghanistan <- gapminder[gapminder$country == "Afghanistan", ]
+afghanistan[afghanistan$year < 2007, "year"] <- NA
+
+# Remove the rows containing NAs (only the 2007 row remains):
+na.omit(afghanistan)
+```
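+
+Because `na.omit` drops a row if any of its columns is `NA`, it can remove
+more rows than intended. Here is a sketch with a small hypothetical data frame
+that compares it with `complete.cases` and with an `is.na` test on one column:
+
+```{r}
+# Hypothetical data frame with NAs in two different columns
+df <- data.frame(x = c(1, NA, 3, 4), y = c(10, 20, NA, 40))
+
+# na.omit() drops every row with an NA in any column: rows 1 and 4 remain
+na.omit(df)
+
+# complete.cases() gives the same selection as a logical vector
+df[complete.cases(df), ]
+
+# To drop rows only where column x is NA, test that column: rows 1, 3, 4 remain
+df[!is.na(df$x), ]
+```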
+
+
 ## Factors
 
 Here is another thing to look out for: in a `factor`, each different value

From 8cb90a89e501130bc6bddf0d201cc3d040aca999 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alber=20S=C3=A1nchez?= <albhasan@gmail.com>
Date: Thu, 11 Jan 2024 11:02:18 -0300
Subject: [PATCH 2/2] Update episodes/04-data-structures-part2.Rmd

Co-authored-by: Michael Mahoney <mike.mahoney.218@gmail.com>
---
 episodes/04-data-structures-part2.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/04-data-structures-part2.Rmd b/episodes/04-data-structures-part2.Rmd
index 724dcdf2..9a92cb27 100644
--- a/episodes/04-data-structures-part2.Rmd
+++ b/episodes/04-data-structures-part2.Rmd
@@ -87,7 +87,7 @@ str(gapminder)
 ```
 
 We can also examine individual columns of the data frame with the `class` or
-'typeof' functions.:
+`typeof` functions:
 
 ```{r}
 class(gapminder$year)