Commit 7dc9dc8

Remove str() from mcs reshape
1 parent ff83b36 commit 7dc9dc8

2 files changed

+18
-60
lines changed

docs/mcs-reshape_long_wide.md

Lines changed: 17 additions & 59 deletions
@@ -55,68 +55,26 @@ df_wide <- map(3:7, load_height_wide) %>%
   reduce(~ full_join(.x, .y, by = c("MCSID", "CNUM00"))) %>%
   rename(ECHTCM00 = ECHTCMA0, ECWTCMA00 = ECWTCMA0)
 
-str(df_wide)
+df_wide
 ```
 
 ``` text
-tibble [16,618 × 12] (S3: tbl_df/tbl/data.frame)
- $ MCSID : chr [1:16618] "M10001N" "M10002P" "M10007U" "M10011Q" ...
-  ..- attr(*, "label")= chr "MCS Research ID - Anonymised Family/Household Identifier"
-  ..- attr(*, "format.stata")= chr "%7s"
- $ CNUM00 : dbl+lbl [1:16618] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
-   ..@ labels: Named num [1:3] 1 2 3
-   .. ..- attr(*, "names")= chr [1:3] "1st Cohort Member of the family" "2nd Cohort Member of the family" "3rd Cohort Member of the family"
-   ..@ label : chr "Cohort Member number within an MCS family"
- $ CCHTCM00 : dbl+lbl [1:16618] 114, 110, 118, 121, 110, 118, 110, 113, 112, 108, 11...
-   ..@ label : chr "PHYS: Height in cms"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:5] -9 -8 -1 99998 99999
-   .. ..- attr(*, "names")= chr [1:5] "Refusal" "Don't Know" "Not applicable" "Refusal" ...
- $ CCWTCM00 : dbl+lbl [1:16618] 21.2, 19.2, 25.3, 32.9, 19.7, 23.0, 18.9, 19.4, 20.6...
-   ..@ label : chr "PHYS: Weight in Kilograms"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:3] -9 -8 -1
-   .. ..- attr(*, "names")= chr [1:3] "Refusal" "Don't Know" "Not applicable"
- $ DCHTCM00 : dbl+lbl [1:16618] 128, 123, 129, 137, 122, 130, 121, 128, 123, 121, N...
-   ..@ label : chr "Height in cms"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:3] -9 -8 -1
-   .. ..- attr(*, "names")= chr [1:3] "Refusal" "Don''t Know" "Not applicable"
- $ DCWTCM00 : dbl+lbl [1:16618] 25.5, 26.2, 26.5, 51.2, 24.1, 29.0, 21.7, 22.0, 24.6...
-   ..@ label : chr "Weight in Kilos"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:3] -9 -8 -1
-   .. ..- attr(*, "names")= chr [1:3] "Refusal" "Don''t Know" "Not applicable"
- $ ECHTCM00 : dbl+lbl [1:16618] NA, 144, 154, 168, 143, 152, NA, 150, 141, 147, 15...
-   ..@ label : chr "Height in cms"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:2] -7 -1
-   .. ..- attr(*, "names")= chr [1:2] "No answer" "Not applicable"
- $ ECWTCMA00: dbl+lbl [1:16618] NA, 41.8, 40.6, 74.0, 38.2, 41.5, NA, 37.3, 33.8...
-   ..@ label : chr "Weight in kilos"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:2] -7 -1
-   .. ..- attr(*, "names")= chr [1:2] "No answer" "Not applicable"
- $ FCHTCM00 : dbl+lbl [1:16618] NA, 163, 174, NA, 164, 167, NA, 164, 161, 157, 16...
-   ..@ label : chr "Height in centimeters"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:2] -5 -1
-   .. ..- attr(*, "names")= chr [1:2] "UNABLE TO OBTAIN HEIGHT MEASUREMENT" "Not applicable"
- $ FCWTCM00 : dbl+lbl [1:16618] NA, 52.3, 57.1, NA, 56.2, 51.5, NA, 56.9, 46.8...
-   ..@ label : chr "Weight in kilos"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:2] -5 -1
-   .. ..- attr(*, "names")= chr [1:2] "UNABLE TO OBTAIN HEIGHT MEASUREMENT" "Not applicable"
- $ GCHTCM00 : dbl+lbl [1:16618] NA, 174, 181, NA, 169, 185, NA, 166, NA, 157, 18...
-   ..@ label : chr "Height in cms"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:2] -5 -1
-   .. ..- attr(*, "names")= chr [1:2] "Unable to obtain height measurement" "Not applicable"
- $ GCWTCM00 : dbl+lbl [1:16618] NA, 59.4, 71.4, NA, 75.7, 74.1, NA, 56...
-   ..@ label : chr "Weight in kilos"
-   ..@ format.stata: chr "%12.0g"
-   ..@ labels : Named num [1:2] -5 -1
-   .. ..- attr(*, "names")= chr [1:2] "Unable to obtain weight measurement" "Not applicable"
+# A tibble: 16,618 × 12
+   MCSID CNUM00  CCHTCM00 CCWTCM00 DCHTCM00 DCWTCM00 ECHTCM00 ECWTCMA00 FCHTCM00
+   <chr> <dbl+l> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lbl> <dbl+lb>
+ 1 M100… 1 [1st…     114.     21.2     128.     25.5      NA       NA        NA
+ 2 M100… 1 [1st…     110.     19.2     123      26.2     144.      41.8     163.
+ 3 M100… 1 [1st…     118      25.3     129      26.5     154.      40.6     174.
+ 4 M100… 1 [1st…     121      32.9     137      51.2     168.      74        NA
+ 5 M100… 1 [1st…     110.     19.7     122.     24.1     143       38.2     164.
+ 6 M100… 1 [1st…     118.     23       130      29       152.      41.5     167
+ 7 M100… 1 [1st…     110.     18.9     121.     21.7      NA       NA        NA
+ 8 M100… 1 [1st…     113.     19.4     128.     22       150.      37.3     164.
+ 9 M100… 1 [1st…     112.     20.6     123      24.6     141.      33.8     161
+10 M100… 1 [1st…     108      18.4     121      24.2     147       40.3     157
+# ℹ 16,608 more rows
+# ℹ 3 more variables: FCWTCM00 <dbl+lbl>, GCHTCM00 <dbl+lbl>,
+#   GCWTCM00 <dbl+lbl>
 ```
 
`df_wide` has 12 columns. Besides the identifiers, `MCSID` and `CNUM00`,

quarto/mcs-reshape_long_wide.qmd

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ df_wide <- map(3:7, load_height_wide) %>%
   reduce(~ full_join(.x, .y, by = c("MCSID", "CNUM00"))) %>%
   rename(ECHTCM00 = ECHTCMA0, ECWTCMA00 = ECWTCMA0)
 
-str(df_wide)
+df_wide
 ```
 
`df_wide` has 12 columns. Besides the identifiers, `MCSID` and `CNUM00`, there are 10 columns holding the height and weight measurements from each of the five sweeps. Each of these 10 columns is prefixed by a single letter indicating the sweep. We can reshape the dataset into long format (one row per person x sweep combination) using the `pivot_longer()` function so that the resulting data frame has five columns: two person identifiers, a variable for sweep, and variables for height and weight. We specify the columns to be reshaped using the `cols` argument, provide the new variable names in the `names_to` argument, and describe the pattern the existing column names follow using the `names_pattern` argument. For `names_pattern` we specify `"(.)(.*)"`, which breaks the column name into two pieces: the first character (`"(.)"`) and the rest of the name (`"(.*)"`). `names_pattern` uses regular expressions: `.` matches any single character, and `*` modifies this to match zero or more characters. As noted, the first character holds information on sweep; in the reshaped data frame this character is stored as a value in a new column, `sweep`. `.value` is a placeholder for the new columns in the reshaped data frame that store the values from the columns selected by `cols`; these new columns are named using the second piece from `names_pattern`, in this case `CHTCM00` (height) and `CWTCM00` (weight).
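
The reshape described above can be sketched on a small made-up data frame. This is an illustrative sketch rather than the document's own code: `toy_wide`, `toy_long`, `pieces`, and all values are hypothetical, only sweeps C and D are included, and plain numeric columns stand in for the labelled columns in `df_wide`.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for df_wide: two people, two sweeps (C and D).
toy_wide <- tibble(
  MCSID    = c("A1", "A2"),
  CNUM00   = c(1, 1),
  CCHTCM00 = c(114, 110),
  CCWTCM00 = c(21.2, 19.2),
  DCHTCM00 = c(128, NA),
  DCWTCM00 = c(25.5, NA)
)

# How "(.)(.*)" splits one column name: the first character, then the rest.
pieces <- regmatches("CCHTCM00", regexec("(.)(.*)", "CCHTCM00"))[[1]][-1]
pieces

# Reshape every column except the two identifiers.
# The first piece of each name becomes a value in `sweep`;
# the second piece (".value") becomes the name of an output column.
toy_long <- pivot_longer(
  toy_wide,
  cols = !c(MCSID, CNUM00),
  names_to = c("sweep", ".value"),
  names_pattern = "(.)(.*)"
)
toy_long
```

Here `toy_long` has one row per person and sweep, with columns `MCSID`, `CNUM00`, `sweep`, `CHTCM00`, and `CWTCM00`. Rows with missing measurements are kept unless `values_drop_na = TRUE` is set. Columns are only stacked together when their names agree after the first character, so a column named `ECWTCMA00`, as in the printed output above, would end up in a separate `CWTCMA00` column rather than in `CWTCM00`.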
