Tidy Data dataset

#Creating the Tidy Data datasets#

##Prerequisites to analysis##
- R version 3.1.0 with RStudio and the default packages
- The plyr package
- Download the UCI HAR Dataset from this location
- Place the UCI HAR Dataset folder into your working directory and rename it to UCI
- Modify features.txt to make sure no variable names are duplicated
  - Add -X to the end of features 303-344, -Y to features 382-423 and -Z to features 461-502 to show that they are measurements on the X, Y and Z axes respectively
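
The renaming step above is described as a manual edit of features.txt. As a rough sketch only, the same suffixes could be applied in R instead. The `feat_names` vector below is a placeholder for the 561 names in the file, and `axis_ranges` is a hypothetical helper name; only the index ranges and suffixes come from the list above.

```r
# Placeholder for the 561 feature names in the second column of features.txt.
feat_names <- paste0("feature", seq_len(561))

# Index ranges and suffixes as listed in the prerequisites.
axis_ranges <- list("-X" = 303:344, "-Y" = 382:423, "-Z" = 461:502)

# Append the matching suffix to every name in each range.
for (suffix in names(axis_ranges)) {
  idx <- axis_ranges[[suffix]]
  feat_names[idx] <- paste0(feat_names[idx], suffix)
}

# anyDuplicated() returns 0 when every name is unique.
anyDuplicated(feat_names)
```

Running `anyDuplicated()` on the real names after editing is a quick way to confirm that no duplicates remain.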
##How the script operates##
###Reading the files in###
- Create three data frames from the training dataset using the read.table function: subj holds the subject numbers, act holds the activities and meas holds the measurements of each variable.
  - All are read with header=FALSE since the files have no header row. act is read with stringsAsFactors=TRUE since the activities are factors.
- Change the column name of subj to "Subject_ID", of act to "Activities" and of meas to "Measurements".
  - Convert Activities to a factor variable with as.factor, then use the mapvalues function from plyr to change the values from numbers to activity names based on activity_labels.txt.
- Create the feature list by calling read.table on features.txt, which creates the feat data frame, then remove its first column.
- Set the column names of meas from the feat data frame with colnames(meas).
- Bind the columns together in the order subj, meas, act.
- Repeat the process for the test dataset, naming the data frames subj2, meas2 and act2. features.txt does not have to be re-read.
- Use rbind to bind the training and test data frames together into a composite dataset called temp, then sort it by Subject_ID using the order function.
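
The steps above can be sketched roughly as follows. This is not the original script: it reads tiny inline stand-ins (made-up values, two features) through `read.table(text = ...)` instead of the files under the UCI folder, and `read_txt` and `build_set` are hypothetical helpers introduced here so the training and test sets share one code path rather than repeating the steps with subj2, meas2 and act2.

```r
# Hypothetical helper: read inline text the way the script reads the files.
read_txt <- function(txt, ...) read.table(text = txt, header = FALSE, ...)

# Stand-ins for features.txt and activity_labels.txt (made-up contents).
feat   <- read_txt("1 featA\n2 featB", stringsAsFactors = FALSE)
labels <- read_txt("1 WALKING\n2 SITTING", stringsAsFactors = FALSE)

# Hypothetical helper: build one labelled set (training or test).
build_set <- function(subj_txt, act_txt, meas_txt) {
  subj <- read_txt(subj_txt)
  act  <- read_txt(act_txt)
  meas <- read_txt(meas_txt)

  colnames(subj) <- "Subject_ID"
  colnames(act)  <- "Activities"
  colnames(meas) <- feat$V2  # feature names only, index column ignored

  act$Activities <- as.factor(act$Activities)
  # The README uses plyr::mapvalues; the base R branch is only a fallback
  # so that this sketch still runs when plyr is not installed.
  if (requireNamespace("plyr", quietly = TRUE)) {
    act$Activities <- plyr::mapvalues(act$Activities, from = labels$V1,
                                      to = labels$V2, warn_missing = FALSE)
  } else {
    levels(act$Activities) <-
      labels$V2[match(levels(act$Activities), labels$V1)]
  }

  cbind(subj, meas, act)  # order: subj, meas, act
}

train <- build_set("1\n3\n3", "1\n2\n1", "0.1 0.2\n0.3 0.4\n0.5 0.6")
test  <- build_set("2\n2", "2\n1", "0.7 0.8\n0.9 1.0")

# Combine both sets and sort by subject.
temp <- rbind(train, test)
temp <- temp[order(temp$Subject_ID), ]
```

With the real data, each `read_txt` call would be replaced by `read.table` on the corresponding file in the UCI folder.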
##Creating the first tidy dataset##
- Create a data frame (z) that contains only the numerical values from temp, columns 3-563.
- Get the mean and standard deviation of each variable by applying the mean and sd functions to the columns of z, convert the results to data frames and save them as the means and sds data frames respectively.
- Bind the feat, means and sds data frames together using the cbind function to create a list of the means and standard deviations for each variable.
- Write this file into tidy_data.csv using the write.csv function
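
A minimal sketch of this summary step, assuming a small made-up `z` with two measurement columns. The column names "Feature", "Mean" and "SD" and the `summary_tbl` name are illustrative choices, not taken from the original script, and the file is written to a temporary directory here rather than the working directory.

```r
# Made-up stand-in for z: numeric measurement columns only.
z <- data.frame(featA = c(0.1, 0.3, 0.5), featB = c(0.2, 0.4, 0.9))
feat <- data.frame(Feature = colnames(z))

# Column-wise mean and standard deviation, each converted to a data frame.
means <- as.data.frame(sapply(z, mean))
sds   <- as.data.frame(sapply(z, sd))

# One row per feature: name, mean, standard deviation.
summary_tbl <- cbind(feat, means, sds)
colnames(summary_tbl) <- c("Feature", "Mean", "SD")

# Written to a temporary directory in this sketch to avoid overwriting files.
out_file <- file.path(tempdir(), "tidy_data.csv")
write.csv(summary_tbl, out_file, row.names = FALSE)
```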
##Creating the second tidy dataset##
- Create the aggdata data frame by aggregating z according to a list containing the Activities and Subject_ID columns of the temp data frame, with the function set to mean.
  - This gives the mean of each feature for every subject and activity.
- Clean up the result by swapping columns 1 and 2 so that Subject_ID comes before Activities.
- Write this file into tidy_data2.csv using the write.csv function
- Clean up the global environment using the rm() function.
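
The aggregation can be sketched as below, assuming a small made-up `temp` with two features. The grouping list is named here so the output columns keep the names Activities and Subject_ID; whether the original script names them this way is an assumption. The file again goes to a temporary directory in this sketch.

```r
# Made-up stand-in for temp: subject, two features, activity.
temp <- data.frame(
  Subject_ID = c(1, 1, 1, 2, 2),
  featA      = c(0.1, 0.3, 0.5, 0.7, 0.9),
  featB      = c(0.2, 0.4, 0.6, 0.8, 1.0),
  Activities = factor(c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"))
)
z <- temp[, c("featA", "featB")]

# Mean of every feature for each activity/subject combination.
aggdata <- aggregate(
  z,
  by  = list(Activities = temp$Activities, Subject_ID = temp$Subject_ID),
  FUN = mean
)

# Swap the first two columns so Subject_ID comes first.
aggdata <- aggdata[, c(2, 1, 3:ncol(aggdata))]

out_file2 <- file.path(tempdir(), "tidy_data2.csv")
write.csv(aggdata, out_file2, row.names = FALSE)

# Remove the intermediate objects that are no longer needed.
rm(z, temp)
```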