GitHub - bpreger/Tidy-data

Tidy Data dataset

#Creating the Tidy Data datasets#

##Prerequisites to analysis##

R version 3.1.0 with RStudio and the default packages
The plyr package
Download the UCI HAR Dataset from this location
Place the UCI HAR Dataset folder into your working directory and rename it to UCI
Modify features.txt so that no variable names are duplicated: append -X to features 303-344, -Y to features 382-423 and -Z to features 461-502 to mark them as measurements on the X, Y and Z axes respectively
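
The manual edit described above could also be scripted. This is only a minimal sketch and is not part of run_analysis.R: the helper name `add_axis_suffix` is hypothetical, and the placeholder names stand in for the real contents of features.txt.

```r
# Hypothetical helper (not from run_analysis.R): append a suffix to the
# feature names at the given positions.
add_axis_suffix <- function(feature_names, idx, suffix) {
  feature_names[idx] <- paste0(feature_names[idx], suffix)
  feature_names
}

# Placeholder names standing in for the second column of features.txt;
# the real file holds 561 descriptive names.
feature_names <- paste0("feature", seq_len(561))

# Index ranges and suffixes as listed in the prerequisites.
feature_names <- add_axis_suffix(feature_names, 303:344, "-X")
feature_names <- add_axis_suffix(feature_names, 382:423, "-Y")
feature_names <- add_axis_suffix(feature_names, 461:502, "-Z")

# anyDuplicated() returns 0 only when every name is unique.
anyDuplicated(feature_names)
```

Running `anyDuplicated()` on the real names after the edit is a cheap way to check whether any repeated names remain.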

##How the script operates##

###Reading the files in###

Create 3 data frames from the training datasets using the read.table function. subj represents the subject number, act represents the activity and meas represents the measurements of each variable

---All are read with header=FALSE since the files have no header row. act is read with stringsAsFactors=TRUE since its values are factors

Change the column name for subj to "Subject_ID", act to "Activities" and meas to "Measurements"

---Change Activities to a factor variable with as.factor, then use the mapvalues function from plyr to replace the numeric codes with the activity names from activity_labels.txt
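
The reading and relabelling steps can be sketched as follows. This is an illustration under assumptions, not the code in run_analysis.R: the inline `text =` values are toy data standing in for the training files, and base R's `factor()` with `levels` and `labels` is swapped in for plyr's `mapvalues` so that the sketch runs without extra packages. Only subj and act are renamed here, since the columns of meas get their names from features.txt in a later step.

```r
# Toy stand-ins for the training files; a real run would pass file paths
# to read.table() instead of text = ...
subj <- read.table(text = "1\n1\n3", header = FALSE)
act  <- read.table(text = "5\n4\n1", header = FALSE)
meas <- read.table(text = "0.28 -0.02\n0.27 -0.01\n0.29 -0.03", header = FALSE)

colnames(subj) <- "Subject_ID"
colnames(act)  <- "Activities"

# Activity names in the order used by activity_labels.txt (codes 1 to 6).
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")

# Base-R substitute for as.factor() followed by plyr::mapvalues():
# convert the numeric codes to a factor and relabel the levels in one call.
act$Activities <- factor(act$Activities, levels = 1:6, labels = activity_names)

act
```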

Create a features list with the read.table function on features.txt, thus creating the feat data frame, then remove the first column.
Set the column names of meas from the feat data frame using colnames(meas)
Bind the columns together using the order subj, meas, act
Repeat the process for the test dataset but change the names of the data frames to subj2, meas2 and act2. features.txt does not have to be re-read.
Use rbind to bind the training and test data together into a composite dataset called temp, then sort it by Subject_ID using the order command.
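
The naming, binding and sorting steps above can be sketched on toy data. The helper `make_set`, the two feature names and all values are assumptions made for illustration; the actual script works on the files in the UCI folder.

```r
# Toy feature table shaped like features.txt: an index column and a name column.
feat <- read.table(text = "1 featA\n2 featB", header = FALSE,
                   stringsAsFactors = FALSE)
feat <- feat[, 2, drop = FALSE]  # remove the first (index) column

# Hypothetical helper: build one dataset (training or test) from its parts.
make_set <- function(ids, acts, values) {
  subj <- data.frame(Subject_ID = ids)
  act  <- data.frame(Activities = factor(acts))
  meas <- as.data.frame(values)
  colnames(meas) <- feat[[1]]   # name the measurement columns from feat
  cbind(subj, meas, act)        # order: subj, meas, act
}

train <- make_set(c(3, 1), c("SITTING", "WALKING"),
                  matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2))
test  <- make_set(c(2), c("LAYING"),
                  matrix(c(0.5, 0.6), nrow = 1))

# Stack the two datasets and sort the rows by subject.
temp <- rbind(train, test)
temp <- temp[order(temp$Subject_ID), ]

temp
```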

##Creating the first tidy dataset##

Create a data frame (z) that contains only the numerical values from temp, columns 3-563.
Get the mean and standard deviation of each variable by applying the mean and sd functions to the columns of z, convert each result to a data frame and save them as the means and sds data frames respectively.
Bind the feat, means and sds data frames together using the cbind command to create a list of the means and standard deviations for each variable
Write this file into tidy_data.csv using the write.csv function
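
A minimal sketch of this summary step on toy data follows. It rests on a few assumptions: the measurement columns are selected by name rather than by the positions 3-563, the output column names `Feature`, `Mean` and `SD` are invented for the example, and the file is written to a temporary directory instead of the working directory.

```r
# Toy stand-in for the composite dataset temp.
temp <- data.frame(
  Subject_ID = c(1, 1, 2, 2),
  Activities = factor(c("WALKING", "SITTING", "WALKING", "SITTING")),
  featA = c(1, 2, 3, 4),
  featB = c(10, 20, 30, 40)
)

# Keep only the measurement columns (selected by name in this sketch).
z <- temp[, setdiff(names(temp), c("Subject_ID", "Activities")), drop = FALSE]

# One mean and one standard deviation per column of z.
means <- as.data.frame(sapply(z, mean))
sds   <- as.data.frame(sapply(z, sd))
feat  <- data.frame(Feature = names(z))

# One row per feature: name, mean, standard deviation.
summary_table <- cbind(feat, means, sds)
colnames(summary_table) <- c("Feature", "Mean", "SD")
rownames(summary_table) <- NULL

# Written to a temporary location here to keep the sketch side-effect free.
out_file <- file.path(tempdir(), "tidy_data.csv")
write.csv(summary_table, out_file, row.names = FALSE)
```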

##Creating the second tidy dataset##

Create the aggdata data frame by aggregating z by a list of the Activities and Subject_ID columns of the temp data frame, with the function set to mean. This gives the mean of each feature for every subject and activity.
Tidy the result by swapping columns 1 and 2 so that the column order makes more sense
Write this file into tidy_data2.csv using the write.csv function
Clean up the global environment using the rm() function.
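
The aggregation described above might look like the following on toy data. This is a sketch under assumptions rather than the script itself: the values and feature names are invented, and the output goes to a temporary directory.

```r
# Toy stand-in for temp; subject 1 has two WALKING rows to average.
temp <- data.frame(
  Subject_ID = c(1, 1, 1, 2),
  Activities = factor(c("WALKING", "WALKING", "SITTING", "WALKING")),
  featA = c(1, 3, 5, 7),
  featB = c(10, 30, 50, 70)
)
z <- temp[, c("featA", "featB")]

# Mean of every feature for each activity and subject combination.
aggdata <- aggregate(
  z,
  by = list(Activities = temp$Activities, Subject_ID = temp$Subject_ID),
  FUN = mean
)

# Swap the first two columns so that Subject_ID comes before Activities.
aggdata <- aggdata[, c(2, 1, 3:ncol(aggdata))]

# Written to a temporary location here to keep the sketch side-effect free.
out_file2 <- file.path(tempdir(), "tidy_data2.csv")
write.csv(aggdata, out_file2, row.names = FALSE)

# Remove intermediate objects that are no longer needed.
rm(temp, z)
```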

