A template for organizing a large R project that requires many custom functions and thorough documentation. Much of the layout is adapted from the following:
- Guide to small project layout: https://nicercode.github.io/blog/2013-04-05-projects/
- Guide to using R packages for reproducible research: https://peerj.com/preprints/3192/
-
/R All custom function definitions go here. Group related functions in the same script file so the folder isn't overflowing with files, but split them across enough files that each function is easy to find based on what it does. Just as important, include a data.R file containing roxygen documentation that thoroughly describes the clean data.
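As a rough sketch of what data.R might contain, the roxygen block below documents a data set. The data set name `clean_data`, its columns, and their descriptions are hypothetical placeholders, not part of this template.

```r
# R/data.R
# Hypothetical example: `clean_data` and its columns are placeholders.

#' Cleaned study data
#'
#' One row per observation, produced from the raw files in data-raw.
#'
#' @format A data frame with three variables:
#' \describe{
#'   \item{site}{Character. Site identifier.}
#'   \item{date}{Date. Day the observation was recorded.}
#'   \item{count}{Integer. Number of individuals observed.}
#' }
#' @source Raw spreadsheet stored in data-raw.
"clean_data"
```

The quoted name on the last line tells roxygen which data set the block describes.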
-
/final_analysis
- /scripts This is where the functions defined in /R get used to clean and analyze the data. Use a clear naming scheme and/or number the scripts to define the order in which they should be run.
- /output This is where all cumbersome tables, plots, or simulation output goes. Everything in this folder can be regenerated, but it may be a pain to do so.
- /reports This is where a technical .Rmd that walks the reader through the statistical analysis and the choices made is stored, along with the final report and reports from previous years.
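One way to make the run order of numbered scripts explicit is a small helper that sources them in filename order. This is only a sketch: the function name `run_scripts` and the zero-padded prefixes (for example `01_clean.R`, `02_model.R`) are assumptions, not something this template prescribes.

```r
# Run every .R script in a folder in filename order.
# Assumes scripts carry zero-padded numeric prefixes, e.g. 01_clean.R.
run_scripts <- function(dir = "final_analysis/scripts") {
  scripts <- sort(list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    # Each script runs in its own environment, so scripts must hand
    # results to each other through files rather than shared objects.
    source(script, local = new.env())
  }
  invisible(scripts)
}
```

Running each script in a fresh environment is a design choice here; dropping the `local` argument would let scripts share objects through the global environment instead.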
-
/data-raw This is where the raw data goes, and it should never be modified. The contents of this folder should be provided by whoever manually entered the data into a spreadsheet. Ideally, it should also contain a plain-text metadata file that fully describes the data.
-
/data This is where the clean data that users should have immediate access to after loading the package goes. Save it as an .rda file. Use usethis::use_data(x) (formerly devtools::use_data(x)) to save data into this folder correctly.
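A minimal sketch of how raw data might become the saved clean data follows. The function name `make_clean_data`, the file name `raw_data.csv`, and the cleaning step (dropping incomplete rows) are all hypothetical. Note that `use_data()` now lives in the usethis package; older devtools releases exported it as `devtools::use_data()`.

```r
# Hypothetical cleaning step: file name, function name, and the rule
# for what counts as "clean" are placeholders for illustration.
make_clean_data <- function(raw_path) {
  raw <- read.csv(raw_path, stringsAsFactors = FALSE)
  # Drop rows that contain any missing value
  clean <- raw[stats::complete.cases(raw), , drop = FALSE]
  rownames(clean) <- NULL
  clean
}

# Inside the package project, with usethis installed, one would then run:
# clean_data <- make_clean_data("data-raw/raw_data.csv")
# usethis::use_data(clean_data, overwrite = TRUE)  # writes data/clean_data.rda
```

Keeping the cleaning logic in a function means it can live in /R and be documented like any other function, while the two commented lines belong in a script.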
-
/man (stands for manual) This is where the documentation for all the functions in /R goes. The contents of this folder are generated automatically by roxygen and should not be edited by hand. Every function defined in /R should have roxygen documentation.
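For illustration, a documented function in /R could look like the sketch below. The file name and the function `standard_error` are hypothetical examples, not part of this template.

```r
# R/summaries.R (hypothetical file and function names)

#' Standard error of the mean
#'
#' @param x A numeric vector.
#' @param na.rm Logical. Should missing values be removed first?
#' @return A single numeric value.
#' @examples
#' standard_error(c(2, 4, 6))
#' @export
standard_error <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  stats::sd(x) / sqrt(length(x))
}
```

Running `devtools::document()` from the project root would then regenerate the corresponding .Rd file in /man.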
-
/op_plan Structured like the /final_analysis folder, but for everything that was done before the data were collected.
For the sake of longevity, some important documents should not depend on a particular programming language, program version, or document type. These documents should be:
- A .pdf of the op plan in /op_plan/reports
- The final project report saved as a .pdf in /final_analysis/reports
- The raw data saved as a .csv file in /data-raw
- A README-style metadata .txt document in /data-raw
- Technical document(s) guiding the reader through the statistical analysis, saved as a .pdf in the appropriate reports folder
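The checklist above could be verified with a small helper like the following sketch. The function name `check_archive` is hypothetical, and matching files by extension alone is an assumption: it confirms that a file of the right type exists in each folder, not that it is the right document.

```r
# Report which language-independent archive files appear to be missing.
# Folder names follow this README; matching by extension is an assumption.
check_archive <- function(root = ".") {
  required <- list(
    "op_plan/reports"        = "\\.pdf$",
    "final_analysis/reports" = "\\.pdf$",
    "data-raw"               = c("\\.csv$", "\\.txt$")
  )
  problems <- character(0)
  for (folder in names(required)) {
    for (pattern in required[[folder]]) {
      found <- list.files(file.path(root, folder), pattern = pattern,
                          ignore.case = TRUE)
      if (length(found) == 0) {
        problems <- c(problems,
                      paste0(folder, ": no file matching ", pattern))
      }
    }
  }
  problems
}
```

An empty result means every folder holds at least one file of each expected type.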