This repository contains a standard template for organising analytical workflows written in R. What follows are instructions for starting your workflow with this template, followed by a skeleton for your project README.
Creating a new repository for your workflow
To use this template for a new workflow, create a new repository on GitHub using the present repository (Workflow-template-R) as the template. See the GitHub docs for instructions on how to do this. Note: you don't need to include all branches when creating the new repository from the template.
Before starting your workflow
- Decide whether to use renv for package management. This is strongly recommended because it greatly enhances reproducibility. However, if you don't want to use it, then delete the following:
  - .Rprofile
  - renv/
  - renv.lock
- Decide whether to use the run.R file for running the whole pipeline. Delete it if you are not using it; otherwise, edit it and keep it up to date going forward to ensure it runs the correct scripts.
- Read and delete the read-and-delete-me.txt files in the sub-folders:
  - data/raw/
  - data/derived/
  - _experimental/
  - R/
  - scripts/
  - models/
  - outputs/
- Decide which of these folders, if any, you won't be using and delete them as appropriate.
- Update this README to reflect your workflow: change the title and fill out the sections below to the extent you're currently able to.
- Finally, once you're happy with the project layout and README contents, delete this quoted section of text and start implementing your workflow (remember to keep this README up to date as you go). Good luck with your analysis! 🚀
A brief summary of the project, its purpose, and the biological questions it aims to address.
List of any software, packages, or tools needed to run the analysis, e.g., R version, specific R packages and versions, command line tools.
Note: If using renv, then renv.lock contains a spec of the R version and R packages needed for the workflow (with versions), so in this case you only need to specify any other tools that your project uses.
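If you are not using renv, one way to collect the version information for this section is to query it from an R session. The following is a minimal sketch; the package names in `pkgs` are placeholders (assumptions, not part of this template) and should be replaced with the packages your workflow actually uses.

```r
# Placeholder package names: replace with the packages your workflow loads.
pkgs <- c("stats", "utils")

# Look up the installed version of each package as a character string.
versions <- vapply(
  pkgs,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

# Print the R version followed by one "package version" line per package.
cat(R.version.string, "\n")
cat(paste(names(versions), versions), sep = "\n")
```

The printed lines can be pasted into this section as they are. `sessionInfo()` gives a fuller report if you prefer to record everything attached in the current session.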
Instructions on how to set up the environment and install any necessary dependencies.
For example, if using renv, include steps for restoring the environment:
From the root folder of this project, open an R session and run the following:
# Install renv, if needed
install.packages("renv")
# Restore the environment
renv::restore()
Explanation of the raw and processed data used in the project, where it comes from, and any important notes about it (e.g., data format, sources). Give enough detail so that someone could reproduce the results (in principle).
A brief overview of the main scripts in the project and what each one does.
Example:
The main workflow scripts can be found in the scripts/ folder:
- 01_import_data.R: Loads raw data into the workspace.
- 02_clean_data.R: Processes and cleans the raw data.
- 03_analyse_data.R: Performs the main data analysis.
- 04_plot_results.R: Creates visualisations based on the analysis.
Step-by-step instructions on how to execute the analysis.
If your project doesn't have a run.R script, then give step-by-step instructions on how to execute the analysis, typically referencing the scripts in the correct order.
If the pipeline can be run completely using a run.R script, then something like the following suffices:
Having installed the requirements listed above, to execute the pipeline use the run.R script:
- (From the command line e.g. Bash) Navigate to the root folder of this project and run:
Rscript run.R
- (From within an R / RStudio session) Ensure your current working directory is the root folder of this project and then source the run.R script.
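For reference, a run.R script could look something like the sketch below. It is only an illustration, not the contents of the run.R shipped with this template: the helper name `run_pipeline` is hypothetical, and it assumes your workflow scripts live in scripts/ and carry numeric prefixes (01_, 02_, ...) that define the execution order.

```r
# Hypothetical helper: source every numbered script in a folder, in order.
run_pipeline <- function(scripts_dir = "scripts") {
  # Match files such as 01_import_data.R; sort so the prefixes set the order.
  scripts <- sort(list.files(
    scripts_dir,
    pattern = "^[0-9]+_.*\\.R$",
    full.names = TRUE
  ))

  for (script in scripts) {
    message("Running ", basename(script))
    # source() evaluates in the global environment by default, so objects
    # created by one script are visible to the scripts that follow it.
    source(script)
  }

  # Return the paths of the scripts that were run, without printing them.
  invisible(scripts)
}
```

Ending run.R with a call to `run_pipeline()` would then execute the whole pipeline from the project root. If you would rather control the order explicitly, a plain sequence of `source("scripts/01_import_data.R")` style calls achieves the same thing and is easier to edit by hand.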
Description of the output files generated by the project (e.g., figures, tables) and where they are saved.
List any papers, books, or resources used in the project, including methods or data sources.
You may also wish to add a brief section on any planned improvements, additional analyses, or next steps for the project.