Takes the Excel file output from ddPCR (or qPCR, now obsolete), attaches the sample names, and does calculations using metadata from other Google Sheets.
- Generates processed data to send to HHD (as local .csv files and Google Sheets).
- Also makes neat exploratory plots with appropriate labels and
- Make sure you are on the main branch by typing `git checkout WW_master` in Git Bash (the black window). This is the default branch for regular analysis.
- Bring the branch up to date with the remote branch (in case others made changes to improve the scripts) using `git pull`.
- First-time users: here's a handy guide to simple git commands and setup: git - the simple guide. See happygitwithr to set up git within RStudio for quick access.
- Advanced users: make any experimental changes in a separate side branch and push it to the remote. Then submit a pull request and merge it on GitHub. This ensures there are no conflicts if someone changed the master branch while you were working on your side branch.
- The ddPCR sheet name (in Raw ddpcr) should include the dd.WWxy ID (it's good to add a descriptive title after the ID).
- Sample naming in the 'calculations (lab notebook)/Plate layouts' Google Sheet:
  a. Make sure the WWx or dd.WWx ID matches the ID in the name of the data file.
  b. Set sample names using the automatic labeller to the right of the plate, in the format `Target-Sample category_tube name.replicate number`. Examples: `BCoV-Vaccine_S32.2` or `N1N2-908_W.1`
- The qPCR Excel filename should include the WWxy ID, target name and standard curve ID. Example: `WW66_831_BCoV_Std48`
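The two naming conventions above are regular enough to check in R before running the scripts. The following is a minimal base-R sketch. It assumes that the target contains no `-`, that the sample category contains no `_`, and that the replicate number is the digits after the last `.`. `parse_sample_name()` and `parse_qpcr_filename()` are hypothetical helpers, not functions from these scripts.

```r
# Hypothetical helpers (not part of these scripts) for checking names.

# "Target-Sample category_tube name.replicate number", e.g. "BCoV-Vaccine_S32.2"
# Assumes: no "-" in the target, no "_" in the category, digits after the last "."
parse_sample_name <- function(x) {
  pattern <- "^([^-]+)-([^_]+)_(.+)\\.([0-9]+)$"
  ok <- grepl(pattern, x)
  if (!all(ok)) {
    warning("Unexpected sample name(s): ", paste(x[!ok], collapse = ", "))
  }
  y <- ifelse(ok, x, NA_character_)  # non-matching names become NA below
  data.frame(
    sample_name = x,
    target      = sub(pattern, "\\1", y),
    category    = sub(pattern, "\\2", y),
    tube        = sub(pattern, "\\3", y),
    replicate   = as.integer(sub(pattern, "\\4", y)),
    stringsAsFactors = FALSE
  )
}

# qPCR file names like "WW66_831_BCoV_Std48"
# Assumes the target is the token just before the trailing "Std<number>"
parse_qpcr_filename <- function(f) {
  f <- tools::file_path_sans_ext(basename(f))
  list(
    run_id    = regmatches(f, regexpr("WW[0-9]+", f)),
    target    = sub(".*_([^_]+)_Std[0-9]+$", "\\1", f),
    std_curve = regmatches(f, regexpr("Std[0-9]+$", f))
  )
}

parse_sample_name(c("BCoV-Vaccine_S32.2", "N1N2-908_W.1"))
parse_qpcr_filename("WW66_831 Rice_BCoV_Std45.xlsx")
```

If a name does not fit the pattern, the helper returns NA and raises a warning. This makes typos in the plate layout easy to spot.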
- Open the .eds file in QuantStudio (Applied Biosystems).
- Check the amplification curves to see if everything looks right:
  a. Check for systematic amplification in the no-template controls (NTC), or for stochastic amplification that seems problematic, by looking at the raw data (multicomponent plots).
- If using standard curves across runs, check that the threshold is at the desired value for each target (select each target in turn). We used to set thresholds to 0.04, but are not doing this right now since each plate has its own standard.
- Export the Excel (.xls) file from QuantStudio with the same name as the qPCR file, including the serial number of the standard curve. Example: `WW66_831 Rice_BCoV_Std45.xlsx`
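For context on what a standard curve does: the Ct values of known dilutions are regressed on log10(copies), and unknown samples are read back off that line. The sketch below is a generic textbook version in base R with made-up Ct values. It is not the implementation inside `process_qpcr()`, and `ct_to_copies()` is a hypothetical name.

```r
# Made-up standard: 10-fold dilutions (copies per reaction) and their Ct values
std <- data.frame(copies = 10^(5:1), ct = c(20.1, 23.5, 26.8, 30.2, 33.6))

# Linear fit of Ct against log10(copies)
fit <- lm(ct ~ log10(copies), data = std)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Amplification efficiency; a slope of about -3.32 corresponds to ~100%
efficiency <- 10^(-1 / slope) - 1

# Invert the fit to estimate copies for an unknown sample's Ct (hypothetical helper)
ct_to_copies <- function(ct) 10^((ct - intercept) / slope)

round(c(slope = slope, efficiency = efficiency), 3)
ct_to_copies(28)
```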
- Once the droplet reader run is done, open the data file in QuantaSoft or QuantaSoft Analysis Pro.
- Ignore the automatic thresholding (each well gets a different threshold) and set a manual threshold that optimally separates the positive and negative clusters in the 2D view (aim for one cluster in each quadrant, without the thresholds cutting through any clusters). You can use the positive/negative controls for guidance.
- Check all the thresholds in the 1D view by selecting one column at a time, and correct any thresholds that are set too low or too high.
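A well-placed threshold matters because of how the concentration is calculated. Once droplets are called positive or negative, the concentration is estimated from the positive fraction with a Poisson correction (the ddPCR software does this for you). The following base-R sketch is for intuition only. `ddpcr_copies_per_ul()` is hypothetical, and the default droplet volume of 0.85 nL is an assumption to check against your instrument.

```r
# Hypothetical helper: ddPCR concentration from droplet amplitudes and a threshold
# droplet_vol_nl = 0.85 is an assumed droplet volume; check your instrument's value
ddpcr_copies_per_ul <- function(amplitudes, threshold, droplet_vol_nl = 0.85) {
  p <- mean(amplitudes > threshold)  # fraction of positive droplets
  lambda <- -log(1 - p)              # Poisson: mean copies per droplet
  lambda / (droplet_vol_nl / 1000)   # copies per uL of reaction (1 uL = 1000 nL)
}

# Toy well: 8,000 negative droplets and a spread-out positive cluster of 2,000
amps <- c(rep(1000, 8000), rep(c(4500, 5500), each = 1000))

ddpcr_copies_per_ul(amps, threshold = 3000)  # between the clusters: about 262.5
ddpcr_copies_per_ul(amps, threshold = 5000)  # cuts through the positives: undercounts
```

If every droplet is positive, `-log(0)` is `Inf`. This means the well is saturated and the sample needs diluting.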
- Check that all the samples are entered in the Sample registry/concentrated samples sheet and that they match the sample names entered in the template file in calculations (lab notebook) Cov2/Plate layouts.
  a. Check that the DI water sample has a unique name that doesn't match previous weeks. Name it something like `0304 DI1`.
- Check that these columns are filled in: Biobot/other ID and WW volume filtered (ml). If samples are being spiked, these columns should also be filled in: Stock ID of Spike and Total WW volume received (ml).
- Biobot_ID should be updated with any new manhole samples in the sheet All manhole.
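These checks can be scripted once the registry sheet is loaded into R as a data frame (for example with `googlesheets4::read_sheet()`; how you load it is an assumption). The following is a minimal sketch using two helpers:

- `check_registry()` returns the rows whose required columns are blank.
- `find_reused_names()` flags sample names, such as the DI water sample, that were already used in earlier weeks.

Both helpers are hypothetical. The column names are copied from the list above.

```r
# Hypothetical checks on the Sample registry/concentrated samples sheet
check_registry <- function(registry, spiked = FALSE) {
  required <- c("Biobot/other ID", "WW volume filtered (ml)")
  if (spiked) {
    required <- c(required, "Stock ID of Spike", "Total WW volume received (ml)")
  }
  absent <- setdiff(required, names(registry))
  if (length(absent) > 0) stop("Missing columns: ", paste(absent, collapse = ", "))
  # TRUE for rows where any required column is NA or blank
  blank <- Reduce(`|`, lapply(required, function(col) {
    v <- registry[[col]]
    is.na(v) | trimws(as.character(v)) == ""
  }))
  registry[blank, , drop = FALSE]
}

# Names (e.g. this week's DI water sample) already used in earlier weeks
find_reused_names <- function(new_names, previous_names) {
  new_names[new_names %in% previous_names]
}

# Toy registry: the second sample is missing its Biobot/other ID
reg <- data.frame(
  `Biobot/other ID` = c("B01", NA),
  `WW volume filtered (ml)` = c(50, 50),
  check.names = FALSE
)
check_registry(reg)
find_reused_names(c("0311 DI1", "0304 DI1"), previous_names = c("0225 DI1", "0304 DI1"))
```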
If you are running this set of scripts for the first time, all essential directories should already be in place from git. If you run additional functions, make sure you mimic the directory structure listed below. These directories should exist in the folder that contains the R project file and all the scripts.
- qPCR analysis/Standards: only needed if running qPCR files with standards on them (for saving plots of the standard curves).
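If a folder is missing, it can be created from R. The following is a minimal sketch with a hypothetical `ensure_dirs()` helper. Run it from the project root (the folder holding the .Rproj file), which is what `root = "."` assumes.

```r
# Hypothetical helper: create any missing folders below the project root
ensure_dirs <- function(dirs, root = ".") {
  paths <- file.path(root, dirs)
  for (p in paths[!dir.exists(paths)]) {
    dir.create(p, recursive = TRUE)
    message("Created: ", p)
  }
  invisible(paths)
}

# In the project (working directory = project root):
# ensure_dirs("qPCR analysis/Standards")

# Demo in a temporary folder
demo_root <- tempfile("project_")
ensure_dirs("qPCR analysis/Standards", root = demo_root)
dir.exists(file.path(demo_root, "qPCR analysis/Standards"))
```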
Source data and metadata are in Google Sheets; ask Prashant/Camille Mccall for access. The URLs for all the spreadsheets are in the `sheeturls` variable in 0-general_functions_main.R.
- Open the R project file COVID-qPCR work in RStudio. This opens RStudio in the project directory; all subpaths are relative to it, which lets the scripts run properly.
- File name and other inputs:
  - Go to the Google Sheet User inputs, ...for R code
  - Duplicate the template sheet and rename it after the desired final output file name. Example: 0304 Liftstations
  - Drag it to the leftmost position.
- Run (source) the 2-calculations_multiple_runs file, by clicking the save button at the top left (if Source on Save is ticked) or the Source button at the top right.
- Authenticate Google Drive and follow the other prompts. The outputs are then saved to various Google Sheets and local CSV files. The data are saved in 3 different locations, each with the same name as title_name:
  a. qPCR complete data: all data, including controls
  b. WWTP and manhole data for HHD: only the 38 WWTPs
  c. WWTP and manhole data for HHD: non-WWTP samples (Ex: lift stations, congregate facilities, manholes, Bayou etc.), saved to a sheet with manhole in the name
  d. Plots are saved to an HTML file in the qPCR analysis folder
--- Exceptions ---
- If you have any qPCR data, run this command after sourcing 1-processing_functions.R and before running 2-calculations_multiple_runs.R:
  `list_quant_data <- map(read_these_sheets, process_all_pcr)`
  Then comment out the call to `list_quant_data <- map(read_these_sheets ...` in the 2-calcs file. This processes the standard curve, qPCR data and ddPCR data for each file automatically (according to the file name). It then dumps the data for each file, with the appropriate sample labels, into the 'qPCR data dump' Google Sheet. There might be errors that need troubleshooting, since this has not been extensively tested.
- If you have a single file to run, you can call the function directly instead of using `map`:
  `process_all_pcr('WW66_831_BCoV_Std48')`
- If you already ran the standard curve and only need to process the qPCR data, use this instead:
  `process_qpcr('WW66_831_BCoV_Std48')`
- If the qPCR plate does not have a standard curve on it and you want to use an older standard curve, run something like:
  `process_qpcr('WW66_831_BCoV_Std48', std_override = 'Std21')`
- You may be processing Baylor data, whose name/volume information is emailed to us and saved in a local Excel sheet. In that case, pass a regular expression giving the well locations of the Baylor samples to the `baylor_wells` input of the functions. Example: if the Baylor samples are in wells A1 to D11, run:
  `process_all_pcr('WW66_831_BCoV_Std48', baylor_wells = '[A-D]([1-9]$|10|11)')`
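Before passing a pattern to `baylor_wells`, it can help to check which wells it actually selects. The following is a base-R sketch. It assumes a 96-well plate with wells named without zero padding (`A1` ... `H12`), as in the example above.

```r
# All wells of a 96-well plate, named "A1" ... "H12" (assumed, no zero padding)
wells <- paste0(rep(LETTERS[1:8], each = 12), rep(1:12, times = 8))

baylor_wells <- '[A-D]([1-9]$|10|11)'   # pattern from the example above
selected <- wells[grepl(baylor_wells, wells)]

length(selected)                            # rows A-D x columns 1-11
setdiff(c("A12", "D11", "E1"), selected)    # wells outside the selection
grepl(baylor_wells, "A01")                  # zero-padded names would not match
```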
In case you see any errors:
- Look for the Show Traceback option next to the error (not always available), figure out the line number in the error and trace back from there.
- Do extensive Google searches. That's how I learnt coding (PK).
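When many files are processed at once, one failing file stops the whole `map()` call. The following minimal base-R sketch records the error for each file instead. You then know which file to rerun on its own and inspect with `traceback()`.

`run_batch()` and `fake_process()` are hypothetical. `fake_process()` stands in for `process_all_pcr()`, and the second file name is made up. purrr's `safely()` implements the same idea.

```r
# Hypothetical stand-in for process_all_pcr(): fails on purpose for one file
fake_process <- function(file_id) {
  if (grepl("Std99", file_id)) stop("standard curve not found")
  paste("processed", file_id)
}

# Process each file, keeping errors as results instead of stopping the batch
run_batch <- function(file_ids, fun) {
  results <- lapply(file_ids, function(id) {
    tryCatch(fun(id), error = function(e) e)
  })
  names(results) <- file_ids
  failed <- vapply(results, inherits, logical(1), what = "error")
  if (any(failed)) {
    message("Failed: ", paste(file_ids[failed], collapse = ", "))
  }
  results
}

out <- run_batch(c("WW66_831_BCoV_Std48", "WW67_901_N1N2_Std99"), fake_process)
conditionMessage(out[["WW67_901_N1N2_Std99"]])
```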
- Run 3-Weekly_comparisons.R with the names of all the weeks to include in the analysis.
- Make sure each week's samples have already been processed and saved in the complete qPCR data sheet.
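You can confirm the second point quickly before running the weekly comparison. This is a sketch only. `missing_weeks()` is hypothetical, and the `Week` column name and week labels are placeholders for however the complete qPCR data identifies each week.

```r
# Hypothetical pre-check: which requested weeks are absent from the complete data?
# week_col = "Week" and the labels below are placeholders, not the real columns
missing_weeks <- function(weeks, complete_data, week_col = "Week") {
  setdiff(weeks, unique(complete_data[[week_col]]))
}

# Toy complete data containing two weeks
complete_data <- data.frame(Week = c("0225 WWTP", "0304 WWTP", "0304 WWTP"),
                            stringsAsFactors = FALSE)
missing_weeks(c("0225 WWTP", "0304 WWTP", "0311 WWTP"), complete_data)
```

Any week returned here still needs to be processed with 2-calculations_multiple_runs.R first.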
If you are looking to run subsets of these scripts to plot specific data (sub)sets, please familiarize yourself with the code organization, how the plotting functions are called, and their default values.
- The main functions are written in these scripts, divided up by their tasks:
  - 0-general_functions_main.R: essential. Loads all packages and all the smaller functions (see point 2 below)
  - 1-processing_functions.R: first step in processing ddPCR/qPCR data (called by the 2-calc.. script)
  - 2-calculations_multiple_runs.R: main function for ddPCR data processing. This is the only script you should run for processing raw data to hand off to HHD (single runs or small sets of runs done in a day/week)
  - 3-weekly_comparisions.R: run this for comparing data across ddPCR runs/weeks etc. as a meta-analysis with plotting
  - 2.1-make_html_plots.Rmd: plots multiple graphs into an HTML file for a single run/sets of runs (called by the 2-calc.. script)
  - 3.1-weekly_comparison plots.Rmd: plots multiple graphs into an HTML file for the weekly comparisons (called by the 3-weekly.. script)
- Minor tasks (lots of them) are modularized into functions and written in multiple .R files in the scripts_general functions folder.
- You may be looking for a specific function called by the program (for debugging specific errors and such) and be unsure which general_function script it is in. If so, open Git Bash (or Terminal on Linux/Mac) and type `grep -rn "plot_mean_sd" --include="*.R" .`, replacing plot_mean_sd with the function/code snippet you are looking for. This searches all the .R files in the current folder and its subfolders (including scripts_general functions) for this specific text.
- A couple of custom plotting functions using ggplot2 do most of the plotting in the .Rmd scripts: g.8-plot_mean_sd_jitter and g.9-plot_scatter. Their main advantage is that they have defaults for frequently used kinds of plots. You can therefore make many plots with slight variations in the x-axis variable (x_var), y-axis variable (y_var), colour variable (colour_var), etc., quite easily and with less repetitive code. Definitely look through the argument list of the plotting functions before using them, or when troubleshooting unexpected plotting errors.
- If you have a ready-made CSV file output by the script with the data you want to plot, just source 0-general_functions_main.R and read the file with `read.csv('path_to_file', header = TRUE)` (or `readr::read_csv('path_to_file', col_names = TRUE)`). Then you can get right into the plotting using the functions mentioned above.
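If Git Bash is not at hand, the same search can be done from the RStudio console. The following is a minimal base-R sketch with a hypothetical `find_in_scripts()` helper. It searches .R and .Rmd files recursively, including the scripts_general functions folder.

```r
# Hypothetical helper: search .R and .Rmd files (recursively) for a piece of text
find_in_scripts <- function(text, dir = ".") {
  files <- list.files(dir, pattern = "\\.(R|Rmd)$", recursive = TRUE,
                      full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(text, lines, fixed = TRUE)   # literal match, like grep -F
    if (length(idx) == 0) return(NULL)
    data.frame(file = f, line = idx, code = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)   # NULL if nothing was found
}

# From the project root:
# find_in_scripts("plot_mean_sd")
```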
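The following is a minimal sketch of loading such a CSV and checking it before plotting. The file here is toy data written to a temporary path, and the `Target`/`Copies` column names are hypothetical; replace them with the columns in your file.

```r
# Toy data written to a temporary CSV (column names are hypothetical)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Target = c("N1", "N1", "N2", "N2"),
                     Copies = c(10, 14, 20, 30)),
          csv_path, row.names = FALSE)

# Base R uses header = TRUE; readr::read_csv() calls this col_names = TRUE
dat <- read.csv(csv_path, header = TRUE)
str(dat)   # check column names and types before plotting

# Per-target mean and standard deviation as a quick numeric check
means <- tapply(dat$Copies, dat$Target, mean)
sds   <- tapply(dat$Copies, dat$Target, sd)
round(cbind(mean = means, sd = sds), 2)
```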