PerryRichardson/Microclimate-data-pipeline

Microclimate Logger Data Cleaning & QA Pipeline (R)

A reproducible R pipeline to organise, convert, combine, clean, trim, and visualise temperature logger data collected across repeated field surveys. The workflow produces clearly logged steps, paginated QA plots, and cleaned datasets ready for downstream analysis.

NOTE: Run this pipeline from inside an R project folder, so that relative paths such as data/raw/ resolve from the project root.

Highlights

  • Move humidity .txt files into a clean structure (with audit logs)
  • Convert raw logger .txt → .csv per survey
  • Parse and combine per-survey data into tidy tables
  • Stack all surveys into one combined dataset
  • Clean labels, fix logger IDs, remove known bad entries
  • Visualise full/trimmed/filtered series for QA (PDFs)
  • Apply targeted trims/NA patches for problematic loggers
  • Export final cleaned datasets + human-readable trim summary

Script Overview

1) 01_env_pipeline_part1_move_convert.R

  • Moves humidity .txt files to a clean folder structure (skips empty files).
  • Converts logger .txt → .csv per survey folder.
  • Writes a timestamped pipeline log for full auditability.
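
The move step above can be sketched in base R as follows. This is a minimal illustration, not the script's actual code: the function name `move_humidity_files`, its arguments, and the log line layout are hypothetical, and only the `\d+H` filename pattern and the skip-empty rule come from this README.

```r
# Hypothetical helper: move non-empty humidity .txt files and log each action.
move_humidity_files <- function(src_dir, dest_dir, log_file) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(src_dir, pattern = "\\.txt$", full.names = TRUE,
                      recursive = TRUE)
  # Humidity files are assumed to have names containing digits followed by "H".
  files <- files[grepl("\\d+H", basename(files), perl = TRUE)]

  moved <- character(0)
  for (f in files) {
    stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
    if (file.size(f) == 0) {
      # Empty files are left in place and recorded as a warning.
      cat(sprintf("[%s] WARNING skipped empty file: %s\n", stamp, basename(f)),
          file = log_file, append = TRUE)
      next
    }
    target <- file.path(dest_dir, basename(f))
    if (file.rename(f, target)) {
      moved <- c(moved, target)
      cat(sprintf("[%s] SUCCESS moved: %s\n", stamp, basename(f)),
          file = log_file, append = TRUE)
    } else {
      cat(sprintf("[%s] ERROR could not move: %s\n", stamp, basename(f)),
          file = log_file, append = TRUE)
    }
  }
  invisible(moved)
}
```

Note that `file.rename()` can fail across file systems; a copy-then-delete fallback may be needed in that case.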

2) 02_pipeline_part2_parse_combine_per_survey.R

  • Parses per-survey CSVs into tidy columns: Timestamp, Temperature_C, etc.
  • Infers and standardises Survey, Location (Farm→WGV, etc.), and Logger_Number.
  • Writes one combined CSV per survey and logs successes/warnings.
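
As an illustration of the label standardisation described above, a lookup-based sketch might look like this. The names `location_lookup` and `standardise_location` are hypothetical. Farm→WGV is the only mapping stated in this README, so any further entries would need to come from the script itself.

```r
# Hypothetical lookup table: raw label -> standard label.
# Only Farm -> WGV is documented here; extend as required.
location_lookup <- c(Farm = "WGV")

standardise_location <- function(x, lookup = location_lookup) {
  x <- trimws(x)                    # drop stray whitespace from parsed labels
  hit <- x %in% names(lookup)       # labels that have a known replacement
  x[hit] <- unname(lookup[x[hit]])  # replace them, leave everything else as-is
  x
}
```

Labels without an entry in the lookup pass through unchanged, which keeps unexpected values visible for later QA instead of silently dropping them.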

3) 03_combine_all_surveys.R

  • Validates required columns across survey CSVs.
  • Binds all surveys into a single master dataset with a source column.
  • Exports Full_combined_25M_data.csv and logs the process.
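
The validate-then-bind step can be sketched as below. This is an assumption-laden outline rather than the script's code: `combine_surveys` and `required_cols` are hypothetical names, the required column set is inferred from the columns named in this README, and the provenance column is assumed to be called `source`.

```r
# Columns assumed to be required, based on the names used in this README.
required_cols <- c("Timestamp", "Temperature_C", "Survey",
                   "Location", "Logger_Number")

combine_surveys <- function(csv_paths, required = required_cols) {
  tables <- lapply(csv_paths, function(path) {
    df <- read.csv(path, stringsAsFactors = FALSE)
    missing <- setdiff(required, names(df))
    if (length(missing) > 0) {
      # Fail early so a malformed survey file never reaches the master dataset.
      stop(sprintf("%s is missing columns: %s",
                   basename(path), paste(missing, collapse = ", ")))
    }
    df <- df[, required, drop = FALSE]
    df$source <- basename(path)  # record which file each row came from
    df
  })
  do.call(rbind, tables)
}
```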

4a) 04_cleaning_step1_standardise.R

  • Adds Year, Month, Date from Timestamp.
  • Removes _error and known-bad location labels; fixes common name variants.
  • Exports Clean_1_combined_25M_data.csv with a detailed cleaning log.
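
A minimal sketch of this first cleaning step is shown below. It assumes that `Timestamp` is stored as `YYYY-MM-DD HH:MM:SS` text and that the `_error` marker appears in the `Location` column; neither detail is confirmed by this README. The function name `add_time_fields_and_filter` is hypothetical.

```r
add_time_fields_and_filter <- function(df) {
  # Assumed timestamp layout; adjust `format` to match the real logger export.
  ts <- as.POSIXct(df$Timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df$Year  <- as.integer(format(ts, "%Y"))
  df$Month <- as.integer(format(ts, "%m"))
  df$Date  <- as.Date(ts, tz = "UTC")

  # Drop rows whose location label carries the "_error" marker.
  keep <- !grepl("_error", df$Location, fixed = TRUE)
  df[keep, , drop = FALSE]
}
```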

4b) 04_cleaning_step2_trim_filter_visualise.R

  • Generates QA plots (full series) and “edge-trimmed” plots (drop first/last row per logger).
  • Filters a short list of known faulty loggers.
  • Produces hourly mean plots and saves paginated QA PDFs.
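
The edge trim and hourly means can be illustrated in base R as follows. This is a sketch under assumptions: a logger series is taken to be identified by `Survey` plus `Logger_Number`, timestamps are assumed to sort correctly (ISO-style text or date-time values), and the function names `edge_trim` and `hourly_mean` are hypothetical.

```r
# Drop the first and last reading of every logger series.
edge_trim <- function(df, keys = c("Survey", "Logger_Number")) {
  df <- df[do.call(order, c(unname(as.list(df[keys])), list(df$Timestamp))), ]
  group <- interaction(df[keys], drop = TRUE)
  pos <- ave(seq_len(nrow(df)), group, FUN = seq_along)  # row position in series
  n   <- ave(seq_len(nrow(df)), group, FUN = length)     # series length
  df[pos > 1 & pos < n, , drop = FALSE]
}

# Mean temperature per logger series and clock hour.
hourly_mean <- function(df) {
  ts <- as.POSIXct(df$Timestamp, tz = "UTC")
  df$Hour <- format(ts, "%Y-%m-%d %H:00")
  aggregate(Temperature_C ~ Survey + Logger_Number + Hour, data = df, FUN = mean)
}
```

A series with two or fewer readings is removed entirely by `edge_trim`, which may or may not match the behaviour wanted for very short deployments.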

5) 05_targeted_trimming_and_export.R

  • Applies targeted trims/NA patches for specific loggers and date ranges.
  • Merges the adjusted series back into the dataset and re-plots for QA.
  • Exports final Clean_2_combined_25M_data.csv, QA PDF, and Logger_Trim_Summary.csv.
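
A targeted NA patch of the kind described above can be sketched like this. The function name `patch_na`, its arguments, and the summary column names are hypothetical; the actual loggers, date ranges, and layout of Logger_Trim_Summary.csv are defined in the script, not here.

```r
# Set readings to NA for one logger within a date-time window and
# return both the patched data and a one-row summary of what changed.
patch_na <- function(df, logger, start, end) {
  ts <- as.POSIXct(df$Timestamp, tz = "UTC")
  hit <- df$Logger_Number == logger &
    ts >= as.POSIXct(start, tz = "UTC") &
    ts <= as.POSIXct(end, tz = "UTC")
  df$Temperature_C[hit] <- NA_real_

  summary_row <- data.frame(Logger_Number = logger, Start = start, End = end,
                            Rows_set_NA = sum(hit), stringsAsFactors = FALSE)
  list(data = df, summary = summary_row)
}
```

Patching values to NA keeps the timestamps in place, so gaps stay visible in QA plots; rows can instead be dropped outright when a trim is wanted.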

End-to-End Flow

  1. Move + convert raw files → CSVs with logs.
  2. Parse & combine per survey → tidy per-survey CSVs.
  3. Stack all surveys → master Full_combined_25M_data.csv.
  4. Clean labels/time fields → Clean_1_combined_25M_data.csv.
  5. QA plots, edge-trim, filter bad loggers.
  6. Targeted trims/NA patches → final Clean_2_combined_25M_data.csv + summary.
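
The flow above could be driven from a single runner along these lines. This is only a sketch: the script file names are taken from the Script Overview, but `run_pipeline` is a hypothetical helper, and it assumes all scripts sit in one directory and can be sourced without interactive input.

```r
pipeline_scripts <- c(
  "01_env_pipeline_part1_move_convert.R",
  "02_pipeline_part2_parse_combine_per_survey.R",
  "03_combine_all_surveys.R",
  "04_cleaning_step1_standardise.R",
  "04_cleaning_step2_trim_filter_visualise.R",
  "05_targeted_trimming_and_export.R"
)

run_pipeline <- function(script_dir = ".", scripts = pipeline_scripts) {
  paths <- file.path(script_dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(basename(missing), collapse = ", "))
  }
  for (p in paths) {
    message("Running ", basename(p))
    # Each script gets its own environment so objects do not leak between steps.
    source(p, local = new.env())
  }
  invisible(paths)
}
```

Because each step reads the files written by the previous one, the scripts need to run in this order; an error in any script stops the run at that point.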

Inputs & Outputs

Inputs

  • data/raw/ (or absolute path): original logger .txt files in survey folders
  • Optional humidity .txt files, identified by filenames matching the regex \d+H

Intermediate outputs

  • data/logger_csv/{Survey}/ — converted per-survey CSVs
  • data/combined_surveys/combined_survey_{n}_data.csv
  • reports/*log*.txt — timestamped logs per step

Final outputs

  • data/Final_datasets/Full_combined_25M_data.csv
  • data/Final_datasets/Clean_1_combined_25M_data.csv
  • data/Final_datasets/Clean_2_combined_25M_data.csv
  • data/Survey_Logger_Plots/All_Surveys_Logger_Plots_Paginated_clean_1.pdf
  • data/Survey_Logger_Plots/All_Surveys_Logger_Plots_Paginated_clean_2_trimmed.pdf
  • data/Survey_Logger_Plots/All_Surveys_Logger_Plots_clean&trimmed_2.pdf
  • data/Survey_Logger_Plots/Logger_Trim_Summary.csv

Logging & provenance

Each script writes timestamped entries (STARTED / SUCCESS / WARNING / ERROR) to a log file.
Keep logs in version control for auditability of your data products.
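
A logging helper in the spirit of the entries described above might look like the following. The function name `log_event` and the exact line layout are assumptions; only the four status levels come from this README.

```r
# Append one timestamped entry to a plain-text log file.
log_event <- function(log_file,
                      level = c("STARTED", "SUCCESS", "WARNING", "ERROR"),
                      message) {
  level <- match.arg(level)  # rejects any level outside the four listed
  line <- sprintf("[%s] %s: %s",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"), level, message)
  cat(line, "\n", sep = "", file = log_file, append = TRUE)
  invisible(line)
}
```

For example, `log_event("reports/step3_log.txt", "STARTED", "Combining surveys")` would append one line to that file, where the path shown is purely illustrative.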


Citation

If you use or adapt this pipeline, please cite this repository and acknowledge the microclimate QA/QC procedures implemented here.

Maintainer: Perry Richardson

About

A clean, reproducible R pipeline for microclimate logger data. It converts raw `.txt` exports to `.csv`, combines surveys, cleans and standardises labels, generates QA plots, and outputs final analysis-ready datasets with audit logs.
