A reproducible R pipeline to organise, convert, combine, clean, trim, and visualise temperature logger data collected across repeated field surveys. The workflow produces clearly logged steps, paginated QA plots, and cleaned datasets ready for downstream analysis.
- Move humidity `.txt` files into a clean structure (with audit logs)
- Convert raw logger `.txt` → `.csv` per survey
- Parse and combine per-survey data into tidy tables
- Stack all surveys into one combined dataset
- Clean labels, fix logger IDs, remove known bad entries
- Visualise full/trimmed/filtered series for QA (PDFs)
- Apply targeted trims/NA patches for problematic loggers
- Export final cleaned datasets + a human-readable trim summary
- Moves humidity `.txt` files to a clean folder structure (skips empty files).
- Converts logger `.txt` → `.csv` per survey folder.
- Writes a timestamped pipeline log for full auditability.
- Parses per-survey CSVs into tidy columns: `Timestamp`, `Temperature_C`, etc.
- Infers and standardises `Survey`, `Location` (Farm → WGV, etc.), and `Logger_Number`.
- Writes one combined CSV per survey and logs successes/warnings.
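As a hedged sketch, inferring `Survey`, `Location`, and `Logger_Number` from a file name might look like this. The `Survey_3_Farm_12.csv` naming pattern and the Farm → WGV recode are assumptions based on the description above:

```r
# Split a file name like "Survey_3_Farm_12.csv" into its metadata parts
# and standardise the location label (Farm -> WGV).
parse_logger_name <- function(file) {
  parts <- strsplit(sub("\\.csv$", "", basename(file)), "_")[[1]]
  location <- parts[3]
  if (location == "Farm") location <- "WGV"   # standardise known variant
  list(
    Survey        = as.integer(parts[2]),
    Location      = location,
    Logger_Number = as.integer(parts[4])
  )
}
```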
- Validates required columns across survey CSVs.
- Binds all surveys into a single master dataset with a `source` column.
- Exports `Full_combined_25M_data.csv` and logs the process.
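The validate-then-bind step can be sketched as a small function; the required-column list is an assumption taken from the columns named in this README:

```r
library(dplyr)

required_cols <- c("Timestamp", "Temperature_C", "Survey",
                   "Location", "Logger_Number")

# Check each per-survey table for required columns, then bind the named
# list into one master data frame with a `source` column.
stack_surveys <- function(survey_tables) {
  for (nm in names(survey_tables)) {
    missing <- setdiff(required_cols, names(survey_tables[[nm]]))
    if (length(missing) > 0)
      stop(sprintf("%s is missing columns: %s",
                   nm, paste(missing, collapse = ", ")))
  }
  bind_rows(survey_tables, .id = "source")
}
```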
- Adds `Year`, `Month`, and `Date` from `Timestamp`.
- Removes `_error` and known-bad location labels; fixes common name variants.
- Exports `Clean_1_combined_25M_data.csv` with a detailed cleaning log.
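A minimal sketch of the label/time-field cleaning, assuming a `"YYYY-MM-DD HH:MM:SS"` timestamp format; the bad-location list here is a placeholder, not the script's real one:

```r
library(dplyr)
library(lubridate)

bad_locations <- c("test_site", "unknown")   # placeholder list

# Derive Year/Month/Date, drop error and known-bad labels,
# and fix a common location-name variant.
clean_labels <- function(df) {
  df |>
    mutate(Timestamp = ymd_hms(Timestamp),
           Year  = year(Timestamp),
           Month = month(Timestamp),
           Date  = as_date(Timestamp)) |>
    filter(!grepl("_error$", Location),
           !Location %in% bad_locations) |>
    mutate(Location = recode(Location, "Farm" = "WGV"))
}
```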
- Generates QA plots (full series) and “edge-trimmed” plots (drop first/last row per logger).
- Filters a short list of known faulty loggers.
- Produces hourly mean plots and saves paginated QA PDFs.
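The edge-trim and hourly-mean QA plotting described above might be sketched like this; the column names follow this README, and the page layout is an assumption:

```r
library(dplyr)
library(ggplot2)

# Drop the first and last reading of each logger
# (deployment/retrieval artefacts).
edge_trim <- function(df) {
  df |>
    group_by(Logger_Number) |>
    slice(2:(n() - 1)) |>
    ungroup()
}

# Plot hourly mean temperature, one page per logger, into a PDF.
plot_qa_pdf <- function(df, out_pdf) {
  hourly <- df |>
    mutate(Hour = as.POSIXct(format(Timestamp, "%Y-%m-%d %H:00:00"))) |>
    group_by(Logger_Number, Hour) |>
    summarise(Temp = mean(Temperature_C, na.rm = TRUE), .groups = "drop")
  pdf(out_pdf, width = 11, height = 8)
  for (lg in unique(hourly$Logger_Number)) {
    print(
      ggplot(filter(hourly, Logger_Number == lg), aes(Hour, Temp)) +
        geom_line() +
        labs(title = paste("Logger", lg), x = "Time", y = "Hourly mean (°C)")
    )
  }
  dev.off()
}
```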
- Applies targeted trims/NA patches for specific loggers and date ranges.
- Replaces adjusted series back into the dataset and re-plots for QA.
- Exports the final `Clean_2_combined_25M_data.csv`, a QA PDF, and `Logger_Trim_Summary.csv`.
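A targeted trim/NA patch can be expressed as a lookup table of logger and date ranges. The entries below are illustrative only; the real ranges live in the script and in `Logger_Trim_Summary.csv`:

```r
# Illustrative trim table: which logger gets NA over which date range.
trims <- data.frame(
  Logger_Number = 12,
  start = as.Date("2023-06-01"),
  end   = as.Date("2023-06-05")
)

# Set Temperature_C to NA wherever a logger falls inside a trimmed range.
apply_trims <- function(df, trims) {
  for (i in seq_len(nrow(trims))) {
    idx <- df$Logger_Number == trims$Logger_Number[i] &
           df$Date >= trims$start[i] &
           df$Date <= trims$end[i]
    df$Temperature_C[idx] <- NA
  }
  df
}
```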
- Move + convert raw files → CSVs with logs.
- Parse & combine per survey → tidy per-survey CSVs.
- Stack all surveys → master `Full_combined_25M_data.csv`.
- Clean labels/time fields → `Clean_1_combined_25M_data.csv`.
- QA plots, edge-trim, filter bad loggers.
- Targeted trims/NA patches → final `Clean_2_combined_25M_data.csv` + summary.
- `data/raw/` (or an absolute path): original logger `.txt` files in survey folders
- Optional humidity `.txt` files matched by `\d+H`
- `data/logger_csv/{Survey}/` — converted per-survey CSVs
- `data/combined_surveys/combined_survey_{n}_data.csv`
- `reports/*log*.txt` — timestamped logs per step
- `data/Final_datasets/Full_combined_25M_data.csv`
- `data/Final_datasets/Clean_1_combined_25M_data.csv`
- `data/Final_datasets/Clean_2_combined_25M_data.csv`
- `data/Survey_Logger_Plots/All_Surveys_Logger_Plots_Paginated_clean_1.pdf`
- `data/Survey_Logger_Plots/All_Surveys_Logger_Plots_Paginated_clean_2_trimmed.pdf`
- `data/Survey_Logger_Plots/All_Surveys_Logger_Plots_clean&trimmed_2.pdf`
- `data/Survey_Logger_Plots/Logger_Trim_Summary.csv`
Each script writes timestamped entries (STARTED / SUCCESS / WARNING / ERROR) to a log file.
Keep logs in version control for auditability of your data products.
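The logging convention might look like the sketch below; the exact format string and file paths are assumptions:

```r
# Append a timestamped STARTED / SUCCESS / WARNING / ERROR entry
# to the given log file.
log_entry <- function(log_file, status, msg) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  line  <- sprintf("[%s] %s - %s", stamp, status, msg)
  cat(line, "\n", file = log_file, append = TRUE)
  invisible(line)
}
```

For example, `log_entry("reports/step1_log.txt", "STARTED", "converting raw files")` would append one audit line to the step-1 log.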
If you use or adapt this pipeline, please cite this repository and acknowledge the microclimate QA/QC procedures implemented here.
Maintainer: Perry Richardson