r-lapins/Process-Data-Toolkit

Process Data Toolkit (PDT)


Modern C++20 library and CLI tools for CSV time-series processing and WAV signal analysis.


Related project

This library powers a desktop application:

👉 Process Data Viewer (Qt)

The viewer provides an interactive GUI for:

  • CSV anomaly analysis
  • WAV signal and spectrum analysis
  • visualization and export tools

Project goals

This project demonstrates modern C++ development practices and serves as a portfolio example.

Key aspects:

  • Modern C++20 design
  • Clean separation between CLI and reusable core library
  • Reproducible builds using CMake presets
  • CI (GCC + Clang)
  • Sanitizers (ASan + UBSan)
  • Static analysis (clang-tidy)
  • Debugging and memory analysis (GDB + Valgrind)
  • Unit testing

Development

Development notes and CI instructions are available here: docs/DEVELOPMENT.md.


Features

CSV data processing

Notes and instructions are available here: docs/CSV.md.

  • CLI data processing tool for CSV files (pdt_csv_cli)
  • Robust CSV parsing with error reporting and skipped row inspection
  • ISO 8601 timestamp parsing
  • Filtering by sensor and time range
  • Statistical analysis with optional per-sensor breakdown
  • Configurable anomaly detection (zscore, iqr, mad) with threshold and top-N selection
  • JSON and CSV export with anomaly highlighting

WAV signal analysis

Notes and instructions are available here: docs/WAV.md.

  • CLI spectrum analysis tool for WAV files (pdt_wav_cli)
  • Spectrum computation using DFT / FFT with optional CUDA acceleration (cuFFT)
  • Backend abstraction for CPU / GPU spectrum computation
  • Single-sided spectrum computation
  • Window functions: Hann and Hamming
  • Spectral peak detection and dominant peak selection
  • WAV reader (RIFF/WAVE PCM16 mono)
  • Synthetic signal spectrum analysis demo (pdt_wav_synth_demo)
  • CSV and text report export (--out, --out-r)
  • Recommended FFT size listing for CPU/GPU (--list-sizes)
  • FFT benchmark tool (fft_benchmark)

Example outputs

CSV CLI can:

  • print import/skipped row summaries
  • generate JSON reports
  • export anomaly-marked CSV files

WAV CLI can:

  • print spectral peak reports
  • export spectrum CSV
  • export text reports

Quick start

cmake --preset debug
cmake --build --preset debug

Run CSV CLI:

./build/debug/pdt_csv_cli --in examples/sample.csv

Run WAV CLI:

./build/debug/pdt_wav_cli --in examples/HDSDR_20230515_072359Z_15047kHz_AF.wav

CMake presets

  • debug: CPU only (default, sanitizers enabled)
  • debug-nosan: CPU only, no sanitizers
  • debug-cuda-nosan: CUDA enabled, no sanitizers
  • release: optimized CPU build
  • release-cuda: optimized CUDA build

Note: CUDA builds require sanitizers to be disabled.


Architecture

The project is organized in two complementary views:

Domain modules

  • csv: CSV time-series processing, filtering, statistics, anomaly detection
  • wav: offline signal/spectrum analysis for WAV input
  • rtlsdr: planned live SDR input module

Core layers

  • pdt/io: input/output modules (currently WAV I/O, later also RTL-SDR)
  • pdt/dsp: core DSP algorithms: DFT, FFT, windows, peak detection, spectrum types
  • pdt/compute: spectrum computation backends (IFftBackend, CPU, CUDA/cuFFT)
  • pdt/pipeline: backend-driven analysis flow (SpectrumEngine)
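
The split between pdt/compute and pdt/pipeline can be pictured with the sketch below. It is an assumption-laden illustration only: `FftBackendSketch`, `NaiveCpuBackend` and `SpectrumEngineSketch` are hypothetical names, and the real `IFftBackend` and `SpectrumEngine` signatures are not shown in this README and may differ. The CPU backend here uses a direct DFT to keep the example short.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical backend interface (stand-in for the idea behind IFftBackend).
class FftBackendSketch {
public:
    virtual ~FftBackendSketch() = default;
    virtual std::string name() const = 0;
    virtual std::vector<Complex> forward(const std::vector<double>& samples) const = 0;
};

// Hypothetical CPU backend: direct O(N^2) DFT, chosen only for brevity.
class NaiveCpuBackend final : public FftBackendSketch {
public:
    std::string name() const override { return "cpu-naive-dft"; }

    std::vector<Complex> forward(const std::vector<double>& samples) const override {
        const std::size_t n = samples.size();
        const double two_pi = 2.0 * std::acos(-1.0);
        std::vector<Complex> out(n);
        for (std::size_t k = 0; k < n; ++k) {
            Complex sum{0.0, 0.0};
            for (std::size_t i = 0; i < n; ++i) {
                const double angle =
                    -two_pi * static_cast<double>(k * i) / static_cast<double>(n);
                sum += samples[i] * std::polar(1.0, angle);
            }
            out[k] = sum;
        }
        return out;
    }
};

// Hypothetical engine: depends only on the interface, so a GPU backend
// could be injected without changing the analysis flow.
class SpectrumEngineSketch {
public:
    explicit SpectrumEngineSketch(std::unique_ptr<FftBackendSketch> backend)
        : backend_(std::move(backend)) {}

    std::string backend_name() const { return backend_->name(); }

    // Single-sided magnitude spectrum: bins 0..N/2 of a real-valued input.
    std::vector<double> magnitudes(const std::vector<double>& samples) const {
        const auto bins = backend_->forward(samples);
        std::vector<double> mags;
        if (bins.empty()) return mags;
        const std::size_t half = bins.size() / 2 + 1;
        mags.reserve(half);
        for (std::size_t i = 0; i < half; ++i) mags.push_back(std::abs(bins[i]));
        return mags;
    }

private:
    std::unique_ptr<FftBackendSketch> backend_;
};
```

Constructing the engine with `std::make_unique<NaiveCpuBackend>()` selects the CPU path. A CUDA-backed class that implements the same interface could be passed in the same way.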

Project structure

include/pdt/
├── compute/        FFT/spectrum backends (CPU, CUDA)
├── csv/            CSV processing public API
├── dsp/            DSP public API
├── io/
│   └── wav/        WAV I/O public API
└── pipeline/       analysis pipeline public API

src/
├── compute/        backend implementations
├── csv/            CSV processing implementation
├── dsp/            DSP implementation
├── io/
│   └── wav/        WAV I/O implementation
└── pipeline/       analysis pipeline implementation

app/                CLI applications
bench/              performance benchmarks
tests/              unit tests
examples/           sample CSV and WAV inputs and outputs
.github/            CI workflows

Requirements

  • CMake 3.25+
  • Ninja
  • C++20 compatible compiler
  • Linux environment is recommended

Optional CUDA support

The CUDA backend can be enabled to accelerate FFT computation.

Requirements:

  • NVIDIA GPU
  • CUDA Toolkit (with cuFFT)

When enabled:

  • CudaFftBackend and fft_benchmark are available
  • FFT can be executed on GPU (cuFFT)

Algorithms

Standard deviation:

σ = sqrt( Σ(x - μ)² / N )
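
The formula divides by N, so it is the population standard deviation. A minimal sketch of that computation is shown below. The function name `population_stddev` is hypothetical, and returning 0 for empty input is a choice made for this example only.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Population standard deviation: sqrt( sum((x - mean)^2) / N ).
// Returns 0.0 for an empty input (an assumption made for this sketch).
double population_stddev(const std::vector<double>& xs) {
    if (xs.empty()) return 0.0;
    const double n = static_cast<double>(xs.size());
    const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / n;
    double sum_sq = 0.0;
    for (double x : xs) {
        const double d = x - mean;
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq / n);
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared deviations sum to 32, and the result is sqrt(32 / 8) = 2.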

Anomaly detection methods:

The CSV CLI supports three anomaly detection methods:

- Z-score

z = (x - μ) / σ

Samples with |z| > threshold are reported as anomalies.
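
The z-score rule can be sketched as below. The function name `zscore_anomalies` is hypothetical, and skipping detection when σ is zero is an assumption. The toolkit's handling of that case is not described here.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of samples whose |z| exceeds `threshold`,
// using the population mean and standard deviation of `xs`.
std::vector<std::size_t> zscore_anomalies(const std::vector<double>& xs, double threshold) {
    std::vector<std::size_t> anomalies;
    if (xs.empty()) return anomalies;

    const double n = static_cast<double>(xs.size());
    const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / n;
    double sum_sq = 0.0;
    for (double x : xs) sum_sq += (x - mean) * (x - mean);
    const double sigma = std::sqrt(sum_sq / n);

    // All samples identical: z is undefined, so nothing is flagged (assumption).
    if (sigma == 0.0) return anomalies;

    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double z = (xs[i] - mean) / sigma;
        if (std::fabs(z) > threshold) anomalies.push_back(i);
    }
    return anomalies;
}
```

With nine samples equal to 10 and one equal to 100, the mean is 19 and σ is 27, so the last sample has z = 3 and is flagged at a threshold of 2.5.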

- IQR

The interquartile range method uses:

IQR = Q3 - Q1

Samples outside the interval

[Q1 - threshold · IQR, Q3 + threshold · IQR]

are reported as anomalies.
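
A sketch of the IQR rule follows. The names `quantile_sorted` and `iqr_anomalies` are hypothetical. Quartiles are computed here by linear interpolation between order statistics, which is one of several common conventions and an assumption. The toolkit may compute Q1 and Q3 differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile of an already sorted, non-empty vector using linear interpolation.
double quantile_sorted(const std::vector<double>& sorted, double q) {
    const double pos = q * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Returns the indices of samples outside
// [Q1 - threshold * IQR, Q3 + threshold * IQR].
std::vector<std::size_t> iqr_anomalies(const std::vector<double>& xs, double threshold) {
    std::vector<std::size_t> anomalies;
    if (xs.size() < 2) return anomalies;

    std::vector<double> sorted = xs;
    std::sort(sorted.begin(), sorted.end());

    const double q1 = quantile_sorted(sorted, 0.25);
    const double q3 = quantile_sorted(sorted, 0.75);
    const double iqr = q3 - q1;
    const double lower = q1 - threshold * iqr;
    const double upper = q3 + threshold * iqr;

    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (xs[i] < lower || xs[i] > upper) anomalies.push_back(i);
    }
    return anomalies;
}
```

For the samples 1..8 followed by 100, this convention gives Q1 = 3 and Q3 = 7, so with a threshold of 1.5 the interval is [-3, 13] and only the value 100 is reported.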

- MAD

The median absolute deviation method uses:

MAD = median(|x - median(x)|)

A robust anomaly score is computed:

score = (x - median(x)) / MAD

Samples with |score| > threshold are reported as anomalies.
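
The MAD score can be sketched as below, following the formulas above literally, so no normal-consistency constant such as 1.4826 is applied. The names `median_of` and `mad_anomalies` are hypothetical, and returning no anomalies when MAD is zero is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty vector (taken by value so the caller's data is not reordered).
double median_of(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    return (n % 2 == 1) ? values[n / 2] : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// Returns the indices of samples with |(x - median) / MAD| > threshold.
std::vector<std::size_t> mad_anomalies(const std::vector<double>& xs, double threshold) {
    std::vector<std::size_t> anomalies;
    if (xs.empty()) return anomalies;

    const double med = median_of(xs);
    std::vector<double> deviations;
    deviations.reserve(xs.size());
    for (double x : xs) deviations.push_back(std::fabs(x - med));
    const double mad = median_of(deviations);

    // More than half of the samples equal the median: score undefined (assumption: skip).
    if (mad == 0.0) return anomalies;

    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double score = (xs[i] - med) / mad;
        if (std::fabs(score) > threshold) anomalies.push_back(i);
    }
    return anomalies;
}
```

For the samples 1, 2, 3, 4, 100 the median is 3 and the MAD is 1, so the value 100 scores 97 and is the only anomaly at a threshold of 3.5. Because both the center and the scale are medians, a single extreme value barely shifts them, unlike the mean and σ used by the z-score.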

WAV signal processing methods:

- Discrete Fourier Transform (DFT)

X[k] = Σ x[n] · e^(−j2πkn/N),  k = 0..N−1

Current implementation is O(N²) and serves as a reference implementation.
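
A direct evaluation of the DFT sum looks like the sketch below. It illustrates the O(N²) approach described above and is not the project's code. The function name `dft_reference` is hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct DFT: X[k] = sum over n of x[n] * e^(-j*2*pi*k*n/N), for k = 0..N-1.
// Two nested loops over N give O(N^2) complex multiplications.
std::vector<std::complex<double>> dft_reference(const std::vector<double>& x) {
    const std::size_t size = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> spectrum(size);
    for (std::size_t k = 0; k < size; ++k) {
        std::complex<double> sum{0.0, 0.0};
        for (std::size_t n = 0; n < size; ++n) {
            const double angle =
                -two_pi * static_cast<double>(k) * static_cast<double>(n) / static_cast<double>(size);
            sum += x[n] * std::polar(1.0, angle);  // unit phasor at `angle`
        }
        spectrum[k] = sum;
    }
    return spectrum;
}
```

A slow but simple version like this is mainly useful as a ground truth when testing a faster transform on small inputs.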

- Fast Fourier Transform (FFT)

The project implements a radix-2 Cooley–Tukey FFT algorithm.

The FFT recursively decomposes the DFT into even and odd indexed samples:

X[k] = E[k] + W_N^k · O[k]
X[k + N/2] = E[k] - W_N^k · O[k]

where:

W_N^k = e^(−j2πk/N)

The algorithm requires the input size to be a power of two and has time complexity O(N log N).
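
The butterfly equations above translate almost line by line into a recursive sketch. The function name `fft_radix2` is hypothetical, and throwing on a non-power-of-two size is an assumption. The project's implementation may be iterative or in-place and may report errors differently.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Complex = std::complex<double>;

// Recursive radix-2 Cooley-Tukey FFT. Input size must be a power of two.
std::vector<Complex> fft_radix2(const std::vector<Complex>& x) {
    const std::size_t size = x.size();
    if (size == 0 || (size & (size - 1)) != 0) {
        throw std::invalid_argument("fft_radix2: size must be a power of two");
    }
    if (size == 1) return x;

    // Split into even-indexed and odd-indexed samples.
    std::vector<Complex> even(size / 2), odd(size / 2);
    for (std::size_t i = 0; i < size / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<Complex> E = fft_radix2(even);
    const std::vector<Complex> O = fft_radix2(odd);

    // Combine: X[k] = E[k] + W*O[k], X[k + N/2] = E[k] - W*O[k].
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<Complex> X(size);
    for (std::size_t k = 0; k < size / 2; ++k) {
        const Complex w =
            std::polar(1.0, -two_pi * static_cast<double>(k) / static_cast<double>(size));
        X[k] = E[k] + w * O[k];
        X[k + size / 2] = E[k] - w * O[k];
    }
    return X;
}
```

Each level of recursion does O(N) work and there are log2(N) levels, which is where the O(N log N) bound comes from.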

- Spectral peak detection

Two strategies:

ThresholdOnly

X[i] >= threshold_ratio · max(X)

LocalMaxima

X[i] > X[i-1] && X[i] > X[i+1]
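
Both strategies can be sketched over a magnitude spectrum as below. The names `PeakMode` and `find_peaks` are hypothetical. Whether the library combines the two conditions or treats the first and last bins specially is not stated here, so this sketch applies each rule exactly as written and never reports edge bins as local maxima.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class PeakMode { ThresholdOnly, LocalMaxima };  // hypothetical names

// Returns the bin indices selected by the chosen strategy.
// `threshold_ratio` is only used in ThresholdOnly mode.
std::vector<std::size_t> find_peaks(const std::vector<double>& magnitudes,
                                    PeakMode mode,
                                    double threshold_ratio) {
    std::vector<std::size_t> peaks;
    if (magnitudes.empty()) return peaks;

    if (mode == PeakMode::ThresholdOnly) {
        // Keep every bin with X[i] >= threshold_ratio * max(X).
        const double limit =
            threshold_ratio * *std::max_element(magnitudes.begin(), magnitudes.end());
        for (std::size_t i = 0; i < magnitudes.size(); ++i) {
            if (magnitudes[i] >= limit) peaks.push_back(i);
        }
    } else {
        // Keep interior bins strictly greater than both neighbours.
        for (std::size_t i = 1; i + 1 < magnitudes.size(); ++i) {
            if (magnitudes[i] > magnitudes[i - 1] && magnitudes[i] > magnitudes[i + 1]) {
                peaks.push_back(i);
            }
        }
    }
    return peaks;
}
```

For the magnitudes 0, 4, 5, 1, 0, 3, 0 with a ratio of 0.5, ThresholdOnly selects bins 1, 2 and 5, since bin 1 sits on the shoulder of the main peak. LocalMaxima selects only bins 2 and 5.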

Library usage

Example:

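#include <pdt/csv/dataset.h>
#include <pdt/csv/csv_reader.h>
#include <fstream>
#include <utility>  // std::move

int main() {
    std::ifstream in("examples/sample.csv");
    if (!in) {
        return 1;  // input file could not be opened
    }
    auto import = pdt::read_csv(in);             // parse the CSV stream
    pdt::DataSet ds{std::move(import.samples)};  // take ownership of the parsed samples
    auto stats = ds.stats();                     // compute summary statistics
    (void)stats;                                 // unused in this minimal example
    return 0;
}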

Future work

Possible next steps:

  • Streaming / online anomaly detection
  • Additional window functions
  • Spectrogram computation

License

MIT License

About

Modern C++20 project for time-series data processing and signal analysis (DFT/FFT, spectrum, anomaly detection). Emphasizes modular architecture, CPU/GPU (CUDA) backends, reproducible builds (CMake), testing, sanitizers, static analysis, and CI.
