A Python package for processing CAEN WaveDump data files and extracting PMT/MPPC parameters into ROOT ntuples.
wavedump_processor/
├── README.md # This file
├── install.sh # Installation script
├── setup.py # Package setup file
├── requirements.txt # Python dependencies
├── process_wavedump.py # Main script
├── config_example.py # Example configuration file
└── wavedump_processor/ # Package directory
    ├── __init__.py
    ├── config/
    │   ├── __init__.py
    │   └── channel_config.py # ChannelConfig dataclass
    ├── processing/
    │   ├── __init__.py
    │   └── waveform_processor.py # WaveformProcessor class
    ├── io/
    │   ├── __init__.py
    │   └── wavedump_reader.py # WaveDumpReader class
    └── utils/
        ├── __init__.py
        ├── file_utils.py # File finding utilities
        └── processor_utils.py # Main processing functions
Installation using the provided script:
- Clone this repository:
  git clone https://github.com/wihann00/wavepro
  cd wavepro
- Run the installation script:
  chmod +x install.sh
  ./install.sh
  This will:
  - Create a virtual environment called wdenv
  - Install the package and all dependencies
  - Create the wavepro command
- Activate the virtual environment:
  source wdenv/bin/activate
- Now you can use the command:
  wavepro --help
- When done, deactivate the virtual environment:
  deactivate

Manual installation:
- Clone or download this package
- Create and activate a virtual environment:
  python3 -m venv wdenv
  source wdenv/bin/activate
- Install in editable mode (run from the repository root):
  pip install -e .
This installs the package and all dependencies (numpy, uproot, tqdm, setuptools) and creates the wavepro command.
The following packages will be installed automatically:
- numpy>=1.20.0
- uproot>=4.0.0
- tqdm>=4.60.0
- setuptools>=45.0.0
Notes on the virtual environment:
- The virtual environment (wdenv) keeps the package isolated from your system Python.
- You must activate it in each new shell session: source wdenv/bin/activate
- Your shell prompt will show (wdenv) when it is activated.
- To deactivate it: deactivate
Process a single binary file:
  wavepro input.dat output.root
Process ASCII files (multiple wave*.txt files):
  wavepro "wave*.txt" output.root --file-type ascii
Process all files in subdirectories:
  wavepro /path/to/data/ --batch
With a custom pattern:
  wavepro /path/to/data/ --batch --pattern "run*.dat"
Use a custom configuration file:
  wavepro input.dat output.root --config my_config.py
See config_example.py for the configuration format.
View all options:
  wavepro --help
Arguments:
- input: Input file path, or parent directory (batch mode)
- output: Output ROOT file (not used in batch mode)
Options:
- --config FILE: Path to configuration file
- --file-type {binary,ascii}: Input file format (default: binary)
- --batch: Enable batch processing mode
- --pattern PATTERN: File pattern for batch mode (default: *.dat)
- --help, -h: Show help message
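For readers scripting around wavepro, the sketch below shows one way the documented arguments could be expressed with Python's standard argparse module. It is a hypothetical reconstruction from the option list above, not the package's actual entry point (process_wavedump.py may define things differently); the names build_parser and parse_args are illustrative, and the rule that output is required outside batch mode is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented wavepro arguments."""
    parser = argparse.ArgumentParser(
        prog="wavepro",
        description="Process CAEN WaveDump files into ROOT ntuples.",
    )
    parser.add_argument("input",
                        help="Input file path or parent directory (batch mode)")
    # Optional positional: the output file is not used in batch mode
    parser.add_argument("output", nargs="?",
                        help="Output ROOT file (not used in batch mode)")
    parser.add_argument("--config", metavar="FILE",
                        help="Path to configuration file")
    parser.add_argument("--file-type", choices=["binary", "ascii"],
                        default="binary",
                        help="Input file format (default: binary)")
    parser.add_argument("--batch", action="store_true",
                        help="Enable batch processing mode")
    parser.add_argument("--pattern", default="*.dat",
                        help="File pattern for batch mode (default: *.dat)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: outside batch mode an output file must be given
    if not args.batch and args.output is None:
        parser.error("output is required unless --batch is given")
    return args
```

argparse turns --file-type into the attribute file_type and adds -h/--help automatically, which matches the documented help option.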
Create a configuration file defining your channel setup:
# my_config.py
from wavedump_processor.config.channel_config import ChannelConfig
channel_configs = [
ChannelConfig(
channel_id=0,
polarity=1, # 1=positive, -1=negative
baseline_samples=100, # Samples for baseline calculation
charge_method="fixed", # "fixed" or "dynamic"
charge_window=(0, 200), # Integration window
threshold=20.0, # Threshold for timing (ADC counts)
cfd_fraction=0.5 # CFD fraction (0-1)
),
ChannelConfig(
channel_id=1,
polarity=1,
charge_method="dynamic",
charge_window=(50, 150), # (samples_before_peak, samples_after_peak)
threshold=15.0,
cfd_fraction=0.3
),
# Add more channels as needed
]

ChannelConfig fields:
- channel_id (int): Channel number
- polarity (int): Signal polarity (1 for positive, -1 for negative)
- baseline_samples (int): Number of samples from the start of the waveform used for the baseline calculation
- charge_method (str): Integration method
  - "fixed": Fixed window, in samples counted from the start of the waveform
  - "dynamic": Window placed around the peak
- charge_window (tuple):
  - For fixed: (start_sample, end_sample)
  - For dynamic: (samples_before_peak, samples_after_peak)
- threshold (float): Threshold in ADC counts above baseline for leading-edge timing
- cfd_fraction (float): Fraction of the peak height used for CFD timing (typically 0.3-0.5)
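To make the two charge_window conventions concrete, here is a minimal sketch of a dataclass with the fields listed above plus a helper that turns a window into sample indices. It is a hypothetical stand-in, not the package's ChannelConfig: the default values, the validation rules, the helper name integration_bounds, and the half-open [start, stop) convention (end sample excluded) are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ChannelConfig:
    """Hypothetical stand-in for the documented fields; defaults are assumptions."""
    channel_id: int
    polarity: int = 1
    baseline_samples: int = 100
    charge_method: str = "fixed"
    charge_window: Tuple[int, int] = (0, 200)
    threshold: float = 20.0
    cfd_fraction: float = 0.5

    def __post_init__(self) -> None:
        # Basic sanity checks on the documented value ranges
        if self.polarity not in (1, -1):
            raise ValueError("polarity must be 1 or -1")
        if self.charge_method not in ("fixed", "dynamic"):
            raise ValueError('charge_method must be "fixed" or "dynamic"')
        if not 0.0 < self.cfd_fraction <= 1.0:
            raise ValueError("cfd_fraction must be in (0, 1]")


def integration_bounds(cfg: ChannelConfig, peak_index: int,
                       n_samples: int) -> Tuple[int, int]:
    """Return a half-open [start, stop) sample range, clipped to the waveform."""
    a, b = cfg.charge_window
    if cfg.charge_method == "fixed":
        start, stop = a, b  # absolute sample indices
    else:
        start, stop = peak_index - a, peak_index + b  # relative to the peak
    return max(start, 0), min(stop, n_samples)
```

With the channel 1 settings from the example configuration (dynamic, (50, 150)) and a peak at sample 300, this sketch integrates samples 250 up to, but not including, 450.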
The output ROOT file contains a TTree named events with the following branches:
Event-level branches:
- event_number: Event counter
- trigger_time: Trigger time tag
- board_id: Digitizer board ID
Per-channel branches (for each configured channel N):
- chN_baseline_mean: Baseline mean (ADC counts)
- chN_baseline_rms: Baseline RMS (ADC counts)
- chN_peak_height: Peak height above baseline (ADC counts)
- chN_peak_time: Peak time (sample number)
- chN_charge: Integrated charge (ADC counts × samples)
- chN_threshold_time: Leading-edge threshold crossing time (sample number, -1 if not found)
- chN_cfd_time: CFD crossing time (sample number, -1 if not found)
Missing channels in an event are filled with -999.0.
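Because missing channels are filled with -999.0 and failed timing searches with -1, these sentinels should be excluded before computing statistics. The sketch below is one way to do that with NumPy; the helper names valid_mask and summarize are hypothetical, and it assumes the branches have already been loaded into a dict of NumPy arrays keyed by branch name.

```python
import numpy as np

MISSING = -999.0   # documented fill value for channels absent from an event
NOT_FOUND = -1.0   # documented value for timing branches with no crossing


def valid_mask(values: np.ndarray, branch: str) -> np.ndarray:
    """Boolean mask of entries that hold a real measurement for this branch."""
    mask = values != MISSING
    if branch.endswith(("_threshold_time", "_cfd_time")):
        mask &= values != NOT_FOUND
    return mask


def summarize(branches: dict, channel: int) -> dict:
    """Mean of every ch{channel}_* branch, ignoring sentinel entries."""
    prefix = f"ch{channel}_"
    out = {}
    for name, values in branches.items():
        if not name.startswith(prefix):
            continue
        values = np.asarray(values, dtype=float)
        good = values[valid_mask(values, name)]
        out[name] = float(good.mean()) if good.size else float("nan")
    return out
```

With uproot, a dict in this shape can be obtained with uproot.open("output.root")["events"].arrays(library="np").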
Binary format (default, --file-type binary):
- Single file containing all channels
- CAEN WaveDump native binary format
- More compact files
- Faster processing
ASCII format (--file-type ascii):
- Separate files per channel (wave0.txt, wave1.txt, etc.)
- Human-readable but larger files
- Requires a pattern that matches all channel files (e.g. "wave*.txt")
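For inspecting an ASCII channel file by hand, a minimal reader is sketched below. It assumes each event is a block of "Key: value" header lines that includes "Record Length", followed by exactly that many lines holding one integer sample each; this layout is an assumption about the WaveDump ASCII output, and the package's WaveDumpReader may parse it differently. The function name iter_ascii_events is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_ascii_events(lines: Iterable[str]) -> Iterator[Tuple[Dict[str, str], List[int]]]:
    """Yield (header, samples) for each event in a WaveDump-style ASCII file.

    Assumes "Key: value" header lines (including "Record Length") followed by
    one integer sample per line.
    """
    stream = iter(lines)
    header: Dict[str, str] = {}
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            header[key.strip()] = value.strip()
            continue
        # First sample line: read the remainder of this record
        n = int(header["Record Length"])
        samples = [int(line)]
        for _ in range(n - 1):
            nxt = next(stream, None)
            if nxt is None:
                raise ValueError("file ended in the middle of an event")
            samples.append(int(nxt))
        yield header, samples
        header = {}
```

Usage: with open("wave0.txt") as f: events = list(iter_ascii_events(f)). Events from wave0.txt, wave1.txt, etc. can then be paired by position, assuming every channel file records the same triggers.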
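Examples:
Process a single binary run with a custom configuration:
  wavepro data/run001.dat output/run001.root --config pmt_config.py
Batch-process the files under data/ with the same configuration:
  wavepro data/ --batch --config pmt_config.py
Process one run stored as per-channel ASCII files:
  wavepro "data/run001/wave*.txt" output/run001.root --file-type ascii
Batch-process only files matching a custom pattern:
  wavepro data/ --batch --pattern "run_2024*.dat"
The processor displays progress bars during execution: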
- File-level progress in batch mode
- Event-level progress within each file
- Real-time event counting
- Clear status indicators (✓ for success, ❌ for errors)
No files found in batch mode:
- Check that the pattern matches your files
- Ensure you're pointing to the correct parent directory
- For ASCII, make sure wave*.txt files exist in subdirectories
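To check which files a pattern would pick up before starting a batch run, a recursive search like the one below can help. It is a sketch using pathlib, not the logic in file_utils.py: the function name find_input_files is hypothetical, and unlike the package it may also match files sitting directly in the parent directory.

```python
from pathlib import Path
from typing import List


def find_input_files(parent: str, pattern: str = "*.dat") -> List[Path]:
    """List files below `parent` (searched recursively) whose names match `pattern`."""
    root = Path(parent)
    if not root.is_dir():
        raise NotADirectoryError(f"{parent} is not a directory")
    return sorted(p for p in root.rglob(pattern) if p.is_file())
```

Usage: print(len(find_input_files("/path/to/data", "run*.dat"))). If this prints 0, the pattern or parent directory is the likely problem.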
Import errors:
- Make sure the wdenv virtual environment is activated
- Make sure all dependencies are installed: pip install -r requirements.txt
- Ensure the package structure is intact, with all __init__.py files present
Memory issues with large files:
- The processor reads events sequentially, so memory usage should be manageable
- If issues persist, consider processing files individually rather than in batch
"channel_configs not defined" error:
- Make sure your config file defines the channel_configs variable (a list of ChannelConfig objects)
- Check that the import statement is correct:
  from wavedump_processor.config.channel_config import ChannelConfig
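One way to reproduce this check outside wavepro is to load the config file as a Python module and look for the variable, as sketched below with the standard importlib machinery. The function name load_channel_configs is hypothetical, and it is an assumption that wavepro loads config files this way.

```python
import importlib.util
from pathlib import Path


def load_channel_configs(config_path: str) -> list:
    """Execute a Python config file and return its channel_configs list."""
    path = Path(config_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load config file {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, including its imports
    configs = getattr(module, "channel_configs", None)
    if configs is None:
        raise AttributeError(f"{path} does not define channel_configs")
    if not isinstance(configs, (list, tuple)):
        raise TypeError("channel_configs must be a list")
    return list(configs)
```

A misspelled variable name raises AttributeError here, while a wrong import path in the config file surfaces as an ImportError from exec_module.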
- Binary files process faster than ASCII
- Processing speed: ~1000-10000 events/second depending on hardware
- Batch mode processes files sequentially
- Support for parallel processing on clusters via SLURM is planned (coming soon)
For each waveform, the processor extracts:
- Baseline: Mean and RMS from first N samples
- Peak Finding: Maximum value and position after polarity correction
- Charge Integration: Sum of ADC values in specified window
- Leading Edge Time: Threshold crossing with linear interpolation
- CFD Time: Constant Fraction Discriminator timing
- Peak Time: Sample position of maximum value
All timing values use linear interpolation for sub-sample precision.
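The steps above can be sketched for a single waveform as follows. This is an illustrative implementation under stated assumptions, not the code in WaveformProcessor: the baseline RMS is taken as the standard deviation about the mean, charge is summed over the baseline-subtracted, polarity-corrected pulse in a fixed window, both timing searches run only up to the peak, and CFD is implemented as the interpolated crossing of cfd_fraction × peak height (a common digital CFD variant). The names linear_crossing and process_waveform are hypothetical.

```python
import numpy as np


def linear_crossing(pulse: np.ndarray, level: float, stop: int) -> float:
    """First upward crossing of `level` in pulse[:stop], linearly interpolated.

    Returns a fractional sample index, or -1.0 if no crossing is found.
    """
    hits = np.nonzero(pulse[:stop] >= level)[0]
    if hits.size == 0:
        return -1.0
    i = int(hits[0])
    if i == 0:
        return 0.0  # already at or above the level on the first sample
    y0, y1 = pulse[i - 1], pulse[i]  # y0 < level <= y1, so y1 > y0
    return float((i - 1) + (level - y0) / (y1 - y0))


def process_waveform(raw, polarity=1, baseline_samples=100,
                     charge_window=(0, 200), threshold=20.0, cfd_fraction=0.5):
    """Extract baseline, peak, charge and timing from one waveform (fixed window)."""
    wf = np.asarray(raw, dtype=float)
    base = wf[:baseline_samples]
    baseline_mean = float(base.mean())
    baseline_rms = float(base.std())  # spread about the mean
    # Baseline-subtracted, polarity-corrected pulse: the signal points upwards
    pulse = polarity * (wf - baseline_mean)
    peak_index = int(np.argmax(pulse))
    peak_height = float(pulse[peak_index])
    start = max(charge_window[0], 0)
    stop = min(charge_window[1], len(pulse))
    charge = float(pulse[start:stop].sum())
    # Search only the leading edge (up to and including the peak)
    threshold_time = linear_crossing(pulse, threshold, peak_index + 1)
    cfd_time = (linear_crossing(pulse, cfd_fraction * peak_height, peak_index + 1)
                if peak_height > 0 else -1.0)
    return {
        "baseline_mean": baseline_mean,
        "baseline_rms": baseline_rms,
        "peak_height": peak_height,
        "peak_time": peak_index,
        "charge": charge,
        "threshold_time": threshold_time,
        "cfd_time": cfd_time,
    }
```

For a pulse that rises from 10 to 30 ADC counts above baseline between samples 10 and 11, a threshold of 20 is crossed halfway between them, so this sketch reports a threshold time of 10.5.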
Suggestions and contributions are welcome! Please test thoroughly before submitting changes.
For issues, questions, or contributions, please contact Wi Han Ng at wihann@student.unimelb.edu.au.
Version: 1.0.0
Last Updated: 2025