
Conversation

@Tedsmith100

Includes waveform averaging code in the form of ana.py, and moves some functions around for cleanliness.

@jwaiton (Member) left a comment

Nice PR! It's a good function. I've mostly added documentation requests and some changes to the IO. Once those have been resolved, I'll move on to requesting tests (we can discuss these in person at some point 😸 ).

This file holds all the relevant functions for analysing data from h5 files.
"""

def remove_secondaries(threshold, wf_data, time, event_number, verbose, WINDOW_END, bin_size):

Best practice (one I want to implement somewhat retroactively) is to include type annotations for all functions.

In your case that would look like this (assuming your time and wf_data arrays are numpy-based):

def remove_secondaries(threshold    : int,
                       wf_data      : np.ndarray,
                       time         : np.ndarray,
                       event_number : int,
                       etc, etc) -> np.ndarray:
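
Written out for this particular function, that could look something like the following (the types for verbose, WINDOW_END and bin_size are guesses on my part, so adjust them to whatever they actually are):

def remove_secondaries(threshold    : int,
                       wf_data      : np.ndarray,
                       time         : np.ndarray,
                       event_number : int,
                       verbose      : int,
                       WINDOW_END   : float,
                       bin_size     : int) -> np.ndarray:
    ...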


Also, it's generally good practice to pass the waveform data in first.
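
For example (just the signature, keeping everything else in the same order):

def remove_secondaries(wf_data, time, threshold, event_number, verbose, WINDOW_END, bin_size):
    ...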


def remove_secondaries(threshold, wf_data, time, event_number, verbose, WINDOW_END, bin_size):
'''
Removes events with large secondary peaks

Give more context about what this function does. Step by step.
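
For instance, something step-by-step along these lines (this is just my rough reading of the function from the code below, so correct the details as needed):

'''
Removes events with large secondary peaks.

Steps:
1. Take the portion of the waveform after WINDOW_END.
2. Find the largest peak in that region.
3. If that peak exceeds `threshold`, exclude the event
   (plotting the waveform when verbose > 1).
4. Otherwise, return the waveform unchanged.
'''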

Comment on lines 35 to 36
after_window_wf = wf_data[collect_index(time, WINDOW_END) : len(wf_data)-1]
time_after = np.linspace(WINDOW_END, 2e6, num=len(after_window_wf), dtype=int) * bin_size # After WINDOW_END

Perhaps add a comment explaining these two lines; I understand that it's the waveform after the window, but it's a bit dense.
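
Something like the following would do (this is my reading of the two lines, so correct the comments if I've misread them):

# Slice out the waveform from WINDOW_END to the last sample
after_window_wf = wf_data[collect_index(time, WINDOW_END) : len(wf_data)-1]
# Matching time axis for that slice: points from WINDOW_END to 2e6, scaled by bin_size
time_after = np.linspace(WINDOW_END, 2e6, num=len(after_window_wf), dtype=int) * bin_size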

Comment on lines 40 to 49
if verbose > 1:
    plt.plot(time, wf_data)
    plt.xlabel('Time (ns)')
    plt.ylabel('ADCs')
    plt.yscale('log')
    plt.axhline(second_peak, 0, 2e5, c = 'r', ls = '--')
    plt.axvline(WINDOW_END, c = 'r', ls = '--')
    plt.title(f'Event {event_number} subtracted waveform')
    plt.show()
print(f'Event {event_number} excluded due to large secondary peak')

The docstring suggests that putting verbose = 0 will omit the print statement, but this isn't the case in the code.

I dislike the use of a verbosity argument, but there isn't a good alternative implemented within MULE at the moment (logging to be implemented in the future).
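
If the intention is for verbose = 0 to silence it, the print just needs to go behind the flag, e.g.:

if verbose > 0:
    print(f'Event {event_number} excluded due to large secondary peak')

(Or update the docstring to match, whichever behaviour you actually want.)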

return wf_data


def suppress_baseline(wf_data, threshold):

Same comment as above wrt the types.
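
i.e. something like this (the types are my guess, with the return type assumed from the `return sub_data` below):

def suppress_baseline(wf_data   : np.ndarray,
                      threshold : float) -> np.ndarray:
    ...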

# Return subtracted waveforms
return sub_data

def average_waveforms(files, bin_size, window_args, chunk_size = 5, negative=False, baseline_mode='median', verbose=1, peak_threshold=800):

Same as above wrt type checking.


def average_waveforms(files, bin_size, window_args, chunk_size = 5, negative=False, baseline_mode='median', verbose=1, peak_threshold=800):
'''
Averages waveforms

Expand please :) Since it's an encapsulating function (it does many things within it), you either need each function called within this one to have a docstring fully explaining its functionality, or a more extensive docstring here (preferably both).
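
As a starting point, something with roughly this shape (the details are my guesses from skimming the code, so please fix anything I've got wrong):

'''
Averages waveforms across a set of h5 files.

For each file in `files` this:
1. Loads the raw waveforms.
2. Processes them in chunks of `chunk_size`: baseline subtraction
   (controlled by `baseline_mode`) and rejection of events with large
   secondary peaks (controlled by `peak_threshold` and `window_args`).
3. Accumulates the surviving waveforms into a running average.

Returns the averaged waveform.
'''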

if os.path.exists(filepath):
print(f"Processing file: {filepath}")

x = io.load_rwf_info(filepath, samples=2)

It's preferable to load the data in lazily; you can use the appropriate functions (found here), and there's an example of them being used here.

It would also be good to then modify cook_data() to instead expect single waveforms.
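
For reference, the rough shape I have in mind (this is a generic h5py sketch with a made-up dataset key; the MULE io functions linked above are the ones to actually use):

import h5py

def iter_waveforms(filepath, dataset='RWF'):
    # 'RWF' is a placeholder key; yield one raw waveform at a time
    # instead of holding the whole file in memory
    with h5py.File(filepath, 'r') as f:
        for wf in f[dataset]:
            yield wf

# cook_data() could then be applied per waveform:
# for wf in iter_waveforms(filepath):
#     cooked = cook_data(wf, ...)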

Comment on lines +197 to +200
# Process the data in chunks to avoid memory overload, cooks data in chunks also
for start_idx in range(0, total_waveforms, chunk_size):
    end_idx = min(start_idx + chunk_size, total_waveforms)
    waveform_chunk = waveforms[start_idx:end_idx]

Lazy loading would avoid this roughness, although this may be a quicker method!

Comment on lines +1 to +19
[required]

files = ['run19.h5']

window_args = {'WINDOW_START'     : 4e2,
               'WINDOW_END'       : 3e4,
               'BASELINE_POINT_1' : 1e6,
               'BASELINE_POINT_2' : 1.5e6,
               'BASELINE_RANGE_1' : 40e3,
               'BASELINE_RANGE_2' : 40e3}

bin_size = 4
chunk_size = 5
negative = True
baseline_mode = 'median'
verbose = 1
peak_threshold = 1000

save_path = 'test.csv'

Would any of these be considered optional? You could make save_path optional if it wrote out to an h5 file, so the result would be stored within the same h5 from which it takes the data.
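
For example (just a sketch; the dataset name is made up and `avg_wf` stands for whatever average_waveforms returns):

import h5py

with h5py.File(files[0], 'a') as f:
    # append the result to the same file the raw data came from
    f.create_dataset('averaged_waveform', data=avg_wf)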
