
Conversation

@Tedsmith100

Includes waveform averaging code in the form of ana.py, and moves some functions around for cleanliness.

@jwaiton (Member) left a comment

Nice PR! It's a good function. I've mostly added documentation requests and some changes to the IO. Once those have been resolved, I'll move on to requesting tests (we can discuss these in person at some point 😸 ).

This file holds all the relevant functions for analysing data from h5 files.
"""

def remove_secondaries(threshold, wf_data, time, event_number, verbose, WINDOW_END, bin_size):

Best practice (one I want to implement somewhat retroactively) is to include type annotations for all functions.

In your case that would look like this (assuming your time and wf_data arrays are numpy-based):

def remove_secondaries(threshold    : int,
                       wf_data      : np.ndarray,
                       time         : np.ndarray,
                       event_number : int,
                       etc, etc) -> np.ndarray:
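
Written out for this particular function, that could look something like the following (the types for verbose, WINDOW_END and bin_size are guesses on my part, so adjust them to whatever they actually are):

def remove_secondaries(threshold    : int,
                       wf_data      : np.ndarray,
                       time         : np.ndarray,
                       event_number : int,
                       verbose      : int,
                       WINDOW_END   : float,
                       bin_size     : int) -> np.ndarray:
    ...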


Also, it's generally good practice to pass the waveform data in first.
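
For example (just the signature, keeping everything else in the same order):

def remove_secondaries(wf_data, time, threshold, event_number, verbose, WINDOW_END, bin_size):
    ...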


def remove_secondaries(threshold, wf_data, time, event_number, verbose, WINDOW_END, bin_size):
'''
Removes events with large secondary peaks

Give more context about what this function does. Step by step.
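
For instance, something step-by-step along these lines (this is just my rough reading of the function from the code below, so correct the details as needed):

'''
Removes events with large secondary peaks.

Steps:
1. Take the portion of the waveform after WINDOW_END.
2. Find the largest peak in that region.
3. If that peak exceeds `threshold`, exclude the event
   (plotting the waveform when verbose > 1).
4. Otherwise, return the waveform unchanged.
'''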

Comment on lines 35 to 36
after_window_wf = wf_data[collect_index(time, WINDOW_END) : len(wf_data)-1]
time_after = np.linspace(WINDOW_END, 2e6, num=len(after_window_wf), dtype=int) * bin_size # After WINDOW_END

Perhaps add a comment explaining these two lines; I understand that it's the waveform after the window, but it's a bit dense.
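
Something like the following would do (this is my reading of the two lines, so correct the comments if I've misread them):

# Slice out the waveform from WINDOW_END to the last sample
after_window_wf = wf_data[collect_index(time, WINDOW_END) : len(wf_data)-1]
# Matching time axis for that slice: points from WINDOW_END to 2e6, scaled by bin_size
time_after = np.linspace(WINDOW_END, 2e6, num=len(after_window_wf), dtype=int) * bin_size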

Comment on lines 40 to 49
if verbose > 1:
    plt.plot(time, wf_data)
    plt.xlabel('Time (ns)')
    plt.ylabel('ADCs')
    plt.yscale('log')
    plt.axhline(second_peak, 0, 2e5, c = 'r', ls = '--')
    plt.axvline(WINDOW_END, c = 'r', ls = '--')
    plt.title(f'Event {event_number} subtracted waveform')
    plt.show()
print(f'Event {event_number} excluded due to large secondary peak')

The docstring suggests that putting verbose = 0 will omit the print statement, but this isn't the case in the code.

I dislike the use of a verbosity argument, but there isn't a good alternative implemented within MULE at the moment (logging to be implemented in the future).
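
If the intention is for verbose = 0 to silence it, the print just needs to go behind the flag, e.g.:

if verbose > 0:
    print(f'Event {event_number} excluded due to large secondary peak')

(Or update the docstring to match, whichever behaviour you actually want.)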

return wf_data


def suppress_baseline(wf_data, threshold):

Same comment as above wrt the types.
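
i.e. something like this (the types are my guess, with the return type assumed from the `return sub_data` below):

def suppress_baseline(wf_data   : np.ndarray,
                      threshold : float) -> np.ndarray:
    ...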

# Return subtracted waveforms
return sub_data

def average_waveforms(files, bin_size, window_args, chunk_size = 5, negative=False, baseline_mode='median', verbose=1, peak_threshold=800):

Same as above wrt type checking.


def average_waveforms(files, bin_size, window_args, chunk_size = 5, negative=False, baseline_mode='median', verbose=1, peak_threshold=800):
'''
Averages waveforms

Expand please :) Since it's an encapsulating function (it does many things within it), you either need each function called within this one to have a docstring fully explaining its functionality, or a more extensive docstring here (preferably both).
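
As a starting point, something with roughly this shape (the details are my guesses from skimming the code, so please fix anything I've got wrong):

'''
Averages waveforms across a set of h5 files.

For each file in `files` this:
1. Loads the raw waveforms.
2. Processes them in chunks of `chunk_size`: baseline subtraction
   (controlled by `baseline_mode`) and rejection of events with large
   secondary peaks (controlled by `peak_threshold` and `window_args`).
3. Accumulates the surviving waveforms into a running average.

Returns the averaged waveform.
'''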

if os.path.exists(filepath):
print(f"Processing file: {filepath}")

x = io.load_rwf_info(filepath, samples=2)

It's preferable to load the data in lazily; you can use the appropriate functions (found here), and there's an example of them being used here.

It would also be good to then modify cook_data() to instead expect single waveforms.
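
For reference, the rough shape I have in mind (this is a generic h5py sketch with a made-up dataset key; the MULE io functions linked above are the ones to actually use):

import h5py

def iter_waveforms(filepath, dataset='RWF'):
    # 'RWF' is a placeholder key; yield one raw waveform at a time
    # instead of holding the whole file in memory
    with h5py.File(filepath, 'r') as f:
        for wf in f[dataset]:
            yield wf

# cook_data() could then be applied per waveform:
# for wf in iter_waveforms(filepath):
#     cooked = cook_data(wf, ...)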

Comment on lines +197 to +200
# Process the data in chunks to avoid memory overload, cooks data in chunks also
for start_idx in range(0, total_waveforms, chunk_size):
    end_idx = min(start_idx + chunk_size, total_waveforms)
    waveform_chunk = waveforms[start_idx:end_idx]

Lazy loading would avoid this roughness, although this may be a quicker method!

Comment on lines +1 to +19
[required]

files = ['run19.h5']

window_args = {'WINDOW_START'     : 4e2,
               'WINDOW_END'       : 3e4,
               'BASELINE_POINT_1' : 1e6,
               'BASELINE_POINT_2' : 1.5e6,
               'BASELINE_RANGE_1' : 40e3,
               'BASELINE_RANGE_2' : 40e3}

bin_size = 4
chunk_size = 5
negative = True
baseline_mode = 'median'
verbose = 1
peak_threshold = 1000

save_path = 'test.csv'

Would any of these be considered optional? You could make save_path optional if it wrote out to an h5 file, so the result would be stored within the same h5 from which it takes the data.
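
For example (just a sketch; the dataset name is made up and `avg_wf` stands for whatever average_waveforms returns):

import h5py

with h5py.File(files[0], 'a') as f:
    # append the result to the same file the raw data came from
    f.create_dataset('averaged_waveform', data=avg_wf)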
