-
Notifications
You must be signed in to change notification settings - Fork 5
Add analysis component #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
reader() needed to allow writing while also being read. writer() was not allowing 'Overwrite = False' to function correctly when fixed_size was False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR! It's a good function. I've added mostly some documentation requests and some changes to the IO. After those have been resolved, I'll move onto requesting tests (we can discuss these in person at some point 😸 )
packs/ana/analysis_utils.py
Outdated
| This file holds all the relevant functions for analysing data from h5 files. | ||
| """ | ||
|
|
||
| def remove_secondaries(threshold, wf_data, time, event_number, verbose, WINDOW_END, bin_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best practices (that I want to implement somewhat retroactively) is to include type-checking for all functions.
In your case that would look like (assuming your time and wf_data arrays are numpy based):
def remove_secondaries(threshold : int,
wf_data : np.ndarray,
time : np.ndarray,
event_number : int,
etc, etc) -> np.ndarray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, general good practice is to pass the waveform data in first.
packs/ana/analysis_utils.py
Outdated
|
|
||
| def remove_secondaries(threshold, wf_data, time, event_number, verbose, WINDOW_END, bin_size): | ||
| ''' | ||
| Removes events with large secondary peaks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give more context about what this function does. Step by step.
packs/ana/analysis_utils.py
Outdated
| after_window_wf = wf_data[collect_index(time, WINDOW_END) : len(wf_data)-1] | ||
| time_after = np.linspace(WINDOW_END, 2e6, num=len(after_window_wf), dtype=int) * bin_size # After WINDOW_END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps add a comment explaining these two lines, I understand that its the waveform after the window, but it is a bit dense.
packs/ana/analysis_utils.py
Outdated
| if verbose > 1: | ||
| plt.plot(time, wf_data) | ||
| plt.xlabel('Time (ns)') | ||
| plt.ylabel('ADCs') | ||
| plt.yscale('log') | ||
| plt.axhline(second_peak, 0, 2e5, c = 'r', ls = '--') | ||
| plt.axvline(WINDOW_END, c = 'r', ls = '--') | ||
| plt.title(f'Event {event_number} subtracted waveform') | ||
| plt.show() | ||
| print(f'Event {event_number} excluded due to large secondary peak') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring suggests that putting verbose = 0 will omit the print statement, but this isn't the case in the code.
I dislike the use of a verbosity argument, but there isn't a good alternative implemented within MULE at the moment (logging to be implemented in the future).
packs/ana/analysis_utils.py
Outdated
| return wf_data | ||
|
|
||
|
|
||
| def suppress_baseline(wf_data, threshold): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above wrt the types.
packs/ana/analysis_utils.py
Outdated
| # Return subtracted waveforms | ||
| return sub_data | ||
|
|
||
| def average_waveforms(files, bin_size, window_args, chunk_size = 5, negative=False, baseline_mode='median', verbose=1, peak_threshold=800): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above wrt type checking
packs/ana/analysis_utils.py
Outdated
|
|
||
| def average_waveforms(files, bin_size, window_args, chunk_size = 5, negative=False, baseline_mode='median', verbose=1, peak_threshold=800): | ||
| ''' | ||
| Averages waveforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expand please :) since its an encapsulating function (does many things within it), you either need each function within this one to have docstrings fully explaining their functionality, or a more extensive docstring here (preferably both).
| if os.path.exists(filepath): | ||
| print(f"Processing file: {filepath}") | ||
|
|
||
| x = io.load_rwf_info(filepath, samples=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Process the data in chunks to avoid memory overload, cooks data in chunks also | ||
| for start_idx in range(0, total_waveforms, chunk_size): | ||
| end_idx = min(start_idx + chunk_size, total_waveforms) | ||
| waveform_chunk = waveforms[start_idx:end_idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lazy loading would avoid this roughness, although this may be a quicker method!
| [required] | ||
|
|
||
| files = ['run19.h5'] | ||
|
|
||
| window_args = {'WINDOW_START' : 4e2, | ||
| 'WINDOW_END' : 3e4, | ||
| 'BASELINE_POINT_1' : 1e6, | ||
| 'BASELINE_POINT_2' : 1.5e6, | ||
| 'BASELINE_RANGE_1' : 40e3, | ||
| 'BASELINE_RANGE_2' : 40e3} | ||
|
|
||
| bin_size = 4 | ||
| chunk_size = 5 | ||
| negative = True | ||
| baseline_mode = 'median' | ||
| verbose = 1 | ||
| peak_threshold = 1000 | ||
|
|
||
| save_path = 'test.csv' No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would any of these be considered optional? You could set the save_path to be optional if it wrote out to a h5 file, and it would be stored within the same h5 from which it takes the data.
Includes waveform averaging code in the form of ana.py, moves some functions around for cleanliness.