Create processing_utils_lecroy and add first parser #45
Conversation
jwaiton left a comment:
Good-looking PR! It just needs a bit more documentation and some readability improvements in due time.
```python
"""

def parse_lecroy_segmented(lines):
    # Line 1 has to have: Segments,1000,SegmentSize,5002
```
Add documentation explaining what the function does, the input parameters, and the expected output. An example can be seen here.
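For illustration, a NumPy-style docstring for this function might look like the sketch below; the parameter and return descriptions are my assumptions about the parser's behaviour, so adjust them to match what the code actually does:

```python
def parse_lecroy_segmented(lines):
    """Parse a LeCroy segmented-acquisition CSV into dataframes.

    Parameters
    ----------
    lines : list of list of str
        The CSV rows, already split on commas. Line 1 is expected to
        contain ``Segments,<n>,SegmentSize,<m>``.

    Returns
    -------
    value_df : pandas.DataFrame
        One row per segment, one column per sample.
    header_df : pandas.DataFrame
        The per-segment header information.
    """
```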
```python
This file holds all the relevant functions for the processing of data from csv files to h5.
"""

def parse_lecroy_segmented(lines):
```
Best practice (which I want to implement somewhat retroactively) is to include type annotations for all functions.
In your case that would look like:

```python
def parse_lecroy_segmented(lines: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
```

if `lines` is a string; otherwise use the correct type. You'll have to import `Tuple` from `typing`:

```python
from typing import Tuple
```
```python
segments = int(lines[1][1])
seg_size = int(lines[1][3])
```
Perhaps replace

```python
segments = int(lines[1][1])
seg_size = int(lines[1][3])
```

with

```python
segments, seg_size = int(lines[1][1]), int(lines[1][3])
```

but this is picky, and perhaps a bit less readable. The choice is yours 🐱
```python
# Find the "Time,Ampl" line
for i, line in enumerate(lines):
    if line[0].strip() == "Time":
        data_start = i + 1
        break
```
Is the "Time,Ampl" line inconsistent within the data you use? For example, does it sometimes occur 5 lines in and other times 10 lines in?
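If the header position really does vary between files, scanning for it (as the PR already does) is the right call; a defensive variant might fail loudly when the header is missing. A sketch, not the PR's code:

```python
def find_data_start(lines):
    """Return the index of the first data row, i.e. the row right after
    the "Time,Ampl" header, or raise if no such header exists."""
    for i, line in enumerate(lines):
        if line[0].strip() == "Time":
            return i + 1
    raise ValueError('no "Time,Ampl" header line found')
```

Raising here means a malformed file fails immediately instead of the parser silently reading from the wrong offset.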
```python
for k in range(seg_size):
    x = j * seg_size + k
    if x >= len(raw_data):  # x = line in the file
        segment_data.append(None)
```
This means that a segment will be lost if it is a bit shorter than expected, right? Can you quantify how many events you lose this way, either through a test or otherwise?
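One way to quantify the loss without touching the parser is to count how many expected sample slots fall past the end of the raw data. This is a sketch; `segments`, `seg_size`, and `raw_data` stand in for the variables in the PR:

```python
def count_missing_samples(raw_data, segments, seg_size):
    """Number of sample slots that would be padded with None because
    the file ended early, i.e. indices j*seg_size + k >= len(raw_data)."""
    expected = segments * seg_size
    return max(0, expected - len(raw_data))
```

Dividing the result by `seg_size` (rounding up) then bounds how many whole segments are affected.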
```python
value_list.append(segment_data)

value_df = pd.DataFrame(value_list)
return value_df, header_df
```
A sensible test for this function would be to take a small input file (you can save it within the repository) and ensure the output for said file is as expected. On the second revision I'll think of a nicer way to format the main 'work loop' here.

On top of this, I'd perhaps specify more explicitly in the PR or the function definition what it does. It's harder to gauge whether the code is correct without a good understanding of the input/output 😸
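To make the suggestion concrete, here is a sketch of what such a test could look like. The parser below is a stand-in I reconstructed from the quoted fragments, not the PR's actual code, and the fixture is built in memory; the real test would import `parse_lecroy_segmented` from the new module and read the saved repository file instead:

```python
import csv
import io

import pandas as pd


def parse_lecroy_segmented(lines):
    """Stand-in sketch of the parser under test (NOT the PR's code):
    read Segments/SegmentSize from line 1, find the "Time,Ampl" header,
    then reshape the amplitudes into one row per segment."""
    segments = int(lines[1][1])
    seg_size = int(lines[1][3])
    for i, line in enumerate(lines):
        if line[0].strip() == "Time":
            data_start = i + 1
            break
    raw_data = [float(line[1]) for line in lines[data_start:]]
    value_list = []
    for j in range(segments):
        segment_data = []
        for k in range(seg_size):
            x = j * seg_size + k
            segment_data.append(raw_data[x] if x < len(raw_data) else None)
        value_list.append(segment_data)
    return pd.DataFrame(value_list)


def test_parse_lecroy_segmented_small_file():
    # Minimal LeCroy-style segmented CSV: 2 segments of 3 samples each.
    text = (
        "LECROYWR625Zi,Waveform\n"
        "Segments,2,SegmentSize,3\n"
        "Time,Ampl\n"
        "0.0,1.0\n0.1,1.1\n0.2,1.2\n"
        "0.0,2.0\n0.1,2.1\n0.2,2.2\n"
    )
    lines = list(csv.reader(io.StringIO(text)))
    value_df = parse_lecroy_segmented(lines)
    assert value_df.shape == (2, 3)   # one row per segment
    assert value_df.iloc[1, 0] == 2.0  # first sample of second segment
```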
Add processing_utils_lecroy.py to the packages used within proc.py, and add the first parser needed for the processing.