Skip to content

Conversation

@MattZur
Copy link
Collaborator

@MattZur MattZur commented Oct 10, 2025

add processing_utils_lecroy.py to the packs to be used within proc.py and add the first parser needed for the processing

@MattZur MattZur requested a review from jwaiton October 10, 2025 15:33
Copy link
Member

@jwaiton jwaiton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good looking PR! Just needs a bit more documentation and some alterations to the readability in due time.

"""

def parse_lecroy_segmented(lines):
# Line 1 has to have: Segments,1000,SegmentSize,5002
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add documentation explaining what the function does, the input parameters and the expected output.

An example can be seen here

This file holds all the relevant functions for the processing of data from csv files to h5.
"""

def parse_lecroy_segmented(lines):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best practices (that I want to implement somewhat retroactively) is to include type-checking for all functions.

In your case that would look like:

def parse_lecroy_segmented(lines  :  str) --> Tuple(pd.DataFrame, pd.DataFrame):

If lines is a string, otherwise use the correct type

You'll have to import Typing for tuples:
from typing import Tuple

Comment on lines +16 to +17
segments = int(lines[1][1])
seg_size = int(lines[1][3])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps:

Suggested change
segments = int(lines[1][1])
seg_size = int(lines[1][3])
segments, seg_size = int(lines[1][1]), int(lines[1][3])

but this is picky, and perhaps a bit more unreadable. The choice is yours 🐱

Comment on lines +27 to +31
# Find the "Time,Ampl" line
for i, line in enumerate(lines):
if line[0].strip() == "Time":
data_start = i + 1
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "Time,Ampl" line inconsistent within the data you use? Like, does it sometimes occur 5 lines in, and other times 10 lines?

for k in range(seg_size):
x = j * seg_size + k
if x >= len(raw_data): # x = line in the file
segment_data.append(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that a segment will be lost if it is a bit shorter than expected right? Can you quantify how many events you lose this way? Either through a test or otherwise.

value_list.append(segment_data)

value_df = pd.DataFrame(value_list)
return value_df, header_df
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A sensible test for this function would be to take a small input file (you can save it within the repository) and ensure the output of said file is as expected. On the second revision I'll think of a nicer way to format the main 'work loop' here.

@jwaiton
Copy link
Member

jwaiton commented Oct 14, 2025

On top of this, I'd perhaps specify more explicitly in the PR or the function definition what it does. It's harder to gauge if the code is correct without having a good understanding of the input/output 😸

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants