Skip to content

Create ALARA Post-Processor#153

Merged
gonuke merged 6 commits intosvalinn:mainfrom
eitan-weinstein:post-processing-ops
Nov 11, 2025
Merged

Create ALARA Post-Processor#153
gonuke merged 6 commits intosvalinn:mainfrom
eitan-weinstein:post-processing-ops

Conversation

@eitan-weinstein
Copy link
Contributor

Closes #150.

This PR changes the current Python script tools/alara_output_parser.py to being a more robust postprocessor, tools/alara_output_processing.py. This is done by joining some of the pure data processing methods from tools/ALARAJOYWrapper/alarajoy_QA.py that have been merged already and from #144.

To combine these two scripts, I also rewrote alara_output_processing.py as object-oriented, per our discussion yesterday (although we hadn't come to a firm conclusion on whether or not it would be necessary). I created two classes TableParser and DataProcessing to further separate concerns between the actual reading of the ALARA output data and then the handling of said data from the former QA methods.

Currently, the __main__ method only encapsulates the parsing functionality, more or less as it did before, with my thinking that that is something that can be valuable as a standalone usage in converting the output tables to CSVs, whereas the actual data processing would be best used within other applications, such as a new QA Jupyter notebook that I am going to work on in upcoming PRs or in other postprocessing applications that the user wants to create.

I'll note that this is my first time converting a program to an object-oriented paradigm, so it's possible that I've missed some programmatic conventions that I am unaware of, but I think in principle, this change is worthwhile to create this combined module.

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this change and thanks for introducing the object oriented approach - it will make future use of this all much cleaner!!!

I've made a few design suggestions based on these changes. Happy to discuss more if you have concerns.

import argparse
from io import StringIO

class TableParser:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably more of a FileParser

# ---------- Utility and Helper Methods ----------

@staticmethod
def normalize_header(header_line: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want all of these "internal" methods to start with a "_" to indicate that they are intended for internal use only

key = f'{current_parameter} - {current_block}'
self.results[key] = df

def parse_output(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can call this extract_tables()

}

@staticmethod
def process_time_vals(df, seconds=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that we could make a new class that is derived from DataFrame and make some of these methods be applied to those objects.

class ALARADFrame(DataFrame):

Then your parser could create ALARADFrame objects and we could call

times = adf.process_time_vals()

without passing the DataFrame.

I saw this with the various methods (and potentially more in the future?) that all take a DataFrame as their first argument.

other_row[df.columns[0]] = 'Other'
other_row[cols] = df.loc[small_mask, cols].sum()

mask_df = df.loc[~small_mask].reset_index(drop=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change the df and also copy it to mask_df? That is, will df be changed by this method still?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not change the df, it operates on a copy.

dfs (list of dicts): List of dictionaries containing ALARA output
DataFrames and their metadata, of the form:
df_dict = {
'Data Source' : (Either 'fendl2' or 'fendl3'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to generalize this away from the notion of Data Source. I think this method still makes sense for comparing data that comes from different runs of ALARA for whatever reason, but they may not be because of Data Source

Comment on lines +364 to +370
if isinstance(data_source, pd.DataFrame):
dfs.update(
cls.make_entry(
inp_datalib, inp_variable, inp_unit, data_source
)
)
return dfs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we just have users call make_entry directly in these cases, and only use this method for processing multiple output files? Then we could have one method called make_entry and one called make_entries? Or something like that...

filename = self.sanitize_filename(key) + '.csv'
df.to_csv(filename, index=False)

class DataProcessing:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we separate out the ALARADFrame to its own class, then maybe this one can be called DataLibrary?

@gonuke
Copy link
Member

gonuke commented Nov 10, 2025

Did you mean for this PR to now also include some changes to ALARAjoy_QA?

@eitan-weinstein
Copy link
Contributor Author

Did you mean for this PR to now also include some changes to ALARAjoy_QA?

Oh no, that was accidental. I am working on another PR for the implementation of the changes in alara_output_processing and unintentionally included them here. Thanks for catching that.

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last little change for future readability

Comment on lines +366 to +368
parser = FileParser(args().filepath[0])
parser.extract_tables()
parser.write_csv_files()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion on naming to make it easier to understand what's happening

Suggested change
parser = FileParser(args().filepath[0])
parser.extract_tables()
parser.write_csv_files()
alara_data = FileParser(args().filepath[0])
alara_data.extract_tables()
alara_data.write_csv_files()

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks @eitan-weinstein

@gonuke gonuke merged commit cffafb7 into svalinn:main Nov 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a python module with many single post-processing operations

2 participants