Conversation
gonuke
left a comment
I really like this change and thanks for introducing the object-oriented approach - it will make future use of this all much cleaner!!!
I've made a few design suggestions based on these changes. Happy to discuss more if you have concerns.
tools/alara_output_processing.py
import argparse
from io import StringIO

class TableParser:
This is probably more of a FileParser
tools/alara_output_processing.py
# ---------- Utility and Helper Methods ----------

@staticmethod
def normalize_header(header_line: str):
Might want all of these "internal" methods to start with a "_" to indicate that they are intended for internal use only
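The underscore convention suggested above could look like this; the class context and the body of `_normalize_header` are illustrative guesses, not taken from the PR:

```python
class FileParser:
    # ---------- Utility and Helper Methods ----------

    @staticmethod
    def _normalize_header(header_line: str):
        # The leading underscore marks this as internal-only by convention;
        # the normalization shown here (trim + lowercase) is a placeholder.
        return header_line.strip().lower()
```

Callers outside the class would then treat `_normalize_header` as off-limits, and linters and wildcard imports also respect the non-public marker.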
tools/alara_output_processing.py
key = f'{current_parameter} - {current_block}'
self.results[key] = df

def parse_output(self):
Maybe we can call this extract_tables()
tools/alara_output_processing.py
}

@staticmethod
def process_time_vals(df, seconds=True):
It occurs to me that we could make a new class that is derived from DataFrame and make some of these methods be applied to those objects.
class ALARADFrame(DataFrame):
Then your parser could create ALARADFrame objects and we could call
times = adf.process_time_vals()
without passing the DataFrame.
I saw this with the various methods (and potentially more in the future?) that all take a DataFrame as their first argument.
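A minimal sketch of the suggested `ALARADFrame` subclass, using pandas' standard `_constructor` subclassing hook; the `time` column and the hours-to-seconds conversion are illustrative assumptions, not from the PR:

```python
import pandas as pd

class ALARADFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls this whenever an operation returns a new frame,
        # so slices and copies stay ALARADFrame instead of DataFrame.
        return ALARADFrame

    def process_time_vals(self, seconds=True):
        # Hypothetical body: assume a 'time' column stored in hours.
        factor = 3600 if seconds else 1
        return self['time'] * factor

adf = ALARADFrame({'time': [1, 2], 'value': [0.1, 0.2]})
times = adf.process_time_vals()  # no DataFrame argument needed
```

The parser would then build `ALARADFrame` objects instead of plain `DataFrame`s, and all of the per-frame processing methods drop their `df` first argument.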
tools/alara_output_processing.py
other_row[df.columns[0]] = 'Other'
other_row[cols] = df.loc[small_mask, cols].sum()

mask_df = df.loc[~small_mask].reset_index(drop=True)
Does this change the df and also copy it to mask_df? That is, will df be changed by this method still?
It does not change the df, it operates on a copy.
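A quick check confirming the answer above: boolean `.loc` indexing followed by `reset_index(drop=True)` produces a new object, so mutating the result leaves the original frame untouched (the column names here are made up for the demo):

```python
import pandas as pd

df = pd.DataFrame({'nuclide': ['H-3', 'Co-60', 'Fe-55'],
                   'activity': [1.0, 0.01, 5.0]})
small_mask = df['activity'] < 0.1

# .loc with a boolean mask returns a copy, not a view of df.
mask_df = df.loc[~small_mask].reset_index(drop=True)
mask_df.loc[0, 'activity'] = 99.0  # mutate the copy only

print(df.loc[0, 'activity'])  # original value is preserved
```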
tools/alara_output_processing.py
dfs (list of dicts): List of dictionaries containing ALARA output
    DataFrames and their metadata, of the form:
    df_dict = {
        'Data Source' : (Either 'fendl2' or 'fendl3'),
We probably want to generalize this away from the notion of Data Source. I think this method still makes sense for comparing data that comes from different runs of ALARA for whatever reason, but those differences may not be due to Data Source.
tools/alara_output_processing.py
if isinstance(data_source, pd.DataFrame):
    dfs.update(
        cls.make_entry(
            inp_datalib, inp_variable, inp_unit, data_source
        )
    )
return dfs
Maybe we just have users call make_entry directly in these cases, and only use this method for processing multiple output files? Then we could have one method called make_entry and one called make_entries? Or something like that...
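One way the suggested split could look; the key format and argument names are hypothetical, not taken from the PR:

```python
class DataProcessing:
    @classmethod
    def make_entry(cls, datalib, variable, unit, df):
        # Single-frame form: one dictionary entry keyed by its metadata.
        return {f'{datalib} - {variable} [{unit}]': df}

    @classmethod
    def make_entries(cls, sources):
        # Batch form for multiple output files: merge the single entries
        # produced from an iterable of (datalib, variable, unit, df) tuples.
        dfs = {}
        for datalib, variable, unit, df in sources:
            dfs.update(cls.make_entry(datalib, variable, unit, df))
        return dfs
```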
tools/alara_output_processing.py
filename = self.sanitize_filename(key) + '.csv'
df.to_csv(filename, index=False)

class DataProcessing:
If we separate out the ALARADFrame to its own class, then maybe this one can be called DataLibrary?
Did you mean for this PR to now also include some changes to …?
Oh no, that was accidental. I am working on another PR for the implementation of the changes in ….
gonuke
left a comment
One last little change for future readability
tools/alara_output_processing.py
parser = FileParser(args().filepath[0])
parser.extract_tables()
parser.write_csv_files()
Minor suggestion on naming to make it easier to understand what's happening
- parser = FileParser(args().filepath[0])
- parser.extract_tables()
- parser.write_csv_files()
+ alara_data = FileParser(args().filepath[0])
+ alara_data.extract_tables()
+ alara_data.write_csv_files()
gonuke
left a comment
LGTM - thanks @eitan-weinstein
Closes #150.
This PR changes the current Python script tools/alara_output_parser.py into a more robust postprocessor, tools/alara_output_processing.py. This is done by joining some of the pure data processing methods from tools/ALARAJOYWrapper/alarajoy_QA.py that have already been merged, and from #144.

To combine these two scripts, I also rewrote alara_output_processing.py as object-oriented, per our discussion yesterday (although we hadn't come to a firm conclusion on whether or not it would be necessary). I created two classes, TableParser and DataProcessing, to further separate concerns between the actual reading of the ALARA output data and the handling of that data by the former QA methods.

Currently, the __main__ method only encapsulates the parsing functionality, more or less as it did before. My thinking is that this is valuable as a standalone usage for converting the output tables to CSVs, whereas the actual data processing would be best used within other applications, such as a new QA Jupyter notebook that I am going to work on in upcoming PRs, or in other postprocessing applications that the user wants to create.

I'll note that this is my first time converting a program to an object-oriented paradigm, so it's possible that I've missed some programmatic conventions that I am unaware of, but I think in principle this change is worthwhile to create this combined module.