Updated Comparison Tool Parts 1-3: Refactored for New ALARA Output Parser #154
gonuke merged 11 commits into svalinn:main from
Does this need a rebase for the |
gonuke left a comment
Just reviewed things in the QA script here while we wait for a rebase.
tools/ALARAJOYWrapper/alarajoy_QA.py
Outdated
| else: | ||
| element, A = isotope.split('-') | ||
| element = element.capitalize() | ||
| return f'$^{{{A}}}\\mathrm{{{element}}}$' |
This might be simpler and equivalent since mathrm is basically the same as non-math
| return f'$^{{{A}}}\\mathrm{{{element}}}$' | |
| return f'$^{{{A}}}${element}' |
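For reference, a minimal sketch of the simplified label formatter. The function name `format_isotope_label` and the sample inputs are assumptions for illustration; only the `else` branch is visible in the diff, so the sketch covers just that branch.

```python
def format_isotope_label(isotope):
    """Format an 'element-A' string (e.g. 'fe-56') as a superscript label."""
    element, A = isotope.split('-')
    element = element.capitalize()
    # Only the mass number sits inside math mode; the element symbol
    # is left as ordinary text outside the $...$ delimiters.
    return f'$^{{{A}}}${element}'


print(format_isotope_label('fe-56'))
```

Keeping the element symbol outside math mode means it follows the surrounding text font instead of relying on `\mathrm`.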
tools/ALARAJOYWrapper/alarajoy_QA.py
Outdated
| # Preprocess data to user specifications | ||
| all_labels = set() | ||
| all_data = [] | ||
| for df_dict in df_dicts: | ||
| adf = df_dict[data_key] | ||
| times = adf.process_time_vals(seconds=seconds) | ||
| adf = adf.T | ||
| for col in adf.columns: | ||
| label_text = f"{adf[col].iloc[0]}" | ||
| all_labels.add(label_text) | ||
|
|
||
| all_data.append((df_dict, adf, times)) | ||
|
|
||
| labels_sorted = sorted(all_labels) | ||
|
|
||
| cmap = plt.cm.get_cmap('Dark2') | ||
| color_map = {lbl: cmap(i % cmap.N) for i, lbl in enumerate(labels_sorted)} |
This might be good in a function of its own
tools/ALARAJOYWrapper/alarajoy_QA.py
Outdated
| color = color_map[label_text] | ||
| label = label_text | ||
| if data_comp: | ||
| label = f"{label_text} ({df_dict['Run Label']})" | ||
|
|
||
| ax.plot( | ||
| times, | ||
| list(adf[col])[1:], | ||
| label=label, | ||
| color=color, | ||
| linestyle=linestyle, | ||
| ) |
| color = color_map[label_text] | |
| label = label_text | |
| if data_comp: | |
| label = f"{label_text} ({df_dict['Run Label']})" | |
| ax.plot( | |
| times, | |
| list(adf[col])[1:], | |
| label=label, | |
| color=color, | |
| linestyle=linestyle, | |
| ) | |
| label_suffix = "" | |
| if data_comp: | |
| label_suffix = f" ({df_dict['Run Label']})" | |
| ax.plot( | |
| times, | |
| list(adf[col])[1:], | |
| label=label_text + label_suffix, | |
| color=color_map[label_text], | |
| linestyle=linestyle, | |
| ) |
tools/ALARAJOYWrapper/alarajoy_QA.py
Outdated
|
|
||
| ax.set_title(title_prefix + title_suffix) | ||
| if not relative: | ||
| ax.set_ylabel(f"{df_dict['Variable']} [{df_dict['Unit']}]") |
What's the ylabel if it is relative?
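One way to make both cases explicit is a small helper that always returns a label. This is only a sketch: the name `make_ylabel` and the wording of the relative label are assumptions, not something taken from the PR.

```python
def make_ylabel(variable, unit, relative=False):
    """Build a y-axis label for either absolute or relative data."""
    if relative:
        # Assumed wording: relative data is a unitless fraction, so the
        # unit is dropped instead of leaving the axis unlabeled.
        return f'Relative {variable}'
    return f'{variable} [{unit}]'


print(make_ylabel('Number Density', 'atoms/kg'))
print(make_ylabel('Number Density', 'atoms/kg', relative=True))
```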
8fc31b0 to 6e00665
gonuke left a comment
Thanks for all the work on this. I have a lot of high-level thoughts about the data model, and maybe they'll evolve over the development of this capability.
tools/ALARAJOYWrapper/pyalara.py
Outdated
| import subprocess | ||
| from string import Template | ||
| from pathlib import Path | ||
| import matplotlib.pyplot as plt |
| adf = df_dict[data_key] | ||
| times = adf.process_time_vals(seconds=seconds) | ||
| adf = adf.T | ||
| for col in adf.columns: |
Now that this is transposed, each column is a different nuclide, right? (and possibly a total)
Maybe we can note that:
| for col in adf.columns: | |
| for nuc in adf.columns: |
If we didn't transpose first, could we just perform this operation on all the entries in the first column? The transpose is not a necessary step yet (although it may make life easier for plotting)
| (Defaults to True) | ||
|
|
||
| Returns: | ||
| all_data (list of tuples): A list of all relevant data for each table |
What if, instead of returning a list of tuples, this method added a new entry for the times to the df_dict for each entry in df_dicts, and collected the labels to make the color_map?
Notwithstanding the discussion about putting some of the metadata into the tables, I like the idea of having the df_dicts cache the processed metadata rather than making new data structures that may be less transparent.
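A rough sketch of that idea, using plain dictionaries as stand-ins for the real df_dicts. The function name, the `'Times'`, `'Raw Times'`, and `'Labels'` keys, and the `process_times` callable are all hypothetical; in the PR the times come from `adf.process_time_vals(seconds=seconds)`.

```python
def cache_times_and_collect_labels(df_dicts, process_times):
    """Cache processed times on each df_dict and return all labels seen."""
    all_labels = set()
    for df_dict in df_dicts:
        # Store the processed times next to the data they describe,
        # instead of returning them in a separate tuple.
        df_dict['Times'] = process_times(df_dict['Raw Times'])
        all_labels.update(df_dict['Labels'])
    return all_labels


runs = [
    {'Run Label': 'run_a', 'Raw Times': ['0', '60'], 'Labels': ['h-3', 'fe-55']},
    {'Run Label': 'run_b', 'Raw Times': ['0', '3600'], 'Labels': ['fe-55', 'co-60']},
]
labels = cache_times_and_collect_labels(runs, lambda raw: [float(t) for t in raw])
print(sorted(labels))
print(runs[0]['Times'])
```

The plotting code can then read `df_dict['Times']` directly, and the returned label set is all that is needed to build the color map.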
| # Plot data | ||
| for i, (df_dict, adf, times) in enumerate(all_data): | ||
| linestyle = line_styles[i % len(line_styles)] | ||
| for col in adf.columns: |
These columns represent different nuclides
| for col in adf.columns: | |
| for nuc in adf.columns: |
1c1f99b to cd6e1d3
1bad73f to 37c67e8
It looks like you tried a rebase here, but all the changes from the previous PR are still here.
37c67e8 to 1867789
I just rebased again to |
gonuke left a comment
This looks pretty good.
Mostly minor comments here.
| if head: | ||
| sort_by_time = aop.extract_time_vals([sort_by_time])[0] | ||
| piv = piv.sort_values(sort_by_time, ascending=False).head(head) |
- why does this sorting only happen if head?
- the sorting would be a good standalone function, as I anticipate a user wanting to sort a dataframe, and although it's only a couple of lines, it's not super obvious how
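A sketch of what a standalone sort could look like, with truncation made optional. To stay dependency-free it uses a nested dict (nuclide to time to value) as a stand-in for the pivoted DataFrame; the function name `sort_nuclides` and the sample data are assumptions.

```python
def sort_nuclides(table, sort_time, head=None):
    """Return nuclide names ordered by their value at sort_time, largest first."""
    ordered = sorted(table, key=lambda nuc: table[nuc][sort_time], reverse=True)
    # Truncation is optional; the ordering itself does not depend on head.
    return ordered if head is None else ordered[:head]


table = {
    'h-3': {0.0: 1.0, 60.0: 0.5},
    'fe-55': {0.0: 3.0, 60.0: 2.0},
    'co-60': {0.0: 2.0, 60.0: 2.5},
}
print(sort_nuclides(table, 60.0))
print(sort_nuclides(table, 60.0, head=2))
```

With a real DataFrame the same split would apply: sort unconditionally, then call `.head(head)` only when `head` is given.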
| for nuc in piv.index: | ||
| all_nucs.add(nuc) |
| for nuc in piv.index: | |
| all_nucs.add(nuc) | |
| all_nucs.add(set(piv.index)) |
| all_nucs.add(nuc) | ||
|
|
||
| nucs_sorted = sorted(all_nucs) | ||
| cmap = plt.cm.get_cmap('Dark2') |
Maybe add a cmap as an input to this function, with default of 'Dark2'?
| nucs_sorted = sorted(all_nucs) | ||
| cmap = plt.cm.get_cmap('Dark2') | ||
|
|
||
| return {lbl: cmap(i % cmap.N) for i, lbl in enumerate(nucs_sorted)} |
Might not need nucs_sorted if it's only used once
| return {lbl: cmap(i % cmap.N) for i, lbl in enumerate(nucs_sorted)} | |
| return {lbl: cmap(i % cmap.N) for i, lbl in enumerate(sorted(all_nucs))} |
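Putting the suggestions on this function together, a dependency-free sketch might look like the following. A plain sequence of colors stands in for the matplotlib colormap, and the hex values are assumed to approximate the ColorBrewer Dark2 palette; with matplotlib the parameter would instead be a colormap name resolved through `plt.get_cmap` or `matplotlib.colormaps`. The `palette` parameter and the `label_groups` argument name are illustrative.

```python
# Assumed stand-in for the 8-color Dark2 palette.
DARK2 = ('#1b9e77', '#d95f02', '#7570b3', '#e7298a',
         '#66a61e', '#e6ab02', '#a6761d', '#666666')


def build_color_map(label_groups, palette=DARK2):
    """Assign one color per unique label across several groups of labels."""
    all_labels = set()
    for labels in label_groups:
        all_labels.update(labels)
    # Sorting inline keeps the label-to-color assignment deterministic;
    # the modulo wraps around when there are more labels than colors.
    return {lbl: palette[i % len(palette)]
            for i, lbl in enumerate(sorted(all_labels))}


color_map = build_color_map([['h-3', 'fe-55'], ['fe-55', 'co-60']])
print(color_map)
```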
| else: | ||
| element, A = isotope.split('-') | ||
| element = element.capitalize() | ||
| return f'$^{{{A}}}${element}' |
| for run_lbl, times, filtered, piv, linestyle in data_list: | ||
|
|
| for run_lbl, times, filtered, piv, linestyle in data_list: | |
| for run_lbl, times, filtered, piv, linestyle in data_list: |
| ) | ||
| data_list.append((run_lbl, times, filtered, piv, linestyle)) | ||
|
|
||
| color_map = build_color_map([piv for (_, _, _, piv, _) in data_list]) |
| color_map = build_color_map([piv for (_, _, _, piv, _) in data_list]) | |
| color_map = build_color_map([data[3] for data in data_list]) |
| sorted(times), | ||
| piv.loc[nuc].tolist(), |
Sorting the times separately from the values seems fragile
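One less fragile option is to sort the times and values as pairs, so each value stays attached to its time. A minimal sketch; the helper name `sort_series_by_time` and the sample numbers are assumptions.

```python
def sort_series_by_time(times, values):
    """Sort times and values together so each value stays with its time."""
    if len(times) != len(values):
        raise ValueError('times and values must have the same length')
    pairs = sorted(zip(times, values), key=lambda pair: pair[0])
    sorted_times = [t for t, _ in pairs]
    sorted_values = [v for _, v in pairs]
    return sorted_times, sorted_values


times, values = sort_series_by_time([100.0, 1.0, 10.0], [0.3, 0.9, 0.6])
print(times, values)
```

If only `times` were sorted, the first plotted point here would pair time 1.0 with value 0.3, which belongs to time 100.0.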
|
|
||
| if data_comp: | ||
| title_prefix = ( | ||
| f'{run_lbls[0]}, {run_lbls[1]} Comparison: \n' |
Do you want to support more than 2?
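If more than two runs should be supported, joining the labels avoids hard-coding the indices. A sketch, assuming `run_lbls` is a list of strings; the helper name is hypothetical.

```python
def comparison_title_prefix(run_lbls):
    """Build the comparison title prefix for any number of run labels."""
    # For two labels this matches the indexed form 'A, B Comparison: \n'.
    return f"{', '.join(run_lbls)} Comparison: \n"


print(comparison_title_prefix(['run_a', 'run_b', 'run_c']))
```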
gonuke left a comment
LGTM - thanks @eitan-weinstein
Closes #168.
Closes #169.
This PR implements the changes made to tools/alara_output_parser.py for alarajoy_QA, as of its most up-to-date version, as well as the newest plotting suggestions from #144. This PR partially completes #151; however, I will also be adjusting PRs #145, #146, and #147 to bring them up to date with the new methods developed in #153.