Comparison Tool 4.5: Grid of Pie Charts across Cooling Times#179
Comparison Tool 4.5: Grid of Pie Charts across Cooling Times#179gonuke merged 25 commits intosvalinn:mainfrom
Conversation
gonuke
left a comment
There was a problem hiding this comment.
Nicely formatted and put together here... I have a number of suggestions to
- help separate concerns a little and
- take advantage of dataframes without creating so many new data structures
| # Build and populate table | ||
| table_ax = fig.add_subplot(gs[0,:]) | ||
| table_ax.axis('off') | ||
| table_data = populate_table_rows(agg, all_nucs, times) |
There was a problem hiding this comment.
Is there a reason this can't just be a pivoted dataframe, instead of the list of lists that you generate? You could then use pandas.plotting.table() perhaps?
There was a problem hiding this comment.
No reason other that I just didn't know about pandas.plotting.table() until now!
There was a problem hiding this comment.
Full disclosure to reveal my through process: I didn't know about it either. I just had a hunch that once we had a dataframe in the right shape (which pivoting does), it should be possible to sort it easily and then guessed that it was possible to generate a "pretty" version in output, so went hunting/googling for that....
There was a problem hiding this comment.
I just updated the PR with a bunch of changes, and after having messed around with the pandas.plotting.table() vs the matplotlib.pyplot.table(), I actually think that using the matplotlib version will work better because it allows for custom row and column labels, whereas the pandas version demanded the labels be the same as those from the input DataFrame, which would add unnecessary steps to arrive at the same end.
| row_labels = [reformat_isotope(nuc) for nuc in all_nucs] | ||
|
|
||
| table_data_float = np.array([ | ||
| [float(cell.strip('%')) for cell in row] for row in table_data |
There was a problem hiding this comment.
It looks like you add the "%" when you process the data only to remove it here. Why not just put the float data in the table in the first place?
| @@ -87,7 +90,7 @@ def preprocess_data( | |||
|
|
|||
| return times, filtered, piv | |||
There was a problem hiding this comment.
In hindsight, it's not clear to me why return times since those times will be available in filtered (and piv). From a data integrity point of view, it always seems safest to extract the times from those dataframes rather than pass them around as a separate data object
|
|
||
| return times, agg | ||
|
|
||
| def populate_table_rows(agg, all_nucs, times): |
There was a problem hiding this comment.
see comment below, but it's not clear why you need this method when you could just use a dataframe rather than a list of lists?
| # Create table gridspec | ||
| left = 0.05 | ||
| top = 0.95 | ||
| gs = fig.add_gridspec( | ||
| nrows + 1, ncols, | ||
| height_ratios=([1.0] + [1]*nrows), | ||
| hspace=0.4, | ||
| wspace=0.3, | ||
| left=left, right=(1-left), top=top, bottom=(1-top) | ||
| ) |
There was a problem hiding this comment.
Separation of concerns... I think the GridSpec should be generated at a different level, and then the specific table_ax passed into this method to populate it.
| None | ||
| ''' | ||
|
|
||
| times, agg = pie_chart_aggregation( |
There was a problem hiding this comment.
separation of concerns: could we have the user do this aggregation and then pass the result into the method to generate the pie chart?
| # Color in index | ||
| table[(i+1, -1)].set_facecolor(rgba) | ||
|
|
||
| # Color all other columns | ||
| for j in range(len(times)): | ||
| table[(i+1, j)].set_facecolor(rgba) |
There was a problem hiding this comment.
These can be combined into a single loop
| # Color in index | |
| table[(i+1, -1)].set_facecolor(rgba) | |
| # Color all other columns | |
| for j in range(len(times)): | |
| table[(i+1, j)].set_facecolor(rgba) | |
| # Color all columns in this row | |
| for j in range(len(times)+1): | |
| table[(i+1, j-1)].set_facecolor(rgba) |
| table.scale(1.0, 2.0) | ||
|
|
||
| # Format table for readability | ||
| for i in range(len(row_labels)): |
There was a problem hiding this comment.
prefer row and col vs i and j... or at least r and c
| if np.any(fractions == 1.0): | ||
| for nuc, frac in zip(time_slice['nuclide'], fractions): | ||
| labels.append(reformat_isotope(nuc) if frac == 1.0 else '') | ||
| return labels |
There was a problem hiding this comment.
Isn't this edge case already covered by the case below, as long as you handle Other more carefully? Maybe the condition is:
if frac >= threshold or (nuc == 'Other' && frac > 0):
gonuke
left a comment
There was a problem hiding this comment.
Just a few more things to clarify...
|
|
||
| return {lbl: cmap(i % cmap.N) for i, lbl in enumerate(sorted(all_nucs))} | ||
|
|
||
| def build_color_map_from_nucs(all_nucs, cmap_name): |
There was a problem hiding this comment.
It's not obvious how this is substantially different from build_color_map_from_pivs?
| None | ||
| ''' | ||
|
|
||
| table_ax = fig.add_subplot(gs[0,:]) |
There was a problem hiding this comment.
I think it makes sense to do this before calling this method, and pass in table_ax. Then it's possible for someone to decide where in the gridspec if goes before calling this
| plt.tight_layout(rect=[0, 0, 0.85, 1]) | ||
| plt.show() | ||
|
No newline at end of file |
||
| plt.close(fig) |
There was a problem hiding this comment.
Do we need to close the figure? Is it a scope issue... because we created the figure here in this function?
| if np.array(all_nucs).size == 0: | ||
| if pivs: | ||
| all_nucs = set() | ||
| for piv in pivs: | ||
| all_nucs.update(piv.index) | ||
|
|
||
| else: | ||
| raise ValueError( | ||
| 'Must input either all_nucs or pivs.' | ||
| ) |
There was a problem hiding this comment.
| if np.array(all_nucs).size == 0: | |
| if pivs: | |
| all_nucs = set() | |
| for piv in pivs: | |
| all_nucs.update(piv.index) | |
| else: | |
| raise ValueError( | |
| 'Must input either all_nucs or pivs.' | |
| ) | |
| all_nucs = set(all_nucs) | |
| if len(all_nucs) == 0: | |
| for piv in pivs: | |
| all_nucs.update(piv.index) | |
| if len(all_nucs) == 0: | |
| raise ValueError( | |
| 'Must input either all_nucs or pivs.' | |
| ) |
| times = sorted(agg['time'].unique()) | ||
| N = len(times) | ||
| nrows = int(np.ceil(N / ncols)) | ||
| fig = plt.figure(figsize=(5 * ncols + 3, 5 * (nrows + 0.7))) |
There was a problem hiding this comment.
What are these magic numbers 5, 3, 0.7?
There was a problem hiding this comment.
Simply formatting values that made the plot come out looking nice. Would you prefer they be optional parameters that the user can assign?
There was a problem hiding this comment.
or give them some name that evokes some kind of meaning...
padding = 5
vspace = 3
hspace = 4.5
|
|
||
| return wedges | ||
|
|
||
| def open_with_new_manager(fig): |
9ba9ddd to
1bea9d2
Compare
gonuke
left a comment
There was a problem hiding this comment.
LGTM - thanks @eitan-weinstein
Closes #174.
Expanding on pie chart functionality added in #145 to create a grid of pie charts for all cooling times for a given run and response variable. An example of what one of these grids looks like is attached: