Skip to content

Spells#567

Merged
remicousin merged 4 commits intomasterfrom
spells
Jan 30, 2026
Merged

Spells#567
remicousin merged 4 commits intomasterfrom
spells

Conversation

@remicousin
Copy link
Contributor

Little suite of spell related functions needed by PepsiCo. They all are meant to then feed "seasonal" functions to be applied to multiple years for same season of the year. The core of the word comes from enacts' longest_run_length.

The "public" functions are a bit dummy, but the "private" one definitely should not be callable because it returns something a bit funky. Not sure if there's a better way to engineer this.

Tests produced (also heavily inspired from enacts')

@aaron-kaplan
Copy link
Collaborator

Recording the highlights of our conversation...

_cumsum_flagged_diff[n] represents the length of the spell that starts at position n+1, except for n == 0, where it represents either the length of the spell that starts at position 0 or the spell that starts at position 1. It seems like this loss of information is acceptable if all we need is to find the number and length of spells (there can't be a spell starting at 0 and a different spell starting at 1), but if we ever want to know the dates of the spells, we would have to switch to a different implementation.

An alternative would be padding the sequence with zeros at the left end. The current implementation is an optimization that avoids the cost of padding (= copying) the array.

It might be worth putting some of this into a comment in the code.

@aaron-kaplan
Copy link
Collaborator

Taking inspiration from https://stackoverflow.com/questions/13828599/generalized-cumulative-functions-in-numpy-scipy ...

spells = np.frompyfunc(lambda x, y: 0 if y == 0 else x + y)
run_length_so_far = spells.accumulate(flagged_data['dim'])

seems a more straightforward way to calculate the length of each run. (Some post-processing needed to keep only the last value in each run--.diff and then keep only the negative values?). I'm curious to know if this is slow because it calls into python on each iteration, of if frompyfunc manages to make it fast.

@remicousin
Copy link
Contributor Author

remicousin commented Nov 13, 2025

I came up with an alternative that doesn't use diff which looses a point, and that doesn't use an initial concat. Still requires some manipulation to get things right at the end.

test__cumsum_flagged_diff took 1.08s
while
test_spells_length took 1.12s

I imagine now that spell_length needs not be hidden function and in that case, I may not need to make all the derived calculations as functions as they can somewhat trivially be written live.

Then depending on review, will add appropriate doc

@remicousin
Copy link
Contributor Author

@xchourio please review as Aaron will be left out of calculations PRs in the future. I think we should review each other because in case one leaves IRI before the other, we keep familiar with the other's work and can pick up more easily. In that vein, you might want to start having me PR-review your PepsiCo work around the crop app.

@remicousin remicousin merged commit 614de6a into master Jan 30, 2026
1 check passed
@remicousin remicousin deleted the spells branch January 30, 2026 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments