Fix EVALID year parsing for single-digit state FIPS codes #79
Merged
find_evalid() parsed EVALID as fixed-width SSYYTT using positional string slicing, which fails for states with single-digit FIPS codes (AL=1, AR=5). A 5-digit EVALID like 12401 (AL, year=24, type=01) was misparsed as state=12, year=40, causing clip_most_recent() to select the 2003 periodic inventory instead of the 2024 annual inventory. Replace EVALID string parsing with END_INVYR from POP_EVAL, which is an unambiguous 4-digit year already available in the joined dataframe. Remove the now-unused _add_parsed_evalid_columns() function. Closes #78
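The misparse described above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: parse_fixed_width and parse_padded are hypothetical helpers, not functions from the codebase, and the zero-padding variant is shown purely for contrast. The fix in this PR avoids string parsing altogether by using END_INVYR.

```python
def parse_fixed_width(evalid: int) -> dict:
    """Naive SSYYTT parse by position with no zero-padding (the failure mode)."""
    s = str(evalid)
    return {"state": int(s[0:2]), "year": int(s[2:4]), "type": s[4:6]}


def parse_padded(evalid: int) -> dict:
    """Same slicing after left-padding to six digits (for contrast only)."""
    s = str(evalid).zfill(6)
    return {"state": int(s[0:2]), "year": int(s[2:4]), "type": s[4:6]}


# Alabama (FIPS 1): the 5-digit EVALID shifts every field by one character.
naive = parse_fixed_width(12401)
padded = parse_padded(12401)

# Georgia (FIPS 13): a 6-digit EVALID happens to parse correctly either way.
georgia = parse_fixed_width(132301)

print(naive)   # state=12, year=40: the wrong state and year
print(padded)  # state=1, year=24, type "01"
```

The naive version only works when the state code already occupies two characters, which is why the bug stayed hidden for states such as Georgia.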
Add EVALID as secondary sort key (descending) for deterministic tiebreaking when multiple evaluations share the same END_INVYR. Add unit tests covering single-digit FIPS codes (Alabama, Arkansas), standard 2-digit codes (Georgia), multi-state selection, and tiebreaking behavior.
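A minimal sketch of the selection rule described above, using plain dictionaries rather than the project's dataframe code. The most_recent_per_state helper, the STATECD key, and all row values are assumptions made up for illustration; only the ordering (END_INVYR first, EVALID descending as tiebreaker) comes from the commit message.

```python
def most_recent_per_state(rows):
    """Pick one row per state: highest END_INVYR, ties broken by highest EVALID."""
    # Sort descending on (END_INVYR, EVALID) so the preferred row comes first.
    ordered = sorted(rows, key=lambda r: (r["END_INVYR"], r["EVALID"]), reverse=True)
    picked = {}
    for row in ordered:
        # Keep only the first (best) row seen for each state.
        picked.setdefault(row["STATECD"], row)
    return picked


# Hypothetical evaluations: two Alabama rows share END_INVYR 2024.
rows = [
    {"STATECD": 1, "EVALID": 10301, "END_INVYR": 2003},
    {"STATECD": 1, "EVALID": 12400, "END_INVYR": 2024},
    {"STATECD": 1, "EVALID": 12401, "END_INVYR": 2024},
    {"STATECD": 13, "EVALID": 132301, "END_INVYR": 2023},
]

selected = most_recent_per_state(rows)
print(selected[1]["EVALID"], selected[13]["EVALID"])
```

Because the sort key is fully specified, the result does not depend on the input row order, which is the point of adding the secondary key.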
Summary
- find_evalid() parsed EVALID as fixed-width SSYYTT using positional string slicing, which fails for states with single-digit FIPS codes (AL=1, AR=5)
- 12401 (state=1, year=24, type=01) was misparsed as state=12, year=40 → interpreted as 1940, causing clip_most_recent() to select the 2003 periodic inventory instead of 2024 annual data
- Replaced EVALID string parsing with END_INVYR from POP_EVAL, an unambiguous 4-digit year already in the joined dataframe
- Removed the now-unused _add_parsed_evalid_columns() function and its EVALIDYearParsing import

Test plan
Closes #78