fix: skip non-numeric columns in calc_r2 #318
Open
drussellmrichie wants to merge 1 commit into larsiusprime:master from
calc_r2 called data[var].astype(float) on every variable without
checking whether the column is numeric. Any string/categorical column
(e.g. luc, bldg_com_struct) raised:
ValueError: could not convert string to float: '100A'
Add a pd.api.types.is_numeric_dtype guard immediately after the
existing ill-posed-model check. Non-numeric vars now get NaN R² and
are skipped cleanly, consistent with the rest of the function's
error-handling pattern.
Thank you for your contribution. I affirm that this contributor has signed the CLA. You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
Bug
calc_r2 in openavmkit/utilities/stats.py calls data[var].astype(float) on every variable in ind_vars without first checking whether the column is numeric. Any string/categorical column triggers:
ValueError: could not convert string to float: '100A'
This crash occurs in practice when ind_vars includes columns like luc (land use code) or bldg_com_struct (commercial structure type), which are legitimate categorical features for LightGBM models but cannot be cast to float for OLS R² computation.
Fix
Add a pd.api.types.is_numeric_dtype guard immediately after the existing ill-posed-model check (the len(data) < 3 or nunique() < 2 block). Non-numeric variables now receive NaN R² / adj-R² / coef_sign and are skipped cleanly via continue, consistent with the function's existing error-handling pattern for ill-posed models.
Before / After
Before:
After:
Notes
pd.api.types.is_numeric_dtype is already available (pandas is a core dependency).
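For reviewers who want to see the guard in isolation, here is a minimal sketch of the pattern described under Fix. It is not the code from openavmkit/utilities/stats.py: the function name calc_r2_sketch, its signature, the result column names, and the sample data are all assumptions made for illustration, and the OLS fit is done with np.polyfit rather than whatever the real function uses.

```python
import numpy as np
import pandas as pd


def calc_r2_sketch(data: pd.DataFrame, ind_vars: list[str], dep_var: str) -> pd.DataFrame:
    """Hypothetical stand-in for calc_r2: one single-variable OLS fit per column."""
    rows = []
    for var in ind_vars:
        subset = data[[var, dep_var]].dropna()
        nan_row = {"variable": var, "r2": np.nan, "adj_r2": np.nan, "coef_sign": np.nan}

        # Ill-posed model: too few rows or no variation in the predictor.
        if len(subset) < 3 or subset[var].nunique() < 2:
            rows.append(nan_row)
            continue

        # Guard described in this PR: non-numeric columns get NaN and are skipped
        # instead of reaching astype(float) and raising ValueError.
        if not pd.api.types.is_numeric_dtype(subset[var]):
            rows.append(nan_row)
            continue

        x = subset[var].astype(float).to_numpy()
        y = subset[dep_var].astype(float).to_numpy()

        # Simple OLS line y = slope * x + intercept (assumption: polyfit stands in
        # for the real regression call).
        slope, intercept = np.polyfit(x, y, 1)
        ss_res = float(np.sum((y - (slope * x + intercept)) ** 2))
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else np.nan

        n = len(subset)  # n >= 3 here, so n - 2 is never zero
        adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)

        rows.append(
            {"variable": var, "r2": r2, "adj_r2": adj_r2, "coef_sign": float(np.sign(slope))}
        )
    return pd.DataFrame(rows)


# Made-up sample data: one numeric predictor and one categorical code column.
df = pd.DataFrame(
    {
        "sqft": [1000.0, 1500.0, 2000.0, 2500.0],
        "luc": ["100A", "100A", "200B", "300C"],
        "price": [200000.0, 290000.0, 410000.0, 500000.0],
    }
)
result = calc_r2_sketch(df, ["sqft", "luc"], "price")
print(result)
```

With the guard in place, the luc row comes back with NaN in every metric while sqft is fitted normally. Removing the is_numeric_dtype check from the sketch reproduces the ValueError quoted under Bug.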