- empty_cols = {col for col in table.columns if table[col].dropna().empty}
+ empty_cols = (
+     # if NA values should be considered valid, then we can skip empty columns
+     {col for col in table.columns if table[col].isna().all()} if skipna else {}
empty_cols is also used to restore tests for columns that are all-NA in the first chunk but have data in later chunks (lines 222-228). Setting it to {} when skipna=False breaks that restoration mechanism: those columns will never be tested, even if later batches contain data.
I have a local test I can push to your branch to expose this bug (and guard against it in the future).
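To make the scenario concrete, here is a minimal sketch of the data shape in question. It reuses the expression from the diff verbatim; the wrapper name find_empty_cols and the two sample batches are hypothetical and only illustrate what empty_cols holds under each skipna setting. It does not reproduce the restoration logic at lines 222-228.

```python
import pandas as pd


def find_empty_cols(table: pd.DataFrame, skipna: bool):
    # Same expression as in the diff under review; the wrapper itself is hypothetical.
    return {col for col in table.columns if table[col].isna().all()} if skipna else {}


# Hypothetical batches: column "b" is all-NA in the first chunk, populated in the second.
first_chunk = pd.DataFrame({"a": [1, 2], "b": [None, None]})
later_chunk = pd.DataFrame({"a": [3, 4], "b": [5.0, 6.0]})

empty_skipna = find_empty_cols(first_chunk, skipna=True)   # "b" is flagged as empty
empty_keepna = find_empty_cols(first_chunk, skipna=False)  # nothing is flagged
later_skipna = find_empty_cols(later_chunk, skipna=True)   # "b" now has data
```

One side observation: the literal {} is an empty dict, not an empty set, so the two branches return different types. Membership checks and iteration behave the same on both, but set() would keep the type consistent.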
I am not sure I see the problem: setting empty_cols to {} here is equivalent to treating every column as having non-NA values (which can genuinely happen), and the rest of the function works the same afterwards. We never enter the loop at line 222, and remaining_tests_per_col is rebuilt at line 250, at the end of the batch.
I also switched the empty-column check to a method that is ~10% faster.
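For reference, a small sketch checking that the old and new empty-column checks agree on a few representative columns. The helper names and sample Series are assumptions made for illustration. The ~10% speed-up is not measured here; it could be checked with timeit on data of realistic size.

```python
import pandas as pd


def is_empty_dropna(col: pd.Series) -> bool:
    # Old check: drop the NA values, then test whether anything is left.
    return col.dropna().empty


def is_empty_isna(col: pd.Series) -> bool:
    # New check: test whether every value is NA, without building a filtered copy.
    return bool(col.isna().all())


# Hypothetical sample columns covering the main cases.
samples = {
    "all_na": pd.Series([None, float("nan")]),
    "mixed": pd.Series([1.0, None]),
    "no_na": pd.Series([1, 2]),
    "zero_rows": pd.Series([], dtype="float64"),
}

# Maps each sample name to whether the two checks return the same answer.
agreement = {
    name: is_empty_dropna(col) == is_empty_isna(col) for name, col in samples.items()
}
```

Both checks treat a zero-row column as empty, so the change should not alter behavior on an empty batch.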