Skip to content

Only skip empty columns if skipna=True#242

Open
Pierlou wants to merge 2 commits intomainfrom
fix/consider-skipna
Open

Only skip empty columns if skipna=True#242
Pierlou wants to merge 2 commits intomainfrom
fix/consider-skipna

Conversation

@Pierlou
Copy link
Copy Markdown
Collaborator

@Pierlou Pierlou commented Apr 7, 2026

Also changed the empty column test for a ~10% faster method

@Pierlou Pierlou requested a review from ThibaudDauce April 7, 2026 14:03
empty_cols = {col for col in table.columns if table[col].dropna().empty}
empty_cols = (
# if NA values should be considered valid, then we can skip empty columns
{col for col in table.columns if table[col].isna().all()} if skipna else {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty_cols is also used to restore tests for columns that are all-NA in the first chunk but have data in later chunks (lines 222-228). Setting it to {} when skipna=False breaks that restoration mechanism — those columns will never be tested even if later batches contain data.

I have a local test I can push on your branch to expose this bug (and avoid it in the futur?)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I see the problem: setting empty_cols to {} here is the same as considering all columns have non-NA values (which could happen), and the rest of the function works the same afterwards. We'll never enter the loop at line 222, and rebuild remaining_tests_per_col at line 250, at the end of the batch

Comment thread csv_detective/parsing/columns.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants