-
Notifications
You must be signed in to change notification settings - Fork 538
[Prep-refactor-2] Refactor processing of differential logic and group code #720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring effort that improves the codebase's structure by grouping related logic into new modules. The separation of validation, data cleaning, and feature tagging logic is well-executed and enhances maintainability. The refactoring of the initialization logic in the classifier and regressor is also a good improvement. I've identified a couple of minor issues: an incorrect TODO comment and a misleading error message, for which I've provided suggestions. Overall, this is a high-quality refactoring.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
So all in all it feels like the changes are a good step towards what we want to do. It is indeed very necessary to start splitting into more logical files rather than what we had. A few general comments:
Anyway - all of this is very subjective, and shouldn't stop us from moving fast. As long as (1) holds, feel free to go ahead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors preprocessing and validation logic by consolidating shared code between classifier and regressor, separating differentiable and standard input paths, and organizing functionality into dedicated modules. The changes improve code maintainability without altering functionality.
Key changes:
- Created
validation.pymodule to centralize input validation logic - Split classifier initialization into separate methods for differentiable vs standard inputs
- Introduced
tag_features_and_sanitize_datahelper to consolidate data preprocessing
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/tabpfn/validation.py |
New module centralizing validation logic previously scattered across utils and base |
src/tabpfn/preprocessing/clean.py |
New module for data cleaning operations (dtype fixing, NA processing) |
src/tabpfn/preprocessing/type_detection.py |
New module for categorical feature inference |
src/tabpfn/preprocessing/initialization.py |
New module with tag_features_and_sanitize_data helper |
src/tabpfn/classifier.py |
Split _initialize_dataset_preprocessing into separate methods for differentiable vs standard inputs |
src/tabpfn/regressor.py |
Refactored to use new validation/preprocessing modules and handle ensemble config reuse |
src/tabpfn/utils.py |
Removed validation functions moved to dedicated modules |
src/tabpfn/base.py |
Removed check_cpu_warning function moved to validation module |
tests/test_utils.py |
Updated imports to reflect new module structure |
tests/test_regressor_interface.py |
Fixed mock patch to use mock.patch.object |
changelog/720.added.md |
Added changelog entry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks a lot for the review @alanprior ! I did some renaming and commented on the things I kept as is. RE your general remarks:
|
LeoGrin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it!!
9930223 to
66557af
Compare
Sounds good! |
This is another broad refactoring without any functionality change or introducing any new objects.
I think it's useful to get an overview and before starting to define better abstractions in the next step.
Changes
classifier.py.