
Conversation

@noahho noahho commented Nov 12, 2025

Summary

  • extend the multi-output helper to cover both regression and classification by sharing TabPFN initialisation logic (see the usage sketch below)
  • refresh the multi-output example to demonstrate both tasks without emphasising imputation in the naming
  • expand the regression test and add a classification variant to ensure missing-value handling works for either backend
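
For reviewers, here is a minimal usage sketch of the two wrappers. The class names, the package import, and the tabpfn_params keyword come from this PR; the synthetic data and the particular keyword values are illustrative only.

import numpy as np
from sklearn.datasets import make_multilabel_classification, make_regression

from tabpfn_extensions import TabPFNMultiOutputClassifier, TabPFNMultiOutputRegressor

# Multi-output regression: two continuous targets per row.
X_reg, y_reg = make_regression(n_samples=30, n_features=4, n_targets=2, random_state=0)
X_reg[::5, 0] = np.nan  # missing values are passed through to TabPFN unchanged
reg = TabPFNMultiOutputRegressor(tabpfn_params={"device": "cpu"})
reg.fit(X_reg, y_reg)
print(reg.predict(X_reg).shape)  # (30, 2)

# Multi-label classification: three binary targets per row.
X_clf, y_clf = make_multilabel_classification(n_samples=30, n_features=6, n_classes=3, random_state=0)
clf = TabPFNMultiOutputClassifier(tabpfn_params={"device": "cpu"})
clf.fit(X_clf, y_clf)
print(clf.predict(X_clf).shape)  # (30, 3)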

Testing

  • FAST_TEST_MODE=1 pytest tests/test_multioutput.py

Codex Task

@noahho noahho requested a review from a team as a code owner November 12, 2025 14:43
@noahho noahho requested review from bejaeger and Copilot and removed request for a team November 12, 2025 14:43
@gemini-code-assist

Summary of Changes

Hello @noahho, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the tabpfn_extensions library by adding comprehensive support for multi-output classification, alongside improvements to multi-output regression. It achieves this by introducing a new TabPFNMultiOutputClassifier and refactoring shared initialization logic. The changes also include updated examples and robust tests to validate the functionality, particularly focusing on transparent handling of missing values for both regression and classification tasks.

Highlights

  • Multi-output Classification Support: Introduced TabPFNMultiOutputClassifier to handle multi-label classification tasks, complementing the existing regression capabilities.
  • Shared Logic Refactoring: Consolidated initialization logic for multi-output wrappers into a shared mixin, _TabPFNMultiOutputMixin, improving code reusability and maintainability (see the sketch below).
  • Refreshed Examples: Updated the multi-output examples to demonstrate both regression and classification workflows, including synthetic data generation and missing value handling.
  • Enhanced Testing: Added new test cases for multi-output classification and expanded existing regression tests to ensure robust handling of missing values across both tasks and different TabPFN backends.
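
A rough sketch of the shared-mixin pattern described in the Shared Logic Refactoring highlight. The mixin name, the wrapper class names, and the tabpfn_params and _estimator_is_default attributes appear in this PR; the method bodies below are an assumption about the overall shape, not the actual implementation (the tabpfn import is used here for concreteness even though the extensions support multiple TabPFN backends):

from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor
from tabpfn import TabPFNClassifier, TabPFNRegressor


class _TabPFNMultiOutputMixin:
    """Shared TabPFN initialisation logic for the two multi-output wrappers."""

    _default_estimator_cls = None  # set by each concrete subclass

    def _build_estimator(self, estimator, tabpfn_params):
        # Only build a default TabPFN estimator when the caller did not pass one.
        self._estimator_is_default = estimator is None
        self.tabpfn_params = dict(tabpfn_params or {})
        if estimator is None:
            estimator = self._default_estimator_cls(**self.tabpfn_params)
        return estimator


class TabPFNMultiOutputRegressor(_TabPFNMultiOutputMixin, MultiOutputRegressor):
    _default_estimator_cls = TabPFNRegressor

    def __init__(self, estimator=None, *, tabpfn_params=None, n_jobs=None):
        super().__init__(self._build_estimator(estimator, tabpfn_params), n_jobs=n_jobs)


class TabPFNMultiOutputClassifier(_TabPFNMultiOutputMixin, MultiOutputClassifier):
    _default_estimator_cls = TabPFNClassifier

    def __init__(self, estimator=None, *, tabpfn_params=None, n_jobs=None):
        super().__init__(self._build_estimator(estimator, tabpfn_params), n_jobs=n_jobs)

Because the default estimator is built inside __init__, the wrappers also need get_params/set_params overrides (discussed in the reviews below) so that scikit-learn's clone can rebuild them from estimator=None plus tabpfn_params.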


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for multi-output regression and classification through new scikit-learn compatible wrappers, TabPFNMultiOutputRegressor and TabPFNMultiOutputClassifier. The changes include the core implementation, an illustrative example, and new tests. While the feature is a valuable addition, I've identified a critical issue in src/tabpfn_extensions/multioutput.py where the implementation of get_params is incompatible with scikit-learn's clone function, which would break its use in cross-validation or hyperparameter tuning. Additionally, the new tests in tests/test_multioutput.py do not cover this cloning functionality or the case where the estimator is created internally, which is why the bug was missed. My review includes detailed feedback on how to resolve the compatibility issue and improve test coverage.
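
To illustrate the coverage gap this review points out, a test along the following lines would exercise clone() together with an internally created estimator. The wrapper name and the tabpfn_params keyword come from this PR; the data sizes and assertions are illustrative:

from sklearn.base import clone
from sklearn.datasets import make_regression

from tabpfn_extensions import TabPFNMultiOutputRegressor


def test_clone_with_internally_created_estimator():
    X, y = make_regression(n_samples=30, n_features=4, n_targets=2, random_state=0)

    # No estimator is passed in, so the wrapper builds its TabPFN regressor itself.
    model = TabPFNMultiOutputRegressor(tabpfn_params={"device": "cpu"})

    # clone() re-creates the wrapper from get_params(); it fails if get_params()
    # returns values that cannot be passed back through __init__.
    cloned = clone(model)
    cloned.fit(X, y)
    assert cloned.predict(X).shape == y.shape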


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.



Copilot AI left a comment


Pull Request Overview

This pull request extends TabPFN's multi-output capabilities to support both regression and classification tasks through unified wrappers, while refreshing examples to demonstrate both use cases with missing value handling.

Key Changes:

  • Introduced TabPFNMultiOutputClassifier alongside the existing regressor functionality using a shared mixin pattern for initialization logic
  • Refactored examples to demonstrate both regression and classification workflows without focusing specifically on imputation
  • Expanded test coverage to verify missing value handling works correctly for both regression and classification across different backends

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/tabpfn_extensions/multioutput.py: Implements shared mixin class for multi-output wrappers supporting both regression and classification
  • src/tabpfn_extensions/__init__.py: Exports new multi-output classifier class alongside existing regressor
  • tests/test_multioutput.py: Adds comprehensive tests for both regression and classification with missing values across backends
  • tests/conftest.py: Adds sys.path configuration to ensure test imports work correctly
  • examples/multioutput/multioutput_prediction.py: Demonstrates both multi-output regression and classification with missing data
  • examples/multioutput/README.md: Documents the multi-output functionality for both regression and classification
  • examples/README.md: Updates directory listing to include multioutput examples


@noahho noahho closed this Nov 12, 2025
@noahho noahho reopened this Nov 12, 2025
@noahho noahho requested a review from Copilot November 12, 2025 17:24

noahho commented Nov 12, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces multi-output classifier support by creating a shared mixin, which is a solid design choice. The new examples and tests are also well-structured. I've identified a high-severity bug in the set_params implementation that breaks scikit-learn compatibility and should be addressed. Additionally, I've provided a few medium-severity suggestions to enhance the maintainability of the test setup and improve the code quality within the new example script.


Copilot AI left a comment


Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.




noahho commented Nov 12, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for multi-output regression and classification by wrapping scikit-learn's MultiOutputRegressor and MultiOutputClassifier. The implementation is clean and correctly integrates with scikit-learn's clone mechanism by handling tabpfn_params. The PR also adds a new example and tests. My review focuses on improving the examples and test coverage. I've pointed out that the example and tests don't actually demonstrate the handling of missing values, despite it being mentioned. More importantly, the new TabPFNMultiOutputClassifier is completely untested. I've provided suggestions to add a test for the classifier and to include missing values in both the example and the existing test case.
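
A compact check of the behaviour described here, using illustrative values for tabpfn_params:

from sklearn.base import clone

from tabpfn_extensions import TabPFNMultiOutputClassifier

clf = TabPFNMultiOutputClassifier(tabpfn_params={"device": "cpu"})
# clone() rebuilds the wrapper from get_params(), so the TabPFN kwargs must survive.
assert clone(clf).get_params()["tabpfn_params"] == {"device": "cpu"}

The quoted lines that follow are from the PR's new regression test, which is the context for the reviewer's first inline comment.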

assert np.isfinite(cloned_predictions).all()

cloned_score = r2_score(y, cloned_predictions, multioutput="uniform_average")
assert cloned_score > 0.2

high

The current tests only cover TabPFNMultiOutputRegressor. To ensure the new TabPFNMultiOutputClassifier is also working as expected and is robust, please add a corresponding test case for it. This test should cover fitting, prediction, and cloning, similar to test_multioutput_regression. You can use sklearn.datasets.make_multilabel_classification to generate test data. The absence of tests for the classification wrapper is a significant gap.
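
A hedged sketch of such a test; the class name and make_multilabel_classification follow the reviewer's suggestion, while the data sizes, the missing-value pattern, and the assertions are illustrative:

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification

from tabpfn_extensions import TabPFNMultiOutputClassifier


def test_multioutput_classification_with_missing_values():
    X, y = make_multilabel_classification(
        n_samples=30, n_features=6, n_classes=3, random_state=0
    )
    X = X.astype(np.float32)
    X[::5, 0] = np.nan  # exercise TabPFN's missing-value handling

    clf = TabPFNMultiOutputClassifier()
    clf.fit(X, y)
    predictions = clf.predict(X)
    assert predictions.shape == y.shape

    # Cloning should behave exactly like the original wrapper.
    cloned = clone(clf)
    cloned.fit(X, y)
    assert cloned.predict(X).shape == y.shape

The quoted lines that follow belong to the example script, which the next comment addresses.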

)

# ---------------------------------------------------------------------------
# 1. Multi-output regression with missing features

medium

The comment here states "with missing features", but the synthetic data generated for X_reg does not contain any missing values. This is misleading for users looking at this example. To align the comment with the code, you could either remove "with missing features" or, preferably, demonstrate the model's capability by actually introducing some missing values into the training data.

For example, you could add this after the train_test_split call:

# Introduce some missing values to demonstrate robustness
rng = np.random.default_rng(0)
missing_mask = rng.random(X_reg_train.shape) < 0.1
X_reg_train[missing_mask] = np.nan

Comment on lines +20 to +27
X, y = make_regression(
n_samples=30,
n_features=4,
n_targets=2,
n_informative=4,
noise=0.2,
random_state=1,
)

medium

The PR description and example file mention handling of missing values, but the test data created here is complete. To properly verify that the multi-output wrappers work correctly with missing data, it would be beneficial to introduce some missing values (np.nan) into the X array. This would ensure that the underlying TabPFN estimator's missing value handling is correctly leveraged through the wrapper.

For example, you can add this line after the make_regression call:

X[::5, 0] = np.nan  # Introduce some missing values


@bejaeger bejaeger left a comment


Nice! Just have a few nits. Otherwise LGTM!

random_state=1,
)

X_clf = X_clf.astype(np.float32)

why convert here but not for regression above?

@@ -0,0 +1,75 @@
"""Utilities for multi-output learning with TabPFN."""

Suggested change
- """Utilities for multi-output learning with TabPFN."""
+ """Wrapper for multi-output learning with TabPFN."""

) -> dict[str, Any]: # pragma: no cover - delegating to sklearn
"""Return parameters for this estimator with TabPFN kwargs included."""
params = super().get_params(deep=deep)
if getattr(self, "_estimator_is_default", False):

Suggested change
- if getattr(self, "_estimator_is_default", False):
+ if self._estimator_is_default:

We always set the variable so would be a bit more readable.

self,
estimator: _EstimatorT | None = None,
*,
n_jobs: int | None = None,

Suggested change
- n_jobs: int | None = None,
+ n_preprocessing_jobs: int | None = None,

We are deprecating n_jobs right now. Please change to n_preprocessing_jobs across the board.
