Use of RPS for better distributional regression #689
Conversation
Co-authored-by: Jonas Landsgesell <jonaslandsgesell@gmail.com>
Co-authored-by: Pascal Knoll <knollpascal00@gmail.com>
Jonas Landsgesell seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Code Review
This pull request introduces the Ranked Probability Score (RPS) for fine-tuning, which is a valuable enhancement for distributional regression. The implementation of the rps_loss function is clear, well-documented, and correctly integrated into the finetune_regressor.py example. The associated refactoring to use regressor.model_ improves the code structure. I have one minor suggestion to make the one-hot encoding in the loss function more idiomatic. Overall, this is a high-quality contribution.
```python
targets = targets.long()
pred_cdf = torch.cumsum(outputs, dim=1)
target_one_hot = torch.zeros_like(outputs).scatter_(1, targets.unsqueeze(1), 1.)
```
For creating the one-hot encoded target tensor, consider using torch.nn.functional.one_hot. It's more idiomatic for this operation and can make the code's intent clearer. This function is specifically designed for one-hot encoding and might offer better performance.
```diff
- target_one_hot = torch.zeros_like(outputs).scatter_(1, targets.unsqueeze(1), 1.)
+ target_one_hot = torch.nn.functional.one_hot(targets, num_classes=outputs.shape[1]).to(outputs.dtype)
```
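A quick sketch (not from the PR) confirming that the two one-hot constructions are interchangeable; the tensor names here are illustrative dummies:

```python
import torch
import torch.nn.functional as F

outputs = torch.rand(4, 5).softmax(dim=1)   # dummy predicted probabilities, shape (batch, bins)
targets = torch.tensor([0, 2, 4, 1])        # dummy class/bin indices

# scatter_-based construction, as in the PR
via_scatter = torch.zeros_like(outputs).scatter_(1, targets.unsqueeze(1), 1.0)

# idiomatic alternative suggested in the review
via_one_hot = F.one_hot(targets, num_classes=outputs.shape[1]).to(outputs.dtype)

# both yield the same float one-hot matrix
assert torch.equal(via_scatter, via_one_hot)
```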
Hi @jonaslandsgesell and Pascal! We’d love to get your eyes on this new version to make sure we’ve captured your intent correctly. Closing this PR now so we can centralize the discussion in #711!
Motivation and Context
In this PR, we (Jonas Landsgesell and Pascal Knoll) implement the Ranked Probability Score (RPS) for fine-tuning TabPFN.
The Ranked Probability Score (RPS) is a distance-sensitive scoring rule, unlike Cross Entropy (CE) loss, which is a local scoring rule. Because nearby bins represent similar target values in distributional regression, this distance sensitivity gives RPS an advantage in the regression setting.
We would love to see how a TabPFN model pretrained with the RPS loss performs compared to a TabPFN model pretrained with the Cross Entropy loss. Preliminary evaluation on the regression finetuning example shows benefits of RPS over CE when run with multiple seeds.
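For reference, a minimal sketch of an RPS loss consistent with the snippets reviewed above: it compares the predicted and observed cumulative distributions over the ordered bins. The function name matches the PR's rps_loss, but the exact signature and reduction are assumptions, not the PR's verbatim implementation:

```python
import torch

def rps_loss(outputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Ranked Probability Score over ordered bins.

    outputs: (batch, n_bins) predicted probabilities (rows sum to 1).
    targets: (batch,) integer bin indices of the observed values.
    """
    targets = targets.long()
    pred_cdf = torch.cumsum(outputs, dim=1)               # predicted CDF per sample
    target_one_hot = torch.zeros_like(outputs).scatter_(1, targets.unsqueeze(1), 1.0)
    target_cdf = torch.cumsum(target_one_hot, dim=1)      # step function at the true bin
    # squared CDF differences summed over bins, averaged over the batch
    return ((pred_cdf - target_cdf) ** 2).sum(dim=1).mean()
```

A perfect point prediction gives a loss of zero, and the penalty grows with the distance between the predicted mass and the true bin, which is exactly the distance sensitivity CE lacks.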
Below you can see our RPS vs. CE evaluation on a custom NN architecture (not TabPFN) on holdout test sets for different datasets:
Better MAE with RPS-trained NN than CE-trained NN:

Better R2 with RPS-trained NN than CE-trained NN:

Better RPS with RPS-trained NN than CE-trained NN:

General Literature:
- Gneiting, "Making and Evaluating Point Forecasts"
- https://scoringrules.readthedocs.io/en/latest/theory.html
- "Rage Against the Mean"
- Gneiting & Raftery, "Strictly Proper Scoring Rules, Prediction, and Estimation"
Public API Changes
How Has This Been Tested?
Yes, on the example finetuning script for regression (classification is unchanged).
Checklist
- [ ] CHANGELOG.md updated (if relevant for users). 🎅🎄