From 6d5f6cd60d275348b175445c6074cc773afe58fa Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:35:14 +0100 Subject: [PATCH 1/4] Add usage tips for TabPFN in README Added important tips for using TabPFN effectively, including batch prediction mode, data preprocessing advice, GPU usage, and dataset size limitations. --- README.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/README.md b/README.md index cb8413d65..f973a50ce 100644 --- a/README.md +++ b/README.md @@ -97,6 +97,13 @@ print("Mean Squared Error (MSE):", mse) print("R² Score:", r2) ``` +### Important Tips + +Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. +Do not apply data scaling or one-hot encoding when feeding data to the model. +Make sure a GPU is available - on CPU TabPFN is slow to execute. +Dataset size is limited - TabPFN works best on datasets with less than 10,000 samples and 500 features. If they are larger we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). + ### Best Results For optimal performance, use the `AutoTabPFNClassifier` or `AutoTabPFNRegressor` for post-hoc ensembling. These can be found in the [TabPFN Extensions](https://github.com/PriorLabs/tabpfn-extensions) repository. Post-hoc ensembling combines multiple TabPFN models into an ensemble. 
From 09a963b2189afeea40fde8ddc2afabd0465783da Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:35:42 +0100 Subject: [PATCH 2/4] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f973a50ce..1e101acc9 100644 --- a/README.md +++ b/README.md @@ -97,7 +97,7 @@ print("Mean Squared Error (MSE):", mse) print("R² Score:", r2) ``` -### Important Tips +### Usage Tips Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. Do not apply data scaling or one-hot encoding when feeding data to the model. From bccaedc9bb706cefcc9da1826ae9741ff84df1b4 Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:36:42 +0100 Subject: [PATCH 3/4] Update usage tips for TabPFN performance Added a recommendation to split large test sets into chunks of 1000 samples for better performance. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1e101acc9..f393dcdef 100644 --- a/README.md +++ b/README.md @@ -99,7 +99,7 @@ print("R² Score:", r2) ### Usage Tips -Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. +Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large split it into chunks of 1000 samples each. Do not apply data scaling or one-hot encoding when feeding data to the model. Make sure a GPU is available - on CPU TabPFN is slow to execute. 
Dataset size is limited - TabPFN works best on datasets with less than 10,000 samples and 500 features. If they are larger we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). From f232e5ef1105c1496f0ca429ddaa238957cc352c Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:54:03 +0100 Subject: [PATCH 4/4] Update README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index f393dcdef..3effc52a7 100644 --- a/README.md +++ b/README.md @@ -99,10 +99,10 @@ print("R² Score:", r2) ### Usage Tips -Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large split it into chunks of 1000 samples each. -Do not apply data scaling or one-hot encoding when feeding data to the model. -Make sure a GPU is available - on CPU TabPFN is slow to execute. -Dataset size is limited - TabPFN works best on datasets with less than 10,000 samples and 500 features. If they are larger we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). +- **Use batch prediction mode**: Each `predict` call recomputes the training set. Calling `predict` on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large, split it into chunks of 1000 samples each. +- **Avoid data preprocessing**: Do not apply data scaling or one-hot encoding when feeding data to the model. +- **Use a GPU**: TabPFN is slow to execute on a CPU. Ensure a GPU is available for better performance. 
+- **Mind the dataset size**: TabPFN works best on datasets with fewer than 10,000 samples and 500 features. For larger datasets, we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). ### Best Results
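The chunked batch-prediction tip introduced by these patches can be sketched as follows. This is an illustrative sketch only: `predict_in_chunks` and the stand-in `predict` function below are hypothetical helpers, not part of the TabPFN API; in practice `predict` would be the `clf.predict` from the README's earlier example, and `chunk_size=1000` follows the tip's recommendation.

```python
# Sketch (illustrative, not TabPFN API): split a large test set into chunks
# of 1000 samples and predict each chunk in one batched call, instead of
# one predict call per sample.

def predict_in_chunks(predict, X_test, chunk_size=1000):
    """Run `predict` on consecutive chunks of X_test and concatenate results."""
    preds = []
    for start in range(0, len(X_test), chunk_size):
        # One batched call per chunk: the per-call training-set computation
        # is amortized over chunk_size samples instead of repeated per sample.
        preds.extend(predict(X_test[start:start + chunk_size]))
    return preds

if __name__ == "__main__":
    # Stand-in model (hypothetical): labels each row by the sign of its
    # first value, just to make the chunking pattern runnable end to end.
    predict = lambda rows: [int(r[0] > 0) for r in rows]
    X_test = [[x - 1500] for x in range(2500)]  # 2500 "samples", 3 chunks
    preds = predict_in_chunks(predict, X_test, chunk_size=1000)
    print(len(preds), preds[0], preds[-1])  # → 2500 0 1
```

The same loop works unchanged with a fitted TabPFN estimator, since it only assumes an sklearn-style `predict(X)` callable.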