From 6d5f6cd60d275348b175445c6074cc773afe58fa Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:35:14 +0100 Subject: [PATCH 1/4] Add usage tips for TabPFN in README Added important tips for using TabPFN effectively, including batch prediction mode, data preprocessing advice, GPU usage, and dataset size limitations. --- README.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/README.md b/README.md index cb8413d65..f973a50ce 100644 --- a/README.md +++ b/README.md @@ -97,6 +97,13 @@ print("Mean Squared Error (MSE):", mse) print("R² Score:", r2) ``` +### Important Tips + +Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. +Do not apply data scaling or one-hot encoding when feeding data to the model. +Make sure a GPU is available - on CPU TabPFN is slow to execute. +Dataset size is limited - TabPFN works best on datasets with less than 10,000 samples and 500 features. If they are larger we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). + ### Best Results For optimal performance, use the `AutoTabPFNClassifier` or `AutoTabPFNRegressor` for post-hoc ensembling. These can be found in the [TabPFN Extensions](https://github.com/PriorLabs/tabpfn-extensions) repository. Post-hoc ensembling combines multiple TabPFN models into an ensemble. 
From 09a963b2189afeea40fde8ddc2afabd0465783da Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:35:42 +0100 Subject: [PATCH 2/4] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f973a50ce..1e101acc9 100644 --- a/README.md +++ b/README.md @@ -97,7 +97,7 @@ print("Mean Squared Error (MSE):", mse) print("R² Score:", r2) ``` -### Important Tips +### Usage Tips Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. Do not apply data scaling or one-hot encoding when feeding data to the model. From bccaedc9bb706cefcc9da1826ae9741ff84df1b4 Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:36:42 +0100 Subject: [PATCH 3/4] Update usage tips for TabPFN performance Added a recommendation to split large test sets into chunks of 1000 samples for better performance. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1e101acc9..f393dcdef 100644 --- a/README.md +++ b/README.md @@ -99,7 +99,7 @@ print("R² Score:", r2) ### Usage Tips -Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. +Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large split it into chunks of 1000 samples each. Do not apply data scaling or one-hot encoding when feeding data to the model. Make sure a GPU is available - on CPU TabPFN is slow to execute. 
Dataset size is limited - TabPFN works best on datasets with less than 10,000 samples and 500 features. If they are larger we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). From f232e5ef1105c1496f0ca429ddaa238957cc352c Mon Sep 17 00:00:00 2001 From: Noah Hollmann Date: Mon, 3 Nov 2025 12:54:03 +0100 Subject: [PATCH 4/4] Update README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index f393dcdef..3effc52a7 100644 --- a/README.md +++ b/README.md @@ -99,10 +99,10 @@ print("R² Score:", r2) ### Usage Tips -Always use TabPFN in batch prediction mode - each predict call requires the training set to be recomputed so calling predict on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large split it into chunks of 1000 samples each. -Do not apply data scaling or one-hot encoding when feeding data to the model. -Make sure a GPU is available - on CPU TabPFN is slow to execute. -Dataset size is limited - TabPFN works best on datasets with less than 10,000 samples and 500 features. If they are larger we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). +- **Use batch prediction mode**: Each `predict` call recomputes the training set. Calling `predict` on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large, split it into chunks of 1000 samples each. +- **Avoid data preprocessing**: Do not apply data scaling or one-hot encoding when feeding data to the model. +- **Use a GPU**: TabPFN is slow to execute on a CPU. Ensure a GPU is available for better performance. 
+- **Mind the dataset size**: TabPFN works best on datasets with fewer than 10,000 samples and 500 features. For larger datasets, we recommend looking at the [Large datasets guide](https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py). ### Best Results
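The chunked batch-prediction tip introduced by these patches can be sketched as follows. This is an illustrative sketch only: `predict_in_chunks` and the stand-in `predict` function below are hypothetical helpers, not part of the TabPFN API; in practice `predict` would be the `clf.predict` from the README's earlier example, and `chunk_size=1000` follows the tip's recommendation.

```python
# Sketch (illustrative, not TabPFN API): split a large test set into chunks
# of 1000 samples and predict each chunk in one batched call, instead of
# one predict call per sample.

def predict_in_chunks(predict, X_test, chunk_size=1000):
    """Run `predict` on consecutive chunks of X_test and concatenate results."""
    preds = []
    for start in range(0, len(X_test), chunk_size):
        # One batched call per chunk: the per-call training-set computation
        # is amortized over chunk_size samples instead of repeated per sample.
        preds.extend(predict(X_test[start:start + chunk_size]))
    return preds

if __name__ == "__main__":
    # Stand-in model (hypothetical): labels each row by the sign of its
    # first value, just to make the chunking pattern runnable end to end.
    predict = lambda rows: [int(r[0] > 0) for r in rows]
    X_test = [[x - 1500] for x in range(2500)]  # 2500 "samples", 3 chunks
    preds = predict_in_chunks(predict, X_test, chunk_size=1000)
    print(len(preds), preds[0], preds[-1])  # → 2500 0 1
```

The same loop works unchanged with a fitted TabPFN estimator, since it only assumes an sklearn-style `predict(X)` callable.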