
Add TabArena single-table OpenML datasets#369

Open
pc0618 wants to merge 8 commits into snap-stanford:main from pc0618:tabarena-single-table

Conversation

@pc0618
Contributor

@pc0618 pc0618 commented Mar 2, 2026

This PR adds TabArena single-table translations to RelBench via a lightweight OpenML-backed integration.

  • New datasets: tabarena-<slug> (single records table).
  • New tasks: per-dataset fold-<N> tasks (OpenML CV folds; train/val/test tables).
  • Optional dependency: pip install relbench[tabarena] (adds openml).
  • Server downloads are skipped for tabarena-*; datasets/tasks are generated locally from OpenML and cached under ~/.cache/relbench/tabarena-*/.
  • Includes a small helper script: examples/translate_tabarena_to_relbench.py.
  • Adds offline unit tests that stub OpenML so CI doesn’t require network access.

Example usage:

```python
from relbench.datasets import get_dataset
from relbench.tasks import get_task

dataset = get_dataset("tabarena-credit-g", download=False)
task = get_task("tabarena-credit-g", "fold-0", download=False)
train = task.get_table("train")
```
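
The split tables wrap plain pandas DataFrames, so a quick sanity check of a split is straightforward. The sketch below is illustrative only: it assumes a binary `target` column and that the table's DataFrame is reachable via `.df` (column names are assumptions, not fixed by this PR).

```python
import pandas as pd

def majority_baseline(train_df: pd.DataFrame, test_df: pd.DataFrame,
                      target_col: str = "target") -> pd.Series:
    """Predict the most frequent training label for every test row."""
    majority = train_df[target_col].mode().iloc[0]
    return pd.Series(majority, index=test_df.index, name=target_col)

# Hypothetical usage with the tables loaded above:
# preds = majority_baseline(train.df, task.get_table("test").df)
```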

@pc0618
Contributor Author

pc0618 commented Mar 2, 2026

CI is green and the PR should be ready for review. Pinging @rishabh-ranjan and @matthiasf.

@rishabh-ranjan
Collaborator

@JustinGu32 please see that everything looks alright.

We should support downloading from server by uploading the zip files.

I don't understand what's going on with the folds. I will get Vignesh's opinion on it.

@ParthShroff

@rishabh-ranjan TabArena is built on OpenML tasks, which already come with predefined CV resampling splits (folds) instead of timestamp splits. `fold-N` names which OpenML train/test split we use. The underlying target is the same across folds; only the partitioning changes. The timestamps in the task tables are synthetic and exist only to fit the RelBench interface.
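
Concretely, each fold comes from OpenML's predefined resampling. A minimal sketch of fetching one fold's indices (assuming the `openml` package is installed and a task id is known; the wrapper name is illustrative):

```python
import numpy as np

def fold_indices(openml_task, fold: int):
    """Return (train_idx, test_idx) for one predefined OpenML CV fold.

    `openml_task` is any object exposing OpenML's
    get_train_test_split_indices(fold=...) API, e.g. the result of
    openml.tasks.get_task(task_id).
    """
    train_idx, test_idx = openml_task.get_train_test_split_indices(fold=fold)
    return np.asarray(train_idx), np.asarray(test_idx)

# Hypothetical usage (requires network access):
# import openml
# task = openml.tasks.get_task(31)  # assumed id for credit-g
# train_idx, test_idx = fold_indices(task, fold=0)
```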

@pc0618
Contributor Author

pc0618 commented Mar 4, 2026

Implemented requested refactor on this PR:

  • Renamed TabArena task names from fold-N to split-N in registration and docs/examples.
  • Refactored task class to TabArenaSplitEntityTask (kept TabArenaFoldEntityTask as a backward-compatible alias in code only).
  • Removed synthetic timestamps from TabArena task tables (time_col=None, task tables now contain only record_id + target before masking).
  • Added split-oriented dataset helpers (available_splits, get_openml_split_indices) while keeping fold-named aliases for backward compatibility.
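
A minimal sketch of what an index-based task table could look like after this change: only the record key and the target, selected by predefined split indices, with no timestamps. The helper and column names are illustrative, not the PR's actual code.

```python
import pandas as pd

def make_split_table(records: pd.DataFrame, indices, target_col: str,
                     pkey_col: str = "record_id") -> pd.DataFrame:
    """Build a task table holding only the record key and the target,
    row-selected by predefined OpenML split indices (time_col=None)."""
    table = records.iloc[list(indices)][[pkey_col, target_col]]
    return table.reset_index(drop=True)
```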

On the AutoCompleteTask point: I kept a custom task class because AutoCompleteTask is time-window based, while TabArena uses predefined OpenML split indices; this keeps split semantics exact.

Branch is updated (tabarena-single-table) with commits f8557b6 + f796cba (and pre-commit.ci formatting commit c7d62a9).

@pc0618
Contributor Author

pc0618 commented Mar 4, 2026

Follow-up update pushed to tabarena-single-table (commit 790b5db):

  • TabArena task tables are now edge-free (fkey_col_to_pkey_table = {}) to keep single-table context single-node for RT/rustler sampling.
  • Added custom masking in TabArena task to keep record_id available for inference while staying edge-free.
  • Added a detailed PluRel-16B runbook at examples/tabarena_plurel16b_inference.md with:
    • exact pinned SHAs
    • split-name compatibility notes (split-*)
    • random-sampling behavior notes
    • seq_len=2048 and seq_len=4096 command templates
    • no-FK and single-node context sanity checks.

Comment thread README.md Outdated

TabArena datasets are generated locally (from OpenML) and cached under `~/.cache/relbench/tabarena-*/`. Passing `download=True` will skip the RelBench server download step for these datasets/tasks.

For an end-to-end PluRel-16B TabArena inference runbook (including `split-*` task naming, random sampling behavior, and `seq_len=2048/4096` commands), see:
Collaborator


please remove both these files from examples. you can make a PR to the internal RT repo for these files.

Contributor Author


Done. I removed the internal RT/PluRel runbook from examples and removed the README reference to it. The PR now only contains public-facing example scripts.

Comment thread README.md
```


**Using TabArena datasets**
Collaborator


please rewrite this section to strictly follow the other integration sections (e.g. add citation) in the readme and remove any extra material (e.g. the line about download=True).

Contributor Author


Updated. The TabArena README section now follows the same style as the other integrations: install line, short description, links to public example scripts, and a citation block. I also removed the extra download=True note.

Comment on lines +6 to +8
- task names are `split-N` (not `fold-N`)
- task tables are edge-free (`fkey_col_to_pkey_table = {}`)
- no synthetic task timestamps (`time_col=None`)


these seem to be internal notes that can be removed



This seems to be like an internal exploration file with inference experiment details. This can be removed from the public code

Comment thread README.md Outdated
TabArena datasets are generated locally (from OpenML) and cached under `~/.cache/relbench/tabarena-*/`. Passing `download=True` will skip the RelBench server download step for these datasets/tasks.

For an end-to-end PluRel-16B TabArena inference runbook (including `split-*` task naming, random sampling behavior, and `seq_len=2048/4096` commands), see:
[`examples/tabarena_plurel16b_inference.md`](examples/tabarena_plurel16b_inference.md)


The example file needs to be rewritten based only on publicly available data and code.

Collaborator


I asked above to simply remove these example files entirely



We should include examples of how to use these tabular datasets, especially since we have the splits involved.
Reproducing the results with tabpfn on our dataset will help ensure people can trust the data.

Contributor Author


Done. The old internal example was replaced with public examples built only around OpenML data and standard public Python packages. examples/translate_tabarena_to_relbench.py now compares the original OpenML task with the relbenchified records and split-* tables.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. examples/translate_tabarena_to_relbench.py now shows how the source rows map to records, how the RelBench test split matches the OpenML test split, and how RelBench train+val partition the OpenML train side. I also added examples/validate_tabarena_baseline.py, which trains a public baseline on the original OpenML rows and verifies that the relbenchified view produces identical features, labels, predictions, and metrics. I did not add an RT example, in line with the later feedback that this does not belong in RelBench.
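
The consistency property being validated here can be stated as a small check. This is a sketch, not the PR's code; it treats each split as a set of row indices:

```python
def check_split_consistency(openml_train, openml_test,
                            rb_train, rb_val, rb_test) -> bool:
    """Verify that the RelBench test split equals the OpenML test split,
    and that RelBench train+val exactly partition the OpenML train side."""
    openml_train, openml_test = set(openml_train), set(openml_test)
    rb_train, rb_val, rb_test = set(rb_train), set(rb_val), set(rb_test)
    return (
        rb_test == openml_test                 # test sides match exactly
        and rb_train | rb_val == openml_train  # train+val cover the train side
        and not (rb_train & rb_val)            # train and val are disjoint
    )
```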

@kvignesh1420

It would be great to have the following:

  • An example script showing how relbench-tabarena data can be loaded and explored. This script should also load data from the original tabarena data and compare. Show how the splits differ.
  • Next, in a different script, show how one can load TabPFN/XGBoost models and infer on relbench-tabarena data as well as original-tabarena data to show that our data and evaluation is correct.
  • Then, in a third script, show how RT can be trained and tested on relbench-tabarena.

@rishabh-ranjan
Collaborator

the RT script does not belong here (in RelBench). agree with the other two scripts.

@pc0618
Contributor Author

pc0618 commented Mar 31, 2026

Pushed an update to pc0618/relbench-fork:tabarena-single-table at commit 4e607bd.

This addresses the review feedback as follows:

  • removed the internal RT/PluRel runbook from examples
  • rewrote the TabArena README section to match the style of the other integrations
  • added a proper TabArena citation
  • removed the extra README material about download=True
  • rewrote the public example to compare the original OpenML task with the relbenchified records and split-* tables
  • added a second public example that validates the relbenchified view against the original OpenML data using a standard baseline

I did not add an RT example, in line with the feedback that this does not belong in RelBench.

Validation run locally before pushing:

  • pytest test/datasets/test_tabarena.py -> 3 passed
  • smoke-tested both new example scripts on credit-g
  • the baseline validation script shows identical features, labels, predictions, and AUROC between the original OpenML view and the relbenchified view

