Accelerating SceneScript with multi-token prediction

Thank you to the authors for the foundational work on language-based 3D perception.

We've applied multi-token prediction to SceneScript and achieved a 5x acceleration without sacrificing accuracy.

This work has been accepted to CVPR 2026: https://arxiv.org/pdf/2512.05597