The TopK operation relies on the Candle backend's asort() method, which is unstable because it compares elements with partial_cmp().
This means inputs containing NaN/Inf may be sorted nondeterministically. Although TopK in MoE router layers should only receive regular float-valued logits, we should consider reimplementing it with a stable sort if/when this becomes necessary.
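
For reference, a minimal sketch of what a deterministic replacement could look like on plain slices. This is not Candle's API: `topk_indices` is a hypothetical helper, and it assumes the logits are available as a `&[f32]` on the CPU. It combines the standard library's stable `sort_by` with `f32::total_cmp`, which defines an ordering for every float value, including NaN.

```rust
/// Hypothetical helper (not part of Candle): returns the indices of the `k`
/// largest values in `values`, ordered from largest to smallest.
fn topk_indices(values: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..values.len()).collect();
    // `sort_by` is a stable sort: elements that compare equal keep their
    // original relative order, so ties resolve to the lower index first.
    // `total_cmp` is a total order on f32, so the comparator always returns
    // an ordering, unlike `partial_cmp`, which returns None for NaN.
    // Comparing b against a (instead of a against b) sorts descending.
    idx.sort_by(|&a, &b| values[b].total_cmp(&values[a]));
    // If k exceeds the input length, truncate leaves the vector unchanged.
    idx.truncate(k);
    idx
}
```

Calling `topk_indices(&logits, k)` on the same input always yields the same index order. Where a NaN lands relative to finite values depends on its sign bit, so callers that need a specific NaN policy would still have to filter or map NaNs before sorting.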