HighRateMOS is the first non-intrusive MOS prediction model that explicitly accounts for sampling rate. It placed first on five of the eight metrics in AudioMOS Challenge 2025 Track 3.
Most speech quality assessment models are trained and evaluated at a fixed sampling rate (e.g., 16 kHz). In practice, however, speech signals come at a range of sampling rates, such as 16 kHz, 24 kHz, and 48 kHz, and the sampling rate can affect perceived quality. This project addresses this mismatch by adding sampling-rate-aware components to a self-supervised learning (SSL) based MOS prediction framework, supporting robust quality estimation across diverse real-world conditions regardless of sampling rate.
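To make the idea of sampling-rate awareness concrete, here is a minimal sketch in plain Python. It is not the HighRateMOS implementation: the names `SUPPORTED_RATES_HZ`, `nyquist_hz`, `rate_one_hot`, and `condition_on_rate` are hypothetical, and the one-hot conditioning is only one simple way a predictor could be told the sampling rate. The sketch shows two things: each sampling rate caps the representable bandwidth at half the rate (the Nyquist frequency), and a rate encoding can be appended to a pooled feature vector before it reaches a prediction head.

```python
# Hypothetical illustration only; this is NOT the HighRateMOS implementation.
# Sampling rates named in the paragraph above.
SUPPORTED_RATES_HZ = (16_000, 24_000, 48_000)


def nyquist_hz(sample_rate_hz: int) -> float:
    """Return the highest frequency representable at this sampling rate."""
    return sample_rate_hz / 2


def rate_one_hot(sample_rate_hz: int) -> list[float]:
    """Encode the sampling rate as a one-hot vector over SUPPORTED_RATES_HZ."""
    if sample_rate_hz not in SUPPORTED_RATES_HZ:
        raise ValueError(f"unsupported sampling rate: {sample_rate_hz}")
    return [1.0 if rate == sample_rate_hz else 0.0 for rate in SUPPORTED_RATES_HZ]


def condition_on_rate(features: list[float], sample_rate_hz: int) -> list[float]:
    """Append the rate encoding to a pooled feature vector (assumed design)."""
    return features + rate_one_hot(sample_rate_hz)


if __name__ == "__main__":
    pooled = [0.1, 0.2]  # stand-in for pooled SSL features
    for rate in SUPPORTED_RATES_HZ:
        print(rate, nyquist_hz(rate), condition_on_rate(pooled, rate))
```

With an encoding like this, the same pooled features yield different inputs to the prediction head at 16 kHz and 48 kHz, so the predictor can learn rate-dependent behaviour instead of treating all audio as if it shared one bandwidth.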
This project extends the SHEET toolkit. Before getting started, please follow the installation instructions in the original repository.
This recipe uses data from the AudioMOS Challenge 2025 Track 3. Please follow the challenge's official data usage policy. For convenience, we provide the original download link for the data.