Add Shrutam-2 contrib model: multilingual Indic ASR on Neuron#142
Open
jimburtoft wants to merge 2 commits intoaws-neuron:mainfrom
Open
Add Shrutam-2 contrib model: multilingual Indic ASR on Neuron#142jimburtoft wants to merge 2 commits intoaws-neuron:mainfrom
jimburtoft wants to merge 2 commits intoaws-neuron:mainfrom
Conversation
Three-stage ASR pipeline for 12 Indian languages: - Conformer encoder (607.7M, traced via torch_neuronx.trace) - SMEAR-MoE projector (50.4M, 8 experts, traced) - LLM decoder (1.2B LlamaForCausalLM, NxDI ImageToTextModelWrapper) Validated on trn2.3xlarge (SDK 2.29, LNC=2): - 20.8 audio-seconds/s single-core, 61.1 audio-s/s DP=4 - +1.3% WER delta vs CPU (18/20 FLEURS samples) - 113 tok/s LLM decode, 9ms encoder, 1.6ms SMEAR
The 24-layer Conformer with BF16 auto-cast produces large relative errors on near-zero elements after LayerNorm/attention, even when cosine similarity is >0.99 and WER delta is only +1.3%. Removed unreliable element-wise relative error assertion; cosine similarity is the validated accuracy metric.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note: The below template includes items meant for model contributions only. For other contributions such as bug fixes, features, etc., only fill out the relevant portions of the form.
Description
Three-stage ASR pipeline for Shrutam-2 (bharatgenai/Shrutam-2), supporting 12 Indian languages on Trainium2. The pipeline consists of:
torch_neuronx.trace(), 9ms latency for 10s audiotorch_neuronx.trace(), 1.6ms latencyImageToTextModelWrapperwith audio embedding scatter, 113 tok/sEnd-to-end: 20.8 audio-seconds/s single-core, 61.1 audio-seconds/s with DP=4 on trn2.3xlarge. WER delta vs CPU: +1.3% (18/20 FLEURS samples).
Model Information
Model Name: Shrutam-2 (bharatgenai/Shrutam-2)
Model Architecture: Conformer encoder + SMEAR-MoE projector + LlamaForCausalLM decoder
Purpose: Multilingual automatic speech recognition for 12 Indian languages (Hindi, Tamil, Telugu, Bengali, Kannada, Malayalam, Marathi, Gujarati, Odia, Punjabi, Assamese, Urdu)
Checklist
Required Components
Accuracy Test (ex.
test/integration/test_model.py)README.md with the following sections:
Source Code (
src/)modeling_shrutam2.py: Complete pipeline implementation including Conformer encoder, SMEAR-MoE projector, NxDI LLM wrapper,Shrutam2Pipelineclass, and trace/compile utilitiesOptional Components
Folder Structure
Confirm your contribution follows this structure:
Testing
How did you test this change?
All 9 tests executed on trn2.3xlarge (LNC=2, Neuron SDK 2.29, NxDI 0.9.x):
Test Results:
Compatibility
Tested with:
Additional Information
--auto-cast matmultfor BF16 computerepetition_penalty=1.3is recommended for greedy decoding to avoid hallucination on ~5% of samples that are beam-search-dependentRelated Issues
N/A
vLLM Integration
By submitting this PR, I confirm that: