⚡️ Speed up method ClapTextEmbeddings.create_position_ids_from_input_ids by 22%
#871
📄 22% (0.22x) speedup for `ClapTextEmbeddings.create_position_ids_from_input_ids` in `src/transformers/models/clap/modeling_clap.py`

⏱️ Runtime: 1.14 milliseconds → 933 microseconds (best of 76 runs)

📝 Explanation and details
The optimization refactors the tensor computation pipeline in `create_position_ids_from_input_ids` to eliminate redundant type conversions and streamline operations.

Key optimizations:

- Eliminated redundant type conversions: the original code computed `mask = input_ids.ne(padding_idx).int()` and later used `.type_as(mask)`, creating unnecessary int→float→int conversions. The optimized version keeps `mask` as a boolean tensor throughout, which PyTorch handles efficiently.
- Separated operations for clarity and efficiency: instead of the complex single-line expression `(torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask`, the optimized version breaks this into discrete steps: `torch.cumsum(mask, dim=1)` operates directly on the boolean mask, and `past_key_values_length` is added only when needed (avoiding an unnecessary addition when it is 0).
- Conditional optimization: the `if past_key_values_length != 0` check avoids adding zero in the common case where no past key values exist, reducing computation overhead.

Performance impact: the line profiler shows the original bottleneck was the complex single-line expression (48.4% of total time). The optimized version spreads this across simpler operations, with the most expensive being `torch.cumsum` at 32.8% of total time. This results in a 21% speedup overall.

Test case analysis: the optimization shows consistent improvements across all test scenarios, with particularly strong gains (34–52%) on edge cases like single tokens, empty tensors, and padding-heavy sequences. Large-scale tests show more modest but still meaningful improvements (6–21%), indicating the optimization scales well for production workloads.

The refactoring maintains identical functionality while leveraging PyTorch's efficient boolean operations and eliminating unnecessary type conversions in the tensor computation pipeline.
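Based on the description above, the before/after might look roughly like the sketch below. This is not the exact diff from the PR; the function names here are hypothetical, and the "original" body follows the standard `transformers` implementation of this helper.

```python
import torch


def create_position_ids_original(input_ids, padding_idx, past_key_values_length=0):
    # Original style: int mask plus a redundant .type_as() inside one dense expression.
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    return incremental_indices.long() + padding_idx


def create_position_ids_optimized(input_ids, padding_idx, past_key_values_length=0):
    # Optimized style: keep the mask boolean and split the work into simple steps.
    mask = input_ids.ne(padding_idx)                 # bool tensor, no .int() conversion
    incremental_indices = torch.cumsum(mask, dim=1)  # cumsum on bool promotes to int64
    if past_key_values_length != 0:                  # skip the add when it would be a no-op
        incremental_indices = incremental_indices + past_key_values_length
    incremental_indices = incremental_indices * mask  # zero out padding positions
    return incremental_indices.long() + padding_idx


# Both versions should agree, e.g. with padding_idx = 1:
ids = torch.tensor([[1, 5, 6, 1], [7, 8, 1, 1]])
assert torch.equal(
    create_position_ids_original(ids, padding_idx=1),
    create_position_ids_optimized(ids, padding_idx=1),
)
```

Note that `torch.cumsum` already promotes a boolean input to `int64`, so dropping the `.int()`/`.type_as()` pair changes no results, only the number of intermediate dtype conversions.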
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-ClapTextEmbeddings.create_position_ids_from_input_ids-misdxw9u` and push.