⚡️ Speed up method ProphetNetTokenizer._convert_id_to_token by 233%
#885
📄 233% (2.33x) speedup for `ProphetNetTokenizer._convert_id_to_token` in `src/transformers/models/prophetnet/tokenization_prophetnet.py`
⏱️ Runtime: 2.25 milliseconds → 676 microseconds (best of 200 runs)
📝 Explanation and details
The optimization replaces a dictionary lookup with list indexing for token ID to token conversion, achieving a 233% speedup.
Key optimization: Added a pre-computed list `_ids_to_tokens_list` during initialization that maps token IDs directly to tokens by list index, enabling direct O(1) indexing that skips the hashing a dictionary lookup requires.
What changed:
- A list is populated so that `_ids_to_tokens_list[id] = token`, for fast direct access
Why it's faster:
- List indexing (`list[index]`) is significantly faster than dictionary lookups (`dict.get(key)`) in Python
- The bounds check (`0 <= index < len(list)`) and type check (`isinstance(index, int)`) are very fast operations
Performance characteristics:
The optimization is particularly effective for transformer tokenizers where `_convert_id_to_token` is frequently called during text generation and processing, making the fast path for valid indices highly valuable.
✅ Correctness verification report:
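A minimal sketch of the list-based lookup described in the explanation, checked against a plain dictionary lookup. This is not the PR's actual diff or its generated tests: the class `SketchTokenizer`, the methods `convert_dict` and `convert_list`, the toy vocabulary, and the `[UNK]` fallback are all hypothetical names chosen for illustration. Only `_ids_to_tokens_list` and the two checks come from the description.

```python
import collections


class SketchTokenizer:
    """Hypothetical stand-in for illustration; not the real ProphetNetTokenizer."""

    def __init__(self, vocab, unk_token="[UNK]"):
        self.unk_token = unk_token
        # Reverse mapping id -> token (assumed to mirror the original dict-based lookup).
        self.ids_to_tokens = collections.OrderedDict(
            (idx, tok) for tok, idx in vocab.items()
        )
        # Pre-computed list: position i holds the token for id i, or None for a gap.
        size = max(self.ids_to_tokens, default=-1) + 1
        self._ids_to_tokens_list = [None] * size
        for idx, tok in self.ids_to_tokens.items():
            if 0 <= idx < size:  # skip any negative ids; they stay dict-only
                self._ids_to_tokens_list[idx] = tok

    def convert_dict(self, index):
        # Baseline: dictionary lookup with an unknown-token fallback.
        return self.ids_to_tokens.get(index, self.unk_token)

    def convert_list(self, index):
        # Fast path: type check plus bounds check, then direct list indexing.
        if isinstance(index, int) and 0 <= index < len(self._ids_to_tokens_list):
            token = self._ids_to_tokens_list[index]
            if token is not None:
                return token
        # Slow path: gaps, out-of-range ids, and non-int keys use the dict.
        return self.ids_to_tokens.get(index, self.unk_token)


if __name__ == "__main__":
    # Toy vocabulary with a gap at id 3 (assumption for the example).
    tok = SketchTokenizer({"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 4})
    for i in range(-2, 7):
        assert tok.convert_list(i) == tok.convert_dict(i)
    print([tok.convert_list(i) for i in range(5)])
```

The list is built once at construction time, so its cost is paid up front. Any ID that misses the fast path falls back to the dictionary, which keeps the two methods behaviorally equivalent in this sketch.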
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-ProphetNetTokenizer._convert_id_to_token-miskb7ul` and push.