⚡️ Speed up function text_to_word_sequence by 42%
#165
📄 42% (0.42x) speedup for `text_to_word_sequence` in `keras/src/legacy/preprocessing/text.py`

⏱️ Runtime: 931 microseconds → 655 microseconds (best of 250 runs)

📝 Explanation and details
The optimized code achieves a 42% speedup through three key optimizations:
1. **Conditional filter processing**: The original code always built a translation dictionary and map, even when `filters` was empty. The optimized version adds an `if filters:` check, avoiding unnecessary string translation when no filtering is needed.
2. **Inline dictionary comprehension**: Instead of binding a separate `translate_dict` variable, the optimized code passes the dictionary comprehension directly to `str.maketrans()`, eliminating an intermediate object.
3. **Smart space splitting**: When `split=" "` (the default), the optimized code uses `input_text.split()` instead of `input_text.split(split)`. Python's parameterless `split()` is significantly faster, as it is optimized for whitespace splitting and automatically handles multiple consecutive spaces without needing the list-comprehension filter `[i for i in seq if i]`. A sketch combining all three changes follows this list.
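As a concrete illustration, here is a minimal sketch of a function combining the three changes above, assuming the legacy Keras signature and default `filters` string. It is reconstructed from this description, not the literal diff:

```python
def text_to_word_sequence(
    input_text,
    filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
    lower=True,
    split=" ",
):
    """Convert text to a list of word tokens (sketch of the optimized path)."""
    if lower:
        input_text = input_text.lower()

    # (1) Conditional filter processing: skip translation entirely
    #     when `filters` is empty.
    if filters:
        # (2) Inline dict comprehension: pass the mapping straight to
        #     str.maketrans() instead of binding an intermediate dict.
        input_text = input_text.translate(
            str.maketrans({c: split for c in filters})
        )

    # (3) Smart space splitting: parameterless split() is optimized for
    #     whitespace and drops empty strings, so no list-comprehension
    #     filter is needed on the default path.
    if split == " ":
        return input_text.split()

    seq = input_text.split(split)
    return [i for i in seq if i]
```

One subtle point in this sketch: parameterless `split()` also splits on tabs and newlines, which matches the original behavior whenever the default `filters` string is in effect, since it maps `\t` and `\n` to the split character before splitting.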
**Performance Impact by Test Category:**
- Empty filters (`filters=""`): up to 165% faster by completely skipping translation.
- Default space splitting: faster via the parameterless `split()` and by avoiding the list comprehension.

**Hot Path Relevance:** Based on the function references, `text_to_word_sequence` is called within loops in the `fit_on_texts()` and `texts_to_sequences_generator()` methods of the `Tokenizer` class, and in the `hashing_trick()` function. These are preprocessing steps that process many texts in batches, making this optimization particularly valuable for text-preprocessing pipelines, where it could significantly reduce training and inference time.

The optimization maintains identical functionality while being most effective for the common cases: default parameters with space-separated text, and scenarios with empty filters. A hedged usage example of those two fast paths follows.
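As an illustration of those two cases (the import path below assumes the legacy module location named in this PR; it may differ across Keras versions):

```python
from keras.src.legacy.preprocessing.text import text_to_word_sequence

# Default parameters: punctuation stripped, lowercased, and
# whitespace-split via the fast parameterless split() path.
print(text_to_word_sequence("Hello,  World! Keras is fast."))
# -> ['hello', 'world', 'keras', 'is', 'fast']

# Empty filters: the translation step is skipped entirely,
# which is where the largest measured gains were reported.
print(text_to_word_sequence("keep, punctuation!", filters=""))
# -> ['keep,', 'punctuation!']
```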
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-text_to_word_sequence-mirguop5` and push.