⚡️ Speed up function _excel2num by 16%
#378
Open
📄 16% (0.16x) speedup for `_excel2num` in `pandas/io/excel/_util.py`
⏱️ Runtime: 2.94 milliseconds → 2.53 milliseconds (best of 66 runs)
📝 Explanation and details
The optimized code achieves a 16% speedup by eliminating redundant operations inside the character processing loop and optimizing string validation.
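A minimal sketch of the loop shape this description implies, written for illustration only. It is not the actual diff from this PR. The function name `excel2num_sketch` and the local names `s` and `ord_a` are assumptions, and the exact error message is a guess.

```python
def excel2num_sketch(x: str) -> int:
    """Convert an Excel column name such as "AB" to a 0-based column index.

    Hypothetical illustration of the optimized shape; not the pandas source.
    """
    # Normalize the input once, before the loop.
    s = x.upper().strip()
    # Look up the code point of "A" once and reuse it.
    ord_a = ord("A")

    index = 0
    for c in s:
        # Validate with a chained string comparison instead of ord() checks.
        if not ("A" <= c <= "Z"):
            raise ValueError(f"Invalid column name: {x}")
        # Only one ord() call per character remains.
        index = index * 26 + ord(c) - ord_a + 1
    return index - 1
```

With this sketch, `excel2num_sketch("A")` returns `0` and `excel2num_sketch("AA")` returns `26`, which matches the usual base-26 reading of Excel column letters.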
Key optimizations:
- Moved string preprocessing outside the loop: The original code called `x.upper().strip()` on every loop iteration. The optimized version calls this once and stores the result in `s`, eliminating repeated string method calls.
- Precomputed `ord('A')` values: Instead of calling `ord('A')` and `ord('Z')` multiple times within the loop, these values are computed once and reused, reducing function call overhead.
- Faster character validation: Replaced `cp < ord("A") or cp > ord("Z")` with `not ('A' <= c <= 'Z')`. This avoids calling `ord()` on the character for validation and uses Python's optimized string comparison operators, which are faster for single ASCII characters.
- Reduced `ord()` calls per iteration: The original code called `ord()` three times per character (once for `c`, once for `"A"`, once for `"Z"`). The optimized version calls `ord()` only once per character.

Performance impact by test case:
- Test cases using `"A" * 1000` show dramatic improvements (71-72% faster), indicating the optimizations scale well with input length.

Function usage context:
Based on `function_references`, `_excel2num` is called by `_range2cols`, which processes comma-separated column ranges. This means `_excel2num` can be called multiple times per range specification (e.g., "A:Z,AA:AZ"), making the per-call optimization significant for Excel file processing workflows where column ranges are frequently parsed.

The optimization maintains identical functionality while providing meaningful performance gains, especially for longer column names and batch processing scenarios common in pandas Excel operations.
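To make the call pattern concrete, here is a rough sketch of how a range parser might invoke the converter several times for one specification. The names `range2cols_sketch` and `excel2num_sketch` are hypothetical, and the splitting logic is an assumption based on the "comma-separated column ranges" wording above. It is not copied from `_range2cols`.

```python
def excel2num_sketch(x: str) -> int:
    # Hypothetical converter: Excel column letters to a 0-based index.
    s = x.upper().strip()
    ord_a = ord("A")
    index = 0
    for c in s:
        if not ("A" <= c <= "Z"):
            raise ValueError(f"Invalid column name: {x}")
        index = index * 26 + ord(c) - ord_a + 1
    return index - 1


def range2cols_sketch(areas: str) -> list[int]:
    # Assumed behavior: split on commas, then expand each "start:end" pair
    # into an inclusive run of 0-based column indices.
    cols: list[int] = []
    for part in areas.split(","):
        if ":" in part:
            start, end = part.split(":")
            # Two converter calls per range.
            cols.extend(range(excel2num_sketch(start), excel2num_sketch(end) + 1))
        else:
            # One converter call per single column.
            cols.append(excel2num_sketch(part))
    return cols
```

Under these assumptions, a specification like "A:Z,AA:AZ" triggers four converter calls and expands to 52 column indices, so any per-call saving is paid back once per endpoint.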
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-_excel2num-mihdoy1n` and push.