⚡️ Speed up method EventSource._get_charset by 82%
#105
📄 82% (0.82x) speedup for `EventSource._get_charset` in `skyvern/client/core/http_sse/_api.py`
⏱️ Runtime: 472 microseconds → 260 microseconds (best of 234 runs)
📝 Explanation and details
The optimized code achieves an 82% speedup by replacing regex matching and encode/decode round-trip validation with manual string parsing and `codecs.lookup()`.

Key Optimizations:
- Eliminates expensive regex: The original uses `re.search(r"charset=([^;\s]+)", content_type, re.IGNORECASE)`, which compiles and executes a regex pattern. The optimized version uses simple string operations: `content_type.lower().find("charset=")` followed by manual character-by-character parsing to find the charset value boundary.
- Faster charset validation: Instead of `"test".encode(charset).decode(charset)`, which performs actual string encoding and decoding, the optimized code uses `codecs.lookup(charset)`, which only checks that the charset name exists in Python's codec registry.
- Fast path for common case: When no charset is present (27% of test cases based on profiler data), the optimized version immediately returns "utf-8" after a single `find()` operation, avoiding all regex processing.
- Manual boundary detection: The optimized code walks through characters to find where the charset value ends (at `;`, whitespace, or end of string), which is faster than a regex capture group for parsing this simple.

Performance Impact by Test Case:
The optimization is particularly effective for server-sent events (SSE) parsing where this function may be called frequently during streaming operations, making the cumulative performance gain significant for high-throughput applications.
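The two approaches described above can be sketched side by side. This is an illustrative reconstruction, not the code from the PR: the function names `get_charset_regex` and `get_charset_manual` are hypothetical, both take the `Content-Type` string directly rather than reading it from a response object, and the quote stripping is an assumption about how quoted values such as `charset="utf-8"` are handled.

```python
import codecs
import re


def get_charset_regex(content_type: str) -> str:
    """Regex-based lookup, following the description of the original code."""
    match = re.search(r"charset=([^;\s]+)", content_type, re.IGNORECASE)
    if match:
        # Assumption: surrounding quotes are stripped from the value.
        charset = match.group(1).strip("\"'")
        try:
            # Round-trip a sample string to confirm the codec is usable.
            "test".encode(charset).decode(charset)
            return charset
        except (LookupError, UnicodeError):
            pass
    return "utf-8"


def get_charset_manual(content_type: str) -> str:
    """Manual parsing plus codecs.lookup(), following the optimized description."""
    # Fast path: a single find() when no charset parameter is present.
    # Assumes an ASCII header, so lower() does not shift character offsets.
    idx = content_type.lower().find("charset=")
    if idx == -1:
        return "utf-8"

    # Walk forward until ';', whitespace, or end of string.
    start = idx + len("charset=")
    end = start
    while (
        end < len(content_type)
        and content_type[end] != ";"
        and not content_type[end].isspace()
    ):
        end += 1

    charset = content_type[start:end].strip("\"'")
    if not charset:
        return "utf-8"

    try:
        # Only checks that the name resolves in the codec registry.
        codecs.lookup(charset)
    except LookupError:
        return "utf-8"
    return charset


if __name__ == "__main__":
    for header in (
        "text/event-stream",
        "text/event-stream; charset=ISO-8859-1",
        "text/event-stream; CHARSET=utf-16; foo=bar",
        "text/event-stream; charset=bogus-enc",
    ):
        print(repr(header), get_charset_regex(header), get_charset_manual(header))
```

One behavioral difference worth checking in review: `codecs.lookup()` also accepts non-text codecs such as `hex`, which `str.encode()` rejects with a `LookupError`, so the two validation strategies may disagree on such names.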
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-EventSource._get_charset-miobypa5` and push.