Fix broken unicode escape sequence in JSX text #1754
Draft
Fixes a bug where Unicode characters on standalone lines in JSX text were corrupted and emitted as `\uFFFD` (the Unicode replacement character) instead of their correct escape sequences.

Problem
When JSX contained Unicode characters on separate lines, they were encoded incorrectly:
Before (broken):
After (fixed):
Root Cause
The issue was in the `fixupWhitespaceAndDecodeEntities` function in `internal/transformers/jsxtransforms/jsx.go`. The code recorded the byte index where the last non-whitespace character started and then sliced the string one byte past that index, which only works for single-byte characters. For a multi-byte UTF-8 character, the slice cuts the character apart.

For the 3-byte UTF-8 sequence ⚠ (`E2 9A A0`), only the first byte (`E2`) was kept, producing an invalid UTF-8 sequence that is then replaced with `\uFFFD`.

Solution
Fixed the string slicing logic to handle multi-byte UTF-8 characters by tracking the end byte position (`i + size - 1`) of each non-whitespace character instead of only its start position (`i`). Slicing up to the byte after that position therefore always includes complete UTF-8 sequences.

The change is minimal: only the variable name and its assignment change to make the byte-boundary handling correct.
Testing
Added test coverage in `jsxUnicodeEscapeSequence.tsx` that verifies:

All existing tests continue to pass, confirming no regressions.