@Copilot Copilot AI commented Sep 26, 2025

Fixes a bug where unicode characters on standalone lines in JSX text were being corrupted and output as \uFFFD (the Unicode replacement character) instead of their correct escape sequences.

Problem

When JSX contained unicode characters on separate lines, they were being incorrectly encoded:

export const Component = () => {
    return (<div><span>⚠</span>
        ⚠
    </div>)
}

Before (broken):

return (jsx_runtime_1.jsxs("div", { children: [jsx_runtime_1.jsx("span", { children: "\u26A0" }), "\uFFFD"] }));

After (fixed):

return (jsx_runtime_1.jsxs("div", { children: [jsx_runtime_1.jsx("span", { children: "\u26A0" }), "\u26A0"] }));

Root Cause

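The issue was in the fixupWhitespaceAndDecodeEntities function in internal/transformers/jsxtransforms/jsx.go. The code recorded the byte index at which the last non-whitespace character starts, then sliced the string as if that index were the character's final byte. For single-byte (ASCII) characters the two coincide, but for a multi-byte UTF-8 character the slice ended partway through the character and corrupted it.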

For the 3-byte UTF-8 sequence ⚠ (E2 9A A0), the code was only including the first byte (E2) when slicing, resulting in an invalid UTF-8 sequence that gets replaced with \uFFFD.
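The effect of cutting the sequence after its first byte can be reproduced in isolation. The sketch below is only an illustration: escapeNonASCII is a hypothetical stand-in for the emitter's string escaping, not the tsgo implementation, and it assumes runes in the Basic Multilingual Plane.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeNonASCII is a hypothetical helper (an assumption for illustration,
// not code from tsgo). It writes every non-ASCII rune as \uXXXX and only
// handles runes in the Basic Multilingual Plane.
func escapeNonASCII(s string) string {
	var b strings.Builder
	// Ranging over a string decodes UTF-8; an invalid byte yields U+FFFD.
	for _, r := range s {
		if r < 0x80 {
			b.WriteRune(r)
		} else {
			fmt.Fprintf(&b, "\\u%04X", r)
		}
	}
	return b.String()
}

func main() {
	s := "⚠" // bytes E2 9A A0
	fmt.Println(escapeNonASCII(s))     // \u26A0
	fmt.Println(escapeNonASCII(s[:1])) // \uFFFD
}
```

The second call receives only the byte E2, which is not valid UTF-8 on its own, so decoding produces the replacement character seen in the broken output.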

Solution

Fixed the string slicing logic to properly handle multi-byte UTF-8 characters by tracking the end byte position (i + size - 1) instead of just the start position (i) of non-whitespace characters. This ensures that when slicing the string, we include complete UTF-8 character sequences.

The fix is minimal: it changes only the variable name and its assignment so that the slice ends on a complete character.
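The difference between tracking the start byte and the end byte can be sketched as follows. This is a simplified illustration, not the code in jsx.go: trimLine and its wholeRune flag are hypothetical names, and the real function also handles line joining and entity decoding.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// trimLine is a hypothetical helper, not the function from jsx.go.
// It returns line without leading and trailing whitespace.
// With wholeRune == false it records only the start byte of the last
// non-whitespace rune (the behavior described as buggy); with
// wholeRune == true it records that rune's last byte, i + size - 1.
func trimLine(line string, wholeRune bool) string {
	first, last := -1, -1
	for i := 0; i < len(line); {
		r, size := utf8.DecodeRuneInString(line[i:])
		if !unicode.IsSpace(r) {
			if first < 0 {
				first = i
			}
			if wholeRune {
				last = i + size - 1
			} else {
				last = i
			}
		}
		i += size
	}
	if first < 0 {
		return "" // the line is empty or all whitespace
	}
	return line[first : last+1]
}

func main() {
	line := "        ⚠"
	fmt.Printf("%q\n", trimLine(line, false)) // "\xe2"
	fmt.Printf("%q\n", trimLine(line, true))  // "⚠"
}
```

In this sketch a line that ends in an ASCII character, such as "⚠ ok", comes out intact in both modes, which is consistent with the issue's observation that only a unicode character on a line by itself is affected.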

Testing

Added comprehensive test coverage in jsxUnicodeEscapeSequence.tsx that verifies:

  • Inline unicode characters (already worked)
  • Standalone unicode characters on separate lines (the bug case)
  • Multiple unicode characters (also affected)

All existing tests continue to pass, confirming no regressions.

Original prompt

This section details the original issue you should resolve

<issue_title>broken unicode escape sequence in jsx</issue_title>
<issue_description>## Steps to reproduce

tsconfig.json

{
    "compilerOptions": {
        "outDir": "./dist",
        "declaration": true,
        "moduleResolution": "node",
        "strict": true,
        "skipLibCheck": true,
        "jsx": "react-jsx"
    }
}

repro.tsx

export const Repro = () => {
    return (<div><span>⚠</span>
        ⚠
    </div>)
}

Behavior with typescript@5.8

> tsc
> grep children dist/repro.js
    return ((0, jsx_runtime_1.jsxs)("div", { children: [(0, jsx_runtime_1.jsx)("span", { children: "\u26A0" }), "\u26A0"] }));

Behavior with tsgo

> tsgo
> grep children dist/repro.js
    return (jsx_runtime_1.jsxs("div", { children: [jsx_runtime_1.jsx("span", { children: "\u26A0" }), "\uFFFD"] }));

It's worth noting that the issue happens when the unicode character is on a line by itself, but not when there is other content on the same line.</issue_description>

Comments on the Issue (you are @copilot in this section)

@jakebailey Probably related to #1701 but, why have you done this? 😭
@jakebailey I see, you're saying it's literally https://www.compart.com/en/unicode/U+26A0.
@jakebailey Yeah, it's just a bug then, this is a valid character so not sure where it's gone wrong.
Fixes #1715


Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] broken unicode escape sequence in jsx" to "Fix broken unicode escape sequence in JSX text" Sep 26, 2025
Copilot finished work on behalf of jakebailey September 26, 2025 15:23
@Copilot Copilot AI requested a review from jakebailey September 26, 2025 15:23
Successfully merging this pull request may close these issues.

broken unicode escape sequence in jsx