Fix an un-normalized encoding… #9

savetheclocktower · 2024-05-22T02:30:15Z

…saving the C++ code from receiving a Node-style encoding name that it doesn’t recognize.

Our TextBuffer object acts as a bridge between TextEditor and lower-level APIs like pathwatcher and superstring. And superstring, in turn, presents a JavaScript API but implements lots of its internals in C++ for speed.

But that’s where a bug can slip in, because JavaScript and C++ use different conventional names for common text encodings… including UTF-8, the commonest of all (in our particular domain). Node likes to call it utf8, but the superstring C++ code insists upon the more formal UTF-8. There’s a utility function called normalizeEncoding within the superstring JavaScript that converts Node-style encoding names to C++-style encoding names.

I’m not going to be able to demonstrate exactly how this bug happens, but take a look at superstring/index.js and search for the string encoding. You’ll see that, aside from the aforementioned normalizeEncoding utility, three different methods take an encoding parameter in some form: TextBuffer#load, TextBuffer#save, and TextBuffer#baseTextMatchesFile.

You might also notice that the first two of those methods pass their encoding parameter through the normalizeEncoding function almost immediately — but baseTextMatchesFile doesn’t. After lots of analysis and reflection, I can’t come up with a single reason why it shouldn’t normalize the encoding; it would seem to save me from this exception I kept getting today while working on a new package:

I got to this method from text-buffer code — specifically subscribeToFile, which my new package apparently called in the course of adding an onDidSave callback to a TextEditor.

It feels like this code path should be hit quite often, and I’m sure I’ve gotten this exact error before, but not for ages; so I definitely don’t have reproduction steps here. But I can assert that hot-fixing this in a debugger (via a conditional breakpoint that simply does encoding = normalizeEncoding(encoding) on the appropriate line) prevents this error, and it makes logical sense why.

…saving the C++ code from receiving a Node-style encoding name that it doesn’t recognize.

savetheclocktower · 2024-05-22T02:34:26Z

I am… not surprised that CI feels dusty here.

Fix an un-normalized encoding…

046f1ba

…saving the C++ code from receiving a Node-style encoding name that it doesn’t recognize.

savetheclocktower mentioned this pull request May 22, 2024

Fix CI Again #8

Merged

This was referenced Jul 3, 2024

Vendorize libiconv on macOS #11

Closed

(Test) Fix install name _and_ encoding name conversion issue #12

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix an un-normalized encoding… #9

Fix an un-normalized encoding… #9

Uh oh!

savetheclocktower commented May 22, 2024

Uh oh!

savetheclocktower commented May 22, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Fix an un-normalized encoding… #9

Are you sure you want to change the base?

Fix an un-normalized encoding… #9

Uh oh!

Conversation

savetheclocktower commented May 22, 2024

Uh oh!

savetheclocktower commented May 22, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant