Skip to content

Conversation

@savetheclocktower
Copy link

…saving the C++ code from receiving a Node-style encoding name that it doesn’t recognize.


Our TextBuffer object acts as a bridge between TextEditor and lower-level APIs like pathwatcher and superstring. And superstring, in turn, presents a JavaScript API but implements lots of its internals in C++ for speed.

But that’s where a bug can slip in, because JavaScript and C++ use different conventional names for common text encodings… including UTF-8, the commonest of all (in our particular domain). Node likes to call it utf8, but the superstring C++ code insists upon the more formal UTF-8. There’s a utility function called normalizeEncoding within the superstring JavaScript that converts Node-style encoding names to C++-style encoding names.

I’m not going to be able to demonstrate exactly how this bug happens, but take a look at superstring/index.js and search for the string encoding. You’ll see that, aside from the aforementioned normalizeEncoding utility, three different methods take an encoding parameter in some form: TextBuffer#load, TextBuffer#save, and TextBuffer#baseTextMatchesFile.

You might also notice that the first two of those methods pass their encoding parameter through the normalizeEncoding function almost immediately — but baseTextMatchesFile doesn’t. After lots of analysis and reflection, I can’t come up with a single reason why it shouldn’t normalize the encoding; it would seem to save me from this exception I kept getting today while working on a new package:

Screenshot 2024-05-21 at 7 21 06 PM

I got to this method from text-buffer code — specifically subscribeToFile, which my new package apparently called in the course of adding an onDidSave callback to a TextEditor.

It feels like this code path should be hit quite often, and I’m sure I’ve gotten this exact error before, but not for ages; so I definitely don’t have reproduction steps here. But I can assert that hot-fixing this in a debugger (via a conditional breakpoint that simply does encoding = normalizeEncoding(encoding) on the appropriate line) prevents this error, and it makes logical sense why.

…saving the C++ code from receiving a Node-style encoding name that it doesn’t recognize.
@savetheclocktower
Copy link
Author

I am… not surprised that CI feels dusty here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant