Bad path encoding handling in the CCScript compiler #327

@PhoenixBound

Reported by YasuYasu64 on the forums. Putting it here because the ccscript compiler repo, being a fork made through the GitHub website, doesn't have an issue tracker yet.

Symptoms

With the code page set to 932 and a non-ASCII username in Windows, CoilSnake reportedly can't compile any project that has CCScript files in it. It fails with a CCScriptCompilationError: Couldn't find module '{temp-folder}\coilsnake\assets\mobile-sprout\lib\stdarg.ccs'.

Changing the code page to 65001 via the "Use Unicode UTF-8 for worldwide language support" checkbox in Windows 10 causes an exception in Tk that bubbles up to PyInstaller, where it can't find basic Python libraries for the GUI. That makes the issue difficult to work around.

Unconfirmed ideas about the cause

For the Tk part, there are a few issues on the CPython issue tracker that mention discrepancies between how CPython normally sets up environment variables for the path to TCL stuff vs PyInstaller, but I'm not sure how much those help explain what's going on with the switch to the UTF-8 codepage. The rest of this issue will focus on the part about finding stdarg.ccs.

When compiling a project, CoilSnake invokes the CCScript compiler via the ccscript package and passes it a custom --libs path in CoilSnake's assets package. In PyInstaller/exe builds on Windows, this means that the libs path includes the name of the current user account, since that's where Windows puts temporary files.

The C++ glue code in the ccscript package converts all arguments, including paths, from their internal unspecified Python Unicode encoding into UTF-8 and passes the resulting array of char *s to cccmain. After this... they get put into std::strings, and those strings get passed directly to the constructors of the filesystem library's path objects in many assorted places around the codebase. I assume this works the same way in these unofficial libraries as in C++17 and onward, where a std::string is assumed to contain text in the "native narrow encoding" (i.e., the current Windows code page!) and you need to use a special function (std::filesystem::u8path in C++17) or a char8_t-based string type to get things handled as UTF-8 properly. In other words, the encoding bookkeeping gets lost along the way: the UTF-8 bytes get treated as if they were code page text and converted a second time, which mangles the path.
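To make the distinction concrete, here is a minimal sketch of what "handled as UTF-8 properly" could look like with the standard library. It assumes std::filesystem (C++17 or later) rather than whatever filesystem library ccscript actually uses, and `path_from_utf8` is a made-up helper name, not something in the codebase.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of ccscript): build a path from bytes that
// are known to be UTF-8, independent of the active Windows code page.
inline fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20 and later: a char8_t-based string is always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path is the dedicated UTF-8 entry point (deprecated in C++20).
    return fs::u8path(utf8);
#endif
}

// By contrast, fs::path(utf8) would treat the same bytes as text in the
// native narrow encoding, which on Windows is the current code page.
```

Calling `path_from_utf8(arg)` instead of `fs::path(arg)` at each construction site would avoid the second conversion, though it would have to be done consistently everywhere a path is built.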

I assume the on-demand conversions to fs::path were done out of concern for memory usage or something...? std::filesystem::path and boost::filesystem::path before it probably natively store the path as UTF-16 on Windows. (It looks like there's a default fmt argument to the constructor that might be able to change that? Can't tell.) That said, it's probably easier and less error-prone to convert everything that's treated as a file path to an fs::path as soon as possible, and store those in struct fields and everything. That way the conversion from UTF-8 only needs to be handled correctly once.
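A rough sketch of that "convert once, store paths" idea, again assuming std::filesystem. The `CompileOptions` struct, `parse_args`, and `find_module` are all hypothetical names for illustration; only the `--libs` option comes from how CoilSnake invokes the compiler, and the real argument handling surely looks different.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: decode UTF-8 bytes into a path exactly once.
inline fs::path utf8_to_path(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    return fs::u8path(utf8);
#endif
}

// Hypothetical options struct; the field names are illustrative only.
struct CompileOptions {
    fs::path libs_dir;              // value of --libs
    std::vector<fs::path> sources;  // every other argument
};

// Converts at the boundary. Assumes every argument is UTF-8, which is what
// the glue code is described as passing in.
inline CompileOptions parse_args(const std::vector<std::string>& args) {
    CompileOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "--libs" && i + 1 < args.size()) {
            opts.libs_dir = utf8_to_path(args[++i]);
        } else {
            opts.sources.push_back(utf8_to_path(args[i]));
        }
    }
    return opts;
}

// Later lookups compose paths directly, with no std::string round trip.
inline fs::path find_module(const CompileOptions& opts, const fs::path& name) {
    return opts.libs_dir / name;
}
```

With this shape, the only code that needs to know the arguments are UTF-8 is `parse_args`; everything downstream works with path objects.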
