Bad path encoding handling in the CCScript compiler #327

Reported by [YasuYasu64](https://forum.starmen.net/forum/Community/PKHack/The-Help-and-Troubleshooting-Topic-Mk-III/2348649) on the forums. I'm filing it here because the [ccscript compiler repo](https://github.com/pk-hack/ccscript_legacy) doesn't have an issue tracker yet, since it's a fork made through the GitHub website.

## Symptoms

With the Windows code page set to 932 (Japanese) and a non-ASCII username, CoilSnake reportedly can't compile any project that contains CCScript files. It fails with a CCScriptCompilationError: `Couldn't find module '{temp-folder}\coilsnake\assets\mobile-sprout\lib\stdarg.ccs'`.

Switching the code page to 65001 with the "Use Unicode UTF-8 for worldwide language support" checkbox in Windows 10 causes an exception in Tk that bubbles up to PyInstaller, which then can't find basic Python libraries for the GUI. That makes the issue difficult to work around.

## Unconfirmed ideas about the cause

For the Tk part, there are a few issues on the CPython issue tracker that mention discrepancies between how CPython normally sets up the environment variables pointing at the Tcl/Tk library folders and how PyInstaller does it, but I'm not sure how much they help explain what happens after switching to the UTF-8 code page. The rest of this issue focuses on the problem of finding `stdarg.ccs`.

When compiling a project, CoilSnake invokes the CCScript compiler through the `ccscript` package and [passes it a custom `--libs` path](https://github.com/pk-hack/CoilSnake/blob/d806fa6e094000156184af8cbf8fb973d3dc34a8/coilsnake/ui/common.py#L126) pointing into [CoilSnake's `assets` package](https://github.com/pk-hack/CoilSnake/blob/d806fa6e094000156184af8cbf8fb973d3dc34a8/coilsnake/util/common/assets.py). In PyInstaller (.exe) builds on Windows, that path includes the name of the current user account: the bundled files are unpacked into a temporary folder, and Windows keeps temporary files under the user's profile directory.
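A rough Python sketch of why the libs path ends up containing the username. It assumes a PyInstaller one-file build, where the bootloader unpacks the bundle into a `_MEI`-prefixed folder inside the temp directory and exposes that folder as `sys._MEIPASS`. The helper names (`bundle_root`, `ccscript_libs_path`) are hypothetical, the folder layout is copied from the error message above, and CoilSnake's `assets.py` may locate the folder differently.

```python
import sys
import tempfile
from pathlib import PureWindowsPath


def bundle_root() -> str:
    """Return the folder that bundled data files are read from.

    In a PyInstaller build, sys._MEIPASS points at the folder the bootloader
    unpacked the bundle into; for a one-file .exe that folder sits inside the
    temp directory, which by default lives under the user's profile folder
    (and therefore contains the username). Outside a frozen build we fall
    back to the temp directory, purely so this sketch runs anywhere.
    """
    return getattr(sys, "_MEIPASS", tempfile.gettempdir())


def ccscript_libs_path(root: str) -> PureWindowsPath:
    # Hypothetical layout taken from the error message:
    # {temp-folder}\coilsnake\assets\mobile-sprout\lib
    return PureWindowsPath(root, "coilsnake", "assets", "mobile-sprout", "lib")


# With a Japanese username, the path handed to ccscript via --libs already
# contains non-ASCII characters before the compiler ever sees it.
example = ccscript_libs_path(r"C:\Users\ユーザー\AppData\Local\Temp\_MEI12345")
```

Here `example` comes out as `C:\Users\ユーザー\AppData\Local\Temp\_MEI12345\coilsnake\assets\mobile-sprout\lib`, so every module lookup under `--libs` goes through a non-ASCII path.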

The [C++ glue code](https://github.com/pk-hack/ccscript_legacy/blob/80a03df13cfe9bd3aab5f0d7d34aad7dd2c7bae0/src/pythonlib.cpp) in `ccscript` converts all arguments, including paths, from Python's internal (unspecified) string representation into UTF-8 and passes the resulting array of `char *`s to `cccmain`. After that, they get [put into `std::string`s](https://github.com/pk-hack/ccscript_legacy/blob/80a03df13cfe9bd3aab5f0d7d34aad7dd2c7bae0/src/ccc.cpp#L151), and those strings are passed directly to the constructors of the filesystem library's path objects in many places around the codebase. I assume the non-standard filesystem library used here behaves the same way as `std::filesystem` in C++17 and onward, where a `std::string` is [assumed to contain text in the "native narrow encoding"](https://en.cppreference.com/w/cpp/filesystem/path/path.html) (i.e., the current Windows code page!), and you need to use [a special function](https://en.cppreference.com/w/cpp/filesystem/path/u8path.html) or a `char8_t`-based string type to get the text handled as UTF-8. In other words, the encoding information gets lost along the way: the UTF-8 bytes end up being reinterpreted as code page 932 text when they're converted to the native path format, which mangles every non-ASCII character in the path.
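The suspected mix-up can be simulated in plain Python. This is a model, not the compiler's actual code: encoding the path to UTF-8 stands in for the glue code, and decoding those bytes with Python's `cp932` codec stands in for a `path(std::string)` constructor converting from the current code page. Both helper names are hypothetical, and Windows' own conversion may substitute undecodable bytes differently from Python's codec, but the path is mangled either way.

```python
def glue_to_utf8(path: str) -> bytes:
    # Stand-in for pythonlib.cpp: Python str -> UTF-8 bytes, which become the
    # char* arguments (and later std::strings) inside the compiler.
    return path.encode("utf-8")


def narrow_to_native(narrow: bytes, codepage: str) -> str:
    # Stand-in for constructing a path from a std::string on Windows: the
    # bytes are decoded with the *current code page*, not as UTF-8.
    # Undecodable bytes are replaced rather than raising an error.
    return narrow.decode(codepage, errors="replace")


lib_file = r"C:\Users\ユーザー\AppData\Local\Temp\_MEI12345\coilsnake\assets\mobile-sprout\lib\stdarg.ccs"
utf8_bytes = glue_to_utf8(lib_file)

# Code page 932: the UTF-8 bytes are reinterpreted as CP932 text -> mojibake,
# so the file the compiler goes looking for doesn't exist.
mangled = narrow_to_native(utf8_bytes, "cp932")

# Code page 65001: the "native narrow encoding" is UTF-8, so the path survives.
intact = narrow_to_native(utf8_bytes, "utf-8")
```

ASCII-only paths come through unchanged under both code pages, which fits the bug only showing up with a non-ASCII username. With code page 65001 the two encodings agree, which is presumably why switching to UTF-8 would sidestep this half of the problem if it didn't break Tk.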

I assume the on-demand conversions to `fs::path` were done out of concern for memory usage or something...? On Windows, `std::filesystem::path` (and `boost::filesystem::path` before it) stores the path natively as a UTF-16 `wchar_t` string. (The constructor's default `fmt` argument only controls whether the string is parsed in native or generic format, not its character encoding, so it won't help here.) That said, it's probably easier and less error-prone to convert everything that's treated as a file path to an `fs::path` as soon as possible and store those in struct fields and everywhere else. That way the conversion from UTF-8 only needs to be handled correctly once.
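A small Python model of that design, with hypothetical names (`CompilerOptions`, `parse_args`, `module_path`) and a made-up argument layout: the raw UTF-8 arguments are decoded exactly once at the boundary, every path-like option is stored as a path object, and derived paths such as `stdarg.ccs` are built by joining path objects instead of re-decoding strings. In the actual C++ code the boundary conversion would presumably be `fs::u8path(arg)` in C++17 or a `char8_t`-based string in C++20.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath
from typing import List, Optional


@dataclass
class CompilerOptions:
    # Hypothetical option struct: path-valued fields hold path objects,
    # never raw byte strings whose encoding has to be remembered elsewhere.
    libs: Optional[PureWindowsPath] = None
    inputs: List[PureWindowsPath] = field(default_factory=list)


def parse_args(argv: List[bytes]) -> CompilerOptions:
    """Decode UTF-8 arguments once and turn path-like ones into path objects."""
    opts = CompilerOptions()
    # The one place the encoding is handled: the glue guarantees UTF-8.
    args = iter([raw.decode("utf-8") for raw in argv])
    for arg in args:
        if arg == "--libs":
            value = next(args, None)
            if value is None:
                raise ValueError("--libs needs a directory argument")
            opts.libs = PureWindowsPath(value)
        else:
            opts.inputs.append(PureWindowsPath(arg))
    return opts


def module_path(opts: CompilerOptions, name: str) -> PureWindowsPath:
    # Later lookups join path objects; no string is decoded a second time.
    if opts.libs is None:
        raise ValueError("no --libs directory configured")
    return opts.libs / f"{name}.ccs"
```

The point of the design is that nothing downstream of `parse_args` ever has to know which encoding a string was in.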
