Bad UTF-8 filename encoding

I have some files with badly encoded names. I think they came from a FAT32 pendrive using the Latin-1 encoding, and they are now on an ext4 filesystem. When I open the directory in rover, those files show up as an empty line; not even the size is shown.

I can replicate the behaviour by creating a bogus file (I'm using a Spanish locale, but any UTF-8 locale should work):
```
$ touch "$(printf 'bad\355char')"
$ ls
'bad'$'\355''char'
$ locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=
$ rover

```
The problem is that the `\355` byte (í in Latin-1) is 0xED, which in UTF-8 is the lead byte of a three-byte sequence. Since the bytes that follow it are not continuation bytes, the name is not a valid UTF-8 string.
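To make the byte-level reasoning concrete, here is a minimal sketch of a structural UTF-8 check. The names `utf8_seq_len` and `utf8_is_well_formed` are made up for illustration and are not part of rover; the check only looks at lead and continuation bytes and does not reject overlong forms or surrogates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Expected length of a UTF-8 sequence given its lead byte, or 0 if the
 * byte cannot start a sequence (continuation byte or invalid lead). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                   /* 0xxxxxxx: ASCII      */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  /* 110xxxxx             */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  /* 1110xxxx (0xED here) */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  /* 11110xxx             */
    return 0;
}

/* Structural check: each lead byte must be followed by the right number
 * of 10xxxxxx continuation bytes. */
bool utf8_is_well_formed(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        size_t len = utf8_seq_len(*p);
        if (len == 0)
            return false;
        for (size_t i = 1; i < len; i++) {
            /* A NUL terminator also fails this test, so we never read
             * past the end of the string. */
            if ((p[i] & 0xC0) != 0x80)
                return false;
        }
        p += len;
    }
    return true;
}
```

With these helpers, `utf8_seq_len(0xED)` is 3, and `"bad\355char"` is rejected because `c` (0x63) is not a continuation byte, while the same name with í encoded as `\303\255` passes.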

The functions [mbstowcs()](https://github.com/lecram/rover/blob/e6fea6580e309f2bf98fcd74156c5d4d3e0a5f9d/rover.c#L500) and [swprintf()](https://github.com/lecram/rover/blob/e6fea6580e309f2bf98fcd74156c5d4d3e0a5f9d/rover.c#L504) are failing silently, returning -1, as they cannot convert the string. So nothing gets copied to the `WBUF` buffer, and the row remains empty.
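As a rough illustration of what checking the return value could look like, the sketch below wraps `mbstowcs()` in a hypothetical helper, `name_to_wide` (not a function in rover). It assumes the caller passes the buffer capacity in wide characters, and it clears the buffer on failure so no leftover text from an earlier call can be displayed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte file name into wbuf, which has
 * room for cap wide characters. Returns the number of wide characters
 * written (excluding the terminator), or -1 if the name is not valid in
 * the current locale or cap is 0. */
long name_to_wide(const char *name, wchar_t *wbuf, size_t cap)
{
    size_t n;

    if (cap == 0)
        return -1;

    /* Leave one slot free so the result can always be terminated. */
    n = mbstowcs(wbuf, name, cap - 1);
    if (n == (size_t)-1) {
        wbuf[0] = L'\0';   /* do not keep stale or partial content */
        return -1;
    }
    wbuf[n] = L'\0';
    return (long)n;
}
```

A caller could then test for `-1` and decide what to print instead of rendering an empty row.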

If you create a bogus directory, the behaviour is even more interesting. `WBUF` still holds whatever was written to it last, so the entry appears to be named after the CWD or the previous directory.

```
$ mkdir bad-$'\355'
$ rover
```

I was thinking about how to solve the issue, perhaps with a workaround like the one the ls(1) program [uses](http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c#n4317), replacing the spurious character with a `?` symbol or similar. Deletion and other operations work fine.

Bad UTF-8 filename encoding #30
