Replacement character handling is inconsistent/incomplete #42

The following encodings have decoders that can emit `#\Replacement_Character` even though their encoders don't accept it: `:cp1251`, `:iso-8859-3`, `:iso-8859-6`, `:iso-8859-7`, `:iso-8859-8`, `:iso-8859-11`. `:ebcdic-international` has a similar issue, but with `#\U+FFFF` instead. `:ebcdic-us` seems to substitute various Latin-1 code points such as the private use characters, but from what little I know about EBCDIC, that might actually be the correct behavior.

I would expect `octets-to-string` output to be valid input to `string-to-octets`, even if chaining the two need not reproduce the original bytes. It's not quite clear what the behavior should be, because the only other encodings in babel that run into this edge case (`:cp1252`, `:gbk`, `:eucjp`, `:cp932`) lack error checks for it entirely. I actually have a patch more or less prepared for that already, but it should be consistent with the rest.
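To make the expectation concrete, here is a minimal sketch of a round-trip check. It assumes babel is already loaded; `round-trip-p` is a hypothetical helper, not part of babel, and `#x98` is used only because that byte is unassigned in Windows-1251. I have not pinned down the exact condition type the encoders signal, so the sketch catches any `error`.

```lisp
;; Hypothetical helper for illustration; not part of babel.
;; Assumes babel has been loaded beforehand.
(defun round-trip-p (octets encoding)
  "Decode OCTETS leniently with ENCODING, then re-encode strictly.
Return T if re-encoding succeeds, NIL if it signals an error."
  (let* ((vector (make-array (length octets)
                             :element-type '(unsigned-byte 8)
                             :initial-contents octets))
         ;; Lenient decode: invalid bytes are replaced, not signalled.
         (string (babel:octets-to-string vector
                                         :encoding encoding
                                         :errorp nil)))
    (handler-case
        (progn
          ;; Strict encode: this is the step expected to accept
          ;; whatever the decoder produced.
          (babel:string-to-octets string :encoding encoding :errorp t)
          t)
      ;; Deliberately broad, since the exact condition is an assumption.
      (error () nil))))

;; #x98 is unassigned in Windows-1251.
(round-trip-p '(#x98) :cp1251)
```

If the behavior is as described above, this call would return `NIL` for the listed encodings, whereas I would expect it to return `T`.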

In my opinion, signalling an error is the right thing to do when `errorp` is set; otherwise, the ASCII substitution byte (which seems to be available in all supported encodings) could be used. `decoding-error` conveniently does this out of the box.
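The proposed policy can be sketched in plain Common Lisp, independent of babel's internals. Everything here is hypothetical: the condition `unencodable-character`, the function `encode-or-substitute`, and the hash-table representation of an encoder are made up for illustration, and I am assuming "the ASCII substitution byte" means SUB (`#x1A`).

```lisp
;; Hypothetical sketch of the proposed policy; none of these names
;; exist in babel, and the hash table stands in for a real encoder.
(define-condition unencodable-character (error)
  ((code :initarg :code :reader unencodable-character-code))
  (:report (lambda (condition stream)
             (format stream "Cannot encode code point U+~4,'0X."
                     (unencodable-character-code condition)))))

(defun encode-or-substitute (code table errorp)
  "Return the octet that TABLE maps CODE to.
If CODE has no mapping, signal UNENCODABLE-CHARACTER when ERRORP is
true, otherwise return the ASCII SUB byte (assumed to be #x1A)."
  (or (gethash code table)
      (if errorp
          (error 'unencodable-character :code code)
          #x1A)))

;; Usage: a toy table that can only encode the letter A.
(let ((table (make-hash-table)))
  (setf (gethash #x41 table) #x41)
  (list (encode-or-substitute #x41 table t)       ; mapped code point
        (encode-or-substitute #xFFFD table nil))) ; falls back to SUB
```

The point of the sketch is only that the replacement character takes the same path as any other unencodable code point, so strict and lenient callers both get predictable behavior.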

Note that this overlaps heavily with the first half of #41. Both have the same underlying issue.
