Description
The following encodings have decoders that can emit #\Replacement_Character even though their encoders don't accept it: :cp1251, :iso-8859-3, :iso-8859-6, :iso-8859-7, :iso-8859-8, :iso-8859-11. :ebcdic-international has a similar issue, but with #\U+FFFF instead. :ebcdic-us seems to substitute various Latin-1 code points such as the private use characters, but from what little I know about EBCDIC, that might actually be the correct behavior.
I would expect octets-to-string output to be valid input to string-to-octets, even if chaining the two need not result in the same bytes. It's not quite clear what the behavior should be, because the only other encodings in babel that run into this edge case (:cp1252, :gbk, :eucjp, :cp932) lack error checks for it entirely. I actually have a patch more or less prepared for those already, but it should be consistent with the rest.
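A minimal sketch for reproducing the round trip described above, assuming Quicklisp is available to load babel. The helper name `round-trip` is hypothetical, and #x98 is used only because it is unassigned in Windows-1251; other encodings from the list would need a different undefined octet. No particular result is claimed here, since the point is to observe what the decoder and encoder actually do.

```lisp
;; Assumption: Quicklisp is installed; load babel some other way if not.
(ql:quickload :babel)

;; Hypothetical helper, not part of babel.
(defun round-trip (octet encoding)
  "Decode a single OCTET using ENCODING, then try to encode the result again.
Returns a plist with the decoded string and re-encoded octets, or the
condition signalled by either step."
  (handler-case
      (let* ((octets (make-array 1 :element-type '(unsigned-byte 8)
                                   :initial-element octet))
             (string (babel:octets-to-string octets :encoding encoding)))
        (list :decoded string
              :re-encoded (babel:string-to-octets string :encoding encoding)))
    (error (condition)
      (list :error condition))))

;; #x98 has no assigned character in Windows-1251.
(print (round-trip #x98 :cp1251))
```

Evaluating the forms one at a time at the REPL (or via `load`) avoids reading the `babel:` symbols before the package exists.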
In my opinion, signalling an error is the right thing to do when errorp is set; otherwise, the ASCII substitution byte (which seems to be available in all supported encodings) could be used. decoding-error conveniently does this out of the box.
Note that this overlaps heavily with the first half of #41. Both have the same underlying issue.