Fixed national character encoding. by sp9usb · Pull Request #104 · smiley22/S22.Imap

sp9usb · 2015-01-30T22:41:06Z

No description provided.

NiKiZe · 2015-01-30T23:02:45Z

This is not a real fix, just an "it works in more cases" fix (if the encoding is 8-bit Shift-JIS, for example, this would fail).
It is wrong to make an assumption about the encoding.
The real fix would be to handle everything internally as bytes, determine the encoding of each part of the message itself, and then decode to strings with the correct encoding (decoding headers the same way as the main body, if the encoding is defined there as an 8-bit variant).
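A minimal sketch of the "keep bytes, decode per part" idea, in Python rather than C#: the hypothetical helper `decode_part` reads the charset that a part's own Content-Type declares and decodes that part's raw bytes with it, instead of assuming one global encoding. It uses the standard library's `email.message.Message.get_content_charset()`. The fallback to us-ascii when no charset is declared follows the MIME default for text parts and is an assumption about how such a helper would behave.

```python
from email.message import Message


def decode_part(raw: bytes, content_type: str) -> str:
    """Decode one MIME part's raw bytes using the charset its own Content-Type declares.

    Hypothetical helper for illustration; not part of S22.Imap.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None if absent.
    # For text parts without a charset parameter, MIME's default is us-ascii.
    charset = headers.get_content_charset() or "us-ascii"
    # Strict decoding: a wrong or mismatched charset raises instead of silently corrupting text.
    return raw.decode(charset)
```

With this approach a Shift-JIS part and an ISO-8859-2 part in the same message each decode correctly, which no single hard-coded encoding can achieve.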

An intermediate solution would be to make the "global" S22.Imap encoding a setting that can be changed at runtime. #96 is an example of this, but that PR has several other changes that make it "invalid".
I'm strongly against changing ASCII to some other hard-coded value.
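The intermediate solution can be sketched as a settings object that is read at decode time, so changing it at runtime affects subsequent decodes. `DecoderSettings`, `SETTINGS`, and `decode_header_value` are hypothetical names, not S22.Imap's API; the us-ascii default mirrors the ASCII behaviour discussed in this PR, and `errors="replace"` is an assumed choice so that undecodable bytes show up as U+FFFD rather than raising.

```python
from dataclasses import dataclass


@dataclass
class DecoderSettings:
    """Hypothetical runtime-configurable setting; not part of S22.Imap's actual API."""
    header_charset: str = "us-ascii"  # default matches the existing ASCII behaviour


SETTINGS = DecoderSettings()


def decode_header_value(raw: bytes) -> str:
    # The setting is looked up on every call, so a change made at runtime
    # takes effect immediately without rebuilding anything.
    return raw.decode(SETTINGS.header_charset, errors="replace")
```

Usage: an application serving mostly Cyrillic mail could set `SETTINGS.header_charset = "windows-1251"` at startup, while everyone else keeps the ASCII default.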

Some of this is discussed in great detail in #47

jstedfast · 2015-03-04T15:27:03Z

For what it's worth (and I don't mean to beat a dead horse), MimeKit handles undeclared 8-bit text in headers in what I would describe as probably the only truly sane way possible.

MimeKit's parser optionally takes a ParserOptions instance, which provides various configuration options for the parser. One of these is a CharsetEncoding option, which is used as a fallback charset when the parser encounters undeclared 8-bit text in headers.

The process goes something like this: the parser tries to convert the 8-bit header value[1] into a System.String using UTF-8. If that fails, the parser tries the charset provided in the ParserOptions. If that also fails, it falls back to ISO-8859-1.
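The fallback order described above can be sketched in Python as follows. This is not MimeKit's implementation, only the same three-step order; `decode_8bit_header` and its `fallback_charset` parameter are hypothetical stand-ins for the parser and `ParserOptions.CharsetEncoding`.

```python
def decode_8bit_header(raw: bytes, fallback_charset: str) -> str:
    """Decode undeclared 8-bit header bytes: UTF-8, then the user's charset, then ISO-8859-1.

    A sketch of the order described above, not MimeKit's implementation.
    """
    for charset in ("utf-8", fallback_charset):
        try:
            # Strict decoding: invalid byte sequences raise instead of being replaced.
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 maps every byte to a code point, so this last step cannot fail.
    return raw.decode("iso-8859-1")
```

For example, KOI8-R bytes fail as UTF-8 and are then decoded with a `koi8_r` fallback, while bytes that fail both earlier steps still come back as ISO-8859-1 text.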

The reason for this order of preference is that the latest email specifications (RFC 6532) allow UTF-8 encoded email headers, so going forward it's probably reasonable to try UTF-8 first, since that is an accepted standard. Even if it weren't, UTF-8 would still be a good charset to try first because it is quite common (largely because many systems these days use UTF-8 as their default locale charset). The user-supplied charset is tried second rather than first because the user may well select ISO-8859-1 (perhaps because it is their locale charset, or because they live in a Western country where Latin-1 is the most common). ISO-8859-1 converts any sequence of 8-bit bytes cleanly, whether or not the text really is ISO-8859-1, so it must ONLY be tried after the stricter charsets have had their chance (which is also why it is always the last charset tried).
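Why ISO-8859-1 has to come last can be shown with a small, hypothetical `first_successful` helper that returns the first charset in a list that decodes without error. Because ISO-8859-1 assigns a character to every byte value, putting it first means it always "succeeds", and UTF-8 text is silently turned into mojibake.

```python
def first_successful(raw: bytes, charsets: list[str]) -> tuple[str, str]:
    """Return (charset, text) for the first charset that decodes raw without error.

    Hypothetical helper used only to show why the order of attempts matters.
    """
    for charset in charsets:
        try:
            return charset, raw.decode(charset)
        except UnicodeDecodeError:
            pass
    raise ValueError("no charset in the list could decode the input")
```

With `"żółw".encode("utf-8")`, the order `["iso-8859-1", "utf-8"]` reports ISO-8859-1 and garbled text, while `["utf-8", "iso-8859-1"]` returns the correct string.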

Now, even if none of those three charsets is the correct one (e.g. the actual charset is Big5 but the user set ParserOptions.CharsetEncoding to, say, koi8-r), the user still has the option of retrying the conversion after the parser has finished parsing the message: locate the header in the MimeMessage.Headers list and call Header.GetValue (Encoding encoding), passing in some other charset encoding, as many times as needed until the result is satisfactory.
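The retry-after-parsing idea depends on the header keeping its raw bytes. A minimal Python sketch, assuming a hypothetical `RawHeader` type (loosely inspired by `Header.GetValue (Encoding encoding)`, but not MimeKit's API), covers the Big5 case above: the first guess is wrong, and decoding again with the right charset still works because nothing was thrown away.

```python
from dataclasses import dataclass


@dataclass
class RawHeader:
    """Hypothetical header type that keeps the undecoded bytes around.

    Loosely modelled on the idea behind MimeKit's Header.GetValue(Encoding);
    this is not MimeKit's API.
    """
    field: str
    raw_value: bytes

    def get_value(self, charset: str = "iso-8859-1") -> str:
        # The raw bytes are never discarded, so the caller can retry
        # with as many charsets as they like.
        return self.raw_value.decode(charset)
```

Usage: `RawHeader("Subject", "中文".encode("big5")).get_value()` yields mojibake, and calling `get_value("big5")` on the same object returns the original text.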

There's actually finer granularity than this, since in an address header (such as To, Cc, etc.) the name string of each email address can be in a different charset (it's convoluted, I know, but I think we've all seen how convoluted email software out there can be).
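Per-address decoding can be sketched by applying the same UTF-8, user charset, ISO-8859-1 chain to each display name separately rather than to the whole header. `parse_address_list` is hypothetical and deliberately naive: it assumes no commas or `<` inside display names, which real address parsing must handle.

```python
def decode_8bit_header(raw: bytes, fallback_charset: str) -> str:
    """UTF-8, then the user's charset, then ISO-8859-1 (same order as described above)."""
    for charset in ("utf-8", fallback_charset):
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    return raw.decode("iso-8859-1")  # cannot fail: every byte maps to a code point


def parse_address_list(raw: bytes, fallback_charset: str) -> list[tuple[str, str]]:
    """Decode each display name on its own, so names in different charsets can coexist.

    Naive hypothetical parser: assumes no commas or '<' inside display names.
    """
    result = []
    for entry in raw.split(b","):
        name_part, _, addr_part = entry.partition(b"<")
        # Decode this name independently of every other name in the header.
        name = decode_8bit_header(name_part.strip().strip(b'"'), fallback_charset)
        addr = addr_part.rstrip().rstrip(b">").decode("ascii")
        result.append((name, addr))
    return result
```

If a To header mixes a UTF-8 name with a KOI8-R name, decoding the whole header at once fails UTF-8 and garbles the UTF-8 name, while decoding per name recovers both.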

Fixed national character encoding.

49afdfb

NiKiZe mentioned this pull request Mar 6, 2015

Change Encoding to Windows-1251 #109

Open

NiKiZe mentioned this pull request May 29, 2015

Bad encoding with gmail #115

Open

sp9usb closed this Mar 19, 2021

sp9usb wants to merge 1 commit into smiley22:master from sp9usb:master


4 participants