.@ Tony Finch – blog


I’ve started paying attention to the effort to specify fully internationalized email. The main thing that is not yet i14ed is email addresses. There is now a spec for i14ed domain names, so you can have éxample.com (with the accent) if you want, but you can’t yet use it in an email address - and even if you could, you still couldn’t have an i14ed local part, which is a pain if you are Chinese and want your name to appear in your email address.
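To make the domain-name half of that concrete, here is a minimal sketch using the `idna` codec built into Python, which implements the older IDNA 2003 rules. The helper names `to_ascii_domain` and `to_unicode_domain` are my own, for illustration only. Nothing comparable exists yet for the local part, which is the gap this post is about.

```python
def to_ascii_domain(domain: str) -> str:
    """Return the ASCII (punycode-based) form of an internationalized domain."""
    # The built-in "idna" codec encodes each non-ASCII label with an
    # "xn--" prefix and leaves plain ASCII labels unchanged.
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Reverse the mapping: turn "xn--" labels back into Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


print(to_ascii_domain("éxample.com"))           # xn--xample-9ua.com
print(to_unicode_domain("xn--xample-9ua.com"))  # éxample.com
```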

Unfortunately, fixing this limitation requires upgrading all of the email infrastructure. Fortunately, this means we can also fix our failure to move to 8 bit email the last time we upgraded the infrastructure, for MIME. At the moment, if you send a message with non-ASCII characters in its subject or body, it gets encoded in an inefficient and ugly way in order to remain compatible with the mentality of American computing in the 1970s. Email i18n will at last fix that.

The key question is how to manage the transition gracefully and maintain backwards-compatibility with what’s currently deployed. The MIME answer was to upgrade user software but leave the transport as it was. MIME messages are mostly comprehensible to users of old MUAs, and in theory the transport doesn’t have to care about message content. The latter assumption no longer holds: all kinds of email software now have to understand MIME.

The email address i18n (EAI) answer is to downgrade an i14ed message when it has to be transported to a host running old code. There’s some precedent for this in the SMTP extensions for 8 bit MIME and binary MIME; what EAI adds to these, as well as 8 bit headers, is email address downgrading.

At the moment the draft specification defines two extra parameters that are sent with an i14ed address. The badly-named ATOMIC parameter has a value which can be y or n, to indicate whether the i14ed address can be algorithmically downgraded - something like the translation from an i14ed domain name into punycode. The optional ALT-ADDRESS parameter has a value which is a trad-ascii address which can be used in place of the i14ed address. Yes, these parameters have overlapping functionality: an i14ed address with an alt-address and with atomic=y can be downgraded in two different ways. Ugh.
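The overlap is easy to see if you enumerate the downgrade routes available for one recipient. This is only a sketch of the two parameters as I have described them above, not of the draft's wire syntax: the class `I18nRecipient`, the function `downgrade_options` and the example addresses are all made up for illustration, and I am assuming that atomic=y means an algorithmic downgrade is allowed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class I18nRecipient:
    """Hypothetical model of an i14ed address plus its two draft parameters."""
    address: str                       # the UTF-8 address itself
    atomic: str = "n"                  # "y" or "n" (assumed: "y" allows algorithmic downgrade)
    alt_address: Optional[str] = None  # optional plain-ASCII replacement address


def downgrade_options(rcpt: I18nRecipient) -> List[str]:
    """List the ways this address could be downgraded for an old host."""
    options = []
    if rcpt.atomic == "y":
        options.append("algorithmic")
    if rcpt.alt_address is not None:
        options.append("alt-address")
    return options


both = I18nRecipient("用户@éxample.com", atomic="y", alt_address="user@example.com")
neither = I18nRecipient("用户@éxample.com")

print(downgrade_options(both))     # two routes, and nothing says which one wins
print(downgrade_options(neither))  # no route, so the message can only bounce
```

Four combinations of the two parameters are possible, and two of them are awkward: both set gives an ambiguous choice, and neither set gives a message that cannot be delivered to an old host at all.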

I haven’t seen a rationale for this - perhaps it is the result of merging two alternative proposals - but apart from being ugly it seems to me that it will be seriously problematic. So I sent the following to the IETF’s EAI mailing list :-

Why not just require that downgrading is always possible? This would make both options unnecessary. I like the idea because it reduces the number of protocol options, and because I can see some awkward interop and usability problems caused by their existence.

I see problems coming from the fact that the correct values for the ALT-ADDRESS and ATOMIC options must be accurately communicated from the recipient to the sender before the message is sent. In the case of a reply, how can senders extract these values from the messages they are replying to?

In other cases, I imagine that it might be possible to get the values from some structured electronic medium, such as an internationalized vCard or an LDAP directory with an internationalized schema or perhaps an extended form of mailto: URI. In many cases, however, the utf8 address will be cut-and-pasted from a document or manually typed in from paper, and there’s not the slightest chance that you can expect users to understand the importance of the IMA metadata or to transcribe it correctly. Furthermore, there’s no way that any automatic system in the sender’s MUA or MSA can correct a mistake.

What happens when there is a mistake? Does bouncing an erroneously downgraded message have the same effect on the sender as bouncing a message because it cannot be downgraded? If a sender’s address book mixes up the recipient’s alt-address with one of the recipient’s unrelated non-utf8 addresses, the incorrect end site may or may not handle the downgraded message sensibly.

Note that if all utf8 addresses are downgradeable, then internationalizing email addresses in vCards or LDAP or URIs etc. is relatively simple: the email address is still just a single field (without new IMA metadata); it just has a more relaxed syntax.