.@ Tony Finch – blog


(This article is a much-expanded version of a comment I wrote months ago on mathew's blog.)

There's a programmers' rule of thumb that timestamps should always be stored in a form that's unambiguously inter-convertible with UTC, or some reasonable approximation such as POSIX time_t. In particular, you should never store local time without also storing its timezone, and you should represent timezones as UTC offsets instead of using a familiar but ambiguous abbreviation. For textual representations the right answer is usually ISO 8601 / RFC 3339.

This rule of thumb is good if you are storing the times of events that happened in the past, such as in logs or in message headers. However it isn't good for events that happen in the future, when those events have any bearing on time as used by people in the outside world. The reason for this is the instability of time zones.

Bad solutions to timezone problems

The problem is particularly clear for repeating events. If you specify an event's time of day using a fixed offset from UTC then it will be an hour wrong for half of the year, because the time zone offset differs between winter and summer time. This is why Unix's cron scheduler works in local time.
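To make the drift concrete, here is a minimal Python sketch. It assumes Python 3.9 or later, with the standard zoneinfo module and a tz database available on the system; the meeting time, the dates and all the names are illustrative only.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
# The offset that was in force when the event was created in winter (GMT).
FIXED_OFFSET = timezone(timedelta(hours=0))

def london_time_of_fixed_offset_event(day):
    """Wall-clock time in London of a 09:00 event pinned to FIXED_OFFSET."""
    pinned = datetime.combine(day, time(9, 0), tzinfo=FIXED_OFFSET)
    return pinned.astimezone(LONDON).time()

winter = london_time_of_fixed_offset_event(date(2024, 1, 15))
summer = london_time_of_fixed_offset_event(date(2024, 7, 15))
print(winter, summer)  # the summer occurrence lands an hour late
```

Storing "09:00, local time" and resolving the offset for each occurrence avoids the problem, which is in effect what cron does.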

The solution chosen for iCalendar is to store the complete timezone data (summer and winter offsets and the changeover schedule) alongside the event. This has two problems. The first is bloat, though iCalendar reduces that by allowing multiple time stamps to refer to the same timezone data. The second is that it isn't robust against changes to a timezone's DST schedule: the copy stored with the event goes stale when the rules change.

An outstanding example of failure caused by storing timestamps in the wrong format was provided by the US DST schedule change in 2007. People running Microsoft Exchange had to run a special tool that scanned the entire database to find timestamps that needed adjusting. If the data model had been designed properly this would not have been necessary.

The underlying error is to do the timezone data lookup too early. As we learned from David Wheeler, the fix is to add a layer of indirection so that the lookup can be delayed. Instead of storing the numerical offset, store a reference to the timezone, e.g. its name from the Olson tz database.
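The difference between early and late lookup can be sketched in a few lines of Python. This is a toy model, not how any real calendar stores its data: the two rule tables are hand-written stand-ins for the tz database before and after the 2007 US change, the function names are made up, and the transition time of day is ignored for brevity.

```python
from datetime import date, datetime, timedelta

def nth_sunday(year, month, n):
    """Date of the n-th Sunday of a month (n counts from 1)."""
    first = date(year, month, 1)
    offset = (6 - first.weekday()) % 7  # Monday is 0, Sunday is 6
    return first + timedelta(days=offset + 7 * (n - 1))

# Toy rule tables: zone name -> (first day of DST, first day after DST) in 2007.
OLD_RULES = {"America/New_York": (nth_sunday(2007, 4, 1), date(2007, 10, 28))}
NEW_RULES = {"America/New_York": (nth_sunday(2007, 3, 2), nth_sunday(2007, 11, 1))}

def utc_offset(zone_name, local_day, rules):
    """UTC offset of the zone on a given local day under the given rules."""
    start, end = rules[zone_name]
    return timedelta(hours=-4 if start <= local_day < end else -5)

local = datetime(2007, 3, 20, 10, 0)  # a 10:00 meeting, wall-clock time
zone = "America/New_York"

# Early binding: convert to UTC when the event is created, under the old rules.
stored_utc = local - utc_offset(zone, local.date(), OLD_RULES)
# Displaying the stored UTC time after the rule change gives the wrong hour.
early_display = stored_utc + utc_offset(zone, local.date(), NEW_RULES)

# Late binding: keep (local time, zone name) and resolve only when needed.
late_utc = local - utc_offset(zone, local.date(), NEW_RULES)

print(early_display.time(), local.time(), late_utc.time())
```

The early-bound event shows up at 11:00 once the new rules are installed, while the late-bound event is still at 10:00 and only its UTC equivalent has moved.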

iCalendar TZID values are typically Olson timezone names, or something very similar, but this is not required by the specification. There is still no interoperable standard for timezone names, so iCalendar objects have to include the complete VTIMEZONE data, not just the name. There are plans to fix this, but it's unclear if the standard timezone name registry will be based on the Unicode Common Locale Data Repository, or perhaps the Olson tz database (depending on how its management changes around the retirement of its maintainer, Arthur David Olson) or something else.

Unfortunately timezone names are still not a complete solution. As well as DST schedule changes, there are often timezone boundary changes. If an event is to happen in a place that is affected by a boundary change, and its time is recorded with respect to the place's old timezone, then this time will be wrong after the change. Indiana has provided many instances of this problem, since it straddles a timezone boundary, each county in the state chooses its timezone independently, and every so often some of them will change their mind about whether they want to follow Central Time or Eastern Time or even both (depending on the time of year). The solution to unpredictable timezone boundary changes is, of course, another layer of indirection.

My solution

The time of an event in the future should be recorded in local time coupled with the event's location. The location is used to look up the timezone, and the timezone data determines the UTC offset. (I should probably clarify that Olson tz names are not locations even though they are derived from locations. It's nonsense to say that the Edinburgh Tattoo will occur in Europe/London.)
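As a rough sketch of this data model in Python (3.9 or later, using the standard zoneinfo module): the FutureEvent class, the LOCATION_TO_ZONE table and the date and time of the Tattoo are all hypothetical, and a real location database would be far larger and would be updated when boundaries move.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical location database: location -> tz database name.
LOCATION_TO_ZONE = {
    "Edinburgh Castle Esplanade": "Europe/London",
    "Indianapolis, Indiana": "America/Indiana/Indianapolis",
}

@dataclass(frozen=True)
class FutureEvent:
    title: str
    local_time: datetime  # naive wall-clock time at the location
    location: str         # key into the location database, not a zone name

    def resolve_utc(self):
        """Look up location -> zone -> offset as late as possible."""
        zone = ZoneInfo(LOCATION_TO_ZONE[self.location])
        return self.local_time.replace(tzinfo=zone).astimezone(timezone.utc)

tattoo = FutureEvent(
    "Edinburgh Tattoo",
    datetime(2025, 8, 1, 21, 30),  # illustrative date and time
    "Edinburgh Castle Esplanade",
)
print(tattoo.resolve_utc())
```

Nothing stored in the event mentions an offset or a zone. If either layer of the lookup changes, the stored event is untouched and only the result of resolve_utc() moves.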

Recording the location of an event instead of its timezone makes all sorts of problems simpler, not just problems resulting from timezone mutations. A lot of the benefit comes from just making the data aware of locations and the effect they have on scheduling. Also, perhaps unexpectedly, it allows extremely simple platforms that are unaware of timezones!

Often all that is required of a PDA calendar is to keep a single person's appointments, and the times only need to be meaningful wherever that person is going to be when the event occurs. In this simple case, if all the times are stored in local time at the appointment's location, the PDA does not need to do any timezone translation in order to display them in a useful way: the stored time is good enough. In this scenario, the only timezone manipulation that occurs is the user manually resetting the PDA's clock when a timezone offset change happens (because of travel or because of DST).

It's more usual to want to share calendar events, in which case you soon encounter situations where it's useful to know when events in other timezones will occur according to your own local time. If the software knows your current location, it's a straightforward matter to translate times from place to place. This should not be done significantly earlier than when displaying the time. For example, in a calendaring app based on early binding of events to timezones, the programmer might be tempted to translate an event's time to the user's local timezone when importing the event. This optimization is clearly bogus in a location-based app, because it amounts to moving the event to a location where it is not occurring!

One case where it seems not to make sense to fix an event in a location is when it occurs in more than one place: telephone calls or (worse) conference calls. The thing to do in this situation is to decide on a primary location, such as the location of the organizer, and list the other locations as supplementary. This allows the software to display all the relevant times, so it's immediately apparent what the timing is for each participant and if it happens to be inconvenient for any of them. If politicians happen to muck around with any of the timezones, the organizer is naturally responsible for any adjustments that may be necessary, so it makes sense to keep their view of the event as straightforward as possible.
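A sketch of how the display might work, assuming Python 3.9 or later with zoneinfo. The ZONES table and the function name are made up for illustration; the event is stored once, in the organizer's local time, and the other times are computed only for display.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical location -> tz database name table.
ZONES = {
    "Cambridge, UK": "Europe/London",
    "New York": "America/New_York",
    "Tokyo": "Asia/Tokyo",
}

def times_for_participants(local_time, primary, others):
    """Map each location to the call's wall-clock time at that location."""
    start = local_time.replace(tzinfo=ZoneInfo(ZONES[primary]))
    result = {primary: local_time}
    for place in others:
        there = start.astimezone(ZoneInfo(ZONES[place]))
        result[place] = there.replace(tzinfo=None)  # wall-clock time only
    return result

call = times_for_participants(
    datetime(2024, 6, 3, 14, 0), "Cambridge, UK", ["New York", "Tokyo"]
)
for place, when in call.items():
    print(f"{place}: {when:%a %H:%M}")
```

Here a 14:00 call in Cambridge is mid-morning in New York but late evening in Tokyo, which is exactly the kind of inconvenience the organizer needs to see.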

An interesting case is travel between timezones. It's usual for flight bookings to give departure and arrival times in the local time of the origin and destination locations, which I always find confusing. However if a computer has this information, it can easily display both times in both timezones and work out the total travel time. It would be even nicer if your PDA could use this information to automatically update its idea of your location, and therefore its idea of local time. If it can use this method to work out where you will be in the future it could also display future events with all three relevant times: their native time, the time according to your current timezone, and according to the timezone of your location when the event occurs.
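The arithmetic is simple once each end of the journey is tied to its own zone. A minimal Python sketch, assuming Python 3.9 or later with zoneinfo; the flight times are invented and describe_flight is a hypothetical helper:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def describe_flight(depart_local, depart_zone, arrive_local, arrive_zone):
    """Travel time, plus each end of the journey in the other end's time."""
    depart = depart_local.replace(tzinfo=ZoneInfo(depart_zone))
    arrive = arrive_local.replace(tzinfo=ZoneInfo(arrive_zone))
    return {
        # Subtracting aware datetimes in different zones compares instants.
        "duration": arrive - depart,
        "arrival_in_origin_time":
            arrive.astimezone(depart.tzinfo).replace(tzinfo=None),
        "departure_in_destination_time":
            depart.astimezone(arrive.tzinfo).replace(tzinfo=None),
    }

flight = describe_flight(
    datetime(2024, 6, 1, 10, 0), "Europe/London",
    datetime(2024, 6, 1, 13, 0), "America/New_York",
)
print(flight["duration"])
```

The booking says 10:00 to 13:00, which looks like three hours; the computed duration shows how long you will actually be in the air.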

iCalendar has the concept of a "floating" timestamp, which represents the time in whatever is your current timezone. Floating timestamps cannot be communicated reliably to another person, because the time they represent will be interpreted according to the recipient's location, not yours. One way to make them reliable would be to add another layer of indirection: attach an event to a person and provide a way of looking up the person's location. This is absurdly complicated and an invasion of privacy, and I think it shows that the concept of floating events (occurring wherever you are at the time) is unwise. They do make sense for purely personal events, such as wake-up alarms or medication reminders - you don't want your PDA to tell you to wake up in Cambridge as usual when you are currently in New York. But if an event involves more than one person and its location is in doubt, it's better to give it a provisional location so that changes have to be communicated explicitly.

With a local time plus location model, if a timezone does change, the only events that are affected by the change from the point of view of the software are also affected from the point of view of the human world. For instance, a conference call that spans multiple timezones may need to be rescheduled because its local time may change in some of the participants' locations, and this may lead to scheduling clashes that were not there when the call was originally organized. Events at a single location that occur near the old and new clock changes may need to be rescheduled to cope with inserted or omitted hours - but it's rare to schedule events for the small hours of Sunday morning. The majority of events that fall between the old clock change and the new clock change are not affected: no special bulk data fix-up tools are required.

Complications

The local time plus location model is not quite sufficient as I have described it so far. If an event is scheduled near the time the clocks go back, the local time by itself is not enough to tell if it occurs in the hour before or the hour after the change. The way to fix this is to add a disambiguation flag. However, once again the usual way this is done is wrong. POSIX struct tm, for example, has a tm_isdst flag, which states whether the broken-down time is expressed in summer time or not. The problem here is that this flag can disagree with the timezone data: it's nonsense for the flag to be zero for a time in the middle of summer. It also means correct timestamps get turned into nonsense when politicians mess around with timezones.

The correct solution is for the flag to apply only when the time is ambiguous. At other times the flag must be ignored and should be omitted when generating timestamps. In effect the semantics of the flag are "prefer the earlier/later time if there's more than one". When phrased like this, the flag also works in weird cases. William Willett's original proposal was to phase DST in and out by skipping or repeating 20 minutes on four successive Sundays in April and September. My silly "sunrise time" idea involves changing the clock by a minute or so most nights. The isdst flag doesn't have enough bits to identify which version of local time a timestamp belongs to when there are more than two, but the disambiguation flag never needs to distinguish between more than two.
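Python's datetime has a flag with much the same "prefer the earlier/later time" meaning: the fold attribute introduced by PEP 495, which the standard zoneinfo module honours. A small sketch, assuming Python 3.9 or later and an available tz database; the helper name and the dates are illustrative:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")

def london_offset(naive, prefer_later):
    """UTC offset of a wall-clock time; the flag matters only if ambiguous."""
    fold = 1 if prefer_later else 0
    return naive.replace(tzinfo=LONDON, fold=fold).utcoffset()

# 01:30 happens twice on the night the UK clocks go back.
ambiguous = datetime(2024, 10, 27, 1, 30)
first = london_offset(ambiguous, prefer_later=False)   # still summer time
second = london_offset(ambiguous, prefer_later=True)   # back on GMT

# For an ordinary midsummer time the flag makes no difference.
ordinary = datetime(2024, 7, 1, 12, 0)
same = london_offset(ordinary, False) == london_offset(ordinary, True)

print(first, second, same)
```

Unlike tm_isdst, the flag cannot contradict the timezone data: it says nothing about summer or winter, only which of two candidates to pick.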

The second complication is those odd locations that do not have a single agreed idea of local time. Decades ago in the USA, arguments over DST sometimes meant that different parts of government (federal/state/local) would have different ideas of local time; see David Prerau's book "Saving the Daylight" for examples. At present the most well-known instance of this problem is Xinjiang, the Uighur Autonomous Region of China. Officially, the whole of China is on Beijing time, UTC+8. This is a bit uncomfortable in Xinjiang in the far west of the country, so the independent-minded Uighurs use their own time, UTC+6, even though their Han neighbours use the national time. (See the LA Times for a report on this subject.)

I think the way to accommodate places like Xinjiang is to treat locations as a geo-political concept rather than a purely geographical one. So you might have "Xinjiang (Han)" and "Xinjiang (Uighur)" in your location database. Xinjiang also breaks the Olson/Eggert tz naming scheme, so I think there's unlikely to be any particularly elegant way to handle it.

The third complication is how to specify locations. A significant problem for many calendaring applications is that we lack a database of which locations are in which timezone. This is usually viewed as a user-friendliness problem, but for my proposal it is more fundamental. Furthermore there's an incompatibility of scale between the kind of location that makes sense for a timezone database (e.g. centred around large cities) and the kind of location that makes sense for a meeting (e.g. room C304). I think it's reasonable to make people enter enough detail about locations to fill the gap between the room-level resolution and the city-level resolution. Only one person should ever have to enter the details of a particular location into a system (or set of connected systems) after which everyone else can re-use the data, so the burden should be small.
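One way to bridge room-level and city-level resolution is a chain of locations, each pointing at a coarser parent, with the timezone recorded only at the level where it is known. The sketch below is an assumption about how such a database could be shaped, not a description of any existing system; the Location class, the building name and zone_for are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Location:
    name: str
    parent: Optional["Location"] = None
    zone: Optional[str] = None  # set only where the timezone is known

def zone_for(location):
    """Walk up from a fine-grained location to the nearest known zone."""
    node = location
    while node is not None:
        if node.zone is not None:
            return node.zone
        node = node.parent
    raise LookupError(f"no timezone known for {location.name}")

# Entered once by one person, then re-used by everyone else.
city = Location("Cambridge, UK", zone="Europe/London")
building = Location("Hypothetical Building", parent=city)
room = Location("Room C304", parent=building)

print(zone_for(room))
```

A boundary change then means editing one coarse entry (or re-parenting a few), rather than touching every room or every event.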

This leads to another problem with iCalendar: its idea of a location is both too weak and too complicated. You can specify latitude and longitude, which isn't very practical for software that lacks a built-in map, nor can it be translated into a timezone in Xinjiang. You can also (as well as or instead) specify a human-friendly location as a free text string with an optional URL pointing to a more computer-friendly representation. This latter can be anything, though you hope it is something sensible like a vCard containing a postal address. A vCard address has a fixed format which I suppose can be stretched a bit to cover meeting rooms and other ad-hoc locations in such a way that they can be tied to a timezone, but it isn't designed for the purpose.

Conclusion

Sadly it seems that the world is stuck with iCalendar, and when timezone-related problems occur calendar programmers blame politics or DST, rather than their inadequate data model. What is worse is that it appears to be very unlikely that a properly designed calendar program could interoperate with iCalendar data without loads of ad-hockery and lossage because of the mismatch between the data models.

How annoying.