User:Omegatron/Date formatting

Proposal for ISO 8601 dates to be interpreted as a type of wikicode and formatted as per your user preferences. See my comments on bug 4582.

Adding an HTML-like  tag to the wiki syntax (intended to be placed around almost every date in the encyclopedia) is silly and degrades the editing experience even further for non-technical users. Almost all dates should be formatted, so we should just recognize them in plain text and format them by default. We can use nowiki tags for the exceptions, just like we do for autoformatted URLs, ISBNs, and RFCs.

But this is apparently too difficult, so the ISO-only format is a compromise between explicit HTML tags and parsing of free text.

This would also hopefully stop the endless and pointless discussions about excessive date linking.

Current syntax
Currently, dates are written like this:



which renders (for a reader who is not logged in) as:

"On December 8, 1941, Roosevelt said, "Yesterday, December 7th, 1941 - a date which will live in infamy". On September 11, 2001, the aircraft on this route was hijacked"

or

"On 8 December 1941, Roosevelt said, "Yesterday, December 7th, 1941 - a date which will live in infamy". On 11 September 2001, the aircraft on this route was hijacked"

or other things, depending on preferences.

Proposed syntax
In the proposed system, it would be written as:



which renders as:

"On December 8, 1941, Roosevelt said, "Yesterday, December 7th, 1941 - a date which will live in infamy". On September 11, 2001, the aircraft on this route was hijacked"

or

"On 8 December 1941, Roosevelt said, "Yesterday, December 7th, 1941 - a date which will live in infamy". On 11 September 2001, the aircraft on this route was hijacked"

The proposed syntax would simplify and internationalize the markup, and allow a distinction between formatting and linking of dates. Most dates will be formatted automatically, important dates can still be linked, and dates in quotations can be formatted as per the original, usually without any special syntax. (I had enough trouble thinking of quotes with dates in them; quotes with an ISO date in them should be relatively rare.)

The Manual of Style already recommends against using ISO 8601 dates in prose, so they shouldn't be very common. The only place they are recommended is in long lists and tables for conciseness and ease of comparison. In these rare cases, we can just use nowiki tags.
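A rough sketch of the idea, in Python rather than MediaWiki's own PHP, and only under the assumptions stated here: dates are matched as four-digit year, two-digit month, two-digit day; text inside nowiki tags is passed through verbatim; and the reader's preference is one of two hypothetical style names, `mdy` or `dmy`. The function names are made up for illustration.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Four-digit year, two-digit month and day, not glued to other digits or hyphens.
ISO_DATE = re.compile(r"(?<![\d-])(\d{4})-(\d{2})-(\d{2})(?![\d-])")
# The capturing group makes re.split() keep the nowiki spans in its result.
NOWIKI = re.compile(r"(<nowiki>.*?</nowiki>)", re.DOTALL)


def format_date(year, month, day, style):
    """Format one date as 'mdy' (December 8, 1941) or 'dmy' (8 December 1941)."""
    name = MONTHS[month - 1]
    if style == "mdy":
        return f"{name} {day}, {year}"
    return f"{day} {name} {year}"


def format_iso_dates(text, style="dmy"):
    """Reformat ISO dates in plain text, leaving nowiki content untouched."""
    def replace(match):
        year, month, day = (int(g) for g in match.groups())
        if not (1 <= month <= 12 and 1 <= day <= 31):
            return match.group(0)  # not a plausible date; leave verbatim
        return format_date(year, month, day, style)

    out = []
    for index, part in enumerate(NOWIKI.split(text)):
        if index % 2 == 1:
            # Odd indices are the captured nowiki spans: strip the tags only.
            out.append(part[len("<nowiki>"):-len("</nowiki>")])
        else:
            out.append(ISO_DATE.sub(replace, part))
    return "".join(out)
```

With the `mdy` style, "On 1941-12-08, Roosevelt said" would come out as "On December 8, 1941, Roosevelt said", while a date wrapped in nowiki tags stays as typed.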

Special cases
Dates before the start of the Gregorian calendar and after the year 9999 are only allowed to be represented "by mutual agreement of the partners in information interchange". But since we are both the sender and receiver, we can decide how we want to use these.

Instead of converting early dates to the proleptic Gregorian calendar, we should just treat earlier dates verbatim and determine the calendar from context, so  &rarr; April 5, 1400 and   &rarr; February 13, 234. This is what the current system does:


 * &rarr; 1400-04-05

Years have to be between 0000 and 9999 for the basic range, so  would not be recognized as a date, the same as in the current system:


 * 0234-02-13
 * 234-02-13

Dates after year 9999 (yes, we do have some) could be written according to the standard with a leading plus sign:  &rarr; April 5, 15232. The current system cannot handle dates beyond 9999-12-31:


 * 15232-04-05
 * +15232-04-05
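The rules in this section could be sketched roughly as follows. This is only an illustration under my reading of the text: an unsigned year must have exactly four digits, a year with five or more digits is recognized only with a leading plus sign, and the numbers are printed verbatim with no calendar conversion. The names `parse_special` and `render` are hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Either exactly four unsigned year digits, or '+' followed by five or more.
SPECIAL = re.compile(r"^(?:(\d{4})|\+(\d{5,}))-(\d{2})-(\d{2})$")


def parse_special(token):
    """Return (year, month, day), or None if the token is not recognized."""
    match = SPECIAL.match(token)
    if match is None:
        return None
    year = int(match.group(1) or match.group(2))
    month, day = int(match.group(3)), int(match.group(4))
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return year, month, day


def render(token):
    """Format a recognized date; return anything else unchanged."""
    parsed = parse_special(token)
    if parsed is None:
        return token
    year, month, day = parsed
    # The year is printed as written: no proleptic Gregorian conversion.
    return f"{MONTHS[month - 1]} {day}, {year}"


for example in ["1400-04-05", "0234-02-13", "234-02-13",
                "+15232-04-05", "15232-04-05"]:
    print(example, "->", render(example))
```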

Other possibilities
ISO 8601 provides for all kinds of stuff, like durations, recurring events, etc. Some things might be of use to us:

Ranges
ISO 8601 also handles date ranges, as brought up in the bug report. They are just written with a slash between the two dates, so currently we have:



Instead, we could use:



which could automatically be formatted with the en dash according to user preferences:

"George Washington (February 22, 1732 – December 14, 1799) was a central, critical figure in the founding of the United States"

(Note the footnote in the real article that explains the calendars used for the dates. This is the way it should continue to be done; not with an esoteric conversion.)

The MoS says "If the autoformatting function is used, the opening and closing dates of the range must be given in full and be separated by a spaced en dash." But with ISO dates, you can just use the slash format, and the date would be rendered as per the MoS automatically:


 * &rarr; 5–7 January 1979 or January 5–7, 1979

Of course this does not work with things like circa, birth and death locations (which the MoS says to put outside the date range anyway), etc.

Something to think about.
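To make the range idea concrete, here is a minimal sketch of how a slash-separated pair could be rendered. It assumes both dates are written in full, collapses the range only when month and year match, and otherwise uses a spaced en dash; `format_range` and the `mdy`/`dmy` style names are hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
EN_DASH = "\u2013"

RANGE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})/(\d{4})-(\d{2})-(\d{2})$")


def format_range(token, style="dmy"):
    """Render 'start/end' as a date range; return other input unchanged."""
    match = RANGE.match(token)
    if match is None:
        return token
    y1, m1, d1, y2, m2, d2 = (int(g) for g in match.groups())
    if not (1 <= m1 <= 12 and 1 <= m2 <= 12):
        return token

    if (y1, m1) == (y2, m2):
        # Same month and year: unspaced en dash between the day numbers.
        if style == "mdy":
            return f"{MONTHS[m1 - 1]} {d1}{EN_DASH}{d2}, {y1}"
        return f"{d1}{EN_DASH}{d2} {MONTHS[m1 - 1]} {y1}"

    # Otherwise give both dates in full, separated by a spaced en dash.
    if style == "mdy":
        start = f"{MONTHS[m1 - 1]} {d1}, {y1}"
        end = f"{MONTHS[m2 - 1]} {d2}, {y2}"
    else:
        start = f"{d1} {MONTHS[m1 - 1]} {y1}"
        end = f"{d2} {MONTHS[m2 - 1]} {y2}"
    return f"{start} {EN_DASH} {end}"
```

Under these assumptions, `format_range("1732-02-22/1799-12-14", "mdy")` gives the Washington example above, and `format_range("1979-01-05/1979-01-07")` gives the collapsed form.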

BC / BCE
Could this also put an end to the stupid BC / BCE wars? ISO dates can be BC simply by adding a minus sign ( &rarr; August 23, 1000 BC). Our software already handles this:


 * &rarr; August 23, 1001 BC or 23 August 1001 BC

Could we make the presentation of BC or BCE a user preference? We could extend the syntax for AD/CE as such:


 * &rarr; August 23, 1000 BC or August 23, 1000 BCE
 * &rarr; August 23, 1000
 * &rarr; August 23, 1000 AD or August 23, 1000 CE
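A sketch of what an era preference might look like, with several assumptions of mine: a leading minus sign selects BC/BCE, no sign prints no era, and a leading plus sign on a four-digit year requests an explicit AD/CE. The preference names `christian` and `common` are invented for the example. The year is printed as written; whether -1000 should display as 1000 BC or 1001 BC is left open here.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

SIGNED = re.compile(r"^([+-]?)(\d{4})-(\d{2})-(\d{2})$")

# Hypothetical preference names mapped to (before, after) era labels.
ERA_LABELS = {"christian": ("BC", "AD"), "common": ("BCE", "CE")}


def format_with_era(token, era_pref="christian"):
    """Format a signed ISO date with the reader's preferred era label."""
    match = SIGNED.match(token)
    if match is None:
        return token
    sign = match.group(1)
    year, month, day = (int(match.group(i)) for i in (2, 3, 4))
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return token

    before, after = ERA_LABELS[era_pref]
    text = f"{MONTHS[month - 1]} {day}, {year}"
    if sign == "-":
        return f"{text} {before}"
    if sign == "+":
        return f"{text} {after}"
    return text  # unsigned: no era label
```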

Times
Apparently am/pm time is uncommon outside the US...

Absolute times and user signatures
I had my User preferences set to my local time zone, but I eventually changed it back because discussion posts are always in UTC and it was hard to correlate in my head.

It would be nice if the signature date were saved in the wiki source as "2008-01-13 01:19:39Z" instead, and then auto-formatted according to my preferences and converted to my local time zone.
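As an illustration only, the conversion could look something like this. It assumes the stored form is exactly "YYYY-MM-DD HH:MM:SSZ" and that the reader's preference supplies a fixed UTC offset in hours; the function name and the output layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def localize_signature(stamp, utc_offset_hours, style="dmy"):
    """Convert a stored UTC signature timestamp to the reader's local time."""
    # The trailing 'Z' is matched literally and means the time is in UTC.
    utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%SZ")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    local = utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))

    month = MONTHS[local.month - 1]
    if style == "mdy":
        date_text = f"{month} {local.day}, {local.year}"
    else:
        date_text = f"{local.day} {month} {local.year}"
    return f"{local:%H:%M}, {date_text}"


# A reader five hours behind UTC sees the previous calendar day.
print(localize_signature("2008-01-13 01:19:39Z", -5))
```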

Proposal for defaults
This proposal is really just about the wiki syntax, and not the default formatting for unregistered users, but how about this for defaults?


 * 1) ISO-formatted dates in the wiki syntax are interpreted as dates that should be localized, unless they have a nowiki tag, as explained on this page.
 * 2) The server will format these dates for logged-in users as per their user preferences, just as it currently does with linked dates.
 * 3) Default localization format for all unregistered users is "28 May 1996" as per RFC 2822.
 * 4) But, instances like this that are meant to be localized will also be given a class by the server, like   or something, so that third-party tools can recognize from the HTML which dates should be left verbatim and which can be localized.
 * 5) Later, if this bugs the Americans and Canadians enough, they can put something in Common.js to reformat these tagged dates for users who are not logged in, too.
   * Check navigator.language (or the IE-specific navigator.browserLanguage, navigator.systemLanguage, navigator.userLanguage; whichever is most likely to be accurate) to guess the user's preferred language
   * Implement pop-up manual preferences with a cookie
   * or whatever

This way:


 * The server doesn't have to cache multiple formats for each page (we depend on the fact that there are many more readers than logged-in editors when caching)
 * The default for readers is unambiguous and easily readable by everyone, even without JavaScript
 * Most users with JavaScript (94% or so) will see a localized default anyway
 * Anyone can log in and select a user preference manually (which disables the JavaScript)
 * MediaWiki developers don't need to worry about anything beyond the wiki ISO syntax parser (the JS, browser detection, cookies, or whatever can be written later by regular users based on the span tags), so it is more likely to be accepted by them and implemented. :)
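The division of labor in the numbered list could be sketched like this. Python stands in for both the server side and the later client-side script, purely to show the logic. The class name `localizable-date` is a placeholder of mine, not a proposed name, and mapping only `en-US` and `en-CA` to month-first order is an assumption based on the list above.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

DATE_CLASS = "localizable-date"  # hypothetical class name

TAGGED = re.compile(
    r'<span class="%s">(\d{1,2}) ([A-Z][a-z]+) (\d+)</span>' % re.escape(DATE_CLASS)
)


def tag_date(year, month, day):
    """Server side: emit the default '28 May 1996' form inside a tagged span."""
    return f'<span class="{DATE_CLASS}">{day} {MONTHS[month - 1]} {year}</span>'


def guess_style(language_tag):
    """Guess a date style from a browser language tag such as 'en-US'."""
    return "mdy" if language_tag.lower() in {"en-us", "en-ca"} else "dmy"


def relocalize(html, language_tag):
    """Later add-on: rewrite only the tagged dates; untagged text is left alone."""
    if guess_style(language_tag) == "dmy":
        return html  # the server default is already day-first

    def month_first(match):
        day, month, year = match.groups()
        return f'<span class="{DATE_CLASS}">{month} {day}, {year}</span>'

    return TAGGED.sub(month_first, html)
```

Because only spans carrying the class are touched, a date quoted verbatim elsewhere on the page keeps its original form.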

Conflicts? Deficiencies?
List things here that would conflict with this and how common they could be expected to be.
 * Template parameters: Just fix the templates by removing the links so that they let the auto-formatting do its job.
 * Direct quotations: How many have ISO-formatted dates in them? Maybe in code or something? Just use nowiki tags.
 * Math: How common is a math expression like xxxx-yy-zz, where xxxx is between 1000 and 9999, yy is between 10 and 12, and zz is between 10 and 31? Just put spaces between the numbers or use nowiki tags.
 * Is there a way to format day-month without year? Possibly  .  See talk.