Talk:Internationalized Resource Identifier

This page needs serious attention.

I seem to recall that Tim Berners-Lee made a key distinction between a locator and an identifier (in his book Weaving the Web). In my opinion, an identifier is not "a kind of locator"!

Also the advantages are described from the wrong cultural viewpoint. —Preceding unsigned comment added by MihalOrel (talk • contribs) 09:05, 30 June 2008 (UTC)

Wikipedia itself offers a good illustration of the current problem. Look at http://bg.wikipedia.org/wiki/Начална_страница. This is the URL for the start page of the Bulgarian Wikipedia. An IRI for the same page might look something like хттп://бг.уикипедия.oрг/уики/Начална_страница. In other words, from the point of view of the other person, in this case a Bulgarian, an IRI is simply an address written in Bulgarian when it concerns the Bulgarian view of the world.
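
As a rough illustration of what "the URL" looks like on the wire: browsers typically send the Cyrillic part of such an address as percent-encoded UTF-8. The helper below is hypothetical and a minimal sketch only; it escapes just the non-ASCII characters, whereas real encoders such as Python's urllib.parse.quote also escape ASCII characters like spaces. It is not a description of how any particular browser does this.

```python
from urllib.parse import quote

def percent_encode_non_ascii(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with its UTF-8 bytes written as %XX."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is passed through unchanged in this sketch
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

path = "/wiki/Начална_страница"
encoded = percent_encode_non_ascii(path)
print(encoded)
# /wiki/%D0%9D%D0%B0%D1%87%D0%B0%D0%BB%D0%BD%D0%B0_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0

# For this particular path the result agrees with the standard library's encoder.
assert encoded == quote(path)
```

Each Cyrillic letter becomes two bytes (for example, Н is U+041D, which is D0 9D in UTF-8), which is why the ASCII form is so much longer and unreadable to a Bulgarian speaker.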

I guess I should try to find time to modify the text myself. (Михал Орела 09:36, 30 June 2008 (UTC))

ICANN
Supporting news from The Sofia Weekly 28 June 2008: Bulgaria Tables Request to Register Internet Domain in Cyrillic

On Monday, Bulgaria became the first nation to request the registration of an Internet domain in Cyrillic.

Bulgaria's representative at the Governmental Advisory Committee of the Internet Corporation for Assigned Names and Numbers has delivered a letter, on behalf of Plamen Vachkov, Chair of the State Agency for Information Technologies and Communications, to ICANN President Paul Twomey in Paris, requesting the right to register a domain in Cyrillic.

In submitting their letter, the Bulgarian authorities took advantage of the fact that the delegates at the ICANN conference currently taking place in Paris are expected to decide on setting up multilingual top-level domains.

ICANN manages the domains .com, .net, .info, and .org, among others. Bulgaria is requesting to register and maintain the domain .бг, which is the country's present code .bg written in Cyrillic.

The move is actively supported by the Bulgarian Uninet Association, which is working to promote the use of the Cyrillic alphabet on the net.

And a little further research on the ICANN web site http://www.icann.org/en/announcements/announcement-05jun08-en.htm shows how one might achieve progress. Examples are given at http://idn.icann.org/ (Михал Орела 09:55, 30 June 2008 (UTC))
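
For context on what registering .бг involves technically: DNS itself carries only ASCII labels, so an internationalized label is stored in an ASCII-compatible "xn--" (punycode) form. A minimal sketch using Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490); registries today follow IDNA 2008, but for simple lowercase Cyrillic labels like these the results should be the same. The helper names are hypothetical.

```python
def to_ace(domain: str) -> str:
    """Hypothetical helper: Unicode domain -> ASCII-compatible (xn--) form used in DNS."""
    return domain.encode("idna").decode("ascii")

def from_ace(domain: str) -> str:
    """Hypothetical helper: ASCII-compatible form -> Unicode, e.g. for display."""
    return domain.encode("ascii").decode("idna")

for label in ["бг", "срб"]:
    print(label, "->", to_ace(label))
# бг -> xn--90ae
# срб -> xn--90a3ac
```

.срб is Serbia's Cyrillic top-level domain, which appears again in the example further down this page.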

W3C Internationalization
Another very good account of the major issues is available at the W3C site: http://www.w3.org/International/articles/idn-and-iri/

In fact, I think it gives good cultural arguments for switching to IRIs now, along with some very cogent technical reasons why we must. (Михал Орела 14:04, 30 June 2008 (UTC))

Proposed New Text (ericP)
In computing, an Internationalized Resource Identifier (IRI) is a string of Unicode characters used to identify a name or a resource. IRIs provide a multi-language, multi-script alternative to URIs. IRIs are defined by RFC 3987.

Relationship to URI
While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese characters, Japanese kanji, Korean Hangul, Cyrillic characters, and so forth. Many Internet protocols, such as HTTP and DNS, use URIs or portions of them, but publication languages such as AtomPub and RDF use IRIs to identify Web resources. RFC 3987 defines a mapping from IRIs to URIs, allowing, for example, IRIs to be dereferenced on the World Wide Web. The IRI http://рнидс.срб/cir/документи maps to the URI http://xn--d1aholi.xn--90a3ac/cir/%D0%B4%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B8 by:
 * mapping the Internationalized Domain Name (рнидс.срб) to a punycode domain name (xn--d1aholi.xn--90a3ac);
 * percent-encoding the UTF-8 form of the path (/cir/документи).

The location bar in most conventional browsers is a compromise between URL and IRI. For instance, the Firefox 10 location bar will accept an IRI like http://рнидс.срб/cir/документи but display it with the punycode domain name: http://xn--d1aholi.xn--90a3ac/cir/документи.
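
The two mapping steps for http://рнидс.срб/cir/документи can be sketched in Python. This is a simplified, hypothetical helper, not the full RFC 3987 algorithm: it uses the built-in IDNA 2003 codec for the host, drops any userinfo, does not handle IPv6 literals, and percent-encodes non-ASCII characters (plus ASCII characters not allowed in URIs, such as spaces) while leaving ASCII delimiters and existing %XX escapes alone.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters kept as-is: RFC 3986 sub-delims, ":", "@", "/", and "%" so existing %XX escapes survive.
_SAFE = "/:@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: simplified IRI-to-URI mapping in the spirit of RFC 3987."""
    parts = urlsplit(iri)
    netloc = parts.netloc
    if parts.hostname:
        # Step 1: IDN host -> punycode labels (built-in "idna" codec, IDNA 2003).
        # Any userinfo ("user@") is dropped in this sketch.
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += f":{parts.port}"
    # Step 2: percent-encode the UTF-8 bytes of non-ASCII characters
    # (and ASCII characters outside the safe set, such as spaces).
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(iri_to_uri("http://рнидс.срб/cir/документи"))
# http://xn--d1aholi.xn--90a3ac/cir/%D0%B4%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B8
```

Because existing escapes and delimiters are preserved, an address that is already plain ASCII passes through unchanged.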

Advantages
There are good reasons to display URIs in different languages; chiefly, it makes them easier to use for people who are unfamiliar with the Latin (A-Z) alphabet. Provided that users can type the required Unicode characters on their keyboards, this can make the URI system more accessible worldwide.

Disadvantages
Mixing IRIs and ASCII URIs can make phishing attacks much easier, by tricking someone into believing they are on a site they are not actually on. For example, one can replace the "a" in www.ebay.com or www.paypal.com with a look-alike character from another script, such as the Cyrillic "а" (U+0430), and point the resulting domain name at a malicious site. This is known as an IDN homograph attack.
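
One crude heuristic against such look-alikes is to flag any domain label that mixes letters from more than one script. The sketch below guesses each letter's script from the first word of its Unicode character name, an assumption made purely for illustration; browsers use more careful rules (see Unicode Technical Standard #39). This check would miss a label spelled entirely in look-alike Cyrillic letters, and it would wrongly flag legitimate mixes such as Japanese kanji with kana.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name, e.g. 'LATIN' or 'CYRILLIC'."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def looks_mixed_script(domain: str) -> bool:
    """True if any dot-separated label mixes letters from more than one (guessed) script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))

spoof = "p\u0430ypal.com"  # U+0430 CYRILLIC SMALL LETTER A in place of the Latin "a"
print(spoof)                      # renders almost exactly like "paypal.com"
print(looks_mixed_script(spoof))  # True
print(spoof.encode("idna"))       # the xn-- form of the first label gives the substitution away
```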

Conversely, although URIs do not give people a way to write Web addresses in their own alphabets, IRIs do not make clear how Web resources can be accessed from keyboards that cannot generate the required internationalized characters.

DOCTYPE Puzzle
Readers of this page may be interested in the following discussion: --Guy Macon (talk) 22:37, 1 June 2021 (UTC)
 * Reference desk/Computing