Talk:Canonicalization

Normalisation
The article singles out UTF-8 as requiring normalisation. I'm not sure what it's referring to. Surrogates? UTF-16 has them too. All Unicode encodings require normalisation, because at the code-point level (that is, after decoding from any UTF) there are ambiguities; see NFKC, for example. —Preceding unsigned comment added by Jjjjjjbbb (talk • contribs) 23:44, 13 August 2008 (UTC)


 * The article refers to UTF-8 "overlong" forms, such as the byte sequence 0xC0 0x80 being an alias for 0x00. The current UTF-8 standard defines these overlong forms as invalid, so any decoder from UTF-8 to a sequence of code points must reject characters encoded in such forms. Canonicalisation in this context would refer either to the rejection mandated by the standard or to the replacement of overlong forms by their non-overlong equivalents. --Wtrmute (talk) 01:33, 28 August 2011 (UTC)


 * Improved the section. 80.235.83.183 (talk) 18:49, 24 March 2015 (UTC)
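
For illustration, here is a minimal Python sketch of the two separate issues raised in this thread: byte-level overlong forms, which a conforming UTF-8 decoder rejects, and code-point-level ambiguities, which Unicode normalisation (NFC, NFKC) resolves regardless of encoding. The helper name `is_valid_utf8` is hypothetical and is not taken from the article; the sketch relies only on Python's built-in strict decoder and the standard `unicodedata` module.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Byte level: 0xC0 0x80 is an overlong encoding of U+0000 and is rejected,
# while the shortest form (the single byte 0x00) is accepted.
overlong_nul = b"\xc0\x80"
shortest_nul = b"\x00"
overlong_ok = is_valid_utf8(overlong_nul)
shortest_ok = is_valid_utf8(shortest_nul)

# Code-point level: "e" followed by a combining acute accent and the
# precomposed U+00E9 are different code-point sequences for the same text.
decomposed = "e\u0301"
precomposed = "\u00e9"
same_before = decomposed == precomposed
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# NFKC additionally folds compatibility characters, e.g. the "fi" ligature.
ligature_folded = unicodedata.normalize("NFKC", "\ufb01")
```

The first half concerns only UTF-8 as a byte encoding; the second half applies equally to text that arrived as UTF-16 or UTF-32, which is the point made in the opening comment.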

XML section
Personally I don't think the entire list of possible changes should be laid out in the XML section. My suggestion would be to replace the bullet list with something like "In addition, a full XML canonicalization would also ensure the document is encoded as UTF-8, normalize attribute values, and remove superfluous namespace declarations. For a full list of canonicalization changes, see the W3C specification." The section already links to the W3C specification for XML canonicalization. JadeMatrix (talk) 23:53, 11 November 2015 (UTC)
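
As a rough illustration of the kind of changes under discussion, the sketch below assumes Python 3.8 or later, where the standard library provides `xml.etree.ElementTree.canonicalize` (an implementation of C14N 2.0). The two sample documents are made up for this example; they differ only in attribute order, quote style, and empty-element syntax.

```python
from xml.etree.ElementTree import canonicalize

# Two serialisations of the same logical document (hypothetical sample data).
doc_a = "<root b='2' a=\"1\"/>"
doc_b = '<root a="1" b="2"></root>'

# canonicalize() returns the canonical form as a string when no output
# file is given, so equivalent documents can be compared directly.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)
equivalent = canon_a == canon_b
```

This covers only a few of the transformations; the W3C specification remains the place for the full list.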

Biological taxonomy
I have removed the following unsourced text and image because I see some problems with it and suspect that there are others:

"In zoological nomenclature, a type species (Species typica) is the species name with which the name of a genus or subgenus is considered to be permanently taxonomically associated, i.e., the species that contains the biological type specimen(s), and is used as "canonical type" or reference model to a genus."
 * I think this material could easily be confusing, because a reader might think that the type species or type specimen has to be "central" to a genus, which is not the case: the types are merely contained within the circumscription. More seriously, though, I think this text might run into difficulties over what is a "canonical type" and what is a "canonical object" (the type specimen is an object, which is potentially quite confusing). There is no "canonicalization" (the title of this page) involved in biology, and since this page starts off with "in computer science" and this material is not computer science, it doesn't belong. Sminthopsis84 (talk) 17:47, 14 April 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Canonicalization. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20011010233257/http://www.wikipedia.com/ to http://www.wikipedia.com/

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.— InternetArchiveBot  (Report bug) 09:21, 14 November 2016 (UTC)