User:לערי ריינהארט/tests/bugzilla:1691

de:Benutzer:Gangleri/tests/bugzilla:65 sv:Användare:Gangleri/tests/bugzilla:65


 * 1591 not 1691 [[Image:Smiley.png|16px|;-)]]
 * This bug is a "duplicate" of 65.
 * Bug #65 is fixed now, see 65. Thanks Brion!
 * Bug #563 is fixed now, see 563. Thanks Brion!
 * reported to pyWikipediaBot-users (see also PyWikipediaBot)

examples

 * ro:Constantin Brâncusi
 * ro:Constantin Brancu&
 * ro:Constantin Brâncu&
 * ro:Constantin Br&
 * 1) Constantin Brâncusi
 * 2) Constantin Brancu&
 * 3) Constantin Brâncu&
 * 4) Constantin Br&


 * ro:Wikipedia:Caterogizare/Categorie Orase în Slovacia
 * ro:Wikipedia:Caterogizare/Categorie Ora&
 * ro:Wikipedia:Caterogizare/Categorie Ora&
 * ro:Wikipedia:Caterogizare/Categorie Ora&
 * 1) w:ro:Wikipedia:Caterogizare/Categorie Orase în Slovacia
 * 2) w:ro:Wikipedia:Caterogizare/Categorie Ora&
 * 3) w:ro:Wikipedia:Caterogizare/Categorie Ora&
 * 4) w:ro:Wikipedia:Caterogizare/Categorie Ora&

11:56, 2005 Feb 28 (UTC)

 * #1 - #8 work properly at de:, en:, sv: ... everywhere!

explanations

 * Here are different links. Please look at what each link looks like and which title it targets.
 * If you look at this page and compare #3 and #4 you will not see any difference. The difference will show up only if you edit the page.
 * The difference is that #3 uses â or î while #4 uses the &#nnnn; encoding for these characters too.
 * The examples are using three types of characters:
 * 1) 7 bit
 * 2) 8 bit
 * 3) UTF-8 characters
 * It is very strange that interlanguage links (and InterWiki w:... links, at en: only) accept either 8-bit characters or UTF-8 characters; see links #1 and #2 (and #5 and #6).
 * If you click on link #3 the target will be something else.
 * Only link #4 works.
 * This behaviour is not transparent to users who insert interlanguage links by copy and paste. It is discriminatory to a lot of languages using combined types and should be considered a critical error. Users will not be aware that the common method #3 will fail, that the very technical method #4 is required, or that their interlanguage links will be removed sooner or later. Gangleri | [ Th] | T 17:06, 2005 Feb 25 (UTC)
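The three character types listed above can be told apart by code point. The following is only a minimal sketch: the helper names are made up for illustration, and the spelling of the example title with ş (U+015F) is an assumption, since the links above do not show it.

```python
def char_class(ch):
    """Return which of the three groups a single character falls into."""
    cp = ord(ch)
    if cp < 128:
        return "7-bit"
    if cp < 256:
        return "8-bit"        # fits into Latin-1
    return "UTF-8 only"       # cannot be stored in a Latin-1 title


def classes_in_title(title):
    """Return the set of character groups that occur in a title."""
    return {char_class(ch) for ch in title}


# Assumed spelling: a-circumflex (U+00E2) and s-cedilla (U+015F).
print(sorted(classes_in_title("Constantin Br\u00e2ncu\u015fi")))
```

A title that reports all three groups is one of the "combined types" that breaks when pasted unchanged.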

additional tests
sv:Användare:Gangleri/tests/bugzilla:65 de:Benutzer:Gangleri/tests/bugzilla:65
 * same examples at
 * at another Latin-1 type Wikipedia
 * at a UTF-8 type Wikipedia


 * #1, #2 and #4 work properly at sv: but #3 does not
 * #1 - #4 work properly at de:
 * #5 - #8 will all fail because "w:" is used

test links for pyWikipediaBot-users
an alternative would be &amp;amp;#x25; stands for &amp;#x25; for &#x25;
 * Notes:
 * for documentation purposes, "what you see as documentation" is coded differently from "what is coded in the links"; the usual method is used:
 * 1) &amp;amp;#nnnn; stands for &amp;#nnnn;
 * 2) &amp;amp;#xnnnn; stands for &amp;#xnnnn;
 * 3) &amp;amp;#37; stands for &amp;#37; for &#37;

if you make a preview you will see links
 * all links have been inserted with the copy and paste method
 * 1) changed to &amp;#nnnn; encoding and
 * 2) containing characters in the range 128 - 255
 * you should know that they will fail
 * there are more "workarounds" to fix the links
 * 1) using &amp;#nnnn; encoding for all characters > 127
 * 2) using &amp;#xnnnn; encoding for all characters > 127
 * 3) using hardcoded %nn for all characters > 127
 * 4) a mixture of the methods above
 * only #1 is described below
 * see also: character encoding at User:Gangleri/tests/Unicode ISO 8859-1/Table of Unicode characters, 128 to 999
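Workarounds 1) to 3) above could be sketched like this. The function names are hypothetical and this is not code from pyWikipediaBot; for the %nn form the byte encoding of the target wiki is an assumption passed in by the caller.

```python
from urllib.parse import quote


def decimal_ncr(title):
    """Workaround 1: &#nnnn; for every character > 127."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in title)


def hex_ncr(title):
    """Workaround 2: &#xnnnn; for every character > 127."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in title)


def percent_encoded(title, encoding="utf-8"):
    """Workaround 3: %nn for every character > 127, in the given encoding."""
    return "".join(
        ch if ord(ch) < 128 else quote(ch, encoding=encoding) for ch in title
    )


print(decimal_ncr("Ba\u0161ka"))                 # Ba&#353;ka
print(hex_ncr("Ba\u0161ka"))                     # Ba&#x161;ka
print(percent_encoded("Ba\u0161ka"))             # Ba%C5%A1ka
print(percent_encoded("Aai\u00fan", "latin-1"))  # Aai%FAn
```

Note that `percent_encoded(..., "latin-1")` raises an error for characters outside Latin-1 such as š, which mirrors the problem described on this page.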

links to items from sk:Category:Slovenské mestá
&amp;Scaron; &Scaron; (see alanwood.net) &amp;scaron; &scaron; (see alanwood.net)
 * important note:
 * Unicode offers multiple ways to go.
 * "optically" the following two characters "seem" to be the same
 * uppercase letters
 * &amp;#138; &#138; &amp;#x8A; &#x8A;
 * &amp;#352; &#352; &amp;#x160; &#x160;
 * probably other more or less advanced Unicode or HTML coding
 * lowercase letters
 * &amp;#154; &#154; &amp;#x9A; &#x9A;
 * &amp;#353; &#353; &amp;#x161; &#x161;
 * probably other more or less advanced Unicode or HTML coding
 * because of "the exact match" for accessing titles with MediaWiki only one is allowed:
 * sk:Hnúš& fails coded as sk:Hnúš&amp;
 * fails also as sk:Hn& coded as sk:Hn&amp;
 * works as sk:Hn& coded as sk:Hn&amp;


 * sk:Hnúš& fails coded as sk:Hnúš&amp;
 * fails also as sk:Hn& coded as sk:Hn&amp;
 * works as sk:Hn& coded as sk:Hn&amp;


 * works also for all titles containing only characters A-Z, a-z and "-" 
 * sk:Category:Slovenské mestá works - only Latin-1
 * sk:Category:Banská Bystrica works - only Latin-1
 * sk:Category:Bratislava
 * sk:Category:Fi& works - UTF-8
 * sk:Category:Humenné works - only Latin-1
 * sk:Category:Poprad
 * sk:Category:Se& works - UTF-8
 * sk:Category:Žilina works - only Latin-1
 * sk:Banská Bystrica works - only Latin-1
 * sk:Banská Štiavnica works - only Latin-1
 * sk:Bardejov
 * sk:Bojnice
 * sk:Bratislava
 * sk:Brezno
 * sk:Brezová pod Bradlom works - only Latin-1
 * sk:Byt& works - UTF-8
 * sk:Bánovce nad Bebravou works - only Latin-1
 * sk:Detva
 * sk:Dobšiná works - only Latin-1
 * sk:Dolný Kubín works - only Latin-1
 * sk:Dubnica nad Váhom works - only Latin-1
 * sk:Dudince
 * sk:Dunajská Streda works - only Latin-1
 * sk:Fi& works - UTF-8
 * sk:Galanta
 * sk:Gbely
 * sk:Gelnica
 * sk:Giraltovce
 * sk:Handlová works - only Latin-1
 * sk:Hanušovce nad Top& fails coded as sk:Hanušovce nad Top&amp;
 * sk:Hlohovec
 * sk:Holí& fails coded as sk:Holí&amp;
 * sk:Hri& fails coded as sk:Hri&amp;
 * sk:Humenné works - only Latin-1
 * sk:Hurbanovo
 * sk:Ilava
 * sk:Jelšava works - only Latin-1
 * sk:Kežmarok works - only Latin-1
 * sk:Kolárovo works - only Latin-1
 * sk:Komárno works - only Latin-1
 * sk:Košice works - only Latin-1
 * sk:Kremnica
 * sk:Krompachy
 * sk:Krupina
 * sk:Krásno nad Kysucou works - only Latin-1
 * sk:Krá& fails coded as sk:Krá&amp;
 * sk:Kysucké Nové Mesto works - only Latin-1
 * sk:Leopoldov
 * sk:Levice
 * sk:Levo& works - UTF-8
 * sk:Lipany
 * sk:Liptovský Hrádok works - only Latin-1
 * sk:Liptovský Mikuláš works - only Latin-1
 * sk:Lu& works - UTF-8
 * sk:Malacky
 * sk:Martin
 * sk:Medzev
 * sk:Medzilaborce
 * sk:Michalovce
 * sk:Modra
 * sk:Modrý Kame& fails coded as sk:Modrý Kame&amp;
 * sk:Moldava nad Bodvou
 * sk:Myjava
 * sk:Nemšová works - only Latin-1
 * sk:Nitra
 * sk:Nová Ba& fails coded as sk:Nová Ba&amp;
 * sk:Nová Dubnica works - only Latin-1
 * sk:Nováky works - only Latin-1
 * sk:Nové Mesto nad Váhom works - only Latin-1
 * sk:Nové Zámky works - only Latin-1
 * sk:Námestovo works - only Latin-1
 * sk:Partizánske works - only Latin-1
 * sk:Pezinok
 * sk:Pieš& fails coded as sk:Pieš&amp;
 * sk:Podolínec works - only Latin-1
 * sk:Poltár works - only Latin-1
 * sk:Poprad
 * sk:Považská Bystrica works - only Latin-1
 * sk:Prešov works - only Latin-1
 * sk:Prievidza
 * sk:Púchov works - only Latin-1
 * sk:Rajec
 * sk:Rajecké Teplice
 * sk:Revúca works - only Latin-1
 * sk:Rimavská Sobota
 * sk:Rož& fails coded as sk:Rož&amp;
 * sk:Ružomberok works - only Latin-1
 * sk:Sabinov
 * sk:Senec
 * sk:Senica
 * sk:Sere& works - UTF-8
 * sk:Se& works - UTF-8
 * sk:Skalica
 * sk:Slia& works - UTF-8
 * sk:Sládkovi& fails coded as sk:Sládkovi&amp;
 * sk:Snina
 * sk:Sobrance
 * sk:Spišská Belá works - only Latin-1
 * sk:Spišská Nová Ves works - only Latin-1
 * sk:Spišská Stará Ves works - only Latin-1
 * sk:Spišské Podhradie works - only Latin-1
 * sk:Spišské Vlachy works - only Latin-1
 * sk:Stará Turá works - only Latin-1
 * sk:Stará & fails coded as sk:Stará &amp;
 * sk:Stropkov
 * sk:Strážske works - only Latin-1
 * sk:Stupava (Slovensko)
 * sk:Svidník works - only Latin-1
 * sk:Svit
 * sk:Svätý Jur works - only Latin-1
 * sk:Tisovec
 * sk:Tlma& works - UTF-8
 * sk:Topo& works - UTF-8
 * sk:Torna& works - UTF-8
 * sk:Trebišov works - only Latin-1
 * sk:Tren& works - UTF-8
 * sk:Tren& works - UTF-8
 * sk:Trnava
 * sk:Trstená works - only Latin-1
 * sk:Turzovka
 * sk:Tur& works - UTF-8
 * sk:Tvrdošín works - only Latin-1
 * sk:Ve& fails coded as sk:Ve&amp;
 * sk:Ve& fails coded as sk:Ve&amp;
 * sk:Ve& fails coded as sk:Ve&amp;
 * sk:Ve& fails coded as sk:Ve&amp;
 * sk:Vranov nad Top& works - UTF-8
 * sk:Vrbové works - only Latin-1
 * sk:Vráble works - only Latin-1
 * sk:Vrútky works - only Latin-1
 * sk:Vysoké Tatry - Mesto works - only Latin-1
 * sk:Zlaté Moravce works - only Latin-1
 * sk:Zvolen
 * sk:& works - UTF-8
 * sk:& works - UTF-8
 * sk:Šahy works - only Latin-1
 * sk:Šamorín works - only Latin-1
 * sk:Ša& fails coded as sk:Ša&amp;
 * sk:Šaštín - Stráže works - only Latin-1
 * sk:Štúrovo works - only Latin-1
 * sk:Šurany works - only Latin-1
 * sk:Žarnovica works - only Latin-1
 * sk:Želiezovce works - only Latin-1
 * sk:Žiar nad Hronom works - only Latin-1
 * sk:Žilina works - only Latin-1


 * references: Slovak language

things to discuss

 * it looks to be necessary to have an "alias" translation table for pywikipediabot; hopefully only one for Latin-1 and one for UTF-8 type wikis and not one for every language;
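One possible shape for such an "alias" table, as a sketch only: it is built from Python's own cp1252 codec and maps the numeric references in the range 128 - 159 to the real Unicode numbers. The names `ALIASES` and `normalise_references` are invented here and are not part of pywikipediabot.

```python
import re


def build_cp1252_aliases():
    """Map the code points 128-159 to the characters Windows CP1252 puts there."""
    aliases = {}
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # a few positions are unassigned in CP1252
        aliases[byte] = ord(ch)
    return aliases


ALIASES = build_cp1252_aliases()
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def normalise_references(text):
    """Rewrite every numeric reference as decimal, applying the alias table."""
    def fix(match):
        cp = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        return f"&#{ALIASES.get(cp, cp)};"
    return _NCR.sub(fix, text)


print(normalise_references("Ba&#154;ka (Slowakei)"))  # Ba&#353;ka (Slowakei)
```

Because the table depends only on CP1252 and not on the wiki language, a single table of this kind might be enough for all Latin-1 type wikis.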

ú

 * de:Aaiún
 * coded as :de:Aai ú n
 * de:Aaiún
 * coded as :de:Aai &amp
 * de:Aai&
 * coded as :de:Aai &amp
 * de:Aai&
 * coded as :de:Aai &amp;uacute; n
 * de:Aai&uacute;n
 * coded as :de:Aai &
 * de:Aai%C3%BAn
 * coded as :de:Aai &
 * de:Aai%FAn
 * all of the above work, generating http: //de.wikipedia.org/wiki/Aai &#37;C3&#37;BA n


 * en:Aaiún (nl:Aaiún, sv:Aaiún. etc.)
 * translated to forms similar to http: //en.wikipedia.org/wiki/Aai &#37;FA n

š

 * de:Baška (Slowakei)
 * coded as Ba š ka (Slowakei)
 * de:Baška (Slowakei)
 * coded as Ba &amp;
 * de:Ba&
 * coded as Ba &amp
 * de:Ba&
 * coded as Ba &
 * de:Ba%C5%A1ka (Slowakei)
 * coded as Ba &
 * de:Ba%9Aka (Slowakei)
 * all of the above work, generating http: //de.wikipedia.org/wiki/Ba &#37;C5&#37;A1 ka_&#37;28Slowakei&#37;29


 * en:Baška (nl:Baška, sv:Baška. etc.)
 * translated to forms similar to http: //en.wikipedia.org/wiki/Ba &#37;9A ka
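The %nn forms in the ú and š sections differ only in the byte encoding behind them. A reader of such URLs could decode both kinds roughly as follows; this is a sketch, and the "try UTF-8 first, then fall back to CP1252" order is an assumption, not documented MediaWiki behaviour.

```python
from urllib.parse import unquote_to_bytes


def decode_title(escaped):
    """Decode a %nn-escaped title, trying UTF-8 first and CP1252 second."""
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


for escaped in ("Aai%C3%BAn", "Aai%FAn", "Ba%C5%A1ka", "Ba%9Aka"):
    print(escaped, "->", decode_title(escaped))
```

Both spellings of each pair come out as the same title, which is why the de: and en: URLs above can look different and still mean the same page.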

š failures

 * coded as Ba &amp;
 * de:Ba&
 * coded as Ba &amp;
 * de:Ba&
 * fails generating http: //de.wikipedia.org/wiki/Ba &#37;C2&#37;9A ka_&#37;28Slowakei&#37;29


 * coded as Ba &amp;scaron; ka (Slowakei)
 * de:Ba&scaron;ka (Slowakei)
 * fails generating http: //de.wikipedia.org/wiki/Ba &#37;26scaron&#37;3B ka_&#37;28Slowakei&#37;29
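The two failing URLs can be reproduced with a few lines. This is only an illustration using Python's standard `quote`; it does not claim to be what MediaWiki does internally.

```python
from urllib.parse import quote

# U+009A taken literally as a Unicode code point and written as UTF-8.
print(quote("Ba\u009aka_(Slowakei)"))     # Ba%C2%9Aka_%28Slowakei%29

# The named reference was not resolved, so "&" and ";" are escaped as text.
print(quote("Ba&scaron;ka_(Slowakei)"))   # Ba%26scaron%3Bka_%28Slowakei%29

# For comparison: the real s-caron, U+0161, as in the working links.
print(quote("Ba\u0161ka_(Slowakei)"))     # Ba%C5%A1ka_%28Slowakei%29
```

The first line shows where the unexpected C2 byte comes from: it is the UTF-8 lead byte for the code points 128 - 191.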

from 65

 * Brion:
 * NEVER use &amp;#154; or &amp;#x9A; for s-caron. Numeric character references always refer to Unicode code points, and U+009A is a reserved control character, *not* s-caron. It might appear to work sometimes due to a fluke and crappy workarounds for compatibility with a Windows bug, but should definitely not be relied upon. Use the real Unicode number, &amp;#353;. The same goes for the other characters in the Windows CP1252 extended range (see ISO 8859-1 ).
 * For the moment the only named character references that will work in links are the ISO 8859-1 ones (s-caron does not appear in ISO 8859-1). Stick with the numbers for now.
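Both of Brion's points can be checked against Python's standard library. The helper `in_latin1` is a made-up name, and the check only says whether a named reference falls inside ISO 8859-1; it says nothing else about MediaWiki.

```python
import unicodedata
from html.entities import name2codepoint

# U+009A is a control character (category "Cc"), not a letter.
print(unicodedata.category("\u009a"))

# Byte 0x9A is s-caron only when read as Windows CP1252.
s_caron = bytes([0x9A]).decode("cp1252")
print(ord(s_caron), unicodedata.name(s_caron))


def in_latin1(entity_name):
    """True if the named character reference lies within ISO 8859-1."""
    return name2codepoint[entity_name] <= 0xFF


print(in_latin1("uacute"), in_latin1("scaron"))
```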

MediaWiki, PyWikipediaBot and en:Category:Diacritics
By the way: why is Edvard Beneš redirected to Edvard Benes? OK! If the en: community wants it this way, that is fine with me.
 * From the example above it can be seen that &amp; acute ; is supported by MediaWiki and
 * From the example above it can be seen that &amp; scaron ; not.
 * As you can see, scaron is used in the code and titles:
 * 1) en: Josef_Hir%26scaron%3Bal
 * 2) en: Edvard_Bene%26scaron%3B
 * Which of the HTML coding methods (see en:Category:Diacritics) are supported by MediaWiki and which are not? Which are corrected by PyWikipediaBot?
 * Regards Gangleri | [ Th] | T 05:47, 2005 Feb 26 (UTC)
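A bot could recover the intended title from the two escaped en: titles above along these lines. This is a sketch with a hypothetical function name and does not describe what PyWikipediaBot actually does.

```python
import html
from urllib.parse import unquote


def recover_title(escaped):
    """Undo the %nn escaping, then resolve HTML character references."""
    return html.unescape(unquote(escaped)).replace("_", " ")


print(recover_title("Josef_Hir%26scaron%3Bal"))
print(recover_title("Edvard_Bene%26scaron%3B"))
```

The order matters: the %26 and %3B must first become "&" and ";" before the named reference can be resolved.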