User:Tabletop/sandbox

The Wikipedia search function does not search for the word sic properly and picks up many false matches, currently 2521. The word sic, or [sic], is used to mark a word that is deliberately misspelled and should not be corrected.

Misspellings
Here are some sample misspelled words to help test variations of the wiki search function.


 * 1) Triangle is sometimes misspelled traingle [sic]. - - 00016
 * 2) Triangle is sometimes misspelled tringle [sicsic]. - 00026
 * 3) Triangle can also be misspelled  traingel [_sic_]. - 00000
 * 4) Triangle is sometimes misspelled traingle sic.
 * 5) Triangle is sometimes misspelled tringel sicsic.-00000
 * 6) Triangle can also be misspelled  trianle _sic_. -00001
 * 7) Triangle is sometimes misspelled traingle [ssiicc].
 * 8) Triangle is sometimes misspelled tringle ssiicc.
 * 9) Triangle can also be misspelled  traingle _ssiicc_.
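The sample lines above can be checked against a simple whole-word matcher to see which marker variants a word-based search might pick up. This is only a sketch: the tokeniser (`tokens`) and the helper `count_lines` are hypothetical, and the assumption that brackets, underscores and full stops all act as word separators is mine, not a description of how the actual wiki search works.

```python
import re

# The nine sample sentences from the list above, without the trailing counts.
SAMPLES = [
    "Triangle is sometimes misspelled traingle [sic].",
    "Triangle is sometimes misspelled tringle [sicsic].",
    "Triangle can also be misspelled traingel [_sic_].",
    "Triangle is sometimes misspelled traingle sic.",
    "Triangle is sometimes misspelled tringel sicsic.",
    "Triangle can also be misspelled trianle _sic_.",
    "Triangle is sometimes misspelled traingle [ssiicc].",
    "Triangle is sometimes misspelled tringle ssiicc.",
    "Triangle can also be misspelled traingle _ssiicc_.",
]

def tokens(line):
    """Split a line into lowercase runs of letters (assumed tokenisation)."""
    return re.findall(r"[a-z]+", line.lower())

def count_lines(term, lines=SAMPLES):
    """Count the lines that contain `term` as a whole token."""
    return sum(1 for line in lines if term in tokens(line))

for term in ("sic", "sicsic", "ssiicc", "ssii"):
    print(term, count_lines(term))
```

Under these assumptions a partial term such as ssii matches none of the sample lines, because it never appears as a whole token.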

Current matches
At the time of creation, the following numbers of matches were returned:


 * 2521 - sic - - -- 12896
 * 0000 - sicsic - - 00028
 * 0000 - ssiicc - - 00000
 * 0007 - ssii - - - 00021

Search function behaviour
When a misspelled word that the Wikipedia search function has matched on a page is corrected, it takes some time, several hours, for the match to be removed. The same delay appears to apply when a misspelled word is added to a page.

Changes to languages to make it easier to ...

 * 1) Proposed sicsic instead of sic to make it easy to search for.
 * 2) In the 18th century, some changes were made to the Tamil language script by the Italian missionary Constanzo Beschi, known in Tamil as Veeramamunivar, to make it easier to print.
 * 3) A Chinese man wants to name his son @ because it looks nice. The problem is that @ is a reserved character in email systems, so the name would probably fail to work properly.
 * 4) When a new word was needed to describe a group of bits (Binary digITs), now normally 8 of them, Werner Buchholz of IBM coined the word Byte in 1956. It was deliberately spelled that way and not Bite, so that it could not be mistaken for bit.
 * 5) The dot over the letter "i" (called a tittle) was gradually introduced in the early second millennium because it aided clarity, especially in cursive writing.

Debate: Search for deliberately misspelled words.
Deliberately misspelled words such as "seperate" are marked with the word (sic), including the parentheses, thus "seperate (sic)".

Suppose that a spelling corrector wishes to find such misspelled words by searching for the word (sic). The Wikipedia search function returns a lot of false matches, since sic is a valid word in its own right in Latin and is also a substring of words like baSIC or pepSICo.
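The difference between these kinds of match can be sketched in a few lines of Python. The sample texts and the three helper functions are made up for illustration, and none of them is meant to reproduce what the wiki search engine actually does internally.

```python
import re

# Hypothetical page texts: one real marker and three false-match candidates.
TEXTS = [
    "Triangle is sometimes misspelled traingle [sic].",
    "BASIC is a programming language.",
    "PepsiCo sells soft drinks.",
    "Sic transit gloria mundi.",
]

def substring_hits(term, docs):
    """Case-insensitive substring search: also matches inside longer words."""
    return [d for d in docs if term.lower() in d.lower()]

def whole_word_hits(term, docs):
    """Whole-word search: still matches sic used as an ordinary Latin word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]

def marker_hits(docs):
    """Match only sic enclosed in parentheses or square brackets."""
    pattern = re.compile(r"\(sic\)|\[sic\]", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]

print(len(substring_hits("sic", TEXTS)))   # every text contains the letters
print(len(whole_word_hits("sic", TEXTS)))  # marker plus the Latin phrase
print(len(marker_hits(TEXTS)))             # marker only
```

Only the last helper isolates the marker, and it depends on the search keeping the brackets, which a word-based search may not do.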

Just as it helps clarity to dot the letter "i", it would help to change (sic) to something less likely to produce false matches. Two suggestions, both made by doubling the word, are (sicsic) and (ssiicc).

Debate!

There are other problems with the wikipedia search function which can be debated later. Tabletop 12:05, 18 August 2007 (UTC)

search words

 * [/wiki/Special:Search?ns0=1&ns14=1&ns100=1&search=seperate&fulltext=Search seperate] (separate) (00109 matches)
 * [/wiki/Special:Search?ns0=1&ns14=1&ns100=1&search=separate&fulltext=Search separate] (separate) (99159 matches)
 * [/wiki/Special:Search?ns0=1&ns14=1&ns100=1&search=(sic)&fulltext=Search (sic)] (sic) (2525 matches)
 * [/wiki/Special:Search?ns0=1&ns14=1&ns100=1&search=(sicsic)&fulltext=Search (sicsic)] (sicsic) (00013 matches)
 * [/wiki/Special:Search?ns0=1&ns14=1&ns100=1&search=(ssiicc)&fulltext=Search (ssiicc)] (ssiicc) (00004 matches)

Note: the match counts seem to be updated about once per day, and include blank matches where the matched word has been changed and has disappeared, but the page is still reported as having that match.

Trinidad

 * Port of Spain - terminus, capital and port
 * Tunapuna - junction in east
 * Chaguanas - junction in south east
 * Sangre Grande - terminus in east
 * Rio Claro - terminus in south east
 * Princes Town - terminus
 * Gasparillo - junction
 * San Fernando
 * Siparia - terminus in south