Wikipedia talk:WikiProject Wiki Syntax/archive01

Bad syntax
Every page says "Fix those 5 problems with bad syntax". I'd rather use a good or appropriate one :-) - Skysmith 07:56, 10 Nov 2004 (UTC)
 * Definitely ambiguous! I'll get it changed for next time to read "Fix those 5 syntax problems". All the best, Nickj 23:05, 10 Nov 2004 (UTC)

nowiki
Just a note - if the sample text contains '<nowiki>' tags itself, they're not escaped (for example, see the (now deleted) second entry on 'Phylogenetic tree' on the square-brackets-018.txt page) - this should be fixed (e.g. replacing them with '&lt;nowiki&gt;') before the next run, as it renders incorrectly and may be confusing. JohnyDog 13:54, 10 Nov 2004 (UTC)
 * Good point, thank you for that. For the next run I'll first replace all greater-than and less-than symbols with their HTML codes (as you suggest), and then surround the text with nowiki tags (as per usual), which should prevent this. All the best, Nickj 23:05, 10 Nov 2004 (UTC)
 * I suggest you also replace occurrences of & with &amp; so other entities don't get translated. Eric119 07:00, 18 Nov 2004 (UTC)
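(For the next run, that escaping step might look something like this - a minimal sketch only, with escapeSample as a hypothetical helper name rather than the project's actual code:)

 // Escape a sample snippet so any tags or entities inside it display
 // literally on the project pages instead of being interpreted.
 function escapeSample($text) {
     $text = str_replace('&', '&amp;', $text); // & first, so existing entities survive
     $text = str_replace('<', '&lt;', $text);
     $text = str_replace('>', '&gt;', $text);
     return '<nowiki>' . $text . '</nowiki>';  // then wrap as per usual
 }

(htmlspecialchars() would do the three replacements in one call; either way, & must be replaced first, or the &lt; and &gt; just produced would themselves get mangled.)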

This is a bit silly. nowiki tags should be used for when some markup should not render, not to help out bots. One could simply add articles with intentionally misplaced brackets to an exclusion list in future. Dysprosia 03:21, 13 Dec 2004 (UTC)
 * I feel your pain, but if you take a step back, you might see things differently. The meaning of the <nowiki> tag is really that anything inside should be displayed literally rather than interpreted as wiki-markup, which is a reasonable abstraction of what the project is partly intended to address. On the gripping hand, simply marking a whole article as "don't touch this" is bound to cause problems if other errors come to be introduced further down the line - I don't think any of us are fooling ourselves that no contributor is ever going to bork up the mark-up in a fixed article ever again. HTH HAND --Phil | Talk 15:26, Dec 13, 2004 (UTC)
 * Erm - I don't really understand what's "a bit silly"? The original subject (from JohnyDog) was that tags that appear in the sample text on the WikiSyntax pages should be escaped, so they don't make the sample text all wonky.  This doesn't seem to have anything to do with intentionally misplaced brackets or anything on article pages.  It seems like you're objecting to the suggestion to put nowiki tags around wikisyntax that should not be rendered as wikisyntax on a page, which I think Phil Boswell answered.  But I'm not sure if that's it. Er. JesseW 17:42, 13 Dec 2004 (UTC)
 * Semantics - nowiki means don't parse as markup. There are instances where intentional misplacement can occur and markup be used, which makes the wikitext a mess. Dysprosia 23:01, 13 Dec 2004 (UTC)
 * Perhaps instead of <div id="xxx"></div>, one could use <div id="xxx" /> - it is both the opening and closing tag, because it is a shorthand for <div id="xxx"></div>. – ABCD 17:14, 18 Dec 2004 (UTC)
 * In the same way that <br /> and (more appositely) <hr /> are legitimate. HTH HAND --Phil | Talk 09:23, Dec 20, 2004 (UTC)


 * Actually the W3C specifies a blank before the slash, as in <br /> and <hr />. See the XHTML compatibility guidelines. Erik Zachte 12:19, 20 Dec 2004 (UTC)


 * Ah, OK, I understand now - thank you all for clarifying that. -- All the best, Nickj (t) 22:59, 20 Dec 2004 (UTC)

Fixing non ISO-8859-1 characters?
How about a project to replace non-ISO-8859-1 characters with their correct equivalents? For example, &#8364; becomes &euro;. These invalid characters are bad because they tend to get replaced by ? automatically by the browser when someone edits. --Dbenbenn 08:25, 19 Dec 2004 (UTC)
 * This is work for a bot; Guanabot has done this in the past. Or, of course, the English Wikipedia could be converted to Unicode like the others, and the characters could be typed safely. Susvolans (pigs can fly) 13:24, 20 Dec 2004 (UTC)
 * Wouldn't this rather depend upon whether a particular user's browser was performing correctly? IMNSHO it would be better to always tend towards caution and replace anything which might break with something which won't. --Phil | Talk 09:45, Jan 24, 2005 (UTC)
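(A sketch of the kind of replacement Dbenbenn suggests - the mapping covers only a few characters for illustration, and this is not Guanabot's actual code:)

 // Map characters (or numeric references) outside ISO-8859-1 to named
 // HTML entities, so they survive editing in browsers that mangle them.
 $entityMap = array(
     '&#8364;' => '&euro;',   // Euro sign
     '&#8211;' => '&ndash;',  // en dash
     '&#8212;' => '&mdash;',  // em dash
 );
 $fixed = strtr($articleText, $entityMap);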

URLs with Section specified?
The square bracket pages often included the section name as part of the URL, so that when you clicked on the link, it would position you at the section with the problem. Since starting on the parens section, I've noticed that these links don't do that. It was a great help, and if you could add that to this section, it would make life easier. ;^) wrp103 (Bill Pringle) - Talk 05:12, 22 Dec 2004 (UTC)


 * I know what you mean ... the section markers were added for the third & fourth runs - the trouble is that the parentheses lists were generated as a one-off between the second and third runs, before the Wiki Syntax Project knew about section markers, so I just don't have section information for those lists, otherwise I would add it :( Sorry. -- All the best, Nickj (t) 05:53, 22 Dec 2004 (UTC)

As an aside...
After doing 120+ pages of bracket fixing, I hereby declare the term "parenthesis" to be a new form of mental illness... --Plek 03:14, 12 Jan 2005 (UTC)
 * Me: Doctor, I think I'm suffering from parenthesis.
 * Doctor: Ah, yes, that happens a lot to people whose colon has come to a full stop. Why don't you take this prescription and dash off to the pharmacy. In the meantime, I know this great quote that might help you get through this grave and difficult period...
 * Me: AAAAARGHHH!!!!


 * I know exactly how you feel! On the other hand, doing this work I've edited pages I would never otherwise have known about (e.g. List of people on stamps of Denmark), and learned more about certain subjects than I ever imagined - I never knew there were so many characters in Thomas the Tank Engine and Friends, that there was a Marquess of Northampton or a Manitoba Cooperative Commonwealth Federation! --Thryduulf 15:36, 12 Jan 2005 (UTC)

The Wiki Syntax Bar
The last of the parentheses is slain! Huzzah! Free drinks for everybody (rings bell)! --Plek 23:09, 12 Jan 2005 (UTC)
 * Well done everyone! Thanks for the offer of a drink, Plek - I'll have a Bitter please. Thryduulf 23:24, 12 Jan 2005 (UTC)
 * Plek pours Thryduulf a pint and happily puts a bowl of roasted parentheses on the counter
 * Brilliant! I'll have a pint of Guinness, please! - UtherSRG 00:12, Jan 13, 2005 (UTC)
 * Doctor forbade Guinness, have to take root beer instead. - Skysmith 09:39, 13 Jan 2005 (UTC)

Nice job folks, next run should attest to your efforts. -- Davenbelle 00:37, Jan 13, 2005 (UTC)
 * I arrived late and contributed just one set, but I'm always ready to lift a Guinness with UtherSRG - Eustace Tilley 23:10, 2005 Jan 17 (UTC)

I'll have an exclamation pint. 68.88.234.52 21:53, 22 Jan 2005 (UTC)

I'll have a small Single malt Scotch, although I only did a little. Henry Troup 00:01, 2 Feb 2005 (UTC)

Source code
What queries were used to make this? r3m0t 18:37, 12 Feb 2005 (UTC)


 * It's not using a database query (other than to fetch the source text of the article). Rather, it goes through the source text from start to finish, and as it does so it uses a stack: any opening wiki syntax gets pushed onto the stack, and matching closing syntax pops it off. If you get to the end of a line (for wiki links, italics, bolds, etc.), or the end of the article (for everything else), and the stack is not empty, then you know that the syntax is malformed (i.e. not closed or opened properly). There's also separate checking for redirects (using a regular expression), and checking whether any cur_is_redirect = '1' entries fail to match the redirect regex - that's a bit of a special case though; in all other regards it's using a stack. That doesn't mean it can't be done as queries (in fact, it's possible it could be better to do so, because then the list of problems could probably be generated more quickly), but that's not how it's done at the moment. Hope that helps. -- All the best, Nickj (t) 00:16, 13 Feb 2005 (UTC)
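(For illustration, a minimal single-line version of that stack approach might look like the sketch below - checkLineSyntax is a hypothetical name and the token list is heavily truncated; this is not the project's actual code, which also handles multi-line syntax and the redirect checks.)

 // Push opening tokens, pop them on a matching closer, and report an
 // error if anything is still open at the end of the line.
 function checkLineSyntax($line) {
     $pairs = array('[[' => ']]', "''" => "''", '{|' => '|}'); // truncated token list
     $stack = array();
     $tokens = preg_split("/(\[\[|\]\]|''|\{\||\|\})/", $line, -1,
                          PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
     foreach ($tokens as $t) {
         if (isset($pairs[$t]) && end($stack) !== $t) {
             array_push($stack, $t);              // opening token
         } else if (in_array($t, $pairs)) {
             $top = array_pop($stack);            // should match this closer
             if ($top === null || $pairs[$top] !== $t) {
                 return false;                    // closed something never opened
             }
         }
     }
     return empty($stack);                        // false = something left open
 }

(Tokens like '' are their own closer, which is why the sketch only pushes one when the top of the stack isn't already the same token.)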


 * That's useful, and I could program something like that myself, but only in PHP. PHP is slow. Very slow. In fact, exceedingly slow. Do you have anything faster? r3m0t 13:14, Feb 21, 2005 (UTC)

It's written in PHP currently. IMHO, PHP is fast enough. It does take a while to run (around 60 hours), but it's doing 3 different things at once in that time, to every "proper" (namespace = 0) article in the Wikipedia, namely:
 * 1) Checking the wiki syntax.
 * 2) Finding possible additional wiki links.
 * 3) Finding possible missing redirects.


 * I took care of a few double redirects, and noticed how mindlessly repetitive it was. So mindlessly repetitive that there's no reason they couldn't all be fixed with a script. 21:33, 4 May 2005 (UTC)

The slowest of these is the suggesting wiki links, since it involves checking whether every word (and some word combinations) in every article has a matching article or redirect of the same name. Given this, I don't think 60 hours is unreasonable, and I'm not sure that rewriting it in another language would make it significantly faster (I could definitely be wrong though!). -- All the best, Nickj (t) 22:11, 21 Feb 2005 (UTC)
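(As a rough sketch of why that step is expensive - $titles here is assumed to be a big associative array of existing article and redirect titles, an illustrative name rather than the project's actual variable:)

 // For every word, and every adjacent two-word combination, in the
 // article, test whether an article or redirect of that name exists.
 // Even with constant-time hash lookups, the total work is roughly
 // (words per article) x (number of articles).
 $words = preg_split('/\W+/', $articleText, -1, PREG_SPLIT_NO_EMPTY);
 $suggestions = array();
 $count = count($words);
 for ($i = 0; $i < $count; $i++) {
     $one = strtolower($words[$i]);
     if (isset($titles[$one])) $suggestions[] = $one;
     if ($i + 1 < $count) {
         $two = $one . ' ' . strtolower($words[$i + 1]);
         if (isset($titles[$two])) $suggestions[] = $two;
     }
 }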


 * Brion (I think) once said on wikitech-l that a port of the MediaWiki diff code produced a certain diff in 0.5 secs. PHP made the diff in 45.5 seconds. (This was a special case with almost every line changed.)
 * Spellchecking took 3.72 seconds in this benchmark - about 3 times slower than Perl or Python, and far slower than compiled C (or C++).
 * Word frequency count took 6.01 seconds; Perl 1.02; Python 1.36; C 0.36.

I've picked out the benchmarks most obviously involved in string manipulation. Well, I guess I'll reimplement it, for my own entertainment. So the opening tokens are ( { [ [[ {| '' ''' <!-- <math> <nowiki> <code> and their closing tokens are ) } ] ]] |} '' ''' --> </math> </nowiki> </code> - correct? r3m0t 07:35, Feb 23, 2005 (UTC)

Those are some quite big speed differences! And if you're willing to implement a syntax checker, that's great, because the more the merrier as far as I'm concerned ;-) With the wiki tokens, there are some multi-line tokens, and some single-line ones. I've copied and pasted the code I'm using below, and tried to remove any stuff that's irrelevant to the area of syntax checking:

 $pattern = "%(<nowiki>)|(</nowiki>)"    // Nowiki tags
          . "|(<math>)|(</math>)"       // Math tags
          . "|(<!--)|(-->)"             // Comment tags
          . "|(<code>)|(</code>)"       // Code tags
          . "|(<div)%i";                // div tags
 $matches = preg_split ($pattern, strtolower($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
 foreach ($matches as $format) {
     if ($format == "<nowiki>") {
         if ($in_nowiki == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
         $in_nowiki = true;
     }
     else if ($format == "</nowiki>") {
         if ($in_nowiki == true) addRemoveFromStack($format, "<nowiki>", false, $formatStack, $string);
         $in_nowiki = false;
     }
     else if ($format == "<math>") {
         if ($in_math == false) addRemoveFromStack($format, $format, false, $formatStack, $string);
         $in_math = true;
     }
     else if ($format == "</math>") {
         if ($in_math == true) addRemoveFromStack($format, "<math>", false, $formatStack, $string);
         $in_math = false;
     }
     else if ($format == "-->") {
         if ($in_comment == true) addRemoveFromStack($format, "<!--", false, $formatStack, $string);
         $in_comment = false;
     }
     else if ($format == "</code>") {
         if ($in_code == true) addRemoveFromStack($format, "<code>", false, $formatStack, $string);
         $in_code = false;
     }
     // ... (remaining cases trimmed as irrelevant here)
 }

Here's everything that I'm currently aware of that's wrong in the above code, or potentially missing from it:
 * Need to add <pre> tags to the list of tags to check.
 * Add <tt> tags?
 * Add a special case for "[[image:" and "]]" to allow multi-line syntax, because image tags can run across lines?
 * Improve handling of div tags for XHTML compliance - <div id="xxx" /> is valid as both the opening and closing tag, because it is a shorthand for <div id="xxx"></div>.
 * Add a special case for ''''' (bold-italics), which combines both the ''' and '' cases? Otherwise cases like '''''85-'86 get handled wrong (the first 6 quotes get treated as a bold open and close, whereas the Wikipedia treats it as a bold open, then an italics open, then a single quote).
 * For nowiki, comment, math, and code tags, doubled-up opening tags are not detected as an error, when they should be. For example <code><code></code> should be listed as an error, but is not.
 * Actually, I'm not sure this is always the case. For example <nowiki><nowiki></nowiki> is a valid way of generating the text '<nowiki>'.  --HappyDog 02:27, 30 Mar 2005 (UTC) (PS - just take a look at the source to see what I had to type to generate that first string!)
 * The easiest way to generate those strings would have been &lt;nowiki&gt;&lt;nowiki&gt;&lt;/nowiki&gt; and &lt;nowiki&gt; – ABCD 02:58, 30 Mar 2005 (UTC)
 * Dash it all, you're right. I had wiki-markup on the brain! --HappyDog 03:32, 30 Mar 2005 (UTC)

Hope that helps! -- All the best, Nickj (t) 23:17, 24 Feb 2005 (UTC)

please use the edit summary
Can you please add a summary of the change instead of just saying "fixed wiki syntax"? Say "test] --> [[test Fix wikilink syntax blah blah blah". Then we don't have to go to each article and search for every small thing you changed to find bad "fixes". - Omegatron 04:55, Mar 12, 2005 (UTC)

Can you please be more specific about what we did that was bad? For example, is there a particular error that we're misdetecting? If so, please let me know. Please realise that we're not perfect, but we're honestly not trying to introduce problems.

With the current batch, there are just two types of errors listed at the moment, namely:
 * Redirects that had slightly bad syntax (e.g. " #REDIRECT(Blah) " instead of " #REDIRECT [[Blah]] "); for these the suggested summary is "Fix Redirect Syntax".
 * Redirects that were double redirects (e.g. A → B → C, which gets changed to A → C); for these the suggested summary is "Fix Double Redirect".

Which one of these was wrong? They're both fairly straightforward transformations, and hopefully neither should introduce new errors (but for example in the double-redirect case, if it was wrong for A to redirect to B, and we then change A to redirect to C, then the source of the error was that A redirected to B, not that we changed A to redirect to C). -- All the best, Nickj (t) 07:42, 12 Mar 2005 (UTC)
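(For illustration, the double-redirect transformation is mechanical enough to sketch - hypothetical names throughout, not the project's actual code, and it ignores edge cases such as redirect loops:)

 // If page A redirects to B, and B itself redirects to C, then
 // retarget A straight at C (A -> B -> C becomes A -> C).
 function fixDoubleRedirect($a, $redirectTargets) {
     $b = isset($redirectTargets[$a]) ? $redirectTargets[$a] : null;
     if ($b !== null && isset($redirectTargets[$b])) {
         $c = $redirectTargets[$b];
         return "#REDIRECT [[" . $c . "]]"; // replacement text for page A
     }
     return null; // not a double redirect; leave the page unchanged
 }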

Long image descriptions -- OK to remove?
I've hit a wall on WikiProject Wiki Syntax/square-brackets-001.txt, regarding pages containing several multi-line image descriptions, such as Apollodotus I, Apollodotus II and Apollophanes. I doubt that squashing these descriptions into a single line would be an acceptable solution. Would it therefore be alright to consider these pages "fixed", and rip them out accordingly? Fbriere 20:03, 23 Mar 2005 (UTC)
 * Yes, please do remove these from the list. Fixing this is on the to-do list. Basically the Wikipedia wants normal links to start and end on the same line, but image tags are allowed to run over multiple lines. A special case needs to be added to detect "[[image:" tags so that they are treated differently from "[[" tags (currently they're both treated the same), but this hasn't been added yet. Until this is added, multi-line "[[image:" tags will be listed as malformed, even though they're OK. -- All the best, Nickj (t) 23:13, 23 Mar 2005 (UTC)
 * As long as this is corrected before the next run; otherwise, we'll be removing them from the list over and over... (Especially since they end up appearing twice; I had to get rid of something like 15-20 occurrences on 001.)  Fbriere 00:38, 24 Mar 2005 (UTC)
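(The special case Nick describes might look roughly like this inside the tokenizer - the variable names are hypothetical, not the project's actual code:)

 // Treat "[[image:" differently from a plain "[[", since image tags
 // may legitimately span multiple lines.
 if (strtolower(substr($string, $pos, 8)) == "[[image:") {
     $multiLine = true;   // allow the matching "]]" on a later line
 } else if (substr($string, $pos, 2) == "[[") {
     $multiLine = false;  // ordinary links must close on the same line
 }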

<!-->
This seems to be valid syntax for both opening and closing a comment (i.e. <!--> text <!-->), and should probably be ignored, as such comments are valid and a complete waste of time to "fix". --Jshadias 23:07, 23 Mar 2005 (UTC)


 * :-) Interesting! That's a somewhat tricky one to parse correctly (at least with the current KISS approach, which will detect it as two open comment tags, rather than an open and then a close). It is valid HTML. However, it's definitely quicker to change a handful of articles than it is to change the parser (at least from my perspective), given that the total number of articles in the Wikipedia that use this construct must be quite small (e.g. less than 10). Do you have the titles of the articles that use this, and I'll get them changed? -- All the best, Nickj (t) 23:31, 23 Mar 2005 (UTC)
 * eh, I'll just change them when I come across them. As long as it's uncommon it's not a huge deal. --Jshadias 14:36, 24 Mar 2005 (UTC)

Victoria Cross recipients
The following appears on several articles:


 * recipient of the Victoria Cross, the highest and most prestigious award for gallantry in the face of the enemy that can be awarded to British] and [[Commonwealth forces.

Would it be possible to create a bot to take care of these? (Though I notice Google only shows 20 such articles. If this is accurate, I guess manual work would still be cheaper...)


 * Probably not worth adding a bot for this by a long shot. Getting permission to run and operate a bot is a political process (see Bots). You need two levels of permission / non-objection (one from en, one from meta). The burden of proof that the bot is non-harmful is on the author. Many more regulations on bots have been discussed and may be added at some point. For 20 articles, it's (IMHO) really really really not even remotely worth the grief, hassle, and red tape. -- All the best, Nickj (t) 01:26, 24 Mar 2005 (UTC)

QC instead of ship-and-fix?
This is a very cool project, but...

It's commonly accepted in software development that it's a lot cheaper to fix a bug before the product is released than to release it and then have to go back and fix problems. It seems we would do well to do this sort of syntax checking right on the edit page (make it part of the "Show preview" function) instead of finding problems in batch mode later. --RoySmith 01:02, 26 Mar 2005 (UTC)


 * I agree. The functions that provide the Wiki Syntax checking are in the section above (just added licensing to indicate that they're under the GPL), and I plan to release a very slightly updated version of those functions soon. I would like to see these incorporated into MediaWiki in some way (e.g. either a "Check Wiki Syntax" link in the "toolbox" section, or as part of "Show preview"), as the GPL licensing would allow this, and I encourage the MediaWiki developers to add this. Even with this though, there are still going to be errors that need to be cleaned up in batch mode, but it would be an improvement. -- All the best, Nickj (t) 04:27, 26 Mar 2005 (UTC)

msg: links
The {{msg:foo}} syntax for templates is deprecated as of 1.5, where {{msg:foo}} will simply transclude Template:Msg:foo instead of Template:Foo. Here's a list of pages from the 2005-03-09 dump that still use the syntax:


 * msg001.
 * msg002.
 * msg003.
 * msg004.
 * msg005.
 * msg006.
 * msg007.
 * msg008.


 * —Ævar Arnfjörð Bjarmason 02:05, 2005 May 15 (UTC)


 * In that list there are a lot of user pages, talk pages, and pages that have nowiki around the msg template. Should we just delete those out of your list or what? --Kenyon 05:08, May 16, 2005 (UTC)


 * Looks like a bot has been written / is being written to resolve these (i.e. it now has an SEP field around it) -- All the best, Nickj (t) 02:55, 3 Jun 2005 (UTC)
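(For anyone curious, the core rewrite such a bot would make is a one-liner - a sketch only, which naively ignores the user-page, talk-page and nowiki complications Kenyon raises above:)

 // Rewrite deprecated {{msg:foo}} template calls as {{foo}}.
 $fixed = preg_replace('/\{\{msg:/i', '{{', $articleText);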

one left on div-tags-000
The one left on WikiProject Wiki Syntax/div-tags-000.txt is Main Page/French, and it looks too hard to do by hand. So if anybody has a good HTML fixing program to use on that, go ahead. Or I suppose we could just forget about that page, since it seems to be dead (last edit was Dec 14, 2004). --Kenyon 04:15, May 16, 2005 (UTC)


 * I had a go at it, hope it's fixed now, but if it's not I guess we'll know because it'll turn up in the next batch. ;-)  -- All the best, Nickj (t) 02:55, 3 Jun 2005 (UTC)

Project page "Completed pages" table
Is this at all necessary? Moving the links of the entries to a completely different table and also striking them out. I could see just keeping them in the one table and striking them out, or maybe moving them to a separate table, but not both. I'd like to join the two tables and keep the strike-outs. Anyone have an opinion on the matter? – Quoth 09:59, 20 May 2005 (UTC)


 * I don't feel too strongly about it, but by having a separate table, plus striking out, it makes sure that the uncompleted stuff is quite visible and all grouped together, and by also striking completed pages out it makes it doubly-clear that those things are already done (in case people are just skim-reading). (In other words: Yes, it is redundant, but maybe the redundancy is sometimes helpful). -- All the best, Nickj (t) 02:55, 3 Jun 2005 (UTC)

New double redirect cleanup project
Hello,

I've generated another list of double redirects as of the 20050516 database dump at User:triddle/double_redirect/20050516. I did not want to edit the project page since I'm not a member of this project. Perhaps someone else knows the best way to try to integrate my list with this project? Thanks. Triddle 21:18, Jun 23, 2005 (UTC)


 * Hi Triddle, Go for it! Please feel free to edit away to add your double redirect list. I'm not precious about what people list (as long as it's relevant, which this clearly is), and I tend to be pretty slack about running the current script that finds the syntax problems and generates the lists (typically I run it once every 2 or 3 months), so any help with producing up-to-date lists is more than welcome. In terms of integrating them, maybe just edit the pages (example) and add the updated lists (if you're happy to do this), and update the main page accordingly. Also, please be sure to add yourself to the "credits - software" section of the page. And if you're feeling like doing some other extra stuff, two related things that may interest you are the listing of malformatted redirects, and broken redirects (i.e. redirects that point to something which isn't there) (e.g. to illustrate, here's an old example that's already been fixed up). The trick with the malformed ones is just applying a regex to all redirects and listing what doesn't match, and the main trick with the second is that people sometimes include a # in their redirect targets, so everything after the # has to be ignored. All the best, Nickj (t) 02:21, 27 Jun 2005 (UTC)
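(Sketches of those two tricks - the patterns here are assumptions about what counts as well-formed, not the project's actual regexes:)

 // Trick 1: list any redirect whose text doesn't match the expected shape.
 $wellFormed = preg_match('/^#REDIRECT\s*\[\[[^\]]+\]\]/i', $redirectText);

 // Trick 2: when checking whether the target exists, ignore everything
 // after a "#", since people sometimes redirect to a section of a page.
 if (preg_match('/\[\[([^\]#|]+)/', $redirectText, $m)) {
     $target = trim($m[1]);
     $broken = !isset($articleTitles[$target]); // target missing = broken redirect
 }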

Working together?
Hello,

I've been getting pretty good at analyzing the dump files with perl and getting useful stuff done. I'm curious whether I could help with this project. How are you preparing your lists? If you are having problems beating really hard on SQL databases, then I might be able to help by having it done through analysis of the dump files. Let me know if you think I can help. Triddle 06:42, Jun 26, 2005 (UTC)


 * Absolutely you can help! The current problem-finding script is a serious mess, as it integrates three different projects into the one script (suggester.php), so it lacks the clean separation that it really should have (and I tend not to have the time to do anything with it for extended periods, including adding the separation that it really requires). It's also got quite slow (probably as a result of doing too much at once, plus I think I've maxed out the memory on the box I'm using, so it's starting to thrash to disk), now taking around 7 or 8 days to do a complete run. The current code is in PHP, but a cleaner and quicker reimplementation (in any language, such as perl) would be a very good thing. You can get the current source code here. This includes the source code for preparing the lists (in output_malformed_pages.php). All the best, Nickj (t) 02:21, 27 Jun 2005 (UTC)

Standards
I have some AlMac observations about possible similar interests between this project and the usability project. AlMac

For example, for the usability project, I suggested that there might be value in adding to the Tool box.

I just edited this page; please run some standard software to identify common typing errors that I could fix right now. AlMac 4 July 2005 18:56 (UTC)

The edit suggestor bot
What was the script used to generate the vast lists of edit links for this project? It's needed desperately at WikiProject Disambiguation. --Smack (talk) 00:03, 24 July 2005 (UTC)
 * Smack - read this page. JesseW 16:22, 24 July 2005 (UTC)


 * Thanks. I didn't figure to look there :) --Smack (talk) 02:55, 25 July 2005 (UTC)


 * You want either of the output_*.php files from the ZIP file. They both generate a series of text files that can then be copied and pasted straight into the Wikipedia as lists of things that need doing. (I never got around to writing a bot to upload the files without human intervention). If you're suggesting disambiguations then probably the most similar file will be output_malformed_pages.php - you probably want the "outputToFile" function, and the global defines, and then delete the rest of the file, and then go from there to add the stuff that's specific to disambiguations. Hope that helps. All the best -- Nickj (t) 06:49, 30 July 2005 (UTC)