Wikipedia:Bots/Requests for approval/FrescoBot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

FrescoBot 2
Operator: Basilicofresco

Automatic or Manually assisted: Auto (where not stated differently)

Programming language(s): python (pywikipedia)

Source code available: standard pywikipedia

Function overview: remove useless piping within wikilink syntax

Links to relevant discussions (where appropriate): Wikipedia talk:Piped link

Edit period(s): every few months (or less) using the xml dump file

Estimated number of pages affected: 20k (rough guess)

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): Y

Function details: as stated in Piped link, we should never use piped links to convert the first letter to lower case. This well-tested bot will correct occurrences like [[Country code second-level domain|country code second-level domain]]. Already in use on the Italian Wikipedia.

More examples:
 * [[Meteor shower|meteor shower]] --> [[meteor shower]]
 * [[meteor shower|meteor shower]] --> [[meteor shower]]
 * [[Meteor shower|Meteor shower]] --> [[Meteor shower]]
 * [[meteor shower|Meteor shower]] --> [[Meteor shower]]
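
The rule behind these examples can be sketched in a few lines of Python. This is only an illustration, not the bot's actual regex (the source was not published at the time): the function name `remove_useless_piping` and the pattern `PIPED_LINK` are made up for this sketch. It collapses a piped link of the form `[[Target|label]]` to `[[label]]` whenever target and label differ at most in the case of the first letter.

```python
import re

# [[Target|label]] where neither part contains brackets or a further pipe.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def remove_useless_piping(wikitext: str) -> str:
    """Collapse [[Target|label]] to [[label]] when the pipe is redundant."""

    def collapse(match: re.Match) -> str:
        target, label = match.group(1), match.group(2)
        # The first letter of a title is case-insensitive on en.wikipedia,
        # so only the remainder has to be identical.
        same_rest = target[1:] == label[1:]
        same_first = target[0].lower() == label[0].lower()
        if same_rest and same_first:
            return "[[" + label + "]]"
        return match.group(0)  # a meaningful pipe is left untouched

    return PIPED_LINK.sub(collapse, wikitext)


print(remove_useless_piping("a [[Meteor shower|meteor shower]] tonight"))
```

Links whose label genuinely differs from the target, such as file links with parameters, fall through unchanged because the comparison fails.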

Discussion
Three things:
 * 1) User:SmackBot already does a lot of these general fixes.
 * 2) Why not do a BRFA for a whole laundry list of general fixes?
 * 3) For a pre-parsed list of articles that already have this error, go here.

Tim1357 (talk) 15:15, 28 December 2009 (UTC)


 * Thank you for your reply.
 * I just checked and SmackBot lists "Clean up piped links" in "Abandoned tasks"
 * This task (about 15k errors) sounds enough for the moment, but if you prefer I can add other (less common) wikilink fixes.
 * Uhmm.. there are very few pages here. I'm pretty sure it's better to run these regex on a recent dump file.
 * Basilicofresco (msg) 15:55, 28 December 2009 (UTC)
 * Good answers. I still think it is a bit too task-specific (it could be doing other things while at the article), which is why I hold my position that SmackBot should re-add this to its list of general fixes. I like keeping all of these types of general fix tasks in one place (or at least under one bot username). I'll get Rich (the bot's operator) to comment here. Tim1357 (talk) 22:24, 28 December 2009 (UTC)

TBH, I really don't like the idea of a bot making 16000 edits that will result in no visible change to the rendered page. It's like WP:NOTBROKEN, except with possibly fewer benefits. Mr.Z-man 22:43, 28 December 2009 (UTC)


 * I understand your concern, but WP:NOTBROKEN is a different matter: e.g. redirects can indicate possible future articles, and linking to something different for no reason is counterintuitive. Moreover, on that page you can read "Introducing unnecessary invisible text makes the article more difficult to read in page source form." Here we are talking instead about cleaning the article source of a forbidden / deprecated syntax ("never use piped links to convert first letter to lower case", and on the talk page the opinion was "They are redundant and should be removed"). -- Basilicofresco  (msg) 00:19, 29 December 2009 (UTC)
 * Note The task was not abandoned by Smackbot, rather it became part of AWB's general fixes. SmackBot uses these general fixes, as it is run in AWB, and this task is therefore not needed. Tim1357 (talk) 01:05, 29 December 2009 (UTC)


 * I see... is AWB only manually assisted? This task does not need to be manually assisted. -- Basilicofresco  (msg) 02:02, 29 December 2009 (UTC)
 * It can be, yes. However SmackBot uses the bot option, so it essentially clicks save all by itself.--72.169.191.155 (talk) 17:14, 29 December 2009 (UTC)
 * So SmackBot is already doing this task and there is no point in a new bot doing it? Regards  So Why  19:28, 29 December 2009 (UTC)
 * AWB fixes piped links as general fixes. Any editor processing a page with AWB fixes piped links. -- Magioladitis (talk) 21:56, 29 December 2009 (UTC)
 * Not quite. What we should really be doing is BRFA'ing as many of these minor fixes as possible as a blanket addition to any other bot run. It would mean minor clean-up could be done with relative impunity. Historically I have usually run SB with Gen fixes on, except when there has been a known bug, and it really is a big help to the 'pedia - all the AWB'ers doing this, now that AWB is mature. Rich Farmbrough, 22:27, 29 December 2009 (UTC).

There is nothing wrong with having another smackbot. However, I think we should ask Rich to give the bot's code to User:Basilicofresco, so we can cram as many general fixes into one edit as possible. Tim1357 (talk) 02:40, 30 December 2009 (UTC)

I like the idea of adding general fixes to other bots. --IP69.226.103.13 (talk) 16:59, 30 December 2009 (UTC)


 * This is a purely cosmetic change to the wikitext with no visible effect. It should not be the sole purpose for editing a page. –xenotalk 16:21, 31 December 2009 (UTC)
 * Agree. Making large numbers of edits for (I was going to say cosmetic reasons, but it doesn't even change how the page displays) essentially no reason is generally undesirable. Unless it's also going to make other (more substantial) changes, I don't think it would be appropriate to approve this. A le_Jrb talk  21:59, 31 December 2009 (UTC)
 * I decided that I wasn't actually that clear. My point is that an incorrectly piped link really doesn't affect the source code of an article in any major way. If it significantly disrupted the ability to edit an article, it might be a useful task even with no visible changes. But piped links don't do that, so these changes wouldn't make a whole lot of difference in general - either to the displayed page, or to the source code. And 16k edits that make virtually no difference is possibly not the most useful way of expending resources. A le_Jrb talk  22:09, 31 December 2009 (UTC)
 * I concur with the above and am leaning towards declining the bot if its task range is limited to non-visible changes.  MBisanz  talk 01:33, 3 January 2010 (UTC)

Well, I can easily add other fixes, for example: Basilicofresco (msg) 13:50, 3 January 2010 (UTC)
 * auto
 * double piping, eg: |coup
 * misplaced quotes "", eg: "The Strange Life of Nikola Tesla" --> "The Strange Life of Nikola Tesla" (I will avoid special cases like "Heroes" or "—We Also Walk Dogs")
 * misplaced quotes “”, eg: “Hagushichao” --> “Hagushichao”
 * misplaced bold, eg: Wales Rally GB --> Wales Rally GB (I use italic because it is almost always more appropriate) (inappropriate boldface text will be left in place for evaluation by human readers)
 * missing space before a year, eg: October2007
 * misplaced space in front of a wikilink, eg: show Ultra Q
 * manually assisted
 * misplaced italic, eg: On the Genealogy of Morals --> On the Genealogy of Morals (I will avoid false positives eg Noetherian ring)
 * misplaced space at the end of a wikilink, eg: Nile Delta and (There could be false positives like Caribbean nations )
 * misplaced full stops, eg: non-melancholic mood disorders. --> non-melancholic mood disorders. (I still have to check for false positives)
 * Are these real issues though? I can't say I've seen any such problems on any content pages before, and I'm not sure this really requires a bot. – Juliancolton  | Talk 20:47, 3 January 2010 (UTC)
 * If wikipedia were a professional encyclopedia, they would be real issues; but it's not, so maybe it doesn't matter, and parsimonious links on wikipedia are not something to strive for. -- IP69.226.103.13   |   Talk about me.   02:21, 4 January 2010 (UTC)

Yes, they are. The above examples are real. Do you prefer to carefully eye-check 3 million pages and manually edit about 4k pages? (1:820, a guess based on random article sampling) -- Basilicofresco  (msg) 08:53, 4 January 2010 (UTC)
 * For the misplaced bold one, you shouldn't be changing bold to italic without manually checking the edit, as that's not always going to be correct. Even if these are added to the bot, for the majority of pages it edits, it's still only going to make cosmetic changes to the wikitext. Mr.Z-man 17:15, 5 January 2010 (UTC)

FrescoBot 2 bis
Never mind, if the majority considers "cosmetic" my proposal about useless pipings, I can remove it. I'm here to help you, not to raise my editcount. So, what about the second group of replacements? -- Basilicofresco  (msg) 20:59, 5 January 2010 (UTC)
 * Personally, I think the problem here is that it is generally undesirable to have a bot making only cosmetic changes, unless it is demonstrated that there is a significant problem of some kind that requires their fixing... Either way, I have a few problems. You say under misplaced quotes that you will avoid special cases - how will you identify these, in order to avoid them? A list won't do, because it's possible that new ones will be introduced. As for your manual assisted changes, you don't actually need a bot to make these - just get out AWB and do them (following all AWB guidelines etc., possibly while making more substantial changes). A le_Jrb talk  18:29, 6 January 2010 (UTC)


 * *a bit frustrated*
 * I'm not going to use AWB.
 * IMHO this guideline and this discussion were enough to consider the original proposal not just "cosmetic". It sounds a bit strange to include it in AWB's general fixes and Smackbot, but refuse any other help. (well never mind)
 * Theoretically you're right about a list... but did you take the time to check how many special cases are actually present on Wikipedia (just a few) and how many broken wikilinks (a lot) there are due to this problem? Not to mention the fact that any future article with a (correct!) quoted name will also come with a redirect from the unquoted name. I considered this issue and it's not a problem.
 * Manually assisted changes: have you ever used pywikipedia? Do you have any idea how much time it takes to manually open-correct-save 1k pages? Replace.py in manually assisted mode does 2 important things: 1. finds the errors within the dump file 2. speeds up the editing process about 10x / 20x.
 * The above substitutions are useful. Is there a human willing to manually correct them all over the encyclopedia?
 * Basilicofresco (msg) 11:38, 8 January 2010 (UTC)

Fixing also external links
I'm testing a new set of regexes for syntax errors in external links on the Italian Wikipedia. It would probably be nice to add them here in order to create a single task. I will add details here as soon as possible. -- Basilicofresco  (msg) 09:02, 9 January 2010 (UTC)

Function details: (new proposal) Using replace.py I will apply several accurate regular expressions in order to correct these errors:

Example: wrong wikisource --> replaced wikisource = error as appears in the article --> replaced text as appears in the article


 * External links:
 * [HTTP://www.google.it link] --> link = [HTTP://www.google.it link] --> link
 * link --> link = link --> link
 * [http:www.google.it link] --> link = [http:www.google.it link] --> link
 * [http:/www.google.it link] --> link = [http:/www.google.it link] --> link
 * link --> link = link --> link
 * [link] --> link = [link] --> link
 * [link --> link = [link --> link
 * [http:://www.google.it link] --> link = [http:://www.google.it link] --> link
 * [http//www.google.it link] --> link = [http//www.google.it link] --> link
 * somethinglink --> something link = somethinglink --> something link
 * linksomething --> link something = linksomething --> link something
 * few other very rare variants - (manually assisted)
 * Flat Broke Blues Band Photo Album --> Flat Broke Blues Band Photo Album = Flat Broke Blues Band Photo Album --> Flat Broke Blues Band Photo Album
 * Google Image Result for http://www.flatbrokebluesband.com/photos.php --> = Google Image Result for http://www.flatbrokebluesband.com/photos.php -->
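
To make the scheme-related entries in the list above concrete (`HTTP://`, `http:`, `http:/`, `http:://` and `http//` at the start of a bracketed external link), here is a minimal Python sketch. It is not the set of regular expressions used by the bot, which are not shown in this request; the names `BAD_SCHEME` and `fix_external_link_scheme` are invented for the illustration, and the other list entries are not covered.

```python
import re

# "[" + http/https in any case + a damaged separator:
# either one or more colons followed by any number of slashes,
# or a bare "//" with the colon missing.
# The lookahead requires that a host name actually follows.
BAD_SCHEME = re.compile(r"\[(https?)(?::+/*|//)(?=[^\s/:\]])", re.IGNORECASE)


def fix_external_link_scheme(wikitext: str) -> str:
    """Normalise the scheme of bracketed external links to 'http://' or 'https://'."""
    return BAD_SCHEME.sub(lambda m: "[" + m.group(1).lower() + "://", wikitext)


for broken in (
    "[HTTP://www.google.it link]",
    "[http:www.google.it link]",
    "[http:/www.google.it link]",
    "[http:://www.google.it link]",
    "[http//www.google.it link]",
):
    print(broken, "-->", fix_external_link_scheme(broken))
```

A link that is already well formed also matches the pattern but is rewritten to itself, so running the substitution twice changes nothing.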


 * Wikilinks:
 * |sidescan sonar --> sidescan sonar = |sidescan sonar --> sidescan sonar
 * sonar --> sonar = sonar --> sonar
 * sonar --> sonar = sonar --> sonar
 * sonar --> sonar = sonar --> sonar
 * "sonar" --> "sonar" = "sonar" --> "sonar" - I will avoid the few (24) exceptions, eg. "Them"
 * (sonar) --> (sonar) = (sonar) --> (sonar) - I will avoid the few (28) exceptions, eg. (not adam)
 * 'sonar' --> 'sonar' = 'sonar' --> 'sonar' - I will avoid the few (1) exceptions, eg. 'Hours'
 * sonar, --> sonar, = sonar, --> sonar, - I will avoid the few (1) exceptions, eg. Alors voilà,
 * somethingsonar --> something sonar = somethingsonar --> something sonar
 * something sonar --> something sonar = something sonar --> something sonar
 * 1992-1998 --> 1992-1998 = 1992-1998 --> 1992-1998 - any type of dash, I will avoid the few (26) exceptions, eg. 1967–1970, I will also avoid any decade eg. 1950-1959 (I just created these redirects to decades in order to capture any red-but-plausible wikilink)
 * 1992-98 --> 1992-98 = 1992-98 --> 1992-98 - any type of dash, I will avoid the few (2) exceptions, eg. 1806-20, I will avoid potentially ambiguous intervals (cross-century) eg. 1862-34
 * Nile Delta and --> Nile Delta and = Nile Delta and --> Nile Delta and - (manually assisted)
 * sonar. --> sonar. = sonar. --> sonar. - (manually assisted)
 * few other very rare variants - (manually assisted)
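
The "fix unless the title is a known exception" pattern used by several wikilink entries above can be sketched as follows for the quoted-link case, where the quotation marks are moved from inside the link to outside it. This is an assumption-laden illustration, not the bot's code: `QUOTED_LINK`, `EXCEPTIONS` and `move_quotes_outside` are hypothetical names, and the exception set contains only the two titles named in this discussion rather than the full list of 24.

```python
import re

# [["title"]] with straight double quotes just inside the brackets.
QUOTED_LINK = re.compile(r'\[\["([^"\[\]|]+)"\]\]')

# Illustrative subset: articles whose real title includes the quotes.
EXCEPTIONS = {'"Heroes"', '"Them"'}


def move_quotes_outside(wikitext: str, exceptions=EXCEPTIONS) -> str:
    """Rewrite [["X"]] as "[[X]]" unless "X" (with quotes) is an excepted title."""

    def fix(match: re.Match) -> str:
        quoted_title = '"' + match.group(1) + '"'
        if quoted_title in exceptions:
            return match.group(0)  # the quotes belong to the article name
        return '"[[' + match.group(1) + ']]"'

    return QUOTED_LINK.sub(fix, wikitext)


print(move_quotes_outside('[["sonar"]] and [["Them"]]'))
```

Passing the exception set as a parameter keeps the substitution itself unchanged when the list is refreshed before a run.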


 * Internal links conversion:
 * ECFS --> ECFS = ECFS --> ECFS (handles piping and common url encoding)
 * Flag of Brunei --> Flag of Brunei = Flag of Brunei --> Flag of Brunei (properly handles files and categories)
 * René-Maurice Gattefossé --> René-Maurice Gattefossé = René-Maurice Gattefossé --> René-Maurice Gattefossé (handles links to wikipedia in foreign languages)
 * Mme de Genliss --> Mme de Genliss = Mme de Genliss --> Mme de Genliss (converts a good number of unicode sequences)
 * Tools and Techniques for Hindi Computing --> Tools and Techniques for Hindi Computing = Tools and Techniques for Hindi Computing --> Tools and Techniques for Hindi Computing (does not screw up with exotic not-recognized unicode sequences)
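
A rough Python sketch of the internal-link conversion described above: an external-style link pointing at a Wikipedia article becomes a wikilink, with URL decoding, a pipe when the label differs, an interlanguage prefix for other language editions, and a leading colon for files and categories. The function and pattern names are hypothetical, the home wiki is assumed to be `en`, and fragments, query strings and undecodable byte sequences are deliberately not handled here.

```python
import re
from urllib.parse import unquote

# [http://LANG.wikipedia.org/wiki/TITLE label]
WIKI_URL = re.compile(
    r"\[https?://([a-z\-]+)\.wikipedia\.org/wiki/([^\s\]]+) ([^\]]+)\]"
)

# Namespaces that need a leading colon to be linked rather than embedded.
COLON_NAMESPACES = ("file", "image", "category")


def to_wikilink(wikitext: str, home_lang: str = "en") -> str:
    """Convert "fake" external links to Wikipedia into proper wikilinks."""

    def convert(match: re.Match) -> str:
        lang, raw_title, label = match.groups()
        title = unquote(raw_title).replace("_", " ")
        prefix, sep, _ = title.partition(":")
        if lang != home_lang:
            title = ":" + lang + ":" + title      # interlanguage link
        elif sep and prefix.lower() in COLON_NAMESPACES:
            title = ":" + title                   # link to, do not embed
        if title == label:
            return "[[" + title + "]]"
        return "[[" + title + "|" + label + "]]"

    return WIKI_URL.sub(convert, wikitext)


print(to_wikilink("[http://en.wikipedia.org/wiki/ECFS ECFS]"))
```

The colon check uses `partition` so that an ordinary article that merely happens to be called "File" or "Category" is not prefixed by mistake.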

Discussion about new proposal
What will you do in the event of people creating articles that you don't currently have in your exceptions? A le_Jrb talk 19:41, 16 January 2010 (UTC)


 * Excluding year intervals, there are currently only 60 exceptions among 3162000 articles. This means 60/3162000 = 1/52700. The probability of replacing a red wikilink (article missing) with an "incorrect" red wikilink (missing quotes/brackets/etc.) is about 1/52700. Pretty low. Moreover, any new article with such a peculiar name will likely come with a redirect from the cleaned name. The risk is negligible and acceptable considering that this task, for example, will fix +3k broken wikilinks. However, what I can do is periodically update the exception list, do my best to avoid any error and promptly correct any problem. -- Basilicofresco  (msg) 13:15, 17 January 2010 (UTC)
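
The arithmetic in the comment above can be checked with a short Python snippet. The last step, multiplying the ratio by the roughly 3k fixes, is an extrapolation added here for illustration; it assumes every fix carries the same independent risk, which the comment does not claim explicitly.

```python
from fractions import Fraction

EXCEPTIONS = 60        # special-case titles, as quoted in the comment above
ARTICLES = 3_162_000   # article count, as quoted in the comment above
FIXES = 3_000          # "+3k broken wikilinks", taken as a round figure

# Exact ratio of exceptional titles to all articles.
risk_per_link = Fraction(EXCEPTIONS, ARTICLES)

# Assumption: each fix independently carries that same risk.
expected_bad_fixes = FIXES * risk_per_link

print(risk_per_link)
print(float(expected_bad_fixes))
```

The first line printed is `1/52700`, matching the figure in the comment; under the stated assumption, the expected number of wrong replacements in a run of that size stays well below one.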


 * Periodic updates to the list sounds like a reasonable work-around. How often do you think you'll update the list of exception cases? Josh Parris 04:31, 19 January 2010 (UTC)


 * I plan to run the script every time a new dump file is available, and I'm going to check for new exclusions before every run. It is the safest method. -- Basilicofresco  (msg) 08:41, 19 January 2010 (UTC)

Perhaps a heuristic you could use is that if the link is a redirect, you can repair; if it's an article, you can't repair. How does this fit with the exceptions you have identified? Josh Parris 12:53, 18 January 2010 (UTC)


 * Of course, I created the exclusion list starting from the existing articles with a matched name. -- Basilicofresco  (msg) 19:51, 18 January 2010 (UTC)
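
One way to picture how such an exclusion list could be derived from a dump is the sketch below. It assumes the page titles are available as a mapping from title to a redirect flag, which is a made-up input format for this illustration, and the helper names `is_decorated` and `build_exclusions` are hypothetical. Real articles whose titles are wrapped in quotes or parentheses are excluded from repair, while redirects with such names are not.

```python
# Pairs of (opening, closing) characters that the link fixes would strip.
WRAPPERS = [('"', '"'), ("(", ")"), ("'", "'")]


def is_decorated(title: str) -> bool:
    """True if the whole title is wrapped in quotes or parentheses."""
    return len(title) > 2 and any(
        title.startswith(opening) and title.endswith(closing)
        for opening, closing in WRAPPERS
    )


def build_exclusions(pages: dict) -> set:
    """pages maps title -> is_redirect; keep decorated titles of real articles."""
    return {
        title
        for title, is_redirect in pages.items()
        if not is_redirect and is_decorated(title)
    }


sample = {'"Heroes"': False, "(not adam)": False, '"Sonar"': True, "Sonar": False}
print(sorted(build_exclusions(sample)))
```

Rebuilding the set from each fresh dump, as proposed in this thread, would pick up newly created articles with decorated names before the next run.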

Meanwhile I also tested the non-trivial conversion of "internal links" into wikilinks (take a look above). As you asked after my first proposal, I put together a good bunch of several tasks. Let me know if there are any other common link problems I can solve or if you would like me to reintroduce the cleaning of useless piping as well. -- Basilicofresco  (msg) 09:35, 21 January 2010 (UTC)

I would like to know which ones are currently fixed by WP:AWB and/or are part of WP:CHECKWIKI, i.e. they are fixed on a daily basis by many editors -- Magioladitis (talk) 09:28, 24 January 2010 (UTC)


 * I don't know. Probably a few fixes are included or partially included, but it is far from being a problem. Can I start with some test edits so we can see if AWB editors are really able to correct any error on every page on a daily basis? -- Basilicofresco  (msg) 12:05, 24 January 2010 (UTC)

Info: This is what AWB can do atm. -- Magioladitis (talk) 15:23, 9 February 2010 (UTC) ...and some more. Basilicofresco, very good ideas! PS Better poke someone from BAG to get approved. -- Magioladitis (talk) 19:46, 9 February 2010 (UTC)

Update: I'm performing additional tests in order to further improve the above collection. I will soon ping a BAG operator. -- Basilicofresco  (msg) 09:41, 12 February 2010 (UTC)

Useless piping
I checked the 2009/11/28 dump and I found out that just about 7% of useless pipings (existing at that date) have been corrected since its creation (2 months and 15 days ago). It means that AWB "daily basis" fixing is simply not enough. Many of you criticized the first proposal (useless piping removal only) because it was "cosmetic only". Ok, but now there is a whole collection of fixes, and adding a useless piping removal as well seems appropriate and balanced imho. See also Piped link. Is there any objection? -- Basilicofresco  (msg) 16:07, 12 February 2010 (UTC)
 * Useless piping (improved):
 * Sidescan sonar --> Sidescan sonar = Sidescan sonar --> Sidescan sonar
 * Sidescan sonar --> Sidescan sonar = Sidescan sonar --> Sidescan sonar
 * sidescan sonar --> sidescan sonar = sidescan sonar --> sidescan sonar
 * Breakfast of Champions --> Breakfast of Champions = Breakfast of Champions --> Breakfast of Champions
 * "Breakfast of Champions" --> "Breakfast of Champions" = "Breakfast of Champions" --> "Breakfast of Champions" - also with and  
 * "Breakfast of Champions" --> "Breakfast of Champions" = "Breakfast of Champions" --> "Breakfast of Champions" - also with and  
 * other minor variants


 * I support these fixes if the number is so big. -- Magioladitis (talk) 16:13, 12 February 2010 (UTC)
 * Info: that's what AWB can do v.5.0.1.0 (rev. 6203). -- Magioladitis (talk) 10:51, 13 February 2010 (UTC)

Basilicofresco, where will your list of exceptions be located? Will it be updated manually or automatically? Can other editors update it? -- Magioladitis (talk) 10:53, 13 February 2010 (UTC)


 * Exceptions were located in my userspace on it.wikipedia, but they are now also present here. I'm going to systematically check for new exclusions among page names on the fresh dump before every run (1 per month or less). Obviously suggestions are always welcome. -- Basilicofresco  (msg) 18:47, 13 February 2010 (UTC)

BAGAssistanceNeeded


 * I think the majority of your fixes are good (i.e., the ones that change the way the page appears or the ones that change the final target of the link), and I would be interested in approving this bot for a trial. However, I'm not so sure if certain fixes you mentioned above are necessary, such as the ones that only change wikicode, and do not affect the final rendered page. A bot making an edit solely for cosmetic purposes, and not to fix links that are actually broken, may be a little wasteful (e.g., Sidescan sonar → Sidescan sonar). I think it would be best to stick with external links/wikilinks/internal link conversions, and avoid fixing useless piping for now. -- The   Earwig   @  01:43, 18 February 2010 (UTC)
 * It's part of CHECKWIKI anyway. Some people solely do that. Why not a bot to save us time and effort? Of course it depends on the amount of edits done per day. We need some estimate but I don't think there will be many anyway. -- Magioladitis (talk) 07:29, 18 February 2010 (UTC)


 * Useless piping is considered a middle priority issue by the checkwiki project. And, as you can see, they pointed out that the problem should be corrected by "AWB, AutoEd, BOT". I checked the November dump again with a larger sample (900 old mistakes in the middle of the dump) and the test shows that only about 150 (17%) were corrected during the past 3 months. IMO the general fixes of AWB and AutoEd are in need of help. For this reason I asked again for your opinion. Other comments are welcome. -- Basilicofresco  (msg) 08:10, 19 February 2010 (UTC)


 * Fair enough., with all fixes enabled. -- The   Earwig   @  16:11, 19 February 2010 (UTC)


 * Done. The most common corrections are missing spaces near links, useless piping and wiki/interwikification of "fake" external links. -- Basilicofresco  (msg) 18:52, 20 February 2010 (UTC)


 * Results look good. Any comments/objections before this task is approved? -- The   Earwig   @  22:23, 20 February 2010 (UTC)
 * Is the code published somewhere? --Magioladitis (talk) 23:28, 20 February 2010 (UTC)
 * No, but if everything goes fine, I will probably publish it in the near future. -- Basilicofresco  (msg) 01:26, 21 February 2010 (UTC)


 * Ps. I'm going to include also namespace 6 because it's useful and harmless (tested). Is it ok? -- Basilicofresco  (msg) 06:52, 22 February 2010 (UTC)

If there are no objections, I'm ready to start. -- Basilicofresco  (msg) 15:01, 24 February 2010 (UTC)

BAGAssistanceNeeded

Approved after reviewing the results, which provide a good example of the breadth of functionality to be exercised. Josh Parris 10:21, 25 February 2010 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.