Wikipedia:Bots/Requests for approval/Yobot 38


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

Yobot 38
Operator:

Time filed: 10:22, Thursday, February 2, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB / WPCleaner

Source code available:

Function overview: Fix broken br tags

Links to relevant discussions (where appropriate):

Edit period(s): Daily

Estimated number of pages affected: 30 pages per day + some more pages coming for the monthly scans

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Bot will fix code syntax by fixing broken br tags. E.g.  to . I will catch all these

Find: insource:/\< *br\. *\>|\<\\ *br *\>|\< *br *\\ *\>|\< *br\. */\>|\< *br */([a-z/0-9•]|br)\>|\< *br *\?\>|\/

Replace:

Discussion
This is not clear enough for me to understand exactly what will be fixed. There are broken tags such as "&lt;div\>" and then there are "broken" tags such as "&lt;br>". The former is not really a tag at all; the latter is fine and will be "fixed" automatically by Mediawiki when the page is rendered. So it might make sense for a bot to fix the former, but the latter are not harmful and can be left as-is. To make this an approvable task, it should include a detailed list of the specific errors that will be fixed. &mdash; Carl (CBM · talk) 12:17, 2 February 2017 (UTC)


 * This is too broad as written:

Example: Without context this is too error prone, I've seen it with Fluxbot Task 6 - editors find spectacular ways to put these tags in wrong. — xaosflux  Talk 12:28, 2 February 2017 (UTC)
 *  to

Self closing tags are not allowed. -- Magioladitis (talk) 13:08, 2 February 2017 (UTC)


 * I agree we'd need to see a list of fixes here. ~ Rob 13 Talk 13:17, 2 February 2017 (UTC)

You mean a trial I presume. -- Magioladitis (talk) 13:18, 2 February 2017 (UTC)

Recall that CBM is in favour of not changing html when it renders correctly even if the tags are wrong. -- Magioladitis (talk) 13:19, 2 February 2017 (UTC)
 * "Wrong" can mean many things. None of "&lt;br>", "&lt;br/>", or "&lt;br />" (with a space) is objectively wrong when put in wikicode - they all cause the same HTML to be generated, and so they have no difference for browsers, screen readers, or any other use of that HTML. And even if we were talking about HTML, none of them is really "wrong" depending on the HTML standard you're trying to meet (see and  for example). So there is no strong justification for a bot to "fix" those. There may be other instances where the fix would be more desirable. &mdash; Carl (CBM · talk) 13:33, 2 February 2017 (UTC)

what about "&lt;br //>" or "&lt;br /w>" ? -- Magioladitis (talk) 13:34, 2 February 2017 (UTC)


 * It seems to me that it's up to you to document exactly what fixes you want to propose in this task, and then others can comment on them. &mdash; Carl (CBM · talk) 13:37, 2 February 2017 (UTC)

Regex:< *br\. *>|<\\ *br *>|< *br *\\ *>|< *br\. */>|< *br */([a-z/0-9•]|br)>|< *br *\?>| -- Magioladitis (talk) 13:38, 2 February 2017 (UTC)
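A quick way to see which strings this pattern catches is to run it against a few sample tags. The sketch below is only an illustration, not the bot's actual code: it assumes Python's `re` module behaves like the search engine for this pattern, it drops the empty alternative left by the trailing `|`, and the helper name `is_broken_br` and the sample list are invented here.

```python
import re

# Pattern copied from the comment above. The trailing "|" is dropped here
# (assumption), since an empty alternative would match every position.
BROKEN_BR = re.compile(
    r"< *br\. *>"                   # <br.>
    r"|<\\ *br *>"                  # <\br>
    r"|< *br *\\ *>"                # <br\>
    r"|< *br\. */>"                 # <br./>
    r"|< *br */([a-z/0-9•]|br)>"    # <br //>, <br /w>, <br/br>
    r"|< *br *\?>"                  # <br?>
)


def is_broken_br(tag: str) -> bool:
    """Return True if the whole string is matched by the pattern (hypothetical helper)."""
    return BROKEN_BR.fullmatch(tag) is not None


# Sample strings chosen for illustration only.
samples = ["<br>", "<br/>", "<br />", "<br.>", "<\\br>", "<br\\>", "<br //>", "<br /w>", "<br?>"]
for s in samples:
    print(f"{s!r:12} broken={is_broken_br(s)}")
```

Under these assumptions the three forms "&lt;br>", "&lt;br/>" and "&lt;br />" are not matched, while "&lt;br //>" and "&lt;br /w>" are.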


 * That is the entire list of changes? No span, div, etc? For that regex, what will it be replaced with, the HTML5-correct &lt;br> or the XML-correct &lt;br/>?  There isn't enough info here for me to tell if an edit would be correct under this request. &mdash; Carl (CBM · talk) 13:41, 2 February 2017 (UTC)
 * Same for span and div. no changes to br with slash. -- Magioladitis (talk) 13:42, 2 February 2017 (UTC)
 * That response is not very clear. All three of "&lt;br>", "&lt;br/>" and "&lt;br />" are arguably correct; the first is the HTML5 recommendation. None of them require changing. Would the bot change any of those three specific instances of the br tag? &mdash; Carl (CBM · talk) 13:51, 2 February 2017 (UTC)
 * No. And it never did. I thought you were watching my edits more closely. -- Magioladitis (talk) 13:53, 2 February 2017 (UTC)
 * This bot request is not about previous edits. It makes no difference what the bot did in the past - everything that the bot will do needs to be clearly detailed here, so that there is a clear record of what was approved. &mdash; Carl (CBM · talk) 13:56, 2 February 2017 (UTC)
 * True. Soon self-closing br will be deprecated. Then we will run a bot to replace it. But it is better to do this separately. -- Magioladitis (talk) 13:58, 2 February 2017 (UTC)


 * I don't think this task is well suited to running fully automated. Basically, if the change being made isn't the same as the one a knowledgeable human editor would make, it shouldn't be made. Blindly changing a broken tag to one that is not "broken", using only a regex, is prone to false positives. For example:
 * is certainly broken, but this fix is not to change that first span to .  How are you planning on addressing this? —  xaosflux  Talk 14:12, 2 February 2017 (UTC)


 * I won't. changed my example. -- Magioladitis (talk) 14:31, 2 February 2017 (UTC)
 * I see that one example, but your function details are not reflective of that. Please update this to fully detail everything this is expected to do. —  xaosflux  Talk 14:47, 2 February 2017 (UTC)

. For self-closing tags. It is done here: Wikipedia_talk:WikiProject_Check_Wikipedia. -- Magioladitis (talk) 18:47, 2 February 2017 (UTC)
 * That discussion says "There's a total of 72 in articles. " Since then, they have all been fixed - there is an empty tracking category now at Category:Pages_using_invalid_self-closed_HTML_tags. That says to me that there is no need for an ongoing bot task for them; there are many errors that can be made, and in general they can just be fixed by the usual editing process, unless there are so many that a bot is needed to handle the volume. &mdash; Carl (CBM · talk) 19:52, 2 February 2017 (UTC)
 * This is because Jonesey95 and others have been doing this semi-automatically, i.e. more watchlist turbulence. -- Magioladitis (talk) 19:55, 2 February 2017 (UTC)
 * I am very aware of all the self closing tag problems related to Category:Pages using invalid self-closed HTML tags, I've been running bot jobs all over WMF (meta:User:Fluxbot/BADHTML) projects cleaning it up - that is why I know that you can't just change any broken tag automatically to a best guess without looking at it in context. in many cases the bad tag (e.g. ) is actually intended to be a start tag, not a close tag. —  xaosflux  Talk 20:00, 2 February 2017 (UTC)
 * That's why I changed my BRFA to reflect CHECKWIKI error 2, which does not include self-closed HTML tags. In the future we can discuss the br/ case separately. -- Magioladitis (talk) 20:05, 2 February 2017 (UTC)

For CHECKWIKI error 2, how would you handle these strings programmatically:
 * The checkwiki description does not give explicit examples of these cases that it says are included. — xaosflux  Talk 20:34, 2 February 2017 (UTC)

I won't deal with self-closing HTML tags. -- Magioladitis (talk) 20:52, 2 February 2017 (UTC)
 * OK how about this, please provide an exact list of all of the substitutions you want to make below. — xaosflux  Talk 22:15, 2 February 2017 (UTC)
 * And then copy it to the "Function details". Anomie⚔ 02:23, 3 February 2017 (UTC)
 * Done but I already have done it above while replying to Carl. -- Magioladitis (talk) 09:32, 3 February 2017 (UTC)
 * You say you'll "catch" several variations. For br you'd presumably replace them with or . But then you say "Same for div and span", but there it does matter what you replace it with and "catching"   seems questionable since it matches  . Anomie⚔ 12:59, 3 February 2017 (UTC)
 * I'm not a huge expert on HTML rendering, so would you mind explaining how the broken tags render on a page? Do they appear as just gibberish or are they non-rendering tags? ~ Rob 13 Talk 11:14, 3 February 2017 (UTC)

Anomie Changed description. I will only fix br tags. -- Magioladitis (talk) 01:28, 5 February 2017 (UTC)


 * The function details still say that the task will cover "broken br, span, div, etv. tags." - it has not been updated. Separately, it appears your regex would match &lt;br/> which is acceptable under HTML5; see 8.1.2.1 "Start tags" in the spec . Can you confirm that no change will happen to "&lt;br/>" or "&lt;br />", or link to a community discussion that established consensus to change these? &mdash; Carl (CBM · talk) 02:44, 5 February 2017 (UTC)

which part catches "&lt;br/>"? I won't fix these. -- Magioladitis (talk) 08:33, 5 February 2017 (UTC)

Please make it clear in the function details what the result of the fixes is. Despite numerous comments and requests above, you still have not posted the actual replacement result of the regex. — HELL KNOWZ  ▎TALK 17:52, 12 February 2017 (UTC)

Done. Thanks for the feedback! I may use either F&R or AWB's built-in function. Hopefully WPCleaner will have a built-in function too. In the future I may switch to this too. WPCleaner does not allow F&R rules to be added. The result will be a fixed tag. -- Magioladitis (talk) 17:53, 12 February 2017 (UTC)

Please provide a policy, guideline, or discussion with consensus that broken or invalid markup break tags should be replaced by  and not   or , regardless of original intention or dominant style in the article. In other words, that  →   and not   →. — HELL KNOWZ  ▎TALK 18:07, 12 February 2017 (UTC)

WP:HTML5. In fact all  should be replaced by. HTML does not support self-closed tags. I'll do this with separate BRFA. In fact, we could do this in addition to this one.

See also.

Also note that this fix is part of CHECKWIKI project. A project with consensus between Wikipedians.

AWB's built-in functions change the tags to br with no slash. So does WPCleaner. -- Magioladitis (talk) 18:10, 12 February 2017 (UTC)


 * Either do not modify the tag syntax (which is what I recommend) or provide consensus that the community thinks this should be done. I do not wish to repeat the same comments as above, but HTML5 ≠ MediaWiki markup and CHECKWIKI/AWB/WPCleaner ≠ automatic consensus for automation. WP:HTML5 says nothing about  usage. Fixing a tag is one thing, changing its syntax is another. —  HELL KNOWZ  ▎TALK 18:29, 12 February 2017 (UTC)

I won't change  until there is consensus to do it. -- Magioladitis (talk) 18:31, 12 February 2017 (UTC)
 * That is not what the function details say. In fact, the very first example is the exact opposite: " E.g.  to ". —  HELL KNOWZ  ▎TALK 18:32, 12 February 2017 (UTC)
 * Yes, this change will be done. The only valid tag is  . It's still OK to use    but we should not encourage it. Note that this is the change everyone who uses CHECKWIKI/AWB/WPCleaner makes anyway, and no one has ever complained about it. -- Magioladitis (talk) 18:35, 12 February 2017 (UTC)
 * Then, per WP:BOTPOL, provide a policy, guideline, or discussion with consensus that  are not acceptable or that broken or invalid tags should be replaced by   and not , regardless of original intention or dominant style in the article. —  HELL KNOWZ  ▎TALK 18:42, 12 February 2017 (UTC)

Is your suggestion that I use  based on BOTPOL or common logic? In which cases do you think we should convert to  ? To all listed? Worst case scenario both tags do the same thing so which to use is a matter of preference right? -- Magioladitis (talk) 18:44, 12 February 2017 (UTC)

 * My suggestion is not to use just  (or  or  ). My suggestion is to not change the existing style. If you just fix a tag -- clearly okay. If you also change the style -- provide consensus. Yes, per BOTPOL -- consensus to perform the task (of changing the style of break tags). If you don't change style -- then you don't need additional consensus. Your function details change style, thus I ask for consensus. How you determine which case is which (if at all determinable) is up to you to specify in function details and BAG will deal with this as uncontroversial, supported by existing consensus, or will ask for new consensus. —  HELL KNOWZ  ▎TALK 19:02, 12 February 2017 (UTC)

It turns out you are right per Line-break_handling. I updated accordingly. -- Magioladitis (talk) 19:03, 12 February 2017 (UTC)
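The "fix the tag but keep the existing style" idea discussed above could be sketched roughly as follows. This is an illustration under stated assumptions, not what AWB or WPCleaner actually do: the function `fix_breaks`, the `fallback` default, and the rule "use the most common valid form already on the page" are all hypothetical, and the broken-tag pattern is the one quoted earlier in this discussion with the trailing empty alternative dropped.

```python
import re
from collections import Counter

# The three forms treated as acceptable in the discussion.
VALID_BR = re.compile(r"<br>|<br/>|<br />")

# Broken-tag pattern quoted in the discussion, minus the empty trailing alternative (assumption).
BROKEN_BR = re.compile(
    r"< *br\. *>|<\\ *br *>|< *br *\\ *>|< *br\. */>"
    r"|< *br */([a-z/0-9•]|br)>|< *br *\?>"
)


def fix_breaks(wikitext: str, fallback: str = "<br />") -> str:
    """Replace broken br tags with the page's dominant valid form (hypothetical rule).

    `fallback` is an arbitrary placeholder used when the page has no valid
    br tag at all; the discussion does not settle which form that should be.
    """
    counts = Counter(VALID_BR.findall(wikitext))
    style = counts.most_common(1)[0][0] if counts else fallback
    # Valid tags are never touched; only matches of BROKEN_BR are rewritten.
    return BROKEN_BR.sub(lambda m: style, wikitext)


print(fix_breaks("a<br/>b<br/>c<br.>d"))
print(fix_breaks("x<br?>y"))
```

A sketch like this still ignores context such as nowiki or code sections, which is the false-positive risk raised earlier in the thread.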


 * I support a trial of 20 edits or two weeks, whichever is shorter. Note that this does not necessarily indicate support of the task; the regex is complicated enough that it's best to see this in action. Even if every edit were to come back as "wrong" in some way (and I doubt they would), it would be easily fixable with so few edits. Examples would be extremely helpful in comparing diffs. ~ Rob 13 Talk 04:17, 14 February 2017 (UTC)

-- Magioladitis (talk) 23:47, 22 February 2017 (UTC)

Johnuniq please comment here too because someone above told me the exact opposite. Thanks, Magioladitis (talk) 11:20, 27 March 2017 (UTC)

MSGJ I would be more than happy to hand this task to any other bot owner. -- Magioladitis (talk) 12:12, 27 March 2017 (UTC)

Andy Dingley please read discussion. -- Magioladitis (talk) 20:22, 27 March 2017 (UTC)
 * I'm now inclining (regretfully) to opposing making any changes here (and I still feel strongly that everything should become ). The problem is the risk of multiple 'bots or AWBs starting to war with each other. Too many people think that   is somehow "right" (it isn't, it has always been wrong, even in XHTML) and that even   ought to be changed to  . Andy Dingley (talk) 20:37, 27 March 2017 (UTC)


 * No bot should do mass edits against the advice of a developer such as Tim Starling. His comment is at the end of this VPT archive and was added in diff. Tim recommended using  saying "&lt;br&gt; is valid wikitext, and whether it's valid in any particular output format or version of HTML [is] pretty much irrelevant." Johnuniq (talk) 22:47, 27 March 2017 (UTC)

Johnuniq just to be clear: This bot won't change  to  or vice versa. I personally consider both tags valid. -- Magioladitis (talk) 22:55, 27 March 2017 (UTC)
 * Whenever I hear "valid" in a discussion about "well-formedness", I do rather lose hope of that discussion even properly understanding the question. Andy Dingley (talk) 10:09, 28 March 2017 (UTC)

I agree with Rob above - I'd like to see a very limited trial so we can see how this regex works in practice. SQL Query me! 19:12, 18 May 2017 (UTC)
 * Has the trial occurred? — xaosflux  Talk 11:46, 6 June 2017 (UTC)

-- Magioladitis (talk) 13:51, 6 June 2017 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.