Wikipedia:Bots/Requests for approval/Josvebot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Josvebot 3
Operator:

Time filed: 19:06, Wednesday November 6, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): WPCleaner

Source code available: Yes at sourceforge.net

Function overview: Fixing articles with Unicode control characters (CHECKWIKI-error #16)

Links to relevant discussions (where appropriate):

Edit period(s): Daily

Estimated number of pages affected: About 1-500 per day.

Exclusion compliant (Yes/No): I don't know. I have not written the program and have not yet encountered this problem.

Already has a bot flag (Yes/No): No

Function details: The bot will fix articles with Unicode control characters and might also fix errors #2, 6-7, 9, 17-22, 25, 32, 44-45, 54, 57, 64, 66, 76, 85, 87-88 if the article in question has any of these errors (in addition to the "Article with Unicode control characters" error). I will also file for approval for those CHECKWIKI errors.

Discussion
What characters are to be fixed, and why? Josh Parris 21:22, 6 November 2013 (UTC)
 * E.g. #xFEFF; or #x200E; or #x200B; or #x2028; since they can create problems inside an article. - (t)  Josve05a  (c)  21:51, 6 November 2013 (UTC)
 * Unicode Character 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF)
 * Unicode Character 'LEFT-TO-RIGHT MARK' (U+200E)
 * Unicode Character 'ZERO WIDTH SPACE' (U+200B)
 * Unicode Character 'LINE SEPARATOR' (U+2028)
 * Seem pretty innocuous to me; what's the problem with them? Why aren't they locked out with an edit filter? Josh Parris 22:40, 6 November 2013 (UTC)
 * I don't really know (quite frankly), but here are two articles showing diffs: & . I can't see any change, but it is listed as a CHECKWIKI error, so there must be something wrong. -  (t)  Josve05a  (c)  22:54, 6 November 2013 (UTC)
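
For illustration only: the four code points listed above are invisible in a rendered diff, but they can be located programmatically. The sketch below is not WPCleaner's implementation. It assumes that only the four characters named in this discussion are of interest (the actual CHECKWIKI #16 list is not given here), and `find_invisible` is a hypothetical helper name.

```python
import unicodedata

# Only the four code points named in this discussion. The full
# CHECKWIKI #16 list is not quoted here, so this set is an assumption.
SUSPECT_CHARS = frozenset("\uFEFF\u200E\u200B\u2028")


def find_invisible(text):
    """Return (offset, 'U+XXXX', Unicode name) for each suspect character."""
    return [
        (offset, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for offset, ch in enumerate(text)
        if ch in SUSPECT_CHARS
    ]


# Sample text modelled on the fragment discussed later in this request.
sample = "of India\u200E]]"
hits = find_invisible(sample)
for offset, code, name in hits:
    print(offset, code, name)
```

Reporting offsets rather than silently deleting the characters makes it possible to review each hit before saving an edit.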

@, you do a lot of these kinds of edits. Do you know what the problem is with these kinds of Unicode characters? - (t)  Josve05a  (c)  22:56, 6 November 2013 (UTC)
 * It is a CHECKWIKI error that can't be fixed sufficiently by AWB, so I support the creation of an additional bot. Invisible characters could be used to abuse URLs and make editing more difficult, and we clean them up daily. There are about 100 pages per day, not more. -- Magioladitis (talk) 09:50, 7 November 2013 (UTC)

In that case, Josh Parris 10:04, 7 November 2013 (UTC)

We already have 3 bots for the other errors. What we need is more editors doing manual edits. -- Magioladitis (talk) 10:54, 7 November 2013 (UTC)
 * 1/5 is done. Will do more as soon as more is reported at the CHECKWIKI database (and I am at a computer). - (t)  Josve05a  (c)  11:24, 7 November 2013 (UTC)

While I don't object to the task per se, the operator's understanding is a concern. "I don't really know" and "must be something wrong" is not the operator's expected understanding of the task they are proposing to do. Why not familiarize with the task first? If you don't even know what it does, how will you know when it does it wrong? — HELL KNOWZ  ▎TALK 12:42, 7 November 2013 (UTC)
 * I have to agree with that. The bot owner should also be able to solve problems by themselves at some level. Running automated tools blindly doesn't make me feel comfortable. We had problems in the past with some interwiki bot owners. -- Magioladitis (talk) 08:41, 8 November 2013 (UTC)

- (t)  Josve05a  (c)  07:38, 8 November 2013 (UTC)
 * Any problems with the trial? Josh Parris 10:29, 8 November 2013 (UTC)
 * No, I did not have any problems. (Except that the database with all the CHECKWIKI errors only updates once a day.) - (t)  Josve05a  (c)  11:05, 8 November 2013 (UTC)
 * I found this edit interesting, in that the source of the error was the previous editor. Did you try to figure out how the character got in there? It definitely was there:

~$ echo "of India‎]]" | od -x
0000000 666f 4920 646e 6169 80e2 5d8e 0a5d
0000016
~$ echo "of India]]" | od -x
0000000 666f 4920 646e 6169 5d5d 000a
0000013
 * but it didn't cause any categorisation problems. Do you think the edit should have been made?
 * The fixes and  were helpful, fixing what were 404s. Josh Parris 11:40, 8 November 2013 (UTC)
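
As a cross-check of the `od -x` dump above, the same comparison can be sketched in Python. `od -x` prints 16-bit words, which on a little-endian machine swaps each byte pair, so `80e2 5d8e` corresponds to the byte sequence `e2 80 8e 5d`. The bytes `e2 80 8e` are the UTF-8 encoding of U+200E. The helper name `utf8_hex` is hypothetical, and the trailing newline stands in for the one that `echo` appends.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex, in byte order."""
    return text.encode("utf-8").hex(" ")


# "\n" mimics the newline that echo appends to its output.
with_mark = "of India\u200E]]\n"
without_mark = "of India]]\n"

print(utf8_hex(with_mark))
print(utf8_hex(without_mark))

# The byte counts match the final offsets printed by od (octal 16 and 13).
print(len(with_mark.encode("utf-8")), len(without_mark.encode("utf-8")))
```

The three-byte difference in length is the only trace of the mark, which is why the two strings look identical on screen.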

Hmmm. That's really interesting; I did not pay much attention to that edit before. My answer is... yes. I believe that these kinds of characters should be changed, since they can cause errors in different browsers and platforms. Better to be safe than sorry. - (t)  Josve05a  (c)  12:03, 8 November 2013 (UTC)

In the past, interwiki bots could not always distinguish between interwikis with and without invisible characters. The result was duplicated interwikis. I had to go and rename dozens of pages in other Wikipedias to avoid the problem. I have reported the problem at Bugzilla. The same problem could be possible for categories, especially for mirror sites. We could prevent many of these characters if HotCat automatically ignored invisible characters. -- Magioladitis (talk) 12:10, 8 November 2013 (UTC)
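
A minimal sketch of how the duplication described above can arise, and how stripping invisible characters before comparison avoids it. This is not the code of any interwiki bot or of HotCat. The link values are made up for the example, the character set is limited to the four code points named earlier in this discussion, and `normalize_title` and `dedupe_links` are hypothetical names.

```python
# Translation table that deletes the four code points named in this
# discussion (an assumption; a real tool may cover more characters).
INVISIBLE = dict.fromkeys(map(ord, "\uFEFF\u200E\u200B\u2028"))


def normalize_title(title):
    """Strip the invisible characters so look-alike titles compare equal."""
    return title.translate(INVISIBLE)


def dedupe_links(links):
    """Keep the first occurrence of each link, compared after normalization."""
    seen = set()
    unique = []
    for link in links:
        key = normalize_title(link)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up interwiki links: the second renders the same as the first.
links = ["de:Indien", "de:Indien\u200E", "fr:Inde"]
print(len(set(links)))      # naive comparison treats all three as distinct
print(dedupe_links(links))  # normalized comparison collapses the look-alikes
```

Deleting U+2028 outright is a simplification; in running text it may be more appropriate to replace a line separator with a space or a newline.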


 * Okay, so I come back to my question: Why aren't they locked out with an edit filter? Josh Parris 22:57, 8 November 2013 (UTC)
 * Good question. I already asked something similar in bugzilla (here). -- Magioladitis (talk) 00:42, 9 November 2013 (UTC)
 * Let's see how Edit filter/Requested works out then. Josh Parris 01:12, 9 November 2013 (UTC)
 * Not well. Josh Parris 22:27, 16 November 2013 (UTC)
 * I suggest you get these characters added to https://meta.wikimedia.org/wiki/Talk:Title_blacklist Josh Parris 01:17, 9 November 2013 (UTC)

On the basis that these characters aren't going to be prevented from entering the wikitext, I'm planning on approving this task. Josh Parris 22:27, 16 November 2013 (UTC)
 * I agree. -- Magioladitis (talk) 22:29, 16 November 2013 (UTC)

Josh Parris 09:08, 17 November 2013 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.