Wikipedia talk:WPCleaner/Archive 2021

Odd edit by WPCleaner
✅

Hi NicoV. I was fixing links to Touch (TV series) and something odd happened with a fix to 24 (TV series). As you can see there was a single successful edit followed by a further edit (by me, and with the WPCleaner message and tag), that deleted a large chunk from the end of the article. Someone fortunately immediately reverted it, but I wonder if you know a cause or reason for the mistake? I'm using WPCleaner version 2.04 (Nov 11, 2020). As far as I'm aware this has never happened to me before. Thanks! Tassedethe (talk) 23:05, 6 March 2021 (UTC)
 * Hi Tassedethe. That's the first time I've seen this problem, and it's really strange. I don't see how my code can do this. If there's a way to retrieve a list of edits tagged by WPCleaner that removed a large chunk of text (maybe with Quarry), I will check a few of them to see if it's a glitch on one edit. My only idea for the moment: the page with the chunk removed is just under 64KB in size, so maybe it's combined with a network problem. --NicoV (Talk on frwiki) 08:38, 7 March 2021 (UTC)
 * Hi Tassedethe. I tried a first query to retrieve edits that removed more than 50KB of text: apart from your edit, there are only two old edits, which seem normal: 1 and 2. I will try with a smaller size difference. --NicoV (Talk on frwiki) 08:48, 7 March 2021 (UTC)
 * Hi Tassedethe. I modified the query to find edits that removed more than 10KB of text, with an article still above 50KB (I believe it's a network issue, so the article needs to be big): apart from your edit, there are 20 other edits, see the query. They seem normal: 1, 2, 3, 4, 5, 6, 7, 8, 9... (I haven't checked the ones before 2020). I suggest we consider this a temporary glitch and watch for potential future problems. --NicoV (Talk on frwiki) 09:08, 7 March 2021 (UTC)
 * Thanks NicoV. If you can't find another example it would be hard to work out what exactly went wrong. I'll keep a look out for any further problems. Thanks! Tassedethe (talk) 17:30, 7 March 2021 (UTC)
 * Are you by any chance using WPCleaner on Linux? I had a problem with AWB, which I suspect was related to 64k chunks, which I solved by upgrading Wine. Certes (talk) 10:48, 7 March 2021 (UTC)
 * No, this is Windows 10. And there was nothing different about this particular edit compared with the 1000s I've done previously. Thanks. Tassedethe (talk) 17:30, 7 March 2021 (UTC)
 * Certes I wouldn't advise using Wine to run WPCleaner on Linux: it's a Java program, so it should run without any Windows adapter. --NicoV (Talk on frwiki) 19:08, 7 March 2021 (UTC)
 * Oops, thanks! I occasionally run WPCleaner on Linux but haven't looked at the internals; it just works. Certes (talk) 20:02, 7 March 2021 (UTC)
 * Note that WPCleaner for Linux is close to WPCleaner for Mac as both are UNIX-based. Johnny Au (talk/contributions) 00:18, 1 April 2021 (UTC)
 * Hi Tassedethe. I've opened T281303, because I sometimes see strange edits by my bot that look similar. This time, I have a log on my side where I see that the API returned an error for the first edit (but still saved the page without the end of the text), so WPCleaner retried 30s later and this time the edit worked. --NicoV (Talk on frwiki) 21:02, 27 April 2021 (UTC)
 * Thanks for the update! Tassedethe (talk) 21:59, 27 April 2021 (UTC)
 * Thanks as well! Johnny Au (talk/contributions) 00:57, 1 May 2021 (UTC)
 * Hi NicoV. I wonder if there has been any update on this issue? The ticket you opened (T281303) seems to have been closed but nothing was fixed. One of my fixes yesterday caused about 80% of the article to be removed. Fortunately that was reverted by another user. I've since checked several thousand edits from the last few days, and found another example. This edit deleted a large part of Choral symphony, but the very next edit restored it??? Both edits were tagged as WPCleaner. Hopefully there's something there that will help you identify the issue. Thanks, Tassedethe (talk) 20:35, 1 August 2022 (UTC)
 * Hi Tassedethe. No update on this issue, I'm overloaded with work, and unfortunately have had very little time to spend on WPCleaner recently. Yes, the phabricator ticket was deemed not to be a bug on the MW side (with which I disagree...). I have some potential fixes to try, but they require some work on my part... I'm more accustomed to WPCleaner fixing this problem by itself: WPCleaner is aware there was an error when saving the edit, so it tries again automatically after a few dozen seconds (about 3 times).
 * But maybe you can help me with more detailed information about the edits you're reporting: in the folder where WPCleaner is installed on your computer, you should see a  file (logs from the day you last used WPCleaner) and   files (one compressed file per day). Could you look into these files (.gz files are compressed with gzip) to find the logs about the articles with problems, and post them here with the surrounding logs? At least, I should see what errors were reported by WPCleaner. --NicoV (Talk on frwiki) 20:47, 1 August 2022 (UTC)
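The retry behaviour described above (try again about 30 seconds after a failed save, roughly 3 times) and the point made later in this thread that an edit conflict is not retried can be sketched as follows. This is only an illustration in Python, not WPCleaner's actual Java code: the function name `save_with_retry`, the `save` callback and the error-code strings other than `editconflict` are hypothetical.

```python
import time
from typing import Callable, Tuple

# "editconflict" is the error code that appears in the logs in this thread.
# Treating every other error as transient is an assumption for illustration.
NON_RETRYABLE = {"editconflict"}


def save_with_retry(
    save: Callable[[], Tuple[bool, str]],
    max_attempts: int = 3,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Call save() until it succeeds, hits a non-retryable error,
    or max_attempts is reached. Returns (succeeded, attempts_made)."""
    for attempt in range(1, max_attempts + 1):
        ok, error_code = save()
        if ok:
            return True, attempt
        if error_code in NON_RETRYABLE:
            # Retrying cannot make an edit conflict disappear.
            return False, attempt
        if attempt < max_attempts:
            sleep(delay_seconds)
    return False, max_attempts
```

Note that a retry like this only re-sends the full text. If the server has already stored a truncated page from the failed attempt, the history will show a truncating edit followed by a restoring one.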


 * Tassedethe (talk) 21:28, 1 August 2022 (UTC)
 * Those seemed to be the relevant lines for the first issue. The second is below:


 * Tassedethe (talk) 21:42, 1 August 2022 (UTC)
 * Thanks Tassedethe.
 * So the second one is the same as T281303: WPCleaner receives a Service Unavailable error when trying to update the page (but MW still saves the page even if the connection was broken in the middle: that's a bug in MW code IMHO...), it then tries again 30s after the error. This time it gets a Network is unreachable error (which I think confirms that you were having a connection problem at the time), so it tries again 30s after the second error and apparently succeeds this time. This is consistent with the page history.
 * For the first one, do you have more logs around 19:26 and 19:06? Apparently, the logs say you tried to edit the page at 19:26 but it was rejected with an  (error code returned by MW), so it shouldn't have been saved by MW. But the page history shows an edit at xx:26 (the removal of the end of the article) and one at xx:06. So I'm wondering how the xx:26 edit can have been both saved and rejected as an edit conflict at the same time.
 * I think the 19.06 edit is a red herring, I think I fixed 1 out of 2 links at 19.06. That's why there was a 2nd attempt at 19.26. This is the 19.06 part of the log:

19:06:04.718 [AWT-EventQueue-0] INFO PERF - Slow regular expression (Music on The O.C.): Typo AWB dies/died(76455ms):(?<=\b(?:brothers?|c(?:hild(?:ren)?|ousins?)|daughters?|f(?:athers?|riends?)|grand(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|He|h(?:e|usbands?)|mothers?|n(?:ephews?|ieces?)|parents?|s(?:he|isters?|ons?|pouses?|tep(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|tudents?)|[A-Z][a-z]+|She|[tT]hey|wi(?:fe|ves))\s+)(?:sadly\s+)?(?:pass(?:e([ds]))?\s+away|lose(s)?\s+(?:their|h(?:er|is)(?:\s+or\s+h(?:er|is)|[/\\]h(?:er|is))?)\s+li(?:fe|ves))(?! from earthly existence)
19:06:46.153 [AWT-EventQueue-0] INFO PERF - Slow regular expression (Music on The O.C.): Typo AWB Commercially(41211ms):(?<![a-z]+-)\b([cC])ommerciall?y-(?=[a-z]+(?:ble\b|ed\b|ful\b))(?![a-z]+-)
19:06:47.680 [Thread-27] INFO API - POST https://en.wikipedia.org/w/api.php?summary=v2.04 - Repaired 1 link to disambiguation page - (You can help) - Sam Roberts, 1 to be fixed - Sam Roberts&minor=&bot=&format=xml&starttimestamp=2022-08-01T00:57:11Z&title=Music on The O.C.&tags=WPCleaner&basetimestamp=2022-05-05T03:37:25Z&assert=user&action=edit&text=...&watchlist=nochange


 * I don't think there is much else in the logs around 19:26 but here is more around that time:

19:23:00.368 [Thread-36] INFO PERF - Slow regular expression (The Ongoing History of New Music): Typo AWB Commercially(217386ms):(?<![a-z]+-)\b([cC])ommerciall?y-(?=[a-z]+(?:ble\b|ed\b|ful\b))(?![a-z]+-)
19:23:17.439 [AWT-EventQueue-0] INFO PERF - Slow regular expression (Ontario Place): Typo AWB Commercially(45014ms):(?<![a-z]+-)\b([cC])ommerciall?y-(?=[a-z]+(?:ble\b|ed\b|ful\b))(?![a-z]+-)
19:24:40.750 [AWT-EventQueue-0] INFO PERF - Slow regular expression (University of British Columbia): Typo AWB dies/died(223029ms):(?<=\b(?:brothers?|c(?:hild(?:ren)?|ousins?)|daughters?|f(?:athers?|riends?)|grand(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|He|h(?:e|usbands?)|mothers?|n(?:ephews?|ieces?)|parents?|s(?:he|isters?|ons?|pouses?|tep(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|tudents?)|[A-Z][a-z]+|She|[tT]hey|wi(?:fe|ves))\s+)(?:sadly\s+)?(?:pass(?:e([ds]))?\s+away|lose(s)?\s+(?:their|h(?:er|is)(?:\s+or\s+h(?:er|is)|[/\\]h(?:er|is))?)\s+li(?:fe|ves))(?! from earthly existence)
19:25:41.765 [AWT-EventQueue-0] INFO PERF - Slow regular expression (Music on The O.C.): Typo AWB dies/died(73534ms):(?<=\b(?:brothers?|c(?:hild(?:ren)?|ousins?)|daughters?|f(?:athers?|riends?)|grand(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|He|h(?:e|usbands?)|mothers?|n(?:ephews?|ieces?)|parents?|s(?:he|isters?|ons?|pouses?|tep(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|tudents?)|[A-Z][a-z]+|She|[tT]hey|wi(?:fe|ves))\s+)(?:sadly\s+)?(?:pass(?:e([ds]))?\s+away|lose(s)?\s+(?:their|h(?:er|is)(?:\s+or\s+h(?:er|is)|[/\\]h(?:er|is))?)\s+li(?:fe|ves))(?! from earthly existence)
19:26:23.455 [AWT-EventQueue-0] INFO PERF - Slow regular expression (Music on The O.C.): Typo AWB Commercially(41512ms):(?<![a-z]+-)\b([cC])ommerciall?y-(?=[a-z]+(?:ble\b|ed\b|ful\b))(?![a-z]+-)
19:26:24.670 [Thread-97] INFO API - POST https://en.wikipedia.org/w/api.php?summary=v2.04 - Repaired 1 link to disambiguation page - (You can help) - Sam Roberts&minor=&bot=&format=xml&starttimestamp=2022-08-01T01:19:31Z&title=Music on The O.C.&tags=WPCleaner&basetimestamp=2022-08-01T01:06:50Z&assert=user&action=edit&text=...&watchlist=nochange
19:26:30.193 [Thread-97] WARN o.w.api.impl.MediaWikiAPI - Error reported: editconflict - Edit conflict.
19:26:30.193 [Thread-97] WARN o.w.api.impl.MediaWikiAPI - Error updating page Music on The O.C.
19:27:17.477 [AWT-EventQueue-0] INFO  PERF - Slow regular expression (Ontario Place): Typo AWB dies/died(50544ms):(?<=\b(?:brothers?|c(?:hild(?:ren)?|ousins?)|daughters?|f(?:athers?|riends?)|grand(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|He|h(?:e|usbands?)|mothers?|n(?:ephews?|ieces?)|parents?|s(?:he|isters?|ons?|pouses?|tep(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|tudents?)|[A-Z][a-z]+|She|[tT]hey|wi(?:fe|ves))\s+)(?:sadly\s+)?(?:pass(?:e([ds]))?\s+away|lose(s)?\s+(?:their|h(?:er|is)(?:\s+or\s+h(?:er|is)|[/\\]h(?:er|is))?)\s+li(?:fe|ves))(?! from earthly existence)
19:27:27.584 [AWT-EventQueue-0] INFO PERF - Slow regular expression (University of British Columbia): Typo AWB Commercially(166464ms):(?<![a-z]+-)\b([cC])ommerciall?y-(?=[a-z]+(?:ble\b|ed\b|ful\b))(?![a-z]+-)
19:27:29.159 [Thread-50] INFO API - POST https://en.wikipedia.org/w/api.php?summary=v2.04 - Repaired 1 link to disambiguation page - (You can help) - Sam Roberts&minor=&bot=&format=xml&starttimestamp=2022-08-01T00:56:09Z&title=University of British Columbia&tags=WPCleaner&basetimestamp=2022-07-18T05:51:23Z&assert=user&action=edit&text=...&watchlist=nochange
19:27:55.443 [AWT-EventQueue-0] INFO PERF - Slow regular expression (Ontario Place): Typo AWB Commercially(37767ms):(?<![a-z]+-)\b([cC])ommerciall?y-(?=[a-z]+(?:ble\b|ed\b|ful\b))(?![a-z]+-)
19:27:56.069 [Thread-98] INFO API - POST https://en.wikipedia.org/w/api.php?summary=v2.04 - Repaired 1 link to disambiguation page - (You can help) - Sam Roberts / Fix errors for CW project (Whitespace characters after heading)&minor=&bot=&format=xml&starttimestamp=2022-08-01T01:19:33Z&title=Ontario Place&tags=WPCleaner&basetimestamp=2022-07-31T01:29:36Z&assert=user&action=edit&text=...&watchlist=nochange
19:27:56.193 [MW-21] INFO API - GET  https://en.wikipedia.org/w/api.php?curtimestamp=1&continue=&prop=revisions|info&inprop=protection&format=xml&rvslots=main&action=query&titles=MapleMusic Recordings&rvprop=content|ids|timestamp
19:27:56.193 [Thread-99] INFO API - GET  https://en.wikipedia.org/w/api.php?continue=&prop=pageprops|info&format=xml&action=query&generator=links&gpllimit=max&titles=MapleMusic Recordings&ppprop=disambiguation&gplnamespace=0
19:27:56.930 [Thread-99] INFO API - GET  https://en.wikipedia.org/w/api.php?redirects=&continue=&prop=pageprops&format=xml&action=query&titles=Boots Electric|Brick and mortar business|Gordon Downie|J Roddy Walston and the Business|Juno Awards|List of record labels|MBL (identifier)|Mariachi El Bronx|Neverending White Lights|Prozzak|Sam Roberts|The Bees (UK band)|The Fireman (music)|Universal Music Canada&ppprop=disambiguation
19:27:57.112 [Thread-99] INFO API - GET  https://en.wikipedia.org/w/api.php?continue=&prop=revisions&format=xml&rvslots=main&action=query&titles=Boots Electric|Brick and mortar business|Gordon Downie|J Roddy Walston and the Business|Juno Awards|List of record labels|MBL (identifier)|Mariachi El Bronx|Neverending White Lights|Prozzak|Sam Roberts|The Bees (UK band)|The Fireman (music)|Universal Music Canada&rvprop=content
19:27:57.381 [MW-7] INFO API - GET  https://en.wikipedia.org/w/api.php?continue=&prop=pageprops|info&format=xml&action=query&generator=links&gpllimit=max&titles=Samuel Roberts&ppprop=disambiguation
19:27:59.667 [Thread-99] INFO PERF - Slow regular expression (MapleMusic Recordings): Typo AWB dies/died(1758ms):(?<=\b(?:brothers?|c(?:hild(?:ren)?|ousins?)|daughters?|f(?:athers?|riends?)|grand(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|He|h(?:e|usbands?)|mothers?|n(?:ephews?|ieces?)|parents?|s(?:he|isters?|ons?|pouses?|tep(?:child(?:ren)?|daughters?|fathers?|mothers?|parents?|sons?)|tudents?)|[A-Z][a-z]+|She|[tT]hey|wi(?:fe|ves))\s+)(?:sadly\s+)?(?:pass(?:e([ds]))?\s+away|lose(s)?\s+(?:their|h(?:er|is)(?:\s+or\s+h(?:er|is)|[/\\]h(?:er|is))?)\s+li(?:fe|ves))(?! from earthly existence)

Tassedethe (talk) 15:01, 2 August 2022 (UTC)
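For anyone who needs to pull the relevant lines out of the WPCleaner logs, here is a minimal Python sketch that reads a plain or gzip-compressed log, splits it into entries on the timestamp pattern seen in the excerpts above, and returns the entries containing a search string together with the entries just before them. The function names and the `ERROR` level are assumptions made for illustration; only `INFO` and `WARN` appear in the excerpts, and the actual log file names are not given here.

```python
import gzip
import re
from pathlib import Path
from typing import List

# An entry starts like "19:26:30.193 [Thread-97] WARN ...".
# The ERROR level is assumed; only INFO and WARN appear in the excerpts.
ENTRY_START = re.compile(
    r"(?=\b\d{2}:\d{2}:\d{2}\.\d{3} \[[^\]]+\] (?:INFO|WARN|ERROR)\b)"
)


def read_log(path: Path) -> str:
    """Read a plain or gzip-compressed log file as text."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def split_entries(text: str) -> List[str]:
    """Split log text into entries, even when several entries share a line
    or one entry was broken across lines by copy and paste."""
    flat = " ".join(text.split())
    return [part.strip() for part in ENTRY_START.split(flat) if part.strip()]


def entries_with_context(text: str, needle: str, before: int = 2, after: int = 0) -> List[str]:
    """Return entries containing needle, plus neighbouring entries."""
    entries = split_entries(text)
    keep = set()
    for index, entry in enumerate(entries):
        if needle in entry:
            start = max(0, index - before)
            stop = min(len(entries), index + after + 1)
            keep.update(range(start, stop))
    return [entries[i] for i in sorted(keep)]
```

For example, `entries_with_context(read_log(Path("path/to/some-log.gz")), "Error updating page")` would list each failed save together with the POST request that preceded it; the path is a placeholder.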


 * I can also email you both logs if that would help? Tassedethe (talk) 15:01, 2 August 2022 (UTC)
 * Thanks Tassedethe. I'm puzzled by the 19:26 problem, because I don't understand how MW could have answered there was an edit conflict:
 * At 19:26:24.670, WPCleaner tries to save the page giving correct information ( which seems to be consistent with the last edit before this one, and   when you probably loaded the page)
 * At 19:26:30.193, MW answers with an edit conflict: there's no reason for that, as the basetimestamp matches the one of the previous edit (see this request) and starttimestamp is afterwards.
 * As the error reported by MW is an edit conflict, WPCleaner doesn't attempt any retry because logically there's no reason an edit conflict will disappear with a retry...
 * OK for the logs by mail, I'm sending you an email so you have my email address. Thanks again! --NicoV (Talk on frwiki) 17:43, 2 August 2022 (UTC)
 * Thanks Tassedethe. I tried a hack on WPCleaner, can you tell me if you're still seeing truncated articles? --NicoV (Talk on frwiki) 22:53, 2 August 2022 (UTC)
 * Thanks. I launched WPCleaner but I still see the same version, 2.04. Do I need to do something to upgrade? As before these types of errors are really rare, and I clearly need to be doing lots of edits to raise the chance of seeing one. That will probably not happen for a while as I have vacation coming up and enforced 'No Wikipedia!' imposed :) Thanks for your efforts though. Tassedethe (talk) 23:33, 2 August 2022 (UTC)
 * Hi Tassedethe. I didn't change the version, it should have upgraded transparently (if you want to be sure, check the logs for the POST lines, parameters should now be ordered alphabetically). No problem if it takes some time for you to check, this bug has been open for more than a year... --NicoV (Talk on frwiki) 05:30, 3 August 2022 (UTC)
 * I think I'm catching a good part of the problems with this query. We'll see with time if it happens again. --NicoV (Talk on frwiki) 07:02, 6 August 2022 (UTC)
 * Yikes, rather more by me than I thought. But it looks like there are a lot of 'double edits' where the 1st edit deletes a chunk and a 2nd edit restores it. Tassedethe (talk) 15:48, 6 August 2022 (UTC)
 * Hi Tassedethe. I ran the query above again: no new errors for more than a month. I hope it means it's fixed... --NicoV (Talk on frwiki) 16:13, 5 September 2022 (UTC)
 * No change this time either, so I'm marking the discussion as resolved. --NicoV (Talk on frwiki) 15:24, 29 September 2022 (UTC)
 * Thanks for all your efforts! Tassedethe (talk) 22:58, 1 October 2022 (UTC)
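The two patterns discussed in this thread, an edit that removes more than 10KB while leaving a page still above 50KB, and a "double edit" where the next edit restores what was removed, can be checked on a page's revision sizes with a short Python sketch. This is not the Quarry query used above: the `Revision` class, the function names and the 1,000-byte restore tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Revision:
    rev_id: int
    size: int  # page size in bytes after this revision


def find_truncations(
    revs: List[Revision], min_removed: int = 10_000, min_remaining: int = 50_000
) -> List[Revision]:
    """revs must be sorted oldest first. Returns revisions that removed more
    than min_removed bytes while leaving more than min_remaining bytes."""
    hits = []
    for prev, cur in zip(revs, revs[1:]):
        if prev.size - cur.size > min_removed and cur.size > min_remaining:
            hits.append(cur)
    return hits


def find_double_edits(
    revs: List[Revision], min_removed: int = 10_000, tolerance: int = 1_000
) -> List[Tuple[Revision, Revision]]:
    """Returns (truncating, restoring) pairs where the edit right after a
    large removal brings the size back to within tolerance of the original."""
    pairs = []
    for before, drop, restore in zip(revs, revs[1:], revs[2:]):
        removed = before.size - drop.size
        if removed > min_removed and abs(restore.size - before.size) <= tolerance:
            pairs.append((drop, restore))
    return pairs
```

Size alone cannot tell a glitch from a legitimate large removal, so any hit still needs a manual look at the diff.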

Installation not working
I was unable to access anything within WPCleaner earlier today. It was my first time using it since upgrading to a new computer, so I thought it was likely that my Automator script might have been broken.

Just in case, I went through the entire download process per the getdown.jar instructions, including writing a new script. Unfortunately, at launch, I'm getting the following error message:

"The digest file is invalid"

Any suggestions? Mlaffs (talk) 15:15, 4 July 2021 (UTC)
 * Hello Mlaffs. Looking at your last edits with WPCleaner, you were using an old version of WPCleaner (2.0) which cannot update itself (see Wikipedia_talk:WPCleaner/Archive_2020). Does Automator mean OSX? If so, Johnny Au seems to be on OSX too and it worked for him. I think that on OSX you need at least JDK 10 (otherwise there are problems). --NicoV (Talk on frwiki) 17:03, 4 July 2021 (UTC)
 * Hi Mlaffs. Can you also check the contents of getdown.txt (it should contain in particular a line ) and digest.txt (it should contain lines like  ). --NicoV (Talk on frwiki) 18:17, 4 July 2021 (UTC)
 * Thanks, NicoV. I deleted all the WPCleaner folders and files that I had and started fresh. The two text files do have those lines in them. Seems to be working now! Mlaffs (talk) 18:21, 4 July 2021 (UTC)
 * Hi Mlaffs. If you start with an existing folder, it can go wrong (a link to the old installation server may be kept somewhere). --NicoV (Talk on frwiki) 19:55, 4 July 2021 (UTC)
 * As an update, I am using MacOS Big Sur and WPCleaner is working perfectly while using an Automator script to open it. Johnny Au (talk/contributions) 03:09, 5 July 2021 (UTC)

Problems caused by the automatic CW fixes
Hi NicoV. Another editor (User:Find bruce) posted to my talk page (User_talk:Tassedethe) about edits I had made which broke some transcluded pages. These edits had been automatically applied by WPCleaner. The edits in question are in that talk comment, but are and. Can you look into fixing this issue? Also, is it possible to switch off the automatic application of fixes? Many thanks. Tassedethe (talk) 19:10, 15 December 2021 (UTC)
 * Hi Tassedethe. I've fixed the first problem: when  (or similar) is present in the article, there will be no detection of title linked in text, and no fix. I will look into the other one.
 * If you want to disable some automatic application of fixes, you can create your own configuration page (like User:WikiCleanerBot/WikiCleanerConfiguration) and add configuration options like  to disable automatic fixes for one type of error.
 * --NicoV (Talk on frwiki) 12:28, 19 December 2021 (UTC)
 * Thanks for the advice. Johnny Au (talk/contributions) 01:12, 1 January 2022 (UTC)
 * Is it possible to make a list of false-positive articles like this, which WPCleaner would then skip? Warm Regards, ZI Jony  (Talk) 13:52, 26 March 2022 (UTC)
 * It's currently possible, but only error by error, by creating lists like WikiProject Check Wikipedia/Error 002 whitelist, which is configured in WikiProject Check Wikipedia/Translation with . --NicoV (Talk on frwiki) 18:38, 27 March 2022 (UTC)
 * Thanks. Johnny Au (talk/contributions) 00:20, 1 April 2022 (UTC)