Wikipedia:Bots/Requests for approval/Cydebot 3


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.

Cydebot
Operator: Cyde Weys

Automatic or Manually Assisted: Automatic and unsupervised

Programming Language(s): pyWikipediaBot

Function Summary: This bot will work off of a human-written list of pages that are to remain deleted, and will keep them deleted. It will also keep track of users who are re-creating pages that are supposed to remain deleted.

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Edit rate requested: This depends on how many times people try to re-create deleted pages, but presumably, it won't be a very high editing throughput.

Already has a bot flag (Y/N): Yes

Function Details: The bot works off of a human-written list of pages that are to remain deleted, and it will keep them deleted by deleting them if they are ever recreated. If someone is caught recreating a deleted page, the bot makes a report of it. The bot works off of two separate lists, one of which will be on-wiki and will be editable by any administrator. This will be useful for dealing with common everyday vandalism like the recent rash of reality-related page creations. The second list is maintained privately and will be used for sensitive issues (for example, if some troll repeatedly recreates a page in which defamation of character is contained in the page title; in this case, using deletedpage on it would only be giving the troll what he wanted). The two primary benefits of this bot are that it will prevent repeated attacks on Wikipedians that are contained in page titles, and it will reduce the amount of metadata nonsense (protected deleted pages) contained in article space. The private list is directly editable by me and several other highly-trusted Wikipedians, and can be added to by valid request from anyone and viewed on request by administrators.

It would make sense to run a trial period under my own sysop account first, to make sure that all of the bugs are worked out, and then go through the inevitable rigamarole of RFA later as necessary. The bot will be open source except for certain configuration parameters that will be kept secret to prevent abuse. -- Cyde Weys 22:40, 30 January 2007 (UTC)

Discussion

 * Ok, to be clear, this would have to be an admin-bot. —  xaosflux  Talk 01:21, 31 January 2007 (UTC)


 * Questions:


 * 1) Will this list be static or regex? (e.g. if you are 'guarding' Cyde is a meanie, will it prevent said new-page vandal from making Cyde is a meanie! or others?) —  xaosflux  Talk 01:21, 31 January 2007 (UTC)
 * 2) Would simply tagging the articles for CSD be sufficient here?


 * I was thinking of the list as static. One would have to be really careful with the regexes to make sure that they don't inadvertently catch something they shouldn't.  And tagging with CSD would also work, but it would take a bit longer and it'd be more inefficient.  All the pages that would be deleted would already have been identified as something that would have had deletedpage on it anyway, so skipping straight through the tagging to the deletion doesn't seem problematic.  -- Cyde Weys  15:25, 31 January 2007 (UTC)
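The trade-off between a static list and regexes discussed above can be illustrated with a minimal Python sketch. The names `STATIC_LIST`, `CAREFUL_PATTERN`, `CARELESS_PATTERN` and the helper functions are hypothetical, chosen only for illustration; nothing here is taken from the bot's actual code.

```python
import re

# Hypothetical static list: only exact title matches are guarded.
STATIC_LIST = {"Cyde is a meanie"}

# Hypothetical careful regex: the exact title plus trailing punctuation only.
CAREFUL_PATTERN = re.compile(r"Cyde is a meanie\W*")

# Hypothetical careless regex: matches the word anywhere, in any case.
CARELESS_PATTERN = re.compile(r"meanie", re.IGNORECASE)


def is_guarded_static(title: str) -> bool:
    """True only if the title is literally on the list."""
    return title in STATIC_LIST


def is_guarded_careful(title: str) -> bool:
    """True if the whole title matches the careful pattern."""
    return CAREFUL_PATTERN.fullmatch(title) is not None


def is_guarded_careless(title: str) -> bool:
    """True if the careless pattern occurs anywhere in the title."""
    return CARELESS_PATTERN.search(title) is not None
```

With these definitions the static check misses "Cyde is a meanie!", the careful regex catches it, and the careless regex would also catch an unrelated title such as "Blue Meanies (band)", which is the kind of inadvertent match the reply above warns about.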


 * It's been pointed out that most deleted articles can be guarded against creation by transcluding them on a cascade-protected page (that proposal's on VPR as of this writing), and there's a working demo of that, so the bot would be limited to the secret salt-but-don't-let-anyone-know-about-it list. Is the bot still useful despite this? Also, I'd suggest that the admin tasks of the bot are on a new account, as this wouldn't be any extra effort to do and would reduce the number of complaints that the bot's non-admin functions might mess up and start doing admin things on the adminbot account (that behaviour seems very unlikely to me, but it's sufficiently trivial to prevent by creating a new bot account that you may as well do that). --ais523 10:54, 31 January 2007 (UTC)

What is the process for getting a page off the list of secretly salted pages? I understand some article names are offensive in themselves. But, for article names that are not offensive yet make it to this list, the path doesn't seem clear yet. Let's say I'm Joe Experienced User coming here, and I create an article on Flaming Flamingos which happens to be on the secret-salt list. I spend a fair bit of effort putting it together. The article clearly passes the bar for inclusion here. I click "save" and see my results. A couple of minutes later, I notice a grammatical error and go to edit the page...and my work is all gone. Will the bot have reported to my talk page that the page was deleted, why it was deleted, and how I can go about getting it off the secret-salt list? Right now, if I speedy delete a page and it is recreated by the same user, I notify the user why it was speedy deleted and what their options are (if any). I then speedy delete it again. This tends to stop the speedy-create/speedy-delete spin cycle. Will this bot do the same?
 * +1 to that. If this bot is going to be approved, it would need to be on a dedicated account if it's an adminbot. —  xaosflux  Talk 13:05, 31 January 2007 (UTC)
 * It absolutely would indeed have to be a separate account, IMO. Writing Python code to follow the protected-edit guidelines to Turing test standards would be... well, beyond my Py-1, to say the least.  The adminbot concept certainly has some merit in this case, not least that it eliminates the adrenal gland aspects of (re-)dealing with especially vile or personalised vandalism, but it's only marginally better than the auto-tagging solution, much inferior to a server-side "super-salting" function, and is likely to cause a huge bunfight at RFA over various concerns that will inevitably go beyond the strict pros and cons of this narrowly-defined function.  The speedy-tagging-bot route seems preferable, on the whole.  After all, even an adminbot doesn't erase all trace of the article titles;  for that you'd need an oversightbot...  (Now, that'd make for an interesting arbcom discussion.)  Alai 12:25, 2 February 2007 (UTC)
 * You mean I shouldn't be running the bot that is using my oversight bit? I'll have to make a note to stop. The following announcement is for the humour impaired: Just kidding folks! On a related note, the title is of course still in the logs until log oversight comes, and/or a supersalting/article blacklist function comes out. The former could fix problems, the latter could help prevent. - Taxman Talk 14:40, 2 February 2007 (UTC)
 * Wouldn't that cause problems if for some reason a DRV decides a change in decision is warranted? I know it's rare, but it could happen. And the bot wouldn't know it. - 131.211.210.17 13:26, 31 January 2007 (UTC)
 * If that happened, it would only take a quick email or talkpage message to Cyde or to the bot to sort it out.
 * While certain pages won't ever be recreated by a good-faith newbie, I am worried about the list becoming large and the bot biting newbies or other good-faith users. In any case, the list of pages watched by Cydebot should be public and editable by admins (and kept short, just like the list of SALTed pages). I am not quite convinced that an extra bot for this is necessary, as most of the job can be done by cascading protection (unless it is regex, in which case the list will be just inviting people to figure out titles that won't trigger the bot). Kusma (討論) 13:33, 31 January 2007 (UTC)
 * The only point in the bot is so that the list of pages can be kept secret (otherwise cascading protection could be used instead); doing it by regex is a bad idea if the regexes are public, for the reasons you've explained. I'm sure that Cyde would share the list of pages with admins who asked, and likewise incorporate any reasonable suggestions the bot was given. I'm not sure how useful the bot would be, though; does anyone know how much use the bot would get in such a case? --ais523 13:59, 31 January 2007 (UTC)
 * Most of the list will be available in the bot's deletion log, so not making the full list available seems a needless inconvenience (and the secrecy involved is rather unwiki). Kusma (討論) 14:26, 31 January 2007 (UTC)
 * Email me in private and I'll show you some of the absolutely disgusting things that would need to be done on a secret list. Saying it's "unwiki" is unfair; it's naive to think that every single thing can be done totally out in the open.  -- Cyde Weys  14:54, 31 January 2007 (UTC)
 * As to doing it by published regex being a bad idea, how so? If a vandal was going to go through the trouble to find the regex page, learn regex (it's not THAT common and wildcards could be used), then craft a page to avoid that string, it would be NOTHING for them to create pages that are one character off if regex was not used. As for publication of it: publication allows admins to contribute to the process and instantly correct errors, but that is not a requirement for bots, simply a possible feature (see the server-side solution proposed below for more on regex). —  xaosflux  Talk 01:29, 1 February 2007 (UTC)

I tend to agree that cascading protection can be used for pages that have titles that do not need to be kept secret. As for those that need to be kept secret; hmm. There's a delay loop involved here already. Let's say I'm User:Joe-vandal and I create George Washington was a flippin idiot (well, more offensive than that but you get the idea). It enters into the speedy-create/speedy-delete spin cycle. Since I'm a vandal, I keep recreating it. My account gets blocked, so I make another account and keep at it. Eventually, an admin recreates it as deletedpage and protects it, then adds a request for the page to be added to the secret list. At some future point, it gets added to the list, the page is deleted and unprotected, and away we go with the kill-on-sight bot (hey, nice name for it :)). So, there's inherent delay in this system, and the process already results in deletedpage and protection. And, we have to educate our admins that the secret list exists and how to use it.

An alternative: we now have temporary protection. So, we recreate pages with deletedpage and temporary protect them. Once the temporary protection expires, a different bot comes along and tags the page for speedy deletion. It does this by periodically running through Category:Protected deleted pages and looking for pages that are not protected and have the deletedpage template. This also helps to eliminate those pages created with deletedpage and not protected (typically by non-admins). I would imagine that the significant majority of article name vandals lose interest after a time. One or two weeks of temporary protection would probably be enough to stop them, and the bot that comes along and tags the page for speedy deletion keeps things cleaned up. We'd still need to educate admins on this feature/bot's existence, but we would not have a bot with admin functions then. As Cyde has noted, this has been problematic at RfA. This alternate path solves the problem, albeit differently, without having to have admin functions. --Durin 14:21, 31 January 2007 (UTC)
 * I don't think you realize what the secret list is for. "Flaming Flamingos" would never make it onto the secret list.  The secret list would be used for cases of defamatory content being used in page titles (most often about Wikipedians).  It wouldn't include anything that could ever really become a valid article.  -- Cyde Weys  14:54, 31 January 2007 (UTC)
 * I do realize what it is for. I am citing the Flaming Flamingos example to question how this bot goes about reporting violations of the list. That you can't imagine a valid article name that appears on the secret-salt list is unconvincing to me. Truth is stranger than fiction. Want me to get vulgar? Ok, imagine an article that keeps getting recreated titled "Fucking Dead President Skulls". It gets to the secret-salt list. 6 months later, a band with that name that was previously obscure creates enough fame for themselves to be included here. So, a user comes along and works on it...and it gets deleted. Does the bot or does it not inform the user who created the page? --Durin 15:11, 31 January 2007 (UTC)
 * The bot would include in the deletion summary a link to a page with instructions on what to do in the event that a page is mistakenly protected against recreation. But I think the example you've come up with is a bit contrived, and extremely unlikely to occur in practice.  -- Cyde Weys  15:20, 31 January 2007 (UTC)
 * Could it send notification to the user who keeps recreating as well? And, what of the alternative, non-admin method of handling this? What are your thoughts on that? --Durin 15:31, 31 January 2007 (UTC)
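The non-admin alternative raised above (scan Category:Protected deleted pages and tag any page that carries deletedpage but is no longer protected) could be sketched roughly as follows. This is only an illustration on in-memory data: the `Page` record and the `pages_to_tag` helper are hypothetical, and how the category members and their protection status would actually be fetched from the wiki is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Page:
    """Hypothetical record for one member of the category."""
    title: str
    has_deletedpage_template: bool
    is_protected: bool


def pages_to_tag(category_members: Iterable[Page]) -> List[str]:
    """Return titles that carry deletedpage but are not (or no longer) protected."""
    return [
        page.title
        for page in category_members
        if page.has_deletedpage_template and not page.is_protected
    ]


# Made-up sample data: one page still protected, one whose protection expired.
sample = [
    Page("Still protected page", True, True),
    Page("Protection expired page", True, False),
]
expired = pages_to_tag(sample)
```

In this sketch only "Protection expired page" is selected; the tagging step itself would need nothing beyond ordinary edit rights, which is the point of the proposal.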


 * 90% of what the bot is designed to accomplish could simply be done by having it post a notice to a page (or to several admins who have agreed to help out), saying "Page X has been created", together with a link to the AfD or other place where the page deletion was discussed. This notification also provides a safeguard - the admin will presumably check the link before deleting the page again.  In fact, the admin who lists the page could include his/her user name as the account to be notified. (And if the bot is only useful if it is built to use a "secret" list, exactly what sort of precedent does that set?) -- John Broughton  (☎☎) 14:28, 31 January 2007 (UTC)
 * Perhaps the bot could just tag all pages on the "secret list" (pages attacking Wikipedians) that do get recreated with db-attack. The deleting admin (the attack page category usually gets done within an hour) then should also block the creator. To be really efficient, this bot would need to delete and block just like any admin who comes across a page like "Kusma's home phone number is 1-612-3255555 and he is GAY!" should. Kusma (討論) 16:08, 31 January 2007 (UTC)
 * I can't imagine this even coming close to being a good idea. I can't even make a coherent argument in favor of this, this sounds completely absurd. --badlydrawnjeff talk 18:05, 31 January 2007 (UTC)
 * Would you happen to have an actual argument against it? Obviously you've never been on the receiving end of attack pages.  -- Cyde Weys  01:48, 1 February 2007 (UTC)
 * Yeah, please let us know when you've read the above. I can attest that the type of stuff we are talking about we regularly permanently delete on the oversight mailing list. That to me is clear justification for this bot until such a time a mediawiki function is ready to run. - Taxman Talk 14:40, 2 February 2007 (UTC)
 * Would there be some kind of automatic expiry for articles on the open list? So if a page had been added because of repeated recreation, would it automatically be removed after a week/month or so?  I ask because it seems like the only way to prevent a gargantuan list developing over time.  Oh, and I assume it wouldn't be restricted to articles that fall under the scope of G4, i.e. those deleted by XfD? David Mestel(Talk) 19:49, 31 January 2007 (UTC)
 * If this bot would have to go through RfA like ProtectionBot did, I find it highly difficult to believe it could succeed. The idea of a bot automatically deleting pages kept on a secret list only accessible by a select group of admins would attract every single "The admins are out to get us" conspiracy theorist on the entire site, regardless of your justification. People were comparing ProtectionBot to the Terminator. Just imagine what they'll compare this to. I agree that the concept would be useful, but I can definitely see why others wouldn't.-- Dycedarg ж 21:36, 31 January 2007 (UTC)
 * I know my expectation is an RfA on this if this ends up being successful, but I can't speak for anyone else. Badlydrawnjeff 21:39, 31 January 2007 (UTC)
 * I think it's pretty clear from the protectionbot RfA that people don't want a bot to return to RfA. They want it done through a similar process or just agreed upon by the BAG and bcrats. - Taxman Talk 22:59, 31 January 2007 (UTC)
 * Err, I don't think that's the case. We've yet to see a successful admin-bot RfA, after all. -  brenneman  03:33, 11 February 2007 (UTC)
 * Why can't the "secret list" be coded up just like any other limited access page such as Special:Undelete or Special:Oversight, so that either all admins could see it, or all with Oversight rights, etc (or one list for each if there are entries that really need to be private)? On that note, why can't this bot's features be coded up for MediaWiki to interface with the cascading protection feature? That way there is a public list and a private one, and nobody gets hurt in RfA. If that's not feasible, then move forward with the bot. - Taxman Talk 22:59, 31 January 2007 (UTC)
 * That would be the ideal solution, but until it's done, the bot is ready. -- Cyde Weys  01:47, 1 February 2007 (UTC)


 * Good question. It's not too hard to check the history and make sure the page hasn't been moved from elsewhere before deleting it.  We're quickly approaching the BEANS threshold, by the way ...  Cyde Weys  16:01, 12 February 2007 (UTC)
 * True. But I'm sure someone will think of it sooner or later with or without our help.  If it becomes a common problem we can look at automatically reverting the move, but there are ways to mess that up, so let's wait and see if it's justified before trying it.  So long as the bot only chews on new pages, not moved pages, I'm happy.  If you want, it could message someone about moved pages, if that's easy to do.  Regards, Ben Aveling 07:14, 13 February 2007 (UTC)
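The behaviour agreed on in this exchange (delete only freshly created pages, and at most report pages that arrived at a guarded title by a move) amounts to a small decision rule. The sketch below is hypothetical: the function name, the action strings, and the `arrived_by_move` flag are all assumptions, and working out that flag from the page history is not shown.

```python
def choose_action(title: str, guarded_titles: set, arrived_by_move: bool) -> str:
    """Decide what to do with a page that has just appeared at `title`."""
    if title not in guarded_titles:
        return "ignore"   # not on the list at all
    if arrived_by_move:
        return "report"   # moved here from elsewhere: leave it for a human
    return "delete"       # newly created at a guarded title
```

Keeping the move case as a report rather than a deletion means a legitimate article that was moved onto a guarded title is never deleted automatically.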

This bot sounds useful, and it's even open source. I'd say go for it.  > R a d i a n t <  14:52, 12 February 2007 (UTC)

Server side solution preferable?
In thinking about the "secret list", it doesn't matter if the list is secret, as the pages will still be seen in the logs. Along that line, couldn't this be resolved with a Blacklist page of juped page names, along the lines of the spam blacklists? If done where editable by admins, I could even see this being regex'able (should help prevent explicit swearing, reduce personal information in the page). Such a list may be useful to replace long term salting. —  xaosflux  Talk 01:23, 1 February 2007 (UTC)
 * Also items like Kusma's home phone number is 1-612-3255555 and he is GAY! Could be regexed as  home phone .*[0-9][0-9][0-9]-[0-9][0-9][0-9]  and cover this and many other situations, without revealing the information, and with a very low chance for false positives. —  xaosflux  Talk 01:46, 1 February 2007 (UTC)
 * Forgive my lazy regex above, been a while since I used it, but it gets the point out. —  xaosflux  Talk 01:50, 1 February 2007 (UTC)
 * Email me and I'll show you some attack pages that have been repeatedly re-created and deleted that I don't think regexes could cover. Plus, you know it's always possible to use alternative ASCII encodings for characters that look nearly identical. -- Cyde Weys 01:51, 1 February 2007 (UTC)
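Both points in this exchange can be checked with a short Python sketch using the standard `re` module. The pattern is the one quoted above; the sample titles and the `matches` helper are made up for illustration, and the lookalike case swaps in a Cyrillic letter as one example of a visually similar character.

```python
import re

# The pattern quoted in the discussion above, unchanged.
PATTERN = re.compile(r"home phone .*[0-9][0-9][0-9]-[0-9][0-9][0-9]")


def matches(title: str) -> bool:
    """True if the pattern occurs anywhere in the title."""
    return PATTERN.search(title) is not None


# Made-up titles for illustration only.
plain = "Someone's home phone number is 1-555-0100000"
# Same title, but the "o" in "phone" is the Cyrillic letter U+043E.
lookalike = "Someone's home ph\u043ene number is 1-555-0100000"
unrelated = "History of the telephone"
```

Here `matches(plain)` is true and `matches(unrelated)` is false, which supports the low false-positive claim, while `matches(lookalike)` is false even though the two titles look the same on screen, which is the evasion raised in the reply.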

Working on this for MediaWiki, too. It's on my list after fixing Special:Protectedpages and per-page blocking. (Oh no, not Werdna and his making-adminbots-obsolete-with-MediaWiki-features) — Werdna talk 08:00, 1 February 2007 (UTC)
 * Actually please do make them obsolete, but in the future let's figure out how to cut the middleman out and not waste as much time. How can we better coordinate needed MediaWiki features and help people contribute there? Bugzilla only goes so far, since many requests languish for a long time. But after 500kb per adminbot discussion, suddenly a feature gets coded. And can you clarify what you mean by "this"? What features are you looking to implement? - Taxman Talk 15:40, 1 February 2007 (UTC)

(Expired as of 03:51, 2 March 2007 (UTC)) -- RM 12:07, 22 March 2007 (UTC)
 * We do not seem to be nearing a consensus to even start trials on this bot; some suggestions have been made above to change the operations to a tagging-based solution, and a possible server-side solution. Without consensus this request will soon expire.  My suggestion would be to refactor the request to a tagging-type solution, give the developers some time to work on the backend, and, if they don't come up with a fix and the tagging isn't useful, to re-request deletions.  In preparation for that I'd suggest running this on a separate account as well.  —  xaosflux  Talk 13:33, 18 February 2007 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.