Wikipedia:Bots/Requests for approval/CMoonBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Request Expired.

CMoonBot
Operator:

Automatic or Manually assisted: Manually-assisted

Programming language(s): I will only be using regex commands via AWB.

Source code available: If desired, I will make the source code available in the form of AWB settings files.

Function overview: I am using AWB to inspect a lot of articles as part of removing deprecated templates, merging templates, etc. Please see the Function details section below.

Links to relevant discussions (where appropriate):

Edit period(s): Infrequently; one-time for different manually-assisted tasks, and certainly not intended for continuous or daily use.

Estimated number of pages affected: Because these are manually-assisted edits, the page rate should be less than 10 per minute, the highest rate I've observed when using AWB to make simple changes. A more typical rate is 3 per minute.

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N):

Function details: I am using AWB to make manually-assisted edits. In order to build a list of more than 25K candidate articles in AWB, one must use a bot account. There are two maintenance tasks (one of which is complete) that I will use to illustrate why I would like to use a bot-approved account, but I should point out that these are examples of the type of edits I was planning to make and so this is an open-ended request.

In the recent past, I used AWB to make thousands of edits to change the deprecated template Mapit-US-cityscale to coord or equivalent. The relevant discussion for that change is here. After making those edits, I discovered that making that many manually-assisted edits might require bot approval.

As part of that task, I used AWB list tools to make lists of candidate articles, and specifically, I used them to build a list of articles that transclude the template. It is limited to returning 25K pages, and there were more than 25K such pages, but given I was removing pages from the universe of articles as I edited them, the technical restriction was not a major barrier. It did impact me in that I could not make a cross-list of pages that transclude both Mapit-US-cityscale and Infobox settlement, a template with more than 25K transclusions that was used on many of the same pages as Mapit-US-cityscale. When Infobox settlement was present, adjusting its parameters was the preferred way to eliminate the use of Mapit-US-cityscale. Due to the restriction, I could not determine how many of the uses of Mapit-US-cityscale should be replaced by changing the parameters of an Infobox settlement template in the same article. This was not a major barrier, but it did make the project more difficult.
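For illustration only, the cross-list described above amounts to a set intersection once both transclusion lists are available in full. The sketch below assumes the two lists have already been obtained by some other means (for example, from a dump scan); the function name `cross_list` and the sample titles are hypothetical and not part of AWB.

```python
def cross_list(transcluders_a, transcluders_b):
    """Return the sorted page titles that appear in both transclusion lists."""
    return sorted(set(transcluders_a) & set(transcluders_b))


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two full transclusion lists.
    mapit_pages = ["Springfield", "Shelbyville", "Capital City"]
    settlement_pages = ["Shelbyville", "Springfield", "Ogdenville"]
    print(cross_list(mapit_pages, settlement_pages))
```

Pages in the result would be the ones where adjusting the Infobox settlement parameters is the preferred fix; the rest would need a direct replacement with coord.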

I am currently investigating a proposed merge of two templates: Infobox single (25K+ uses) and Infobox song (~2800 uses). Merging the templates has been discussed on the template talk pages of both templates at various times in the past. The most recent discussions are on the Infobox song talk page. I am doing some research right now so I can prepare a formal proposal, but I cannot say whether that proposal will be accepted. I suspect it will; the infoboxes are very similar, maintaining them takes twice as much effort as it should, and users have complained about slight differences between the two. In order to support the merger discussion, there should be an analysis of how merging the templates will affect existing articles. I was planning to use AWB to help with that analysis, but the 25K limit for non-bot accounts in AWB is a barrier.

One example of the analysis concerns the Type parameter. It is a valid Infobox song parameter but it is not a valid Infobox single parameter. Despite that, it is sometimes included in calls to Infobox single. Without knowing how many times it is used, and what values it has in those instances, it's hard to know if those calls will be a problem when the Infobox song Type parameter is implemented in the merged template. I could trigger the addition of the article into a category when the Type parameter is used in Infobox single, but there are various similar situations and making categories for them would require more work for me and more work for the WP servers. Also, the two templates are protected, and I am not an Admin, so to make the "add category" edits I would have to edit the sandbox version, wait for an admin to install it, wait for the servers to update the categories, etc. Again, this seems like extra work all around.
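As a rough sketch of the Type-parameter analysis described above, the following Python tallies the values given to one named parameter across calls to one template. It is not an AWB feature. It assumes the page text is already available as (title, wikitext) pairs, and all function names are hypothetical. It matches braces by simple counting, so it ignores comments, nowiki sections, and triple-brace parameters, and it matches the template name case-insensitively, which is looser than MediaWiki's own rule.

```python
import re
from collections import Counter


def find_template_calls(wikitext, name):
    """Yield the inner text of each {{name ...}} call, allowing nested templates."""
    words = [re.escape(w) for w in name.split()]
    pattern = re.compile(
        r"\{\{\s*" + r"[ _]+".join(words) + r"\s*(?=\||\}\})", re.IGNORECASE
    )
    for match in pattern.finditer(wikitext):
        depth, i = 0, match.start()
        while i < len(wikitext):
            if wikitext.startswith("{{", i):
                depth += 1
                i += 2
            elif wikitext.startswith("}}", i):
                depth -= 1
                i += 2
                if depth == 0:
                    yield wikitext[match.start() + 2 : i - 2]
                    break
            else:
                i += 1


def split_top_level(body):
    """Split a template body on pipes that are not inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        two = body[i : i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            depth -= 1
            current.append(two)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def count_param_values(pages, template, param):
    """Count the values given to `param` in calls to `template` across pages."""
    counts = Counter()
    for _title, wikitext in pages:
        for body in find_template_calls(wikitext, template):
            for part in split_top_level(body)[1:]:  # part 0 is the template name
                if "=" in part:
                    key, value = part.split("=", 1)
                    if key.strip() == param:  # parameter names are case-sensitive
                        counts[value.strip()] += 1
    return counts
```

Calling `count_param_values(pages, "Infobox single", "Type")` would return a tally of each distinct Type value, which is the kind of figure the merge proposal needs.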

While this request is open-ended, my plan is not to use AWB to edit 25K+ articles. I want bot approval for the CMoonBot account so I can find the articles of interest without using maintenance categories and without using database dumps.

Discussion
Is there any particular reason you don't want to use database dumps? - Kingpin13 (talk) 08:44, 1 March 2010 (UTC)
 * I may not know enough to make a good decision about that, but using database dumps seems like a lot more work to me, especially to get around a relatively arbitrary limitation. I have to download a multi-gig compressed dump file, expand it to an even bigger database feed, and then load that into a DB program which I don't currently have, so I'd have to install and configure it. The dumps are always a little stale, so the results may not be 100% accurate, though that's a small issue for the type of projects I'll be doing. — John Cardinal (talk) 15:19, 1 March 2010 (UTC)
 * Note that if you're only searching text, you don't need to load it into a DB, you can just parse the XML directly. Mr.Z-man 17:39, 4 March 2010 (UTC)
 * Good point. I assume I'd have to use a SAX-based XML tool. (I've got experience with XML tools, but all DOM-based.) — John Cardinal (talk) 15:22, 7 March 2010 (UTC)
 * You don't even have to decompress the dumps on your hard disk because a lot of languages allow you to read a zipped file and decompress on the fly. SAX is faster but I guess you can use both. --hroest 12:35, 9 March 2010 (UTC)
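A minimal sketch of the approach suggested in this thread: read the compressed dump as a stream and parse the XML incrementally, without loading it into a database or expanding it on disk. It assumes a bzip2-compressed MediaWiki XML export with one revision per page. It uses Python's `xml.etree.ElementTree.iterparse`, an incremental parser in the same streaming spirit as SAX rather than the SAX API itself. The function names and the dump file name are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from an export stream, one page at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    title, text = None, ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # drop finished pages so memory use stays bounded


def pages_mentioning(dump_path, needle):
    """Yield titles of pages whose wikitext contains `needle`, ignoring case."""
    with bz2.open(dump_path, "rb") as stream:  # decompresses on the fly
        for title, text in iter_pages(stream):
            if needle.lower() in text.lower():
                yield title


if __name__ == "__main__":
    # Hypothetical file name; substitute the path of a downloaded dump.
    for page_title in pages_mentioning("pages-articles.xml.bz2", "{{Infobox single"):
        print(page_title)
```

A plain substring test like this only finds candidate pages; a closer look at each template call would still be needed to answer questions about individual parameters.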

Have you thought about changing the infobox templates such that they accept the same parameter sets? I have no experience in the field of music but that would at least relieve the problems of people that use the "wrong" template. --hroest 12:38, 9 March 2010 (UTC)
 * I don't have knowledge of the specific templates being discussed, but has it been pointed out that AWB has a database dump scanner, which would allow all the existing template calls to be analysed offline? Rjwilmsi 21:52, 28 March 2010 (UTC)

Questions — John Cardinal (talk) 15:22, 7 March 2010 (UTC)
 * Are other editors using the XML-based approach described above? Can someone describe their tools?
 * I use the XML based approach on the de-Wikipedia to find and correct typos in a semi-automatic fashion. --hroest 12:35, 9 March 2010 (UTC)
 * What about cases where I use AWB to make edits that are allowed by the tool but exceed the accepted usage, i.e., edits of the type where the AWB docs say you should have a bot account even if performing manual edits?
 * May I suggest WP:DDR? They do stuff like this. Tim1357 (talk) 19:56, 28 March 2010 (UTC)


 * The activity level here seems to have died down. What's going on with it? Are there any updates, and do you still want this request to stay open? — The Earwig (talk) 20:08, 19 May 2010 (UTC)

No operator activity since 2010-mar-08 Josh Parris 01:14, 31 May 2010 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.