Wikipedia:Bots/Requests for approval/KuduBot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.

KuduBot 3
Operator:

Time filed: 00:01, Monday September 12, 2011 (UTC)

Automatic or Manual: Automatic unsupervised

Programming language(s): Python and regular expressions

Source code available: Standard pywikipedia; the regular expression and parameters may be available on request

Function overview: Move all hatnotes to the very top of the articles per the Manual of Style.

Links to relevant discussions (where appropriate): Bot requests/Archive 43

Edit period(s): One-time run, then daily

Estimated number of pages affected: ? (articles)

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): N

Function details: Affects only the article lead section.

Discussion
Are there *any* cases where the top is not the best place for a hatnote? Are some used in sections, perhaps? - Jarry1250 [Weasel? Discuss.] 17:23, 12 September 2011 (UTC)
 * They definitely could have been used there, so limiting to lead section is probably smart. — HELL KNOWZ  ▎TALK 17:41, 12 September 2011 (UTC)
 * Okay, that's one exception. Are there others? I assume we're limiting to article space here for a start? - Jarry1250 [Weasel? Discuss.] 18:02, 12 September 2011 (UTC)
 * I somehow assumed this applies to articles by default; it definitely should, article layout guidelines do not apply to other namespaces. — HELL KNOWZ  ▎TALK 18:07, 12 September 2011 (UTC)
 * It probably applies to project space too, but limiting to article space for now seems wise. Yes, I'll add an exception for sections in the form of a lookahead. —  Kudu ~I/O~ 20:12, 12 September 2011 (UTC)
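The lead-section restriction discussed above could be enforced in more than one way; the operator mentions a lookahead, but the regex itself was only "available on request" and was never posted. A minimal sketch of one alternative, splitting at the first section heading so hatnotes inside sections are never touched (template names here are illustrative assumptions):

```python
import re

# Match a level-2 section heading on its own line.
HEADING = re.compile(r"^==[^=].*==\s*$", re.MULTILINE)
# Match a hatnote template occupying a whole line (illustrative subset of names).
HATNOTE = re.compile(r"^\{\{\s*(?:About|Other uses|For|See also)\b[^{}]*\}\}\s*$",
                     re.MULTILINE)

def move_hatnotes_to_top(text):
    """Move lead-section hatnotes to the very top; leave section bodies alone."""
    m = HEADING.search(text)
    lead, rest = (text[:m.start()], text[m.start():]) if m else (text, "")
    hatnotes = [h.strip() for h in HATNOTE.findall(lead)]
    body = HATNOTE.sub("", lead).lstrip("\n")
    new_lead = ("\n".join(hatnotes) + "\n" if hatnotes else "") + body
    return new_lead + rest
```

Because everything after the first heading is passed through untouched, a hatnote deliberately placed inside a section (the exception Hellknowz raises) is preserved as-is.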


 * How do you intend to find articles that suffer from this problem? Are you going to randomly crawl through every article, or is there a database report somewhere?  —SW— comment 18:47, 13 September 2011 (UTC)
 * Presumably by processing a dump (AWB can handle this). - Jarry1250 [Weasel? Discuss.] 20:52, 13 September 2011 (UTC)
 * Right, just want to ensure that the operator is willing/able to download and process a database dump. —SW— confabulate 21:27, 13 September 2011 (UTC)
 * This will be done by accessing the database directly from the toolserver. —  Kudu ~I/O~ 20:35, 15 September 2011 (UTC)
 * This is not possible. The toolserver database does not include page text.  —SW— verbalize 22:30, 15 September 2011 (UTC)
 * Right. Perhaps I can write a separate tool which uses WikiProxy and dumps a list of pages to a file, and then feed that to pywikipedia's replace.py. —  Kudu ~I/O~ 14:03, 18 September 2011 (UTC)
 * What is WikiProxy? How will it generate a list of problematic pages if it does not analyse a dump? (Or does it?) - Jarry1250 [Weasel? Discuss.] 15:35, 18 September 2011 (UTC)
 * The magic eye says: meta:User:Duesentrieb/WikiProxy —  Kudu ~I/O~ 21:36, 19 September 2011 (UTC)
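The intermediate tool Kudu describes would only need to emit a page list in the one-[[Title]]-per-line format that pywikipedia's page generators accept. A hypothetical sketch (the function name and file name are assumptions, not the operator's actual code):

```python
# Collect affected titles (however they were found -- WikiProxy, a dump
# scan, etc.) and write them in [[Title]] form, one per line, for
# consumption by pywikipedia's -file: page generator.
def write_page_list(titles, path="pages.txt"):
    with open(path, "w", encoding="utf-8") as f:
        for title in titles:
            f.write("[[%s]]\n" % title)
```

replace.py could then be pointed at the file with something along the lines of `python replace.py -file:pages.txt -regex "<pattern>" "<replacement>"`; the exact flags should be checked against the pywikipedia documentation.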

Let's at least see how it performs and then (hopefully) wait for some feedback. Per WP:COSMETICBOT, be careful to not make edits that only affect whitespace and newlines, as is often the case with misformatted lead templates. — HELL KNOWZ  ▎TALK 09:37, 22 September 2011 (UTC)
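The WP:COSMETICBOT guard asked for above amounts to one check before saving: skip any edit whose only effect is on whitespace or newlines. A sketch of one way to implement it (not the bot's actual check):

```python
import re

def is_whitespace_only_change(old, new):
    """True if old and new wikitext differ only in whitespace/newlines."""
    collapse = lambda s: re.sub(r"\s+", " ", s).strip()
    return collapse(old) == collapse(new)

def should_save(old, new):
    # Save only when something other than whitespace actually changed.
    return old != new and not is_whitespace_only_change(old, new)
```

Note that reordering lines (e.g. actually moving a hatnote) survives this filter, since collapsing whitespace does not change token order; only purely cosmetic newline shuffles are suppressed.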
 * One minute: Please don't use WikiProxy. If you're going to be scanning 3.5 million articles, please use a dump. It just makes sense. - Jarry1250 [Weasel? Discuss.] 18:17, 22 September 2011 (UTC)
 * I agree. Sending mass queries through wikiproxy will still consume massive resources at toolserver, which is not good (and is probably a violation of toolserver policies).  There is no reason that this task can't work from a database dump that is a few days old.  The task doesn't require up-to-the-second versions of articles.  You could also consider asking the maintainer of this tool to add a report for misplaced hatnotes if you don't want to deal with database dumps yourself.  —SW— spout 19:02, 22 September 2011 (UTC)
 * I'm trying to see if there are dumps available on the toolserver already, since I have a rather small quota myself. Anybody more experienced feel free to help. —  Kudu ~I/O~ 12:08, 23 September 2011 (UTC)
 * If you have a fast internet connection and a moderately good processor, it is far easier to download one onto your home PC and process it with AWB. - Jarry1250 [Weasel? Discuss.] 12:20, 23 September 2011 (UTC)
 * I use Mac OS and Linux, so no AWB for me. However, I'll consider running pywikipedia with a dump from my own computer. Nothing is urgent, so I'll set it up over the next few days. —  Kudu ~I/O~ 12:27, 23 September 2011 (UTC)
 * Downloading an XML dump to the toolserver. —  Kudu ~I/O~ 19:38, 23 September 2011 (UTC)


 * Here's the update: I finished downloading and extracting the dump, and now I'm running the script in a screen session. It's still analyzing the dump. — Kudu ~I/O~ 22:42, 23 September 2011 (UTC)
 * Pywikipedia's support for this turned out to be poor. It'd be easier for someone to file a new BRFA using AWB. — Kudu ~I/O~ 21:59, 4 October 2011 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.