Wikipedia:Bots/Requests for approval/PDFbot 3


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

PDFbot
Operator: Dispenser

Automatic or Manually Assisted: Unsupervised automatic

Programming Language(s): Python

Function Summary: Maintenance of PDF files and external links outside of PDFlink

Edit period(s) (e.g. Continuous, daily, one time run): Periodic

Already has a bot flag (Y/N): Yes

Function Details:


 * 1) Adding |format=PDF to cite web templates.  This will be checked using the old PDFbot engine.  (usurped from Plasticbot task #3)
 * 2) Add/remove  tags when appropriate
 * 3) Fill parameters in cite web/cite news templates
 * 4) General maintenance of external links (likely to be limited by bot's name)
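
As a rough illustration of item 1 in the list above, the sketch below appends |format=PDF to cite web templates whose URL looks like a PDF. It is not PDFbot's code: the function name add_pdf_format and the regular expressions are hypothetical, nested templates are ignored, and the ".pdf" suffix test is an assumed stand-in for the check that the request says is done by the old PDFbot engine.

```python
import re

# Hypothetical patterns for illustration only; not taken from PDFbot.
# Matches a {{cite web}} template that contains no nested templates.
CITE_WEB = re.compile(r"\{\{\s*cite web\s*\|[^{}]*\}\}", re.IGNORECASE)
URL_PARAM = re.compile(r"\|\s*url\s*=\s*([^|}\s]+)", re.IGNORECASE)
FORMAT_PARAM = re.compile(r"\|\s*format\s*=", re.IGNORECASE)


def add_pdf_format(wikitext):
    """Append |format=PDF to cite web templates whose URL looks like a PDF."""

    def fix(match):
        template = match.group(0)
        url = URL_PARAM.search(template)
        if url is None or FORMAT_PARAM.search(template):
            return template  # no URL, or a format parameter is already present
        # Assumption: a ".pdf" path suffix is a good enough signal for a sketch.
        path = url.group(1).split("?", 1)[0].split("#", 1)[0]
        if not path.lower().endswith(".pdf"):
            return template
        # Drop the closing braces, then re-close after the new parameter.
        return template[:-2].rstrip() + " |format=PDF}}"

    return CITE_WEB.sub(fix, wikitext)
```

A real run would need to confirm the content type of the linked file rather than trust the file extension, since many PDF links carry no ".pdf" suffix.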

The following "Common fixes" will only be performed if the above actions result in an edit to the page. All are intended to be non-destructive.
 * 1) Unlink format parameters if they are common (e.g. PDF)
 * 2) <s>Unlink common publishers (e.g. Associated Press, CNN, New York Times)</s> (Reconsidering per talk at WP:OVERLINK)
 * 3) Lower casing on <REF> tags
 * 4) Convert bracketed URLs to references
 * 5) Move punctuation characters before inline refs
 * 6) Merge duplicate references
 * 7) If more than a specified number of refs, enable multi-column on reflist using colwidth parameter
 * 8) HTML to XHTML convention
 * 9) Converts some HTML to equivalent CSS
 * This is a small sample; more are in the engine and will be added or removed as time goes on.
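
Two of the fixes listed above, lower-casing <REF> tags (#3) and merging duplicate references (#6), can be sketched as follows. This is an illustrative guess at the behaviour, not the contents of commonfixes.py: the function names and the "autogenerated" name prefix are assumptions, only unnamed <ref>...</ref> pairs are merged, and text inside nowiki tags or HTML comments is not treated specially.

```python
import re

# Hypothetical patterns for illustration; not taken from commonfixes.py.
REF_TAG = re.compile(r"<(/?)(ref|references)\b", re.IGNORECASE)
# Unnamed <ref>...</ref> pairs only; named and self-closing refs are left alone.
PLAIN_REF = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)


def lowercase_ref_tags(wikitext):
    """Lower-case the tag name in <REF>, </Ref>, <References /> and similar."""
    return REF_TAG.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), wikitext)


def merge_duplicate_refs(wikitext, prefix="autogenerated"):
    """Give identical unnamed refs a shared name and reuse it after first use."""
    counts = {}
    for body in PLAIN_REF.findall(wikitext):
        key = body.strip()
        counts[key] = counts.get(key, 0) + 1

    names = {}

    def fix(match):
        key = match.group(1).strip()
        if counts[key] < 2:
            return match.group(0)  # unique reference, leave untouched
        if key not in names:
            # Assumption: the generated name does not clash with an existing one.
            names[key] = "%s%d" % (prefix, len(names) + 1)
            return '<ref name="%s">%s</ref>' % (names[key], key)
        return '<ref name="%s" />' % names[key]

    return PLAIN_REF.sub(fix, wikitext)
```

Running lowercase_ref_tags before merge_duplicate_refs lets the second step see every case variant of the tag.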

Discussion
I'm not so sure about some of the "Common fixes". #5 is in contradiction to REFPUNC. #7 is similarly a matter of editor preference and IMO not something a bot should be messing with; some people hate multiple columns with a passion in any circumstance (and on a personal note, I'd want at most 25em). #3, #8, and #9 seem like they would just clutter the diff without doing much of anything useful (the software will fix anything that really needs fixing), but that's not so important. Anomie⚔ 01:40, 6 October 2008 (UTC)
 * #5 is a long-standing request for AWB's general fixes, and I could test to see which is the dominant format; however, the long use of both has caused weird things to appear in punctuation.[1]. Similarly, people who have an issue with #7 are probably better off using a CSS override to get rid of those columns (the "recommended" width is 30em).  You can try out the common fixes using either the checklinks or reflinks tools.  It's surprising how many (case) variants of <ref> there are in one article and how the parser handles </br>.  — Dispenser 03:16, 6 October 2008 (UTC)
 * I still can't see #5 not causing trouble unless someone can get REFPUNC to change; however, fixing ".[1].", "[1] [2]", and ".[1]Foo", and the like seems uncontroversial. Fixing </br> is not an "HTML to XHTML" conversion IMO (<br> to <br /> would be); it's just fixing an error.
 * I tried to get the ability to do easy CSS overrides into reflist a few months back, but no one cared to discuss it and so we are left with the current situation where CSS overrides are difficult if not impossible. I could override every reflist to use my preference, but I can't override just the ones using column widths instead of counts or even just the ones using multiple columns, for example. Anomie⚔ 03:59, 6 October 2008 (UTC)
 * It should be possible to override many situations with CSS like, .  Specific values of column-width and column-count could be overridden using JavaScript.  But this isn't really the forum to discuss it.  I may drop it, as it is likely better to enable columns globally in reflist.  — Dispenser 07:02, 6 October 2008 (UTC)
 * As I said, that CSS does it for all instances of reflist instead of specifically overriding problem cases. Good luck if you try to get columns added by default to reflist ;) I agree, this isn't the forum for that discussion. My concerns have been raised, you've responded, now it's up to BAG to decide. Anomie⚔ 13:13, 6 October 2008 (UTC)
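
The punctuation cases raised in this thread can be sketched in code. The example below counts how often punctuation sits directly before versus directly after a reference (one way to test which format is dominant) and repairs only the three cases described above as uncontroversial: ".[1].", "[1] [2]", and ".[1]Foo". It deliberately does not move punctuation across references, since that is the disputed part of #5. All names and patterns are hypothetical and assume well-formed <ref> markup.

```python
import re

# Hypothetical patterns for illustration; not taken from PDFbot.
REF = (
    r"(?:<ref\b[^<>]*/>"                 # self-closing: <ref name="a" />
    r"|<ref\b[^<>]*(?<!/)>"              # opening tag that is not self-closing
    r"(?:(?!<ref\b|</ref>).)*</ref>)"    # body that never crosses another ref
)
FLAGS = re.DOTALL | re.IGNORECASE
PUNCT = r"[.,;:]"

BEFORE = re.compile(r"%s(?=<ref\b)" % PUNCT, FLAGS)
AFTER = re.compile(r"%s%s" % (REF, PUNCT), FLAGS)
SPACED_REFS = re.compile(r"(%s)[ \t]+(?=<ref\b)" % REF, FLAGS)
DOUBLED = re.compile(r"(%s)((?:%s)+)\1" % (PUNCT, REF), FLAGS)
MISSING_SPACE = re.compile(r"(%s(?:%s)+)(?=\w)" % (PUNCT, REF), FLAGS)


def count_ref_punctuation(wikitext):
    """Return (punctuation directly before a ref, punctuation directly after)."""
    return len(BEFORE.findall(wikitext)), len(AFTER.findall(wikitext))


def tidy_ref_punctuation(wikitext):
    """Apply only the repairs that do not take a side on ref placement."""
    text = SPACED_REFS.sub(r"\1", wikitext)   # "[1] [2]"  becomes "[1][2]"
    text = DOUBLED.sub(r"\1\2", text)         # ".[1]."    becomes ".[1]"
    text = MISSING_SPACE.sub(r"\1 ", text)    # ".[1]Foo"  becomes ".[1] Foo"
    return text
```

The body pattern refuses to cross another <ref> or </ref>, so a failed match cannot stretch over two separate references and join them by mistake.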

Unlink a common publisher > Could you point me to the relevant MoS? =Nichalp  «Talk»=  18:38, 10 October 2008 (UTC)
 * I do not think it is stated anywhere in the MoS. However, the MoS is generally moving away from overlinking.  For instance, the format parameter on cite web is no longer linked in the examples, and more recently date linking became deprecated.  I suspect that the quality of the backlinks became an issue with citation-only links. — Dispenser 00:54, 13 October 2008 (UTC)
 * Yes, that's true, but I would prefer it be deferred until consensus can be reached on which publishers need to be unlinked. Also, I strongly feel that your bot name should reflect the tasks that it takes up. So, if you wish to diversify your portfolio to handle citation fixes, please consider setting up a new account. I'll approve the bot flag for the tasks that PDFbot currently has. =Nichalp   «Talk»=  08:42, 13 October 2008 (UTC)
 * The purpose of the bot is to maintain PDF external links (functions A), but the discussion so far has focused on common/general fixes (functions B). B will not happen without A performing something useful.  — Dispenser 13:05, 13 October 2008 (UTC)
 * Great. Thanks for the clarification. =Nichalp   «Talk»=  15:10, 13 October 2008 (UTC)

Regarding your recent addition, how is your bot going to determine which pages in the PDF are being referenced in order to fill in ? Anomie⚔ 22:48, 13 October 2008 (UTC)
 * Removed; I didn't fully review the idea when I came up with it this morning. I had thought that pages= referred to the total number of pages.  It's been dropped now.  — Dispenser 05:18, 14 October 2008 (UTC)
 * No super-advanced AI? Darn! ;) Anomie⚔ 11:09, 14 October 2008 (UTC)

Mr.Z-man 18:04, 7 November 2008 (UTC) Please note not all the capabilities have been programmed in yet. — Dispenser 00:13, 24 November 2008 (UTC)
 * Early run without commonfixes; code updated with each issue
 * Yesterday's edits with commonfixes.py; the cite web/PDFlink bug has since been fixed, and conversion of * [link] ...text... (pdf) has been disabled.
 * Mr.Z-man 07:06, 28 December 2008 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.