Wikipedia:Bots/Requests for approval/Lightbot 16


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Denied for now.

Lightbot 16
Operator:

Time filed: 09:20, Sunday July 31, 2011 (UTC)

Automatic or Manually assisted: Automatic supervised

Programming language(s): AWB, monobook, vector, manual

Source code available: Source code for monobook or vector is available. Source code for AWB will vary, but versions are often also kept as user pages.

Function overview: Delink common units of measurement

Links to relevant discussions (where appropriate): This request is similar to Bots/Requests for approval/Lightbot 6. That bot was designed to delink common units but was limited to units of length, area, or volume. The constraints were put in place to gain confidence and experience. Following that experience, this bot is designed to address all common units (e.g. weight, time, speed, volume).

Edit period(s): Multiple runs. Often by batch based on preprocessed list of selected target articles.

Estimated number of pages affected: Individual runs of tens, or hundreds, or thousands.

Exclusion compliant (Y/N): Yes, will comply with 'nobots'

Already has a bot flag (Y/N): No

Function details: Edits will delink common units of measurement in accordance with wp:link - What generally should not be linked. Wikipedia has information on what may be regarded as a common unit. The threshold for delinking may be adjusted depending on whether a conversion is present. Lightbot 6 has been very successful. If an issue arises that can't be resolved locally, community guidance will be sought.

Discussion
Please provide a list of every unit that will be delinked. Please describe how the exception "Unless they are particularly relevant to the topic of the article" in wp:link - What generally should not be linked will be addressed, bearing in mind that it is up to the bot to get it right, not the responsibility of the article editors to use some peculiar syntax to "protect" a relevant link. Jc3s5h (talk) 11:48, 31 July 2011 (UTC)
 * A couple of brainstorm-y ideas for this:
 * If the unit in question is linked multiple times in the page, there's probably a decent chance it's legit to delink, at the very minimum, the 2nd or 3rd+ occurrence (so as to also cover WP:OVERLINK).
 * Double check to make sure, via the api, that things like gram and inch and whatnot don't have backlink relationships with the page-of-interest (e.g., if "Article Foobar" is referenced by gram, either as a direct link or, especially, a redirect, don't touch the linking of units on "Article Foobar" at all). Same goes with categorymembers of stuff related to measurement, distance, etc.
 * Also, given the prior history with reference to Wikipedia_talk:Requests_for_arbitration/Date_delinking, it'd probably be ideal to seek consensus to run this before any sort of large-scale run. I could easily see this bot causing a ruckus, even if it worked perfectly, because it seems like this could affect a lot of pages.
 * -- slakr \ talk / 00:28, 3 August 2011 (UTC)
 * There's already consensus for this (see WP:OVERLINK). However, I would like to know what the "general logic" of delinking would be before making a decision to trial or not. Several people complained at User talk:Lightmouse, and I'm inclined to share their concerns. Exclusion lists should be fairly comprehensive (e.g. Category:Units of measure + subcats should definitely not be covered by the bot, there probably are others that should be excluded too, like Category:Measurement, Category:Metrology, etc...). Headbomb {talk / contribs / physics / books} 14:45, 14 August 2011 (UTC)
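
The two safeguards suggested above (leave the first link alone and delink only repeats; skip pages that are topically tied to units) could be sketched roughly as follows. This is an illustration in Python, not the bot's AWB rules. The names `should_skip` and `delink_repeats` are hypothetical, and the backlink and category data are assumed to have been fetched beforehand and passed in as plain sets.

```python
import re

# Matches [[Target]] and [[Target|label]]. Assumption: simple links only;
# links containing nested brackets (files, images) are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def should_skip(page_title, unit_backlinks, excluded_pages):
    """Return True when the page looks topically tied to units.

    unit_backlinks: titles that a unit article (e.g. "gram") links to.
    excluded_pages: members of excluded categories, gathered beforehand.
    Both inputs are hypothetical, pre-fetched sets.
    """
    return page_title in unit_backlinks or page_title in excluded_pages


def delink_repeats(wikitext, unit_titles):
    """Keep the first link to each unit; turn later links into plain text."""
    units = {title.lower() for title in unit_titles}
    seen = set()

    def replace(match):
        target = match.group(1).strip()
        label = match.group(2) or target  # fall back to the target text
        key = target.lower()
        if key not in units:
            return match.group(0)  # not a unit link: leave untouched
        if key not in seen:
            seen.add(key)
            return match.group(0)  # first occurrence stays linked
        return label  # repeat occurrence is delinked

    return WIKILINK.sub(replace, wikitext)
```

A caller would check `should_skip` first and leave the page untouched when it returns True, so a relevant link is never removed from an article about measurement.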

The process includes obtaining a target list (often by doing 'what links here' and/or a database scan), then preprocessing the list based on a variety of criteria (e.g. whether a page includes certain strings in the title or body, and/or matches a whitelist). There are also run-time checks. Each process and check takes effort in different ways and has pros and cons.
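
As a rough illustration of the preprocessing step described above, the filtering could look something like this in Python. It is a sketch under assumptions, not the actual AWB setup: `preprocess_targets` is a hypothetical name, the whitelist is assumed to mean pages the bot must leave alone, and the example strings are invented for the illustration.

```python
def preprocess_targets(candidates, title_strings, body_strings, whitelist):
    """Filter a target list before a run.

    candidates: iterable of (title, body) pairs, e.g. from 'what links here'.
    title_strings / body_strings: case-insensitive substrings that exclude
        a page when found in its title or body (hypothetical criteria).
    whitelist: titles that must never be edited (assumed meaning).
    Returns the titles that survive every check, in input order.
    """
    title_needles = [s.lower() for s in title_strings]
    body_needles = [s.lower() for s in body_strings]
    kept = []
    for title, body in candidates:
        if title in whitelist:
            continue  # explicitly protected page
        if any(needle in title.lower() for needle in title_needles):
            continue  # title suggests the unit is the topic
        if any(needle in body.lower() for needle in body_needles):
            continue  # body contains an excluding string
        kept.append(title)
    return kept
```

The run-time checks mentioned above would still apply to each page that survives this filter; the sketch only covers the list preparation.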

I've recently been doing a massive run. On a large run, even a small proportion of errors becomes disappointingly large in number (i.e. more than a handful). Feedback and investigation show that most of them were due to redirects (an issue now almost 100% resolved using an AWB switch) or to insufficient exclusion (as you suggest). The exclusion process does make use of categories, but it didn't go all the way to the top of Category:Measurement. I tried that initially but ended up using lower categories such as Category:Units of measure. See what the AWB manual says about drilling down from the top (quoted below). It's clear that I'll have to put in more effort and increase the size of the exclusion list. One method may be to use the database scanner to include more categories in the whitelist. I can say more about this recent massive run but I think I'll wait for your further thoughts. Lightmouse (talk) 19:13, 14 August 2011 (UTC)
 * Be advised that this can be very time intensive and may yield redundant results. For example, recursively searching from Category:Evolution will find the Human evolution article several times and even traverse the same subcategories numerous times. Worse yet, the process may take so long that it will terminate prematurely (caused by either Wikipedia or AWB).
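
The redundancy the manual warns about comes from visiting the same subcategory more than once. A minimal sketch of a traversal that avoids this, assuming the category tree has already been loaded into two dictionaries (`subcats` and `members` are hypothetical stand-ins for data that would really come from the API or a database dump):

```python
from collections import deque


def collect_pages(root, subcats, members, max_depth=3):
    """Breadth-first walk of a category tree, visiting each category once.

    subcats: dict mapping a category to its direct subcategories.
    members: dict mapping a category to its direct member pages.
    max_depth: how far below the root to descend (assumed cap, so a very
        deep tree cannot make the walk run indefinitely).
    Returns the set of pages found, so repeats collapse automatically.
    """
    visited = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        pages.update(members.get(category, ()))
        if depth >= max_depth:
            continue  # do not descend any further from here
        for child in subcats.get(category, ()):
            if child not in visited:  # skip categories already queued
                visited.add(child)
                queue.append((child, depth + 1))
    return pages
```

Because every category is queued at most once, a page such as Human evolution is still found through whichever path reaches it first, but the shared subcategories are not walked again.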


 * Hmmm... Well this is sound and all, but I'm not really ready to approve or trial anything at this point in time. Judging from your talk page, complaints about improper unlinking arise fairly often. While this has led to improvements in the code and logic, it also makes me believe that the false positive rate is currently too high to approve or trial this. There are also a few discussions on the MOS concerning unit unlinking (I'm sure you're aware of that), so I would take the time between now and the resolution of whatever unit unlinking RFC is going on to improve and solidify existing code/exclusion logic rather than to expand it. So denied for now, with no prejudice against a later BRFA for the same task, after the false positive rate has gone down, and not before one week after the conclusion of the "RFC" on unit unlinking. You can re-open the request on this page rather than file a new one once you believe the false positive rate has demonstrably gone down, and that the unit unlinking stuff on the MOS has been resolved and is stable. We'll revisit the matter then. Headbomb {talk / contribs / physics / books} 19:41, 14 August 2011 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.