Wikipedia:Bots/Requests for approval/H3llBot 9


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.

H3llBot 9
Operator:

Time filed: 13:49, Sunday August 28, 2011 (UTC)

Automatic or Manual: Automatic unsupervised

Programming language(s): C#, own library, api.php

Source code available: N

Function overview: Add wikilinks to work/publisher fields from a selected list, where unambiguously identified, and where it matches the existing style.

Links to relevant discussions (where appropriate): Wikipedia talk:Citing sources; Template talk:Cite web; Template talk:Cite news; Wikipedia talk:WikiProject Video games

Edit period(s): Continuous when run, with other tasks

Estimated number of pages affected: Articles only; not a lot, I think. Currently at a ratio of 4/3500; this will grow as I add cases and decline as the backlog drops

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): Y

Function details:

If the publisher or work field is a text-only value matching a known article (from a list I keep and manually update) and, where ambiguous, the url matches that entity, then wikilink it:  |work=IGN   to    |work=[[IGN]] 

The bot will only edit based on a usage threshold of the existing style (50% for now):
 * If more than 50% of entities have at least one occurrence linked, then proceed to link all first occurrences.
 * If more than 50% of entities have two or more occurrences linked, then proceed to link all occurrences. (If an entity only appears once, it is not counted towards non-multiple linked occurrences)
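The threshold rule above can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's actual C# code (which is not published). The function name <tt>decide_linking</tt>, the input shape (each entity mapped to a list of booleans, one per occurrence, True when that occurrence is already wikilinked) and the return labels "none", "first" and "all" are assumptions made for the example. Entities that appear only once are left out of the second ratio, as the note above describes.

```python
from typing import Dict, List

THRESHOLD = 0.5  # "50% for now", as stated in the task description


def decide_linking(occurrences: Dict[str, List[bool]],
                   threshold: float = THRESHOLD) -> str:
    """Return "none", "first" or "all" for one article (hypothetical helper).

    occurrences maps an entity name to one flag per occurrence;
    a flag is True when that occurrence is already wikilinked.
    """
    entities = [flags for flags in occurrences.values() if flags]
    if not entities:
        return "none"

    # Share of entities with at least one linked occurrence.
    one_plus = sum(1 for flags in entities if sum(flags) >= 1) / len(entities)

    # Only entities that appear more than once can show a
    # "link every occurrence" style, so single-use entities are ignored here.
    multi = [flags for flags in entities if len(flags) >= 2]
    two_plus = (sum(1 for flags in multi if sum(flags) >= 2) / len(multi)
                if multi else 0.0)

    if two_plus > threshold:
        return "all"
    if one_plus > threshold:
        return "first"
    return "none"
```

A result of "first" or "all" only names the style to follow; if every matching occurrence is already linked, applying it changes nothing.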

In the examples below, 3 entities are considered ("IGN", "Gamespot", "RPS") for which the bot knows the target article and how many of their occurrences are linked. Any unknown entities would be considered unlinked.

|work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS
 * Nothing linked – no changes (below threshold)

|work= IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS |work= IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS
 * One occurrence linked 33% – no changes (below threshold)

 |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS  |work= IGN |work=IGN |work=IGN |work=IGN |work= Gamespot |work=Gamespot |work=Gamespot |work= |work=RPS |work=RPS
 * One occurrence linked 66% – link first occurrences

<tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt> <tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt>
 * One occurrence linked 100% – no changes (no first occurrences to link); Two+ occurrences linked 0 % – no changes (below threshold)

<tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt> <tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work= |work=RPS |work=RPS</tt>
 * One occurrence linked 66% – link first occurrences; Two+ occurrences linked 33 % – no changes (below threshold)

<tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt> <tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt>
 * Two+ occurrences linked 33 % – no changes (below threshold)

<tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt> <tt>|work= IGN |work=IGN |work= |work= |work= Gamespot |work=Gamespot |work= |work= RPS |work= |work=</tt>
 * Two+ occurrences linked 66 % – link all occurrences

<tt> |work=IGN |work=IGN |work=IGN |work=IGN |work=Gamespot |work=Gamespot |work=Gamespot |work=RPS |work=RPS |work=RPS </tt> <tt>|work= IGN |work= IGN |work= IGN |work= |work= Gamespot |work=Gamespot |work= |work= RPS |work=RPS |work=</tt>
 * Two+ occurrences linked 100 % – link all occurrences
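Once a style has been chosen, applying it to the citation values might look like the sketch below. It is a Python illustration only; <tt>apply_linking</tt>, the <tt>targets</tt> mapping and the mode labels "none", "first" and "all" are hypothetical names, and the handling of piped links is an assumption rather than documented bot behaviour. Values the bot does not recognise are left untouched.

```python
from typing import Dict, List


def apply_linking(values: List[str], targets: Dict[str, str], mode: str) -> List[str]:
    """Link work/publisher values according to mode (hypothetical helper).

    values  -- field values in article order, e.g. "IGN" or "[[IGN]]"
    targets -- known entity name -> article title (a hand-kept list)
    mode    -- "none", "first" (first occurrence of each entity) or "all"
    """
    seen = set()
    result = []
    for value in values:
        linked = value.startswith("[[") and value.endswith("]]")
        # For "[[Target|Label]]" the visible label identifies the entity.
        name = value[2:-2].split("|")[-1] if linked else value
        first = name not in seen
        seen.add(name)

        wanted = mode == "all" or (mode == "first" and first)
        if linked or not wanted or name not in targets:
            result.append(value)  # keep existing links and unknown entities
            continue

        target = targets[name]
        # Pipe the link only when the article title differs from the text.
        result.append(f"[[{name}]]" if target == name else f"[[{target}|{name}]]")
    return result
```

For example, with only "IGN" and "Gamespot" in <tt>targets</tt>, mode "first" links the first "IGN" and the first "Gamespot" and leaves every "RPS" as plain text.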


 * Individual case logic

If the publisher or work field is a text-only value matching a known article (from a list I keep and manually update) and, where ambiguous, the url matches that entity, then wikilink it:


 * <tt> |url=http://ign.com/hi.html |work=IGN </tt>  to   <tt> |work=[[IGN]] </tt>
 * <tt> |url=http://ign.com/hi.html |work=IGN Entertainment </tt>  to   <tt> |work=[[IGN Entertainment]] </tt>
 * <tt> |work=IGN Entertainment </tt>  to   <tt> |work=[[IGN Entertainment]] </tt> url missing, but it's unambiguous
 * <tt> |url=http://dummy.com/lol.php |work=IGN </tt>  not linked: url different, maybe it's Institut Géographique National?
 * <tt> |work=IGN </tt>  not linked: url missing, maybe it's Institut Géographique National?
 * <tt> |url=http://dummy.com/lol.php |work=IGN Entertainment </tt>  to   <tt> |work=[[IGN Entertainment]] </tt> url different, but it's unambiguous
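The per-citation cases above could be expressed along these lines. This is a hedged Python sketch, not the bot's code: the <tt>Entity</tt> record, the <tt>KNOWN</tt> table with its two entries, and the function <tt>link_for</tt> are all invented for illustration, standing in for the hand-maintained list mentioned above. An ambiguous name is linked only when the citation url points at one of the entity's domains; an unambiguous name is linked regardless of the url.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class Entity:
    target: str                # article to link to
    domains: Tuple[str, ...]   # sites that confirm the entity
    ambiguous: bool            # True if the bare name could mean something else


# Hypothetical stand-in for the manually maintained entity list.
KNOWN: Dict[str, Entity] = {
    "IGN": Entity("IGN", ("ign.com",), ambiguous=True),
    "IGN Entertainment": Entity("IGN Entertainment", ("ign.com",), ambiguous=False),
}


def link_for(work: str, url: Optional[str],
             known: Dict[str, Entity] = KNOWN) -> Optional[str]:
    """Return the wikilinked value, or None when the bot should not link."""
    entity = known.get(work.strip())
    if entity is None:
        return None  # not on the list: leave the field alone

    if entity.ambiguous:
        host = (urlparse(url).hostname or "").lower() if url else ""
        matches = any(host == d or host.endswith("." + d) for d in entity.domains)
        if not matches:
            return None  # url missing or different: could be another "IGN"

    return f"[[{entity.target}]]"
```

Under these assumptions, <tt>link_for("IGN", None)</tt> returns None, while <tt>link_for("IGN Entertainment", None)</tt> still returns a link because that name is treated as unambiguous.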

The bot will skip linking the same entity used twice:
 * <tt> |work=IGN.com |publisher=IGN Entertainment, Inc. </tt>


Discussion
Not sure where I should advertise this? WT:Citing sources? I don't think it's really controversial, except for potentially making many edits. — HELL KNOWZ  ▎TALK 13:54, 28 August 2011 (UTC)


 * Probably there, maybe also give WikiProject Manual of Style a direct poke since they seem to claim that page as part of their project. Hersfold (t/a/c) 15:27, 28 August 2011 (UTC)


 * Dropped a few notes around. — HELL KNOWZ  ▎TALK 16:22, 28 August 2011 (UTC)


 * Two points. Firstly, how will the bot handle multiple citations to the same publisher? Will it link every instance?
 * Secondly, I must oppose the bot as written on ground of inefficiency. Making tens of thousands of edits just to link something is not a good idea. May I suggest you take the opportunity to make other edits at the time, such as template name and parameter name standardisation, merging duplicated references, etc, etc. This would make the edits much more worthwhile. If you're feeling brave, once you've got the publisher and link there are tools to give information that you can to the reference such as publication date. - Jarry1250 [Weasel? Discuss.] 17:37, 28 August 2011 (UTC)


 * It links all instances. I went through some of the FA content and, when it's linked, that's the way it's used. Like here: 1080° Snowboarding. All but one "IGN" are linked, and I don't think that the omitted link was a conscious editorial decision.
 * I don't really consider adding wikilinks a minor change. It changes the visual display, so not WP:COSMETICBOT. We have bots repairing wikilink syntax and such. If anything, this is the exact tedious task that is best suited to bots. But that's my opinion and I realize this may edit a lot of pages. If yours is a major opinion, I can limit this to be done with other tasks or when a certain number of links is added. —  HELL KNOWZ  ▎TALK 17:59, 28 August 2011 (UTC)


 * Wikilinking to replace a url is a good idea (that certainly doesn't fall in the realm of WP:COSMETICBOT). Not so sure about wikilinking by default however (although maybe some rules can be devised, like wikilink if more than 50% of stuff is wikilinked). I like the wikilinks, but many would consider it overlinking I think. This needs a bit more discussion I think. Headbomb {talk / contribs / physics / books} 19:10, 28 August 2011 (UTC)

O.K. I realize I've been spending too much time with video game/science articles, where everything is always linked. This may not be preferred by everyone, may be considered OVERLINK, and can even intrude on CITEVAR. So I'll try to refine the scope of this task from "link whenever it can" to "link the way the article already does" and amend the description. — HELL KNOWZ  ▎TALK 19:42, 28 August 2011 (UTC)
 * I updated the function details to only work while preserving the existing style with a threshold value of 50% to what is considered dominant style. — HELL KNOWZ  ▎TALK 20:38, 28 August 2011 (UTC)
 * Alright, the updated task's fine. Let's trial this. Headbomb {talk / contribs / physics / books} 20:11, 29 August 2011 (UTC)
 * I hate to be a negative nancy, but I really don't like the idea of mass linking work and publisher from citations. 99% of the time, the only link you want to follow from a citation is the link to the cited work itself. Having to fish these out of a sea of blue links will actually make citations less convenient, not more. This is the same reason why people have suggested delinking the doi/pmed/etc labels. Citations need fewer links, not more. Kaldari (talk) 17:52, 30 August 2011 (UTC)
 * These are not mass linked, and are added only if the article already follows the style of wiki-linking work/publisher. If it does not already, then no links are added. "99% of the time, the only link you want to follow from a citation is the link to the cited work itself." is only one of the style/format opinions and the bot will respect that and won't add links if the article doesn't use them. — HELL KNOWZ  ▎TALK 17:58, 30 August 2011 (UTC)

I noticed that it just does wikilinking, but perhaps it could also fix capitalization? For instance, in this edit, Gamespy should also be changed to GameSpy to avoid a redirect. --Odie5533 (talk) 20:01, 5 September 2011 (UTC)
 * Good point. I leave redirects unbypassed on purpose, but I will fix obvious spelling mistakes like that. — HELL KNOWZ  ▎TALK 20:08, 5 September 2011 (UTC)
 * Actually, that said, I might as well fix these errors when not wikilinking, since I am relying on a hand-made list. I'll put it in another BRFA later though to not mix these together. Still some issues with this one I need to code. — HELL KNOWZ  ▎TALK 20:19, 5 September 2011 (UTC)


 * I'm going to have to oppose this bot; WP:OVERLINK is a good read. Creating half a million pointless links just makes it harder to work with. ΔT The only constant 21:00, 5 September 2011 (UTC)
 * Millions? My edit rate was 1/500, which may grow to, say, 1/100 if I add a lot of recognized entities. This is potentially at most 30k pages from 3M articles, of which hardly all use citations, and I could hardly ever make an exhaustive entity list. I don't think this will edit more than a few thousand pages over months in the near future. There is no overlinking, because these links are only added where they are already used and these are not links to common terms. 4 characters per link hardly make it harder to work with compared to, say, persondata metadata. — HELL KNOWZ  ▎TALK 07:37, 9 September 2011 (UTC)

In science articles, linking the publisher in a citation is important, imo, because it gives the mildly informed reader some inkling of credibility. I think overlinking should be watched though, again, imo. The publisher should be wikilinked once in an article, but should be wikilinked in far more science articles than is already the case. No other real opinion on the utility of this bot. --72.208.2.14 (talk) 03:55, 9 September 2011 (UTC)

There are too few edits and too much work maintaining entity lists for this to be truly worthwhile. Plus there is apparently already contention over this task, and I guess there will be more due to the touchiness of CITEVAR or whatnot. It does great work on, say, featured articles that got a new reference that wasn't linked, but that happens once in a blue moon. — HELL KNOWZ  ▎TALK 09:54, 18 September 2011 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.

Even if it's withdrawn; could you possibly publish the entity list that you were using/intending to use? —Sladen (talk) 19:02, 14 December 2011 (UTC)
 * It was very short as I didn't build it up for trial/testing. It's not in human readable form and about 15 items long (video gaming sites), so I don't think it will be of much help to you. It's mostly manually made lists of article title redirects that seem unambiguous. — HELL KNOWZ  ▎TALK 19:14, 14 December 2011 (UTC)