Wikipedia:Bots/Requests for approval/WildBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

WildBot
Operator: Josh Parris

Automatic or Manually assisted: Automatic, unsupervised

Programming language(s): Python using pywikipedia

Source code available: Now available under User:WildBot

Function overview: The intention of this bot is to immediately bring to the page author's attention that the article is linking somewhere other than they thought it would be linking. It will add template dn after links to disambiguation pages in any new namespace 0, 6, 10 or 14 article. The bot will place a message on the talk page of any new namespace 0, 6, 10 or 14 article with ambiguous links. After ten minutes of article inactivity, it will add template dn after any links to disambiguation pages.

Links to relevant discussions (where appropriate): Wikipedia talk:Disambiguation pages with links

Edit period(s): Continuous

Estimated number of pages affected: There are approximately 1000 new pages a day. Approximately 10% of them have links to disambiguation pages, so about 100 edits/day.

Exclusion compliant (Y/N): Standard in pywikipedia; will add inuse, newpage and underconstruction compliance

Already has a bot flag (Y/N): N

Function details:

The members of Category:All disambiguation pages ( entries) and Category:Redirects from incomplete disambiguations ( entries) will be considered to be ambiguous links. Articles or redirects containing "(disambiguation)" will not be considered ambiguous.
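The ambiguity rule above can be sketched in Python as follows. This is an illustrative sketch, not WildBot's published code: the function names are hypothetical, the two category memberships are assumed to be already loaded into sets, and the title normalisation (underscores to spaces, first letter upper-cased) is an assumption about how link targets are compared. Only bracketed wikilinks are matched, so hatnote templates are not counted.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the link target. Templates such as hatnotes use {{...}}
# rather than [[...]], so they are not matched.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize_title(title):
    """Underscores to spaces, collapse runs of whitespace, upper-case first letter."""
    t = " ".join(title.replace("_", " ").split())
    return t[:1].upper() + t[1:]


def is_ambiguous(title, dab_pages, incomplete_dab_redirects):
    """True if `title` is a disambiguation page or a redirect from an
    incomplete disambiguation, unless it contains "(disambiguation)"."""
    title = normalize_title(title)
    if "(disambiguation)" in title:
        return False
    return title in dab_pages or title in incomplete_dab_redirects


def ambiguous_links(wikitext, dab_pages, incomplete_dab_redirects):
    """Distinct ambiguous link targets in `wikitext`, in order of first use."""
    found = []
    for match in LINK_RE.finditer(wikitext):
        title = normalize_title(match.group(1))
        if title not in found and is_ambiguous(title, dab_pages, incomplete_dab_redirects):
            found.append(title)
    return found
```

For example, in a page containing `[[Broadway]]`, `[[broadway|the Great White Way]]` and `[[Mercury (disambiguation)]]`, only "Broadway" would be reported, once.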

The bot will operate off a cached copy of this list, updated periodically via an API call to categorymembers to retrieve new additions, and periodic checks against its watchlist (containing all known disambiguation pages; assuming there's no technical limitation with a watchlist having pages) to check for removals. If I'm granted a toolserver account, maintaining this list might be better done via SQL queries.
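A minimal sketch of the incremental cache refresh, assuming the cached list is held as a set of titles. `categorymembers_params` builds MediaWiki API parameters for members added to the category since the last refresh (using `cmsort=timestamp` and `cmstart`); `update_cache` then merges those additions and drops titles that the watchlist check has shown are no longer disambiguation pages. Both function names are hypothetical, and how removals are detected is left to the caller.

```python
def categorymembers_params(category, since=None, cont=None):
    """MediaWiki API parameters for members of `category`.

    With `since` (an ISO 8601 timestamp), members are listed by the time they
    were added to the category, oldest first, starting at `since`; this is the
    cheap "new additions" query. `cont` carries cmcontinue between batches.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmprop": "title|timestamp",
        "cmlimit": "500",
        "format": "json",
    }
    if since is not None:
        params["cmsort"] = "timestamp"
        params["cmdir"] = "asc"
        params["cmstart"] = since
    if cont is not None:
        params["cmcontinue"] = cont
    return params


def update_cache(cached, added, no_longer_dab):
    """Merge new category members into the cache, then drop titles that
    the watchlist check found are no longer disambiguation pages."""
    return (set(cached) | set(added)) - set(no_longer_dab)
```

Keeping additions and removals as separate inputs mirrors the split described above: the category query only reveals what has joined, so something else (here, the watchlist) has to report what has left.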

Periodically (I propose every minute), the bot queries the API for new pages created since the last query in namespaces 0 (mainspace), 6 (file), 10 (template) and 14 (category).
 * New redirects are excluded.
 * New disambiguation pages are excluded.
 * Each new page will be checked for any ambiguous links. If a page has ambiguous links, a message will be left on that page's talk page.
 * After ten minutes of inactivity on the page, each ambiguous link will have the dn template added before the first following whitespace.
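The polling steps above might be sketched like this. New pages come from the API's `list=recentchanges` with `rctype=new`, restricted to the four namespaces and with `rcshow=!redirect` to drop new redirects; new disambiguation pages are then filtered out against the cached list. The helper names and the fixed ten-minute `IDLE_PERIOD` constant are assumptions for illustration, and the HTTP request and result continuation are omitted.

```python
from datetime import datetime, timedelta, timezone

NAMESPACES = (0, 6, 10, 14)          # mainspace, File, Template, Category
IDLE_PERIOD = timedelta(minutes=10)  # "ten minutes of inactivity"


def newpages_params(since):
    """MediaWiki API parameters for pages created since `since` (ISO 8601)."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",
        "rcnamespace": "|".join(str(ns) for ns in NAMESPACES),
        "rcshow": "!redirect",   # new redirects are excluded
        "rcdir": "newer",        # oldest first, starting at rcstart
        "rcstart": since,
        "rcprop": "title|timestamp|user|ids",
        "rclimit": "500",
        "format": "json",
    }


def pages_to_check(changes, dab_titles):
    """Drop new disambiguation pages; keep the rest for link checking."""
    return [c["title"] for c in changes if c["title"] not in dab_titles]


def is_idle(last_edit, now, idle=IDLE_PERIOD):
    """True once the page has gone `idle` without an edit."""
    return now - last_edit >= idle


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    since = (now - timedelta(minutes=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
    print(newpages_params(since))
```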

Affected pages will be monitored, and the template changed or removed as the article changes.
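The proposal's tagging step (adding dn before the first whitespace following each ambiguous link) could look like the following sketch. It assumes the set of ambiguous titles is already known and normalised; `add_dn_tags` is a hypothetical name, and skipping links already followed by a dn tag is an added safeguard so that repeated runs over a monitored page do not double-tag, not something the proposal specifies. Insertions are applied from the end of the text so earlier positions stay valid.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Section|label]]; group 1 is Target.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
WHITESPACE_RE = re.compile(r"\s")


def normalize_title(title):
    """Underscores to spaces, collapse whitespace, upper-case first letter."""
    t = " ".join(title.replace("_", " ").split())
    return t[:1].upper() + t[1:]


def add_dn_tags(text, ambiguous, tag="{{dn}}"):
    """Insert `tag` at the first whitespace after each ambiguous link
    (or at the end of the text if no whitespace follows)."""
    positions = set()
    for match in LINK_RE.finditer(text):
        if normalize_title(match.group(1)) not in ambiguous:
            continue
        ws = WHITESPACE_RE.search(text, match.end())
        pos = ws.start() if ws else len(text)
        # Don't tag twice if a dn tag already sits between link and whitespace.
        if "{{dn" in text[match.end():pos].lower():
            continue
        positions.add(pos)
    # Insert from the end so earlier positions remain valid.
    for pos in sorted(positions, reverse=True):
        text = text[:pos] + tag + text[pos:]
    return text
```

Placing the tag at the next whitespace keeps trailing punctuation and possessives with the link, so `[[Broadway]]'s` becomes `[[Broadway]]'s{{dn}}`. A production version would also need to avoid landing inside a template that follows the link.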

Discussion
New Pages rather than Recent Changes was intentionally picked as a lower bandwidth and less ambitious initial proposal, one more easily monitored.

My intention is to monitor the bot and its effects; specifically, do the ambiguous entries get fixed? How many editors will just unlink the text, or remove the tag? Do editors welcome its help, or do they complain? Josh Parris 12:44, 26 December 2009 (UTC)

A few questions: -- Mr.Z-man 18:45, 26 December 2009 (UTC)
 * 1) Why is it necessary to use both the category and a watchlist? If you just need a list of pages and can't use an SQL database, a text file will likely be faster.
 * The category and watchlist combination is intended as an alternative to querying the replicated database for all ambiguous article titles. It isn't my intention to store the results of the query in a database, but to query the replicated database for redirects to disambiguation pages. The category is the definitive reference of what disambiguation pages exist (there are many disambiguation sub-categories and many more disambiguation templates) - it's a flat, hidden category containing every dab page. Querying it is expensive - on my home connection it took over 30 minutes to bring the full contents down. Naturally I've decided to cache it rather than periodically load it, and the cache needs to be brought up to date periodically, which would be cheaper to do than a full refresh; to do this, the watchlist is needed to check that pages currently in the category haven't fallen out. The API call to categorymembers will reveal new additions to the category. Currently I'm caching the results in a text file and don't see any reason to change that. Josh Parris 23:51, 26 December 2009 (UTC)
 * 2) Will it properly handle intentional links to disambig pages, like in dablink templates?
 * Hatnotes are normally created using templates - the bot won't recognise them as links because they're not in brackets. To intentionally link to a dab page, link to Pagename (disambiguation) - but non-template intentional links to dab pages are very unusual in mainspace articles. Josh Parris 23:51, 26 December 2009 (UTC)
 * 3) How much of a delay between article creation and templating will there be?
 * With the proposed one-minute cycle time, I'd guess an average of just over 30 seconds. Josh Parris 23:51, 26 December 2009 (UTC)
 * People are going to complain if you tag articles just seconds after creation. People get mad when they get edit conflicts with people tagging their articles when they've just barely started writing. I don't see this as a particularly time-sensitive task. It should have a delay between article creation and tagging. I'd say at minimum an hour. I don't see why this needs to run every minute either. If it only makes 100 edits per day, then at 1 run per minute, it'll be doing nothing on 93% of runs. Mr.Z-man 03:47, 27 December 2009 (UTC)
 * One of the design criteria is to make editors immediately aware of unintended links; the time-sensitivity of it comes from having the original author available to disambiguate (granted, in general disambiguation isn't a time-sensitive task). I'm quite happy to change the frequency, but I'd prefer to do that with some data in hand rather than educated guesses. A day's worth of edits will yield data we can use to make a decision informed by empirical evidence. I'm not dogmatic about the frequency, and I intend to alter the design of the bot based on the reactions of editors. One thing I'm concerned about is "drive by" article creation, where an editor creates an article and leaves; I'm not sure what can be done about that. Josh Parris 04:37, 27 December 2009 (UTC)
 * Who set the design criteria? Why do editors need to be immediately told to fix such a trivial issue, especially in a way that could interfere with other editing? Also, you might be able to use an API query like this for your purposes. Mr.Z-man 19:20, 27 December 2009 (UTC)
 * I agree with Mr.Z-man strongly about this: "why do users need to be immediately told to fix such a trivial issue, especially in a way that interferes with their editing?" It doesn't matter if most of the editors don't return, the ones I don't want you to annoy are the ones who were going to return but you interfered. You do 100 such edits a day and annoy 2 editors each time, it takes you 50 days of editing to act in bad faith to 100 users. Not a deal, imo. --IP69.226.103.13 (talk) 19:33, 27 December 2009 (UTC)
 * Comment: There is no need to watch-list the articles. A more efficient way is to use Special:RecentChangesLinked/Category:All disambiguation pages, which is an up-to-the-second display of the most recent changes to articles in the category Category:All disambiguation pages. I'm not sure what the API query is for this; I may do some research and come back here later. DASHBot (talk) 20:36, 26 December 2009 (UTC)
 * P.S. When you publish your code I'd like to look it over. Thanks! DASHBot (talk) 20:36, 26 December 2009 (UTC)
 * I realized that Special:RecentChangesLinked only shows changes to the disambig pages themselves. That leads me to the question: Why is it that you need to use the watchlist? DASHBot (talk) 20:41, 26 December 2009 (UTC)
 * A dab page (or redirect to a dab page) can be moved (typically to Pagename (disambiguation)), or changed into not being a dab page through deletion or modification. To ensure the bot is not tagging links as ambiguous when they're not, all the pages the bot thinks are ambiguous need to be validated periodically.  So, when a dab page changes, it needs to be checked to see that it is still a dab page. See the above discussion as to why I'm not repeatedly pulling down the entire category. Josh Parris 23:51, 26 December 2009 (UTC)
 * Did your research show anything that could be used as an API call? I didn't find anything when I was drawing up my original plan, but if you did that would save maintaining a ridiculously large watchlist.  I'll ensure the sources are made available. Josh Parris 23:51, 26 December 2009 (UTC)
 * (Very sorry for editing with my bot account) No, but I did find some functions within pagegenerators.py that could do the trick. The first part would be to pull down all the pages using pagegenerators.GeneratorFactory.getCategoryGen somehow. I'm not sure if that function includes redirects. Tim1357 (talk) 03:58, 27 December 2009 (UTC)
 * Fetching the contents of the category is not a problem,  handles that just fine.  It's maintaining it efficiently that's the tricky part. Josh Parris 04:37, 27 December 2009 (UTC)
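The figures traded in this exchange can be checked with a few lines of arithmetic, using the proposal's own estimates (about 1000 new pages a day, 10% with ambiguous links, one run per minute). Since each busy run handles at least one edit, at most 100 of the 1440 daily runs do any work; and if pages appear at uniformly random moments, the average wait for the next run is half a cycle, or 30 seconds before processing time is added.

```python
NEW_PAGES_PER_DAY = 1000   # estimate from the proposal
DAB_LINK_PERCENT = 10      # share of new pages with ambiguous links
POLL_INTERVAL_S = 60       # one run per minute

edits_per_day = NEW_PAGES_PER_DAY * DAB_LINK_PERCENT // 100   # 100
runs_per_day = 24 * 60 * 60 // POLL_INTERVAL_S                # 1440

# At most one run is busy per edit, so at least this share of runs is idle.
min_idle_fraction = 1 - edits_per_day / runs_per_day          # about 0.93

# A page created at a uniformly random moment waits half a cycle on average.
mean_wait_s = POLL_INTERVAL_S / 2                             # 30.0

print(f"{edits_per_day} edits/day, {runs_per_day} runs/day, "
      f"idle >= {min_idle_fraction:.0%}, mean wait {mean_wait_s:.0f} s")
```

This prints `100 edits/day, 1440 runs/day, idle >= 93%, mean wait 30 s`, consistent with both the "just over 30 seconds" estimate and the "nothing on 93% of runs" objection.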


 * What is the reasoning behind "immediately" adding a link that takes an editor, particularly a new editor, within seconds of an article's creation, to a page that is useless?


 * Why "immediately" do anything to a freshly created article, since one likely outcome is an edit conflict between the editor trying to write the article and the bot? Go near any of my articles this way, forcing me to wait for a bot until I can continue writing the article, and I'll dump the article like it's a piece of garbage and leave you and your bot to write it. Are you going to write quantum astrophysics articles or insect species articles, or could you just let me write them in more than one edit?


 * In my opinion this is just rude. I'm not the decider of things bot, though, so, why not find some community consensus in a broader audience for this idea of "immediately" tagging new articles with this particular dab alert? I can't see something this hostile going forward without a lot of community support about the need to interfere with writing new articles to alert users "immediately" that they need to disambiguate them. One of the village pumps is in order for this discussion. Also, the page this tag leads to is not usable. --IP69.226.103.13 (talk) 04:42, 27 December 2009 (UTC)


 * You're saying that WikiProject Disambiguation/Fixing links, which is what the "disambiguation needed" text leads to, is not usable?
 * You're right, it's horrible - not novice friendly. I'm having a stab at fixing it and will get others to look it over. Josh Parris 05:44, 27 December 2009 (UTC)
 * See, that was easy, I didn't have to explain anything, you just had to try to read it. Sorry to put you through that. --IP69.226.103.13 (talk) 19:29, 27 December 2009 (UTC)
 * I'm happy to take this to Village pump (proposals) if you think that's appropriate; I thought the discussion at Wikipedia talk:Disambiguation pages with links was sufficiently one-sided that I brought the proposal here, but I can see why others less interested in the disambiguation process could feel this is intrusive. Josh Parris 05:00, 27 December 2009 (UTC)
 * Having had a few more minutes to think about this, I'd like to draw an analogy between this and a squiggly red line under a misspelled word in a word processor. I don't think the line is the computer being rude, it's the computer trying to be helpful. The editor thought the link would go to a useful article, but there's a dab page between the new article and the intended destination. Josh Parris 05:08, 27 December 2009 (UTC)
 * ...but the red line does not hinder your ability to save the draft. Perhaps a wait period of 5 minutes from the last edit? Tim1357 (talk) 05:20, 27 December 2009 (UTC)
 * Hmmmm, check out the data below. There appears to be a systemic problem here if editing immediately causes issues. Josh Parris 07:09, 27 December 2009 (UTC)
 * I've looked at the last 50 new articles created, and 10% of them had subsequent edits. The first of these after creation was typically two minutes or less later; one was 7 minutes. I don't think that data is representative, so I'm currently checking another block of 50. Josh Parris 06:29, 27 December 2009 (UTC)
 * The second set of data was more in line with my expectations; it is from a few hours earlier. 48% had a single edit (creation). 14% were edited twice in total. 22% were edited within a minute of creation, 14% by someone other than the creator, 8% by the creator. 22% were edited by the creator within 5 minutes of creation. Basically, rapid editing of new pages is the norm; I think this is down to WP:New Page Patrol and their use of automated tools for page evaluation. Quoting:
 * Especially if the new article has a newpage, inuse, or underconstruction template showing, care should be taken to ensure that the author has finished the initial version before you evaluate the page. A good rule of thumb is to wait until at least 15 minutes after the last edit before tagging the article (or up to an hour for the newpage tag). Additionally, it may be helpful to check the editor history to be sure that you don't offend an experienced editor who has a set plan to create a valid article.
 * I had already planned on honouring inuse, but I'll add these other templates: newpage and underconstruction. If I honor these tags, and the guidelines for NPP don't suggest any other amnesty (at least, from my reading of the NPP behaviour, it seems if it's not tagged then it's open season from the moment it's created), ought I take this bot to the village pump? IP69.226.103.13?
 * "Open season" does, imo, express the attitude of a lot of new page patrollers towards article writers. How are you going to honor experienced editors or new editors who simply take a little more time to write articles? And, who is going to be disambiguating all of these links when the disambiguation pages on wikipedia are almost impossible to read? That is, the ones most needing disambiguation are simply unreadable lists of vaguely related topics thrown together on a page with no information whatsoever to take readers to the page they want to go to? So, the tag will simply be a tag cluttering up the article, readers will be clueless about it, and editors will be annoyed with it?
 * Yes, if you're going to do anything that antagonizes contributors who write articles you should discuss it first with the broader wikipedia community. --IP69.226.103.13 (talk) 19:13, 27 December 2009 (UTC)


 * The more I read about how this proposed bot would work, the less I like it. I share many of the concerns raised by others here and on other talk pages.  In my opinion, new articles are not the problem:  generally they are short, have few links, are orphans, and even when not orphans they get hardly any page views.  If they get many page views soon they also get many more edits including better linking.  Fixing links to dab pages is one of the last tasks I do when writing a new article.  --Una Smith (talk) 21:22, 27 December 2009 (UTC)
 * But it might be useful on established articles, imo. There is an editor who does nothing but dabs. I try to make sure I disambiguate next time after seeing what she does to my articles, and I try to check the links in my new articles for what needs disambiguated. But I'm sloppy at it. I wouldn't mind articles I've created getting run through a bot or user app that showed me what needed disambiguated. --IP69.226.103.13 (talk) 21:33, 27 December 2009 (UTC)
 * Sure. As I said before on Wikipedia talk:Disambiguation pages with links, I would rather be able to put a tag on a dab page, or add a dab page to a list somewhere, causing a bot to tag dn on the incoming links to the dab page.  That could be very useful.  --Una Smith (talk) 21:37, 27 December 2009 (UTC)
 * See this tool. --Una Smith (talk) 21:39, 27 December 2009 (UTC)
 * Thanks. --IP69.226.103.13 (talk) 21:51, 27 December 2009 (UTC)
 * Based on this discussion I'm going to make an alternative proposal. I'll be back within the day with details. Josh Parris 23:56, 27 December 2009 (UTC)
 * I understand your desire to address the core of the disambiguation problem; I do too. But I want to tackle something with a high probability of success and learn from any failings before moving to a higher-value area (with associated higher risks of looking a fool). Besides, 10% of new articles have ambiguous links, so this isn't a low-value proposition.  Josh Parris 09:18, 28 December 2009 (UTC)
 * I don't quite understand what you're saying. Are you suggesting that a page like Broadway (disambiguation) is not useful for disambiguation (in which case you need to take your complaints to WP:WikiProject Disambiguation); or that there are disambiguation pages so bad that it is preferable for readers to follow links to them than have editors try to pick a disambiguated link from them; or some other meaning? Josh Parris 09:18, 28 December 2009 (UTC)
 * I've tested the last 500 new articles created, and found 10.6% contain links to disambiguation pages; I have not tested for links to redirects to disambiguation pages (which would push the rate higher). There was a  run of articles created by the same person with the same ambiguous link (Special:Contributions/Starzynka created Chantecler (play) (and others) which contains Broadway).

New proposal
I have altered my proposal.

The philosophy behind the bot is that the original creator of an ambiguous link is best placed to know what they intended to link to, and that editors do not intentionally create ambiguous links. New pages are a unique environment in that it is simple to identify who inserted each and every ambiguous link in an article (the author), and the volume of new pages is manageable for the bot's operator to monitor while the bot is in its infancy.

The insurmountable problem I see with my original plan is objections due to edit conflicts created by rapid tagging. I have changed this proposal to a bot that tags after a period of inactivity on the article - ten minutes, for example. However, I propose to have the bot leave a message on the article's talk page:
 * It seems there are inadvertently ambiguous links in the article you recently created, Jose Tovar - links to page titles that can mean more than one thing. To find out which links are ambiguous, please visit http://toolserver.org/~dispenser/cgi-bin/dablinks.py?page=Jose_Tovar  To find out how easy it is to fix these links, please read WikiProject Disambiguation/Fixing links. If you don't want to receive messages like this, put  on your User or User talk page. Thanks for making Wikipedia a less ambiguous place! WildBot 09:18, 28 December 2009 (UTC)

as soon as the ambiguous links are found, providing immediate feedback but without creating edit conflicts.
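Generating the per-article link in the message above is a matter of URL-encoding the title. This sketch copies the dablinks URL pattern from the message text; the helper name and the choice of characters left unescaped are assumptions. Spaces become underscores, as in the Jose_Tovar example, and characters such as `&` are percent-encoded so they cannot break the query string.

```python
from urllib.parse import quote

# URL pattern copied from the proposed message; the helper name is hypothetical.
DABLINKS_URL = "http://toolserver.org/~dispenser/cgi-bin/dablinks.py?page="


def dablinks_url(title):
    """Link to the dablinks report for an article title."""
    # Wiki-style title: spaces become underscores. Parentheses, colons,
    # commas and slashes stay readable; everything else unsafe is escaped.
    return DABLINKS_URL + quote(title.replace(" ", "_"), safe="():,/")
```

For instance, `dablinks_url("Chantecler (play)")` keeps the parentheses, while `dablinks_url("AT&T")` ends in `AT%26T`.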

Naturally, the message text is an opening point for discussion. I imagine it needs to avoid being BITEY, and from IP69.226.103.13's comments, they will want two different messages for newbies and old hands (whatever that distinction may be). The bot may also need to monitor toolserver lag and, if it grows too high, stop messaging editors: the toolserver report is run against a replicated copy of the live database, and if that copy is too far behind, following the provided link will give no information about ambiguous links. Josh Parris 09:18, 28 December 2009 (UTC)
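One way to express the lag check is as a three-way decision. This is a hypothetical sketch: the replica lag is taken as an input (how it is measured is not specified here), and the 30-minute cutoff is an arbitrary placeholder. The key test is that the report can only describe a new page once the replica has caught up past the page's creation time.

```python
from datetime import datetime, timedelta

MAX_REPLAG = timedelta(minutes=30)  # placeholder cutoff, not from the proposal


def messaging_decision(created, now, replag, max_replag=MAX_REPLAG):
    """Decide whether to post a notice that links to the dablinks report.

    created: when the new page was created
    now:     current time (same timezone convention as `created`)
    replag:  how far the replicated database is behind the live one
    """
    if replag > max_replag:
        return "pause"   # replica far behind: stop messaging until it catches up
    if now - replag < created:
        return "defer"   # replica predates the page: report would be empty, retry later
    return "send"


if __name__ == "__main__":
    created = datetime(2009, 12, 28, 9, 0)
    now = created + timedelta(minutes=5)
    for lag in (timedelta(minutes=2), timedelta(minutes=10), timedelta(hours=1)):
        print(lag, messaging_decision(created, now, lag))
```

Separating "defer" from "pause" lets the bot keep a queue of pages to retry while the replica catches up, rather than dropping them.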


 * I think that the toolserver bit is a little overkill. Also, the message needs to be worked on. I'll start one at User:WildBot/msg. Tim1357 (talk) 16:46, 28 December 2009 (UTC)
 * Checking toolserver will be less of an issue if this goes on the article talk page and the bot enumerates all the ambiguous links; toolserver in this case would act as a backup/confirmation. Josh Parris 04:49, 29 December 2009 (UTC)


 * When I am working on an article I find it disruptive to get edits on my talk page. In any case, I would much rather see comments about the article on the article's talk page.  Focus more on content, less on who did it. --Una Smith (talk) 18:48, 28 December 2009 (UTC)
 * The article's talk page is the place to put notices about the article, not an editor's talk page. I strongly disagree with bots posting any notices on user talk pages without the strongest proactive community support. All discussions about an article should be taking place on its talk page, nowhere else, as a courtesy to the mission of Wikipedia: a community-created encyclopedia. The community interested in the article has a right to think it is being discussed on the talk page, not on some user's talk page. --IP69.226.103.13 (talk) 19:07, 28 December 2009 (UTC)
 * I would not object to a bot that scans new articles for links to dabs, and puts a note on the article's talk page, with a link to the dablinks tool so any editor who responds to the note can see the current situation. The note would be something like this:  WildBot found one or more links from this article to disambiguation pages;  for help fixing these links please see WikiProject Disambiguation/Fixing links. For use on a talk page, BASEPAGENAME could be passed to dablinks.  --Una Smith (talk) 20:46, 28 December 2009 (UTC)
 * This looks useful. --IP69.226.103.13 (talk) 21:36, 28 December 2009 (UTC)
 * I agree, the talk page is a much more appropriate place, even if it is less likely to garner the attention of the contributing editor. I've modified the proposal to reflect this. Josh Parris 04:11, 29 December 2009 (UTC)


 * How's this? An upside of using a template is that it will be easy to find and remove/update if necessary. Josh Parris 10:23, 29 December 2009 (UTC)
 * I like this. In fact, I would like to use it on, say, all articles nominated to appear on the Main Page:  nominations for Today's featured article, Did you know, In the news, On this day...  --Una Smith (talk) 16:31, 29 December 2009 (UTC)
 * Okay, we can arrange for the bot to wikistalk these areas in a later Task. Josh Parris 16:41, 29 December 2009 (UTC)


 * I question the premise that editors who add an ambiguous link are in the best position to fix that link. I have disambiguated tens of thousands of incoming links to dab pages, and probably have used dn no more than 50 times.  In my experience, if the intended meaning is not obvious from the context, then the linked text requires checking reliable sources and therefore should be followed by a citation.  Finding reliable sources and formatting citations are tasks beyond the abilities of many editors. --Una Smith (talk) 18:48, 28 December 2009 (UTC)
 * Here is an example. An experienced editor significantly expanded Mary Rose.  After that, the dablinks tool showed 22 links to dabs.  The experienced editor was able to fix 20 of the 22 links.  The two that remained were links where the editor knew what he intended but did not find a corresponding article.  In the course of fixing those two links, another article was added to each of the dab pages.  See Talk:Mary Rose.  --Una Smith (talk) 20:32, 28 December 2009 (UTC)
 * I've read the recent history of the article, talk page and the dab pages involved. I'm concerned that an experienced editor didn't feel confident changing dab pages to include relevant links; I guess I've never held dab pages in much reverence (and judging by the amount of vandalism and unhelpful edits I see them receive, some others don't either). I understand now why you say that disambiguators may be better suited to disambiguation. In your example, however, the editor did manage to disambiguate 90% of the links even with missing terms; they would have done a better job than a generalist because of their expert knowledge (for example, the Portsmouth fix required changing the original article). The author knows what they were intending to communicate; a disambiguator can only guess.
 * Having said that, I have in the past created ambiguous links when creating content. Had I known at the time, I would have gone back and fixed them. Ignorance is my defense. Josh Parris 10:23, 29 December 2009 (UTC)
 * Disambiguators should not ever guess; if they do not know, and cannot or do not want to look it up, they should use dn.  That is why the contributor of a dablink often does not have much advantage.  If an editor needs special knowledge to fix a dablink, then it is likely the dablink constitutes original research.  When I fix dablinks I often do rewrite the article in the vicinity of the link;  expertise is not required there either. --Una Smith (talk) 16:21, 29 December 2009 (UTC)


 * It is my impression that the majority of links to dabs are created not by careless new editors but by careful experienced editors removing articles from ambiguous base names. That is the source of most links repaired by participants of Disambiguation pages with links. --Una Smith (talk) 18:48, 28 December 2009 (UTC)
 * That's something that DPL is good at, fixing those kinds of suddenly ambiguous links; I participate in that project, and have done a lot of disambiguation in my time, having started doing so the same month I arrived. There are other contributors to the backlog of links to dab pages, and a very small subset is new articles, which is why I chose it as a starting point. I'm not implying this will end the black war against ambiguous links. It's an opportunity to build a tool that can be used as a stepping stone in building broader-reaching tools; a tool that will make mistakes, mistakes that will have less of an impact than those of a more powerful tool because of its intentionally limited scope, and the lessons from which can be applied to subsequent tools with broader scope. This analysis of the effects of this bot and editor reactions to its actions is something you yourself suggested, and it is needed to inform any changes to it and any bots that come after it. Josh Parris 04:49, 29 December 2009 (UTC)
 * It seems to me that to test the reception to this bot, the bot itself isn't actually necessary; just go around tagging dablinks in articles and see what happens.  --Una Smith (talk) 05:07, 29 December 2009 (UTC)
 * Well, that's an idea. --IP69.226.103.13 (talk) 05:15, 29 December 2009 (UTC)
 * I'm certainly leaning towards doing that, but the monitoring and detection is faster and cheaper done by a bot; this is the venue where bots get approved. Humans can't do it fast enough; it'd be ten minutes after article creation before you could form the appropriate message and confirm the links, and that's ten minutes you could have spent doing something else.  Besides which, there's still the various matters of the text of the message and what template to tag with. Josh Parris 05:26, 29 December 2009 (UTC)

Use of dn
It's been pointed out elsewhere that dn might not be the most appropriate template to add after the ambiguous links, as it is intended for links that have perplexed humans, not robots. Ought another template be used, perhaps a new one? Josh Parris 04:55, 29 December 2009 (UTC)
 * Template:Dn/doc has a list of redirects to the template. You could turn one redirect into a separate tag that does not put the article in Category:Articles with links needing disambiguation.  --Una Smith (talk) 16:02, 29 December 2009 (UTC)

Just a few hours ago I thought the current plan was to put a notice on the article talk page, drawing attention to the dablinks tool. Does the current plan still involve tagging dablinks in the article? --Una Smith (talk) 05:20, 29 December 2009 (UTC)
 * It does if after ten minutes of inactivity there are still ambiguous links. Do you think that's a fundamental flaw? Josh Parris 05:41, 29 December 2009 (UTC)
 * I think tagging dablinks in the article would be far more annoying than helpful, for both readers and contributors. Also, I would abandon the premise that it matters who made the dablink;  if it is abandoned then there is no need to tag dablinks in the article so soon.  If tagging dablinks is a good thing, then it could be done just once a day, or when the article has gone without editing for more than a few hours.  --Una Smith (talk) 16:02, 29 December 2009 (UTC)

Having had some time to think about this, I'm less and less in love with it. There will be people. They will have pitchforks. I think I'll strike it from the proposal. Josh Parris 16:05, 29 December 2009 (UTC)

Final tweaks
I think I've satisfied everyone, but I might be jumping the gun. As it stands, are there any concerns with this proposal? Josh Parris 16:44, 29 December 2009 (UTC)
 * Please describe the proposal as it now stands. Not the rationale, just what the bot would do.  --Una Smith (talk) 16:47, 29 December 2009 (UTC)
 * The altered description is above. It reads:
 * The bot will place a message on the talk page of any new namespace 0, 6, 10 or 14 article with ambiguous links.
 * The discussion has settled on a message that looks like:


 * Josh Parris 16:56, 29 December 2009 (UTC)
 * Hm. In this case the notice should not send readers to WikiProject Disambiguation/Fixing links, because that page presumes the use of dn tags.  --Una Smith (talk) 17:41, 29 December 2009 (UTC)
 * True. When I wake up in the morning, I'll fiddle this. It will need to direct to a new page, which needs to talk about finding and then fixing. I think there was some blurb at the start of Disambiguation_pages_with_links/Guide that could be ripped off. Josh Parris 17:51, 29 December 2009 (UTC)
 * Yes, we need a page like that anyway. I was just thinking it may be time for a makeover of WikiProject Disambiguation.  --Una Smith (talk) 17:52, 29 December 2009 (UTC)
 * I had an idea. Why not query dablinks to generate the list of disambiguation pages? There's no need to reinvent the wheel. — Preceding unsigned comment added by Tim1357 (talk • contribs)
 * To do that I'd need to use Beautiful Soup, and hope that the html format of the toolserver app didn't change. Also, as the pace for the bot quickens (I'm hoping one day to hook it up to recent changes) I don't know if I want to be pumping toolserver that hard.  Almost all of the functionality I've built for this bot can be re-used in other tools, for example one that does human-assisted disambiguation of all the links on a page at once, or one that puts all the redirects to dab pages into a category, or one that stalks high-value wiki pages. Josh Parris 18:01, 29 December 2009 (UTC)

You don't need BeautifulSoup to query dablinks, just some regex. However, at the very least you should look at the dablinks code. It is written in Python. I'll do some research and see how Dispenser gets around the caching problem. Tim1357 (talk) 05:51, 30 December 2009 (UTC)
 * At a guess, it's SQL queries rather than MediaWiki API calls. That's how I'd do it. Josh Parris 11:14, 30 December 2009 (UTC)

There's still the question of when, and, since the task has changed so much, is there still consensus for it? --IP69.226.103.13 (talk) 17:01, 30 December 2009 (UTC)
 * I'm not sure. Perhaps the notice should include the instruction to delete the notice once the links are fixed.  Or the bot should put the notice in a new section and instruct readers to tag the section done.  If something like this is not done, future editors may continue responding to the notice even when the article has no dablinks.  Also, I am thinking the bot code needs to include a threshold variable, so that it can be tuned.  Is it really useful to post a notice about 1 dablink?  Or should the threshold be 10 or 20 dablinks, or 10% of links?  --Una Smith (talk) 17:35, 30 December 2009 (UTC)
 * Personally, I'd want to know when I create an ambiguous link, so I don't think any minimum threshold ought to apply. I'll add a bot boilertext and remove instructions; as a later task I plan to add monitoring so the bot removes it automatically. Josh Parris 00:33, 31 December 2009 (UTC)
 * That you personally want to know something is not a task that community bots are created for. Can we stick to finding the community consensus? -- IP69.226.103.13 19:15, 31 December 2009 (UTC)
 * I came here with one. Perhaps we should roll back to the original proposal as supported by consensus? Josh Parris 00:48, 1 January 2010 (UTC)
 * There is a tool that does this, by coloring the links an editor sees in preview mode. See the discussion here.  --Una Smith (talk) 22:01, 31 December 2009 (UTC)
 * As it's still in beta, I don't think recommending it to users in the template box is a good idea yet. Josh Parris 00:48, 1 January 2010 (UTC)
 * Perhaps you would like to use it yourself. Many editors, and I think that includes the majority of new editors, do not know or care about ambiguous links.  --Una Smith (talk) 04:00, 1 January 2010 (UTC)
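Una Smith's suggestion of a tunable threshold could look something like the minimal sketch below. The function name and both parameters (`min_count`, `min_fraction`) are hypothetical; the defaults reproduce Josh Parris's preference of notifying on any ambiguous link, while raising either value gives the "10 or 20 dablinks, or 10% of links" behaviour she describes.

```python
def should_post_notice(dab_count, total_links, min_count=1, min_fraction=0.0):
    """Decide whether a page's ambiguous links warrant a talk-page notice.

    min_count and min_fraction are hypothetical tuning knobs; with the
    defaults, any page with at least one ambiguous link is notified.
    """
    if dab_count <= 0 or dab_count < min_count:
        return False
    # A page cannot have fewer links than ambiguous links, so guard the denominator.
    total = max(total_links, dab_count)
    return dab_count / total >= min_fraction
```

For example, `should_post_notice(1, 20, min_fraction=0.1)` is `False` (5% of links), while `should_post_notice(5, 20, min_fraction=0.1)` is `True` (25%).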


 * Once I've satisfied everyone here, I'll take it back to the disambiguation project for those that aren't watching this; if they're also on board I'll ask for a BAG member to close this. Josh Parris 00:33, 31 December 2009 (UTC)

Current proposed message template:

This includes a link to WikiProject Disambiguation/Fixing a page Josh Parris 01:22, 31 December 2009 (UTC)

I don't see a timing specified, and I don't see community consensus on this. The bot operator is suggesting no delay because that's what he wants, but wikipedia bots aren't for personal editing preferences. -- IP69.226.103.13 19:15, 31 December 2009 (UTC)
 * I agree. I personally do not want to be bothered about ambiguous links until I am nearly done working on the article.  Most articles get few page views and the odds are small that someone who reads the page will follow any one link out, so repairing ambiguous links is a low priority task.  It is important, but not urgent.  Also, I think it is a far better use of time to do semi-automated repair of many incoming links to dab pages, rather than small lots.  --Una Smith (talk) 22:01, 31 December 2009 (UTC)
 * Avoiding bothering editors is why this proposal was modified from changing the article and messaging the author to making a note on the talk page. The note on the talk page gives a one-click tool to detect dablinks on the page.  How does the timing of a note on the talk page reduce the editorial impact?  If the notification is delayed by an hour, it may arrive in the middle of editing their next article; a day later, ten articles down the track; a week later, while they're fixing their user page.
 * It's more efficient to have street sweepers cleaning litter from the street, but as a society we prefer that the litter doesn't get there in the first place. If you tell someone they've dropped something, they go pick it up; people aren't inherently thoughtless, giving them information about their actions makes them aware of things they may not have noticed. Josh Parris 00:48, 1 January 2010 (UTC)
 * I point you to the consensus I originally brought to this proposal. Would you like another? Josh Parris 00:48, 1 January 2010 (UTC)
 * I think hitting pages with only one or two dabs is overkill.   Randall Bart    Talk   01:55, 1 January 2010 (UTC)
 * Is your concern that the talk page will become cluttered with templates that don't get removed? Josh Parris 08:53, 1 January 2010 (UTC)
 * That consensus does not include all of this information. How about seeking consensus at one of the village pumps for the final proposal and link that discussion to the original "consensus" discussion and, of course, this location. The discussion can be had here or there. -- IP69.226.103.13 |  Talk about me.  17:30, 1 January 2010 (UTC)
 * Notification messages placed. Josh Parris 03:03, 3 January 2010 (UTC)
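Since the timing question keeps coming up, one way to keep it open is to make the delay a single parameter. The minimal sketch below assumes the ten-minutes-of-inactivity rule from the original proposal as a default; `ready_to_notify` and `QUIET_PERIOD` are hypothetical names, and `last_edit` is assumed to be a timezone-aware UTC timestamp of the article's most recent revision.

```python
from datetime import datetime, timedelta, timezone

# Ten minutes of inactivity, as in the original proposal; change to match consensus.
QUIET_PERIOD = timedelta(minutes=10)


def ready_to_notify(last_edit, now=None, quiet_period=QUIET_PERIOD):
    """Return True once the article has gone unedited for at least quiet_period.

    last_edit must be timezone-aware (UTC) so it can be compared with now.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_edit >= quiet_period
```

Setting `quiet_period` to an hour or a day would implement the "not until I am nearly done" preference without touching the rest of the bot.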

I intend to file a subsequent Bot Request for Approval for a cleanup task that would remove this notice from any page the bot had tagged. Josh Parris 00:54, 3 January 2010 (UTC)
 * If it is easier to roll that into this task as part of the function, that is fine, we aren't tied up on protocol and forms here.  MBisanz  talk 01:34, 3 January 2010 (UTC)
 * Remove it when? I don't quite follow; you mean once the action has all been done, the bot removes the notice? If it's directly related to this same task, just removing the notice, it seems appropriate to add it to this BRFA without the need to file a new one. I agree with MBisanz on this. -- IP69.226.103.13 |  Talk about me.  04:56, 3 January 2010 (UTC)
 * The proposal has been extended to include this. Josh Parris 07:39, 5 January 2010 (UTC)
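The extended task (add the notice, keep it in step with the page, remove it once the links are fixed) reduces to a small decision on each check. This is a minimal sketch, not the bot's published source: `plan_notice_action` is a hypothetical name, and it assumes the bot can recover which targets its existing notice lists, passing `None` when no notice is present.

```python
def plan_notice_action(current_dabs, notice_dabs):
    """Choose what to do with the talk-page notice.

    current_dabs: ambiguous link targets found on the article now.
    notice_dabs:  targets listed in the existing notice, or None if there is none.
    Returns one of "add", "update", "remove" or "none".
    """
    current = set(current_dabs)
    if notice_dabs is None:
        return "add" if current else "none"
    if not current:
        return "remove"
    if current != set(notice_dabs):
        return "update"
    return "none"
```

If an editor has already removed the notice and fixed the links, the inputs are an empty list and `None`, so the result is `"none"` and the bot does not try to remove the notice a second time.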

Source code is now available (see in the proposal). Dry-run testing has been undertaken. Josh Parris 07:39, 5 January 2010 (UTC)

Consensus

 * I would advise against this since it gives editors the impression that they did something bad, and distracts from the writing. It also makes needless clutter. Sure, it would be nice if all new articles were perfect from the beginning, but we must not forget that people have different priorities. Let those who are great writers write, and we can take care of the links later. --Apoc2400 (talk) 15:47, 5 January 2010 (UTC)
 * You find the wording of the message box accusatory, and prefer a clean talk page. Do you have an alternative suggestion for the message box? If the talk page of the article is the wrong place to mention ambiguous links, can you suggest a more appropriate place? Josh Parris 16:03, 5 January 2010 (UTC)
 * I don't think the message should be put anywhere. It is easy enough to find links to disambiguation pages anyway. Disambiguating links is a task often best done by those who specialize in it. --Apoc2400 (talk) 18:41, 5 January 2010 (UTC)
 * When we started keeping stats on June 1, 2009, there were 1,355,714 links to disambig pages. After a really good year for the disambig wiki project, that number is down to 1,110,627 links as of January 6. We need help if we want to get this under control. -- Ja Ga  talk  18:11, 6 January 2010 (UTC)
 * I would suggest it's not easy to find links to disambiguation pages, as about 10% of new articles have ambiguous links in them. The tools for identifying mechanical problems with a page (of which links to dabs are one) aren't easily found.  I'd really prefer not to specialize in disambiguating, I'd rather write articles and contribute content, but I find there are too many ambiguous links for me to ignore.  Somebody has to disambiguate; surely it's best that those closest to the pages with ambiguous links on them are alerted to the ambiguous links. They're not forced to fix them; the talk page just has a note identifying the links. Josh Parris 03:03, 7 January 2010 (UTC)
 * The more I think about this bot, the less I like it. E.g., recently I disambiguated links to Los Alamos;  some links are from articles about television soaps.  Editors who wrote those episode synopses may have no idea which Los Alamos, and it could be a fictional Los Alamos.  Also, editors who write from reliable sources may not have enough knowledge to know the intended meaning.  Presumably, the author of the reliable source knew, but maybe not.  This is a very common problem with common names of plants.  I would like a bot like this that editors could use by templating the talk page.  Say I fix most dablinks on an article except a few I cannot work out.  Then I could use this bot to recruit help fixing the remaining dablinks.  --Una Smith (talk) 17:38, 5 January 2010 (UTC)
 * The linked instructions on disambiguating (WP:WikiProject Disambiguation/Fixing a page) are targeted at inexperienced users; the instructions state quite clearly to not fix anything you're not sure about. Feel free to make any alterations necessary to make this more explicit. Josh Parris 02:42, 7 January 2010 (UTC)
 * Considering this is a very non-obtrusive bot (only adds the template to the talk page) that has a very narrow scope (only the newly created articles that have dab links), I don't see anything wrong with trying it out. A lot of new editors have no idea they're creating dab links, so this would raise awareness without uglifying articles or biting new editors. Personally, I preferred adding the dn template to new articles, but this is a safer approach that averts controversy. -- Ja Ga  talk 06:13, 6 January 2010 (UTC)
 * It was correctly pointed out that that's not what dn is for, but that point has in turn suggested an alternative use for this bot, which I'll raise on WP:DPL. Josh Parris 03:05, 7 January 2010 (UTC)
 * I also think this bot is worth trying out. First of all, I strongly disagree with the assertion that disambiguation is better done by those who specialize in it rather than by the creator of an article. I do a lot of disambiguation and I've started about 50 articles. In my experience, the editor who created an ambiguous link has a huge comparative advantage over a disambiguator in quickly determining the correct link. I think this bot could result in large savings in editor-hours by bringing ambiguous links to the attention of those in the best position to fix them. Second, I agree that the proposal is a relatively non-obtrusive approach. It's not forcing page-starters to do anything. They can decide to fix their ambiguous links or they can ignore the Talk page tag and go along on their merry way. Frankly, I think lots of mundane, firmly accepted practices have a greater potential to offend sensitive editors than the activities of this bot. It seems like a POV tag, an advertising tag, or a WikiProject Talk page assessment deeming an article to be of low importance (or even just having another editor change your article content) would be more likely to offend than a Talk page note that there are ambiguous links. --JamesAM (talk) 04:16, 7 January 2010 (UTC)


 * I agree with the above. This bot could greatly help users find the right article and those involved with disambiguation can find it very difficult to work out which is the right link. Boleyn (talk) 19:25, 7 January 2010 (UTC)
 * I don't understand how the bot can help the user find the right article. I think on talk pages, it might be okay. I don't like the timing, though: is it immediate, or has that been changed? I think the tags on talk pages could be useful. I would go and disambiguate all articles I've created if they were tagged like this on their talk page. It seems courteous, in fact. -- IP69.226.103.13 |  Talk about me.  05:35, 9 January 2010 (UTC)

Trial
BAGAssistanceNeeded It seems the discussion has died down. Is a trial appropriate? Josh Parris 00:32, 12 January 2010 (UTC)
 * @harej 14:47, 12 January 2010 (UTC)


 * Thank you, Josh Parris, for again completely ignoring my question. You want I should repost my questions? How many times? I think the bot operator is hostile to Wikipedia editors, making it a bad bot. That's my opinion. -- IP69.226.103.13 |  Talk about me.  21:59, 12 January 2010 (UTC)
 * I really doubt his silence is a sign of hostility, especially considering he apparently tried to explain it to you once. I also don't know the context behind this debacle you're having with him. I'll let Parris know about your concern. @harej 23:14, 12 January 2010 (UTC)


 * Forgive me, but having read the entire page again, I still can't find any unanswered questions. Which question are you referring to? Josh Parris 07:24, 13 January 2010 (UTC)

Bot is running; I've programmed it to skip articles whose titles have an odd number of characters, so that stats on articles with and without the tag can be generated. This means it will leave a note on about half of the candidate articles. Josh Parris 11:49, 13 January 2010 (UTC)
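The odd/even split works as a cheap, deterministic way to form a control group: the same title always lands in the same group, with no random state to record. A minimal sketch, with hypothetical function names; note that `len` here counts Python code points, and whether the bot counted characters the same way (rather than, say, bytes) is an assumption.

```python
def is_control(title):
    """Titles with an odd number of characters form the untagged control group."""
    return len(title) % 2 == 1


def split_candidates(titles):
    """Partition candidate articles into (tagged, control) lists."""
    tagged = [t for t in titles if not is_control(t)]
    control = [t for t in titles if is_control(t)]
    return tagged, control
```

For instance, "Los Alamos" (10 characters) would be tagged and "Venus" (5 characters) left as a control.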

Looking at the bot's edits, there are some highlights. WildBot removed the notice when there were no ambiguous links left. WildBot adjusted the notice when the list of links changed. An experienced editor got one of the first notices, and when asked if it was helpful said "Yes, it kept me on my toes. I fixed the problem it addressed." which I choose to interpret as a ringing endorsement. The same editor beat the bot to removing the talk notice (the bot correctly didn't try to remove it a second time). The bot has chalked up its first two deleted edits too. In a couple of days' time I intend to look at some statistics regarding disambiguation of the half of the articles the bot acted on and the half it didn't act on, to see if the bot is actually making a difference in the world. But for now: bed beckons. Josh Parris 16:10, 13 January 2010 (UTC)

For future reference, here is a link to the edits during the trial. Josh Parris 02:24, 14 January 2010 (UTC)

Raw data
The figures at 36hrs are unchanged from 18hrs.

At 72hrs, something happened: another deletion. That requires a whole new table:

Analysis
This is a very small sample, but things certainly seem to be leaning in the right direction. A sample as small as this ought to be regarded more as anecdote than data. Data collection was a pain given the number of new articles deleted and moved early in their life. I excluded articles deleted by the 18 hr sample point. Whether an article was tagged or left as a control was based on odd-or-even article-title length.

Getting on towards half of the articles tagged have their ambiguous links cleared out within 18 hours versus nearly none for untagged articles. There was one article with 18 ambiguous links in the sample set and that really threw the statistics; even so, the percentage of links cleaned up is greater than without tagging.

Conclusion: people fix stuff when they know it's broken. Josh Parris 12:37, 14 January 2010 (UTC)
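The analysis above uses two measures: the share of articles whose ambiguous links were all cleared, and the share of ambiguous links fixed. The sketch below shows one way to compute both for each group; it does not reproduce the trial's figures. `Sample` and `summarize` are hypothetical names, the group rule mirrors the odd/even title-length split used in the trial, and articles deleted before the check point are assumed to have been excluded already. Reporting the per-article measure alongside the link-weighted one matters here: a single article with many ambiguous links (like the one with 18) dominates the link-weighted figure but counts only once in the per-article figure.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    title: str
    dabs_at_start: int  # ambiguous links when the article entered the sample
    dabs_at_check: int  # ambiguous links at the check point (e.g. T+18 hours)


def summarize(samples):
    """Per group: article count, fraction fully cleared, fraction of links fixed."""
    stats = {}
    for group in ("tagged", "control"):
        # Odd-length titles were the untagged control group in the trial.
        members = [s for s in samples
                   if (len(s.title) % 2 == 1) == (group == "control")]
        start = sum(s.dabs_at_start for s in members)
        fixed = sum(s.dabs_at_start - s.dabs_at_check for s in members)
        cleared = sum(1 for s in members if s.dabs_at_check == 0)
        stats[group] = {
            "articles": len(members),
            "cleared_fraction": cleared / len(members) if members else 0.0,
            "links_fixed_fraction": fixed / start if start else 0.0,
        }
    return stats
```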


 * In spite of the lack of communication of the bot operator, I think the templates are very useful. They list the ambiguous terms, they're on the talk page. I think it's a good idea. I think the bot operator should be more responsive, in general, to answering questions when they're asked, and to answering the question asked, not trying to mindread the asker, while operating the bot. -- IP69.226.103.13 |  Talk about me.  17:36, 14 January 2010 (UTC)


 * It would be lovely if, rather than repeatedly asserting that I don't answer questions, you would ask a question. I am at a loss to explain your combative attitude towards me, and ignorant of how to repair my evidently poor standing in your eyes. Josh Parris 00:01, 15 January 2010 (UTC)

The figures at T+36 hours are unchanged from T+18 hours, which I interpret as: if the author is going to notice (and act on) the talk page entry, they're going to do it early rather than late. I think I should look at article changes within the hour of tagging. Josh Parris 05:57, 15 January 2010 (UTC)


 * Very cool, seems helpful from the comments on your talk page. As long as it uses wikipedia's resources efficiently, it looks good to me. Tim1357 (talk) 16:55, 16 January 2010 (UTC)
 * One very minor thing: the API cuts you off when you have a long summary. See here for an example. Maybe have the summary be "Found ambiguous links to n pages" when there are more than 4 or 5. Tim1357 (talk) 19:56, 16 January 2010 (UTC)
 * I had considered that, but feel that even a truncated list is more informative than a summary; both mean "there's a lot of work", one gives examples of/links to dab pages, the other gives a number.  Josh Parris 21:35, 16 January 2010 (UTC)
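A middle ground between the two positions is to list as many whole titles as fit and then give a count of the rest, so the server never cuts a title in half. This is a minimal sketch, not the bot's actual summary code: `build_summary` is a hypothetical name, and `max_len=200` is an assumed limit (check the limit the wiki actually enforces; this sketch counts characters, not bytes). Titles are wrapped as wikilinks because edit summaries render them, which keeps the "links to dab pages" benefit.

```python
def build_summary(dab_titles, max_len=200):
    """Build an edit summary listing ambiguous link targets without mid-title cuts.

    max_len is an assumed limit, measured in characters.
    """
    if not dab_titles:
        return ""
    prefix = "Found ambiguous links to: "
    listed = []
    for i, title in enumerate(dab_titles):
        remaining = len(dab_titles) - i - 1
        candidate = prefix + ", ".join(listed + ["[[" + title + "]]"])
        suffix = f" and {remaining} more" if remaining else ""
        if len(candidate + suffix) > max_len:
            break
        listed.append("[[" + title + "]]")
    if not listed:
        # Even the first title does not fit: fall back to a bare count.
        n = len(dab_titles)
        return f"Found ambiguous links to {n} page" + ("s" if n != 1 else "")
    summary = prefix + ", ".join(listed)
    omitted = len(dab_titles) - len(listed)
    if omitted:
        summary += f" and {omitted} more"
    return summary
```

Short lists come out in full, for example `build_summary(["A", "B"])` gives `Found ambiguous links to: [[A]], [[B]]`; long lists keep their first few links and end with "and N more".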

The figures at T+72 hours are mostly unchanged from T+18 hours. Analysis is unchanged. Josh Parris 21:35, 16 January 2010 (UTC) One thing to note: the two articles that had their links removed were repaired by editors who came to add WikiProject templates onto the talk page; this is in spite of one of the articles having the original author still active on the page. Conclusion: sometimes even the talk page is not enough. Josh Parris 21:54, 16 January 2010 (UTC)

In a later task, I intend to add to the existing tags on the article itself, to add to the tag-bombing if the talk-page note isn't acted on in a timely way (which figures are currently indicating means it won't be acted on at all). Josh Parris 21:35, 16 January 2010 (UTC)
 * @harej 22:08, 16 January 2010 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.