Wikipedia:Bots/Requests for approval/anybot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Revoked.

After discussion at Wikipedia talk:Bots/Requests for approval/Archive 4, this bot approval has been revoked. User:Smith609 is free to submit future requests for approval, for this bot or for other bots; however:
 * 1) The source code should be provided for BAG review.
 * 2) Any bots that can be activated in a publicly-accessible manner (e.g. visiting a web page) must be restricted so only User:Smith609 may activate them (e.g. password protection and/or source IP restriction).
 * 3) Tasks creating content must be reviewed by editors experienced in the field before approval, and should be periodically reviewed while the task is running.

Note that these restrictions were suggested by Martin himself, and are generally a good idea for any bot op. Please direct discussion to WT:BRFA. Anomie⚔ 03:04, 27 June 2009 (UTC)


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

anybot
Operator: Martin  (Smith609 – Talk) 

Automatic or Manually Assisted: Automatic

Programming Language(s): PHP

Function Summary: Creates and maintains stubs on botanical taxa

Edit period(s) (e.g. Continuous, daily, one time run): One-off to create stubs, will be run again if updates, e.g. to taxonomy, are requested.

Already has a bot flag (Y/N): New bot

Function Details: The bot will create a short stub with taxobox on any organism where reliable taxonomic data can be acquired, much as User:Polbot did for the animals and plants. This database is an example of a good place for the bot to start. Of course, it will only acquire data from sites that permit automated access. I for one will find it very useful to be able to find the affinities and synonyms of a given genus without the excessive googling it currently takes! When revisions make reclassifications necessary, the bot will also amend taxoboxes appropriately, on request.

Discussion
The last several proposed bots that sought to create stubs have generated controversy. Approximately how many stubs would be created and what is the plan for maintaining the stubs going forward? --MZMcBride (talk) 20:12, 25 July 2008 (UTC)


 * My personal interest is in bringing the coverage of algae up to speed; as a conservative estimate I think there are in the order of thousands to tens of thousands of genera covered in algaebase, a hopeful looking database. Of course, many of these may already possess articles, but I suspect that around 90% won't (from my experience).  This doesn't include synonyms, for which redirects would be created.  Martin  (Smith609 – Talk)  20:48, 25 July 2008 (UTC)

I think this would be great, and I'd be willing to help with it if you like. (I ran Polbot's plant- and animal-creating functions.) A few comments:
 * Be careful not to copy full sentences from your sources, since that can lead to copyright problems. So long as raw data can be extracted and recreated into sentences, you're in the clear. But as tempting as it may be to copy something like "a denitrifying species with genetically diverse isolates from activated sludge" directly from the source, don't do it.
 * Polbot created tens of thousands of plant and animal species stubs. None were deleted, and one was expanded to become a featured article. There were complaints, but these could be divided into two categories:
 * Illogical complaints that the creation of these stubs was "clogging up New Pages" (as if we should refrain from creating articles so as to keep New Pages pure), or that this bot was "stealing thunder" from human editors (who hadn't yet created these articles in Wikipedia's seven-year history). Not much you can do about these except be polite.
 * Valid complaints from WikiProjects (such as WikiProject Birds) that I hadn't worked closely enough with them. It would benefit you to get to know WikiProject Microbiology, WikiProject Tree of life, WikiProject Prokaryotes and protists, WikiProject Fungi (if applicable), WikiProject Marine life, and maybe WikiProject Evolutionary biology and WikiProject Extinction as well. Just leaving a note ahead of time and watching for comments can mean a great deal. Besides, they can help to figure out which stub-types and talk-page templates should be added.
 * Polbot would search Wikimedia Commons on the binomial name, just in case there's a page or category there with that same name. If so, Polbot left me a note to look it over by hand and see if a suitable image could be added. Only maybe 1 in 100 did, but that's certainly worth adding.
 * Creating redirects at synonym pages is a good idea, even for articles that already exist.

Good luck with this! – Quadell (talk) 23:13, 25 July 2008 (UTC)
 * Re: "Polbot created tens of thousands of plant and animal species stubs. None were deleted". This is flatly untrue as I had to personally delete many of the superfluous stubs for monotypic plant taxa, or where the bot generated a duplicate article under a new name.  Just cleaning up the taxoboxes for Polbot's monocot articles took months of dedicated work on the part of several editors.  --EncycloPetey (talk) 18:11, 15 August 2008 (UTC)

Do you have an example page of what the stubs will look like? Mr.Z-man 00:06, 26 July 2008 (UTC)


 * Here's what I would like to see. I'd love to see 50 example articles in your or your bot's userspace, and the relevant WikiProjects invited to comment on those trial articles here, please. SQL Query me!  07:00, 26 July 2008 (UTC)


 * Great, I'll get on it, as time permits. Martin  (Smith609 – Talk)  10:34, 27 July 2008 (UTC)
 * Is there any such update on this request? — E  ↗TCB 04:48, 3 August 2008 (UTC)
 * Hi, I've started out with Algaebase; its owner has been understandably hesitant to grant full access to his MySQL database to a complete stranger claiming to want his data to improve a competing website. So I've been negotiating delicately, and he granted me access yesterday.  As I'm now rather busy it might take a little while for me to code and test the bot, but I'll post an update here (and on WikiProject pages) when I'm ready to invite comment on test pages in userspace. Martin  (Smith609 – Talk)  08:59, 3 August 2008 (UTC)
 * Oh, that's good news. When I was considering doing this, I'd thought I'd have to drill down through the unfriendly taxonomy navigation system to get a list of all algae, and this proved very difficult. If you've got access to the direct data, then bully on you! I look forward to seeing more, – Quadell (talk) 14:44, 3 August 2008 (UTC)

You should also contact User:WillowW, who has already created pages from AlgaeBase on most (all?) genera and higher taxa of green algae. She could give you some insight into the taxonomic problems she encountered. --EncycloPetey (talk) 18:18, 15 August 2008 (UTC)
 * Addendum: She's left some comments here. --EncycloPetey (talk) 19:49, 15 August 2008 (UTC)


 * Looks like it's trickling along. I have a couple of questions: (User:Anybot/Phymatolithon is my example in all these.)
 * How did you auto-generate complete sentences such as "The organisms possess neither haustoria nor secondary pit connections"?
 * The bot scans the text of the Algaebase articles for key features and works out whether they are present or absent. Once it has a list of what is and isn't there, it chooses the most appropriate grammar to use from a preset list.
 * It looks to me like the reference (Guiry and Guiry, Algaebase) can be combined with the notice (This article contains...). They both say the same thing; just link Algaebase in the ref and you're done.
 * Done
 * Can you (and should you) pull Algaebase's references into the Wikipedia article? E.g., can you pull in this for Phymatolithon?
 * ==Further reading==
 * * Foslie, M. (1898). Systematical survey of the Lithothamnia. Kongelige Norske Videnskabers Selskabs Skrifter 1898(2): 1-7.
 * * Irvine, L.M. & Woelkerling, W.J. (1986). Proposal to conserve Phymatolithon against Apora (Rhodophyta: Corallinaceae). Taxon 35: 731-733.
 * I've considered this but it's difficult, because of the way the database is structured. The data is easily accessible via Algaebase to anyone who's interested enough to want a reference.
 * Are you planning to create talkpages for these articles? They could contain WikiProject banners and such, and could perhaps contain a disclaimer that the page was created by a bot, if you wish.
 * I had not planned to. WP Micro is not really relevant; I wouldn't really call algae microorganisms. Marine life could probably be added though.  It's easily done if you think it's worthwhile.
 * Is it useful/possible for you to search Commons for images for each article? If a successful match were noted on the talk page, that would be a boon to the human editors who come after.
 * That would take me an awful lot of time; I'm not sure it would be worth the effort saved - editors can search themselves, and images may become available after the bot has been run.
 * Is it better to categorize all these as Category:Algae, or should you categorize by family (creating Category:Corallinaceae)? After all, Category:Algae is going to get really big really fast if you put everything in there.
 * Categorisation by family is a good idea.
 * All the best, – Quadell (talk) 13:26, 18 August 2008 (UTC)


 * Thanks for your ideas. Comments above. Martin  (Smith609 – Talk)  18:00, 29 November 2008 (UTC)
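
Illustrative note: the approach Martin describes above (scan the source text for key features, decide whether each is present or absent, then pick a sentence pattern from a preset list) might look roughly like the sketch below. This is not the bot's code, which was written in PHP and is not shown in this discussion. The feature list, the negation cue words, the sentence patterns and the function names `detect_features` and `describe` are all assumptions made for the example.

```python
import re

# Hypothetical feature keywords; the bot's real list is not given in this discussion.
FEATURES = ["haustoria", "secondary pit connections"]

# Assumed cue words: a sentence containing any of these is read as reporting absence.
NEGATION = re.compile(
    r"\b(no|not|without|lack|lacks|lacking|absent|neither|nor)\b", re.IGNORECASE
)


def detect_features(text):
    """Return {feature: True if present, False if absent} for each feature mentioned."""
    found = {}
    for sentence in re.split(r"(?<=[.;])\s+", text):
        lowered = sentence.lower()
        for feature in FEATURES:
            if feature in lowered:
                found[feature] = NEGATION.search(lowered) is None
    return found


def join_words(items, conjunction):
    """Join a list of phrases into an English enumeration."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {conjunction} {items[1]}"
    return ", ".join(items[:-1]) + f", {conjunction} {items[-1]}"


def describe(found):
    """Choose a sentence pattern based on how many features are present or absent."""
    present = [name for name, is_present in found.items() if is_present]
    absent = [name for name, is_present in found.items() if not is_present]
    sentences = []
    if present:
        sentences.append(f"The organisms possess {join_words(present, 'and')}.")
    if len(absent) == 1:
        sentences.append(f"The organisms do not possess {absent[0]}.")
    elif len(absent) == 2:
        sentences.append(f"The organisms possess neither {absent[0]} nor {absent[1]}.")
    elif absent:
        sentences.append(f"The organisms lack {join_words(absent, 'or')}.")
    return " ".join(sentences)


source = "Haustoria and secondary pit connections are absent."
print(describe(detect_features(source)))
# The organisms possess neither haustoria nor secondary pit connections.
```

Because the sentence is rebuilt from extracted facts and not copied, this pattern also fits Quadell's earlier advice about not reusing full sentences from the source. A sentence-level negation check is crude, and a real implementation would need to handle mixed statements within one sentence.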

Test pages
The bot is almost up and running, and pages will begin appearing here soon. Comments would be welcome; please leave all comments on this page. Martin  (Smith609 – Talk)  17:42, 15 August 2008 (UTC)

NB. Notices have been left at WikiProjects on Plants, Microbiology, Marine life and the Tree of life.
 * Umm... whose classification system(s) are you using? AlgaeBase is inconsistent in the application of taxonomic nomenclature. Arguably, there has been more revision of algal systematics over the last 20 years than for any plant group, so this could mean a lot of cleanup, which was one of the biggest problems of PolBot. On the more detailed side of things, why are you using "phylum" instead of "division" for algae?  And how will the bot handle situations where the entry already exists?  Note that an existing entry does not mean the entry is necessarily an alga article. --EncycloPetey (talk) 18:01, 15 August 2008 (UTC)
 * The way User:ProteinBoxBot dealt with existing pages was to note they exist in a log (eg User:ProteinBoxBot/PBB Log Wiki 11-8-2007-A2-9) and leave any further actions to a human editor. Tim Vickers (talk) 18:13, 15 August 2008 (UTC)


 * Good points, thanks Petey. I guess it would make sense to go with the classification of the most recent edition of Phycology (Lee 2008); it should be quite easy to translate to this scheme from the information within Algaebase.  I imagine that cleanup could be performed automatically by this bot; ideally someone in the know will tell me what needs doing!
 * The bot now uses "divisio".
 * Where an article exists, I guess one approach would be to look for: a taxobox, an "alga-stub" template, or a "category:algae" note in the page text. If none of those are present, the page could be created at Genus (alga); otherwise they would be logged for a manual decision. Martin  (Smith609 – Talk)  18:22, 15 August 2008 (UTC)
 * Logging might be best. Consider: If there is already a non-alga article under the genus name, the algal genus may have an article somewhere under another name, possibly as "Genus (alga)" but possibly under some other name.  This isn't all that uncommon since "plants" and "animals" can share the same generic name.  It's also conceivable that the existing article may be a redirect that would be better converted to an alga article, or to a disambiguation page. --EncycloPetey (talk) 18:30, 15 August 2008 (UTC)
 * Hmm, yes, I quite agree. Thanks for your wisdom. Martin  (Smith609 – Talk)  18:42, 15 August 2008 (UTC)
 * Well, I was going to make an elaborate argument for "phylum" (which is synonymous with divisio under the ICBN and of course some of these protist groups have both "plants" and "animals" in them) but if you want to go with divisio, it doesn't seem worth disputing. I'm a little unclear as to the authoritative status of AlgaeBase (based on a very limited amount of trying to use it as a reference) but I guess you have spent more time looking into it than I have. But if you think the bot can make stubs which are at least fairly accurate, it would be great to do *something* to get a bit more algae coverage. Kingdon (talk) 03:49, 16 August 2008 (UTC)
 * Noticed a problem with punctuation: "Specimens can reach around 30 cm in size. ." Kaldari (talk) 17:56, 1 December 2008 (UTC)
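
Illustrative note: the handling of existing pages discussed in this thread can be summarised in a short sketch. Martin's first proposal was to look for a taxobox, an "alga-stub" template or a "category:algae" note, and to create the page at "Genus (alga)" if none is found; EncycloPetey then recommended logging every existing page for a human decision, which Martin accepted. The function name `plan_action`, the `create_disambiguated` flag and the exact patterns below are assumptions for the example, not the bot's actual code.

```python
import re

# Markers from Martin's proposal; the exact wikitext patterns are assumptions.
ALGA_MARKERS = [
    re.compile(r"\{\{\s*taxobox", re.IGNORECASE),
    re.compile(r"\{\{\s*alga-stub\s*\}\}", re.IGNORECASE),
    re.compile(r"\[\[\s*category\s*:\s*algae", re.IGNORECASE),
]


def plan_action(genus, existing_text, create_disambiguated=False):
    """Decide what to do for a genus, given the current page text (None if no page).

    Returns ("create", title) or ("log", title).
    """
    if existing_text is None:
        # No page under this name yet, so the stub can be created directly.
        return ("create", genus)
    has_marker = any(p.search(existing_text) for p in ALGA_MARKERS)
    if create_disambiguated and not has_marker:
        # First proposal: the page is about something else, so use "Genus (alga)".
        return ("create", f"{genus} (alga)")
    # Agreed default: leave every existing page for a manual decision.
    return ("log", genus)


print(plan_action("Phymatolithon", None))
print(plan_action("Examplea", "Examplea is a village."))
```

With the default setting every existing page is logged, which covers the cases EncycloPetey raises: a non-alga article sharing the generic name, an alga article that already lives under another title, or a redirect that would be better turned into a disambiguation page.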

Automated creation of alga articles

 * Copied from Wikipedia talk:WikiProject Plants

Hi,

I'm writing a bot which will automatically create stubs on algal taxa, from the genus level up. While this doesn't strictly fall under the jurisdiction of plants, any feedback on the stub articles would be very welcome. Stubs will appear over the next couple of days here; more details are available here, where any comments would be gratefully received.

Thanks, Martin  (Smith609 – Talk)  17:48, 15 August 2008 (UTC)


 * Umm... whose classification system(s) are you using? And which groups of algae do you mean?  Arguably, there has been more revision of algal systematics over the last 20 years than for any plant group. --EncycloPetey (talk) 18:01, 15 August 2008 (UTC)


 * Hi Martin,


 * That's great news! :) I went on an enthusiastic spree roughly a year ago, making articles for most taxa of the Chlorophyta and Charophyta (the green algae) at genus-level and higher, but undoubtedly I missed some and new taxa have been added in the meanwhile.  You might also consider developing a bot to improve those earlier articles with a fuller description, images, more links to literature and databases, etc.  I did the best I could, but I'm no expert and they were pretty rudimentary.  Thanks for your good work, Willow (talk) 19:41, 15 August 2008 (UTC)


 * PS. I made a few templates and categories to help with the work, which you might consider? I made separate categories for each of the (major) taxonomic levels of algae: Category:Algae taxonomic classes, Category:Algae taxonomic orders, Category:Algae taxonomic families, and Category:Algae genera.  Secondly, I made two templates for linking to taxonomic references and databases, unimaginatively titled Taxonomic references and Taxonomic links. ;)  We can modify or specialize them for you, if that'd help! :) Willow (talk) 19:41, 15 August 2008 (UTC)


 * Thanks - it should be relatively easy to expand your articles, as they appear to use a pretty consistent format. I'll bear this in mind.  It might be easier to leave any further discussion at Bots/Requests_for_approval/anybot to keep it in one place.  Cheers, Martin  (Smith609 – Talk)  20:20, 15 August 2008 (UTC)

Alga creation from algaebase

 * Copied from User talk:WillowW

Hi, apparently you have some experience with creating Wikipedia articles using data from Algaebase. I'm developing a bot using the Algaebase database to add the remaining genera to WP, and wondered if you might be able to spare any wisdom - particularly in terms of taxonomy, which apparently is somewhat inconsistent in Algaebase. Any advice would be very gratefully received! Martin  (Smith609 – Talk)  19:50, 15 August 2008 (UTC)


 * Hi Martin,


 * I left you notes at the various WikiProjects, in the hopes you would find them, but I suppose I should've written directly to you. My basic advice is to hold off on mass article creation until you've perfected your bot as much as it can be.  Patience and a scrupulous attention to detail will forestall grief and regret later on; it's much easier to mass-create articles than to mass-edit them, as I'm sure you're aware.  In my case, I created the articles offline, proofread them, and then uploaded them by hand, which is a lot more painful on the wrists than a bot. ;)


 * I'm surprised that you estimate that Wikipedia is missing 90% of the Chlorophyta and Charophyta. I'm willing to bet that the number is less than 10%, although I daresay my estimate smacks of hubris and I'll be punished for it. ;)


 * In my letters at the WikiProjects, I mentioned my article classification scheme for the taxonomic articles, as well as two templates that seemed useful to me, although we could specialize them for algae articles.


 * I'm delighted that you've made personal connections with the people who run AlgaeBase, which seems to be the best resource for algal taxonomy by far. I was very clueless when I began, and didn't know about them at all; the algae articles were just a random enthusiasm and I took my data from the NCBI.  So my taxonomic data were worse than yours.  My advice here would be to use the most reliable but available resource from the first, and then go back and fix the errors on a case-by-case basis.  You'll need to go back to each article anyway (see below).


 * If you were interested in improving the articles I created, you might consider adding the AlgaeBase reference instead of the NCBI reference. Also, you'll want to add AlgaeBase's citation for the lowest taxonomic level, but not for the upper ones.  If the AlgaeBase people would let you add to the articles their lists of scientific citations for each taxon and their images as well, that would be incredible! :)  Releasing at least one image for each algal genus under the GFDL or a similar license would be a huge benefit for the articles.


 * The really hard work begins after the automated uploads. You'll need to go through the taxa one by one, read the literature and summarize it to make it more than a stub.  I'm easily distracted, so I never got back to that, trusting that real algae experts such as yourself would eventually turn up and do the job better than I could.


 * Good luck and God speed your work, Willow (talk) 20:32, 15 August 2008 (UTC)

What's the status on this? Mr.Z-man 03:15, 2 November 2008 (UTC)
 * I'm gradually improving it whenever I have the opportunity, but my free time has been very limited this last month or two.  I'll continue to work on it as and when I get the chance. Martin  (Smith609 – Talk)  18:52, 2 November 2008 (UTC)

BAGAssistanceNeeded
 * Progress is being hindered because the bot doesn't seem to have bypassed captchas when introducing external links. The account's been active for a few months now. How can I avoid the captchas? Martin  (Smith609 – Talk)  17:18, 29 November 2008 (UTC)

A minor grammatical fix, but I guess it is better to fix it now than after you've created 1000 stubs: it's "genus of algae" not "genus of alga"; you wouldn't say "is a genus of animal". Hesperian 03:55, 30 November 2008 (UTC)
 * Thanks, these minor fixes are important! I think that "A genus of plant" is more correct than "a genus of plants"; I guess it depends on whether you interpret the sentence as "A genus in the kingdom Animalia" or "A genus containing animals". The former is my intended meaning, so I think that the singular is appropriate in that context. Martin  (Smith609 – Talk)  17:57, 30 November 2008 (UTC)
 * Also, some of your example genus articles include the subclass in the taxobox. Is this necessary? We don't include minor ranks unless they are important to the topic, and generally this translates to only including minor ranks up to the next lowest major rank; for example a genus article will include minor ranks between family and genus, but no minor ranks above family rank. (There are exceptions to this rule: for example all bamboo species articles have subfamily Bambusoideae in the taxobox, because it is important in the context.) Hesperian 04:06, 30 November 2008 (UTC)
 * Okay, I'll remove the subclass. Martin  (Smith609 – Talk)  17:57, 30 November 2008 (UTC)
 * One more thing: if you end up doing any green algae, can you tag them with plant-stub too please? Hesperian 04:09, 30 November 2008 (UTC)
 * No, no, no. Please do not use plant-stub if there is another appropriate stub.  The generic stub tagging is only for stubs that have no more specific stub category in which to be placed, and is to be avoided whenever possible.  If a large number of green algae stubs are expected, we can create a new stub specifically for them.  --EncycloPetey (talk) 04:15, 30 November 2008 (UTC)
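
Illustrative note: Hesperian's rule of thumb for minor ranks (keep them only between the article's taxon and the nearest major rank above it) can be expressed as a small filter. The sketch below is an assumption-laden example, not the bot's code: the function name `taxobox_ranks`, the set of ranks treated as major, the Latin rank labels and the placeholder taxon names are all made up for illustration, and the exceptions Hesperian mentions (such as the bamboo subfamily) are not handled.

```python
# Ranks treated as "major" for this sketch (assumed list, Latin labels).
MAJOR_RANKS = {"regnum", "divisio", "classis", "ordo", "familia", "genus", "species"}


def taxobox_ranks(lineage):
    """Filter a lineage for display in a taxobox.

    lineage is a list of (rank, name) pairs ordered from the highest rank
    down to the article's own taxon, which must be the last entry.
    """
    # Find the nearest major rank above the article's own taxon.
    last_major_above = -1
    for index, (rank, _name) in enumerate(lineage[:-1]):
        if rank in MAJOR_RANKS:
            last_major_above = index
    # Keep all major ranks, plus anything below that nearest major rank.
    return [
        (rank, name)
        for index, (rank, name) in enumerate(lineage)
        if rank in MAJOR_RANKS or index > last_major_above
    ]


# Placeholder names only; this is not a real classification.
lineage = [
    ("divisio", "Exampleophyta"),
    ("classis", "Exampleophyceae"),
    ("subclassis", "Exampleophycidae"),
    ("ordo", "Exampleales"),
    ("familia", "Exampleaceae"),
    ("subfamilia", "Exampleoideae"),
    ("genus", "Examplea"),
]
for rank, name in taxobox_ranks(lineage):
    print(rank, name)
```

For a genus article this drops the subclass, as Martin agreed to do, while keeping a subfamily that sits between the family and the genus.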

Ready to go
Aside from the problem with punctuation mentioned above, nobody has expressed any reservations about the creation of these pages (or offered any suggestions for improvement), despite my best efforts to advertise the bot. The Algaebase team have expressed their satisfaction with the output. The next stage is to create some real pages, and I may have time to do this over the weekend. May I create a few (20? 50?) and see if this elicits any feedback? Martin  (Smith609 – Talk)  03:36, 13 December 2008 (UTC)
 * BJ Talk 03:42, 20 December 2008 (UTC)
 * Any news on the status of this project?  RichardΩ612  Ɣ ɸ 13:38, 1 January 2009 (UTC)
 * The authorisation coincided with my going home for Christmas so I've not had the chance to act on it yet. I'll respond here when the pages have been created.  Martin  (Smith609 – Talk)  17:36, 3 January 2009 (UTC)
 * Mainspace edits now being created: Contributions/Anybot Martin  (Smith609 – Talk)  01:28, 6 February 2009 (UTC)
 * The bot seems to have lost the authority to create pages. If anyone can suggest how I can rectify this, that'd be awesome. I think I've created about 25 but wanted to experiment with creating redirects for 'common names'. Martin  (Smith609 – Talk)  03:58, 6 February 2009 (UTC)
 * There don't seem to have been any problems with the trial pages. Can I request permission to go live? Thanks, Martin  (Smith609 – Talk)  22:43, 13 February 2009 (UTC)

BAGAssistanceNeeded 03:39, 17 February 2009 (UTC)
 * I disagree with using "Article maintenance" as the edit summary when creating pages, but yes, other than that it looks just fine. — neuro  (talk)  11:57, 19 February 2009 (UTC)
 * By the way, if the trial is complete, you should use BotTrialComplete. — neuro  (talk)  12:18, 19 February 2009 (UTC)
 * Thanks - I hadn't realised that. I'll change the edit summary to something like 'Created new article from AlgaeBase'. Martin  (Smith609 – Talk)  14:14, 19 February 2009 (UTC)
 * Changing the edit summary is a good idea, but as all the concerns have been resolved/fixed, the trial went well, and nothing new has come up, I see no reason not to get this running fully.  Richard 0612  11:57, 20 February 2009 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.