Wikipedia:Bots/Requests for approval/Qbugbot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

Qbugbot 2
Operator:

Time filed: 05:47, Thursday, February 22, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic. Edits will be programmatically scanned, and manually spot-checked soon after being made.

Programming language(s): VB

Source code available: Yes: User:Qbugbot/source

Function overview: Qbugbot creates and uploads stub articles for insects, spiders, and other arthropods.

Links to relevant discussions (where appropriate):


 * Village Pump: A Bot for Creating Arthropod Stubs
 * Wikipedia_talk:WikiProject_Tree_of_Life
 * Wikipedia_talk:WikiProject_Arthropods
 * User_talk:Edibobb
 * Community RfC approving this task

About 2,000 new articles have been generated and manually posted for testing and discussion.

Edit period(s): Most nights, for a few hours per night.

Estimated number of pages affected: 15,000

Namespace(s): Mainspace/Articles

Exclusion compliant (Yes/No): Yes. The bot skips all existing articles. It only creates new pages.

Function details:

I am interested in adding about 15,000 new stub articles for arthropod species. These can be added with a bot over a period of several weeks or a few months. The article content is significantly better than the minimal stub frequently seen on these and similar Wikipedia topics.

In my mind, these stubs will serve three primary purposes:


 * They will give users some idea of the organism, including references and, if available, a photo or two.
 * They will make online and print references available to users.
 * They will contain enough material to make it convenient for editors to expand the article. A casual editor can spend significantly less time expanding an existing article than creating a new one, because it takes time to learn and work with the various templates and other "standards". An article with online references makes it even more likely for the article to be expanded into a start-class or better article.

I have requested and received positive input from the Arthropods and Tree of Life projects. Over the past few weeks, at the recommendation and encouragement of project members and interested editors, I have been manually posting new stub articles generated for a variety of arthropods, primarily insects and spiders. I have received positive feedback, and the stub quality has evolved and improved significantly. Several of the stubs have already been expanded by various editors.


 * Village Pump: A Bot for Creating Arthropod Stubs
 * Wikipedia_talk:WikiProject_Tree_of_Life
 * Wikipedia_talk:WikiProject_Arthropods
 * User_talk:Edibobb

On February 3 I applied for approval for bot operation, and was directed to the Village Pump for more discussion. At the Village Pump I received several suggestions and implemented most. The project received broad support at varying levels, with no explicit opposition.

I have now generated and manually posted about 2,000 stub articles on Arthropods, a fairly thorough test of the content generation.

Operation Details

VB source code is available at User:Qbugbot/source.

Species selection

ITIS has most long-established arthropod species in its database, although it does not have many of the newer species and may not reflect recent reorganization of genera and higher taxa. The species in Bugguide generally reflect the latest research, and are limited primarily to species photographed or collected in North America by its users. By selecting the species that appear both in ITIS and Bugguide, we end up with a set of non-controversial species (from a taxonomic standpoint) that are not overly rare or obscure.


 * 35,000 arthropod species are in Bugguide.
 * 23,000 of these are in ITIS (out of 250,000 total arthropod species in ITIS).
 * 15,000 of these have no Wikipedia article (there were 17,000 before 2,000 test articles were manually posted.)
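The selection rule above amounts to a set intersection followed by a set difference. Here is a minimal Python sketch of that idea. It is not taken from the bot's VB source; the function name, the exact-name matching, and the placeholder species names are all assumptions made for illustration.

```python
def select_candidates(bugguide_species, itis_species, existing_articles):
    """Return species found in both Bugguide and ITIS that lack an article.

    Each argument is an iterable of scientific names. How the lists are
    obtained is outside this sketch; exact string matching is an assumption.
    """
    in_both = set(bugguide_species) & set(itis_species)
    return sorted(in_both - set(existing_articles))


# Placeholder names only; these are not real taxa or real counts.
bugguide = ["Aus bus", "Aus cus", "Dus eus"]
itis = ["Aus bus", "Dus eus", "Fus gus"]
existing = ["Dus eus"]

print(select_candidates(bugguide, itis, existing))
```

In practice the name matching would probably need to handle synonyms and spelling variants, which this sketch ignores.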

Article creation

Articles are created in the following steps:


 * A Speciesbox template is created for the (taxonomic) ancestry of the "bug". If necessary, taxonomy templates are created and saved at the genus level and above. These are required for the Speciesbox. Synonyms are added to the Speciesbox if they appear in the ITIS database. (Other catalogs may have ridiculous numbers of synonyms.)


 * A text introduction is generated, such as "Andrena perarmata, the well-armed andrena, is a species of mining bee in the family Andrenidae," giving the scientific name, common names (if any), taxonomic rank, an ancestor's common name, and the scientific name of the family or order.


 * This is followed, as available, by the distribution range, the IUCN conservation status, Hodges number, ITIS taxonomic notes, additional images, and a list of taxonomic children (if any). If there are too many children, a link to a separate list page (created afterward) is included. The distribution data comes from ITIS, World Spider Catalog, or Odonata Central.


 * References may include inline citations, further reading, and external links. All references use the standard WP:CS1 templates Cite journal, Cite book, and Cite web.


 * A Wikimedia Commons template is added if there are photos, a Taxonbar is added (with Q-number if possible), the appropriate Wikipedia category is selected, and the proper stub template is selected for the talk page.


 * After the species page is created, pages are created for the genus and upper levels of the taxonomy of the species, as needed. The Automatic taxobox is used for this, again requiring the creation of the appropriate taxonomy templates. If available, one or two images are selected for the upper level pages.
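As an illustration of the text introduction step, the following Python sketch assembles a sentence in the same shape as the "Andrena perarmata" example. It is a hypothetical reconstruction, not the bot's VB code; the function name, its parameters, and the use of "or" between multiple common names are assumptions.

```python
def intro_sentence(scientific_name, common_names, rank, group_name,
                   parent_rank, parent_name):
    """Build an introductory sentence for a taxon stub (plain text)."""
    subject = scientific_name
    if common_names:
        # Assumption: several common names are joined with "or".
        subject += ", the " + " or ".join(common_names) + ","
    # Pick "a" or "an" from the first letter of the rank.
    article = "an" if rank[0].lower() in "aeiou" else "a"
    return (f"{subject} is {article} {rank} of {group_name} "
            f"in the {parent_rank} {parent_name}.")


print(intro_sentence("Andrena perarmata", ["well-armed andrena"],
                     "species", "mining bee", "family", "Andrenidae"))
```

When no common name is known, the sketch simply drops the appositive and produces "Andrena perarmata is a species of mining bee in the family Andrenidae."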

Article upload

Articles are created on demand for upload. An article will be uploaded only if no article exists for that title. If one does exist, the article will be skipped. No existing articles will be altered. If the taxonomic parent of an uploaded article does not exist, it will be generated and uploaded. If a list of more than 100 "children" is included in the article, it will be split off as a separate list article. A talk page with the proper stub template is created for each article.
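The upload rules can be sketched as follows in Python. This is an illustration under stated assumptions, not the bot's actual VB logic: `plan_upload` and `edit_params` are hypothetical helper names. The parameters shown for the MediaWiki `action=edit` request are real API parameters, but whether the bot relies on `createonly` (which makes the server refuse to overwrite an existing page) is not stated in this request.

```python
LIST_SPLIT_THRESHOLD = 100  # lists with more children than this are split off


def plan_upload(title, children, existing_titles):
    """Decide what to do with one generated article."""
    if title in existing_titles:
        # Existing articles are never altered.
        return {"action": "skip", "split_list": False}
    return {"action": "create",
            "split_list": len(children) > LIST_SPLIT_THRESHOLD}


def edit_params(title, wikitext, summary, csrf_token):
    """Build POST parameters for a MediaWiki action=edit request."""
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": wikitext,
        "summary": summary,
        "createonly": "1",  # fail instead of overwriting an existing page
        "bot": "1",         # mark the edit as a bot edit
        "token": csrf_token,
    }
```

Checking for an existing title before uploading and also sending `createonly` is deliberately redundant: the second check covers the case where someone creates the page between the lookup and the upload.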

Manual verification

During the test period, every article created will be viewed on Wikipedia to verify that it exists, it is the correct article, and the information is proper. Later on, the text of all the day's articles will be downloaded and verified manually or automatically. At least one article daily will be manually viewed on Wikipedia.
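One way the automatic part of this verification could work is to scan each downloaded page for the components the article creation steps are supposed to produce. The Python sketch below is only an assumption about what such a check might look like; the list of required components and the function name are hypothetical, and the request does not describe the actual checks.

```python
import re

# Hypothetical checklist: one regular expression per expected component.
REQUIRED_PATTERNS = {
    "taxobox": r"\{\{\s*(Speciesbox|Automatic taxobox)\b",
    "taxonbar": r"\{\{\s*Taxonbar\b",
    "citation": r"\{\{\s*Cite (journal|book|web)\b",
    "category": r"\[\[\s*Category:",
}


def missing_components(wikitext):
    """Return the names of expected components not found in the wikitext."""
    return [name for name, pattern in REQUIRED_PATTERNS.items()
            if not re.search(pattern, wikitext, flags=re.IGNORECASE)]


# Placeholder page text for illustration only.
sample = (
    "{{Speciesbox\n| genus = Aus\n| species = bus\n}}\n"
    "Aus bus is a species.<ref>{{Cite web |title=Example}}</ref>\n"
    "{{Taxonbar|from=Q0}}\n[[Category:Example]]"
)
print(missing_components(sample))
```

A page that returns a non-empty list would be flagged for a manual look rather than being treated as definitely wrong.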

Sample articles

Here is a list of some random test articles generated and manually posted on February 21.

• Centrodera decolorata

• Deraeocoris bakeri

• Deraeocoris poecilus

• Dialytellus

• Dialytellus dialytoides

• Elaphria deltoides

• Eremothera sculpturata

• Exochomus aethiops

• Liris partitus

• List of Synopeas species

• Metachroma

• Metachroma suturale

• Metadioctria

• Metadioctria parvula

• Nephus ornatus

• Periscepsia helymus

• Perlodinae

• Phytomyptera tarsalis

• Pseudanarta caeca

• Ptomaphagus

• Ptomaphagus merritti

• Pytho seidlitzi

• Skwala

• Skwala americana

• Suillia longipennis

• Synopeas

• Synopeas hopkinsi

• Thera otisi

• Tollius setosus

• Xysticus nigromaculatus

Discussion
I have only just seen this BRFA. I am very impressed by the amount of thought and care that has obviously gone into preparing it, and thank you very much for your work on this.

I hope that, instead of using inline long horizontally formatted CS1/2 citation templates, you can use short-form citation instead. This will make the wikitext of your new articles much more pleasant to read, and much easier to edit. I am very glad to see that, in the "Further reading" section, you already set out the CS1/2 citation templates in a sensible, pleasant to read format. This leads me to believe that it probably won't be very difficult for you to set out, in a bibliographic listing, the long CS1/2 citation templates in a similar form, for use in conjunction with short-form referencing.

I believe there really is no excuse for using what I call "LHT clutter" (long horizontal template clutter) in a large number of brand new articles. LHT clutter only persists in existing articles because of a (very strong) legacy effect. I think LHT clutter is unacceptable, and that Wikipedia should gradually move away from using it. In most (but not all) cases, short-form referencing is the best way of doing so. For those who are interested, the case against LHT clutter is set out at length in this very long thread. That thread refers to what I call "ETVP" (easy to visually parse) formatting of long citation templates, and points to some examples; you will see that the ETVP formatting in those examples is very similar to what you are already using in the "Further reading" sections.

--NSH001 (talk) 08:40, 22 February 2018 (UTC)


 * Actually, the operator has changed from using plain-text citations to CS1 due to standardization & maintenance, with no qualms and only support from WP:TREE.  ~ Tom.Reding (talk ⋅dgaf)  13:59, 22 February 2018 (UTC)
 * Oh, you meant Sfn. Can you make your case at WT:TREE? I don't think this has been discussed by the project.  ~ Tom.Reding (talk ⋅dgaf)  14:08, 22 February 2018 (UTC)
 * , thanks for the response. Happy to mention it at WP:TREE if people think it will help, but I could only see one thread there relevant to this bot. There's not much to say - short case: much easier to read wikitext, much easier to edit wikitext, better to get a decent citation/referencing style from scratch, rather than changing it later; long case: see the very long thread. I'll wait for a response from (Bob Walker) before posting there, but I don't anticipate any great difficulty, as Bob is already using ETVP in the "Further reading" section. I also have a script that could be used to change the trial articles already set up to use short-form, but that will need consensus per WP:CITEVAR. --NSH001 (talk) 21:12, 22 February 2018 (UTC)
 * I opted to go with the vertical form of CS1 template in the text of the article in addition to the Further Reading and External Link sections. I like the short citations, but they end up duplicating the references in two sections at the bottom. This is fine for a longer article, but these stubs are already a little "bottom-heavy". An ideal solution (for me) would be to have a section at the bottom of the wikitext where all the references could be defined (and not displayed), and then referred to by name throughout the article. Reflist would bring them up only once. This would probably require some core code change, so it may not happen right away. Bob Webster (talk) 19:23, 26 February 2018 (UTC)
 * WP:LDR exists. pppery (talk) 19:15, 12 March 2018 (UTC)
 * This is kind of a tangent, but: between the visual editor and the syntax highlighter, most people who are making manual edits aren't going to be bothered much, if at all, by the "clutter".  sfn is relatively unpopular even among experienced editors.  Its optimal use-case is a longer article that repeatedly cites the same handful of books, with different pages for different claims.  I would recommend against using it in short articles, including these.  The more familiar cite.php ("ref tags") markup is much less likely to confuse newer editors (and an experienced one knows how to follow WP:CITEVAR and propose a change, if he's significantly expanding an article and the format bothers him enough).  WhatamIdoing (talk) 03:09, 10 March 2018 (UTC)


 * Proceeding with an abundance of caution, please run a trial of 20 stubs created. Once done, mark the trial completed here and post the list of stubs for review at relevant WikiProjects, including a link back to the BRFA. There will be a decent wait between this first trial run and an extended trial to allow for feedback and to ensure the stubs don't need any cleanup tags, etc. ~ Rob 13 Talk 06:57, 24 February 2018 (UTC)


 * Thanks! The trial of 20 stubs is complete, and they've been posted at Village Pump, Tree of Life project, and Arthropods project. Everything worked fine, except for a minor bug that has been fixed (talk pages were blank for pages with images). Here is a list of the 20 stubs created:

• Bledius annularis

• Bledius

• List of Bledius species

• Bombylius albicapillus

• Calligrapha alnicola

• Cerotainiops abdominalis

• Cerotainiops

• Efferia tuberculata

• Eremochrysa pallida

• Glyptina spuria

• Glyptina

• Hister civilis

• Hydroporus rectus

• Kuschelina jacobiana

• Kuschelina

• Osorius planifrons

• Osorius

• Paropomala virgata

• Paropomala

• Walckenaeria directa
 * Bob Webster (talk) 03:34, 26 February 2018 (UTC)


 * — CYBERPOWER  ( Chat ) 15:55, 12 March 2018 (UTC)
 * Added autopatrol for 1 month to support the extended trial above. — xaosflux  Talk 16:03, 12 March 2018 (UTC)
 * I was actually thinking of doing this one without autopatrolled. This will afford the opportunity for NPP to find any issues with the bot and report them to the bot owner.  If the 500 pages are problem free, we would approve it for another larger trial with autopatrolled, and then finally give it the bot flag, and approve it completely.  500 pages shouldn't be that much of a burden for NPPers for a bot trial.— CYBERPOWER  ( Chat ) 19:25, 12 March 2018 (UTC)
 * maybe 100-200 without? I'm a bit wary (and that RfC mentioned it a bit) about flooding NPP backlogs. —  xaosflux  Talk 19:35, 12 March 2018 (UTC)
 * ping fix. — xaosflux  Talk 19:35, 12 March 2018 (UTC)
 * Okay. I reduced the trial to 200 pages.  Stubs should be easy to patrol, but it should be done by those that are experienced with it.— CYBERPOWER  ( Chat ) 19:38, 12 March 2018 (UTC)
 * AP removed, perhaps a 2 phase, re-add after the first 200. — xaosflux  Talk 15:32, 15 March 2018 (UTC)
 * Qbugbot completed the second trial with 200 pages created, listed in Qbugbot/info. The bot operated for 8 days, creating 20 articles on day 1, 25 per day on days 2-7, and 30 articles on day 8 (this morning). I added the GBIF database for lists of species and other taxa, now show which databases those taxa are in, and changed the arrangement of references to make the wikitext more readable. As usual, I introduced a bug or two with these changes, so I fixed the bugs and stopped making non-trivial changes.
 * Where are the bot's edits? I don't see any?— CYBERPOWER  (Around ) 03:25, 21 March 2018 (UTC)
 * When running a bot trial, the edits should be made from the bot in question. Unfortunately, this trial is meaningless as it was made from your account, which has the autopatrolled flag.  The purpose of this trial was to see if there are any issues with the articles that new page patrollers are more likely to identify than members of BAG.— CYBERPOWER  (Around ) 03:29, 21 March 2018 (UTC)
 * Apparently I did something wrong. The edits were made from the bot. The bot logs in with edibobb@qbugbot and the 32-character bot password generated online, and receives an edit token under that login. The edit command uses a post with "bot" "true" parameter. Should I be logging in differently? Bob Webster (talk) 14:37, 21 March 2018 (UTC)
 * You generated a password for your main account. You need to login to your bot account and generate a new password.— CYBERPOWER  ( Chat ) 15:41, 21 March 2018 (UTC)
 * Thanks, I've got it straightened out now. Should I run the trial again? Bob Webster (talk) 18:39, 21 March 2018 (UTC)
 * Yes please, and make sure it's coming from the right account. I would like to see some NPPs review these articles to see if they have any suggestions, complaints, or other comments.— CYBERPOWER  ( Chat ) 18:42, 21 March 2018 (UTC)
 * I'll start on it today. I have "uncompleted" this trial. (Hopefully that was the right thing to do.) Thanks again for your help. Bob Webster (talk) 19:09, 21 March 2018 (UTC)
 * Finished 200 pages, 25 per day for 8 days. These were reviewed. A little wording and some common names were improved. There was a missing photo and an incorrect photo. A few references were corrected, a few were added, and a couple were removed. The new pages are listed at the bottom of user:qbugbot/info Bob Webster (talk) 17:06, 28 March 2018 (UTC)
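The login mix-up discussed above comes from how bot-password login names are formed: the part before the "@" is the account the edits are attributed to, and the part after it is only a label for the password. A small pre-flight check along the following lines could catch it. This Python sketch is hypothetical and not part of the bot; the function names, splitting at the first "@", the case-insensitive comparison, and the label "trial" are assumptions.

```python
def editing_account(bot_password_login):
    """Return the account part of a login name of the form account@label."""
    account, sep, label = bot_password_login.partition("@")
    if not sep or not account or not label:
        raise ValueError("expected a login name of the form account@label")
    return account


def logs_in_as(bot_password_login, expected_account):
    """True if edits made with this login are attributed to expected_account.

    Assumption: names are compared case-insensitively for simplicity.
    """
    return editing_account(bot_password_login).lower() == expected_account.lower()


print(logs_in_as("edibobb@qbugbot", "Qbugbot"))  # password made on the main account
print(logs_in_as("Qbugbot@trial", "Qbugbot"))    # password made on the bot account
```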


 * Have you implemented all of the suggested changes?— CYBERPOWER  (Around ) 02:33, 7 April 2018 (UTC)
 * I've implemented most of them.
 * Changes not made:
 * I did not add redirects for common names.
 * I did not add categories for years described or geographic areas.
 * I don't plan to add articles for all entries in GBIF.
 * I did not do any interlanguage linking.


 * Changes made:
 * The prose text was adjusted to accommodate most of several suggestions.
 * Phylogenetic Sequence Numbers for moths were removed.
 * Photos were moved to the right side of the page.
 * The pages use Species Box and Automatic Taxobox templates.
 * Taxon authorities were added (when possible) to the infoboxes.
 * The minimum number for a separate species or genus list page was raised from 30 to 100.
 * The pages use Taxonbar templates with Q-codes when possible.
 * An introductory sentence was added to the species lists, genus lists, etc.
 * Image captions were added, if available, only when they are different from the page title.
 * The Commons template uses the in-line form in External Links.
 * "Photo needed" was removed from the talk page stub template. (There were suggestions for and against this.)
 * Each page is in the category, "Articles created by Qbugbot" and has a note in the talk page about Qbugbot and contact information.
 * The Qbugbot user page has information on the automated origin of the article, datasets accessed, links to project discussion, a link for comments and suggestions, and ideas on how editors can expand the articles.
 * En-dash is used properly.
 * Some common names were changed and some deleted from the upper level taxa to make them sound better in the articles.
 * The GBIF and Catalogue of Life databases were added as sources for "descendants" and "ancestors" to include more arthropods from outside the Americas.
 * The format, placement, and content of references have been adjusted several times according to several suggestions. Some of the suggestions were conflicting, but the references have been cleaned up significantly.
 * The number of general references in "further reading" was reduced, and the scope of many references was reduced.
 * The authors in all the references have been checked, along with the placement of volumes, issues, publications, publishers, and pages.
 * References were formatted so periods are used consistently, publisher does not include company type or location, and other minor errors were fixed.
 * The general references were moved to "further reading", and all the web-based general references were moved to "external links".
 * Doi, URL, or ISBN was added to most references.
 * References were added to the infobox when appropriate.
 * The bot uses CS1 format for citations. There was a suggestion for a form that is more readable in wikitext, but using the reflist template accomplished this.
 * Bugguide and EOL were moved to be the last references, ITIS, Catalogue of Life, and GBIF first.
 * A public domain notice is included with the ITIS reference when an ITIS Taxonomic Note is used in the article.
 * Some journal articles were added with a relatively narrow scope. Bob Webster (talk) 04:39, 7 April 2018 (UTC)
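The reference ordering mentioned in the list above (ITIS, Catalogue of Life, and GBIF first; Bugguide and EOL last) can be expressed as a stable sort with a three-tier key. The Python sketch below is an illustration only, not the bot's code. The order within the first and last groups is assumed from the order in which the sources are named, and all other references are assumed to keep their original relative order.

```python
FIRST_SOURCES = ["ITIS", "Catalogue of Life", "GBIF"]
LAST_SOURCES = ["Bugguide", "EOL"]


def order_references(sources):
    """Sort source names: database sources first, Bugguide and EOL last."""
    def key(source):
        if source in FIRST_SOURCES:
            return (0, FIRST_SOURCES.index(source))
        if source in LAST_SOURCES:
            return (2, LAST_SOURCES.index(source))
        return (1, 0)  # everything else stays in the middle, order preserved
    return sorted(sources, key=key)  # sorted() is stable


print(order_references(["EOL", "A journal article", "GBIF", "Bugguide", "ITIS"]))
```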


 * Let's do 100 more with the suggested modifications.— CYBERPOWER  ( Chat ) 17:00, 8 April 2018 (UTC)
 * I'll start on these today.
 * I did not intend to make the four changes not yet made:
 * Redirects for common names would have some quality issues, and met some opposition in discussion.
 * Adding categories for geographic areas would not be very accurate because of a lack of consistent distribution information. I prefer not to add the categories for years described. While this was suggested, there was not much of a consensus for it.
 * Adding articles for all GBIF entries is outside the method of choosing articles outlined and discussed in this bot request. It would result in 20 to 50 times as many articles being created.
 * I don't understand how using interlanguage linking would work.
 * I can do any of these if required for bot authorization, but I think it's better not to. Let me know if I should do these.
 * The suggested modifications made ("Changes made" listed above) were done before the last 200 articles were generated. Since then, some improvements to common names and references were made. I'll continue to add recent journal references when I run across them. The next 100 articles should be complete on Wednesday. Like the others, they will be listed at user:qbugbot/info. Bob Webster (talk) 20:24, 8 April 2018 (UTC)
 * Hi . Don't worry about adding interlanguage links - they should only be assigned at Wikidata anyway and there are WD bots to keep them up-to-date. New WD entities might have to be made for some of the articles, but I believe there are periodic bots for that too; and we can probably nudge a WD botop into filling in the gaps soon after the main run is complete :)  ~ Tom.Reding (talk ⋅dgaf)  21:10, 8 April 2018 (UTC)
 * That's okay. I just want to see NPP review the articles with the changes made to your bot task.— CYBERPOWER  ( Chat ) 23:27, 8 April 2018 (UTC)

100 pages complete, 25 per day for 4 days. These were reviewed.

Changes: "ITIS taxonomic notes" were removed. (They're confusing and often use poor wording.) A duplicate "and" was fixed in the distribution range for certain spiders. Spacing was corrected around some commas in the Catalogue of Life database. An odd common name was removed. Some photos were added for genus and higher taxa. Bob Webster (talk) 19:32, 11 April 2018 (UTC)
 * This is a full speed run with the auto-patrolled bit.— CYBERPOWER  (Around ) 03:08, 12 April 2018 (UTC)

500 pages complete, 200 last night and 300 today (April 12).

Changes: An unnecessary public domain template was removed from some ITIS references, and the scope of some references was narrowed. Bob Webster (talk) 23:34, 12 April 2018 (UTC)
 * One thing I would like addressed is the edit summaries. It would be nice if your bot would use them instead of letting MW create those canned summaries that get cut off.  I can't think of a good one right now, but I'll leave that up to you.— CYBERPOWER  (Around ) 02:19, 13 April 2018 (UTC)
 * Good idea. That's done now, with these summaries:
 * Created page for the subfamily Ectyphinae
 * Created talk page: stub class, low importance
 * Created list page for the subfamily Ectyphinae
 * Created talk page: list class, low importance
 * Created template:Taxonomy/Ectyphinae


 * Bob Webster (talk) 04:07, 13 April 2018 (UTC)
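The summaries listed above follow a few fixed patterns, which could be generated along these lines. This Python sketch is hypothetical (the bot itself is written in VB); the function names and the default importance value are assumptions, while the output strings match the examples given.

```python
def page_summary(rank, name, is_list=False):
    """Summary for a newly created article or list page."""
    kind = "list page" if is_list else "page"
    return f"Created {kind} for the {rank} {name}"


def talk_summary(page_class, importance="low"):
    """Summary for a newly created talk page."""
    return f"Created talk page: {page_class} class, {importance} importance"


def taxonomy_template_summary(name):
    """Summary for a newly created taxonomy template."""
    return f"Created template:Taxonomy/{name}"


print(page_summary("subfamily", "Ectyphinae"))
print(talk_summary("stub"))
print(page_summary("subfamily", "Ectyphinae", is_list=True))
print(talk_summary("list"))
print(taxonomy_template_summary("Ectyphinae"))
```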


 * Alright. Final trial to test the edit summaries.  No throttle.— CYBERPOWER  ( Chat ) 17:15, 20 April 2018 (UTC)

All done, no problems. Bob Webster (talk) 20:15, 20 April 2018 (UTC)
 * — CYBERPOWER  ( Chat ) 20:58, 20 April 2018 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.