Wikipedia:Bots/Requests for approval/CobraBot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

CobraBot 2
Operator: Cybercobra

Automatic or Manually assisted: Automatic

Programming language(s): Python (PyWikipedia)

Source code available: Yes (originally listed as forthcoming while still being written)

Function overview: Adds Dewey Decimal Classification and Library of Congress Classification data to Infobox book instances based on ISBNs.

Edit period(s): Multiple runs as personal time permits; once initial sweep complete, periodic infrequent re-runs (e.g. quarterly)

Estimated number of pages affected: 11K articles or within that order of magnitude, based on previous task

Exclusion compliant (Y/N): Y (via PyWikipedia defaults)

Already has a bot flag (Y/N): Y

Function details: Essentially the same as task #1, just with different, new fields (that don't link and never have).


 * 1) Bot chooses an article that transcludes Template:Infobox Book
 * 2) Bot locates the template in the article
 * 3) Bot checks if |congress= and/or |dewey= parameters are present
 * 4) If both are present and both values are non-whitespace, the page is skipped (since the data is already present). GOTO step 1.
 * 5) If yes, and value(s) is/are whitespace, parameter(s) removed.
 * 6) If no, continue to step 7.
 * 7) Bot grabs |isbn= parameter
 * 8) If parameter not present, page is skipped (No ISBN to use for data lookup). GOTO step 1.
 * 9) The value of the parameter is obtained, extra preceding "ISBN" text or dashes are stripped from the obtained value
 * 10) If the value is "N/A" or similar, page is skipped (No useful ISBN to use for data lookup). GOTO step 1.
 * 11) Using a proprietary process, the corresponding Dewey Decimal number and Library of Congress Classifications are found for the given ISBN.
 * 12) The data is added to the infobox body using the aforementioned parameters
 * 13) Page changes are saved.
 * 14) GOTO 1 until all pages either processed or skipped.
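
The per-page decision in steps 3 to 12 can be sketched roughly as follows. This is a minimal illustration, not the bot's actual source: the function names, the `lookup` callable (standing in for the undisclosed lookup in step 11), the exact rules for what counts as an unusable ISBN, and the choices to keep an existing non-blank value and to skip the page when nothing new is found are all assumptions. Template parsing and page saving (steps 1, 2 and 13) are left out.

```python
import re

TARGET_PARAMS = ("dewey", "congress")


def normalize_isbn(raw):
    """Steps 9-10: strip a leading 'ISBN' label, dashes and spaces.

    Returns None when the value is not usable for a lookup. Treating
    anything that is not 10 or 13 characters long as unusable is an
    assumption about what '"N/A" or similar' means.
    """
    value = re.sub(r"^isbn[\s:]*", "", raw.strip(), flags=re.IGNORECASE)
    value = value.replace("-", "").replace(" ", "")
    if re.fullmatch(r"\d{9}[\dXx]|\d{13}", value):
        return value
    return None


def plan_edit(params, lookup):
    """Return the updated infobox parameters, or None to skip the page.

    `params` maps parameter names to raw values; `lookup` is a
    hypothetical callable mapping an ISBN to a dict that may contain
    'dewey' and 'congress'.
    """
    params = dict(params)  # work on a copy
    present = [name for name in TARGET_PARAMS if name in params]

    # Step 4: both parameters already carry data, nothing to do.
    if len(present) == 2 and all(params[name].strip() for name in present):
        return None

    # Step 5: drop parameters that are present but blank.
    for name in present:
        if not params[name].strip():
            del params[name]

    # Steps 7-10: find and clean the ISBN.
    if "isbn" not in params:
        return None
    isbn = normalize_isbn(params["isbn"])
    if isbn is None:
        return None

    # Steps 11-12: look up the classifications and fill in what is missing.
    # Assumption: an existing non-blank value is never overwritten.
    found = lookup(isbn)
    added = False
    for name in TARGET_PARAMS:
        if name not in params and found.get(name):
            params[name] = found[name]
            added = True

    # Assumption: the page is skipped when the lookup adds nothing.
    return params if added else None
```

A caller would pass the parsed infobox parameters and its own lookup function, then save the page only when the result is not None.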

Discussion

 * Should have code ready in next few days. --Cyber cobra (talk) 18:31, 14 October 2009 (UTC)
 * And done! --Cyber cobra (talk) 05:22, 18 October 2009 (UTC)
 * Both systems are US-centric, although I think Dewey is also used in British books; and it would, imo, make the book infoboxes crowded. At the very least, concerned editors at WikiProject_Books and WikiProject Novels may want to discuss the results to the infobox book after the change. I see you did post at Template talk:Infobox book, but I don't think this is enough. One of the posters there did raise the issue that the BRFA for CobraBot 1 (adding OCLC# parameter to the info box) only discussed the technical merits of adding the bot and did not raise any issues about the community consensus for the task. It seems to have been requested, approved for trial, and approved as a bot in under 2 days.
 * I would like to see more wide-spread community consensus for this task before it is approved. --69.225.5.183 (talk) 07:05, 18 October 2009 (UTC)
 * I concur regarding the duration of the discussion - I would like this one to stay open for at least a week (starting now) so that enough input and notice is given and gathered. However, this is much less controversial than the OCLC# one as no linking is involved, and in fact one of the two primary detractors of the OCLC task approves of this one.
 * Regarding WikiProject Books, I already left a notice there and also posted there when the template change was being discussed. The project seems rather inactive judging by the lack of response. I'd be happy to notify the Novels folks too however. --Cyber cobra (talk) 07:36, 18 October 2009 (UTC)
 * And they've now been notified, and I added details to the WP Books posting too. --Cyber cobra (talk) 07:47, 18 October 2009 (UTC)
 * Thanks. I think that is a good starting place for discussion, well, the Infobox book page was probably most important, but alerting other potentially interested editors is important. Yes, this task, other than its US-centric issue, is less controversial in general than the OCLC task which had no discussion, imo.
 * Also, I think the original CobraBot should be reconsidered. --69.225.5.183 (talk) 08:38, 18 October 2009 (UTC)
 * That is rather unrelated to this BRFA and is a separate, orthogonal issue. --Cyber cobra (talk) 05:51, 24 October 2009 (UTC)
 * I support CobraBot2. Infoboxes are stuffed over on the right of a page, and do not interfere with the flow of an article. There is therefore plenty of room for the parameters dewey and congress to be populated by the bot and displayed. HairyWombat (talk) 22:12, 24 October 2009 (UTC)

BAG assistance needed. --Cyber cobra (talk) 02:24, 25 October 2009 (UTC)
 * Ok. Anomie⚔ 02:31, 25 October 2009 (UTC)
 * Interesting choice of magic number. 37 edits for examination. --Cyber cobra (talk) 08:05, 25 October 2009 (UTC)
 * I got tired of round numbers. The edits look generally good, although I haven't even tried to check whether the numbers are correct or not. I do wonder, however, whether "[Fic]" is really a useful value in Starship Troopers. Anomie⚔ 22:46, 25 October 2009 (UTC)
 * Yep, fiction has no Dewey number. I guess only non-fiction should be populated. HairyWombat (talk) 03:41, 26 October 2009 (UTC)
 * (tsk-tsk) Read our article on Dewey: "It is a common misconception that all books in the DDC are non-fiction. The DDC has a number for all books, including fiction: American fiction is classified in 813. Most libraries create a separate fiction section to allow shelving in a more generalized fashion than Dewey provides for, or to avoid the space that would be taken up in the 800s." Apparently the database I'm pulling from also does this for some works of fiction. I've now edited the code to screen out "[Fic]" for Dewey as it is indeed not very helpful. If anyone knows a better data source for Dewey, let me know. --Cyber cobra (talk) 04:08, 26 October 2009 (UTC)
 * My mistake. I should have said that fiction has no useful Dewey number. It is no accident that libraries choose to classify fiction using schemes other than Dewey. Classifying the whole of American fiction as "813" is only slightly better than "[Fic]". I still suggest that the dewey parameter only be populated for non-fiction. HairyWombat (talk) 21:12, 26 October 2009 (UTC)
 * Eh, it does subcategorize it somewhat further than that and at least in theory some libraries are actually using it. And it avoids bias towards LC Classification. But if people want fiction screened out from Dewey, I could implement that. --Cyber cobra (talk) 22:35, 26 October 2009 (UTC)
 * The library at the university I attended used the Dewey numbers for fiction, so at least one library is using it. IMO, just "813" would be useless but the full number would be worth including. Anomie⚔ 22:49, 26 October 2009 (UTC)
 * Fortunately the bot gets the full number (when it's not listed as just [Fic]) --Cyber cobra (talk) 23:47, 26 October 2009 (UTC)
 * My guess is Anomie's university library didn't have much fiction, and so Dewey worked for them. However, including fiction does no damage—my whole argument has been that there is plenty of room in the Infobox—so let's include fiction. All the bot now has to do is screen out the "[Fic]"s. Do we have a consensus? Can we now unleash the beast? HairyWombat (talk) 02:18, 27 October 2009 (UTC)
 * I couldn't say how many of the books were fiction, but I did find out that there were 4.6 million books in the main library as of a few years ago. Anomie⚔ 03:57, 27 October 2009 (UTC)
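
The screening settled on in the thread above (keep full Dewey numbers, including those for fiction, but drop the bare "[Fic]" placeholder) might look roughly like the sketch below. The function name is hypothetical, and the case-insensitive comparison is an assumption rather than something stated in the discussion.

```python
def clean_dewey(value):
    """Return a usable Dewey value, or None for blanks and the '[Fic]' placeholder."""
    if value is None:
        return None
    value = value.strip()
    # Assumption: the placeholder is matched regardless of letter case.
    if not value or value.casefold() == "[fic]":
        return None
    return value
```

Returning None lets the caller treat a screened-out value the same way as a lookup that found no Dewey number at all.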

No one seems to have objected to the trial this time. Anomie⚔ 11:27, 28 October 2009 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.