Wikipedia:Bots/Requests for approval/RjwilmsiBot 6


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

RjwilmsiBot 6
Operator:

Time filed: 20:14, Wednesday January 5, 2011 (UTC)

Automatic or Manually assisted: Automatic

Programming language(s): AWB, C#

Source code available: AWB

Function overview: Reformat 10-digit ISBNs, and 13-digit ISBNs without dashes, to use the dashed 13-digit ISBN format per WP:ISBN.

Links to relevant discussions (where appropriate): WP:ISBN

Edit period(s): On release of database dump

Estimated number of pages affected: Thousands, TBC.

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): Y

Function details: For each ISBN (whether as isbn in a citation template, or as raw ISBN using WikiMagic), if the ISBN is a 10-digit version, or the 13-digit version with no dashes, convert it to the 13-digit version with dashes. This is per WP:ISBN: the dashes convey information and ISBN-13 is now preferred. I will use a database dump to get all the ISBNs used in namespace 0 of the English Wikipedia, then use this service to convert ISBN-10 to ISBN-13 in bulk. I will then use the Worldcat API to have the ISBN-13 formatted with dashes in the right place.

I do not propose to update existing ISBN-13 entries that already have dashes; I'll assume they're already correct. Invalid ISBN-10s will be rejected by the bulk tool, and invalid ISBN-13s will be rejected by the Worldcat API, so I can be sure to only update correct ISBNs.

I'll maintain a full log of original ISBN and the one it's replaced with.

An example of the operation would be diff, though I did that one manually.
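
The conversion step described under Function details can be sketched in a few lines. This is only a minimal illustration of the standard check-digit arithmetic, not the bulk service or the AWB code the bot actually uses; the function names (`clean`, `is_valid_isbn10`, `isbn13_check_digit`, `isbn10_to_isbn13`) are hypothetical. Invalid input is skipped by returning `None`, mirroring the "only update correct ISBNs" rule, and dash placement is not handled here.

```python
import re


def clean(raw):
    """Remove dashes and whitespace and upper-case a trailing 'x'."""
    return re.sub(r"[\s-]", "", raw).upper()


def is_valid_isbn10(isbn):
    """ISBN-10 check: weights 10..1, the sum must be divisible by 11."""
    s = clean(isbn)
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(s))
    return total % 11 == 0


def isbn13_check_digit(first12):
    """ISBN-13 check digit: alternating weights 1 and 3, modulo 10."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10):
    """Return the undashed ISBN-13, or None if the ISBN-10 is invalid."""
    if not is_valid_isbn10(isbn10):
        return None
    core = "978" + clean(isbn10)[:9]  # drop the old check digit
    return core + isbn13_check_digit(core)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
print(isbn10_to_isbn13("0-306-40615-3"))  # None (bad check digit)
```

Note that the ISBN-10 check digit is discarded and a new one computed, which is why the last digit usually differs between the two forms.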

Discussion
Fully support this task. -- Magioladitis (talk) 20:37, 5 January 2011 (UTC)

Is ISBN-10 deprecated and should it necessarily be replaced with ISBN-13? What's the rationale here? I don't see WP:ISBN mentioning this. — HELL KNOWZ  ▎TALK 20:46, 5 January 2011 (UTC)
 * ISBN-13 contains the info of ISBN-10 anyway. ISBN-13 gives more information. -- Magioladitis (talk) 21:12, 5 January 2011 (UTC)
 * Besides EAN? Is using ISBN-10 on WP ambiguous? Do 10 and 13 (minus EAN and checkdigit obv.) ever not match? — HELL KNOWZ  ▎TALK 21:24, 5 January 2011 (UTC)
 * ISBN-10 is deprecated in that ISBN-13 has been introduced as the new ISO standard and new books use that. Also most websites (check Amazon for example) now dual list ISBN-10 and ISBN-13.
 * As described at ISBN, if the 9 common digits didn't match in the ISBN-10 and ISBN-13 one or both would be invalid. The two conversion tools I would be using would both be checking for invalid ISBNs, so I would not change them.
 * The addition of the appropriate dashes adds information, and complies with the ISBN user manual (referenced in the ISBN article). I suppose we could convert from ISBN-10 with no dashes to ISBN-10 with the correct dashes, but if doing that it would seem sensible to update to ISBN-13 at the same time, per recommendation at WP:ISBN and using the current standard. Rjwilmsi  22:01, 5 January 2011 (UTC)
 * It is O.K. with me as a minor edit. ISBN-10 is slowly being deprecated (though fully supported), ISBN-13 is the new form, and dashes give better separation of units (i.e. publisher, title, etc.). But I still don't see any actual problems that would happen if we used ISBN-10 or omitted dashes -- i.e. WP:NOTBROKEN (yet anyway). I guess we can call this "standardization of ISBN syntax". Still, the change is "cosmetic", as the actual output change is minimal and the links still point to the same source.
 * That said, . But I am hesitant to approve this as a stand-alone task just yet. — HELL KNOWZ  ▎TALK 22:36, 5 January 2011 (UTC)
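
The point made in this thread, that a valid ISBN-10 and its ISBN-13 must share the same nine core digits, can be checked mechanically. The sketch below is an assumption-laden illustration (the name `same_book` and the helpers are hypothetical, and it only considers the 978 prefix, since that is the prefix under which ISBN-10s map to ISBN-13s):

```python
def isbn10_check_char(first9):
    """Check character for nine ISBN-10 digits (weights 10..2, modulo 11)."""
    total = sum((10 - i) * int(ch) for i, ch in enumerate(first9))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def isbn13_check_char(first12):
    """Check digit for twelve ISBN-13 digits (weights 1,3,1,3,..., modulo 10)."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def same_book(isbn10, isbn13):
    """True only if both numbers are valid and share the nine core digits."""
    a = isbn10.replace("-", "").replace(" ", "").upper()
    b = isbn13.replace("-", "").replace(" ", "")
    if len(a) != 10 or len(b) != 13:
        return False
    if not a[:9].isdigit() or not b.isdigit():
        return False
    if a[9] != isbn10_check_char(a[:9]):
        return False  # invalid ISBN-10
    if b[12] != isbn13_check_char(b[:12]):
        return False  # invalid ISBN-13
    return b.startswith("978") and b[3:12] == a[:9]


print(same_book("0-306-40615-2", "978-0-306-40615-7"))  # True
print(same_book("0-306-40615-2", "978-0-8044-2957-3"))  # False
```

A mismatch therefore means either that one of the numbers is invalid or that they identify different books, and in both cases leaving the text unchanged is the safe choice.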

Some questions. 1) Has the hyphenation been defined for the entire 979 grouping of ISBN? If not, what does your bot do with it? 2) What does your bot do with invalid lengths, checksums and X in the wrong place? 3) Are instances of the text ISBN-10 and ISBN-13 changed to ISBN by the bot? Naraht (talk) 00:20, 6 January 2011 (UTC)
 * 1) Hyphenation depends on a lookup table, hence the use of the worldcat API to do that for me. If it returns an unhyphenated ISBN I'll skip it. 2) All will be skipped since one or both of the tools I'll be using will mark them as invalid (I could log them but WP:CHECKWIKI already seems to list such errors). 3) I was not planning to do that, but could do that in addition. Rjwilmsi  00:29, 6 January 2011 (UTC)
 * On reflection I think some editors may object to ISBN-10 to ISBN-13 conversion if the physical copy they own only lists the former, even though the two numbers are equivalent, so for the moment I've not done the ISBN-10 to ISBN-13 conversion (I will post on the talk page of WP:ISBN what is meant by the ISBN-13 being available, since any ISBN-10 can be converted). So I have done the trial on dashes formatting without ISBN-10 to ISBN-13 conversion. We could approve dashes without ISBN-13 conversion, or both dashes and ISBN-13 conversion. And yes, this is mainly cosmetic, but it is to comply with an agreed standard, and I have approval for an earlier task of page range dash formatting, which is also cosmetic but useful. Rjwilmsi  12:56, 9 January 2011 (UTC)
 * Ah, I see, so both ISBN10 and ISBN13 dash positions matter? I always thought the dashes in ISBN10 were always at the same location (guess I just happened to have books like that). WP:ISBN really needs to be clearer on this. Anyway, I suppose in this sense, this is a contextual change and not just cosmetic. — HELL KNOWZ  ▎TALK 17:43, 9 January 2011 (UTC)
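
To illustrate why hyphenation "depends on a lookup table" and why dash positions vary, here is a rough sketch of table-driven hyphenation. It does not reproduce the Worldcat API or the official range data: `TOY_RANGES` is a hypothetical, heavily simplified excerpt for one prefix and group, invented for illustration, and `hyphenate` is an assumed name. Anything outside the table is skipped by returning `None`, in the spirit of "if it returns an unhyphenated ISBN I'll skip it". Check digits are not validated here.

```python
# Hypothetical, simplified range table (illustration only, NOT official data).
# Key: (prefix, registration group). Value: (low, high) registrant ranges;
# the length of the bounds is the length of the registrant element.
TOY_RANGES = {
    ("978", "0"): [("00", "19"), ("200", "699"), ("7000", "8499")],
}


def hyphenate(isbn13):
    """Return a dashed ISBN-13, or None if the toy table cannot place it."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return None
    for (prefix, group), ranges in TOY_RANGES.items():
        head = prefix + group
        if not digits.startswith(head):
            continue
        rest = digits[len(head):12]  # registrant + publication digits
        for low, high in ranges:
            n = len(low)
            if low <= rest[:n] <= high:
                return "-".join(
                    [prefix, group, rest[:n], rest[n:], digits[12]]
                )
    return None  # unknown group or range: leave the ISBN alone


print(hyphenate("9780306406157"))  # 978-0-306-40615-7
print(hyphenate("9783161484100"))  # None (group not in the toy table)
```

Because the registrant element can be two, three, four or more digits long depending on the range, the dashes do not fall at fixed positions, which is the information the bot's formatting is meant to preserve.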

 MBisanz  talk 10:03, 28 January 2011 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.