User:Aymatth2/Citing books

Recently an editor took the time to carefully remove "unnecessary" information from a list of source books at the end of an article I had started. This got me into a debate with that editor, some research, and then another debate on the talk page of WP:CITE. This essay gives some thoughts, in part based on those discussions.

Accuracy of books
Most books are good sources, more accurate than typical websites, but not always. I once found three serious books that said Pan American Petroleum was sold to Standard Oil of New Jersey in 1932, which seemed unlikely. It turned out the overseas assets were sold in 1932, but not the American parent company. Three books using the same source made the same mistake. A travel guide may give a good description of a town but an inaccurate history based on notes jotted down in a bar. Self-published eBooks are similar to blogs. A book may present the opinions of an author who is sure it is turtles all the way down. There could be typos. A book is like any source. If an editor uses reputable authors and avoids implausible assertions they will reduce but not eliminate errors.

ISBN
International Standard Book Numbers (ISBNs) were invented in 1965 and rapidly caught on. Almost every book printed since 1970 has an ISBN. Few booksellers will accept a book without one. In theory, a new ISBN is assigned for any new edition of a book apart from the most trivial corrections, so an ISBN could be used as a shorthand equivalent to a full description of the book. From the ISBN we could derive the title, author, year and publisher. Unfortunately, some publishers do not follow the rules. They treat ISBN as a product code, and as long as a new edition of the book is essentially the same product they keep the same ISBN. They may even use the same ISBN after they have made significant changes to the text, added a new introduction (which may be what you are citing), changed the font size and renumbered the pages.

ISBNs still have value. They provide links to the special "Book sources" page in Wikipedia, which in turn links to Google, Amazon, Worldcat, Goodreads and other indexes. The indexes may describe different editions from the one being cited, but they allow the reader to check whether the book can be purchased online, to see if it is available in their local library and to see what other readers have said about it. If the book has an ISBN it should be included in the citation.
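A minimal sketch of how an editor might check a typed ISBN and build the "Book sources" link before adding it to a citation. The function names are hypothetical, the URL pattern assumes the English Wikipedia, and the ISBN used below is only an example number. The check-digit test catches typing errors only; it says nothing about whether the ISBN identifies the edition being cited.

```python
def normalize_isbn(raw):
    """Drop hyphens and spaces, keeping only digits and a possible 'X'."""
    return "".join(ch for ch in raw.upper() if ch in "0123456789X")


def is_valid_isbn(raw):
    """Return True if the check digit of an ISBN-10 or ISBN-13 is consistent."""
    isbn = normalize_isbn(raw)
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ... and the sum is a multiple of 10.
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
        return total % 10 == 0
    if len(isbn) == 10 and isbn[:9].isdigit():
        # ISBN-10: weights run 10 down to 1; a final 'X' stands for the value 10.
        values = [int(d) for d in isbn[:9]]
        values.append(10 if isbn[9] == "X" else int(isbn[9]))
        total = sum(v * (10 - i) for i, v in enumerate(values))
        return total % 11 == 0
    return False


def book_sources_url(raw):
    """Build a link to the "Book sources" page (English Wikipedia assumed)."""
    return "https://en.wikipedia.org/wiki/Special:BookSources/" + normalize_isbn(raw)


if __name__ == "__main__":
    print(is_valid_isbn("978-0-306-40615-7"))
    print(book_sources_url("978-0-306-40615-7"))
```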

Metadata
When you find a book on the Google or Amazon.com website, the description will include "metadata": title, author, year, publisher, number of pages, ISBN and so on. It is not reliable. Amazon may create an entry before the physical book is available and may guess the number of pages; 128 and 256 are common guesses. Amazon and Google may mix metadata from an early version of a book with scanned pages from a later edition, or vice versa. Google often garbles diacritics, and it is not alone: Internet Archive has a scanned version catalogued as Les Illustrations et les câelâebritâes du XIX siáecle, volume 1, 1882.

If an editor has found a full view page in Google Books but cannot view the page at the front that holds the publishing information, they may use "Book sources" to confirm that the metadata is plausible. There is no guarantee that it matches the edition they are citing. Despite all this, the metadata is probably not wildly inaccurate. It should be provided. With an online link to the scanned page the editor is saying "here is where I got the information, and here is what Google (or Amazon) says the source is." That is better than not giving any metadata.

However, the metadata may be completely wrong. In one extreme example of a mix-up in a scanned book, a chapter called The Louvre and its public persona, 1848-52 by Gabriel P. Weisberg has somehow found its way into the scanned version of a different work.

With compilations and encyclopedias the author of a chapter or entry may not be among the authors listed in the metadata. If only as a courtesy, you should give the title of the chapter or entry you are citing, and try to track down the author. With a compilation, the author may be given at the start or end of the chapter or may be found in the table of contents. With an encyclopedia entry, the author's initial may be given after the entry, with a section at the start or end of the encyclopedia listing the author for each set of initials. If you cannot find the name, at least give the initials.

Online links
Some would say that the printed book is being cited, and as long as it is correctly identified there is no need to provide an online link, or url. That may be true if you are looking at the printed version and not an online scan or transcription, but the url is convenient for readers who want to see what the book said. If you are looking at an online version, the url points to a copy of the edition that is being cited, even though the metadata may not be accurate. "TronicBooks.com says this is the June 2006 edition. Whether or not that is accurate, here is the page I saw." A link may not work for some users or in some countries, but that is no reason to withhold it from all users. As with the ISBN, it should be supplied if available.

A url is fragile. Websites often restructure their content. The old urls no longer work or, worse, point to something completely different. Occasionally Google will mix the metadata from one book with scanned pages from another. The result of a correction may be confusing. For this reason, I provide an access date with any url, saying when I found the online version, not just where. It may be argued that sites like Amazon or Google are not archived, and perhaps cannot be archived for copyright reasons, so the access date will not help if the url goes dead. But I like to think there is a huge server farm hidden somewhere that holds the entire web and all its history. It is better to say when and where you saw the online page than to omit information and frustrate some future historian of the primitive internet.
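The practice of never giving a url without an access date can be sketched as a small helper that assembles a citation and refuses a url that has no date attached. This is an illustration only: the function name is hypothetical, the output assumes the parameter names of Wikipedia's {{cite book}} template, and the author, title and url in the example are placeholders, not a real book.

```python
from datetime import date


def cite_book(title, last=None, first=None, year=None,
              publisher=None, isbn=None, url=None, access_date=None):
    """Assemble a {{cite book}} string from whatever information is known."""
    if url is not None and access_date is None:
        # Say when the online version was seen, not just where.
        raise ValueError("a url needs an access date")
    fields = [
        ("last", last),
        ("first", first),
        ("title", title),
        ("year", year),
        ("publisher", publisher),
        ("isbn", isbn),
        ("url", url),
    ]
    if url is not None:
        fields.append(("access-date", access_date.isoformat()))
    # Unknown fields are simply left out rather than guessed.
    body = " |".join(f"{name}={value}" for name, value in fields if value is not None)
    return "{{cite book |" + body + "}}"


if __name__ == "__main__":
    print(cite_book(
        title="An Example Title",              # placeholder title
        last="Smith",
        year=2012,
        url="https://example.org/books/smith2012",  # placeholder url
        access_date=date(2015, 3, 1),
    ))
```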

Transcribed and digital books
The French National Assembly provides a scanned version of Robert & Cougny (1891) Dictionnaire des parlementaires français de 1789 à 1889 with transcribed versions of many of the more recent entries, such as Pierre Magne. The Bavarian Academy of Sciences provides transcribed versions of over 48,000 historical and biographical articles in the Allgemeine Deutsche Biographie and the Neue Deutsche Biographie, such as Rupp, Julius Friedrich Leopold. Transcription errors may have compounded errors in the original text, but a digital version of an entry in a reference book may contain corrections planned for the next printed version – if there ever is another printed version. The online Dictionary of Canadian Biography contains entries that have never been printed. You are citing the online version of the book, not the printed book, which may not be the same. Give the url and accessdate so the reader or reviewer can see what you were looking at.

Google Books may display a paginated version of an eBook rather than a scanned version of the printed book. The url then contains &pg=PTnn rather than &pg=PAnn (all parameters after this can and should be dropped). Thus https://books.google.ca/books?id=aPV1johx8NMC&pg=PT518 rather than https://books.google.ca/books?id=S0TLPvG_PwYC&pg=PA174. It is misleading to treat the PTnn value as the page number. If there is a printed version associated with the ISBN, it will have different pagination. Instead, the location in the eBook can be shown as PT24, which gives a footnote like "5. ^ Smith 2012, PT24." Assuming the url and accessdate are given in the source definition, this accurately defines the location in the book you are citing. If you want to cite more than one page in an eBook, you have a problem. Google will give a PTnn value for one of the pages that holds your search term. It may let you view other pages that hold your search term, but will not show where they are in the book. The only solution seems to be to copy a string of text from the second page you want to cite, then search for that string of text in Google Books. With luck, you will find the book again, with the PTnn value of the second page. This is not a very satisfactory solution. Any views on better ways to identify a location in an eBook would be welcome.
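The url trimming and the PAnn/PTnn distinction can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the sketch keeps just the id and pg parameters (one reading of "drop everything after pg"), and the "p. 174" format for a printed page is a choice made here, not a rule.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def trim_google_books_url(url):
    """Keep only the id and pg parameters of a Google Books url."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    kept = {key: query[key][0] for key in ("id", "pg") if key in query}
    # Rebuild the url without the other parameters and without any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def citation_location(url):
    """Describe the cited location: a page for PAnn, the raw value for PTnn."""
    pg = parse_qs(urlsplit(url).query).get("pg", [""])[0]
    match = re.fullmatch(r"(PA|PT)(\d+)", pg)
    if match is None:
        return None  # no pg value, or a form this sketch does not handle
    kind, number = match.groups()
    if kind == "PA":
        return "p. " + number  # scanned printed book: a real page number
    return "PT" + number  # eBook view: not a page number, so give it as is


if __name__ == "__main__":
    long_url = "https://books.google.ca/books?id=aPV1johx8NMC&pg=PT518&dq=turtles&hl=en"
    print(trim_google_books_url(long_url))
    print(citation_location(long_url))
```

The sketch does nothing for the harder problem of finding the PTnn value of a second page; that still needs the text search described above.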

Bottom line
Books may contain errors. ISBN does not always accurately identify a book edition. Metadata supplied by online sites is often incorrect. Online links do not always work. Digital books present new issues. The best we can do is give all the information we have. "Here is what I found, when and where I found it, and who apparently wrote and published it. You can judge whether to accept it or not."