User talk:Jakob.scholbach/Archives/2007/September

Online sources for references
(copied from my talk page)

Hi KSmrq,

as you are apparently interested in correct citing of references, I have a question for you. I'm still working on zeteo, a database for references. I'm adding a feature which gives a list of websites where a user could find more appropriate information/the full article etc. What are notable online sources for references you are using? (Currently I have books.google.com, digizeitschriften.de, archive.org). Thanks. (By the way, it really takes an eternity to load your talk page.......) Jakob.scholbach 21:26, 12 September 2007 (UTC)


 * (I'm moving the conversation here to reduce the eternity.) We have a page full of reference resources for mathematics editors at WikiProject Mathematics/Reference resources. Almost everything I use I have listed there. In fact, your tool is also listed.
 * You didn't ask, but …: One of the things I still deem essential for zeteo is certification. I go to a great deal of trouble to get the details right in my citations; most writers do not. I am unwilling to lower my standards for convenience. In others' writing I often find incorrect page numbers, incorrect dates, mysterious (to me) abbreviations, misspelled names, and so on. And the data given is usually much less complete than I can accept: missing book ISBNs and journal ISSNs, missing links to online copies, and much more. We do no one a service if we repeat incomplete and wrong information, so we need to know the source of each entry and/or who has vouched for it. And, as I proposed some time ago, certification must be associated with a specific version. Too often I have seen mistakes introduced by editors who did not know better; I cannot trust a certification that does not specify a version. --KSmrqT 21:49, 12 September 2007 (UTC)


 * OK. I know what you are talking about. Critically scanning references in some dozens of articles, I found that lots of them are either incorrect or incomplete. That is certainly an issue. I think I will work on a version history for each reference entry, together with a checkbox to say "verified content". Introducing another field, "source", is certainly no big deal either. User management, however, is yet another burden for me to get done. I'm definitely committed to doing everything to make this database successful, but at the moment the response from the Wikiproject community is not too overwhelming. So I'm a bit hesitant to set up a database with every fancy feature that ends up being used by very few people.


 * In practice, it is pretty easy to get complete and correct information: just go to MathSciNet or Zentralblatt and parse the BibTeX entry given there. They are usually in good shape. However, it seems that doing this on a bigger scale is not allowed. But, at least, everybody in the project can provide his/her personal BibTeX files and we can add this information. Another option is the ISBN database. It usually provides correct answers. Given the splitting of reference information into reference details, author, publisher, and journal details, I actually think that inconsistencies can be ruled out effectively.


 * I'm currently waiting for CBM's help to extract all the template information from the math articles. Then, these refs probably need a careful scanning with respect to correctness etc. Adding the corrected/completed items to the db is the next step. Once the db contains some thousands of items, especially basic books which are cited often, the db will give the standard editor at least the benefit of reducing the stupid and tedious work of hacking the ref templates into the articles. Jakob.scholbach 13:32, 15 September 2007 (UTC)
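The idea of importing personal BibTeX files could be sketched roughly as follows. This is a minimal illustration, not zeteo's actual import code: the function name `parse_bibtex_entry` and the sample entry are hypothetical, and the parser only handles a single entry whose values are quoted, purely numeric, or braced with at most one level of nested braces.

```python
import re

# "@type{key," at the start of an entry.
ENTRY_RE = re.compile(r'@(\w+)\s*\{\s*([^,\s]+)\s*,')

# "name = {value}", "name = "value"" or "name = 1234".
# Braced values may contain one level of nested braces, e.g. {The {K}-theory}.
FIELD_RE = re.compile(
    r'(\w+)\s*=\s*(?:\{((?:[^{}]|\{[^{}]*\})*)\}|"([^"]*)"|(\d+))'
)


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a dict (hypothetical helper)."""
    head = ENTRY_RE.search(text)
    if head is None:
        raise ValueError("no BibTeX entry found")
    entry = {"type": head.group(1).lower(), "key": head.group(2)}
    for match in FIELD_RE.finditer(text, head.end()):
        name = match.group(1).lower()
        # Exactly one of the three value groups is set for each match.
        value = next(g for g in match.groups()[1:] if g is not None)
        # Collapse line breaks and repeated spaces inside the value.
        entry[name] = " ".join(value.split())
    return entry


# Placeholder data, not a real reference.
sample = """
@book{doe2007,
  author    = {Doe, Jane},
  title     = {An {E}xample
               Title},
  publisher = "Example Press",
  year      = 2007,
}
"""
entry = parse_bibtex_entry(sample)
```

A real import would more likely rely on an existing BibTeX parser, since string macros, `#` concatenation and deeper brace nesting are not covered here.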

Now I have at hand, thanks to Carl's help, the reference templates pulled out of the math articles. The total number is about 7,000. To assess the quality of the data, I randomly chose 20 items, which is of course not in any sense statistically significant, but a good heuristic. I disregarded web citations, as these won't be stored by the DB. I checked the information using scholar.google, books.google and isbndb.com. Here are the results:
 * 20 items, of which 11 books, 8 journals, one thesis.
 * 3 items contained one wrong/mal-formatted field (twice the year, once the journal name).
 * 3 books lacked an ISBN, 3 journals lacked the issue information.
 * the average item contains 6.3 correct fields (not counting firstname/name of one person twice).


 * ISBN hyphenation is generally well done. ISSN numbers don't appear at all (in journal citations).
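Part of such a check on the ISBN field can be automated with the standard check-digit rules. The sketch below is only an illustration (the function name `is_valid_isbn` is an assumption, not part of zeteo): it tests the ISBN-10 and ISBN-13 checksums and ignores hyphens, so it says nothing about whether the hyphenation itself is correct or whether the number belongs to the cited book.

```python
DIGITS = "0123456789"


def is_valid_isbn(isbn: str) -> bool:
    """Return True if the ISBN-10 or ISBN-13 check digit is consistent."""
    chars = [c for c in isbn.upper() if c not in "- "]

    if len(chars) == 10:
        body, check = chars[:9], chars[9]
        if not all(c in DIGITS for c in body):
            return False
        if check not in DIGITS and check != "X":
            return False
        # ISBN-10: weights 10, 9, ..., 1; "X" stands for 10; sum divisible by 11.
        values = [int(c) for c in body] + [10 if check == "X" else int(check)]
        total = sum(w * v for w, v in zip(range(10, 0, -1), values))
        return total % 11 == 0

    if len(chars) == 13:
        if not all(c in DIGITS for c in chars):
            return False
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; sum divisible by 10.
        total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(chars))
        return total % 10 == 0

    return False
```

For example, `is_valid_isbn("0-306-40615-2")` and `is_valid_isbn("978-0-306-40615-7")` both pass, while changing the last digit of either makes the check fail.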

Overall, I come to the conclusion that your (and also my) apprehensions are not exactly fulfilled. Overall the data is pretty decent. One of the reasons might be that less diligent editors don't use the templates at all and just hack in a plain "* Author, title" entry.

A few of the items were really perfect (including URL or DOI), most were effectively OK. None of the items was misleading or junk. So replying to your "We do no one a service if we repeat incomplete and wrong information", I'm sure we won't end up repeating wrong information. In a few cases we will have minor incompleteness. I will add a field "source" to the database. When preparing the items, I will make sure that the source field contains the name of the article the information came from. Additionally, I will add a "verification" field. Perhaps I will do something like "not verified", "partially verified", "fully verified". I would grant these items the status "partially verified", bearing in mind that the articles were exposed to people reading them and also bearing in mind the little statistics above, which convince me of the decent status of the stuff.

So, I will proceed to get the items in good format. If you have any other ideas how to make the outcome as good as possible, please let me know. I think striving for perfection (wrt the content and the functionality of the DB) is good and necessary, but which endeavour in history was already perfect from the very start? Jakob.scholbach 10:44, 16 September 2007 (UTC)


 * An easy but not watertight solution instead of user management is to add a "comments" field. For instance, if it says "verified by KSmrq" then I can be confident that it's correct. There are probably more situations in which such a field can be useful. -- Jitse Niesen (talk) 01:45, 17 September 2007 (UTC)


 * Yes, I have done this for now. Version history is to come, hopefully soon. There is also a "verification" field, which allows one to choose "not verified" (default), "verified" or "verified and complete". Jakob.scholbach 13:36, 22 September 2007 (UTC)
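The fields discussed in this thread ("source", "comments", "verification") together with a version history could be modelled roughly as below. This is a sketch under stated assumptions, not zeteo's actual schema: the class and method names are hypothetical, and only the three verification values are taken from the comment above. The point it illustrates is that a verification mark is attached to one specific version, so a later edit starts out as "not verified" again.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Dict, List


class Verification(Enum):
    NOT_VERIFIED = "not verified"  # default
    VERIFIED = "verified"
    VERIFIED_AND_COMPLETE = "verified and complete"


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of a reference entry (hypothetical layout)."""
    fields: Dict[str, str]   # e.g. {"author": ..., "title": ..., "year": ...}
    source: str              # e.g. name of the article the data came from
    comments: str = ""       # free text, e.g. "verified by KSmrq"
    verification: Verification = Verification.NOT_VERIFIED


@dataclass
class ReferenceEntry:
    versions: List[Version] = field(default_factory=list)

    def current(self) -> Version:
        return self.versions[-1]

    def edit(self, fields: Dict[str, str], source: str, comments: str = "") -> int:
        """Store a new version; it always starts out as not verified."""
        self.versions.append(Version(dict(fields), source, comments))
        return len(self.versions) - 1

    def verify(self, version_no: int, status: Verification, comments: str) -> None:
        """Attach a verification mark to one specific version only."""
        old = self.versions[version_no]
        self.versions[version_no] = replace(old, verification=status, comments=comments)


# Placeholder data, not a real reference.
ref = ReferenceEntry()
first = ref.edit({"author": "Doe, Jane", "title": "An Example Title"}, source="Example article")
ref.verify(first, Verification.VERIFIED, "verified by KSmrq")
second = ref.edit({"author": "Doe, Jane", "title": "An Example Title", "year": "2007"},
                  source="Example article")
```

After the second edit, version 0 still carries the "verified" mark while the current version is back to "not verified", which is the behaviour asked for earlier in the thread.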

Harvard refs
For the links in Harvard refs to work properly you need to add the year as well, as in. R.e.b. 02:03, 25 September 2007 (UTC)
 * OK. Jakob.scholbach 07:50, 25 September 2007 (UTC)