User:Jakob.scholbach/zeteo/Parsing

BibTeX files (e.g. from MathSciNet or Zentralblatt MATH)
 * example:
 @article { AUTHOR = {Puppe, Dieter},
   TITLE = {Homotopie und Homologie in abelschen Gruppen- und Monoidkomplexen. I. II},
   JOURNAL = {Math. Z.},
   FJOURNAL = {Mathematische Zeitschrift},
   VOLUME = {68},
   YEAR = {1958},
   PAGES = {367--406, 407--421},
   ISSN = {0025-5874},
 }

 * supported fields: author, title, series, publisher, address, year, isbn, pages, number, fjournal, journal, issn, volume, mrnumber (address goes to Publisher.location, fjournal to Journal.name, journal to Journal.abbrname, issn to Journal.issn)


 * similar entry types: @incollection, @inproceedings, @book; for @incollection, the additional field booktitle goes to title, and title goes to chapter.
 * the parser removes { and } and replaces \"a and similar umlauts and other diacritics by their Unicode equivalents
 * if there is an mrnumber, it goes to the id (using the template).
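
The cleanup and field mapping described above could look roughly like the following. This is a minimal sketch, not the tool's actual code: the function names, the accent table (a small illustrative subset) and the use of the dotted target names as plain dictionary keys are assumptions made for this example.

```python
import re
import unicodedata

# Combining marks for a few LaTeX accent commands (illustrative subset only).
ACCENTS = {'"': "\u0308", "'": "\u0301", "`": "\u0300", "^": "\u0302", "~": "\u0303"}

# Mapping quoted from the notes above; the dotted names are plain keys here.
FIELD_MAP = {
    "address": "Publisher.location",
    "fjournal": "Journal.name",
    "journal": "Journal.abbrname",
    "issn": "Journal.issn",
}

# Matches \"a as well as \"{a}.
ACCENT_RE = re.compile(r"""\\(["'`^~])\s*\{?([A-Za-z])\}?""")


def clean_value(value):
    """Replace LaTeX accents by Unicode characters, then drop remaining braces."""
    def repl(match):
        accent, letter = match.groups()
        return unicodedata.normalize("NFC", letter + ACCENTS[accent])

    value = ACCENT_RE.sub(repl, value)
    return value.replace("{", "").replace("}", "").strip()


def map_fields(entry):
    """Lower-case the BibTeX field names, rename mapped ones, clean all values."""
    return {
        FIELD_MAP.get(key.lower(), key.lower()): clean_value(val)
        for key, val in entry.items()
    }
```

Fields without an entry in `FIELD_MAP` keep their lower-cased BibTeX name, which is a simplification chosen for the sketch.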

supported fields: last, first, authorlink, author, coauthors, title, chapter, doi, journal, edition, url, series, publisher, location, year, date, accessdate, origdate, isbn, volume, id, pages

example  

supported fields: last, first, author, authorlink, coauthors, title, url, doi, journal, series, issue, publisher, location, year, accessdate, origdate, isbn, volume, id, pages


 * is currently not supported

supported fields: last, first, author-link, last1, first1, author1-link etc., editor-last, editor-first, editor-link, editor1-last etc., title, edition, chapter, series, journal, issue, pages, publisher, location, accessdate, origdate, year, date, isbn, volume, doi, id, url, contribution (goes to chapter), contribution-url (goes to chapter-url)

XML files
example

 0-071391401  Harrison, Tinsley Randolph; Dennis L. Kasper Harrison's principles of internal medicine McGraw-Hill Medical Publishing Division New York 2005 0-071391401 

(source: User:Diberri's tool)

supported fields: author, title, series, publisher, location, year, pages, oclc, doi, isbn, volume

Warning
In some cases there will be a warning requesting the user's attention:
 * The title contains $ or \ (likely to be the result of LaTeX formatting or other incomplete parsing of BibTeX entries).
 * An author, publisher or journal does not (yet) exist in the database.
 * The volume is non-numeric (warning because this is likely due to misplaced edition information etc.).
 * 'Ed' or 'Edition' occurs in the title (the edition should be entered in the appropriate field).
 * '(', ')', 'Eds' or 'Editors' occurs in the author (likely this is an editor instead of an author, or something like '(transl.)', which should be put in the others field).
 * 'Ch.' or 'Chapter' occurs in chapter.

None of these warnings prevents saving the item as it is. If no warning occurs, the item will be saved without further notice.
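
As an illustration, the purely textual checks from the list above might be sketched as follows. The item layout (a dict of strings), the function name and the warning messages are hypothetical, and the database lookup for unknown authors, publishers or journals is left out because it depends on the storage backend.

```python
import re


def collect_warnings(item):
    """Return warning messages for an item given as a dict of strings."""
    warnings = []
    title = item.get("title", "")
    author = item.get("author", "")
    volume = item.get("volume", "")
    chapter = item.get("chapter", "")

    # Leftover LaTeX markup in the title.
    if "$" in title or "\\" in title:
        warnings.append("title contains $ or \\")
    # A non-numeric volume often hides misplaced edition information.
    if volume and not volume.isdigit():
        warnings.append("volume is non-numeric")
    if re.search(r"\b(Ed|Edition)\b", title):
        warnings.append("title mentions an edition")
    if re.search(r"[()]|\b(Eds|Editors)\b", author):
        warnings.append("author looks like an editor or contains a remark")
    if re.search(r"\bCh\.|\bChapter\b", chapter):
        warnings.append("chapter contains 'Ch.' or 'Chapter'")
    return warnings
```

An empty result corresponds to the case where the item is saved without further notice. A non-empty result would only be shown to the user and would not block saving.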

Author strings
Author strings will be parsed using the following algorithm:
 * 1) the string is separated into wikilinks and other strings.
 * 2) the wikilinks are parsed into caption (which is then parsed into firstname and name) and wikilink. Inside a wikilink, there must be only one author.
 * 3) remaining strings are separated by ";"
 * 4) the tokens of this are parsed as follows:
 * strings without any spaces or commas are understood as "name"
 * strings with commas are separated along the commas and each part is treated separately. If one of these parts contains only single characters, ".", "-" or spaces, it is considered to be the firstname of the preceding author.
 * tokens without comma: the last word is the "name", the rest is the "firstname". If the "name" is actually a firstname (i.e. only single characters, ".", "-" and spaces) and the "firstname" is not, then the two are swapped.

example:

Sommerfeld J, [[Branko Grünbaum|Grünbaum, Branko]] ;Shephard, G. C., Klaus, Hansen, J.-P, Sommer

goes to
 * name=Sommerfeld, firstname=J
 * name=Grünbaum, firstname=Branko, wikilink=Branko Grünbaum
 * name=Shephard, firstname=G. C.
 * name=Klaus
 * name=Hansen, firstname=J.-P
 * name=Sommer
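
The algorithm can be sketched in Python as follows. This is one possible reading of the steps, not the tool's actual implementation. All function names are invented for this example. It also assumes that a wikilink caption of the form "Last, First" is split at its first comma, since the notes only say that the caption is parsed into firstname and name.

```python
import re

WIKILINK = re.compile(r"(\[\[.*?\]\])")


def is_initials(text):
    """True if text consists only of single characters, '.', '-' and spaces."""
    parts = [p for p in re.split(r"[.\-\s]+", text) if p]
    return bool(parts) and all(len(p) == 1 for p in parts)


def parse_plain(token):
    """Token without commas: the last word is the name, the rest the firstname."""
    words = token.split()
    if len(words) <= 1:
        return {"name": token.strip()}
    firstname, name = " ".join(words[:-1]), words[-1]
    if is_initials(name) and not is_initials(firstname):
        firstname, name = name, firstname  # "Sommerfeld J" -> name=Sommerfeld
    return {"name": name, "firstname": firstname}


def parse_authors(text):
    authors = []
    # Step 1: split into wikilinks and other strings, keeping the order.
    for segment in WIKILINK.split(text):
        if segment.startswith("[[") and segment.endswith("]]"):
            # Step 2: exactly one author per wikilink.
            target, _, caption = segment[2:-2].partition("|")
            caption = caption or target
            if "," in caption:
                name, firstname = (s.strip() for s in caption.split(",", 1))
                author = {"name": name, "firstname": firstname}
            else:
                author = parse_plain(caption)
            author["wikilink"] = target.strip()
            authors.append(author)
            continue
        # Step 3: the remaining strings are separated by ";".
        for token in segment.split(";"):
            group = []
            # Step 4: split along commas; initials attach to the preceding author.
            for piece in (p.strip() for p in token.split(",")):
                if not piece:
                    continue
                if group and is_initials(piece) and "firstname" not in group[-1]:
                    group[-1]["firstname"] = piece
                else:
                    group.append(parse_plain(piece))
            authors.extend(group)
    return authors


example = ("Sommerfeld J, [[Branko Grünbaum|Grünbaum, Branko]] ;"
           "Shephard, G. C., Klaus, Hansen, J.-P, Sommer")
for author in parse_authors(example):
    print(author)
```

Applied to the example string, this sketch yields the six authors listed above in the same order.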

Parsing of internal Wikilinks
If the title, author, publisher or journal contains a wikilink, it will be extracted automatically to the appropriate field (e.g. the wikilink of the reference or of the author etc.). For author, publisher and journal this happens only when that author, publisher or journal does not yet exist in the database. Mixed titles etc. are possible (see below), but only one wikilink is possible.
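
A minimal sketch of such an extraction, assuming the usual [[target|caption]] syntax. The function name and the choice to raise an error on a second wikilink are assumptions for illustration, since the notes do not say how that case is handled.

```python
import re

WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_wikilink(text):
    """Return (plain_text, wikilink); wikilink is None if text has no link."""
    links = WIKILINK.findall(text)
    if len(links) > 1:
        raise ValueError("only one wikilink is possible")
    if not links:
        return text, None
    target = links[0][0]
    # Keep the caption (or the target, if there is no caption) in the text.
    plain = WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)
    return plain, target.strip()
```

For a mixed title, the text around the link is kept and only the link markup is removed.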

In addition, if the title is a Wiki-URL-link (e.g. Wikipedia ), then the URL will be moved to the url field of the item, and the true title will be preserved. Mixed titles ( The english Wikipedia... ) are also allowed (this gives url=http://en.wikipedia.org and title=The english Wikipedia...). However, only one URL is allowed, i.e. nothing like Wikipedia is an encyclopedia with two separate links.

Parsing of URL in title or chapter
If the title or chapter contains an external URL link, the URL will be moved to the url field instead.
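
The following sketch shows one way to do this, assuming external links use the single-bracket wiki syntax [URL text]. The function name and the error raised for more than one URL are hypothetical.

```python
import re

EXTERNAL_LINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")


def extract_url(text):
    """Return (plain_text, url); url is None if text has no external link."""
    links = EXTERNAL_LINK.findall(text)
    if len(links) > 1:
        raise ValueError("only one URL is allowed")
    if not links:
        return text, None
    url = links[0][0]
    # Replace the link markup by its label (or by the URL if it has none).
    plain = EXTERNAL_LINK.sub(lambda m: m.group(2) or m.group(1), text)
    return plain, url


title, url = extract_url("The [http://en.wikipedia.org english Wikipedia]...")
```

Here the placement of the brackets inside the mixed title is a guess. The resulting pair is title "The english Wikipedia..." and url "http://en.wikipedia.org".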

ISBN
If the id is something like "ISBN 1-234-56789-0", the number will be put into the isbn field instead of id.
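
A small sketch of this rule, with a hypothetical function name and a dict-based item. The pattern only checks the general shape of the string and does not validate the ISBN check digit.

```python
import re

ISBN_ID = re.compile(r"^\s*ISBN[\s:]*([0-9][0-9Xx -]*[0-9Xx])\s*$")


def move_isbn(item):
    """If item['id'] looks like 'ISBN ...', move the number to item['isbn']."""
    match = ISBN_ID.match(item.get("id", ""))
    if match:
        item = dict(item, isbn=match.group(1))  # work on a copy
        del item["id"]
    return item
```

Any id that does not start with "ISBN" is left where it is.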