User:Daniel Mietchen/Talks/JATS-Con 2014/References

Markup within reference sections is inconsistent in ways that go beyond differences in citation style.

The following examples were discovered in 2011, during initial development of the Open Citations database. Most have still not been corrected.

Bad identifiers used in reference sections
Particularly problematic are incorrect article identifiers.


 * Adjacent references to two (ostensibly) different articles, with the same journal/volume/issue/page and the same DOI: see #15 and #16 here. The tweet about these said: "Hopefully we can feed corrections back into PMC."
 * Article with DOIs swapped between two references: see #52 and #72 here. (tweet)
 * Article that cites three different papers with the same DOI (which is not the correct DOI for any of them): see #15, #40, and #49 here. In PMC, the DOIs were removed. (tweet)
 * 44 more examples of articles whose references point to different articles but carry the same identifier.
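Problems of the kind listed above can be flagged automatically. The following is a minimal sketch, not the method used for the Open Citations database: it assumes the reference list is available as JATS-style XML with `<ref>` and `<pub-id>` elements, and the sample data, the DOIs, and the function name `find_shared_ids` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: B15 and B16 cite different articles but share one DOI.
SAMPLE = """<ref-list>
  <ref id="B15"><element-citation>
    <article-title>First article</article-title>
    <pub-id pub-id-type="doi">10.1000/example.1</pub-id>
  </element-citation></ref>
  <ref id="B16"><element-citation>
    <article-title>Second article</article-title>
    <pub-id pub-id-type="doi">10.1000/example.1</pub-id>
  </element-citation></ref>
  <ref id="B17"><element-citation>
    <article-title>Third article</article-title>
    <pub-id pub-id-type="doi">10.1000/example.3</pub-id>
  </element-citation></ref>
</ref-list>"""


def find_shared_ids(ref_list_xml):
    """Map (id type, id value) to the <ref> ids that use it, keeping
    only identifiers that appear in more than one reference."""
    root = ET.fromstring(ref_list_xml)
    seen = defaultdict(list)
    for ref in root.iter("ref"):
        for pub_id in ref.iter("pub-id"):
            # Lowercase the value so DOIs differing only in case still match.
            value = (pub_id.text or "").strip().lower()
            if not value:
                continue
            key = (pub_id.get("pub-id-type"), value)
            if ref.get("id") not in seen[key]:
                seen[key].append(ref.get("id"))
    return {key: refs for key, refs in seen.items() if len(refs) > 1}


shared = find_shared_ids(SAMPLE)
for (id_type, value), refs in shared.items():
    print(f"{id_type} {value} is used by references {', '.join(refs)}")
```

A hit only means that the references need a manual look: the same article may legitimately have been listed twice. Swapped DOIs (as in the second example) are not caught by this check, since each DOI still occurs only once.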

Markup of citations to reference ranges is inconsistent
The body of an article often cites a range of consecutive entries from the reference list, for example references 3 to 5.

Here are two ways that these might be marked up (there are others):

Note that the JATS standard doesn't provide any guidance on this. The PMC Tagging Guidelines specify the second form above.
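For anyone extracting citation links, the practical consequence is that a parser has to recognise more than one encoding of a range. The sketch below is an illustration under stated assumptions: "Form A" and "Form B" are labels used only in this sketch and are two plausible encodings, not necessarily the ones discussed in the talk, and the reference ids and the function name `cited_ids` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reference ids in reference-list order (hypothetical).
REF_ORDER = ["B1", "B2", "B3", "B4", "B5"]

# Form A (assumed): one <xref> whose rid lists every cited reference.
FORM_A = '<p>As shown <xref ref-type="bibr" rid="B2 B3 B4">2-4</xref>.</p>'

# Form B (assumed): separate <xref>s for the first and last reference.
FORM_B = ('<p>As shown <xref ref-type="bibr" rid="B2">2</xref>-'
          '<xref ref-type="bibr" rid="B4">4</xref>.</p>')

RANGE_SEPARATORS = {"-", "\u2013"}  # hyphen and en dash


def cited_ids(p_xml, ref_order):
    """Return the reference ids cited in a paragraph, expanding
    first/last pairs into the full range. Only <xref> elements that
    are direct children of the paragraph are considered."""
    children = list(ET.fromstring(p_xml))
    cited = []
    i = 0
    while i < len(children):
        el = children[i]
        i += 1
        if el.tag != "xref" or el.get("ref-type") != "bibr":
            continue
        rids = (el.get("rid") or "").split()
        nxt = children[i] if i < len(children) else None
        if (len(rids) == 1 and nxt is not None and nxt.tag == "xref"
                and nxt.get("ref-type") == "bibr"
                and (el.tail or "").strip() in RANGE_SEPARATORS):
            end = (nxt.get("rid") or "").split()
            if len(end) == 1 and rids[0] in ref_order and end[0] in ref_order:
                first, last = ref_order.index(rids[0]), ref_order.index(end[0])
                if first <= last:
                    cited.extend(ref_order[first:last + 1])
                    i += 1  # the closing <xref> has been consumed
                    continue
        cited.extend(rids)
    return cited


ids_a = cited_ids(FORM_A, REF_ORDER)
ids_b = cited_ids(FORM_B, REF_ORDER)
```

With the sample data both forms resolve to the same three references. Note that Form B only links the endpoints explicitly, so the references in between have to be inferred from the order of the reference list.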

Other problems

 * A number in an article title is misinterpreted as the publication year: see #21 here. The year is given as 7942, and the title reads "PCC" where it should read "PCC7942".
 * Authors' names misspelled or missing in citations (more frequent for 'unusual' names¹)
 * Mistranscribed citation titles, including missing or wrong punctuation, prefixes or suffixes
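The first problem in this list, a year such as 7942, is easy to catch with a plausibility check. The following is a rough sketch only: the function name `suspicious_year` and the lower bound of 1600 are assumptions made for illustration, and the check does nothing for misspelled names or mistranscribed titles.

```python
import datetime
import re

# Four digits, optionally followed by a disambiguating letter (e.g. "2010a").
YEAR_PATTERN = re.compile(r"^(\d{4})[a-z]?$")


def suspicious_year(year_text, earliest=1600, latest=None):
    """Return True if year_text does not look like a plausible
    publication year and should be checked by hand."""
    if latest is None:
        # Allow next year, since issues can be dated ahead of publication.
        latest = datetime.date.today().year + 1
    match = YEAR_PATTERN.match(year_text.strip())
    if match is None:
        return True
    return not (earliest <= int(match.group(1)) <= latest)


for year in ["7942", "2011", "2010a", ""]:
    print(repr(year), suspicious_year(year))
```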