User:Daniel Mietchen/Talks/JATS-Con 2014/Recommendations

This section summarizes and updates the recommendations from our JATS-Con paper.

Change PMC recommended license tagging
The PMC Tagging Guidelines (see Licensing information and license element) should specify that the license URI must be in one canonical place: the @xlink:href attribute of the <license> element. No URIs in the license text, whether or not enclosed in an <ext-link> element, should be construed as the license URI.

So, the usage example "license with URI in the license text":

 This article is distributed under the terms of the Creative Commons Attribution License (       http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use and redistribution provided that the original author and source are credited.

should be changed so that the license URI is also included in the license/@xlink:href attribute:

 This article is distributed under the terms of the Creative Commons Attribution License (       http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use and redistribution provided that the original author and source are credited.

Currently, the Tagging Guidelines specify that the URI can be in one of these two places, but not both. The reason for this is to avoid ambiguity. However, if it is made clear that any URI appearing within the license text should not be construed as the identifier for this license, the potential for ambiguity is removed.

Why:
 * The text content is intended for humans, and the content should not be restricted. So, publishers should be able to, for example, refer to other licenses without worrying that the references will be misconstrued.
 * It makes it easier for content reusers, who only have to look in one place for the license URI.
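
To illustrate what "one place to look" means for a reuser, here is a minimal Python sketch. The sample markup is an assumption made for illustration (it is not copied from the Tagging Guidelines), and the function name license_uri is hypothetical. The function reads only license/@xlink:href and deliberately ignores any links inside the license text.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample: structure and attribute values are illustrative only.
SAMPLE = """\
<permissions xmlns:xlink="http://www.w3.org/1999/xlink">
  <license xlink:href="http://creativecommons.org/licenses/by/4.0/">
    <license-p>This article is distributed under the terms of the
    Creative Commons Attribution License (<ext-link ext-link-type="uri"
    xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>),
    which permits unrestricted use and redistribution provided that the
    original author and source are credited.</license-p>
  </license>
</permissions>
"""


def license_uri(permissions_xml):
    """Return the URI from license/@xlink:href, or None if it is absent."""
    root = ET.fromstring(permissions_xml)
    license_el = root.find("license")
    if license_el is None:
        return None
    # Only the attribute on <license> counts; links in the text are ignored.
    return license_el.get(XLINK_HREF)


print(license_uri(SAMPLE))
```

Under this convention, a document that carries the URI only inside the license text yields None, which a reuse tool can treat as "license not machine-readable".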

The JATS standard should also include this recommendation
Ideally, the recommendation above should also be included in the JATS standard, preferably as a specification, but alternatively as a "best practice" recommendation.

The <license> element description of the JATS standard (for example, in Journal Publishing Draft 1.1d1) should be updated.

Harmonize with NISO Open Access Metadata and Indicators
NISO is also in the process of creating a standard for "Open Access Metadata and Indicators". As soon as those recommendations are finalized, the JATS standard should be updated to resolve any conceptual or semantic differences between the two, either by
 * Incorporating the OAMI elements and attributes within JATS, or
 * Defining a simple mapping mechanism between the two vocabularies.

Add examples and recommendations for tagging public domain works
Technically, a "public domain dedication", such as Creative Commons CC0, is not a license. This can lead to doubt and ambiguity about how such dedications should be tagged. We recommend that tagging examples for them be added both to the JATS standard and to the PMC Tagging Guidelines.

Add automatic check to PMC style checker
The PMC Style Checker could be enhanced to check the validity of any creativecommons.org URIs that appear in the license/@xlink:href attribute. Creative Commons provides a web service for this purpose (see, for example, this service call). A warning should be issued for unrecognized URIs.
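
As a rough illustration of such a check, the Python sketch below validates the form of a creativecommons.org URI offline. It is an assumption-laden stand-in: it does not call the Creative Commons web service, the regular expression is our own and is not exhaustive (for example, it omits some early 1.0 license codes), and the function name check_license_uri is hypothetical.

```python
import re

# Assumed pattern for common Creative Commons license and CC0 URIs.
# A real checker would ask Creative Commons rather than trust this pattern.
CC_URI = re.compile(
    r"^https?://creativecommons\.org/"
    r"(licenses/(by|by-sa|by-nd|by-nc|by-nc-sa|by-nc-nd)/\d\.\d(/[a-z]+)?"
    r"|publicdomain/zero/1\.0)/?$"
)


def check_license_uri(uri):
    """Return a warning string for a suspicious CC URI, else None."""
    if "creativecommons.org" not in uri:
        return None  # not a Creative Commons URI; outside this check's scope
    if CC_URI.match(uri):
        return None  # looks like a recognized license or CC0 URI
    return f"warning: unrecognized Creative Commons URI: {uri}"


for candidate in (
    "http://creativecommons.org/licenses/by/4.0/",
    "http://creativecommons.org/licences/by/4.0/",  # misspelled path
):
    print(candidate, "->", check_license_uri(candidate))
```

Returning a warning rather than raising an error matches the recommendation above: unrecognized URIs are flagged, not rejected.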

Fix tagging for multiple licenses
Currently, the Tagging Guidelines say (here), "When tagging a license whose terms change over time, tag the information in 1 <license> with multiple <license-p>." This should be changed such that, whenever there are multiple licenses, they each get their own independent <license> tag. We provide an example of our recommended markup here in our paper.
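
The benefit for reuse tools can be sketched in Python. The sample below is hypothetical (it is not the example from our paper, and the license text is a placeholder); it only shows that, when each license has its own <license> element with its own @xlink:href, all license URIs can be collected with a single query.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample with two independent <license> elements.
SAMPLE = """\
<permissions xmlns:xlink="http://www.w3.org/1999/xlink">
  <license xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">
    <license-p>Terms that apply first.</license-p>
  </license>
  <license xlink:href="http://creativecommons.org/licenses/by/4.0/">
    <license-p>Terms that apply later.</license-p>
  </license>
</permissions>
"""


def license_uris(permissions_xml):
    """Return the @xlink:href of every <license> child, in document order."""
    root = ET.fromstring(permissions_xml)
    return [el.get(XLINK_HREF) for el in root.findall("license")]


print(license_uris(SAMPLE))
```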

Funder mandates
Research funders that mandate open access should stipulate that the license metadata must be tagged according to these best practices.

PMC should fix license data
We recommend that PMC both fix the existing content in their repository and improve their ingest workflow to ensure that new content conforms to these recommendations.

PMC derives and stores fairly accurate license data for each received article. This data is determined in part by business rules and publisher agreements, and does not depend heavily on the markup in the source XML. Consistency checks could be added to verify that this derived license data matches that in the markup.

The recommendations above wouldn't be much use unless existing content were repaired to conform to these recommendations. Otherwise, reuse tools would still have to account for all of the existing variability.

If PMC were to fix existing content, then that would also have the effect of improving the accuracy of the search-by-license feature.

Expose license data
The license data that does reside in the PMC system should be better exposed. For example, we recommend that it be added to the E-utilities esummary output for articles, for both PubMed and PMC.

Establish best practice for media type markup
Consensus needs to be reached on the best practice for marking up media types. Either
 * 1) Use both @mimetype and @mime-subtype, or
 * 2) Use @mimetype alone.

Option (2) is better, but there might be too much legacy content that uses @mime-subtype to make this change.
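
Until one form is agreed on, reuse tools have to accept both. The Python sketch below shows one way to normalize them. The element name media and the value video/mp4 are illustrative assumptions, as is the reading of option (2) as a full "type/subtype" value in @mimetype; the function name media_type is hypothetical.

```python
import xml.etree.ElementTree as ET


def media_type(element):
    """Return a 'type/subtype' string from either attribute convention."""
    mimetype = element.get("mimetype")
    subtype = element.get("mime-subtype")
    if mimetype is None:
        return None
    if "/" in mimetype:
        # Option (2): @mimetype already holds the full media type.
        return mimetype
    if subtype:
        # Option (1): combine @mimetype and @mime-subtype.
        return f"{mimetype}/{subtype}"
    # Only a top-level type was declared; return it unchanged.
    return mimetype


option_1 = ET.fromstring('<media mimetype="video" mime-subtype="mp4"/>')
option_2 = ET.fromstring('<media mimetype="video/mp4"/>')
print(media_type(option_1), media_type(option_2))
```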

Both the JATS standard and the PMC Tagging Guidelines should be updated to explicitly state this best practice.

Comment #444 has been submitted to update the JATS Tag Library: Use of @mimetype.

PMC should fix media types in their XML content
Again, as with license metadata, it would be ideal if PMC could fix the media types in the XML of both their existing content and any new content that they receive.

The PMC Style Checker should be updated to enforce the best practice described above.

Both the form (i.e. which attributes to use) and the actual values (i.e. whether the declared media type matches the actual media type of the file) should be checked by PMC when it receives new articles, and fixed if necessary.

PMC could enhance their media type checking utility (which is based on the open-source Unix file command and libmagic library) to apply it to data and media files that accompany articles for ingest.
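
The value check can be sketched in Python as follows. This is not PMC's utility and does not use libmagic: as a simplified stand-in, it compares a declared media type against a small, hand-written table of leading byte signatures. The table and the function names sniff_media_type and declared_matches are assumptions for illustration.

```python
# A few well-known leading byte signatures; libmagic knows far more.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(data):
    """Guess a media type from the first bytes of a file, or return None."""
    for magic, detected in SIGNATURES:
        if data.startswith(magic):
            return detected
    return None


def declared_matches(declared, data):
    """Return True/False if the type could be detected, None if unknown."""
    detected = sniff_media_type(data)
    if detected is None:
        return None  # cannot tell; leave for manual review
    return detected == declared.strip().lower()


png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(declared_matches("image/png", png_bytes))
print(declared_matches("image/jpeg", b"%PDF-1.4"))
```

Returning None for undetectable files keeps the check conservative: only a positive mismatch would trigger a fix.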

Finally, it would be nice if PMC could fix their existing content to comply with this best practice.

Allow keyword tags to be used at sub-article level
We agreed with the suggestion, made on comment #252 on the NISO site, that the  element be allowed in additional locations within documents, including   and .

In their responses of November, 2013, the standing committee agreed to those suggestions, and the  element was allowed within the content of several other elements. This is reflected in the latest draft tag sets. See, for example, Journal Publishing Draft 1.1d1. (Note that for some of those elements, it is available one level down, as a child of .)

PMC should allow fixes to be fed back into their system
While working on the OAMI, we uncovered many problems in the markup of the OA subset content. It would be great if others interested in reuse could benefit from the work that we've already done. One way to accomplish this would be for PMC to implement some system whereby outside users could send patches to existing content in a semi-automated fashion. The changes would still need to be approved, of course, but if there were a clean API, and a simple system for presenting and allowing approval of changes, then perhaps this would be feasible.

Another possibility would be a system to host the XML files in an updatable environment with public versioning. Indeed, eLife and Pensoft have demoed this on GitHub.

About our recommendations, "one way to tag things"
Our recommendations are mostly along the lines of establishing one way to tag certain things. These things all fall in the category of machine-readable metadata. We contend that communicating one best-practice way to tag these specific metadata items is not overly restrictive or burdensome, and doesn't prevent publishers from having variation, where variation is important. That is, most content providers want to tag things according to best practices, but they need guidance.

Open source at PMC

 * What about open-sourcing the tagging guidelines, the stylechecker, and the validation services?

Better web platform for NISO
A better web platform would enable more effective feedback, discussion, and collaboration among NISO, the JATS standing committee, and the community of users and developers.

In its current form, the public comments page on the NISO site is not conducive to back-and-forth collaboration with the JATS community. For instance, there is no way to "watch" the list, so as to receive notifications when new comments are submitted. There is no discussion thread feature, so it is not possible for the standing committee to ask for clarifications, or for the submitter to give refinements.

The JATS mailing list could be a good forum for these sorts of discussions, but suggestions and comments on the list are not "formal" in the sense that the comments on the NISO site are. Also, the list tends to be used more for discussions among users of the tag set than among those interested in suggesting improvements to the existing standard.

For one example of the kind of thing that could be done, see this W3C community group.

Keep reuse in mind
See next page.

Further details

 * In the paper