Text annotation

Text annotation is the practice and the result of adding a note or gloss to a text, which may include highlights or underlining, comments, footnotes, tags, and links. Text annotations can include notes written for a reader's private purposes, as well as shared annotations written for the purposes of collaborative writing and editing, commentary, or social reading and sharing. In some fields, text annotation is comparable to metadata insofar as it is added post hoc and provides information about a text without fundamentally altering that original text. Text annotations are sometimes referred to as marginalia, though some reserve this term specifically for hand-written notes made in the margins of books or manuscripts. Annotations have been found to be useful and help to develop knowledge of English literature.

Annotations can be both private and socially shared, and both hand-written and information technology-based. Annotation differs from notetaking because annotations must be physically written or added on the original piece itself. This can be writing within the page of a book or highlighting a line, or, if the piece is digital, a comment or saved highlight or underline within the document. For information on annotation of Web content, including images and other non-textual content, see also Web annotation.

History
Text annotation may be as old as writing on media, where it was possible to produce an additional copy with a reasonable effort. It became a prominent activity around 1000 AD in Talmudic commentaries and Arabic rhetoric treatises. In the Medieval era, scribes who copied manuscripts often made marginal annotations that then circulated with the manuscripts and were thus shared with the community; sometimes annotations were copied over to new versions when such manuscripts were later recopied.

With the rise of the printing press and the relative ease of circulating and purchasing individual (rather than shared) copies of texts, the prevalence of socially shared annotations declined and text annotation became a more private activity consisting of a reader interacting with a text. Annotations made on shared copies of texts (such as library books) are sometimes seen as devaluing the text, or as an act of defacement. Thus, print technologies support the circulation of annotations primarily as formal scholarly commentary or textual footnotes or endnotes rather than marginal, handwritten comments made by private readers, though handwritten comments or annotations were common in collaborative writing or editing.

Computer-based technologies have provided new opportunities for individual and socially shared text annotations that support multiple purposes, including readers' individual reading goals, learning, social reading, writing and editing, and other practices. Text annotation in Information Technology (IT) systems raises technical issues of access, linkage, and storage that are generally not relevant to paper-based text annotation, and thus research and development of such systems often addresses these areas.

Functions and applications
Text annotations can serve a variety of functions for both private and public reading and communication practices. In their article "From the Margins to the Center: The Future of Annotation," scholars Joanna Wolfe and Christine Neuwirth identify four primary functions that text annotations commonly serve in the modern era: (1) "facilitat[ing] reading and later writing tasks," which includes annotations that support reading for both personal and professional purposes; (2) "eavesdrop[ping] on the insights of other readers," which involves sharing of annotations; (3) "provid[ing] feedback to writers or promote communication with collaborators," which can include personal, professional, and education-related feedback; and (4) "call[ing] attention to topics and important passages," for which scholarly annotations, footnotes, and call-outs often function. Regarding the ways that annotations can support individual reading tasks, Catherine Marshall points out that the ways that readers annotate texts depend on the purpose, motivation, and context of reading. Readers may annotate to help interpret a text, to call attention to a section for future reference or reading, to support memory and recall, to help focus attention on the text as they read, to work out a problem related to the text, or to create annotations not specifically related to the text at all.

Educational applications
Educational research in text annotation has examined the role that both private and shared text annotations can play in supporting learning goals and communication. Much educational research examines how students' private annotation of texts supports comprehension and memory; for example, research indicates that annotating texts causes more in-depth processing of information, which results in greater recall of information. Because annotating is done while reading, with a writing utensil in hand, readers are thought to be more aware of their own thoughts as they read. Besides producing notes that help them remember or better understand the content, readers are actively engaged during the activity and are therefore more receptive to the information when annotating a text.

Other areas of educational research investigate the benefits of socially shared text annotations for collaborative learning, both for paper-based and IT-based annotation sharing. For example, studies by Joanna Wolfe have investigated the benefits of exposure to others' annotations on student readers and writers. In a 2000 study, Wolfe found that exposing students to others' annotations influenced their perceptions of the annotators, which in turn shaped their responses to the material and their written products. In a later study, Wolfe found that viewing others' written comments on a paper text, especially pairs of annotations that present opposing responses to the text, can help students engage in the type of critical reading and stance-taking necessary for effective argumentative writing.

While shared annotations can benefit individual readers, "since the 1920s, literacy theory has increasingly emphasized the importance of social factors in the development of literacy." Thus, shared annotations can not only help a reader better understand the content of a particular text, but may also aid in the acquisition of literacy skills. For example, a mother may leave marks inside a book to draw the attention of her child to a particular theme or concept; thanks to the development of audio annotations, parents may now leave notes for children who are just starting to read and may struggle with textual annotations.

More recent research in the effects of shared text annotations has focused on the learning applications for web-based annotation systems, some of which were developed based on design recommendations from studies outlined above. For example, Ananda Gunawardena, Aaron Tan, and David Kaufer conducted a pilot study to examine whether annotating documents in Classroom Salon, a web-based annotation and social reading platform, encouraged active reading, error detection, and collaboration in a computer science course at Carnegie Mellon University. This study suggested a correlation between students' overall performance in the course and their ability to identify errors in a text that they annotated in Classroom Salon; it also found that students were likely to change their annotations in response to annotations made by others in the course.

Similarly, the web-based annotation tool HyLighter was used in a first-year writing course and shown to improve the development of students' mental models of texts, including supporting reading comprehension, critical thinking, and the ability to develop a thesis. The collaboration with peers and experts around a shared text improved these skills and brought the communities' understanding closer together.

A meta-analysis of empirical studies into the higher-education uses of social annotation (SA) tools indicates such tools have been tested in several courses, among them English, sport psychology, and hypermedia. Studies have indicated that social annotation functions, including commenting, information sharing, and highlighting, can support instruction designed to foster collaborative learning and communication, as well as reading comprehension, metacognition, and critical analysis. Several studies indicated that students enjoyed using social annotation tools, and that the tools improved motivation in the course.

"Multi Sensory" annotations have also been found to help students retain information in the classroom, and they can help those who are trying to learn a new language. Images can be placed next to or linked to words so that readers get a better understanding of what a word means by looking at it. The same can be done with an audio clip giving the word's pronunciation and meaning. This is easier to do using technology, and to count specifically as an annotation the image or clip must be embedded within the referenced document. In physical copies of a text, however, a picture can be drawn next to a word and still be a sensory annotation. This form of annotation furthers comprehension, specifically in the classroom, because it requires more of students' brains to retain the information being given.

Writing and text-centered collaboration
Text annotations have long been used in writing and revision processes as a way for reviewers to suggest changes and communicate about a text. In book publishing, for example, the collaboration of authors and editors to develop and revise a manuscript frequently involves exchanges of both in-line revisions or notes as well as marginal annotations. Similarly, copyeditors often make marginal annotations or notes that explain or suggest revisions or are directed at the author as questions or suggestions (commonly called "queries"). Asynchronous collaborative writing and document development often depend on text annotations as a way not only to suggest revisions but also to exchange ideas during document development or to facilitate group decision making, though such processes are often complicated by the use of different communication technologies (such as phone calls or emails as well as document sharing) for distinct tasks. Text annotations can also function to allow group or community members to communicate about a shared text, such as a doctor annotating a patient's chart.

Much research into the functionality and design of collaborative IT-based writing systems, which often support text annotation, has occurred in the area of computer-supported cooperative work.

Linguistic annotation
In corpus linguistics, digital philology and natural language processing, annotations are used to explicate linguistic, textual or other features of a text (or other digital representations of natural language). In linguistics, annotations include comments and metadata; non-transcriptional annotations are also non-linguistic.

In these disciplines, annotations are the basis for quantitative research, empirical studies and the application of machine learning. Unlike annotations in the above-mentioned uses (which appear very sparsely), linguistic annotation usually requires that every element (token) within a text carry one or more annotations, and that complex relations exist between different annotations. A number of specialized formats (and tools) exist for this purpose; one example is the tab-separated annotation format used in the Universal Dependencies project.

[Figure: Universal Dependencies annotation, English Web Treebank, visualization by Brat]

A visualization of the example is given in Fig. 2. In addition to word-level annotations, the word (and the sentence, etc.) in this format can carry metadata.
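As a rough illustration of token-level annotation with sentence-level metadata, the sketch below reads a small sample in CoNLL-U, the ten-column tab-separated format in which Universal Dependencies treebanks are distributed. The sample sentence and its analysis are invented for this illustration and are not taken from the English Web Treebank; the function name `parse_sentence` is likewise hypothetical.

```python
# Column names of the CoNLL-U format, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Invented sample sentence (an assumption for illustration only).
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "NNS", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "VBP", "Mood=Ind|Tense=Pres", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
])


def parse_sentence(block):
    """Split one sentence block into (metadata dict, list of token dicts)."""
    metadata = {}
    tokens = []
    for line in block.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata as "key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
        tokens.append(dict(zip(FIELDS, columns)))
    return metadata, tokens


metadata, tokens = parse_sentence(SAMPLE)
for token in tokens:
    head = int(token["head"])
    # Head 0 marks the root. Indexing by position assumes the sample has
    # no multiword-token ranges or empty nodes, which real treebanks may have.
    head_form = "ROOT" if head == 0 else tokens[head - 1]["form"]
    print(f'{token["form"]:<6}{token["upos"]:<7}{token["deprel"]:<7}head={head_form}')
```

Every token carries several annotations at once (lemma, part of speech, morphological features), and the `head` and `deprel` columns encode relations between tokens, which is the density that distinguishes linguistic annotation from sparse reader commentary.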

Various other annotation formats exist, often coupled with specific pieces of software for their creation, processing or querying; see Ide et al. (2017) for an overview. The Linguistic Annotation Wiki describes tools and formats for creating and managing linguistic annotations. Selected problems and applications are also discussed under Overlapping markup and Web annotation. Aside from tab-separated values and other text formats, formats for linguistic annotations are often based on markup languages such as XML (and formerly, SGML). More complex annotations may also employ graph-based data models and formats such as JSON-LD, e.g., in accordance with the Web Annotation standard.
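To make the JSON-LD option concrete, the following sketch builds one annotation shaped after the W3C Web Annotation Data Model, using a `TextualBody` and a `TextQuoteSelector`. The helper `make_annotation`, the `example.org` addresses, and the comment text are assumptions made for this illustration, not part of any particular tool.

```python
import json


def make_annotation(anno_id, source_url, text, start, end, comment, context_chars=10):
    """Build a Web Annotation style dict for text[start:end] (hypothetical helper)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        # The body holds what the annotator says.
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        # The target says which resource, and which part of it, is annotated.
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": text[start:end],
                # A little surrounding context helps re-locate the quote later.
                "prefix": text[max(0, start - context_chars):start],
                "suffix": text[end:end + context_chars],
            },
        },
    }


TEXT = "Text annotation may be as old as writing on media."
QUOTE = "as old as writing"
start = TEXT.index(QUOTE)

annotation = make_annotation(
    "http://example.org/anno1",    # hypothetical annotation IRI
    "http://example.org/article",  # hypothetical source document
    TEXT, start, start + len(QUOTE),
    "A source would strengthen this claim.",
)
print(json.dumps(annotation, indent=2))
```

Because the selector stores the quoted text rather than only character offsets, the annotation can still be re-attached if the source document shifts slightly; a position-based selector would be a reasonable alternative for stable texts.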

Linguistic annotation comes with an independent research tradition and its own terminology. The target of an annotation is usually referred to as a 'markable' and the body of the annotation as the 'annotation'. The relation between annotation and markable is usually expressed in the annotation format itself (e.g., by having annotations and text side by side), so that explicit anchors are not necessary.

Structure and design
Research in the design and development of annotation systems uses specific terminology to refer to distinct structural components of annotations and also distinguishes among options for digital annotation displays.

Annotation structure
The structural components of any annotation can be roughly divided into three primary elements: a body, an anchor, and a marker. The body of an annotation includes reader-generated symbols and text, such as handwritten commentary or stars in the margin. The anchor is what indicates the extent of the original text to which the body of the annotation refers; it may include circles around sections, brackets, highlights, underlines, and so on. Annotations may be anchored to very broad stretches of text (such as an entire document) or very narrow sections (such as a specific letter, word, or phrase). The marker is the visual appearance of the anchor, such as whether it is a grey underline or a yellow highlight. An annotation that has a body (such as a comment in the margin) but no specific anchor has no marker.
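The three-part structure can be sketched as a small data model. This is only one possible representation, assuming character offsets for the anchor and a free-text description for the marker; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    """Extent of the original text the annotation refers to."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass(frozen=True)
class Annotation:
    body: str                        # reader-generated commentary or symbol
    anchor: Optional[Anchor] = None  # None means no specific anchor
    marker: Optional[str] = None     # visual appearance of the anchor

    def __post_init__(self):
        # An annotation without an anchor has nothing for a marker to render.
        if self.anchor is None and self.marker is not None:
            raise ValueError("a marker requires an anchor")

    def anchored_text(self, document):
        """Return the anchored stretch of the document, or None if unanchored."""
        if self.anchor is None:
            return None
        return document[self.anchor.start:self.anchor.end]


DOCUMENT = "Text annotation may be as old as writing on media."
QUOTE = "as old as writing"
start = DOCUMENT.index(QUOTE)

note = Annotation(
    body="How old, exactly?",
    anchor=Anchor(start, start + len(QUOTE)),
    marker="yellow highlight",
)
general = Annotation(body="Good overview.")  # body only: no anchor, no marker

print(note.anchored_text(DOCUMENT))
print(general.anchored_text(DOCUMENT))
```

Keeping the marker separate from the anchor mirrors the distinction in the text: the same anchored span could be shown as a grey underline in one interface and a yellow highlight in another without changing what the annotation refers to.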

Annotation display types
IT-based annotation systems utilize a variety of display options for annotations, including:
 * Footnote interfaces that display annotations below the corresponding text
 * Aligned annotations that display comments and notes vertically in the text margins, sometimes in multiple columns or as a "sidebar" layer
 * Interlinear annotations that attach annotations directly into a text
 * Sticky note interfaces, where annotations appear in popup dialogs over the source text
 * Voice annotations, in which reviewers record annotations and embed them within a document
 * Pen or digital-ink based interfaces that allow writing directly on a document or screen

Annotation interfaces may also allow highlighting or underlining, as well as threaded discussions. Sharing and communicating through annotations anchored to specific documents is sometimes referred to as anchored discussion.

IT-based text annotation systems
IT-based annotation systems include standalone and client-server systems. In the 1980s and 1990s, a number of such systems were built in the context of libraries, patent offices, and legal text processing. Their design led researchers to produce taxonomies of annotation forms. Text annotation research has taken place at several institutions, including Xerox research centers in Palo Alto and Grenoble (France), the Hitachi Central Research Lab (in particular for annotation of patents), and in relation with the construction of the new French National Library between 1989 and 1995 at the Institut de Recherche en Informatique de Toulouse and in the company AIS (Advanced Innovation Systems).

Annotation functionality has been present in text processing software for many years through inline notes displayed as pop-ups, footnotes, and endnotes; however, it is only recently that functionality for displaying annotations as marginalia has appeared in programs such as OpenOffice.org/LibreOffice Writer and Microsoft Word. Personal or standalone annotation systems include word processing software that supports embedded or anchored text annotations, as well as Adobe Acrobat, which in addition to commenting allows highlights, stamps, and other types of markup.

Web-based text annotation systems
Tim Berners-Lee had already implemented the concept of directly editing web documents in 1990 in WorldWideWeb, the first web browser, but later ported versions removed this collaborative ability. An early version of NCSA Mosaic in 1993 also included a collaborative annotation capability, though it was quickly removed. Distributed authoring was later reintroduced as an extension, Web Distributed Authoring and Versioning (WebDAV).

A different approach to distributed authoring consists of first gathering many annotations from a wide public and then integrating them all to produce a further version of a document. This approach was pioneered by Stet, the system put in place to gather comments on drafts of version 3 of the GNU General Public License. This system arose from a specific requirement, which it served exceptionally well, but it was not configurable enough to be convenient for annotating any other document on the web. The co-ment system uses annotation interface concepts similar to Stet's, but it is based on an entirely new implementation, using Django/Python on the server side and various AJAX libraries such as jQuery on the client side. Both Stet and co-ment are licensed under the GNU Affero General Public License.

Since 2011, the non-profit Hypothes.is project has offered the free, open web annotation service Hypothes.is. The service features annotation via a Chrome extension, bookmarklet or proxy server, as well as integration into an LMS or CMS. Both webpages and PDFs can be annotated. Other web-based text annotation systems are collaborative software for distributed text editing and versioning, which also feature annotation and commenting interfaces.

Specialized Web-based text annotations exist in the context of scientific publication, either for refereeing or post-publication. The on-line journal PLoS ONE, published by the Public Library of Science, has developed its own Web-based system where scientists and the public can comment on published articles. The annotations are displayed as pop-ups with an anchor in the text.