Subject (documents)

In library and information science, documents (such as books, articles and pictures) are classified and searched by subject, as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in the field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this, and there is often no consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need a deeper understanding of what a subject is. The question "What is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years (see below).

Charles Ammi Cutter (1837–1903)
For Cutter, the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred [...] to those intellections [...] that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60); the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); and "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge" (Miksa, 1983a, p. 61). Bernd Frohmann adds:

"The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts. Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification (DDC), by contrast, severs those connections. Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a poorly semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic... .... The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation". (Frohmann, 1994, 112–113).

Cutter's early view of what a subject is was probably wiser than most of the understandings that dominated the 20th century, including the one reflected in the ISO standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped by social processes. That said, these statements are not particularly detailed or clear: they give only a vague idea of the social nature of subjects.

S. R. Ranganathan (1892–1972)
A classification system with an explicit theoretical foundation is Ranganathan's Colon Classification. Ranganathan provided an explicit definition of the concept of "subject":

"Subject – an organized body of ideas, whose extension and intension are likely to fall coherently within the field of interests and comfortably within the intellectual competence and the field of inevitable specialization of a normal person."

A related definition is given by one of Ranganathan's students:

"A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several..."

Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. That system is based on combining single elements from facets into subject designations, which is why the combined nature of subjects is emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject (it is instead termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is expressed in harsh terms (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...".

It seems unacceptable that Ranganathan defines the word "subject" in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether subjects are combined should be examined once their definition has been given; it should not be determined a priori, in the definition.

Besides emphasizing the combined, organized and systematized nature of subjects, Ranganathan's definition contains the pragmatic demand that a subject be determined in a way that suits a normal person's competence or specialization. Again we see a strange kind of wishful thinking that mixes a general understanding of a concept with demands made by his own specific system. What the word "subject" means is one thing; how to provide subject descriptions that meet particular demands, such as the specificity of a given information retrieval language or the precision and recall required of a system, is quite another. If researchers define terms in ways that favor specific kinds of systems, such definitions are not useful for building more general theories about subjects, subject analysis and information retrieval (IR). Among other things, comparative studies of different kinds of systems are made difficult.

Based on these arguments, as well as additional arguments made in the literature, we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO standard for topic maps, Ranganathan's definition may be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For that purpose, another understanding of "subject" is necessary.

Patrick Wilson (1927–2003)
In his book, Wilson (1968) examined, in particular through thought experiments, the suitability of different methods of determining the subject of a document. The methods were:


 * identifying the author's purpose for writing the document,
 * weighing the relative dominance and subordination of different elements in the picture that the reading imposes on the reader,
 * grouping or counting the document's use of concepts and references,
 * construing a set of rules for selecting elements deemed necessary (as opposed to unnecessary) for the work as a whole.
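
The third method, counting a document's use of concepts, is the easiest to make concrete. The following is a minimal Python sketch and not Wilson's own procedure: the function name `count_concepts`, the stop-word list and the sample text are assumptions made for illustration, and single words stand in for concepts.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real indexing tool would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "by", "was", "it", "often"}


def count_concepts(text: str) -> Counter:
    """Count how often each non-stop word occurs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Invented sample text, used only to exercise the function.
sample = (
    "Hostility in the workplace is often studied by psychologists. "
    "Hostility was measured by questionnaire, and the questionnaire "
    "was validated in a second workplace."
)

counts = count_concepts(sample)
top_terms = counts.most_common(3)
print(top_terms)
```

In this sample, "hostility", "workplace" and "questionnaire" each occur twice, so the count alone does not single out one of them as the subject of the text.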

Patrick Wilson shows convincingly that each of these methods is insufficient to determine the subject of a document, and he is led to conclude (p. 89): "The notion of the subject of a writing is indeterminate...", or, on p. 92 (about what users may expect to find at a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection with the last quote, Wilson has an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. On the basis of this argument, Wilson concludes: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness".

Wilson's concept of subject was discussed by Hjørland (1992), who found it problematic to give up a precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position, which Hjørland found unacceptable and unnecessary. Concerning authors' use of ambiguous terms, the role of subject analysis is to determine which documents it would be fruitful for users to identify, regardless of which term the documents use or in which meaning a given term is used in a document. Clear and relevant concepts and distinctions in classification systems and controlled vocabularies may be fruitful even if they are applied to documents with ambiguous terminology.

"Content oriented" versus "request oriented" views
Request-oriented indexing is indexing in which the anticipated requests from users influence how documents are indexed. The indexer asks himself: "Under which descriptors should this entity be found?" and "think of all the possible queries and decide for which ones the entity at hand is relevant" (Soergel, 1985, p. 230).

Request-oriented indexing may be indexing that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may index documents differently from a historical library. It is probably better, however, to understand request-oriented indexing as policy-based indexing: the indexing is done according to certain ideals and reflects the purpose of the library or database doing the indexing. Understood this way, it is not necessarily a kind of indexing based on user studies. Only if empirical data about use or users are applied should request-oriented indexing be regarded as a user-based approach.
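
The idea of policy-based indexing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IndexingPolicy` class, the `index_document` function, the aspect labels and the descriptors are all hypothetical, and they are not taken from Soergel or from any real library's vocabulary. The sketch shows one document receiving different descriptors under two collection policies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexingPolicy:
    """Hypothetical policy: maps aspects of a document to the descriptors
    that a given library or database considers worth assigning."""
    name: str
    descriptors_by_aspect: Dict[str, str]


def index_document(aspects: List[str], policy: IndexingPolicy) -> List[str]:
    """Return the sorted descriptors for the aspects this policy covers.

    Aspects the policy does not mention are ignored, which models the
    collection's purpose shaping the indexing.
    """
    mapping = policy.descriptors_by_aspect
    return sorted({mapping[a] for a in aspects if a in mapping})


# Invented aspects of a single document.
document_aspects = [
    "suffrage movement",
    "nineteenth-century Britain",
    "women's political organisations",
]

feminist_policy = IndexingPolicy(
    name="feminist studies database",
    descriptors_by_aspect={
        "suffrage movement": "Women's suffrage",
        "women's political organisations": "Feminist organisations",
    },
)

history_policy = IndexingPolicy(
    name="historical library",
    descriptors_by_aspect={
        "suffrage movement": "Electoral reform",
        "nineteenth-century Britain": "Great Britain, 19th century",
    },
)

feminist_terms = index_document(document_aspects, feminist_policy)
history_terms = index_document(document_aspects, history_policy)
print(feminist_terms)
print(history_terms)
```

Neither policy here draws on empirical data about users; each encodes only the purpose of the collection, which is the sense in which such indexing is policy-based and not user-based.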

The subject knowledge view
Rowley & Hartley (2008, p. 109) wrote: "In order to achieve good consistent indexing, the indexer must have a thorough appreciation of the structure of the subject and the nature of the contribution that the document is making to the advancement of knowledge within a particular discipline". This is in accordance with Hjørland's definition given above.

Other views and definitions
In the ISO standard for topic maps, the concept of subject is defined this way:

"Subject Anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever." ISO 13250-1, here cited from draft: http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0446.htm#overview)

This definition may work well within the closed system of concepts provided by the topic maps standard. In broader contexts, however, it is not fruitful, because it does not specify what to identify in a document or a discourse when ascribing subject identification terms or symbols to it. If different methods of subject analysis yield different results, which of these results can be said to reflect the (true) subject? (This assumes that the expression "a true subject assignment" is meaningful at all, which is an important part of the problem.) Different persons may have different opinions about what the subject of a specific document is. How can a theoretical understanding of the term "subject" help in deciding principles of subject analysis?

Indexing words versus concepts versus subjects
A proposal for differentiating between concept indexing and subject indexing was given by Bernier (1980). In his opinion, subject indexes are different from, and can be contrasted with, indexes to concepts, topics and words. Subjects are what authors are working and reporting on. A document has chromatography as its subject if that is what the author wishes to inform readers about. Papers that use chromatography as a research method, or discuss it in a subsection, do not have chromatography as their subject. Indexers can easily drift into indexing concepts and words instead of subjects, but this is not good indexing. Bernier does not, however, differentiate the author's subject from that of the information seeker. A user may want a document about a subject that is different from the one intended by its author. From the point of view of information systems, the subject of a document is related to the questions that the document can answer for its users (cf. the distinction between a content-oriented and a request-oriented approach).

Isness
"The FRSAR Working Group is aware that some controlled vocabularies provide terminology to express other aspects of works in addition to subject (such as form, genre, and target audience of resources). While very important and the focus of many user queries, these aspects describe isness or what class the work belongs to based on form or genre (e.g., novel, play, poem, essay, biography, symphony, concerto, sonata, map, drawing, painting, photograph, etc.) rather than what the work is about." (IFLA, 2010, p. 10).

Ofness
"Those LIS authors who have focused on the subjects of visual resources, such as artworks and photographs, have often been concerned with how to distinguish between the "aboutness" and the "ofness" (both specific and generic depiction or representation) of such works (Shatford, 1986). In this sense, "aboutness" has a narrower meaning than that used above. A painting of a sunset over San Francisco, for instance, might be analyzed as being (generically) "of" sunsets and (specifically) "of" San Francisco, but also "about" the passage of time." (IFLA, 2010, p. 11). See also: Baca & Harpring (2000) and Shatford (1986).