User:KYPark/003

'''A DIRECT APPROACH TO INFORMATION RETRIEVAL'''

'''Table of Contents'''

 * WHAT WHY HOW
 * 1. INTRODUCTION
 * 2. THE LINE OF ATTACK
 * 3. SYSTEMS VS. USERS
 * 3.1 Discrimination
 * 3.2 Prediction
 * 4. DOCUMENTS VS. SURROGATES
 * 5. THE THEORY OF INTERPRETATION
 * 5.1 Denotation and Connotation
 * 5.2 The Theory of Ogden and Richards
 * 5.3 Implications for Information Retrieval
 * 6. PROPOSAL FOR FILE ORGANIZATION
 * 6.1 Incentives
 * 6.2 Extracts as Indexing Sources
 * 6.3 Extracts as Review Sources
 * 7. CONCLUSION
 * 8. REFERENCES

3.1 Discrimination
The user needs information. Even when he is seeking it in a document, he may be little conscious of the physical form of the document, for he may understand the notion of information in much the same way as a housewife in the marketplace does.

The user can easily and quite properly speak of pertinent, relevant, useful, or valuable information, expressing somewhat different shades of meaning. All these similar qualifiers, however, seem to be more or less redundant, for the user would appreciate information as such only when it is relevant to his specific purpose. Furthermore, it seems certain that these qualifiers are used only ambiguously.

Also, it is hard to say that the user, having found extremely valuable information in the library, must have any economic sense in mind at all. Of course, anyone would be perfectly right to understand information essentially as something "bought, sold, stored, treated, exchanged and consumed in economic terms," if that sort of information exists. Another way of understanding it<sup>8</sup> is as follows:


 * ...the information content of signals is not to be regarded as a commodity; it is more a property or potential of the signals, and as a concept it is closely related to the idea of selection, or discrimination.

Obviously, this quotation represents our common understanding. However, we know that, by understanding or defining something in a particular way, one in effect specifies one's readiness or intention to communicate with other people about the thing defined. Those people are expected or invited to share the same understanding. Unfortunately, some people would not or could not agree with the suggested definition, however authoritative, because they find it absurd, unnecessary, outside their concern, or objectionable for some other reason. In this case, communication is likely to turn into an argument about the definition rather than about what was to be communicated; in effect, a breakdown of communication.

Communication between the retrieval system and the user, as illustrated in Figure 1, is of secondary importance to the user. It is only an intermediary, necessary for another communication of primary importance, that is, communication of information with the author of a document. Therefore, a misconception of information may damage both communications.

From the psychological point of view, Stevens<sup>10</sup> attempted to generalize communication by defining it as "the discriminatory response of an organism to a stimulus." This definition was criticized by Cherry<sup>8</sup> on the grounds that communication is essentially the relationship established between stimuli and responses. It does not follow, however, that the definition is wrong. It simply focuses on the communication event at the receiving end; for example, the discriminatory response of the user to a given document.

Information seeking presupposes satisfaction of information needs. The user will be really satisfied only when he finds information relevant to his need. Naturally, he discriminates what he has received in the light of his need and criteria; this is relevance judgment. Nothing can stop him from being subjective and tough in the judgment. He may even totalize his relevance criteria. It is now well known that relevance judgments are complicated, depending not only on the subject matter but also on many other things.<sup>11</sup> It is noteworthy that the real judge is the user. If a panel of judges were to take his place, it might serve some practical purposes, but only as a makeshift. We shall reserve the pure notion of relevance for those systems that aim to provide relevant information.

Many studies have shown that informal communication is very popular among scientists, especially among those who are eminent. This phenomenon is quite convincing. However, it is a mistake to conclude that informal communication is superior to formal communication. At least, informal communication does not pay much attention to the "social responsibility" or morality, if you like, which was emphasized by Bernal<sup>5</sup>. It should rather be said that each represents a different machinery of communication, neither degrading the other. Presumably, all the past experience of an eminent scientist in formal communication (e.g., through books, journals, lectures, libraries, and so on) must have enabled him to short-circuit to only the essence, say, a few words of suggestion. This short-circuiting may be applied to formal communication as well.

Among many others, Jahoda et al.<sup>12</sup> observed that 66% of faculty members interviewed in one university maintained personal indexes, that 42% of them regarded preparation of indexes as too time-consuming, and that 32% complained of inconsistency in indexing. No doubt, a large portion of scientists spend much time in preparing their own systems, i.e., personal indexes or the like, and they themselves suffer from so-called retrieval problems. Therefore, anything that can help them solve these problems and improve their own systems would be duly appreciated. On the other hand, naive and primitive as they may be, personal indexes must be worth careful study in order to learn how scientists extend their retrieval facilities toward systems. An important suggestion for all kinds of retrieval systems may be found there.

From the user's point of view, any retrieval system may be regarded as an extension of his information-seeking facilities. The user can satisfy his information need to some extent for himself, instead of delegating retrieval to what may be called outside systems as opposed to personal means. In this respect, it is questionable whether outside systems, however elaborate, can promise better satisfaction than personal means which are familiar to the user.

3.2 Prediction
When the user delegates retrieval to the system, there must be some agreement, although tacit, between both parties about the way in which the system is designed to act on behalf of the user. A set of constraints is characteristic of any system whatsoever. Beyond these constraints, the retrieval system is not expected to answer and the user is not normally allowed to ask. However, these obvious constraints seem to be often disguised and overlooked. (Note that at the moment we are talking about the current systems regardless of their future developments.) This aspect has been convincingly discussed by Fairthorne in many of his writings.

Then, what is the agreement? The straightforward answer is that the user should agree that the system works on the basis of subject similarity rather than relevance. The distinction between these two apparently similar notions should be made clear.

If any two readers are compared as to what each of them recognizes in the same document, the two will in general differ from each other, for everyone tends to interpret subjectively what is written or said. These individual interpretations may be superimposed for many readers, in order to separate the densely overlapping, and thus explicit, meaning from the relatively subjective and more or less implicit meanings.
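The superimposition described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not a procedure given in the text: each reader's interpretation is reduced to a hypothetical set of labels, and a meaning is counted as "explicit" when more than an arbitrarily chosen share of readers (the `threshold` parameter, an assumption) recognize it.

```python
from collections import Counter


def superimpose(interpretations, threshold=0.5):
    """Split meanings into explicit and implicit by how many readers share them.

    `interpretations` is a list of sets, one per reader (hypothetical data).
    `threshold` is an assumed cut-off: a meaning recognized by more than this
    share of readers is treated as explicit, the rest as implicit.
    """
    counts = Counter()
    for meanings in interpretations:
        counts.update(set(meanings))  # each reader counts once per meaning
    n_readers = len(interpretations)
    explicit = {m for m, c in counts.items() if c / n_readers > threshold}
    implicit = set(counts) - explicit
    return explicit, implicit


# Three hypothetical readers of the same document.
readers = [
    {"indexing", "retrieval", "cost"},
    {"indexing", "retrieval", "psychology"},
    {"indexing", "retrieval"},
]
explicit, implicit = superimpose(readers)
print(sorted(explicit))  # meanings shared by most readers
print(sorted(implicit))  # meanings seen by only a few readers
```

With these made-up readers, the two meanings that all three share fall on the explicit side, while the two that only one reader sees fall on the implicit side.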

We can fairly reasonably say that in interpreting a document, the indexer tends to behave in a unanimous way; in principle he can discern from a document most of the explicit meaning. The implicit meaning that he may discern in addition would be negligible considering the large number of potential users. As opposed to the indexer, the user tends to interpret in a subjective way; he may find information not in the explicit meaning but elsewhere. Various users may differ greatly from each other in finding information, according to their past experiences and present state of mind.

Not attempting to be precise, let us associate the explicit, unanimous interpretation with subject similarity, and the subjective, individual interpretation with relevance. Perhaps we cannot discuss similarity without unanimity or commonality of interpretation, nor relevance without subjectivity or individuality of interpretation. Fairthorne<sup>7</sup> distinguishes them as extensional aboutness and intensional aboutness. We shall return to this distinction later.

It may be said that subject similarity is a necessary condition for relevance and that relevance is a sufficient condition for subject similarity. In this respect, the system which operates even ideally on the basis of subject similarity, or in a unanimous way, is liable to two types of error, that is, to miss relevant documents and to retrieve non-relevant documents. It cannot turn aside relevance by any means. Thus it can only predict relevance, however ideal it may be in recognizing subject similarity.
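The two types of error can be made concrete with a short sketch. The document identifiers and both sets are hypothetical, and the recall and precision ratios are the conventional measures for these two errors rather than anything defined in this section.

```python
def prediction_errors(retrieved, relevant):
    """Compare what the system retrieved with what the user judged relevant.

    Both arguments are sets of document identifiers (hypothetical data).
    Returns the missed relevant documents, the non-relevant documents
    retrieved, and the conventional recall and precision ratios.
    """
    hits = retrieved & relevant
    missed = relevant - retrieved   # first error: relevant but not retrieved
    noise = retrieved - relevant    # second error: retrieved but not relevant
    recall = len(hits) / len(relevant) if relevant else None
    precision = len(hits) / len(retrieved) if retrieved else None
    return missed, noise, recall, precision


# The system predicts by subject similarity; the user judges relevance.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
missed, noise, recall, precision = prediction_errors(retrieved, relevant)
print(sorted(missed), sorted(noise), recall, precision)
```

In this made-up case the system misses one relevant document and retrieves two non-relevant ones, so its prediction of relevance is only partly borne out by the user's judgment.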

Strictly speaking, relevance is a priori from the system's point of view. This is true in the sense that relevance criteria, however precisely stated by the user, cannot be wholly accepted owing to the system constraints, so the accepted part may not be sufficient in the end. It is all the more so because relevance criteria, however readily accepted by the system, cannot affect indexing retrospectively. To borrow Fairthorne's contention<sup>7</sup>:


 * An indexer does not and cannot index all the ways in which a document will interest all kinds of readers, present and future.

We have still another reason to believe in the a priori character of relevance. A great deal of experimental as well as operational experience in retrieval has been accumulated over at least the past twenty years. Retrieval languages and devices must have been greatly improved. Nevertheless, how much has been learned about how the user judges a relevant document* as such? Obviously not much.

If so understood, relevance must have been overemphasized in evaluating retrieval systems. In particular, comparison of different systems might have been unfair to some, for it is not certain whether or not subject similarity runs parallel to relevance. How retrieval systems are to be evaluated and compared should be made clear first: in terms of subject similarity, relevance, or both. An evaluation based solely on subject similarity would not tell much about how to give more satisfaction to the user; one based solely on relevance would not tell much about how to keep the agreement with the user better. Collation of both evaluations will be necessary to learn about the relationship between the system and the user.


 * * Here we are mainly concerned with retrieving documents and the sort of information which is obtainable from the documents. We may attribute relevance to a document; but only indirectly, that is, after information has been discovered from it.

AFTERTHOUGHTS

 * See also
 * Subject matter
 * Subject similarity
 * Relevance judgment
 * Social responsibility