Zellig Harris

Zellig Sabbettai Harris (October 23, 1909 – May 22, 1992) was an influential American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and discourse analysis and for the discovery of transformational structure in language. These developments from the first 10 years of his career were published within the first 25. His contributions in the subsequent 35 years of his career include transfer grammar, string analysis (adjunction grammar), elementary sentence-differences (and decomposition lattices), algebraic structures in language, operator grammar, sublanguage grammar, a theory of linguistic information, and a principled account of the nature and origin of language.

Biography
Harris was born on October 23, 1909, in Balta, in the Podolia Governorate of the Russian Empire (present-day Ukraine). He was Jewish. In 1913, when he was four years old, his family immigrated to Philadelphia, Pennsylvania. At age 13, at his request, he was sent to live in Palestine, where he worked to support himself, and for the rest of his life he returned frequently to live on a socialist kibbutz in Israel. His brother, Dr. Tzvi N. Harris, with his wife Shoshana, played a pivotal role in the understanding of the immune system and the development of modern immunology. His sister, Anna H. Live, was Director of the English Institute (for ESL students) at the University of Pennsylvania (now named the English Language Program). In 1941, he married the physicist Bruria Kaufman, who was Einstein's assistant in the 1950s at Princeton. In the 1960s the couple established residence in kibbutz Mishmar Ha'Emek, in Israel, where they adopted their daughter, Tamar. From 1949 until his death, Harris maintained a close relationship with Naomi Sager, director of the Linguistic String Project at New York University. Their daughter, Eva Harris, is a professor of infectious diseases at the University of California, Berkeley.

Harris died in his sleep after a routine working day at the age of 82 on May 22, 1992, in New York.

Linguistics
From the outset of his early work in the 1930s, Harris was concerned with establishing the mathematical and empirical foundations of the science of language then emerging. He saw that one could not 'explain' language (Saussure's parole) by appeal to a priori principles or competencies (langue) for which language itself provides the sole evidence. "The danger of using such undefined and intuitive criteria as pattern, symbol, and logical a prioris, is that linguistics is precisely the one empirical field which may enable us to derive definitions of these intuitive fundamental relationships out of correlations of observable phenomena."

Early career and influences
Harris received his bachelor's (1930), master's (1932), and doctoral (1934) degrees in the Oriental Studies department of the University of Pennsylvania. Although he started out as a Semiticist, publishing on Ugaritic, Phoenician, and Canaanite, on the origins of the alphabet, and later on Hebrew, both classical and modern, he began teaching linguistic analysis at Penn in 1931. His increasingly comprehensive approach saw practical application as part of the war effort in the 1940s. In 1946–1947 he formally established what is said to be the first modern linguistics department in the United States.

Harris's early publications brought him to the attention of Edward Sapir, who strongly influenced him and who came to regard him as his intellectual heir. Harris also greatly admired Leonard Bloomfield for his work and as a person. He did not formally study with either.

Relation to "Bloomfieldian" structuralism
It is widely believed that Harris carried Bloomfieldian ideas of linguistic description to their extreme development: the investigation of discovery procedures for phonemes and morphemes, based on the distributional properties of these units and of antecedent phonetic elements. His Methods in Structural Linguistics (1951) is the definitive formulation of descriptive structural work as he had developed it up to about 1945. This book made him famous, but generativists have sometimes interpreted it as a synthesis of a "neo-Bloomfieldian school" of structuralism.

Rather, Harris viewed his work as articulating methods for verifying that results, however reached, are validly derived from the data of language. This was in line with virtually all serious views of science at the time; Harris's methods corresponded to what Hans Reichenbach called "the context of justification," as distinct from "the context of discovery." He had no sympathy for the view that to be scientific a linguistic analyst must progress by stepwise discovery from phonetics, to phonemics, to morphology, and so on, without "mixing levels."

Fundamental to this approach, and, indeed, making it possible, is Harris's recognition that phonemic contrast cannot be derived from distributional analysis of phonetic notations but rather that the fundamental data of linguistics are speakers' judgments of phonemic contrast. He developed and clarified methods of controlled experiment employing substitution tests, such as the pair test (Harris 1951:32) in which informants distinguish repetition from contrast. It is probably accurate to say that phonetic data are regarded as fundamental in all other approaches to linguistics. For example, Chomsky (1964:78) "assume[s] that each utterance of any language can be uniquely represented as a sequence of phones, each of which can be regarded as an abbreviation for a set of features". Recognizing the primacy of speaker perceptions of contrast enabled remarkable flexibility and creativity in Harris's linguistic analyses, which others, lacking that improved foundation, labelled "game playing" and "hocus-pocus."

Henry Hoenigswald tells us that in the late 1940s and the 1950s Harris was viewed by his colleagues as a person exploring the consequences of pushing methodological principles right to the edge. As a close co-worker put it:

Zellig Harris's work in linguistics placed great emphasis on methods of analysis. His theoretical results were the product of prodigious amounts of work on the data of language, in which the economy of description was a major criterion. He kept the introduction of constructs to the minimum necessary to bring together the elements of description into a system. His own role, he said, was simply to be the agent in bringing data in relation to data. ... But it was not false modesty that made Harris downplay his particular role in bringing about results, so much as a fundamental belief in the objectivity of the methods employed. Language could only be described in terms of the placings of words next to words. There was nothing else, no external metalanguage. The question was how these placings worked themselves into a vehicle for carrying the 'semantic burden' of language. ... His commitment to methods was such that it would be fair to say that the methods were the leader and he the follower. His genius was to see at various crucial points where the methods were leading and to do the analytic work that was necessary to bring them to a new result.

This, then, is an extension and refinement of the distributional methodology pioneered by Sapir and Bloomfield, investigating which elements of a language can co-occur and which cannot. Given a representation in which contrasting utterances (non-repetitions) are written differently, even a conventional alphabetic orthography, stochastic procedures amenable to statistical learning theory identify the boundaries of words and morphemes. In practice, of course, linguists identify words and morphemes by a variety of intuitive and heuristic means. These again are substitution tests. Given words and morphemes, the general method is experimental as follows: substitute one element in a string of such elements, the others in its context being held constant, and then test the acceptability of the new combination, either by finding it in a corpus or by testing its acceptability by users of the language.
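The substitution procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not Harris's own formulation: the function name, the toy corpus, and the use of corpus attestation as the only acceptability test are assumptions made here, and in practice the judgments of language users would supplement the corpus.

```python
def attested_substitutes(corpus, sentence, position, candidates):
    """Return the candidates that give an attested sentence when they
    replace the word at `position`, the rest of the context held constant."""
    attested = {tuple(s.split()) for s in corpus}
    words = sentence.split()
    accepted = []
    for candidate in candidates:
        variant = words[:position] + [candidate] + words[position + 1:]
        if tuple(variant) in attested:
            accepted.append(candidate)
    return accepted


# Hypothetical toy corpus, for illustration only.
TOY_CORPUS = [
    "the dog barked",
    "the cat slept",
    "the dog slept",
    "a dog barked",
]

# Substitute in the third position of "the dog barked".
print(attested_substitutes(TOY_CORPUS, "the dog barked", 2,
                           ["barked", "slept", "the"]))
# → ['barked', 'slept']
```

Words that survive the test in the same contexts are candidates for membership in the same distributional class; absence from a small corpus does not by itself show that a combination is unacceptable.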

Harris's experimental distributional methodology is thus grounded in the subjective judgments of language users: judgments as to repetition vs. imitation, yielding the fundamental data of phonemic contrast, and judgments as to acceptability. Substitution tests using these judgments as criteria identify the "departures from randomness" that enable language to carry information. This is in contrast to the commonly held view that Harris, like Bloomfield, rejected mentalism and espoused behaviorism.

Major contributions in the 1940s
Harris's contributions to linguistics as of about 1945 as summarized in Methods in Structural Linguistics (Harris 1951) include componential analysis of long components in phonology, componential analysis of morphology, discontinuous morphemes, and a substitution-grammar of word- and phrase-expansions that is related to immediate-constituent analysis, but without its limitations. With its manuscript date of January 1946, the book has been recognized as including the first formulation of the notion of a generative grammar.

The overriding aim of the book, and the import of the word "methods" in its original title, is a detailed specification of validation criteria for linguistic analysis. These criteria lend themselves to differing forms of presentation that have sometimes been taken as competing. Harris showed how they are complementary. (An analogy may be drawn to intersecting parameters in optimality theory.) "It is not that grammar is one or another of these analyses, but that sentences exhibit simultaneously all of these properties." Harris's treatment of these as tools of analysis rather than theories of language, and his way of using them to work toward an optimal presentation for this purpose or that, contributed to the perception that he was engaged in "hocus-pocus" with no expectation that there was any absolute truth to the matter.

Harris's central methodological concern beginning with his earliest publications was to avoid obscuring the essential characteristics of language behind unacknowledged presuppositions, such as are inherent in conventions of notation or presentation. In this vein, among his most illuminating works in the 1940s are restatements of analyses by other linguists, done with the intention of displaying properties of the linguistic phenomena which are invariant across diverse representations. This anticipates later work on linguistic universals. Also very relevant here is his work on transfer grammar, which presents the intersection of the grammars of two languages, clarifying precisely those features in which they differ and the relation between corresponding such features. This has obvious benefits for machine translation.

Metalanguage and notational systems
The basis of this methodological concern was that unacknowledged presuppositions, such as are inherent in conventions of notation or presentation, are dependent upon prior knowledge of and use of language. Since the object of investigation is language itself, properties of language cannot be presupposed without question-begging. "We cannot describe the structure of natural language in some other kind of system, for any system in which we could identify the elements and meanings of a given language would have to have already the same essential structure of words and sentences as the language to be described." "[W]e cannot in general impose our own categories of information upon language. ... We cannot determine in an 'a priori' way the 'logical form' of all sentences ... ", etc.

Natural language demonstrably contains its own metalanguages, in which we talk about language itself. Any other means for talking about language, such as logical notations, depends upon our prior shared 'common parlance' for our learning and interpreting it. To describe language, or to write a grammar, we cannot rely upon metalinguistic resources outside of the intrinsic metalinguistic resources within language, "for any system in which we could identify the elements and meanings of a given language would have to have already the same essential structure of words and sentences as the language to be described." "There is no way to define or describe the language and its occurrences except in such statements said in that same language or in another natural language. Even if the grammar of a language is stated largely in symbols, those symbols will have to be defined ultimately in a natural language."

From this observation there followed Harris's conclusion that a science that aims to determine the nature of language is limited to investigation of the relationships of the elements of language to one another (their distribution). Indeed, beginning with the fundamental data of linguistics, the phonemic contrasts, all the elements are defined relative to one another.

Any metalinguistic notions, representations, or notational conventions that are not stateable in metalanguage assertions of the language itself import complexity that is not intrinsic to language, obscuring its true character. Because of this, Harris strove for a 'least grammar'. "The reason for this demand is that every entity and rule, and every complexity and restriction of domains of a rule, states a departure from randomness in the language being described. Since what we have to describe is the restriction on combinations in the language, the description should not add restrictions of its own."

The hypothesis of Universal Grammar (UG) amounts to the contrary proposal that (some) metalinguistic resources for language are in fact a priori, prior to and external to language, as part of the genetic inheritance of humans. Insofar as the only evidence for properties of UG are in language itself, Harris's view was that such properties cannot be presupposed, but they may be sought once a principled theory of language is established on a purely linguistic basis.

Linguistics as applied mathematics
Deriving from this insight, Harris's aim was to apply the tools of mathematics to the data of language and establish the foundations of a science of language. "[The] problem of the foundations of mathematics was more topical than ever just at the time when Harris took charge of the 'homologous' enterprise of establishing linguistics on a clear basis." "We see here then nearly fifty years during which, to realize the program that he established very early, Zellig Harris searched and found in mathematics some of his supports. This merits closer attention, and it is doubtless advisable to consider it without shutting it into the reductive box of 'possible applications of mathematics to linguistics.' Is not the question rather 'how could a little mathematics transmute itself into linguistics?'" He contrasted this with attempts by others to project the properties of language from formal language-like systems. "The interest ... is not in investigating a mathematically definable system which has some relation to language, as being a generalization or a subset of it, but in formulating as a mathematical system all the properties and relations necessary and sufficient for the whole of natural language."

Transformational structure in language
As early as 1939, Harris began teaching his students about linguistic transformations. They had immediate utility to enhance the regularity of repetition patterns in texts (discourse analysis). By 1946 he had already done extensive transformational analysis in diverse languages such as Kota, Hidatsa, Cherokee, and Hebrew (ancient and modern), as well as English, but he did not feel this was ready for publication until his "Culture and Style" and "Discourse Analysis" papers in 1952. A later series of papers beginning with "Co-occurrence and Transformations in Linguistic Structure" (1957) developed a more general theory of syntax.

Harris argued, following Sapir and Bloomfield, that semantics is included in grammar, not separate from it, form and information being two faces of the same coin. A particular application of the concern about presuppositions and metalanguage, noted above, is that any specification of semantics other than that which is immanent in language can only be stated in a metalanguage external to language (which would call for its own syntactic description and semantic interpretation).

Prior to Harris's discovery of transformations, grammar as so far developed could not yet address individual word combinations, but only word classes. A sequence or n-tuple of word classes (plus invariant morphemes, termed constants) specifies a subset of sentences that are formally alike. Harris investigated mappings from one such subset to another in the set of sentences. In linear algebra, a mapping that preserves a specified property is called a transformation, and that is the sense in which Harris introduced the term into linguistics. Harris's transformational analysis refined the word classes found in the 1946 "From Morpheme to Utterance" grammar of expansions. By recursively defining semantically more and more specific subclasses according to the combinatorial privileges of words, one may progressively approximate a grammar of individual word combinations.
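As a rough illustration of this idea, a sentence-form can be modeled as a sequence of word-class slots and constants, and a candidate transformation as a pair of forms that are satisfied by the same word choices. The Python sketch below is not Harris's procedure: the function names, the toy corpus, and the choice of a cleft construction as the second form are assumptions made for illustration, and attestation in a tiny corpus stands in for judgments of acceptability.

```python
def satisfiers(sentences, form):
    """Return the set of slot fillings of `form` attested in `sentences`.

    `form` is a list of tokens: upper-case tokens (e.g. "N1", "V") are
    word-class slots, lower-case tokens are constants that must match.
    """
    found = set()
    for sentence in sentences:
        words = sentence.split()
        if len(words) != len(form):
            continue
        filling = {}
        for token, word in zip(form, words):
            if token.isupper():
                filling[token] = word      # slot: record the word choice
            elif token != word:
                break                      # constant does not match
        else:
            found.add(tuple(sorted(filling.items())))
    return found


def same_word_choices(sentences, form_a, form_b):
    """True if the two forms are satisfied by exactly the same word tuples."""
    return satisfiers(sentences, form_a) == satisfiers(sentences, form_b)


# Hypothetical forms and corpus, for illustration only.
FORM_A = ["N1", "V", "N2"]
FORM_B = ["it", "is", "N2", "that", "N1", "V"]
CORPUS = [
    "dogs chase cats",
    "it is cats that dogs chase",
    "cats eat fish",
    "it is fish that cats eat",
]

print(same_word_choices(CORPUS, FORM_A, FORM_B))  # → True
```

Adding a sentence such as "fish eat dogs" without its counterpart in the second form makes the check fail, which in this toy setting only signals a gap in the corpus rather than a genuine difference of acceptability.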

One form in which this is exemplified is in the lexicon-grammar work of Maurice Gross and his colleagues.

This relation of progressive refinement was subsequently shown in a more direct and straightforward way in a grammar of substring combinability resulting from string analysis (Harris 1962).

Noam Chomsky was Harris's student, beginning as an undergraduate in 1946. Rather than taking transformations in the algebraic sense of mappings from subset to subset, preserving inter-word restrictions, Chomsky adapted the notion of rules of transformation vs. rules of formation from mathematical logic. The terms originated with Rudolf Carnap. Chomsky was also introduced to the symbol-rewriting rules of Post production systems, invented some years earlier by Emil Post. Their capacity to generate language-like formal systems was beginning to be employed in the design of computing machines, such as ENIAC, which was announced at Penn with great fanfare as a "giant brain" in 1946. Chomsky employed rewrite rules as a notation for presentation of immediate-constituent analysis. He called this phrase structure grammar (PSG). He set about to restate Harris's transformations as operations mapping one phrase-structure tree to another. In his conception, PSG provided the rules of formation which were 'enriched' by his rules of transformation. This led later to his redefinition of transformations as operations mapping an abstract deep structure into a surface structure. This very different notion of transformation creates a complex hierarchy of abstract structure which under Harris's original definition was neither necessary nor desirable. Inter-word dependencies suffice to determine transformations (mappings in the set of sentences), and many generalizations that seem of importance in the various theories employing abstract syntax trees, such as island phenomena, fall out naturally from Harris's analysis with no special explanation needed.

"In practice, linguists take unnumbered short cuts and intuitive or heuristic guess, and keep many problems about a particular language before them at the same time". Early work on transformations used paraphrase as a heuristic, but in keeping with the methodological principles noted above in the section on metalanguage issues and earlier, there is also a formal criterion for transformational analysis. In the 1957 "Co-Occurrence and Transformation" paper this criterion was that inter-word co-occurrence restrictions should be preserved under the mapping; that is, if two sentence-forms are transforms, then acceptable word choices for one also obtain for the other. Even while the 1957 publication was in press it was clear that preservation of word co-occurrence could not resolve certain problems, and in the 1965 "Transformational Theory" this criterion was refined, so that if a difference of acceptability is found between a pair of sentences that satisfy one sentence-form, the corresponding satisfiers of the other sentence-form are likewise differentiated (though in some contexts, e.g. under "I imagined" or "I dreamt", acceptability-differences may be collapsed). These acceptability gradings may also be expressed as ranges of contexts in which the word choices are fully acceptable, a formulation which leads naturally to sublanguage grammar (below).

Operator grammar
Harris factored the set of transformations into elementary sentence-differences, which could then be employed as operations in generative processes for decomposing or synthesizing sentences. These are of two kinds, the incremental operations which add words, and the paraphrastic operations which change the phonemic shapes of words. The latter, Harris termed "extended morphophonemics". This led to a partition of the set of sentences into two sublanguages: an informationally complete sublanguage with neither ambiguity nor paraphrase, vs. the set of its more conventional and usable paraphrases ("The Two Systems of Grammar: Report and Paraphrase" 1969). In the paraphrastic set, morphemes may be present in reduced form, even reduced to zero; their fully explicit forms are recoverable by undoing deformations and reductions of phonemic shape.

Thence, in a parallel to the generalization of linear algebra to operator theory in mathematics, he developed Operator Grammar. Here at last is a grammar of the entry of individual words into the construction of a sentence. When the entry of an operator word on its argument places words in the relationship to one another that a given reduction requires, it may be carried out. (The reductions are rarely obligatory.) Operator entry is trivial to formalize. It resembles predicate calculus, and has affinities with Categorial Grammar, but these are findings after the fact which did not guide its development or the research that led to it. Recent work by Stephen Johnson on formalization of operator grammar adapts the "lexicon grammar" of Maurice Gross for the complex detail of the reductions.
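The resemblance of operator entry to predicate calculus can be suggested with a small Python sketch. It is only a sketch under stated assumptions: the lexicon is hypothetical, the two coarse argument classes "N" (a word requiring no arguments) and "O" (any operator) are a simplification adopted here, and the reductions that give sentences their usual shape are not modeled at all.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: each word lists the classes of the arguments
# it requires. Words with an empty requirement belong to class "N".
LEXICON = {
    "John": (),
    "Mary": (),
    "sleep": ("N",),
    "see": ("N", "N"),
    "know": ("N", "O"),
}


@dataclass(frozen=True)
class Entry:
    """A word together with the arguments on which it has entered."""
    word: str
    args: tuple = ()

    def word_class(self):
        return "O" if LEXICON[self.word] else "N"

    def __str__(self):
        if not self.args:
            return self.word
        return f"{self.word}({', '.join(str(a) for a in self.args)})"


def enter(word, *args):
    """Enter `word` on `args` if their classes meet its requirement."""
    required = LEXICON[word]
    actual = tuple(a.word_class() for a in args)
    if actual != required:
        raise ValueError(f"{word} requires {required}, got {actual}")
    return Entry(word, args)


sentence = enter("know", enter("Mary"), enter("sleep", enter("John")))
print(sentence)  # → know(Mary, sleep(John))
```

In this toy setting an attempt such as `enter("sleep", enter("sleep", enter("John")))` is rejected, because "sleep" requires an argument of class "N" and is offered an operator instead.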

Sublanguage and linguistic information
In his work on sublanguage analysis, Harris showed how the sublanguage for a restricted domain can have a pre-existent external metalanguage, expressed in sentences in the language but outside of the sublanguage, something that is not available to language as a whole. In the language as a whole, restrictions on operator-argument combinability can only be specified in terms of relative acceptability, and it is difficult to rule out any satisfier of an attested sentence-form as nonsense, but in technical domains, especially in sublanguages of science, metalanguage definitions of terms and relations restrict word combinability, and the correlation of form with meaning becomes quite sharp. It is perhaps of interest that the test and exemplification of this in The Form of Information in Science (1989) vindicates in some degree the Sapir–Whorf hypothesis. It also expresses Harris's lifelong interest in the further evolution or refinement of language in context of problems of social amelioration (e.g., "A Language for International Cooperation" [1962], "Scientific Sublanguages and the Prospects for a Global Language of Science" [1988]), and in possible future developments of language beyond its present capacities.

Harris's linguistic work culminated in the companion books A Grammar of English on Mathematical Principles (1982) and A Theory of Language and Information (1991). Mathematical information theory concerns only quantity of information, or, more exactly, the efficiency of communication channels; here for the first time is a theory of information content. In the latter work, also, Harris ventured to propose at last what might be the "truth of the matter" about the nature of language, what is required to learn it, its origin, and its possible future development. His discoveries vindicate Sapir's recognition, long disregarded, that language is pre-eminently a social artifact, the users of which collectively create and re-create it in the course of using it.

Honors
For his work, Harris was elected a member of the American Philosophical Society (1962), the American Academy of Arts and Sciences (1965), and the United States National Academy of Sciences (1973).

Legacy
The influence of Harris's work is pervasive in linguistics, often invisibly. Diverse lines of research that Harris opened continue to be developed by others, as indicated by contributions to Nevin (2002a, 2002b). The Medical Language Processor developed by Naomi Sager and others in the Linguistic String Project at the Courant Institute of Mathematical Sciences (NYU) has been made available on SourceForge. Richard Kittredge and his colleagues have developed systems for automatic generation of text from data, which are used for weather radio broadcasts and for production of reportage of stock market activity, sports results, and the like. Work on information retrieval has been influential in development of the Lexis-Nexis systems and elsewhere.

Recent work on statistical semantics, in particular distributional semantics, is based on the distributional hypothesis, and explicitly acknowledges the influence of Harris's work on distributional structure.
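The distributional hypothesis, that words occurring in similar contexts tend to be similar in meaning, is commonly operationalized by comparing context-count vectors. The Python sketch below shows one simple way this might be done; the window size, the cosine measure, the function names, and the toy corpus are all assumptions chosen for illustration and are not drawn from Harris's work.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Count, for each word, the words occurring within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, for illustration only.
CORPUS = ["the dog slept", "the cat slept", "the dog barked"]
vectors = context_vectors(CORPUS)

# "dog" and "cat" share contexts ("the", "slept"); "dog" and "slept" do not.
print(cosine(vectors["dog"], vectors["cat"]) > cosine(vectors["dog"], vectors["slept"]))
# → True
```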

Harris's students in linguistics include, among many others, Joseph Applegate, Ernest Bender, Noam Chomsky, William Evan, Lila R. Gleitman, Michael Gottfried, Maurice Gross, James Higginbotham, Stephen B. Johnson, Aravind Joshi, Michael Kac, Edward Keenan, Daythal Kendall, Richard Kittredge, James A. Loriot/Lauriault, Leigh Lisker, Fred Lukoff, Paul Mattick Jr., James Munz, Bruce E. Nevin, Jean-Pierre Paillet, Thomas Pynchon, Ellen Prince, John R. Ross, Naomi Sager, Morris Salkoff, Thomas A. Ryckman, and William C. Watt.



Politics
Harris was also influential with many students and colleagues, though in a less public way, in work on the amelioration of social and political arrangements. He was committed all his life to radical transformation of society, but from the ground up rather than by revolution directed from the top down. His last book, The Transformation of Capitalist Society, which summarizes his beliefs, was published posthumously. In it, he claims that capitalism abandons those personal and social needs which are unprofitable, that cooperative arrangements arise for meeting those needs, that participants in these niches gain experience in forms of mutual aid which are crucial to survival in 'primitive' societies but which have been suppressed where they are inconvenient for the requirements of capitalism, and that these should be fostered as seed points from which a more humane successor to capitalism can arise. He states that these are unnoticed and disregarded by functionaries of capitalism, much as capitalism developed from mercantilism in the midst of feudalism. This book, whose manuscript Harris had titled "Directing Social Change," was brought to publication in 1997 by Seymour Melman, Murray Eden, and Bill Evan. Some of Harris's unpublished writings on politics are in a collection at the Van Pelt Library of the University of Pennsylvania.

From his undergraduate days he was active in a student left-Zionist organization called Avukah (Hebrew "Torch"). He resigned as its national President in 1936, the year he obtained the Ph.D., but continued in a leadership advisory role until, like many other student organizations in the war years, it fell apart in 1943. From the early 1940s he and an informal group of fellow scientists in diverse fields collaborated on an extensive project called "A Frame of Reference for Social Change." They developed new concepts and vocabulary on grounds that the existing ones of economics and sociology presuppose and thereby covertly perpetuate capitalist constructs, and that it is necessary to 'unfool' oneself before proceeding. This was submitted to Victor Gollancz, a notoriously interventionist editor, who demanded a complete rewrite in more familiar terms.