Lexical integrity hypothesis

The lexical integrity hypothesis (LIH) or lexical integrity principle is a hypothesis in linguistics which states that syntactic transformations do not apply to subparts of words. It functions as a constraint on transformational grammar.

Words are analogous to atoms in that, from the point of view of syntax, words do not have any internal structure and are impenetrable by syntactic operations. This picture is complicated by the hierarchical levels of word formation, by the broad variation in how a word is defined, and by the question of when words are inserted into syntactic structure.

Linguists have proposed different theories to refine the hypothesis and account for cross-linguistic challenges to the LIH. Two linguists, Joan Bresnan of Stanford University and Sam Mchombo of the University of California, Berkeley, maintain the idea of words as unanalyzable units and re-evaluate the hypothesis using evidence from Bantu to resolve clitics' apparent violations of the LIH. They conclude that clitics and their prosodic word hosts are separate entities, and therefore that the hypothesis governs the morphosyntactic word rather than the prosodic word.

This hypothesis is incompatible with endoclitics, claimed to exist e.g. in the Udi language.

It is also incompatible with Arrernte, a language spoken in the Alice Springs area of Australia. Arrernte reportedly has initial separation where "the first two, or rarely three syllables of a verb can optionally be separated from the remainder of the verb. Intervening material seems to be limited to particles, clitics, pronouns, and simple NPs." (Henderson 2002)

History
The LIH is a subset of the lexicalist hypothesis, which states that morphology and syntax do not interact, with the result (among others) that some syntactic operations cannot access word-internal structures.

Despite being widely referred to and debated in linguistics, the lexical integrity hypothesis has no single attributable source, nor does there seem to be any single definition, which potentially poses problems for the theory's falsifiability. The hypothesis seems to have arisen from the consensus that there is a phenomenon that generally and cross-linguistically prevents or limits the interaction between syntax and morphology.

Though not referred to by name, the earliest theoretical beginnings of the LIH seem to be from, while linguist Andrew Carstairs-McCarthy attributes it to Bresnan and Mchombo, even though Bresnan and Mchombo themselves refer to the lexical integrity principle as a given concept within the linguistics canon.

While the two are today generally treated as distinct theories, the LIH has historically been referred to interchangeably with the lexicalist hypothesis, which makes the origin of the LIH as a separate concept difficult to pinpoint.

However, Bruening (2018) attributes the lexicalist hypothesis, of which the LIH is a subset, to Chomsky (1970).

Interaction between syntax and morphology: Theoretical variations
One of the biggest challenges in defining the LIH is identifying the domain that syntax governs, the domain that morphology governs, and how the two interact. Open questions include what constitutes a word and the point at which lexical insertion meets sentence-level operations. A book by linguists Anna Maria Di Sciullo and Edwin S. Williams is frequently used as a foundation for explorations of the LIH. In it, they explore the concept of word atomicity, as well as the framing of syntax in terms of a "sentence form", in which sentences are skeletal placeholders for lexical items such as listemes: lexical constituents that are stored in the lexicon rather than generated by rules.

One proposal redefines the LIH as a principle that excludes two interactions between syntax and morphology: having access to word-internal structure, and being able to manipulate parts of word-internal structure, where manipulation means syntactic movement or the splitting of a word constituent. On this view, the impossibility of such manipulation is a necessary requirement for a lexical unit to be a word. The prohibition on movement may therefore serve as a test of whether a morpheme sequence is a word or a phrasal compound.

Rochelle Lieber, a linguist at the University of New Hampshire, and Sergio Scalise of the University of Bologna propose a limited access principle, under which there is no hard wall between syntax and morphology. Rather, there is a figurative filter that permits some syntactic operations on lexical items. This is evidenced by the fact that languages permit syntactic structures to be "downgraded" to words, in that syntactic phrases can be merged into lexical items over time.

Professors Antonio Fábregas of the University of Tromsø, Elena Felíu Arquiola of the University of Jaén and Soledad Varela of the Autonomous University of Madrid use the concept of a morphological local domain in their discussion of the LIH. In this view, words have multiple binary-branching layers composed of roots and functional projections. The deeper layers of the morphological hierarchy are too far away for the syntax to see, and only the highest head of this multi-layered morphological tree is able to transmit information.

Additionally, some theories of syntax, such as minimalism, appear to be incompatible with the LIH. Lieber and Scalise argue that Chomsky's version of strict minimalism requires lexical items to be fully formed before they enter syntactic operations.

However, Dikken proposes that syntactic and lexicalist approaches may be reconciled through a checking approach. Checking assumes that words are built in the lexicon and that subparts of these words have features attached. These features are then checked against matching features on the functional heads of the syntactic structures that contain the words. Dikken asserts that syntax does not only refer to the internal structure of words; it also looks at the properties of subparts of complex words.
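The checking idea can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not a formalization taken from the literature: the names Subpart and unchecked_features, the feature labels "past" and "present", and the head label "T" are all hypothetical and chosen for illustration. A word is treated as a list of subparts built in the lexicon, each carrying features, and checking succeeds when every such feature finds a match on some functional head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subpart:
    """One piece of a word built in the lexicon (hypothetical toy type)."""
    form: str
    features: frozenset = frozenset()


def unchecked_features(word, functional_heads):
    """Return the features on the word's subparts that no functional head matches.

    word: list of Subpart
    functional_heads: dict mapping a head label to the set of features it carries
    An empty result means that checking succeeds.
    """
    available = set()
    for head_features in functional_heads.values():
        available |= set(head_features)
    return {f for part in word for f in part.features if f not in available}


# Toy lexical entry: "walked" = walk + -ed, with a past feature on the suffix.
walked = [Subpart("walk"), Subpart("ed", frozenset({"past"}))]

print(unchecked_features(walked, {"T": {"past"}}))     # nothing left unchecked
print(unchecked_features(walked, {"T": {"present"}}))  # "past" finds no match
```

In this sketch the syntax never builds or rearranges the word; it only compares features that the lexicon has already attached to the word's subparts, which is the sense in which the two approaches are said to be reconcilable.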

English right-hand head rule (RHHR)
In English, the right-hand head rule (RHHR) provides evidence for the division between the syntax and the lexical item. The properties of the head of a word, which in English tends to be the rightmost element, determine the properties of the whole word. The syntax cannot see any element of the word other than the head. For example, the compound greenhouse is composed of the adjective green and the noun house. The RHHR dictates that the head is the rightmost element, house, so the compound is a noun. As a result, the properties of the adjective green are invisible to the syntax. While most easily illustrated with compounds, the RHHR can also be extended to complex words and their suffixes.
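The rule can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: the Morph type, the functions head_of and syntactic_view, and the category labels "A" and "N" are hypothetical names introduced here, and the segmentation of happiness is a toy one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morph:
    """One element of a complex word (hypothetical toy type)."""
    form: str
    category: str  # e.g. "A" for adjective, "N" for noun


def head_of(parts):
    """Right-hand head rule: the rightmost element is the head."""
    if not parts:
        raise ValueError("a word needs at least one element")
    return parts[-1]


def syntactic_view(parts):
    """What the syntax sees: the whole form and the head's category only."""
    return {
        "form": "".join(p.form for p in parts),
        "category": head_of(parts).category,
    }


greenhouse = [Morph("green", "A"), Morph("house", "N")]
view = syntactic_view(greenhouse)
print(view)  # {'form': 'greenhouse', 'category': 'N'}

# The same function covers a suffixed word, where the suffix is the head.
happiness = [Morph("happi", "A"), Morph("ness", "N")]
print(syntactic_view(happiness)["category"])  # N
```

The category of green never appears in the returned view, which mirrors the claim that non-head elements are invisible to the syntax.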

Five tests for lexical integrity: Bresnan and Mchombo
Bresnan and Mchombo identify five tests of lexical integrity, outlined below: extraction, conjoinability, gapping, inbound anaphoric islands and phrasal recursivity.

Extraction
Syntactic movement operations cannot extract and relocate morphological constituents, as they can with phrases in topicalization.

Conjoinability
Functional categories do not undergo morphological derivation, as evidenced by failures in coordination tests: syntactic categories can be coordinated but stems and affixes cannot.

Gapping
The gapping test shows that the syntax is unable to "see" inside morphological constituents.

Inbound anaphoric islands
Phrases can contain pronouns that function as anaphors (referring to a previous referent) or deictics (referring to a salient entity). Derived words and compounds cannot, and so act as "anaphoric islands", separated from outside reference.

Phrasal recursivity
This test for lexical integrity highlights how phrasal compounds may appear to be penetrable by syntactic operations, but have in fact been lexicalized. These lexical entries have the semblance of figurative quotations. Spencer (1988, 1991) lends support to the LIH through examples such as a Baroque flautist or transformational grammarian that seem to lack any conceptual counterparts, like a wooden flautist or partial grammarian.

Criticisms
Many theorists have put forward examples that seem to weaken the LIH. The LIH depends heavily on what constitutes a word or a phrase, and violations of lexical integrity may arise in a given language depending on how these are defined.

For example, Haspelmath and Sims examine Hungarian with regard to words like meg-old [PFV-SOLVE], which are used in deverbal noun and adjective formation. They observe that constructions involving meg tend to be single words. However, they also note that meg is detached from the verb in certain contexts:

Pál meg-old-ott-a a problémá-t.

Paul PFV-SOLVE-PST-DEF.3SG the problem-ACC

'Paul solved the problem.'

Pál nem old-ott-a meg a problémá-t.

Paul not SOLVE-PST-DEF.3SG PFV the problem-ACC

'Paul did not solve the problem.'

Haspelmath and Sims argue that the LIH is not violated in the data above if megoldotta is treated as a periphrastic construction, in which meg and oldotta occupy separate syntactic nodes. That is, lexical integrity is preserved if a "word" is taken to be not a "morphologically generated form" but a "terminal syntactic node", a notion also adopted by other authors. However, defining what a word is then becomes a language-specific process, and the challenge lies in maintaining that the LIH is universal.

According to Lieber, phrasal compounds, especially because of their productivity, provide strong counter-evidence to the LIH. Any analysis of phrasal compounds, she argues, must at least account for the phrasal categories generated by the syntax. As a related example, the English possessive attaches to the end of a DP, whereas the most rigorous interpretation of the LIH would predict that it attaches to the end of a lexical noun: in the king of England's crown, the possessive follows the whole phrase the king of England rather than the noun king.

Linguist Andrew Spencer of the University of Essex expands on this idea and suggests that there is evidence, particularly in derived words in Romance languages and Dutch, that morphology echoes the syntax of a language.

Bresnan and Mchombo, however, account for phrasal compounds by arguing that they are lexical entries. This proposal runs counter to Lieber, who argues that some phrasal compounds are non-lexicalizable, such as nonce words that are coined spontaneously, once, and in limited contexts such as conversations. However, even Lieber has since softened her strong position against the lexical integrity hypothesis.

Wiese (1996) argues that phrasal compounds do not provide counter-evidence against the LIH, as the phrasal part in such a compound constitutes something like a quotation, used as an encapsulated element. The crucial evidence here is provided by "phrasal" parts which clearly stem from a different language or even a completely different sign system, such as: the @-sign, his rien-ne-va-plus-attitude.

Another apparent violation of the LIH has been identified in examples such as pre- and even to some extent post-war economics, pro- as opposed to anti-war, and hypo- and not hyperglycemic. However, it has also been noted that there is currently wide variability, and in many respects unpredictability, in the kinds of situations that permit the coordination of prefixes in this way.

It has also been argued that phrasal syntax has access to the morphological structure of words. The argument rests on apparent cases of prefix coordination, such as pre- and post-revolutionary France.

Proponents of the LIH argue that such examples are underlyingly coordinated phrases (pre-revolutionary France and post-revolutionary France) with ellipsis of the repeated material, making this a phonological and not a syntactic phenomenon. Critics respond with examples intended to show that the ellipsis must target morphological constituents and not just identical phonological strings. In other words, the ellipsis can access the morphological structure within words, directly contradicting the LIH.

Arrernte is a language that seems to violate the LIH with regard to its complex predicates, as Henderson observes. Complex predicates in this language may have non-verbal morphemes intervening between their constituents. For example, arrernelheme can be split by the word akewele ('supposedly'):

arrern+elh+eme akewele

place+REF+PRES SUPPO

arrerne akewele lh+eme

place SUPPO REF+PRES

'supposedly sit down'