Indo-Uralic languages

Indo-Uralic is a highly controversial linguistic hypothesis proposing a genealogical family consisting of Indo-European and Uralic.

The suggestion of a genetic relationship between Indo-European and Uralic is often credited to the Danish linguist Vilhelm Thomsen in 1869 (Pedersen 1931:336), though an even earlier version was proposed by the Finnish linguist Daniel Europaeus in 1853 and 1863. Both were received with little enthusiasm. Since then, the predominant opinion in the linguistic community has remained that the evidence is insufficient to distinguish a genetic relationship from similarity due to language contact. However, a number of prominent linguists have taken the contrary view (e.g. Henry Sweet, Holger Pedersen, Björn Collinder, Warren Cowgill, Jochem Schindler, Eugene Helimski, Frederik Kortlandt and Alwin Kloekhorst).

More recent linguistic studies have challenged the Indo-Uralic hypothesis, contradicting previously proposed cognates and finding no support for a genealogical relationship between Uralic and Indo-European.

Geography of the proposed Indo-Uralic family
The Dutch linguist Frederik Kortlandt supports a model of Indo-Uralic in which the original Indo-Uralic speakers lived north of the Caspian Sea. In this model, the Proto-Indo-European speakers began as a group that branched off westward from there and came into geographic proximity with the Northwest Caucasian languages, absorbing Northwest Caucasian lexical material before moving farther westward to a region north of the Black Sea, where their language settled into canonical Proto-Indo-European (2002:1). Allan Bomhard suggests a similar schema in Indo-European and the Nostratic Hypothesis (1996). Alternatively, the common protolanguage may have been located north of the Black Sea, with Proto-Uralic moving northwards with the climatic improvement of post-glacial times.

Expanding upon his earlier hypothesis, Kortlandt (2021) proposes that Proto-Indo-European, rather than being a sister of Proto-Uralic, is a daughter of Proto-Uralic, and that Indo-European is a branch of the Uralic family. More specifically, he proposes that Proto-Indo-European and Proto-Finno-Ugric share a more recent common ancestor with each other than either does with Proto-Samoyedic. If valid, this would mean that the traditional conception of the Uralic family (with Indo-European excluded) is a paraphyletic grouping rather than a clade.
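The paraphyly claim can be made concrete with a small tree. The following is only an illustrative sketch: the tree encodes Kortlandt's proposal as summarized above, and the node label "Finno-Ugric + Indo-European" and the helper names are assumptions invented for this example, not terms from the literature.

```python
# Hypothetical encoding of the branching summarized above (Kortlandt 2021).
# The intermediate node name is a label made up for this sketch.
TREE = {
    "Proto-Uralic": ["Proto-Samoyedic", "Finno-Ugric + Indo-European"],
    "Finno-Ugric + Indo-European": ["Proto-Finno-Ugric", "Proto-Indo-European"],
}


def leaves(node):
    """Return the set of terminal branches descending from a node."""
    children = TREE.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves(child)
    return result


def all_nodes():
    """Every node in the tree, internal or terminal."""
    return set(TREE) | {c for children in TREE.values() for c in children}


def excluded_descendants(group):
    """Branches that descend from the group's most recent common ancestor
    but are left out of the group. An empty result means the group is a clade."""
    group = set(group)
    candidates = [n for n in all_nodes() if leaves(n) >= group]
    mrca = min(candidates, key=lambda n: len(leaves(n)))
    return leaves(mrca) - group


traditional_uralic = {"Proto-Samoyedic", "Proto-Finno-Ugric"}
print(excluded_descendants(traditional_uralic))
print(excluded_descendants(traditional_uralic | {"Proto-Indo-European"}))
```

Under this tree, the traditional Uralic grouping leaves out one descendant of its own common ancestor (Proto-Indo-European), which is what makes it paraphyletic; adding Indo-European yields a complete clade.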

History of the Indo-Uralic hypothesis
An authoritative if brief and sketchy history of early Indo-Uralic studies can be found in Holger Pedersen's Linguistic Science in the Nineteenth Century (1931:336-338). Although Vilhelm Thomsen first raised the possibility of a connection between Indo-European and Finno-Ugric in 1869 (336), "he did not pursue the subject very far" (337). The next important statement in this area was that of Nikolai Anderson in 1879. However, Pedersen reports, the value of Anderson’s work was "impaired by its many errors" (337). The English phonetician Henry Sweet argued for kinship between Indo-European and Finno-Ugric in his semi-popular book The History of Language in 1900 (see especially Sweet 1900:112-121). Sweet's treatment awakened "[g]reat interest" in the question, but "his space was too limited to permit of actual proof" (Pedersen 1931:337). A somewhat longer study by K. B. Wiklund appeared in 1906 and another by Heikki Paasonen in 1908 (i.e. 1907) (ib.). Pedersen considered that these two studies sufficed to settle the question and that, after them, "it seems unnecessary to doubt the relationship further" (ib.).

Sweet considered the relationship to be securely established, stating (1900:120; "Aryan" = Indo-European, "Ugrian" = Finno-Ugric):

"If all these and many other resemblances that might be adduced do not prove the common origin of Aryan and Ugrian, and if we assume that the Ugrians borrowed not only a great part of their vocabulary, but also many of their derivative syllables, together with at least the personal endings of their verbs from Aryan, then the whole fabric of comparative philology falls to the ground, and we are no longer justified in inferring from the similarity of the inflections in Greek, Latin, and Sanskrit that these languages have a common origin."

The short name "Indo-Uralic" (German Indo-Uralisch) for the hypothesis was first introduced by Hannes Sköld in 1927.

Björn Collinder, author of the Comparative Grammar of the Uralic Languages (1960), a standard work in the field of Uralic studies, argued for the kinship of Uralic and Indo-European (1934, 1954, 1965).

Alwin Kloekhorst, author of the Etymological Dictionary of the Hittite Inherited Lexicon, endorses the Indo-Uralic grouping (2008b). He argues that, when features differ between the Anatolian languages (including Hittite) and the other Indo-European languages, comparisons with Uralic can help to establish which group has the more archaic forms (2008b: 88) and that, conversely, the success of such comparisons helps to establish the Indo-Uralic thesis (2008b: 94). For example, in Anatolian the nominative singular of the second person pronoun comes from *ti(H), whereas in the non-Anatolian languages it comes from *tu(H); in Proto-Uralic it was *ti, which agrees with evidence from internal reconstruction that Anatolian has the more archaic form (2008b: 93).

The most extensive attempt to establish sound correspondences between Indo-European and Uralic to date is that of the Slovenian linguist Bojan Čop. It was published as a series of articles in various academic journals from 1970 to 1989 under the collective title Indouralica. The topics to be covered by each article were sketched out at the beginning of "Indouralica II". Of the projected 18 articles only 11 appeared. These articles have not been collected into a single volume and thereby remain difficult to access.

In the 1980s, the Russian linguist N. D. Andreev (Nikolai Dmitrievich Andreev) proposed a "Boreal languages" hypothesis linking the Indo-European, Uralic, and Altaic (including Korean in his later papers) language families. Andreev also proposed 203 lexical roots for his hypothesized Boreal macrofamily. After Andreev's death in 1997, the Boreal hypothesis was further expanded by Sorin Paliga (2003, 2007).

Sound correspondences
Among the sound correspondences asserted by Čop were (1972:162):

 * Uralic m n l r = Indo-European m n l r.
 * Uralic j w = Indo-European i̯ u̯.
 * Uralic sibilants (presumably s š ś) = Indo-European s.
 * Uralic word-initial voiceless stops (presumably p t č ć k) = Indo-European word-initial voiced aspirates (presumably bʰ dʰ ǵʰ gʰ gʷʰ) and voiceless stops (presumably p t ḱ k kʷ), also Indo-European s followed by one of these stops.
 * Uralic ŋ = Indo-European g and ng.
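As a rough illustration, the correspondences listed above can be written down as a lookup table. This is a minimal sketch and not Čop's own formalization: the names `CORRESPONDENCES` and `ie_reflexes` are invented for the example, the sibilant entries follow the list's own "presumably" reading, and the word-initial stop correspondence is left out because it is one-to-many and position-dependent.

```python
# Uralic segment -> Indo-European reflexes, as given in the list above.
# Sibilants follow the list's hedged "presumably s š ś" reading.
# Word-initial stops are deliberately omitted (one-to-many, position-dependent).
CORRESPONDENCES = {
    "m": ["m"],
    "n": ["n"],
    "l": ["l"],
    "r": ["r"],
    "j": ["i̯"],
    "w": ["u̯"],
    "s": ["s"],
    "š": ["s"],
    "ś": ["s"],
    "ŋ": ["g", "ng"],
}


def ie_reflexes(uralic_segment):
    """Return the Indo-European reflexes listed for a Uralic segment,
    or an empty list if the table has no entry for it."""
    return CORRESPONDENCES.get(uralic_segment, [])


def uralic_sources(ie_segment):
    """Return every Uralic segment whose listed reflexes include ie_segment."""
    return [u for u, reflexes in CORRESPONDENCES.items() if ie_segment in reflexes]


print(ie_reflexes("ŋ"))
print(uralic_sources("s"))
```

The table makes two properties of the proposal easy to see: the three Uralic sibilants all merge into a single Indo-European s, while Uralic ŋ has two listed Indo-European outcomes.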

History of opposition to the Indo-Uralic hypothesis
The history of early opposition to the Indo-Uralic hypothesis does not appear to have been written. It is clear from the statements of supporters such as Sweet that they were facing considerable opposition and that the general climate of opinion was against them, except perhaps in Scandinavia.

Károly Rédei, editor of the etymological dictionary of the Uralic languages (1986a), rejected the idea of a genetic relationship between Uralic and Indo-European, arguing that the lexical items shared by Uralic and Indo-European were due to borrowing from Indo-European into Proto-Uralic (1986b).

Perhaps the best-known critique of recent times is that of Jorma Koivulehto, issued in a series of carefully formulated articles. Koivulehto's central contention, agreeing with Rédei's views, is that all of the lexical items claimed to be Indo-Uralic can be explained as loans from Indo-European into Uralic (see below for examples).

The linguists Christian Carpelan, Asko Parpola and Petteri Koskikallio suggest that Indo-European and Uralic were in contact at an early date and that any similarities between them can be explained by this contact and the resulting borrowings.

According to Angela Marcantonio (2014) and Johan Schalin, a genetic relationship between Uralic and Indo-European is very unlikely, and almost all of the similarities can be explained as borrowings or chance resemblances. Marcantonio argued that the fundamental typological differences between Uralic and Indo-European are so great that a relationship is unlikely.

In 2022, a group of scholars concluded that Proto-Uralic and Proto-Indo-European do not share a genealogical relationship with each other, as "whether based on cognacy or loans the argument from lexical resemblances is flawed". According to them, "Uralic is distinctive in western Eurasia. A number of typological properties are eastern-looking overall, fitting comfortably into northeast Asia, Siberia, or the North Pacific Rim". In their view, previously proposed cognates can largely be explained as borrowings from Indo-Iranian languages. Regarding the Indo-Uralic hypothesis, they noted that "of what we take to be the two statistically soundest recent quantitative tests, Kessler and Lehtonen (2006), using a 100-item Swadesh-like wordlist, found no evidence for Indo-Uralic".

Morphological
The most common arguments in favour of a relationship between Indo-European and Uralic are based on seemingly common elements of morphology, such as the pronominal roots (*m- for first person; *t- for second person; *i- for third person), case markings (accusative *-m; ablative/partitive *-ta), interrogative/relative pronouns (*kʷ- "who?, which?"; *y- "who, which" to signal relative clauses) and a common SOV word order. Other, less obvious correspondences are suggested, such as the Indo-European plural marker *-es (or *-s in the accusative plural) and its Uralic counterpart *-t. This same word-final assibilation of *-t to *-s may also be present in Indo-European second-person singular *-s in comparison with Uralic second-person singular *-t. Compare, within Indo-European itself, *-s second-person singular injunctive, *-si second-person singular present indicative, *-tHa second-person singular perfect, *-te second-person plural present indicative, *tu "you" (singular) nominative, *tei "to you" (singular) enclitic pronoun. These forms suggest that the underlying second-person marker in Indo-European may be *t and that the *u found in forms such as *tu was originally an affixal particle or merely analogical. A marginal Indo-European locative *-en is closest in function to the Uralic locative *-na and closest in form to the Uralic genitive *-n, which has inspired suggestions of a single Indo-Uralic *n-case with later development into multiple case forms in both families (Pedersen 1933).

Similarities have long been noted between the verb conjugation systems of Uralic languages (e.g. that of Finnish) and Indo-European languages (e.g. those of Latin, Russian, and Lithuanian). Although it is not uncommon for a language to borrow heavily from the vocabulary of another language (as in the cases of English from French, Persian from Arabic, and Korean from Chinese), it would be extremely unusual for a language to borrow its basic system of verb conjugation from another. Supporters of Indo-Uralic have thus argued, for example, that Finnish verb conjugations and pronouns resemble their Indo-European counterparts much more closely than would be expected by chance and, since the borrowing of basic grammar is rare, that this points to a common origin with Indo-European. (Finnish is preferred for this argument over Saami or Hungarian because it seems to be more conservative, i.e. to have diverged less than the others have from Proto-Uralic. Even so, similar suspicious parallels have been noted between Hungarian and Armenian verb conjugation.) However, the strongly divergent sound systems of Proto-Indo-European and Proto-Uralic are a complicating factor in both the morphological and the lexical realm, making it all the more difficult to judge resemblances and to interpret them as borrowings, possible cognates or chance resemblances.

Lexical
A second type of evidence advanced in favor of an Indo-Uralic family is lexical. Numerous words in Indo-European and Uralic resemble each other (see list below). The problem is to distinguish between cognates and borrowings. Uralic languages have been in contact with a succession of Indo-European languages for millennia. As a result, many words have been borrowed between them, most often from Indo-European languages into Uralic ones.

An example of a Uralic word that cannot be original is Finno-Ugric *śata "hundred". The Proto-Indo-European form of this word was *ḱm̥tóm (compare Latin centum), which became *ćatám in early Indo-Iranian (reanalyzed as the neuter nominative–accusative singular of an a-stem > Sanskrit śatá-, Avestan sata-). This is evidence that the word was borrowed into Finno-Ugric from Indo-Iranian or Indo-Aryan. This borrowing may have occurred in the region north of the Pontic–Caspian steppes around 2100–1800 BCE, the approximate floruit of Indo-Iranian (Anthony 2007:371–411). It provides linguistic evidence for the geographical location of these languages around that time, agreeing with archeological evidence that Indo-European speakers were present in the Pontic–Caspian steppes by around 4500 BCE (the Kurgan hypothesis) and that Uralic speakers may have been established in the Pit-Comb Ware culture to their north in the fifth millennium BCE (Carpelan & Parpola 2001:79).

Another ancient borrowing is Finno-Ugric *porćas "piglet". This word corresponds closely in form to the Proto-Indo-European word reconstructed as *porḱos, attested by such forms as Latin porcus "hog", Old English fearh (> English farrow "young pig"), Lithuanian par̃šas "piglet, castrated boar", and Saka pāsa (< *pārsa) "pig". In the Indo-European word, *-os (> Finno-Ugric *-as) is a masculine nominative singular ending, but it is quite meaningless in Uralic languages. This shows that the whole word was borrowed as a unit and is not part of the original Uralic vocabulary.

One of the most famous borrowings is the Finnish word kuningas "king" (< Proto-Finnic *kuningas), which was borrowed from Proto-Germanic *kuningaz. Finnish has been very conservative in retaining the basic structure of the borrowed word, nearly preserving the nominative singular case marker reconstructed for Proto-Germanic masculine a-stems. Furthermore, the Proto-Germanic *-az ending corresponds exactly to the *-os ending reconstructable for Proto-Indo-European masculine o-stems.

Thus, *śata cannot be Indo-Uralic on account of its phonology, while *porćas and *kuningas cannot be Indo-Uralic on account of their morphology.

Such words as those for "hundred", "pig", and "king" have something in common: they represent "cultural vocabulary" as opposed to "basic vocabulary". They are likely to have been acquired along with a novel number system and the domestic pig from Indo-Europeans in the south. Similarly, the Indo-Europeans themselves had acquired such words and cultural items from peoples to their south or west, including possibly their words for "ox", *gʷou- (compare English cow) and "grain", *bʰars- (compare English barley). In contrast, basic vocabulary – words such as "me", "hand", "water", and "be" – is much less readily borrowed between languages. If Indo-European and Uralic are genetically related, there should be agreements regarding basic vocabulary, with more agreements if they are closely related, fewer if they are less closely related.

Advocates of a genetic relation between Indo-European and Uralic maintain that the borrowings can be filtered out by application of phonological and morphological analysis and that a core of vocabulary common to Indo-European and Uralic remains. As examples they advance such comparisons as Proto-Uralic *weti- (or *wete-) : Proto-Indo-European *wódr̥, oblique stem *wedén-, both meaning 'water', and Proto-Uralic *nimi- (or *nime-) : Proto-Indo-European *h₁nómn̥, both meaning 'name'. In contrast to *śata and *kuningas, the phonology of these words shows no sound changes from Indo-European daughter languages such as Indo-Iranian. In contrast to *kuningas and *porćas, they show no morphological affixes from Indo-European that are absent in Uralic. According to advocates of the Indo-Uralic hypothesis, the resulting core of common vocabulary can only be explained by the hypothesis of common origin.

Objections to this interpretation
It has been countered that nothing prevents this common vocabulary from having been borrowed from Proto-Indo-European into Proto-Uralic.

For the old loans, as well as uncontroversial ones from Proto-Baltic and Proto-Germanic, it is more the rule than the exception that only the stem is borrowed, without any case endings. Proto-Uralic *nimi- has been explained according to sound laws governing substitutions in borrowings (Koivulehto 1999), on the assumption that the original was a zero-grade oblique stem PIE *(H)nmen-, as attested in later Balto-Slavic *inmen- and Proto-Celtic *anmen-. Proto-Uralic *weti- could be a loan from the PIE oblique e-grade form for 'water' or from an indirectly attested cognate root noun *wed-. Proto-Uralic *toHį- 'give' and PFU *wetä- 'lead' also make perfect phonological sense as borrowings.

The number systems of Indo-European and Uralic show no commonalities. Moreover, while the numbers in all Indo-European languages can be traced back to reconstructed Proto-Indo-European numbers, this cannot be done for the Uralic numbers, where only "two" and "five" are common to all of the family (roots for 3-6 are common to all subgroups other than Samoyedic, and slightly less widespread roots are known for 1 and 10). This would appear to show that if Proto-Indo-European and Proto-Uralic are related, the connection must lie so far back that the families developed their number systems independently and did not inherit them from their purported common ancestor. On the other hand, the fact that the Uralic languages themselves do not share the same numbers across all branches indicates that they would not be expected to share them with Indo-European languages in any case, even if the two families were in fact related.

It is also objected that some or all of the common vocabulary items claimed are false cognates – words whose resemblance is merely coincidental, like English bad and Persian بد (bad).

Some possible cognates
1 Some researchers have interpreted Proto-Uralic *wete as a borrowing from Indo-European that may have replaced a native Proto-Uralic synonym *śäčä everywhere but in some of the northern fringes of the family (most prominently Proto-Samic *čācē).

2 This word belongs to the r and n stems, a small group of neuter nouns, from an archaic stratum of Indo-European, that alternate -er (or -or) in the nominative and accusative with -en in the other cases. Some languages have leveled the paradigm to one or the other, e.g. English to the r, Old Norse to the n form.

3 Indo-Europeanists are divided on whether to reconstruct this word as *nom(e)n- or as *h₁nom(e)n-, with a preceding "laryngeal". See Delamarre 2003:50 for a summary of views, with references. The o timbre of the root is assured by, among others, Greek ónoma and Latin nōmen (with secondary vowel lengthening). As roots with inherent o are uncommon in Indo-European, most roots having e as their vowel, the underlying root is probably *nem-. The -(e)n is an affixal particle. Whether the e placed in parentheses is inherently part of the word is disputed but probable.

4 The ḷ in Indo-European *pḷlu- represents a vocalic l, a sound found in English in for instance little, where it corresponds to the -le, and metal, where it corresponds to the -al. An earlier form of the Indo-European word was probably *pelu-.

The following resemblance sets are from Aikio (2019).