Talk:Lexical similarity

Code for Sardinian
This table is really interesting. Is there any source that we can add? Also, why is sard used as a code for Sardinian when srd exists as the proper ISO/DIS 639-3? GringoInChile 10:49, 31 January 2006 (UTC)


 * Other than Ethnologue, I couldn't find any reliable source. I'm sure there must be one; I just wasn't lucky.


 * The problem with Sardinian is that the numbers for the lexical similarity do not specify to which Sardinian dialect they refer. Ethnologue lists four such dialects, mentioning that they are quite distinct. The info about "Sardinian" similarity with other Romance languages comes from here:


 * Logudorese is quite different from other Sardinian varieties. Lexical similarity 68% with Standard Italian, 73% with Sassarese and Cagliare, 70% with Gallurese. 'Sardinian' has 85% lexical similarity with Italian, 80% with French, 78% with Portuguese, 76% with Spanish, 74% with Rumanian and Rheto-Romance.


 * As you can see, nothing clear. One more reason to find the original source of all these numbers. The other important reason is to find out exactly how the coefficients were calculated. — Adi Japan   ☎  11:41, 31 January 2006 (UTC)


 * Now I see your point about srd: it refers to "generic" Sardinian. I changed the article accordingly. Thanks. — Adi Japan   ☎  11:57, 31 January 2006 (UTC)

Question about English and Spanish
Does a dash on the table mean there is no lexical similarity between the languages? I can think of a couple dozen words between Spanish and English that are similar - things like "interesante" (interesting), "rata" (rat) and "tren" (train) - even one or two very common words like "no" (no) and "es" (is). Apparently I've misunderstood the concept of lexical similarity - why does the table report zero similarity? --203.206.56.201 10:56, 29 August 2006 (UTC)


 * The dash means "data not available". — Adi Japan   ☎  10:36, 30 September 2006 (UTC)

Dialects above 85%
Hey, where did this come from? By that rule, wouldn't French be a dialect of Italian? Domsta333 14:17, 27 September 2006 (UTC)


 * It came from this page at Ethnologue (lower half of the page): "Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language being compared." So it is a question of likelihood, not certainty. Besides, the concept of dialect doesn't have a clear-cut, purely linguistic definition. France and Italy both have armies, that's why they are not just dialects... — Adi Japan   ☎  10:36, 30 September 2006 (UTC)


 * A lexical similarity higher than 85% indicates that the speech variant is likely a dialect; this does not mean that two languages with a lexical similarity of 85% or more are dialects 100% of the time. French and Italian are each official languages of countries (which, by some very unscientific definitions of a dialect, can make them languages), and they have serious phonological differences that make mutual intelligibility difficult (at least in spoken form). A dialect continuum has been suggested to exist between French and Italian (speakers in Southern France can sometimes partially understand Italian, for example), but the standard forms in Paris and Rome are very different. — Preceding unsigned comment added by Brianc26 (talk • contribs) 03:59, 9 August 2012 (UTC)

Similarity?
The article does not explain how the "similarity" between two given words is defined. FilipeS 22:36, 9 May 2007 (UTC)


 * It is fairly straightforward: two words count as similar if they look or sound similar and have similar meanings, although some similar words might not look immediately similar without a background in historical linguistics. Here are some examples from German vs English (60% lexical similarity):

English:   German:

similar words:

the man           der Mann

to begin          beginnen

Winter            Winter

Psychology        Psychologie

(the) Probability die Probabilität*


 * A word for probability in German which is not a loanword is "die Wahrscheinlichkeit".

Similar, but not so obvious

old               alt

elder**           älter

thou**            du

the rain          der Regen

the water         das Wasser

to fly            fliegen

to sail           segeln

the gift          die Gabe**

and               und

to greet          begrüßen

the friend        der Freund/die Freundin (male/female)

friendship        die Freundschaft

the ship          das Schiff

the apple         der Apfel

the housemaid     das Hausmädchen

the cow           die Kuh

Bavaria           Bayern


** = archaic

Words which have changed in meaning over time and are now not lexically similar even though they are technically cognate with each other:

the little girl   das Mädchen (cognate with "maiden", which has the variant "maid" in modern English)

the dog           der Hund (cognate with "hound")

Dissimilar words:

to buy            kaufen

to sell           verkaufen

the seller        der Verkäufer/die Verkäuferin (male/female)

to breathe        atmen

the sale          die Einkaüfungsgelegenheit (as in a sale at a store) Brianc26 (talk) 04:30, 9 August 2012 (UTC)

Spanish and Italian
According to the table, Spanish has a lower lexical similarity with Italian than French does. I'd like to point out the importance of the spoken language, that is, of phonetic patterns. I'm Spanish and I have no statistical research to cite, but please note that the differences between the written and the spoken language are extremely important; I'd say that Spanish and Italian, as a whole, are more similar to each other than either is to French.--85.85.2.194 10:15, 27 May 2007 (UTC)

Catalan and French
In the table of lexical similarity values there is no value for Catalan-French. I'm a native Catalan speaker and I can understand some spoken words in French and a great many written ones. Some words are even identical or nearly so: for example, "D'acord" (OK) and the negation "pas" are identical, and "parlar" (cat) and "parler" (fra) are really similar, as is Italian "parlare".

So I think that there must be a mistake. Catalan and French have a similarity value. —The preceding unsigned comment was added by Special:Contributions/ (talk)


 * It's not a mistake; the hyphens in the table don't mean zero but "data not available". The table contains only data that have been published by Ethnologue.com. For Catalan there are several lexical similarity values, but none of them is with French, so even if there is a high lexical similarity between Catalan and French we don't know what number to put in the table. If you find any publication that indicates that degree of lexical similarity, please feel free to put it in the table. Don't forget to add your source in the reference section. — Adi Japan   ☎  03:43, 24 August 2007 (UTC)
 * You are wrong! "d'accord" has two c's in French. --2.245.92.86 (talk) 16:06, 19 March 2014 (UTC)

Reliability and comparability
Regarding the table used here, I'd like to make two points.
 * The table is very unbalanced at the moment, consisting of eight Romance languages, only two Germanic languages and a single Slavic language. I would suggest removing some of the smaller Romance languages, perhaps Sardu and Romansh, and adding at least a few more Germanic languages (Dutch and Swedish) and Slavic ones (Polish, Czech, Croatian).
 * I know I'm stepping on some toes now, but Ethnologue is not a reliable source. Apart from usually getting its facts wrong, at least in Europe, it is at best a secondary source. It would increase the reliability of the table quite a lot if primary sources were used. JdeJ (talk) 20:55, 5 August 2008 (UTC)
 * I can't understand the table: it says that Italian and Portuguese have a coefficient of similarity of zero (that is how I interpreted the symbol "-"). Since they are both Romance languages, that doesn't make any sense! --Vipera (talk) 22:00, 16 February 2009 (UTC) I've just realized what the dash means. --Vipera (talk) 09:12, 18 February 2009 (UTC)

This article is filled with poppycock
First I'd remind all editors that Ethnologue is not an infallible source. Indeed, it is most fallible. It is not a reliable source of information beyond the most general, i.e. that which they cannot possibly get wrong. 60% lexical similarity between English and German? I find that an incredible thing to believe given English contains only 24% Germanic vocabulary, and within this 24%, I would venture to guess that similarities with equivalent German words are not near the 100% threshold either. Utter spurious rubbish. — Preceding unsigned comment added by 203.59.250.129 (talk) 15:17, 9 June 2012 (UTC)

English compared to German and French
The Ethnologue figures seem to be way off the mark, especially when comparing English to French and German, which are given two widely differing coefficients. In English, if you write any formal or business letter, or any text relating to culture or law, the lexical similarity with French is likely to be very high, and that with German will be much lower. The inverse is true when writing informal texts about everyday life, and with function words. The reason for this is likely cultural. Modern English was born out of the Hundred Years' War with France, when there was a need for the Norman ruling classes, who spoke French, to talk with the peasantry, who spoke early English, which was Nordic Germanic. Look at the Lord's Prayer written in Old English, Shetland Norn and Icelandic: they are all pretty similar. Strange how the Brythonic languages had very little influence on English.

Look at these animals and meats and their cognates

Cow, Beef - Kuh, Rindfleisch (D) - Vache, Bœuf (FR)
Calf, Veal - Kalb, Kalbfleisch (D) - Veau, Veau (FR)
Swine (Pig), Pork - Schwein, Schweinefleisch (D) - Cochon, Porc (FR)
Sheep, Mutton - Schaf, Hammelfleisch (D) - Mouton, Mouton (FR)
Lamb, Agnew (archaic) - Lamm, Lammfleisch (D) - Agneau, Agneau (FR)

What is interesting is that the animals are cognate with the German equivalents (what the peasants called them), and the meats are cognate with the French equivalents (what the cooks and the aristocrats saw).

These differences persist today in terms of class attitudes, and in what is perceived to be formal and proper versus informal and vulgar. French-based language is largely seen as more formal. — Preceding unsigned comment added by Jroswald2001 (talk • contribs) 00:25, 28 November 2012 (UTC)

English and French
It's impossible to accept that, according to Ethnologue, German is closer to French than English is. Even for basic words, they must have missed cognates that come directly from French or Latin. In the previous and current sentences ("sentence" is derived from French) there are 18 words originating directly from French, and 24 directly from German. So the 27% and 60% similarities to French and German, respectively, are questionable too.

E.g.
EN table - FR table - DE Tisch
EN rock - FR roc/rocher - DE Fels — Preceding unsigned comment added by 213.205.227.110 (talk) 14:31, 5 August 2013 (UTC)
EN piece - FR pièce - DE Stück — Preceding unsigned comment added by 147.188.184.226 (talk) 12:07, 2 August 2013 (UTC)
EN road - FR route - DE Straße
EN face - FR visage/face - DE Gesicht
EN to pay - FR payer - DE bezahlen
EN money - FR argent/monnaie - DE Geld
EN Sir - FR (Mon)sieur - DE Herr
EN Madam - FR Madame - DE Frau
EN carry - FR porter/charrier - DE tragen
EN air - FR air - DE Luft
EN power - FR pouvoir - DE Macht
EN fork - FR fourchette, fourche - DE Gabel
EN space - FR espace - DE Raum
EN attention - FR attention - DE Aufmerksamkeit/Achtung
EN to spend - FR dépenser - DE verbringen
EN fine - FR fin - DE fein (borrowed from French)
EN to close - FR fermer/clore - DE schließen
EN to count - FR compter - DE zählen
EN choice - FR choix - DE Wahl
EN city - FR ville/cité - DE Stadt

And many more basic words... Many more less basic but everyday words are more or less distant cognates, either directly from French or from Latin: realise (réaliser), train (from French traîner), screen (écran), castle (château, Old French castel), car (char, from charrier), create (créer), change (changer), language (langage), place (place), reduce (réduire), society (société), arm (arme), alarm (alarme, from "à l'arme"), alert (alerte, from "à l'erte"), country (contrée), cause (cause), case (cas) etc...

So in any case, even if German may be closer to English, albeit not at 60% vs 27% lexical similarity, English is definitely closer to French than German is, because a very significant part of English originates directly from French and Latin!

Bear in mind that Ethnologue is a journal which also puts the total number of native French speakers worldwide at 68.5 million, a figure which basically just comprises mainland France (65.8 million), French-speaking Belgium (Wallonia and Brussels, 4.5 million), and French-speaking Switzerland (2 million). The total (72.3 million) is however larger than the quoted figure (68.5 million), so Ethnologue assumes that 5% of the speakers of these regions are not native speakers. This figure also does not seem to include Quebec (8 million native speakers), nor Haiti, where it is a native language coexisting with the local French-derived dialect, Haitian Creole, nor any African country where it is a native language for many speakers, e.g. Algeria, Gabon, Côte d'Ivoire, Sénégal, etc.

The French national institute for statistics puts this figure at 128 million. — Preceding unsigned comment added by 2.24.0.89 (talk) 12:14, 19 July 2013 (UTC)

Ethnologue compares the basic or core vocabulary of the languages, not the entire vocabulary. Pronouns, irregular verbs, colours, etc. are nearly the same in English and German but not in English and French. Arndt1969 (talk) 19:48, 22 October 2021 (UTC)
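
For what it's worth, the speaker arithmetic quoted earlier in this section can be checked directly. The sketch below uses only the figures given in that comment (65.8, 4.5, 2 and 68.5 million); it says nothing about whether those figures are themselves right.

```latex
\begin{align*}
65.8 + 4.5 + 2.0 &= 72.3 \text{ million} \\
1 - \frac{68.5}{72.3} &\approx 0.053 \approx 5\%
\end{align*}
```

So the "5%" in the comment is a rounding of roughly 5.3%.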

Something wrong with the table
The Russian portion of the table is wrong. It reads differently horizontally and vertically (the two ought to be the same). And why are there 2 "1"s for Portuguese? Someone fix this table! 219.74.168.174 (talk) 20:17, 5 September 2013 (UTC)
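
Both problems reported here (a cell that differs from its mirror image, and a stray "1" off the diagonal) are the kind of thing a small script can catch before the table is saved. The following is only a sketch: the function names `asymmetric_cells` and `off_diagonal_ones` are made up for illustration, the numbers are placeholders and not Ethnologue data, and `None` stands for a "-" (data not available).

```python
# Hypothetical consistency check for a pairwise similarity table.
# All values are placeholders, NOT real Ethnologue figures.
# None represents "-" (data not available).

def asymmetric_cells(table):
    """List (a, b, value_ab, value_ba) wherever two mirrored cells differ."""
    langs = sorted(table)
    problems = []
    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            ab = table[a].get(b)
            ba = table[b].get(a)
            if ab != ba:
                problems.append((a, b, ab, ba))
    return problems


def off_diagonal_ones(table):
    """List (row, column) cells that hold 1 but compare two different languages."""
    return [(a, b)
            for a in sorted(table)
            for b in sorted(table[a])
            if a != b and table[a][b] == 1]


table = {
    "Portuguese": {"Portuguese": 1, "Russian": None, "Spanish": 1},   # stray 1
    "Russian":    {"Portuguese": 0.5, "Russian": 1, "Spanish": None}, # no mirror
    "Spanish":    {"Portuguese": 0.9, "Russian": None, "Spanish": 1},
}

print(asymmetric_cells(table))
print(off_diagonal_ones(table))
```

Storing only the upper triangle of the table would avoid the mirror problem altogether, at the cost of a less readable layout.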

English and French: New Research
The 27% lexical similarity between English and French is simply wrong, especially given that the figure is 29% between German and French.

Newer research puts this figure between 30% and 49%; the authors would have the exact figure.

Check out this page

http://elms.wordpress.com/

The data used for the research comes from

Шайкевич А. Я., Гипотезы о естественных классах и возможность количественной таксономии в лингвистике, в кн.: Гипотеза в современной лингвистике, М., 1980;

In any case, they clearly place English closer to French than German is.

They also say that 75% of the English vocabulary comes from French.


 * 46% here as well 178.221.85.71 (talk) 16:21, 26 December 2014 (UTC)

Calculation method
Can someone elaborate on how lexical similarity is calculated? Is a random sample of words taken, a fixed well-constructed list, or all words (including archaic and scientific ones)? How far can the meaning of cognates deviate? For example German Herberge "shelter, lodgings" and English harbour "shelter for ships, port" - would they be counted for or against lexical similarity? How far can the form of words deviate? For example Afrikaans "aand" has the same origin and meaning as English "evening", though only linguists will see this similarity. Morgengave (talk) 12:24, 27 October 2014 (UTC)


 * There are different ways. I don't think a random sample is ever used. I have read papers where the most frequent words were used. The other way is to compare whole dictionaries. As for the second part of the question, I think you cannot mix archaic words with modern ones. Archaic words have their use in etymology studies. You can use synonyms and contextual synonyms. 178.221.85.71 (talk) 16:19, 26 December 2014 (UTC)
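
To make the question concrete, here is a rough sketch of one possible calculation: take a fixed list of word pairs aligned by meaning and report the percentage whose forms are judged similar. This is an assumption for illustration, not Ethnologue's documented method. The similarity test is a crude spelling comparison using Python's standard `difflib.SequenceMatcher`, the 0.6 threshold is arbitrary, and the names `is_similar` and `lexical_similarity` are made up.

```python
from difflib import SequenceMatcher


def is_similar(word_a, word_b, threshold=0.6):
    """Crude stand-in for a linguist's judgement: compares spelling only."""
    ratio = SequenceMatcher(None, word_a.lower(), word_b.lower()).ratio()
    return ratio >= threshold


def lexical_similarity(pairs, threshold=0.6):
    """Percentage of meaning-aligned word pairs whose forms are judged similar."""
    if not pairs:
        raise ValueError("word list is empty")
    hits = sum(is_similar(a, b, threshold) for a, b in pairs)
    return 100.0 * hits / len(pairs)


# A tiny English/German list taken from the examples on this talk page.
# A real study would use a much longer, carefully chosen list.
sample = [
    ("man", "Mann"),
    ("winter", "Winter"),
    ("water", "Wasser"),
    ("rain", "Regen"),     # cognate, but the spelling test misses it
    ("old", "alt"),        # cognate, but the spelling test misses it
    ("buy", "kaufen"),
    ("dog", "Hund"),
    ("breathe", "atmen"),
]

print(lexical_similarity(sample))
```

On this list the spelling test accepts man/Mann, winter/Winter and water/Wasser but rejects rain/Regen and old/alt, which are genuine cognates. That is exactly the problem raised above with "aand" and "evening": the result depends heavily on which words are sampled and on who or what decides that two forms are "similar".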

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Lexical similarity. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20011005193846/http://www.ethnologue.com/web.asp to http://www.ethnologue.com/web.asp

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 07:24, 21 July 2016 (UTC)