Talk:Linkage (linguistics)

not universally applicable
Note that linkages as conceived by François presume the existence of a separated set of dialects already "before" diversification. It is a model of linguistic innovations, not of the formation of language communities. Several tree models however actually claim to model the emergence of linguistic communities as well, and they cannot be modelled as linkages. -- Trɔpʏliʊm • blah 17:45, 4 November 2014 (UTC)
 * Hi Tropylium,
 * Thanks for your useful edits.
 * Regarding your question: You raise an interesting point. However, I don't think it is true that “François presumes the existence of a separated set of dialects already "before" diversification”. In fact his model can start from a single (proto) language and account for the formation of dialects: this is because dialects emerge through (the spread of) local innovations, and these are precisely what his model represents. So, the linkage model would handle well, for example, a situation like Latin (or Proto-Romance…) spreading over Western Europe, and from there, breaking up into a number of dialects…; you don't need the tree model for that (a tree would be unhelpful anyway for handling such a continuum). I would agree with you, though, that François’ model presumes, not the existence of separate dialects, but at least the geographical spread of a population over a territory. That is, he assumes that one could start from a set of (potential) dialects A, B, C, D, E, F already spread out in space, but these dialects do not have to be differentiated (i.e. they could be exactly identical at time zero) for the model to work. Ross (in Lynch, Ross & Crowley) also makes a similar point: I'll add the reference to the entry now, and I'll try to clarify what I think I understood from François' approach. Best, — Womtelo (talk) 19:01, 4 November 2014 (UTC).
 * "Geographical spread" is exactly what I mean by "dialects before diversification", yes. The idea of these being treatable as "identical dialects" is a key assumption of the model. I don't think it is really universally accepted, though. -- Trɔpʏliʊm • blah 10:01, 5 November 2014 (UTC)
 * Hi Tropylium, I don't see it as a problem. It is a universal property of all languages to be geographically spread to some extent, right? As soon as you have several speakers, you have several idiolects, and therefore potentially several dialects. So, while the spread of Latin throughout W. Europe is a good example, the linkage model does not require such a scope: it also works well with Latin being spoken just in Rome. As soon as you have one language diversifying into different varieties (either regional or individual, or in fact social too), then the linkage model applies. Mind you, the tree model also has the same requirement, but it hides it behind the illusion of a single "node", with the misleading impression that a change can happen "once" in that node; but in fact, any node in a tree refers to a language community, that is, a social network made up of idiolects. So, I can't see how the linkage model lacks any universal relevance, or that some situations would require a tree; on the contrary (if I have understood Ross and François correctly), it looks like it is way more universally applicable than trees, since it applies to any language situation. (So, I don't agree with the title of the section here.) Maybe I'm wrong, but I can't think of a language situation where the model wouldn't apply, whereas I can think of dozens where the tree assumptions don't work. Finally, the linkage model (or wave model) should also apply to the differentiation into sociolects: it doesn't seem that the authors (either Ross or François) have written specifically about this topic, but this can be easily derived from their model, since it is based on "lects" rather than "languages". — Womtelo (talk) 10:16, 5 November 2014 (UTC).
 * What the linkage model doesn't cover, while the tree model in some versions does, is the creation of new topolects entirely. There was never a Proto-Germanic dialect of London or Tromsø or Salzburg, and the origin of the current speech varieties of these cities cannot be solely modelled by starting with a community of Proto-Germanic dialects and allowing them to undergo innovations. And while the tree model is certainly prone to the impression of changes only occurring in nodes, no detailed description of the model has ever made such a claim.
 * The linkage model probably does have universal relevance (insofar as all languages might well be part of at least one linkage); what I am questioning is the claim that it universally supersedes the tree model, which is not NPOV. None of the sources I've looked at seem to claim quite that much either. Also note that the job of defining "the assumptions of the tree model" should not be surrendered to the papers that argue against it. There are many different versions of "the" tree model, of which some might be strictly worse than the linkage model; but I figure some are not.
 * For my part, I fail to see any situation where all possible tree models "fail to work": there are a great many where they fail to provide much useful information, but none where they would be categorically invalid. The examples discussed in the linkage model papers all begin from a set of "undifferentiated dialects". If pushed to account for such scenarios, some tree models would probably spit out the result that the "true" tree is the order in which these communities were formed in the first place. Others might argue for a chronologically privileged order of diversification, denying the geneticality of any isogloss that ends up crossing another. Etc. -- Trɔpʏliʊm • blah 11:35, 5 November 2014 (UTC)

“There was never a Proto-Germanic dialect of London or Tromsø or Salzburg, and the origin of the current speech varieties of these cities cannot be solely modelled by starting with a community of Proto-Germanic dialects and allowing them to undergo innovations”: excellent point, I agree with you. But this does not necessarily rule out the linkage model per se; rather, it could be an argument for proposing a multilayered linkage, with layers corresponding to various time periods. So you have innovations specific to Latin with respect to PIE (by contrast with Sabellic etc), all of which will be shared by all Romance languages; then you have innovations found among Romance languages after the spread of Latin; then you have innovations of Mexican Spanish that happened in Mexico after colonisation, etc. In all cases, you'll need a linkage-like model to represent innovations: that's because you can expect them to intersect, since that is exactly what innovations always do by default (including the genealogical ones), in ways which a tree would be unable to capture. Otherwise we would long ago have arrived at a consensual tree for the Romance languages, which are perhaps the best known language family in the world, yet the tree model here clearly fails to work (or is "categorically invalid" to use your terms). However, you are correct in saying that Proto-Romance was never spoken in Mexico (to extrapolate from your Germanic example). So what we need is an application of the linkage or wave model that can take different periods into account in the representation. It might well look like a tree, which is not a problem, as long as that tree is redesigned to accommodate the reality of intersecting genealogical subgroups, which current trees can't do. I think that should be ultimately feasible; Ross tried this in his 1988 representation, but in a rudimentary way (using double lines, if you see what I mean).

One important point that may have to be made is that the lects or points in a linkage are not necessarily tied to geography, and must be understood as more abstract. So, one could compare the modern Germanic varieties spoken in London, Manchester, Amsterdam, Tromsø, Frankfurt, and Salzburg (L M A T F S). The comparative method could be used to identify what innovations are shared with respect to an ancestral reconstructed state, perhaps PIE: among others, Grimm's law, Verner's law, the High German consonant shift, loss of case marking on nouns, etc. Among these innovations, some will be restricted to only {LM}, others to {LMA}, others to {AFS}; and finally, some will cover the whole set {LMATFS}. Now if one takes linkage maps literally, with each dialect being tied to a specific place, then indeed it seems that a claim is made that, say, "Grimm's law" was an innovation that swept through all Germanic dialects, "from Salzburg to Manchester" :-/ I grant you that this interpretation would be awkward, so I see exactly what your objection is. But things get better if you take each dialect as more abstract, i.e. “a modern lect spoken today at location X, carrying with it all the baggage it inherited from earlier forms of that lect anywhere in the world”. In such a way, innovations shared by {LM} belong to the shared ancestor of that {LM} subgroup, with no claim as to (1) whether they took place recently in situ (spreading between London and Manchester), or (2) whether they happened in Jutland before the invasion of Britain. Crucially, this agnosticism is a strength of the model, because for many non-written languages, there is no way to decide between the two situations (unlike English). The tree model favours interpretation #2 by default, and does not handle well cases of in situ diffusion.
As for the linkage model, it is true that it seems to favour configuration #1 by default, as you have been claiming, but I believe that this is not necessarily the case if you take linkage points as more abstract (=non-specific w.r.t. geographical location for earlier stages).

So the point is, if a linkage diagram shows that {LM} share some innovations, it should not necessarily be interpreted as saying that L and M had to already exist separately at the time of that spread, and that there was diffusion from one to the other; it can perfectly well correspond to a case of innovations happening at an ancestral stage, including before L and M even existed as separate dialects. (This, I guess, answers the objection you've been raising, that the linkage requires separate dialects to begin with; I don't think it does.)

Finally, assuming that there are indeed some isoglosses that encompass {LMA} and others {AFS}, then you need the linkage / wave model here, because you have an intersection; none of the tree models I know of is capable of dealing with that configuration, in spite of its simplicity and obvious relevance to the description of dialectological / linguistic data. Of course, as you mentioned yourself in your last paragraph, there is always the good old option of dismissing all cases of intersection, by claiming these must be areal / contact-induced rather than genealogical; but that would be a facile and arbitrary explanation, precisely the one which proponents of linkages have made a point of debunking (in a way which I've found convincing).

All that said, part of the above is my interpretation, or extrapolation from my readings, and from my understanding of how the model works. I agree that the authors don't make these claims explicit enough, and sometimes give the impression (without claiming it explicitly) that the (potential) dialects have to be sort of tied to specific locations for the model to work; hopefully they could make these points clearer in future publications (assuming they ever read this discussion…). Thanks for raising those points, they've made me think more about these issues, and made me formulate my interpretation better. I hope I've been clear enough -- and sorry if this was too long. — Womtelo (talk) 13:25, 5 November 2014 (UTC).
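
As an illustration of the {LMA} / {AFS} point above: a family of subgroups can be drawn as a single tree only if every pair of subgroups is either disjoint or nested. The following is a minimal Python sketch of that check. It is not taken from Ross or François; the function names `crossing_pairs` and `is_tree_compatible` are made up for this illustration, and the subgroups are the hypothetical ones from the L M A T F S example, not real isogloss data.

```python
from itertools import combinations


def crossing_pairs(subgroups):
    """Return the pairs of subgroups that overlap without one containing the other."""
    sets = [frozenset(group) for group in subgroups]
    return [
        (a, b)
        for a, b in combinations(sets, 2)
        # a & b is non-empty (they share a lect) and neither is a subset of the other
        if a & b and not (a <= b or b <= a)
    ]


def is_tree_compatible(subgroups):
    """True if the subgroups are pairwise disjoint or nested, i.e. fit one tree."""
    return not crossing_pairs(subgroups)


# Hypothetical innovation-defined subgroups from the example above
# (L = London, M = Manchester, A = Amsterdam, T = Tromsø, F = Frankfurt, S = Salzburg).
germanic = ["LM", "LMA", "AFS", "LMATFS"]

print(is_tree_compatible(germanic))
for a, b in crossing_pairs(germanic):
    print(sorted(a), "crosses", sorted(b))
```

Without {AFS}, the remaining subgroups are nested inside one another and a tree can represent them; with it, {LMA} and {AFS} share only A, so no single tree can show both as subgroups. Whether such a crossing is then read as genealogical or as contact is exactly the question under discussion, and the check itself takes no position on it.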

The need for less partisan sources
Again, the main linkage theory papers are not unbiased sources for claims on what the tree model is and is not. They consistently favor interpretations that are easier to attack.

The {BCD} {CDE} {DEF} example is an excellent example of a situation where the linkage theory claims to supersede the tree model via a subtle redefinition of terminology. They would argue that, at time zero, we have "proto-BCD", "proto-CDE" and "proto-DEF", and that each one of them is an ancestor of D. However, "descent" from an "ancestor" does not mean "wherever a particular feature originated", but instead "what was the transmissional origin of this language". English borrowing "they" or "sky" or "die" does not make English 0.1% descended from Old Norse; it remains 100% descended from Anglo-Saxon, no matter how many features of Norse it were to absorb, given that the two had already acquired distinct existence in Proto-Northwest Germanic times. Similarly D in this scenario does not descend at all from B, C, E, or F, nor from any set-theoretically defined entity: it descends only from the original D.

The implicit assumption being smuggled in is that a proto-language was a (well-defined or not) set of language varieties (even: a Dachsprache), rather than a single language variety. Some conceptions of what a proto-language is do claim this (and might hold e.g. that the proto-language of Modern English varieties was Middle English, since all English varieties have shared innovations from this era) — but not all do (and might hold e.g. that Proto-English was the Germanic language variety that first set foot on Britain, since each British dialect it has spawned has evolved largely in parallel).

Another such case is the term "shared innovation". The tree model reserves this solely for synapomorphies that occurred in a common ancestor. The linkage model extends this (and freely admits that it is doing so) to innovations that occurred across a dialect continuum. It is however a basic fact of cladistics that subgroups defined by synapomorphies cannot ever possibly intersect; if two innovations intersect, then at least one of them is not a synapomorphy. So "genealogical" subgroups may well intersect; yet "genetic" subgroups continue not to, and the two concepts cannot be equated.

Third example: the idea that tree-model subgroups require discontiguity (not only distinctness) from one another is definitely a non-standard idea in linguistic phylogeny as far as I am aware (granted, it's common enough in biology), and framing it as a core position of the tree model is again not NPOV. -- Trɔpʏliʊm • blah 16:22, 6 November 2014 (UTC)
 * Hi Tropylium. I think some of your points are insightful, but there remain problems. Thus, your example of Old Norse is totally off topic: recall that the requirement for belonging in a dialect continuum is that the interacting dialects should be mutually intelligible; but you don't think that Old Norse and Proto-English were part of the same dialect network, do you? So, this is just a case of areal contact between already separate languages. You need to find a better example, i.e. one actually taken from a dialect continuum.
 * The general problem in your analysis is that you seem to adopt a top-down approach, where you first make a whole lot of assumptions, and then try and fit the data into them. For example, the assumption that there is such a thing as a "single language variety", which would not be a "set of language varieties". This sounds like a naïve approach, which any sociolinguist or field linguist would disagree with. So yes of course, any node in any tree is necessarily a set of language varieties, more precisely a set of mutually intelligible idiolects; this is the only way to define a "single language". Even Latin as spoken just within Rome was such a network of varieties.
 * A second fiction is the notion that a synapomorphy is a simple, atomistic "event" that takes place "once" in a language like a gene mutation. This is total fiction.  All linguistic innovations can only affect a language if they gradually diffuse across the social network that is the community of speakers. Depending on the innovation and on the sociolinguistics of the community, that process of diffusion may take just a few weeks, or several generations.
 * This still makes it possible to make the sorts of distinctions you're alluding to: for example, one should distinguish between (#1) a change affecting Latin when it was still restricted to the Latium area (=diffusion across the various lects of Latin speakers at that time), and (#2) a change affecting Romance dialects in Spain in the 8th century AD (=diffusion across the various lects of Romance speakers of that time and region). Be reassured, I am not mixing the two, nor confusing the time periods. But the thing that should be uncontroversial is that, in all cases, a linguistic innovation can only affect a language if it spreads across the set of language varieties (=idiolects) that compose it. This diffusion may affect the whole community, but quite often it affects only a subset of the social network. The important consequence of this point is that there is no such thing as a synapomorphy that would be a simple event affecting just a "single language"; instead, we're always dealing with diffusion across a network, whether we're dealing with a "dialect continuum" proper, or not. This is a key point which language phylogeneticists often forget, often because they do not do fieldwork and observe (or reflect on) how languages actually change in real life.
 * So now, to come back to your questions, the issue is: What linguistically realistic definition should we give of a synapomorphy? I say “linguistically realistic”, because we historical linguists must not start drawing a tree on the blackboard and comment on that model (top-down);  but rather, start from the real facts of language change and how they actually take place on the ground, and figure out how best we should model them. (I don't mind if we arrive at different conclusions, but at least we should start from language facts and go bottom-up towards defining a good model.)
 * Is my Latium example #1 above a case of synapomorphy? I guess quite consensually, yes, insofar as this change, whatever it is, affected the whole Latin area, and was later carried across by all Roman speakers when they invaded Western Europe. Now, what about example #2? What does it take for an innovation that took place in Spain to be a "synapomorphy" so as to qualify as "genetic" in your terms? [NB: in linguistics, "genealogical" is strictly synonymous with "genetic"]. If you're ever gonna try and fit "Spanish" in a tree of Romance languages (if you believe in such a thing) then you've gotta try somewhere, right? So if you want to end up with a node for "(Castilian) Spanish" you're going to need some "synapomorphies" there, to justify your node. Now I'd be curious, sincerely, about where you are going to find them. If you're a cladophile (tree advocate), you probably wish that all the innovations that define modern Spanish had happened somewhere on a small island (say, in the Balearics) where a small population, speaking a "single language variety" of Pre-Spanish, would have innovated enough to define Spanish (a bit like your Proto-English example); and then, only later, would that isolated population have migrated to mainland Spain, where it would have come into contact (areally) with other close languages like Portuguese, Galician or Catalan. That would be a nice scenario, which would make it easy to distinguish between "genetic" synapomorphies (=whatever happened on that island) and "areal" stuff (whatever came afterwards). But the big problem with this nice fairy tale (brought to us by the tree model) is that it fails to account for the actual historical facts of how languages evolve. There was never a single node for Spanish, or anything like the social isolation that would enable us to pick "genetic" innovations from "areal" ones.
The way Castilian was formed (and all other Romance languages for that matter) was through a long sequence of innovations that affected Latin in Iberia, during the first millennium AD. Some of these innovations encompassed Pre-Castilian with Pre-Galician (initially, Latin-as-it-was-spoken in these respective regions), others Pre-Castilian with Pre-Catalan, others Pre-Castilian and Pre-Portuguese, others just Pre-Castilian alone, and others again affected the whole of Western Romance. All those innovations (sound change, morphological change etc) are exactly of the same nature (linguistically speaking) as the ones that define a synapomorphy. So my question is: which ones here are we going to identify as the synapomorphies? My answer: they are all synapomorphies, but of different and overlapping subgroups. As long as the varieties remained mutually intelligible (=for several centuries in this case), any innovation that affected Pre-Castilian with Pre-Galician can legitimately be analysed as a synapomorphy of the Castilian–Galician subgroup; any innovation that affected Pre-Castilian with Pre-Catalan was a synapomorphy of the Castilian–Catalan subgroup; etc. Remember that the innovations I'm talking about happened roughly at the same period, and did not result in the loss of intelligibility; so it would be impossible (pace tree advocates) to decide that the "first" of these shared innovations defines the "real" genetic unit, but that the cross-cutting innovation that took place 20 years later was suddenly a case of "contact". Such terminology would be very unhelpful, ad hoc and unable to do justice to the facts. Michigan English today shares features with Ontario English, and others with North-east Cities English, and other features with Midwest English… Which ones are we going to label "genetic", and which ones "areal"? Do we have a principled way to draw the line?
And even if we had one, there is no guarantee that we would necessarily end up with a tree structure — why should we? Nothing in real life predisposes languages for diversifying in a cladistic way.
 * So Castilian is a bit like the "D" in the example you were discussing. Yes, the strict ancestor of D is D, and just D; likewise the ancestor of Castilian Spanish is essentially Latin as it was spoken in the Castilian area (or anywhere in the world before it arrived there). However, the purpose of linguistic genetics is to reconstruct the structure of language families, right? so as to understand how Castilian came to be that way, and which other languages are closest to it.  So in this sense, if we're going to speak of shared ancestors, then D indeed has both {BCD} and {CDE} as its shared ancestors, just like Castilian shares an ancestor with Catalan, and it also shares an ancestor with Portuguese. (and earlier than that, all these languages shared another ancestor, namely Latin;  which itself was a member of the early Italo-Celtic linkage, etc).
 * So yes, "genetic subgroups" can, and do, intersect. I admit that these terms are here defined in a novel way which historical linguists are not used to; but I think this does better justice to the historical reality. And if we're really thinking about what goes on in the real life of speakers (rather than trying to save the assumptions of an abstract, largely arbitrary model), then these are the sort of theoretical refinements that we may find fruitful for improving our understanding of the world.
 * all the best, — Womtelo (talk) 19:08, 6 November 2014 (UTC).