Lumpers and splitters

Lumpers and splitters are opposing factions in any discipline that has to place individual examples into rigorously defined categories. The lumper–splitter problem arises whenever there is a desire to create classifications and assign examples to them, for example schools of literature or biological taxa. A "lumper" assigns examples broadly, judging that differences matter less than signature similarities. A "splitter" makes precise definitions and creates new categories to classify samples that differ in key ways.

Origin of the terms
The earliest known use of these terms was long thought to be by Charles Darwin, in an 1857 letter to Joseph Dalton Hooker: "It is good to have hair-splitters & lumpers." However, according to research by Glenn Branch, Deputy Director of the National Center for Science Education (NCSE), the credit is due to the naturalist Edward Newman, who wrote in 1845: "The time has arrived for discarding imaginary species, and the duty of doing this is as imperative as the admission of new ones when such are really discovered. The talents described under the respective names of 'hair-splitting' and 'lumping' are unquestionably yielding their power to the mightier power of Truth."

They were then introduced more widely by George G. Simpson in his 1945 work The Principles of Classification and a Classification of Mammals. As he put it: "... splitters make very small units – their critics say that if they can tell two animals apart, they place them in different genera ... and if they cannot tell them apart, they place them in different species. ... Lumpers make large units – their critics say that if a carnivore is neither a dog nor a bear, they call it a cat."

A later use can be found in the title of a 1969 paper "On lumpers and splitters ..." by the medical geneticist Victor McKusick.

The terms appeared in the humanities in a 1975 debate between J. H. Hexter and Christopher Hill in the Times Literary Supplement. It followed Hexter's detailed review of Hill's book Change and Continuity in Seventeenth Century England, in which Hill developed Max Weber's argument that the rise of capitalism was facilitated by Calvinist Puritanism. Hexter objected to Hill's "mining" of sources for evidence that supported his theories, arguing that Hill plucked quotations from sources in a way that distorted their meaning. Hexter attributed this to a mental habit he called "lumping". According to him, lumpers rejected differences and chose to emphasize similarities, dismissing any evidence that did not fit their arguments as aberrant. Splitters, by contrast, emphasized differences and resisted simple schemes. While lumpers consistently tried to create coherent patterns, splitters preferred incoherent complexity.

Biology
The categorization and naming of a particular species should be regarded as a hypothesis about the evolutionary relationships and distinguishability of that group of organisms. As further information becomes available, the hypothesis may be confirmed or refuted. Sometimes, especially in the past when communication was more difficult, taxonomists working in isolation gave two distinct names to organisms later identified as belonging to the same species. When two named species are agreed to be the same species, the older name is almost always retained and the newer one dropped, following a convention known as "priority of nomenclature". This form of lumping is technically called synonymization. Dividing a taxon into multiple, often new, taxa is called splitting. Taxonomists are often described as "lumpers" or "splitters" by their colleagues, depending on their personal approach to recognizing differences or commonalities between organisms.

For example, the number of genera used in Pteridophyte Phylogeny Group (PPG) I has proved controversial. PPG I uses 18 lycophyte and 319 fern genera. The earlier system put forward by Smith et al. (2006) had suggested a range of 274 to 312 genera for ferns alone. By contrast, the system of Christenhusz & Chase (2014) used 5 lycophyte and about 212 fern genera. The number of fern genera was further reduced to 207 in a subsequent publication. Defending PPG I, Schuettpelz et al. (2018) argue that the larger number of genera is a result of "the gradual accumulation of new collections and new data" and hence "a greater appreciation of fern diversity and [...] an improved ability to distinguish taxa". They also argue that the number of species per genus in the PPG I system is already higher than in other groups of organisms (about 33 species per genus for ferns, as opposed to about 22 species per genus for angiosperms), and that reducing the number of genera as Christenhusz and Chase propose yields an excessive figure of about 50 species per genus for ferns. In response, Christenhusz & Chase (2018) argue that excessive splitting of genera destabilizes the usage of names and will lead to greater instability in future, and that the highly split genera have few if any characters that can be used to recognize them, making identification difficult even at the generic level. They further argue that comparing numbers of species per genus in different groups is "fundamentally meaningless".
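The two species-per-genus figures quoted by Schuettpelz et al. are consistent with each other, as a rough calculation shows. It assumes the total number of fern species stays fixed; the species total below is only what the quoted PPG I ratio implies, not a separately reported count.

```latex
N_{\text{species}} \approx 33 \times 319 \approx 10\,500,
\qquad
\frac{N_{\text{species}}}{212} \approx \frac{10\,500}{212} \approx 50
```

In other words, the disagreement is not about the number of fern species but about how many genera they should be divided into.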

History
In history, lumpers are those who tend to create broad definitions that cover long periods of time and many disciplines, whereas splitters want to assign names to tightly interrelated groups. Lumping tends to create increasingly unwieldy definitions whose members have less and less in common. This can lead to definitions that are little more than conventions, or to groups that join fundamentally different examples. Splitting often leads to "distinctions without difference", ornate and fussy categories, and failure to see underlying similarities.

For example, in the arts, "Romantic" can refer specifically to a period of German poetry from roughly 1780 to 1810, a usage that would exclude the later work of Goethe, among other writers. In music it can mean every composer from Hummel through Rachmaninoff, plus many who came after.

Software modelling
Software engineering often proceeds by building models (sometimes known as model-driven architecture). A lumper is keen to generalize, and produces models with a small number of broadly defined objects. A splitter is reluctant to generalize, and produces models with a large number of narrowly defined objects. Conversion between the two styles is not necessarily symmetrical. For example, if error messages in two narrowly defined classes behave in the same way, the classes can be easily combined. But if some messages in a broad class behave differently, every object in the class must be examined before the class can be split. This illustrates the principle that "splits can be lumped more easily than lumps can be split".
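The asymmetry can be sketched in Python. This is a minimal illustration under stated assumptions, not a modelling framework: the `Message` class, its `behavior` tag, and the `lump` and `split` functions are hypothetical names chosen here. Lumping only joins existing groups, while splitting has to visit every member to decide where it belongs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A hypothetical error message; `behavior` describes how it is handled."""
    text: str
    behavior: str


def lump(*narrow_classes: list[Message]) -> list[Message]:
    """Lumping: concatenate several narrow classes into one broad class.

    No message is inspected; the member lists are simply joined.
    """
    merged: list[Message] = []
    for members in narrow_classes:
        merged.extend(members)
    return merged


def split(broad_class: list[Message]) -> dict[str, list[Message]]:
    """Splitting: every member must be examined to decide its new class."""
    groups: dict[str, list[Message]] = defaultdict(list)
    for message in broad_class:  # one inspection per object
        groups[message.behavior].append(message)
    return dict(groups)


if __name__ == "__main__":
    disk = [Message("Disk almost full", "log")]
    other = [Message("File not found", "log"), Message("Out of memory", "abort")]
    broad = lump(disk, other)
    print(split(broad))
```

The difference shows in the loops: `lump` never looks inside a message, whereas `split` must read the `behavior` of each one before the broad class can be divided.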

Language classification
There is no agreement among historical linguists about what amount of evidence is needed for two languages to be safely classified in the same language family. For this reason, many proposed language families have had lumper–splitter controversies, including Altaic, Pama–Nyungan, Nilo-Saharan, and most of the larger families of the Americas. At a completely different level, the splitting of a mutually intelligible dialect continuum into different languages, or lumping them into one, is also an issue that continually comes up, though the consensus in contemporary linguistics is that there is no completely objective way to settle the question.

Splitters regard the comparative method (meaning not comparison in general, but only reconstruction of a common ancestor or protolanguage) as the only valid proof of kinship, and consider genetic relatedness to be the question of interest. American linguists of recent decades tend to be splitters.

Lumpers are more willing to admit techniques such as mass lexical comparison, lexicostatistics, and mass typological comparison, and to tolerate uncertainty about whether relationships found by these methods result from linguistic divergence (descent from a common ancestor) or language convergence (borrowing). Much long-range comparative work has come from Russian linguists of the Moscow School of Comparative Linguistics, most notably Vladislav Illich-Svitych and Sergei Starostin. In the United States, the work of Joseph Greenberg and Merritt Ruhlen has met with little acceptance from linguists. Earlier American linguists such as Morris Swadesh and Edward Sapir also pursued large-scale classifications, for example Sapir's 1929 scheme for the Americas, which provoked controversy similar to that of today.
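Lexicostatistics scores a pair of languages by the percentage of items on a basic-vocabulary list whose words are judged cognate. The minimal Python sketch below assumes the cognate judgments have already been made; the example judgments and the cutoff values are hypothetical. The choice of cutoff is one place where lumping and splitting tendencies show: a lower cutoff groups more languages together.

```python
def shared_cognate_percentage(judgments: dict[str, bool]) -> float:
    """Return the percentage of compared meanings judged cognate.

    `judgments` maps each meaning on a basic-vocabulary list to True
    if the two languages' words for that meaning are judged cognate.
    """
    if not judgments:
        raise ValueError("no meanings were compared")
    cognate = sum(1 for is_cognate in judgments.values() if is_cognate)
    return 100.0 * cognate / len(judgments)


def classify(percentage: float, cutoff: float) -> str:
    """Group two languages if their score reaches a chosen (hypothetical) cutoff."""
    return "grouped" if percentage >= cutoff else "not grouped"


if __name__ == "__main__":
    # Hypothetical judgments for a five-item list; real lists are much longer.
    judgments = {"water": True, "fire": False, "hand": True, "two": False, "eye": False}
    score = shared_cognate_percentage(judgments)
    print(score, classify(score, cutoff=30.0), classify(score, cutoff=60.0))
```

The same score of 40% leads to opposite conclusions under the two cutoffs, which is why splitters do not accept such scores on their own as proof of kinship.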

Religious studies
Paul F. Bradshaw suggests that the same principles of lumping and splitting apply to the study of early Christian liturgy. Lumpers, who tend to predominate in this field, try to find a single line of successive texts from the apostolic age to the fourth century (and later). Splitters see many parallel and overlapping strands that intermingle and flow apart, so that there is no single coherent path in the development of liturgical texts. Liturgical texts should not be taken solely at face value, since they often carry hidden agendas.

The idea of a single Hindu religion is essentially a lumper's concept, sometimes also known as Smartism (on the basis of the Smārta synthesis). Hindu splitters, and many individual adherents, by contrast identify themselves as adherents of a religion such as Shaivism, Vaishnavism, or Shaktism, according to which deity they believe to be the supreme creator of the universe.

Various "holistic" approaches to religion can prioritise themes such as individual spirituality, the New-Age-style essential oneness of multiple religious traditions, or religious fundamentalism.

Philosophy
Physicist and philosophy writer Freeman Dyson has suggested that one can broadly, if over-simplistically, divide "observers of the philosophical scene" into splitters and lumpers, roughly corresponding to materialists (who imagine the world as divided into atoms) and Platonists (who regard the world as made up of ideas).

Psychiatry
In psychiatry, "splitters" and "lumpers" take fundamentally different approaches to psychiatric diagnosis and classification. Splitters emphasize the heterogeneity within diagnostic categories and argue that this heterogeneity justifies dividing them further. Lumpers, on the other hand, point to the similarities between diagnostic categories and suggest that these similarities justify the creation of broader entities. Thus lumpers might see "stress" where splitters could identify (say) worry, grief, or some form of anxiety disorder.

Neuroscience
In neuroscience, "uncertainty aversion" and "uncertainty tolerance" in semantic representations appear to correspond to "splitters" and "lumpers" respectively. As neuroscientist Marc-Lluís Vives observes: "Our survival is possible because every day we make use of previously acquired categories to navigate the world. Every single mug we encounter is distinct, but fundamentally the same. Thanks to this powerful capacity to classify distinct stimuli under the same category, we can generalize our knowledge from the previously encountered subset of mugs to a future subset of mugs. However, this also posits a dilemma: Is a glass mug still a mug? That is, what are the defining principles that make something a 'mug'? Establishing this is fundamental since it also affects its relationship with its close-neighbors. Conceptualizing a mug as very different from a glass creates a more clear-cut mapping between the input—that is, the stimulus perceived—and the output that a person needs to generate—that is, the response, such as drinking coffee. Classical work in cognitive science demonstrates that the more similar two stimuli are, the harder it is to discriminate them and respond with different behavior."

Artificial intelligence and linguistics
Natural language processing, using algorithmic approaches such as Word2Vec, provides a way to quantify the overlap or distinction between the semantic categories of words. Because such models learn word representations from the contexts in which words occur, they give a sense of how often the contexts of two words overlap or differ in general usage.
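Word2Vec-style models represent each word as a vector, and the similarity of two words is commonly measured by the cosine of the angle between their vectors. The Python sketch below is illustrative only: the three-dimensional vectors are made up rather than produced by Word2Vec, and the idea of a lumper and a splitter as two similarity thresholds is an assumption made here to connect the measure to the mug example above.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means the same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real Word2Vec embeddings are learned from a
# corpus and typically have hundreds of dimensions.
VECTORS = {
    "mug": [0.9, 0.1, 0.3],
    "cup": [0.8, 0.2, 0.35],
    "glass": [0.5, 0.7, 0.2],
}


def same_category(w1: str, w2: str, threshold: float) -> bool:
    """Treat two words as one category if their similarity meets the threshold."""
    return cosine_similarity(VECTORS[w1], VECTORS[w2]) >= threshold


if __name__ == "__main__":
    for threshold in (0.6, 0.9):  # a looser "lumper" and a stricter "splitter"
        print(threshold, same_category("mug", "glass", threshold))
```

With these toy vectors, "mug" and "glass" score about 0.69, so the looser threshold lumps them together while the stricter one keeps them apart; "mug" and "cup" (about 0.99) fall in one category under either threshold.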