Wikipedia:Reference desk/Archives/Science/2022 August 25

= August 25 =

== Guaranteed twins ==
How large would the population of humanity have to be to reach the point where the pigeon-hole principle says there is 100% certain to be a pair of genetically identical individuals who are nonetheless not related? Would it be 2 ^ 2.3 billion by any chance? Duomillia (talk) 21:37, 25 August 2022 (UTC)


 * It's an interesting question, and we could quibble over how you might tighten up some of the definitions in order to refine the question... but I think the first order of business is to ascertain how we could find a reliable reference for such a question. Which reputable and peer-reviewed bioinformatics research publications would be most likely to cover such hypotheticals?
 * Nimur (talk) 22:53, 25 August 2022 (UTC)
 * I'm not sure what is needed from a reference, other than the number of base pairs in the human genome. Our article says there are 3 054 815 472 base pairs, and each position can hold one of four different nucleotide bases, so there are $4^{3\,054\,815\,472}$ possible different human genomes. But most of these genomes won't result in a viable living human being. It is of course far beyond the realm of possibility for that many actual human beings to exist in this universe. CodeTalker (talk) 23:40, 25 August 2022 (UTC)
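To get a feel for the size of that count, here is a minimal Python sketch. It assumes, as the comment above does, that every position is an independent choice among four bases; the helper names `decimal_digits_of_power` and `pigeonhole_threshold` are made up for this illustration. The number itself is never built; only its length in decimal digits is estimated through logarithms.

```python
import math

BASE_PAIRS = 3_054_815_472   # base-pair count quoted in the thread
NUCLEOTIDES = 4              # A, C, G, T


def decimal_digits_of_power(base: int, exponent: int) -> int:
    """Estimate how many decimal digits base**exponent has, using logarithms.

    Floating-point logs are accurate enough here unless the true value
    sits extremely close to an exact power of ten.
    """
    return math.floor(exponent * math.log10(base)) + 1


def pigeonhole_threshold(distinct_outcomes: int) -> int:
    """Smallest population size that forces two members to share an outcome."""
    return distinct_outcomes + 1


# Toy case: "genomes" of only 3 positions give 4**3 = 64 possibilities,
# so 65 individuals are needed before a repeat is guaranteed.
toy_genomes = NUCLEOTIDES ** 3
toy_threshold = pigeonhole_threshold(toy_genomes)

# Full-size case: count digits only, since the number is far too large to print.
genome_digits = decimal_digits_of_power(NUCLEOTIDES, BASE_PAIRS)
guess_digits = decimal_digits_of_power(2, 2_300_000_000)  # the OP's 2 ^ 2.3 billion

print(f"toy threshold: {toy_threshold}")
print(f"4^{BASE_PAIRS} has about {genome_digits:,} decimal digits")
print(f"2^2,300,000,000 has about {guess_digits:,} decimal digits")
```

Under these assumptions the threshold is one more than the number of possible sequences, and the OP's guess is smaller than that by hundreds of millions of orders of magnitude. The sketch counts all sequences, viable or not, and does nothing about the "not related" condition.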


 * ... because there would be millions of digits in the number that describes the number of zeroes after the 1 when you calculate how many humans that would be, right? Duomillia (talk) 04:12, 26 August 2022 (UTC)
 * Assuming each human being weighs at least as much as a single neutron, the combined mass of that many human beings would exceed $6 \times 10^{1\,839\,182\,149}\ \text{kg}$, while the mass of the observable universe is estimated to be a mere $1.5 \times 10^{53}\ \text{kg}$. Of course, if the universe is infinite and homogeneous at all large scales – a possibility not excluded by current observations and accepted theory – all viable human genomes are present an infinite number of times in actual lifeforms living on planets indistinguishable from our Earth. --Lambiam 08:16, 26 August 2022 (UTC)
 * To use the number of possible human genomes to answer the original question, we must restrict it to viable genomes. The number of viable genomes is a vanishing fraction of the number $4^{3,054,815,472}$. The total number of genes may be 50,000, most of which do not code for proteins, which, however, does not imply they have no function. The number coding for proteins is about 20,000. Most of these genes exist in different variants, the simplest being that the gene is deleted in some individuals. It is not unreasonable to assume that if a gene variant can occur in a viable genome, it occurs somewhere, but thus far the combined human genome projects have sampled only a minute fraction of the human population. Another confounding issue is that two individuals may (in theory) have exactly the same sets of genes, but that they occur in a different order or on different chromosomes. For what I suppose is a lower bound, let us only consider the set of genes that code for proteins. If half of these can exist in only one version and the rest in two variants, we get some $2^{10,000}$ ≈ $2 \times 10^{3010}$ viable gene sets, enough to fill unimaginably many observable universes to the point where they collapse into black holes. --Lambiam 08:44, 26 August 2022 (UTC)
 * It says here that any two people differ by only 0.1% of their base pairs (so, 3,000,000). But most of this DNA variation is in DNA that doesn't affect the phenotype, because 99% of DNA doesn't code for anything. Furthermore, the majority of human genes (I seem to recall about 2/3) are essentially homozygous, because they code for things that are so important that any variants will be rapidly eliminated from the population. So that leaves about 10,000 differences between any two individuals. But we do not mate like salamanders all in the same pond, so all of us will have large stretches of DNA in common with some other people due to shared ancestry (and a history of massive inbreeding). The original question was not, "what are the odds that any one (non-twin) individual has a genetically identical person in the world?", but "what are the odds that there is one pair of identical non-twin individuals in the 8 billion individuals that constitute the human population?" The answer to the second question is that there are almost certainly a handful of such pairs around right now. Abductive  (reasoning) 10:27, 26 August 2022 (UTC)
 * No, there really isn't. The human population has about a 50-generation limit back to a full set of common ancestors (which is to say that it takes roughly 50 generations until we go back far enough for everyone to be related to everyone else) and that still means that there are $2^{10000}$ possible variations in that set of base pairs.  Certainly, very improbable things do happen from time to time, but we should not expect an event that is unlikely to happen more than once before the heat death of the universe to happen multiple times right now.  -- Jayron 32 12:12, 26 August 2022 (UTC)
 * So, what you're saying is that we all have the same common ancestor from 1250 years ago? And in that time we all got 10000 different mutations? Abductive  (reasoning) 19:45, 26 August 2022 (UTC)
 * Doubtless Jayron will clarify on his own account, but I think the scenario is not that everyone alive now is descended from one person alive then, but rather that the people alive then can be divided into two (rather large) groups (say A and B), and that everyone alive now is descended from everyone in group A and from no-one in group B. Population genetics can be counterintuitive at times.
 * On another point, I would note that even if two people have the same genome, they will differ in their epigenetic characteristics and their environmental influences, including having experienced different hormonal and other exposures in the womb, even if it was the same womb, as in identical twins, etc. I used to be acquainted with a set of fraternal triplets, two girls and a boy: the boy and one girl bore the usual resemblances of brother and sister; the other girl was quite unlike both of them, presumably because of her hormonal exposures differing from those experienced by her sister. {The poster formerly known as 87.81.230.195} 90.208.90.29 (talk) 20:22, 26 August 2022 (UTC)
 * But that is my point; people are divided into groups (such as ethnic groups) that are highly related, so the odds drastically increase. Also, we know that even identical twins will vary in a few SNPs, so nobody is truly genetically identical. But unless a mutation hits an important section of DNA (and these are vanishingly rare) that manifests in the phenotype, there are many sets of phenotypically identical "twins" not borne of the same mother in the world. Abductive  (reasoning) 20:50, 26 August 2022 (UTC)
 * The OP is not asking about the likelihood of the existence of a pair of genetically identical individuals, but about the number of humans required for the pigeonhole principle to guarantee its existence. --Lambiam 08:58, 27 August 2022 (UTC)
 * To apply the pigeonhole principle, we require that the number of nucleotides in a "human" is very precisely defined. I posit that this number is not well-defined.  While we're playing hypothetical games by imagining every plausible permutation of DNA, we're glossing over profound questions that necessarily emerge.  By way of demonstration: Is a Neanderthal a "human"?  If not, what number of nucleotides must be added, removed, or changed, until the hypothetical intermediary is a human?  In this hypothetical game, we're asked to consider every possible permutation.  (Again: review the article on the principle at issue here - we're discussing mathematical properties of finite and infinite sets, so we have to contemplate a literally infinite set of genetic sequences, intersected by an as-yet-unspecified criterion that defines "human-ness"...  like, if you can't see the problem, consider every possible thing that ever could evolve into a human, and every possible thing that could ever evolve from a human, and recognize that each of these is only a small perturbation from its immediate antecedent... )
 * It's equally absurd to ask us to apply the pigeonhole principle, say, to make assertions about the count of the number of grains of sand in a heap. That'd be a variant of the Sorites paradox.  How many heaps would you need in order to guarantee that two of them have the same number of granules, using the pigeonhole principle?  (Hint, this is not a combinatorics calculation - it's a semantic demonstration intended to illustrate the limitations of weakly-defined entities).
 * I think that's why I opened by asking where we might even find a reliable reference. To validate any answer, we should start by asking who the experts are, and how they would even frame the problem.  Otherwise, we can run in circles computing non-answers derived from numbers that are little more than trivia-factoids, and that's not moving the discussion toward a really scientific answer.
 * Nimur (talk) 06:22, 29 August 2022 (UTC)
 * The pigeonhole principle is invoked in several reliable sources for a demonstration that there are at least two persons in Paris/New York/Santa Barbara/... who have exactly the same number of hairs sprouting from their pates. The illustration of the PP by applying it to the number of hairs is undoubtedly due to an alleged early use by Pierre Nicole, as related by Charles Augustin Sainte-Beuve. He might as well have applied it to the number of grains in heaps of sand; there is nothing paradoxical about counting grains in a sand heap that is not at least equally paradoxical about counting hairs on someone's head. --Lambiam 08:33, 29 August 2022 (UTC)
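The hair-counting illustration can be tried out on a small scale. This is a minimal sketch, assuming a hypothetical cap of 1,000 hairs per head (real heads carry far more) so that the simulated town stays tiny; the names `pigeonhole_guarantees_pair` and `find_matching_pair` are invented for the example. Once the town has more people than there are possible hair counts, a matching pair turns up however the counts are distributed.

```python
import random


def pigeonhole_guarantees_pair(people: int, possible_counts: int) -> bool:
    """True when there are more people than distinct possible values."""
    return people > possible_counts


def find_matching_pair(counts):
    """Return indices (i, j) of two entries with the same value, or None."""
    first_seen = {}
    for index, value in enumerate(counts):
        if value in first_seen:
            return first_seen[value], index
        first_seen[value] = index
    return None


MAX_HAIRS = 1_000                  # hypothetical cap, chosen small for the demo
POSSIBLE_COUNTS = MAX_HAIRS + 1    # hair counts 0, 1, ..., MAX_HAIRS

# A town with one more inhabitant than there are possible hair counts.
rng = random.Random(0)             # fixed seed so the run is repeatable
town = [rng.randint(0, MAX_HAIRS) for _ in range(POSSIBLE_COUNTS + 1)]

pair = find_matching_pair(town)
print(pigeonhole_guarantees_pair(len(town), POSSIBLE_COUNTS))  # True
print(pair is not None)                                        # True
```

With exactly `POSSIBLE_COUNTS` inhabitants the guarantee disappears, since everyone could in principle have a different count. That is the distinction drawn earlier in the thread between a pair being likely and a pair being guaranteed.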


 * The OP might be thinking about this report: ←Baseball Bugs What's up, Doc? carrots→ 19:49, 26 August 2022 (UTC)
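The order-of-magnitude figures in Lambiam's replies earlier in this thread (the neutron-mass lower bound on the combined mass, and the number of gene sets from 10,000 two-variant genes) can be re-derived with logarithms. The sketch below rests on assumptions not stated in the thread: the neutron mass is taken as the CODATA value of about 1.675 × 10^-27 kg, and the mass of the observable universe as the commonly quoted 1.5 × 10^53 kg. The helper name `sci_notation_of_power` is made up for this example.

```python
import math

BASE_PAIRS = 3_054_815_472           # base-pair count quoted in the thread
NEUTRON_MASS_KG = 1.67492749804e-27  # assumed CODATA value, not from the thread
UNIVERSE_MASS_KG = 1.5e53            # assumed commonly quoted estimate


def sci_notation_of_power(base: float, exponent: int, scale: float = 1.0):
    """Return (mantissa, exp10) with scale * base**exponent ~ mantissa * 10**exp10.

    Works in log space, so the huge number itself is never constructed.
    """
    log10_value = exponent * math.log10(base) + math.log10(scale)
    exp10 = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exp10)
    return mantissa, exp10


# Lower bound on the mass of 4**BASE_PAIRS people at one neutron mass each.
mass_mantissa, mass_exp10 = sci_notation_of_power(4, BASE_PAIRS, NEUTRON_MASS_KG)

# Gene sets if 10,000 protein-coding genes each come in two variants.
sets_mantissa, sets_exp10 = sci_notation_of_power(2, 10_000)

universe_exp10 = math.floor(math.log10(UNIVERSE_MASS_KG))

print(f"mass bound ~ {mass_mantissa:.1f} x 10^{mass_exp10} kg")
print(f"gene sets  ~ {sets_mantissa:.1f} x 10^{sets_exp10}")
print(f"mass bound exceeds universe mass by ~10^{mass_exp10 - universe_exp10}")
```

Working in log space trades a little precision in the mantissa for the ability to handle exponents in the billions. The leading digits are only approximate, but the powers of ten are reliable at this scale.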