The Maximum Genetic Diversity Theory: A Comprehensive Framework for Understanding Evolutionary Processes

A robust framework of evolutionary theory underpins our understanding of all biological phenomena and is essential for addressing the mysteries that captivate us, including our own origins. The theoretical exploration of the evolutionary process has a rich history. Early evolutionary thought was pioneered by the French naturalist Jean-Baptiste Lamarck, who published ‘Philosophie Zoologique’ in 1809 (1). Lamarck proposed an evolutionary theory based on two forces: a complexifying force that drives evolutionary progress from simple to complex forms, and an adaptive force through which environmental effects on traits can be inherited across generations.

Charles Darwin and Alfred Wallace independently conceived the idea of natural selection, and their joint papers were presented to the Linnean Society of London in 1858. Similar concepts were also developed independently by three other naturalists working in Britain (Edward Blyth, Patrick Matthew, and William Wells) between 1813 and 1859. The era's racism and the expansion of the British colonial empire may have contributed to the popularity of the ‘survival of the fittest’ ideology within English society (2, 3). Central to this theory are the concepts of common descent and of natural selection as a mechanism that filters variation.

Among these five individuals, Blyth believed that the role of natural selection was very limited: it merely maintained the basic status quo of species by eliminating individuals that were not adapted, thereby ensuring the long-term stable existence of species. Wallace believed that natural selection did not apply to the emergence of human consciousness. Only Darwin considered natural selection to be nearly omnipotent, and he alone provided a systematic and comprehensive body of examples supporting it, in his monumental work ‘On the Origin of Species’, published in 1859.

In light of theoretical advances over the past two decades, Blyth's understanding of natural selection appears to be closer to reality: the primary role of natural selection may be to maintain species stability. However, Blyth did not recognize that natural selection may also play a role in the microevolution of species or in species formation involving no major epigenetic or phenotypic changes (discussed further below).

In the early 20th century, the rediscovery of Mendelian genetics led to the modern synthesis, which combined Darwin's theory with genetic principles. The modern synthesis posits that adaptive evolution results from natural selection acting on genetic mutations. While this synthesis is widely accepted as a valid explanation of microevolutionary changes, its assertion that macroevolution operates through the same mechanisms is more controversial and has little, if any, experimental support. As Simpson, one of the founders of the modern synthesis, put it: ‘If the two proved to be basically different, the innumerable studies on microevolution would become relatively unimportant and would have minor value in the study of evolution as a whole’ (4). Moreover, the theory remains incomplete because it fails to account for a surprising discovery in molecular evolution made in the early 1960s: the genetic equidistance phenomenon (GEP).

Since the discovery of the GEP, two theories of molecular evolution have emerged: first the neutral theory (NT), together with the molecular clock hypothesis, and then the maximum genetic diversity (MGD) theory. Both theories acknowledge the proven virtues of natural selection. The MGD theory also incorporates the valid concepts of the NT but considers its core claim to be flawed. This review examines these competing theories. Notably, a recent textbook by Bickel (5) provides a comparative analysis of them, and they are also introduced in a recently published Chinese textbook (6).

The GEP

In the early 1960s, molecular or genetic differences between species were measured for hemoglobin, cytochrome C, and fibrinopeptides (7,8,9). In these studies, genetic distance, a measure of molecular differences between species, was represented by the percentage non-identity between aligned orthologous protein sequences, as scored with the identity matrix. The results differed depending on whether the reference species for the comparison was the most complex or the least complex, with complexity defined by the number of cell types. When humans were used as the reference species, hemoglobin differences increased gradually with progressively more distant species (7,8,9). This observation led Zuckerkandl and Pauling (7) to propose informally a universal molecular clock: hemoglobins from different species change at a steady and similar rate (10).
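The percentage non-identity used as the distance measure can be computed as in the minimal sketch below. It assumes that the two orthologous sequences have already been aligned to equal length with '-' marking gaps, and it skips any column containing a gap; the function name and gap convention are our own illustrative choices, not those of the cited studies.

```python
def percent_non_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned positions at which two sequences differ.

    Assumes seq_a and seq_b are pre-aligned and of equal length.
    Columns with a gap in either sequence are excluded (an illustrative
    convention, not a rule taken from the cited studies).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1  # identity matrix: identical = match, anything else = mismatch
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Toy aligned fragments (hypothetical, not real orthologs)
print(percent_non_identity("MKT-LLVAG", "MKSALLIAG"))
```

In this toy pair, 8 ungapped columns are compared and 2 differ, giving 25% non-identity.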

If, on the other hand, the reference species is the least complex among those being compared, for instance when fish is the reference in a comparison with humans, birds, and frogs (Figure 1), a highly unusual result is observed: the GEP (8). Fish are approximately equidistant to humans, birds, and frogs. Similarly, yeast cytochrome C is equidistant to the cytochrome C of all multicellular organisms. An example of the GEP using the Dot1 protein is shown in Table 1 (11). The GEP has been confirmed for nearly all proteins in comparisons involving 15 species; for instance, bacteria are equidistant to yeast and human in all 196 protein orthologs available for the three species (11, 12). Data demonstrating the GEP have occasionally been presented in papers without formal acknowledgment (13). Here, equidistance means that the distances are similar, though not necessarily identical, in numerical value.

Table 1.

Percentage non-identity among species in Dot1.

Species	Percentage non-identity

	YEASA	CAEEL	STRPU	DANRE	XENTR	TAEGU	MYOLU
CAEEL	71
STRPU	73	69
DANRE	70	66	29
XENTR	71	68	31	9
TAEGU	72	67	32	8	5
MYOLU	72	67	31	6	3	2
HUMAN	72	67	31	7	3	2	1

CAEEL, Caenorhabditis elegans (worm); DANRE, Danio rerio (Zebrafish); HUMAN, Homo sapiens (Human); MYOLU, Myotis lucifugus (Little brown bat); STRPU, Strongylocentrotus purpuratus (Purple sea urchin); TAEGU, Taeniopygia guttata (Zebra finch); XENTR, Xenopus tropicalis (Western clawed frog); YEASA, Saccharomyces cerevisiae (Baker's yeast).

The table presents alignment results, measured with the identity matrix (percentage non-identity), for the Dot1 protein among a group of species ordered from simple to complex. Each column lists the distances from a simpler species to all of the more complex species below it, and these distances are similar within each column. For example, yeast is equidistant to all of the more complex species listed, with approximately 72% genetic difference.
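The equidistance visible in each column can be checked directly from the values in Table 1. The short sketch below, our own illustration, stores the lower triangle of the table and reports, for each reference species, the range of its distances to all more complex species; a spread that is small relative to the distances themselves is what equidistance means here. Species codes follow the legend above.

```python
# Lower-triangular Dot1 percentage non-identity values transcribed from Table 1.
SPECIES = ["YEASA", "CAEEL", "STRPU", "DANRE", "XENTR", "TAEGU", "MYOLU", "HUMAN"]
ROWS = {
    "CAEEL": [71],
    "STRPU": [73, 69],
    "DANRE": [70, 66, 29],
    "XENTR": [71, 68, 31, 9],
    "TAEGU": [72, 67, 32, 8, 5],
    "MYOLU": [72, 67, 31, 6, 3, 2],
    "HUMAN": [72, 67, 31, 7, 3, 2, 1],
}


def column(ref: str) -> list[int]:
    """Distances from reference species `ref` to every more complex species."""
    j = SPECIES.index(ref)
    return [ROWS[sp][j] for sp in SPECIES[j + 1:]]


for ref in SPECIES[:-1]:
    values = column(ref)
    print(f"{ref}: min={min(values)} max={max(values)} spread={max(values) - min(values)}")
```

For the yeast column the values range from 70 to 73, and the spread within every column is at most 3 percentage points.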

To demonstrate the GEP statistically, multiple genes have been analyzed (11, 12). For example, the equidistance of a simple species C to two more complex species B and A can be tested with a set of randomly selected genes. If approximately half of the proteins show greater identity between C and B and the other half show greater identity between C and A, this suggests that C is equally distant from A and B. The statistical significance of any departure from this even split can be assessed with a chi-squared test. Conversely, if a significantly larger fraction of proteins (e.g., 90%) show greater identity between C and B than between C and A, this indicates non-equidistance, meaning that C is closer to B than to A. The GEP was unanticipated and remains largely unrecognized today.
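The chi-squared test described above can be sketched as a goodness-of-fit test against a 50:50 split. This is a minimal illustration under stated assumptions: the protein counts are hypothetical, proteins with equal identity to A and B (ties) are assumed to have been excluded, and the function name is our own.

```python
import math


def equidistance_chi2(n_closer_to_b: int, n_closer_to_a: int) -> tuple[float, float]:
    """Chi-squared test of an even split between two outcomes (1 degree of freedom).

    n_closer_to_b: number of proteins in which C is more identical to B than to A.
    n_closer_to_a: number of proteins in which C is more identical to A than to B.
    Ties are assumed to be excluded beforehand.
    Returns (chi-squared statistic, p-value).
    """
    n = n_closer_to_b + n_closer_to_a
    if n == 0:
        raise ValueError("no informative proteins")
    expected = n / 2  # equidistance predicts a 50:50 split
    chi2 = ((n_closer_to_b - expected) ** 2 + (n_closer_to_a - expected) ** 2) / expected
    # Survival function of the chi-squared distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for 100 proteins, not data from the cited studies.
print(equidistance_chi2(52, 48))  # close to 50:50, consistent with equidistance
print(equidistance_chi2(90, 10))  # strongly skewed: C is closer to B
```

With a fixed expected split, this is the large-sample form of a two-sided sign test; for a small number of proteins an exact binomial test would be more appropriate.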

It should be noted that while the two types of alignments yield different results – one consistent with intuitive expectations and the other not – the GEP specifically refers to the alignment with the simpler species as the reference species, as measured by the identity matrix. The alignment demonstrating the GEP was performed by Margoliash (8), whereas Zuckerkandl and Pauling (7) did not use this approach.

The molecular clock hypothesis

Perhaps due to the influence of the modern synthesis, which disregards the issue of complexity or evolutionary progress toward higher complexity as proposed in Lamarck's theory, Margoliash (8), in his initial report of the GEP, emphasized time as the determining factor of genetic distance. With this observation, he formally proposed the universal molecular clock hypothesis, suggesting that different species have approximately the same substitution rate, implying that genetic distance between species is determined solely by time (10).

The molecular clock hypothesis, as inspired by molecular distances in protein sequence alignments, makes two claims:

Different species have similar substitution rates.

The substitution rate of any given gene is constant over time.

Although the two types of distance results from the above alignments can both lead to the idea of a universal molecular clock and are merely two sides of the same coin, the GEP was unusual for two reasons. First, unlike the distance result with humans as the reference, deducing a molecular clock from the equidistance to a least complex reference species was more straightforward and did not require calculations or fossil dating. Since all more complex species are equidistant in evolutionary time to the least complex reference species, equidistance in molecular divergence implies equal rates of substitution for all species (Figure 1, under the tacit assumption that the molecular distance is still increasing with time rather than at a maximum level).
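The first point can be made explicit with a short derivation in our own notation, offered as an illustration rather than a formula from the cited work. Let F be the least complex reference species (e.g., fish), X and H two more complex species (e.g., frog and human), t the time since F split from the lineage leading to X and H, and r_F, r_X, r_H the average substitution rates along each lineage. Assuming, as the molecular clock reading does, that distances still grow linearly with time:

```latex
\begin{aligned}
d_{FX} &= (r_F + r_X)\,t, \qquad d_{FH} = (r_F + r_H)\,t,\\
d_{FX} = d_{FH} &\;\Longrightarrow\; (r_F + r_X)\,t = (r_F + r_H)\,t \;\Longrightarrow\; r_X = r_H .
\end{aligned}
```

If the distances have instead reached an upper limit, their equality places no constraint on r_X and r_H.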

Second, the molecular distance results with humans as the reference have been presented in textbooks as evidence for Darwin's theory or the modern synthesis, suggesting that the more similar the phenotypes, the closer the molecular distance (although this pattern has many exceptions). However, the equidistance result is unexpected from the perspective of the modern synthesis: how can similar amounts of genetic change or a similar number of sequence differences between a simpler species and two more complex species result in vastly different degrees of phenotypic changes? Relative to the phenotypic changes from fish to frog, the changes from fish to human are far more pronounced. Despite being regarded by some scientists as ‘one of the most astonishing findings of modern science’ (14), the GEP has largely faded from collective consciousness. It has only been occasionally acknowledged by experts as the key finding that inspired the molecular clock hypothesis (10).

Numerous studies indicate that the molecular clock has only limited applicability, being suitable only for similar or closely related species. Mutation rates among vertebrate species can vary by up to 40-fold (15). Living fossil species have lower mutation rates (16). Today, almost no experts accept a universal molecular clock applicable to all species (17). Relaxed-clock models, which allow mutation rates to differ among species, are now commonly used (18). However, researchers are often unaware of the significant implications of rejecting the ‘universal molecular clock’. They seldom realize that the GEP has been overshadowed, misinterpreted, and replaced by the ‘universal molecular clock’. The collapse of the ‘universal molecular clock’ means that the GEP once again becomes an unsolved mystery.

The NT of molecular evolution

The molecular clock hypothesis initially met significant resistance from classical evolutionary biologists, as Darwin's theory could not explain similar substitution rates among different species. Nonetheless, some researchers treated the molecular clock as a genuine reality and subsequently proposed various theories to explain it (19,20,21,22,23,24). Kimura's NT (22) is one such theory and is commonly credited with accounting for the molecular clock (24), even though it is widely acknowledged to be an incomplete explanation (25, 26). Kimura nonetheless regarded the molecular clock as the best evidence for his NT (27).

The abstract of Kimura's (22) paper on the NT consists of a single sentence: ‘Calculating the rate of evolution in terms of nucleotide substitutions seems to give a value so high that many of the mutations involved must be neutral ones’. However, this calculation followed the logic of the molecular clock hypothesis, which rests on two implicit assumptions that were taken for granted without deliberation. The first is that observed genetic distance always increases with time. The second is that every nucleotide in a genome is freely changeable or mostly neutral, meaning that alterations at any nucleotide position would not be lethal. In fact, the very use of the equation r = d/(2t), where d is the genetic distance between two species and t is their divergence time, to derive the mutation rate r begs the question: it assumes the conclusion, namely that the mutation rate is the same for the two species, by presuming that both species contributed equally to the genetic distance.
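Splitting the distance d between two species that diverged a time t ago into the contributions of the two lineages, with average rates r_1 and r_2 (our notation, assuming distances still accumulate linearly), shows where the equal-rate assumption enters:

```latex
d = (r_1 + r_2)\,t \quad\Longrightarrow\quad \frac{d}{2t} = \frac{r_1 + r_2}{2}.
```

The quantity d/(2t) is therefore only the mean of the two lineage rates. It equals the rate of each lineage only if r_1 = r_2 is assumed, and every pair of rates with r_1 + r_2 = d/t reproduces the same observed d.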

Research concerning neutral processes in genetics began in the 1920s and culminated in Kimura's NT (22, 28). From the outset, researchers did not seek to deny the importance of natural selection but were instead interested in how neutral processes influenced adaptive evolution (29). Fisher (30) and Wright (31) posed questions and developed techniques in population genetics that are also relevant to the NT. Haldane's (32) genetic load argument played a crucial role in inspiring Kimura to develop his NT. Kimura (22) was influenced by the idea that if most variants were not neutral, the genetic load would be too high to be sustainable. Additionally, some researchers have suggested that much of molecular evolution is neutral (33, 34). However, Kimura (22) was the first to combine population genetics theory with molecular evolution data to formulate a theory of neutral evolution. While the NT has been used (unsuccessfully) to explain electrophoretic protein polymorphisms within species (35, 36), the data that directly inspired Kimura to propose the NT came from sequence variation among species (37, 38).

The NT makes these claims:

The observed sequence variations between different species are mostly neutral, that is, they have no effect on fitness and are not under natural selection. DNA segments assumed to play no functional role, or not to be under natural selection, are viewed as junk DNA (39). Some estimates have concluded that about 90% of the human genome is junk DNA (40, 41).

Substitution rate equals mutation rate. The evolutionary rate, measured per generation rather than per year, is constant and similar among taxa (22, 28).

The infinite-sites model is part of the neutral framework (42). This model has been widely used for interpreting observed polymorphisms and for constructing phylogenetic trees. Corollaries of the infinite-sites assumption include several unrealistic notions, such as unbounded genetic distance/diversity, the absence of recurrent or back mutations, and the impossibility of any evolutionary stage being at equilibrium. However, this assumption has been invalidated by experimental data (43,44,45).

The NT classifies mutations as deleterious, neutral, or advantageous. This classification overlooks the fact that all mutations, including advantageous ones, have a deleterious aspect as random noise in an ordered system. It also fails to recognize that most variants appear neutral (they are only quasi-neutral) as a result of selection (46,47,48,49,50,51).

The NT views non-conservation as non-function (52). This assumption treats many poorly conserved genomic elements, such as repetitive and viral elements, as neutral junk, and it is the basis for the conclusion that only 8% of the human genome is functional (40). However, functional genomics research in recent years has revealed that there is essentially no junk DNA (53). All DNA should function within a mechanical code, i.e., the mapping between the local sequence and the local deformability of DNA (54), within a spatial grammar, i.e., the location-dependent function of transcription factors (55), or through the optimal positioning of enhancers (56). In addition, at least 25% of the 1 million predicted regulatory DNA elements in the human genome are derived from transposable elements (57).

The NT treats synonymous mutations as neutral (24, 58). This assumption has now been invalidated by systematic studies (59,60,61,62).

Mutations at different sites are independent.

Effective population size (N_e) is related to genetic diversity (π) by the formula π = 4N_eμ, where μ is the mutation rate, and N_e is often derived tautologically from the very genetic variables that it is meant to predict or explain (35).

The Hardy–Weinberg law: allele and genotype frequencies in a population remain constant from generation to generation in the absence of other evolutionary influences. This law underlies nearly all methods in phylogenetic studies of human populations. Direct tests of this law are rare, but a recent test has disproven it (46).
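As an illustration of what testing Hardy–Weinberg proportions involves at a single biallelic locus, the sketch below applies the standard chi-squared goodness-of-fit test to genotype counts. It is a minimal sketch with hypothetical counts and our own function name, not the method of the study cited as (46); exact tests are generally preferred for small samples or rare alleles.

```python
import math


def hardy_weinberg_test(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-squared test of Hardy-Weinberg proportions at a biallelic locus.

    Takes observed genotype counts (AA, AB, BB) and returns
    (chi-squared statistic, p-value). One degree of freedom:
    three genotype classes, minus one, minus one estimated allele frequency.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    if not 0 < p < 1:
        raise ValueError("locus is monomorphic; test undefined")
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]  # HW genotype expectations
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared survival function, 1 df
    return chi2, p_value


# Hypothetical genotype counts
print(hardy_weinberg_test(25, 50, 25))  # exactly in Hardy-Weinberg proportions
print(hardy_weinberg_test(40, 20, 40))  # heterozygote deficit
```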

The NT is also used as an explanatory theory of genetic diversity within species. However, it is widely acknowledged that the NT is not a satisfying account of genetic diversity (46, 63,64,65). Some have even declared it dead as an explanatory theory of nature, while acknowledging its value as a useful null model (66). Direct experimental tests of the NT have disproven it (46). A multiyear, genome-wide analysis of a population of the microcrustacean Daphnia pulex demonstrated that temporal variation in selection intensity has a significant influence on nucleotide polymorphism and divergence (46). Most nucleotide sites experience fluctuating selection with mean selection coefficients near zero, little covariance in selection strength across time intervals, and selection distributed across large numbers of genomic islands of linked sites. Therefore, it is natural selection, rather than the presumed neutral drift, that explains the patterns of genetic diversity in nature.

Most researchers in the field today view the NT merely as a null model useful for testing whether a site is under natural selection. However, researchers building phylogenetic trees still generally assume the NT to be a true account of nature and often present trees based on the neutral assumption as free of uncertainty without justification, e.g., the recent out of Africa model of modern human origins (67). Most phylogenetic methods require the neutral assumption, including methods such as the bootstrap, which is intended to test the robustness of a phylogenetic tree (68).

The MGD theory of molecular evolution

Both the molecular clock and the NT provide valuable insights in many areas. However, they have limitations and may not fully account for the complexities of evolutionary processes. As shown by studies in the last two decades, the GEP remains unexplained by these theories. The molecular clock interpretation of the GEP is, in fact, a classic tautology – a mere ad hoc restatement of a distance phenomenon (69). It has not been verified by any independent observations, and on the contrary, has been contradicted by a significant number of observations (15, 16, 26, 69,70,71). Some have even declared it to be merely a mirage (25).

If the molecular clock is not a real phenomenon, it is only to be expected that no explanatory theory of nature can account for it. Indeed, the NT has not fully explained the molecular clock even though it was inspired by it (25, 26). The observed rate is measured per year, but the NT predicts a constant rate per generation. The theory also predicts that the clock will be a Poisson process, in which the variance of the number of substitutions equals its mean, whereas experimental data have shown that the variance is typically larger than the mean. Ohta's (72) ‘nearly neutral theory’ partly addressed the generation-time issue by noting that species with short generation times tend to have large populations, in which slightly deleterious mutations are removed more efficiently by selection, so that the two effects partly offset each other; it still cannot account for the excess variance.
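The Poisson prediction can be checked with the index of dispersion, the ratio of the variance to the mean of substitution counts across lineages, which should be close to 1 under a Poisson clock. The sketch below computes it from hypothetical counts and uses the sample variance; it ignores shared ancestry among lineages, which real analyses must take into account.

```python
from statistics import mean, variance


def dispersion_index(substitution_counts: list[int]) -> float:
    """Sample variance divided by mean of substitution counts across lineages.

    A Poisson molecular clock predicts a value near 1; values well above 1
    indicate overdispersion (variance larger than the mean).
    """
    m = mean(substitution_counts)
    if m == 0:
        raise ValueError("mean count is zero")
    return variance(substitution_counts) / m


# Hypothetical substitution counts in one gene along several lineages
print(dispersion_index([8, 12, 10, 14, 6, 10]))  # below 1 for this sample
print(dispersion_index([3, 20, 7, 25, 5, 14]))   # strongly overdispersed
```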

Around 2005, open access to genome sequencing results of various species allowed us to compare the sequences of fish, frog, and human for the retinoblastoma interacting zinc-finger-protein 1/PR domain containing 2 (RIZ1/PRDM2) tumor suppressor gene that we had been studying (73, 74). We first used the human RIZ1 protein sequence as a query to search for homologous proteins in the GenBank database, which is a routine practice, and obtained the expected alignment identity results showing that humans are closer to frogs than to fish. Next, we conducted an unconventional experiment by using the fish RIZ1 protein as the query, fully expecting – like most researchers would – that fish would be closer to frogs than to humans. To our surprise, we observed the GEP, with fish being approximately equally different from both frogs and humans. This led us to recognize that the molecular clock interpretation of the GEP might be more of a tautology than a definitive explanation (69). After three years of in-depth research, we published the MGD hypothesis in 2008, which offers a new interpretation of most major evolutionary phenomena (74, 75). We also later identified a new characteristic of the GEP, known as the overlap feature, which has not yet been recognized or explained by existing theories (76).

Here we summarize the terminology and concepts of the MGD theory.

Microevolution

In popular evolutionary theory, the term ‘microevolution’ refers to evolutionary changes within a species or a small group of organisms, especially over short periods. In the MGD theory, however, the term encompasses evolutionary changes both within a species and between species over both short and long periods, as long as these changes do not involve major epigenetic modifications.

Macroevolution

The term macroevolution in the popular theory applies mainly to the evolution of whole taxonomic groups over long periods of time. In the MGD theory, the term describes new species formation that involves both genetic and epigenetic changes, particularly increases in epigenetic complexity. The number of cell types is used to define and quantify complexity. Because the number of cell types directly reflects the number of epigenetic programs (each cell type is a unique epigenetic program), that is, epigenetic complexity, advances in complexity during macroevolution are reflected by major epigenetic changes. Each neuron is likely a unique cell type with a unique synaptic connectome; a typical human neuron forms about 1000 to 10,000 synaptic connections with other neurons. Even among seemingly homogeneous neurons of the same type, epigenetic heterogeneity exists and plays a role in memory formation, rather than being neutral or simply noise (77). Therefore, complexity is largely determined by the number of neuron types. Because phenotypes may change throughout an organism's lifetime, the complexity of its phenotype refers to the peak level of complexity during its healthy lifespan.

MGD

The central problem of the field has always been the old riddle of what determines genetic diversity (35, 63, 78, 79). The study of variation and diversity has a long history (80). For more than two decades before the 1960s, two schools of thought were in heated debate: one held that there was very little genetic variation within a population of a species, and the other hypothesized extensive genetic variation maintained by natural selection (35, 79). In the mid-1960s, protein electrophoresis revealed extensive variation in many proteins in Drosophila, humans, and other organisms (35, 79). The debate then shifted to how best to explain such high diversity and the mysteriously narrow range of genetic diversity levels seen across taxa that vary markedly in census population size (63). Two main schools of thought evolved from the earlier two: the one that originally believed in little genetic variation now favors neutral drift, while the other continues to support natural selection. The debate remains unsettled and has been largely neglected over the past three decades as the field has shifted its focus to generating ever larger amounts of diversity data with newer techniques.

Genetic diversity within a species is commonly measured as nucleotide diversity, defined by Nei and Li (81) in 1979 as the average number of nucleotide differences per site between two DNA sequences over all possible pairs in the sample population. In this sense, genetic diversity is equivalent to genetic distance or sequence mismatches. The new concept of MGD mainly concerns genetic diversity as measured by genetic distance. For a group of organisms with multiple taxa exhibiting similar phenotypes, the genetic diversity of the group with respect to a gene sequence is the percentage of positions in the sequence that differ among taxa. Genetic distances between species in specific genes were first reported in the early 1960s (7,8,9). The genetic diversity of an individual is commonly measured by its heterozygosity, which is equivalent to the sequence difference between the individual's father and mother. It can also be measured by the minor allele content (MAC), the total number of minor alleles in an individual genome. Minor alleles are in general less fit and represent deviations from the optimal genetic state.
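Nei and Li's nucleotide diversity, as defined above, can be sketched as follows for a small set of aligned sequences. The sequences are hypothetical, and the sketch assumes equal-length, gap-free alignments; real data require decisions about gaps and missing sites that are not handled here.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of nucleotide differences per site over all sequence pairs.

    Assumes all sequences are aligned, of equal length, and gap-free.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites in every distinct pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


sample = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(sample))
```

Here 6 differences are found across the 6 pairs of 10-site sequences, giving π = 0.1.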

It is important to distinguish between low genetic diversity and low heterozygosity. Low genetic diversity manifests as both low heterozygosity and low MAC, that is, a low number of deleterious alleles, whereas low heterozygosity alone, accompanied by a higher number of deleterious alleles and without low MAC, can result from inbreeding (82).

Many authors use the term genetic diversity to refer only to genetic variations within a species or population (63, 83). But many also use the term to refer to genetic variations both within and between populations or species (84). The popular theory of molecular evolution, the NT, was directly inspired by a genetic distance phenomenon between species (the GEP) and its interpretation, the molecular clock (37, 38). This theory is commonly used to interpret genetic diversity patterns both within and between species. Therefore, it would be inconsistent if the term ‘genetic diversity’ only covered genetic variations within a species or population.

The existing concepts of genetic diversity remain unchanged in the MGD theory. However, the new theory introduces an additional concept: the upper limit concept. Genetic diversity or distance, as measured in most genes by identity matrix, would increase with time but cannot reach 100% non-identity. The increase would stop at an upper limit level, leaving many sequence positions unchanged. The upper limit of the percentage of positions in the sequence that differ among taxa within a group of similar organisms is called the MGD of the organism. The upper limit of the percentage of positions in the sequence that differ among individuals within a taxon is called the MGD of the taxon.
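The upper-limit idea can be contrasted with the infinite-sites expectation using a toy model of our own, not a formalism from the MGD literature. In this sketch a fixed fraction of positions (free_fraction, a hypothetical parameter) is allowed to change, the rest are held invariant, and free positions follow a standard equal-input substitution model of the Jukes–Cantor type. Under infinite sites every substitution would add a new difference, so the observed distance would keep growing roughly in proportion to free_fraction * d; in the toy model it instead levels off at free_fraction * (1 - 1/n_states).

```python
import math


def expected_distance(d: float, free_fraction: float, n_states: int = 20) -> float:
    """Expected fraction of differing positions in a toy finite-sites model.

    d: expected substitutions per free site along the path joining the two
       sequences (grows with divergence time).
    free_fraction: hypothetical fraction of positions free to change; the
       remaining positions are held invariant.
    n_states: number of character states (20 amino acids, 4 nucleotides).
    Free sites follow an equal-input (Jukes-Cantor-type) substitution model.
    """
    k = n_states
    per_free_site = (k - 1) / k * (1 - math.exp(-k * d / (k - 1)))
    return free_fraction * per_free_site


for d in (0.1, 0.5, 1, 2, 5, 10):
    print(d, round(expected_distance(d, free_fraction=0.3), 3))
```

With free_fraction = 0.3 and 20 amino acid states, the distance rises almost linearly at first and then plateaus near 0.285, well short of 100% non-identity. The toy model does not capture the MGD claim that the ceiling itself depends on complexity and environment; it only illustrates how invariant positions produce a ceiling.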

While the concept of an upper limit does imply that genetic distance/diversity remains unchanged over time once the limit is reached, it does not signify the cessation of evolution. Evolution is generally understood as a process involving mutations followed by natural selection (Darwin's theory) or neutral drift (Kimura's theory), and these processes are expected to continue indefinitely. However, while these processes would lead to increases in genetic distance/diversity during the early stages of evolution, they would not alter the optimum level of genetic distance/diversity once the upper limit is reached. At the upper limit stage, what continues to change is the turnover of alleles. In other words, while the MGD level is maintained by one set of variants/alleles at time point A, the same MGD level would be maintained by a new set of variants/alleles at a later time point B.

The level of MGD measures the fraction of positions in a sequence that can change without adversely affecting the core physiology of the members of a taxon, or of the taxon itself, within a group of similar organisms containing multiple taxa. These changes may be beneficial, deleterious, or neutral with regard to fitness, depending on circumstances. Positions that are identical among the compared species are considered conserved; such positions are not free to change and are essential for the core physiology of a taxon. These unchanged positions are of three types:

Positions essential for the barebone or minimal function of a gene, where any change would alter the biochemical activity of the gene in a test tube.

Positions essential for more complex species but not for simpler species. As species become more complex, an increasing number of positions in a gene become unchangeable or are involved in more complex traits. Changes in these positions might not affect the basic function of a gene or its activity in a test tube, but they could influence the long-term survival of more complex species, while having little to no impact on simpler species.

Positions essential for adaptation to a specific environment. Such positions may be fixed: environmental selection may lead to homozygosity of variants and reduce the MGD level of a taxon.

Genetics, together with epigenetics, plays an important role in determining phenotypes, and if genetic diversity had no limit, phenotypic diversity would have no limit either. To understand the limits of genetic diversity, one can therefore consider whether phenotypic diversity has limits. For example, humans exhibit extensive phenotypic diversity, such as variations in skin color, hair color, and eye size, but these are all minor quantitative variations within certain boundaries. Fundamental aspects of nearly all human traits, like having two eyes or one mouth, do not change, suggesting that there are immutable sequences in the genome. The existence of unchanging phenotypes indicates that phenotypic diversity is not infinite, implying that genetic diversity must also have an upper limit. The NT assumes no upper limit on genetic diversity, suggesting that genetic differences between human individuals could reach 100%, which implies unlimited phenotypic differences, a notion that can be easily disproven.

DNA supplies the major building blocks of biological organisms, and epigenetic programs are the architectural plans for how to use these DNA parts. The more cell types an organism has, the more ways it uses the same set of DNA, and the more complex it is (85,86,87,88,89,90,91,92). Epigenetic programs are not only inherited during mitotic cell division but are also often transmitted through the germline to the next generation (93,94,95,96,97,98,99). The epigenetic complexity of a taxon is the average number of cell types of its individual members. Taxa of higher epigenetic complexity are considered more complex. Humans can in most cases intuitively and correctly judge differences in complexity. For unicellular organisms, or for species with similar numbers of cell types, the number of epigenetic genes may be used to infer which is more complex. Thus, yeasts are more complex than bacteria because they have more epigenetic proteins: yeasts have several histone acetylases and Su(var)3-9, Enhancer-of-zeste and Trithorax (SET) domain histone methyltransferases, whereas bacteria have none.

The MGD theory contains three self-evident intuitions or axioms (74, 75, 100,101,102,103).

Axiom 1 posits that the more complex the phenotype, the greater the restriction on the choice of molecular building blocks. A complex or ordered system requires higher precision in its building parts. In biology, this means there is an inverse relationship between genetic diversity and epigenetic complexity (75).

The idea of Axiom 1 was first noted nearly 100 years ago by Fisher (30, 104), as Orr (104) aptly put it: ‘Fisher further showed that mutations of a given size are less likely to be favorable in complex than simple organisms. The reason is intuitively clear: A random mutation of some size is more likely to disrupt (not improve) a complex than simple organism for the same reason a random change of some size is more likely to disrupt (not improve) a complex than simple machine. Changing the length of an arbitrary mechanical part by one inch, for instance, is more likely to derail the function of a microscope than a hammer’. However, the prevailing evolutionary theories have long overlooked, and in fact contradict, this notion.
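Fisher's point comes from his geometric model of adaptation, in which a phenotype is a point in an n-dimensional trait space, fitness declines with distance from an optimum, and n serves as a measure of organismal complexity. The Monte Carlo sketch below, our own illustration with arbitrary parameter values, estimates the probability that a random mutation of fixed size r moves a phenotype at distance z from the optimum closer to it.

```python
import math
import random


def prob_favorable(n: int, r: float, z: float, trials: int = 10000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(a random mutation is favorable) in Fisher's geometric model.

    n: number of independent traits (dimensions), Fisher's proxy for complexity.
    r: size (Euclidean length) of each random mutation.
    z: distance of the current phenotype from the optimum at the origin.
    A mutation is favorable if it moves the phenotype closer to the optimum.
    """
    rng = random.Random(seed)
    favorable = 0
    for _ in range(trials):
        # Uniformly random direction: normalized vector of independent standard normals.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        # Phenotype starts at (z, 0, ..., 0); add a mutation of length r.
        new = [r * x / norm for x in v]
        new[0] += z
        if sum(x * x for x in new) < z * z:
            favorable += 1
    return favorable / trials


for n in (2, 10, 50):
    print(n, prob_favorable(n, r=0.5, z=1.0))
```

For n = 2 the exact probability is arccos(r/(2z))/π, about 0.42 for r = 0.5 and z = 1, and the estimate drops sharply as n increases, which is the pattern Fisher described.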

Axiom 2 states that in any system allowing a limited level of random errors or noise in its molecular building parts, such errors may be beneficial, deleterious, or neutral depending on the circumstances. Limited errors at an optimum level are more likely to be beneficial than deleterious because they are within tolerable limits, conferring economy in construction and the strongest possible adaptive capacity or robustness to environmental challenges. In biology, substituting ‘errors in building blocks’ with ‘genetic diversity’ yields an equivalent concept. Axiom 2 highlights the valid aspects of both Kimura's NT and Darwin's theory of natural selection.

Axiom 3 states that deleterious variants, or mutations harmful to a trait, can be rescued by other mutations (105). Populations with greater genetic diversity can more easily rescue or tolerate harmful mutations, which makes it hard for natural selection to maintain the quality of a trait and to eliminate harmful variants. Therefore, suppressing genetic diversity is necessary for maintaining traits at a high-quality level and for removing harmful variants.

For neutral variants, genetic diversity is given by π = 4N_eμ and is thus primarily determined by the mutation rate (μ) and the effective population size (N_e). In contrast, MGD is driven by species complexity C and environmental factors E, as expressed in the formula MGD = E/C. Environmental factors may include geographical and ecological pressures, competition for resources, social or behavioral changes, and many other factors that influence the survival and adaptation of species. Narrow environmental selection can reduce MGD, because a limited range of environmental factors restricts the ability of a species to diversify genetically, thus lowering its overall diversity.
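As a minimal sketch of the neutral formula, the Python below computes nucleotide diversity π as the average number of pairwise differences per site among aligned sequences, alongside the neutral expectation 4N_eμ. The toy sequences and the values N_e = 10,000 and μ = 2.5 × 10⁻⁸ per site per generation are illustrative assumptions, not data from the text; the ratio MGD = E/C is not implemented because no units are given for E or C.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair, then normalize by pairs and sites.
    total_diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


def neutral_expected_pi(ne: float, mu: float) -> float:
    """Neutral expectation pi = 4 * Ne * mu (diploid; mu per site per generation)."""
    return 4 * ne * mu


# Invented toy alignment and illustrative parameter values (assumptions).
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(toy))
print(neutral_expected_pi(10_000, 2.5e-8))
```

The toy alignment has 6 differences over 6 pairs of 10 sites, giving π = 0.1; the neutral expectation with these parameters is about 0.001 (0.1%).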

The MGD theory makes the following claims:

MGD tends to be higher for simpler taxa and lower for more complex taxa (Figure 2). The reason may be that members of more complex taxa rely on more sequence positions for their more complex physiology, and these positions are therefore conserved; complex taxa also require higher-precision genomes. Both factors leave fewer positions free to change.

Macroevolution involves changes in organismal complexity and often results in an increase in complexity, which is mirrored by an increase in the precision of the building parts, that is, a narrowing of the allowed standard deviation range for each part (Figure 2A). There are rare instances of apparent loss of complexity, such as the loss of eyesight in the nearly blind naked mole rat, but these changes are relatively minor and do not detract from the overarching trend toward higher complexity. Moreover, it remains unclear whether such losses are offset by compensatory increases in complexity in other aspects, leaving minimal net change. Microevolution, on the other hand, is an increase in genetic diversity within the allowed standard deviation ranges without significant change in complexity (Figure 2B). It encompasses both evolution within species and evolution from one species to another. At the saturation phase, microevolution involves turnover of alleles at the equilibrium level of genetic diversity. The gradual evolution of sequences occurs at the microevolution level but cannot be extrapolated to the scale of macroevolution, as Gould and Eldredge (106) concluded largely on the basis of the fossil record.

The positions that are conserved in simpler taxa tend to also be conserved in more complex taxa. The positions that are free to change in more complex taxa tend to also be free to change in simpler taxa.

After a very long evolutionary time, genetic distance among taxa and genetic diversity within a taxon are today mostly at the optimum level, especially for fast-evolving sequences (107). Any level higher or lower than the optimum would be negatively selected. Because genetic diversity facilitates adaptation as long as it stays within the maximum level, it would be positively selected and would quickly reach the optimum. The optimum level equals the maximum level and can be defined in humans by comparing the genetic diversity of a control population with that of patient populations. If the patient population for one disease has greater genetic diversity than the control population, while the patient population for another disease has lower genetic diversity than the control population, one can infer that the control population's genetic diversity is at an optimum. Optimum here means a Pareto optimum, or simply the best that can be achieved given the balance between positive and negative selection at a particular time point under a specific level of epigenetic or organismal complexity (108). As time, environments, and complexity change, the optimum level of nucleotide diversity will also change.

Genetic variants are mostly functional or under balancing selection and quasi-neutral (under both positive and negative selection) rather than neutral.

Genetic distance or molecular distance between two taxa of different complexity is not contributed equally by mutations in the two lineages but rather is mostly contributed by mutations in the simpler lineage.

Non-conservation is not non-function. Fast-changing non-conserved sequences play more important roles in adaptation to the environment than the slowly changing conserved housekeeping genes.

Lower MGD means higher homozygosity, but this is very different from the higher homozygosity caused by inbreeding. Lower MGD results in higher-fitness traits because more of the common, or good, alleles become homozygous. In contrast, inbreeding lowers fitness (inbreeding depression) because minor or deleterious alleles become homozygous. Inbreeding produces long runs of homozygosity (ROH), but lower MGD does not.
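The ROH distinction can be sketched with two hypothetical individuals who have the same overall heterozygosity but a different arrangement of homozygous sites. The genotype coding, the helper names, and the fixed minimum run length are assumptions for illustration; production ROH callers (for example PLINK's `--homozyg` option) use sliding windows, SNP density, and tolerance for occasional heterozygous calls.

```python
def heterozygosity(genotypes: list[int]) -> float:
    """Fraction of sites that are heterozygous (coded 1)."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


def runs_of_homozygosity(genotypes: list[int], min_len: int) -> list[tuple[int, int]]:
    """Return (start, end) intervals, end exclusive, of >= min_len consecutive homozygous sites.

    Assumed coding: 0 or 2 = homozygous, 1 = heterozygous.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i  # a homozygous stretch begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None  # a heterozygous site ends the stretch
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes)))
    return runs


# Hypothetical individuals: 4 of 20 sites heterozygous in both, arranged differently.
dispersed = [1, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 0, 0]
clustered = [1, 1, 1, 1] + [0] * 16

MIN_RUN = 8  # arbitrary threshold chosen for this toy example
for name, g in [("dispersed", dispersed), ("clustered", clustered)]:
    print(name, heterozygosity(g), runs_of_homozygosity(g, MIN_RUN))
```

Both individuals are 20% heterozygous, but only the clustered one shows a long run of homozygosity, the pattern the text attributes to inbreeding rather than to low MGD.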

The origin of the first life involved a reduction in the randomness of life-building molecules, which is fundamentally similar to the reduction in genetic diversity, or randomness, during each step-wise increase in complexity in macroevolution. Both involve the same complexifying, or anti-randomness, force.

Using conservation as an index of function measures only one of two kinds of sequences: those essential for internal system physiology that have little to do with adaptation to external environments. To preserve the long-term integrity of the system or essential traits, such as the presence of only two eyes in humans, such sequences must remain unchanged. For living fossils to be possible, these sequences should be highly stable. On the other hand, sequences involved in adaptation to changing environments must be able to evolve quickly, as environmental changes tend to occur rapidly. For instance, flu viruses escape neutralizing antibodies every few years, and the fast-changing non-conserved sites in these viruses are crucial for their survival but not essential for their basic physiological functions.

The MGD theory makes many claims that directly oppose those of the NT (Table 2). (a) Most variants are not neutral. (b) Genetic distances or diversities are mostly at upper limit levels. (c) The infinite site model is unrealistic. (d) Most DNA sequences are not informative to phylogenetic inferences. (e) All mutations have a deleterious aspect. (f) Non-conservation means adaptive function. (g) Synonymous substitutions are also functional. (h) Genetic or molecular distance between two taxa of different complexities is not equally influenced by mutations in each lineage. (i) Increase in complexity requires suppression of genetic diversity. (j) Evolution, or turnover of alleles, is very fast rather than slow (46, 107). (k) Physiological selection determines the fate of many variants.

Table 2.

The MGD theory versus the NT.

Features	MGD theory	NT
Microevolution (linear stage)	NT and natural selection	NT and natural selection
Micro- vs. macroevolution	Different mechanisms	Same mechanism
Role of epigenetics	Yes	No
Noises suppressed	Yes	No
Neutral regions	Complexity dependent	Complexity independent
Time frames	Any	Relevant to short times
Mutation rates	Any	True for slow rates
Genetic equidistance	Maximum and linear	Linear only
MGD	Yes	No
Genetic variants	Mostly non-neutral	Mostly neutral
Recurrent mutations	Common	Rare
Quasi-neutrality	Common	Rare
Turnover of alleles	Fast and common	Slow and rare
Non-conservation	Adaptive function	No function
Physiological selection	Yes	No
Independent tests	Confirmed	Rejected
Axiom based	Yes	No

MGD, maximum genetic diversity; NT, neutral theory.

The table compares the major features of the two different theories of molecular evolution.

The GEP reinterpreted

The MGD hypothesis explains the GEP as a result of maximum genetic distance (74, 75, 100). Over a long evolutionary period, for fast-evolving DNA sequences, the genetic distance between species has reached the maximum level. The distance between ingroup species and a simpler outgroup taxon is primarily determined by the MGD of the simpler outgroup (Figures 2 and 3). This distance equals the maximum distance allowed among members of the simpler outgroup; for example, the distance between humans and fishes equals the maximum distance between different taxa of fishes. Changes in the lineage leading to the simpler outgroup mask any changes in the lineages leading to the ingroup taxa (Figure 3).

This notion that the MGD level of a simple kind of outgroup organism determines the distance between the outgroup and the more complex clade can be illustrated by the example of cytochrome C. The maximum diversity in this protein sequence is about 70% difference within bacteria, such as between Bordetella parapertussis and Paracoccus versutus. The maximum distance between bacteria and mammals is about 65% difference, such as between Bordetella parapertussis and Pan troglodytes. Within fungi, the maximum diversity is about 40% difference, for example, between Aspergillus oryzae and Yarrowia lipolytica. The maximum distance between fungi and mammals is about 43% difference, such as between Aspergillus oryzae and Pan troglodytes.

There are, in fact, two kinds of GEP (Figure 3). For long evolutionary timescales or fast-evolving sequences, one observes ‘maximum genetic equidistance’: different species are equidistant to a species of lower or equal complexity. The original result of Margoliash (8) is an example of maximum genetic equidistance. For short evolutionary timescales or slowly evolving sequences, one observes ‘linear genetic equidistance’, where the molecular clock holds and the distance is still linearly related to time. When ingroup species have similar mutation rates, they are equidistant to a lower or equal complexity outgroup.

This explanation of the GEP by the MGD theory can be easily illustrated by a simple thought experiment. Imagine we create a yeast, a fish, and a human using identical genes for their shared homologs and let the three organisms diverge for an effectively unlimited time (say, about 500 million years), with each organism remaining phenotypically largely the same as today. Over this time, a gene in yeast would change significantly, up to a maximum of, say, 50%, while its homolog in fish would change up to a maximum of 30%, and its homolog in humans would change very little, say <1%. Any further changes beyond 50% would be lethal to yeast, beyond 30% would be lethal to fish, and beyond 1% would be lethal to humans.

A gene in yeast can change much more than its homolog in fish, which in turn can change more than its homolog in humans, because the human gene encounters far more functional constraints than its homologs in fish or yeast. Thus, the genetic distance between yeast and either humans or fish is mainly determined by the mutations in yeast. In this case, the 50% change in yeast would account for the 50% distance (50% identity) between yeast and humans and between yeast and fish, as well as for the maximum within-yeast distance of 50%. The 30% change in fish would account for the 30% distance (70% identity) between fish and humans. In contrast, the NT would predict that humans and fish, like yeast, can also change by 50% or more, so that their distances would also reach 50% or more.
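The thought experiment can be sketched as a toy saturation model. It assumes, hypothetically, that each taxon's free-to-change positions form a nested prefix of the sequence (50% for yeast, 30% for fish, 1% for humans, the figures used above) and that at saturation every free position carries a residue drawn uniformly from the 19 amino acids that differ from the ancestral one. Both assumptions are simplifications for illustration only, not a model taken from the cited work.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturate(ancestor: str, free_fraction: float, rng: random.Random) -> str:
    """Mutate every free-to-change site to a different residue (full saturation).

    Assumption: free sites are the first `free_fraction` of positions, so free
    sets are nested across taxa; substitutions are uniform over 19 alternatives.
    """
    n_free = int(len(ancestor) * free_fraction)
    seq = list(ancestor)
    for i in range(n_free):
        seq[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != ancestor[i]])
    return "".join(seq)


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(10_000))

# Hypothetical maximum free-to-change fractions from the thought experiment.
free_fractions = {"yeast": 0.50, "fish": 0.30, "human": 0.01}
seqs = {name: saturate(ancestor, f, rng) for name, f in free_fractions.items()}

distances = {
    (a, b): p_distance(seqs[a], seqs[b])
    for a, b in [("yeast", "fish"), ("yeast", "human"), ("fish", "human")]
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Under these assumptions the yeast-fish and yeast-human distances both come out near 0.48 to 0.50, and the fish-human distance near 0.30: the two more complex taxa are roughly equidistant to yeast because the distance is set by the yeast ceiling. The small shortfall below 0.50 for yeast-fish comes from sites where both taxa happened to acquire the same residue (probability 1/19 per shared free site under the uniform assumption).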

Testing the different evolutionary theories

There are two ways to test which interpretation of genetic equidistance is correct: the molecular clock or the MGD theory. The first relies on the overlap feature of mutations (76, 109). When a protein sequence from three taxa is aligned, the conserved positions show identical amino acids in all three taxa. The non-conserved positions fall into two types. In the first type, only one of the three taxa has mutated, while the other two retain an identical amino acid. The second type consists of positions where each taxon has a unique amino acid (red colored positions, Figure 3). This indicates that at least two taxa have undergone a unique mutation at that position, so their mutations overlap there. Such positions are referred to as overlap positions.

Because one mutation is enough to create a difference of one between two taxa, as measured by an identity matrix, additional mutations overlapping at the same position do not increase molecular distances and are therefore hallmarks of mutation saturation. At maximum saturation, every free-to-change position in a taxon would have mutated, resulting in a higher fraction of overlap positions among the non-conserved positions than expected by chance. This is termed the high overlap ratio (HOR) phenomenon.
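The column classification described above can be sketched as follows. The three short sequences are invented for illustration; the functions only count conserved, single-difference, and overlap columns and report the overlap fraction among non-conserved columns. Judging whether that fraction is higher than expected by chance requires a separate null model, which is not implemented here.

```python
def classify_columns(a: str, b: str, c: str) -> dict[str, int]:
    """Count conserved, single-difference, and overlap columns in a 3-taxon alignment."""
    if not len(a) == len(b) == len(c):
        raise ValueError("sequences must be aligned to equal length")
    counts = {"conserved": 0, "single": 0, "overlap": 0}
    for column in zip(a, b, c):
        distinct = len(set(column))
        if distinct == 1:
            counts["conserved"] += 1  # identical residue in all three taxa
        elif distinct == 2:
            counts["single"] += 1  # two taxa share a residue, one differs
        else:
            counts["overlap"] += 1  # each taxon has a unique residue
    return counts


def overlap_ratio(counts: dict[str, int]) -> float:
    """Fraction of non-conserved columns that are overlap positions."""
    non_conserved = counts["single"] + counts["overlap"]
    return counts["overlap"] / non_conserved if non_conserved else 0.0


counts = classify_columns("MKTAYIAKQR", "MKTGYIAKQL", "MRTAYVAKQW")
print(counts, overlap_ratio(counts))
```

In this toy alignment, 6 columns are conserved, 3 differ in a single taxon, and 1 (the last) is an overlap position, giving an overlap ratio of 0.25.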

In fact, Margoliash's GEP shows a high fraction of overlap positions, which aligns with the prediction of the MGD theory and significantly exceeds what is predicted by the molecular clock and NT (Figure 3) (11, 76).

It is important to note that the HOR type of saturation is different from that seen in ‘long branch attraction (LBA)’ in phylogenetic trees (110). In LBA, saturation means convergent mutations leading to the same amino acid residue or nucleotide across multiple taxa, leading to a shortening of branch length. Although these shared alleles were derived independently, they can be misinterpreted in phylogenetic analyses as being shared due to common ancestry. For the HOR phenomenon, however, independent mutations at the same site in different taxa would generally give the taxa different amino acids rather than the same one, since the probability of an independent mutation changing to the same amino acid is about 20 times lower than that of mutating to a different amino acid. Thus, HOR is expected to be more common in nature than the saturation seen in LBA. These two types of saturation are essentially different aspects of the same phenomenon: one is more commonplace and manifests as HOR, while the other is less common and manifests as LBA.
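The ‘about 20 times’ figure can be checked with simple arithmetic under an assumption the text does not state explicitly: that an independent substitution from a given ancestral residue is equally likely to produce any of the other 19 amino acids. Real substitution probabilities are not uniform, so this is only a rough check.

```python
from fractions import Fraction

# Assumption: uniform substitution to any of the 19 non-ancestral amino acids.
ALTERNATIVES = 19

# Two taxa independently mutate the same site from the same ancestral residue.
p_same = Fraction(1, ALTERNATIVES)  # same new residue: convergence, as in LBA
p_different = 1 - p_same  # different new residues: an overlap position, as in HOR
ratio = p_different / p_same

print(p_same, p_different, ratio)  # 1/19 18/19 18
```

Under uniform substitution, ending with different residues is 18 times more likely than ending with the same one, in line with the text's ‘about 20 times’.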

The second way to test the molecular clock against the MGD theory relies on genetic non-equidistance despite equidistance in time. The MGD theory predicts that maximum equidistance results only when the outgroup is less complex than the sister species. If the outgroup is more complex, its maximum distance to each ingroup sister species is determined by the MGD of that ingroup species, which may not be the same for all the ingroup species.

However, the molecular clock predicts genetic equidistance to the outgroup regardless of whether the outgroup taxon is more or less complex. For example, humans are the outgroup to the Sauropsida clade containing snakes and birds. The molecular clock predicts that humans should be equidistant to snakes and birds in protein sequence. However, the MGD theory predicts that birds should be closer to humans than snakes are, because birds should be more complex than snakes.

The actual data validated the MGD theory (5, 111, 112). Genetic non-equidistance to humans despite equidistance in time has also been found for sister species within the teleost fish clade, the arthropod phylum, the Porifera phylum, and the fungi kingdom. In all five cases where the difference in complexity of the ingroup sister species can be intuitively inferred (octopus vs. cockle, terebratulina vs. lingula, bird vs. snake, dragonfly vs. louse, and smut vs. yeast), the more complex species always shows greater sequence similarity to humans in fast-evolving genes, fully conforming to the predictions of the MGD theory but not to those of the molecular clock (111). It has recently been shown that octopus indeed has the lowest heterozygosity level among mollusks, and that the highest heterozygosity level among mollusks is found in the least complex solenogastres (113). Also, whole genome sequencing analysis has found two New World monkeys to be non-equidistant in nucleotide sequence to humans, with the marmoset, the more primitive of the two, being more distant from humans than the owl monkey (114).

For species within a clade that are of a similar degree of complexity, it is also possible to observe genetic non-equidistance to a more complex outgroup. The sequence of a more complex taxon, and any sequence closely related to it, should be well tolerated by the lower clade (but not vice versa) and so may exist as part of the normal variation of the lower clade. Therefore, within a lower clade, a certain taxon may by chance have a genome more similar to the higher outgroup than another taxon does.

There are also many other independent tests that have confirmed the MGD theory:

The latest evidence for Axiom 1 includes the following. Lower species have more common variants that are, however, harmful or non-tolerated in humans (115, 116). The highest MGD level is found for the Human Immunodeficiency Virus (HIV) at 40% bp/bp (117), while the lowest is found in humans at 0.1% bp/bp, which is half that of chimpanzees (118), one-third of that in macaques (129), and one-eighth of that in lemurs (120). The malaria parasite endemic to Asian macaque monkeys, Plasmodium knowlesi, exhibits higher genetic diversity and, as demonstrated by supersaturation mutagenesis experiments, is more tolerant of mutations than Plasmodium falciparum, which is endemic in humans (121). The so-called ‘Random Genome Project’ verifies this axiom: mammalian cells repress random DNA that yeast transcribes, confirming the expectation that complex organisms are less tolerant of random errors or noise than simple ones (122).

The overall level of individual genetic diversity is inversely related to educational attainment or cognitive function (123,124,125,126,127), with most genetic variations being harmful to the human brain (128). Consistent with this, the more socially tolerant crab-eating macaque has lower genetic diversity than the less tolerant rhesus macaque (129, 130), and the more tolerant bonobo has lower genetic diversity than the less tolerant chimpanzee (131, 132). The high genetic diversity observed in the San people of Southern Africa may be a factor contributing to their relatively underdeveloped culture, while also providing them with stronger immunity (133,134,135,136), and may have little to do with longer evolutionary time or being human ancestors. Animal model studies have also found associations between genetic diversity and various complex traits, including cognitive function and immune function (47).

The genetic diversity level of patient groups with many complex diseases is higher than that of normal control groups, indicating that the genetic diversity level of normal individuals is at an upper limit state (48, 137, 138). Many human traits are subject to stabilizing selection, implying there is an optimal degree for a quantitative trait (139).

Around 40% of human single nucleotide polymorphisms (SNPs) in the Single Nucleotide Polymorphism Database (dbSNP) carry recurrent mutations or more than two alleles, directly invalidating the infinite site assumption of the NT framework (140).
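Under the infinite site assumption each site mutates at most once, so a site can carry at most two alleles. The sketch below shows the simplest check, flagging aligned columns with three or more alleles; the haplotypes are invented, and treating ‘-’ and ‘N’ as missing data is an assumption. It detects only one kind of violation: a recurrent mutation that reproduces an existing allele, or a back mutation, still leaves a biallelic site, so the count is a lower bound on recurrent mutation.

```python
def multiallelic_sites(alignment: list[str], missing: str = "-N") -> list[int]:
    """Return column indices that carry more than two distinct alleles.

    Characters in `missing` (assumed here to be gap '-' and unknown 'N') are ignored.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    flagged = []
    for i, column in enumerate(zip(*alignment)):
        alleles = set(column) - set(missing)
        if len(alleles) > 2:  # requires at least two mutations at this site
            flagged.append(i)
    return flagged


haplotypes = ["ACGT", "ACGA", "TCGC", "ACGT"]  # invented example
print(multiallelic_sites(haplotypes))  # [3]
```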

Functional genomics studies have found that most of the genome is functional (141,142,143). For example, almost all short tandem repeats (STRs), which account for 5% of the human genome, have regulatory functions in gene expression (141). Also, large fragments of random foreign DNA differing in guanine-cytosine (GC) content, when introduced into yeast, either mingle with other active yeast chromosomes or segregate into an inactive compartment, with the outcome predicted by the underlying DNA sequence. If most of the DNA were neutral, this would be unlikely to occur (144). During the aging process, mutations correlate with alterations in DNA methylation, directly supporting the inverse relationship between genetic diversity/mutation and epigenetic complexity/order posited by the MGD theory (145).

Quantifying constraints in the human mitochondrial genome has shown that most variants in mitochondrial DNA (mtDNA), including those in the non-coding region, are subject to selection (146). This challenges numerous phylogenetic trees constructed using mtDNA genes that assume the molecular clock and the NT.

Bacterial robustness to mutations is not affected by evolutionary generations, which does not align with popular theoretical expectations. The robustness of a species is primarily determined by its blueprint: the simpler the construction of a species, the stronger its robustness and the greater its tolerance to mutations (147).

There exist conserved sequences that are related to complexity but not to enzyme function per se, and the length of such sequences increases with complexity. Such sequences, termed complexity-associated protein sectors, have been directly shown to exist (148).

The genome as a whole is also a building part, and like any part it should have an allowable standard deviation in size. Thus, complex species tend to show less variation in genome size than lower species (39). Among protozoa, a group of unicellular organisms, one species' genome is approximately 20,000 times larger than that of another, whereas in mammals the variation is less than 10-fold. The truly significant question regarding genome size should be why protozoa have such a vast variation in genome size, rather than why the onion has such a large genome. Notably, Africans, who are well known to have greater genetic diversity, also consistently have higher copy numbers of duplicated gene families than non-Africans (149), implying slightly greater genome size variation. A bigger genome has been found to provide environment-dependent growth benefits in grasses (150).

A number of mechanisms exist that suppress the transmission of new mutations to the next generation prior to birth, including the abortion of mutant embryos, a DNA repair system in germline reproductive cells that is more efficient than that in somatic cells (151), and the higher capacity of older mothers to prevent mutant mtDNA from being transmitted to their daughters (152). This is physiological selection, as opposed to natural selection (of adult individuals by environments). The important question implied here is: how can the Darwinian mechanism, which needs as many mutant adult individuals as possible for natural selection to work, lead to mechanisms that reduce the number of such individuals and thereby undermine itself? A related question is: if genetic diversity provides the raw material for Darwinian natural selection, and lower genetic diversity is associated with each advance in species complexity, how can natural selection undermine itself during evolution?

According to the prevailing view, trait diversity is shaped by environmental diversity. However, this view overlooks the MGD theory's proposal that genetic diversity, and by extension trait diversity, is primarily influenced by species complexity. Recent findings have shown that habitats within Africa seem less diverse or complex than those outside it (153). Eurasia, where humans have lived for ~2 million years, has much more diverse environments than Africa. Humans living in Eurasia have therefore experienced a broader, and hence more complex, range of environments than those who stayed in Africa, and than non-human primates such as the African apes that have lived in Africa. According to popular views, or to the finding that reduced environmental diversity leads to decreased genetic diversity (154), Eurasians should therefore have higher genetic diversity than bonobos or even Africans. The reality is the exact opposite: the low genetic diversity of Eurasians compared to bonobos or Africans contradicts the expected correlation between environmental complexity and genetic diversity. In addition, a recent study shows that drylands act as a global reservoir of plant phenotypic diversity, challenging the prevailing view that harsh environmental conditions reduce plant trait diversity (155). These studies are in line with the MGD theory that trait diversity or genetic diversity is mostly determined by species complexity.

What does the universal molecular clock really mean?

The molecular clock hypothesis contains both valid and flawed elements. During the linear phase, where genetic distances are driven by truly neutral mutations, the molecular clock holds and can explain the phenomenon of linear genetic equidistance. However, this concept clearly cannot be applied to the phenomenon of maximum genetic distance.

The universal molecular clock was (mis)inspired by the phenomenon of maximum genetic distance and actually reflects the constant rate at which complexity increases. The first molecular evidence for the constant, albeit discontinuous, advances in complexity is, in fact, the phenomenon of maximum genetic equidistance. Although this progression towards higher complexity is non-linear, it generally appears roughly linear over long periods, leading to the superficial appearance of a universal molecular clock. Defying the Darwinian gradualistic worldview, there have always been researchers, since the time of Darwin, who have appreciated the discontinuous nature of macro-evolutionary changes (14, 80, 106, 156,157,158,159). Underlying the direction towards higher complexity and the constant rate of increase in complexity may be the mysterious ‘complexifying force’ first proposed by Lamarck (1).

Dissenting perspectives on the MGD theory

Since the publication of the MGD theory in 2008, there has been a lack of formal dissenting viewpoints from experts in the field, with the exception of a recent preprint by Zhang (160), which is notably dismissive. We have written a detailed rebuttal of Zhang's paper (161). In brief, Zhang attempts to deny the validity of the GEP while defending the molecular clock and NT, a stance that is contradictory, as discrediting the GEP would undermine the molecular clock itself. Furthermore, he does not challenge the upper limit concept of the MGD theory. As a result, his paper is self-defeating and lacks the rigorous scholarship necessary for a meaningful contribution to the scientific discourse.

Zhang also raised a common criticism of the MGD theory: that the saturation issue it raises is not new. However, such critics fail to recognize the distinction between partial saturation and the upper limit concept of the MGD theory. The field has primarily focused on partial saturation, where genetic distance between species still increases with time, albeit at a non-linear rate compared to the early stages of species divergence. The methods used in the field to address partial saturation cannot be applied to full saturation at the upper limit level. This upper limit could have been reached a long time ago, making it impossible to model the number of mutations that have occurred in the past.
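One standard example of a partial-saturation correction, not named in the text, is the Jukes-Cantor (1969) distance for nucleotides, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The sketch below illustrates the distinction drawn above: the correction works while p still rises with time, but it is undefined at p = 0.75, and near that limit tiny changes in p translate into very large changes in the estimated number of substitutions.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from the observed p-distance.

    Defined only for 0 <= p < 0.75; p = 0.75 is the expected difference between
    unrelated random nucleotide sequences, i.e. full saturation.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.1, 0.5, 0.7, 0.74, 0.749):
    print(p, round(jukes_cantor_distance(p), 3))
```

The estimate climbs from about 0.107 at p = 0.1 to roughly 3.2 at p = 0.74 and about 5 at p = 0.749. If observed distances have stopped increasing, whether at 0.75 or at a lower ceiling as the MGD theory argues, the corrected value no longer tracks elapsed time.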

Conclusion

The GEP is the most astonishing finding in molecular evolution, so remarkable that it was misunderstood for many decades and directly inspired two distinct theoretical frameworks of molecular evolution (Figure 4). To fully grasp evolution, one must first understand molecular evolution, and to understand molecular evolution, the GEP is essential. This phenomenon was initially interpreted by proposing the molecular clock hypothesis and in turn the NT, but has since been re-interpreted by the MGD theory (Table 2). Direct tests of the predictions from these two competing theoretical frameworks have raised doubts about the validity of the molecular clock and NT, while providing support for the MGD theory. Currently, genetic distances or diversities appear to be largely at an optimal equilibrium. Although turnover of alleles due to mutations, natural selection, and neutral drift will continue indefinitely, such standard evolutionary processes will not affect the MGD level unless there is an increase in species complexity or a dramatic change in environmental selection. This evolving understanding may offer fresh insights into various evolutionary questions such as the origin of modern humans (162, 163).

Future research may further support the MGD theory in several ways:

Large-scale functional genomics studies may show that almost all bases have functions.

Studies on more populations or species may further demonstrate that genetic diversity is at its maximum limit. For example, among individuals of the same ethnicity, one could compare the genetic diversity of children with that of their parents. Do parents with higher-than-average genetic diversity tend to produce children with lower genetic diversity, as expected by the MGD theory? Do parents with lower-than-average genetic diversity tend to produce children with higher genetic diversity?

Research on more species and more genes may further establish the GEP.

Confirming phylogenetic trees built on the MGD theory would also validate the theory itself.

Better understanding of cognitive abilities may further establish the inverse relationship between MGD and cognitive functions.

More studies may lead to a better understanding of the collective effect of numerous genetic variants in determining complex traits.

The understanding of fundamental evolutionary theories will inevitably evolve into a more refined and comprehensive framework. This will greatly enhance humanity's ability to understand our shared destiny, both in the present and future, and contribute to the creation of a global community with a shared future.

Language: English

Publication frequency: 1 issue per year
Journal subject areas: Chemistry, Biochemistry, Biology, Evolutionary Biology, Philosophy, History of Philosophy, other, Physics, Astronomy and Astrophysics



Shi Huang

Article category: Review

Published online: 24 Apr 2025

Page range: 41-61

DOI: https://doi.org/10.2478/biocosmos-2025-0009

Keywords: genetic diversity, genetic equidistance phenomenon (GEP), molecular clock, neutral theory, maximum genetic diversity (MGD) theory

© 2025 Shi Huang, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
