Evolution and Diversity of Glycomolecules from Unicellular Organisms to Humans
Article category: Review
Published online: 11 Apr 2024
Page range: 1-35
DOI: https://doi.org/10.2478/biocosmos-2024-0001
© 2024 Richard D. Cummings, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All cells of all organisms express a dazzling variety of glycomolecules on their membranes and in their secretions (1, 2). In addition, the monosaccharides ribose and deoxyribose make up about one-third of ribonucleic acid (RNA) and deoxyribonucleic acid (DNA), respectively. Glycomolecules other than nucleic acids include glycoproteins, glycolipids, oligosaccharides, polysaccharides, glycosylphosphatidylinositol (GPI)-anchored glycoproteins, and the recently discovered glycosylated RNA (glycoRNA) (3, 4). These classifications are somewhat misleading, as each class contains many subclasses and the diversity within them is extraordinary. The term

Depicted here are a few examples of the diversity of glycans found in glycomolecules of different organisms: mycobacteria, yeast, plants, nematodes, and humans. More details about these structures and discussions of their biosynthesis and function can be found in references (5,6,7,8,9,10,11,12). For more information about the symbols used above to designate monosaccharides see
This article highlights several concepts regarding the diversity and evolution of glycomolecules, but the complete scope of the subject would require a much larger treatise. References are provided to help guide readers to additional points of interest and further details regarding the subject, and are not meant to be comprehensive.
While the diversity of nucleic acids and proteins is vast, they arise from a relatively small number of building blocks. Moreover, the macromolecules themselves are made in a linear form from a template, either DNA or RNA, that directs their specific sequences. Glycans, however, are created without templates, and collectively in nature they are composed of hundreds, and perhaps even thousands, of different monosaccharides (13, 14). A single organism may have a set of monosaccharides and derivatives that differs greatly from that of another organism. Thus, because the building blocks can vary and no template guides the synthesis, there appears to be no end to the complexity of glycans generated in nature. In contrast to nucleic acids and proteins, glycans may have multiple branches, and each branch can convey unique determinants. Using just six different monosaccharides within a hypothetical hexasaccharide, there are 1.05 × 10¹² possible isomeric oligosaccharides, both branched and linear, including their α and β anomers (15). By contrast, even using all 20 amino acids, there are only 20⁶ = 6.4 × 10⁷ possible linear hexapeptides.
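The hexapeptide figure is simply 20 choices at each of six positions. To show why glycans outpace peptides even before branching is considered, the minimal sketch below also counts linear hexasaccharides under a deliberately simplified toy model, which is an assumption made here and not the model used in ref. (15): six different hexopyranoses, each used once, where every glycosidic bond chooses an anomer (α or β) and one of four acceptor hydroxyls (C2, C3, C4, or C6). Branching, ring size, and the reducing-end anomer are ignored, so the toy count is far below the 10¹² figure cited above.

```python
from math import factorial

# Linear hexapeptides from the 20 standard amino acids, repeats allowed: 20^6.
hexapeptides = 20 ** 6

# Toy model (an assumption for illustration, not the model of ref. 15):
# six different hexopyranoses, each used once, in an unbranched chain.
ANOMERS = 2             # alpha or beta at each glycosidic bond
ACCEPTOR_POSITIONS = 4  # acceptor hydroxyl at C2, C3, C4, or C6
BONDS = 5               # six residues in a chain are joined by five bonds

orderings = factorial(6)                               # 720 residue orders
linkage_choices = (ANOMERS * ACCEPTOR_POSITIONS) ** BONDS  # 8^5 bond variants
linear_hexasaccharides = orderings * linkage_choices

# Same constraint for peptides (six different residues, each used once):
# only the order can vary, because every peptide bond is the same.
hexapeptides_six_distinct = factorial(6)

print(f"{hexapeptides:.2e}")        # 6.40e+07
print(linear_hexasaccharides)       # 23592960
print(hexapeptides_six_distinct)    # 720
```

With identical building-block constraints, the linkage choices alone multiply the number of linear products by 8⁵ = 32,768; counting branched isomers is what pushes the published estimate much higher.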
The term
The sizes of glycomolecules in nature vary enormously; they may be tiny or massive, regardless of the organism making them. Amazingly, even the tiniest infectious and transmissible pathogen, the prion, is a single glycoprotein with multiple glycans enzymatically attached by infected cells; importantly, its glycosylation is essential for its neurotoxicity (18). Finally, transfer RNA (tRNA) (19) and non-coding RNAs can be glycosylated (3), and even DNA itself can be glycosylated, as observed for the glucosyl-5-hydroxymethylcytosines in phage DNA, which are added by phage-encoded α- and β-glucosyltransferases (20).
In terms of size, consider the common linear polysaccharide hyaluronic acid (HA), which is synthesized by both microbes and animals. In animals, HA is just one of the many glycosaminoglycans (GAGs), e.g., chondroitin sulfate (CS) and heparan sulfate (HS), that arose early in evolution (21,22,23,24). Microbial and human HA are identical in structure, although their lengths may differ; a single HA molecule can have ~25,000 repeat units of the disaccharide
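The ~25,000-repeat figure can be turned into a rough molecular mass. The sketch below assumes the free-acid formula C14H21NO11 for one GlcA-GlcNAc disaccharide repeat and standard atomic masses; both are assumptions added here for illustration (HA in tissues is usually a salt such as sodium hyaluronate, whose repeat is about 401 g/mol).

```python
# Standard atomic masses (g/mol) and the free-acid formula of one HA
# disaccharide repeat, GlcA-GlcNAc = C14H21NO11 (assumed for this estimate).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
REPEAT_FORMULA = {"C": 14, "H": 21, "N": 1, "O": 11}

repeat_mass = sum(ATOMIC_MASS[el] * n for el, n in REPEAT_FORMULA.items())
repeats = 25_000  # approximate number of repeats quoted in the text

chain_mass_da = repeat_mass * repeats  # 1 g/mol corresponds to 1 Da per molecule
print(round(repeat_mass, 1))              # 379.3
print(f"{chain_mass_da / 1e6:.1f} MDa")   # 9.5 MDa
```

On these assumptions, one such chain is a single molecule of roughly 9.5 million daltons.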
Microbes can also generate enormous networks of covalently-linked glycomolecules. For example, the cross-linked network of peptidoglycan with repeats of
Thus, glycan diversity is a feature of all organisms. For any particular organism the assortment of glycans is referred to as the
From these general considerations a simple question arises: what has driven the expansion and diversity of the building blocks (monosaccharides) and the resulting glycomolecules in nature? One likely explanation is that glycan diversity is driven by the same forces, e.g., mutation and natural selection, that drive the evolution of other macromolecules, such as proteins (41). There have been many excellent reviews of aspects of glycomolecular evolution, often focusing on specific classes of glycomolecules, certain types of glycan modifications, the evolution of genes for enzymes, eukaryotic evolution, and other issues [some of them are cited here (6, 42,43,44,45,46,47)]. There are too many genetic pathways for glycomolecule generation to consider them all here. Thus, rather than covering all the interesting genetic aspects of glyco-evolution, this review takes a complementary and slightly different approach, focusing more on the nature and diversity of glycomolecules in all forms of life and on the evolutionary challenges and opportunities associated with their generation.
The concept of evolution driving the diversity of glycomolecules implies that there are pressures on organisms to evolve their glycomes in ways that enhance their chances of survival. The genome itself presents an interesting conundrum in this regard, which may have promoted the evolution of glycans very early. Consider the limitations inherent in the genetic code. The DNA genetic code consists of codons of three nitrogenous bases, each drawn from the four bases adenine (A), cytosine (C), guanine (G), and thymine (T). Although there are 4³ = 64 such triplets, once transcribed into RNA, with its four bases adenine, cytosine, guanine, and uracil, they specify only 20 amino acids (plus selenocysteine and pyrrolysine), which universally make up all template-derived, complex proteins. Considering all the vast roles that proteins play in biology, this relatively small diversity in amino acid side chains limits the information content available to proteins.
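The codon arithmetic behind this limitation can be checked directly. The sketch below rebuilds the standard genetic code (NCBI translation table 1, written in the DNA alphabet with T in place of U) and counts how the 64 triplets collapse onto 20 amino acids plus three stop signals. Selenocysteine and pyrrolysine are not in this table; they are inserted by context-dependent recoding of stop codons.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); the string lists the amino
# acid for codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

usage = Counter(codon_table.values())            # codons per amino acid
n_codons = len(codon_table)
n_amino_acids = len([aa for aa in usage if aa != "*"])

print(n_codons, n_amino_acids, usage["*"])   # 64 20 3
print(usage["L"], usage["M"])                # 6 1
```

The redundancy is uneven: leucine has six codons while methionine has one, so most of the 64-codon capacity goes into degeneracy rather than into additional side-chain chemistry.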
One way of envisioning this is to imagine a polypeptide as a rope, linear yet very flexible, but restricted in the diversity of shapes and spatial organization it can assume. However, now imagine a rope carrying a virtually unlimited number of attached modifications that represent glycosylation, each with a potentially unique structure and composition at various
Because of the differences between polypeptides and glycans, they evolved very different sets of functions. For example, some of the fundamental roles of proteins are to bind other molecules and to function in catalysis. Polypeptides are readily adaptable for such a function, both conformationally and structurally, to promote
The modification of a protein with another molecule is termed a post-translational modification (PTM), although some such modifications can occur co-translationally (52). As the diversity and availability of monosaccharides in nature evolved, PTM-based mechanisms evolved to overcome limitations inherent in the structures and bioactivities of polypeptides (and lipids). To fully appreciate protein glycosylation as a PTM, first consider protein phosphorylation as a simple PTM. It is among the simplest and most common PTMs of animal proteins, but it typically occurs inside the cell. It is a small, reversible modification, added and removed by enzymes (kinases and phosphatases), and it regulates protein interactions. Other PTMs of proteins include acetylation, methylation, hydroxylation, and ubiquitination, among >500 known PTMs (52, 53). But all such modifications are either small or of fixed size, e.g., ubiquitin or SUMO proteins, and are typically reversible. In contrast, glycosylation of secreted and membrane glycoproteins is an essentially irreversible modification whose size can vary enormously.
Of course, there are also processes of intracellular glycosylation, and unlike most PTMs involving glycosylation, some of these are reversible. An essential type of intracellular protein glycosylation in most metazoans (multicellular animals) and some pathogenic fungi (54), which can occur on cytoplasmic, nuclear, and mitochondrial proteins, is the addition of the monosaccharide GlcNAc in linkage to Ser/Thr residues (
Intracellular glycosylation of proteins is actually an ancient theme of glyco-evolution. The most famous and universal type of intracellular glycosylation in prokaryotes and animals involves the production of glycogen (63, 64), and in plants and algae the production of starch (65). In animal cells glycogen may originate from the protein glycogenin, which has inherent glucosyltransferase activity and auto-glucosylates a single tyrosine residue of itself using UDP-Glc to generate
As mentioned above, the addition of one or more monosaccharides to an amino acid allows the modification to range in size from a single monosaccharide up to many hundreds of residues, far beyond any other type of PTM in size and diversity. This evolutionary breakthrough allows even small amino acids, such as serine or threonine, to become giants, as seen for the

Modifications of proteins with carbohydrates can increase both the complexity and size, as illustrated here for simple sugar modifications of the amino acid serine, as often seen in
While glycomolecules are ubiquitous in nature, until a few years ago it was unclear whether organisms in all domains of life,
A common biosynthetic concept for the elaboration of glycoproteins shared by bacteria and all other organisms regards the synthesis of lipid-linked glycans, broadly classified as glycolipids; they serve both as mature products made by cells and as precursors for generating other glycomolecules. A common type of protein glycosylation requires a lipid-linked sugar precursor. Such a
Cytoplasmic glycosylation of proteins also occurs in eukaryotes, as for the addition of
Monosaccharide evolution likely coincided with the formation of ribose and deoxyribose as these were incorporated into macromolecules essential for reproduction. Ribose is easily produced chemically by simple formose reactions, which involve formaldehyde and water and proceed through glycolaldehyde and the formation of aldoses and ketoses (107, 108) (Figure 3). Glyoxylate and C6 aldonates, which can be transformed into ribose, are also candidate evolutionary precursors of sugars (109, 110). Multiple lines of evidence indicate that sugars formed early, in the prebiotic world (109, 111,112,113). The Murchison meteorite contained both aldonic acids and ribose (114, 115). Glycolaldehyde, the smallest possible molecule with both an aldehyde and an –OH group, has been detected in interstellar space and could be a source of pre-RNA intermediates (116). It is also interesting that sugar or alcohol alternatives to ribose and deoxyribose were probably available for inclusion in nucleic acids, e.g., threose nucleic acid or glycerol nucleic acid. However, neither their structural features nor those of any other monosaccharide competed effectively with ribose and deoxyribose for the formation of RNA and DNA, respectively (107, 117,118,119). Deoxyribose within DNA might provide greater flexibility and higher stability under alkaline conditions compared with RNA, among other differences (120). Early in evolution, it is likely that mechanisms for the synthesis of the nitrogenous bases, select amino acids, lipids, and monosaccharides were established and that conditions were such that these building blocks were at least semi-stable.

Examples of formose reactions leading to the generation of
The generation of additional monosaccharides,
In terms of carbohydrate stereochemistry, the dominance of the stereoisomeric
Furthermore, a monosaccharide such as
Monosaccharides linked together in glycomolecules can generate linear or branched structures; there may be different stereoisomers, and they may occur in different anomeric linkages, either α or β, thus creating a jigsaw puzzle of structural variety (Figure 1). Consider that within starch and glycogen
The ubiquity of sucrose and its unusual structure raise some interesting questions about its origin and dominance, as many other potential disaccharides could supply a similar amount of energy. Sucrose is a
Thus, while the glyco components of glycomolecules may vary in linkage, anomericity, monosaccharide units, and branching, they may also be made even more complex by non-sugar substituents, e.g., through methylation, acetylation, phosphorylation, sulfation, phosphorylcholination, etc. Each glycan linkage and modification is catalyzed by discrete enzymes, but even a relatively few enzymes, acting combinatorially yet in an orchestrated fashion, can create an amazing variety of novel molecular structures. The structures shown in Figure 1 illustrate the terrific diversity possible by linking monosaccharides to each other in such ways and further modifying the sugars. In that sense, if glycosylation is thought of as a PTM, the further modification of glycans themselves may be a post-PTM. Thus, glycomolecules present an imposing ensemble of structural diversity across all forms of life. Glycosylation of amino acids within proteins and the generation of oligo- and polysaccharides are ancient processes, and all organisms have multiple glycosylation pathways (46).
The issue of sucrose raises another evolutionary consideration that is key to the development of glycomolecules: photosynthesis and the harvesting of energy to generate sugars (140, 141). This singular outcome, which is thought to have arisen in microorganisms, combines carbon dioxide and the 5-carbon sugar ribulose-1,5-bisphosphate to generate two molecules of
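The carbon bookkeeping of this carboxylation, and of the net equation of oxygenic photosynthesis, can be checked with a short atom-balance script. This is a sketch under stated assumptions: the RuBisCO step is written in its conventional form, ribulose-1,5-bisphosphate + CO2 + H2O → 2 3-phosphoglycerate, using neutral, fully protonated formulas (C5H12O11P2 and C3H7O7P), and the helper functions `atoms` and `side` are hypothetical names introduced here.

```python
import re
from collections import Counter


def atoms(formula: str) -> Counter:
    """Count atoms in a simple neutral formula such as 'C6H12O6' (no brackets)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side(*species: tuple[int, str]) -> Counter:
    """Total atoms on one side of an equation from (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in species:
        for element, n in atoms(formula).items():
            total[element] += coeff * n
    return total


# RuBisCO carboxylation (neutral formulas assumed):
# C5 sugar bisphosphate + CO2 + H2O -> two C3 molecules (3-phosphoglycerate)
carboxylation_ok = side((1, "C5H12O11P2"), (1, "CO2"), (1, "H2O")) == side((2, "C3H7O7P"))

# Net equation of oxygenic photosynthesis: 6 CO2 + 6 H2O -> C6H12O6 + 6 O2
net_ok = side((6, "CO2"), (6, "H2O")) == side((1, "C6H12O6"), (6, "O2"))

print(carboxylation_ok, net_ok)  # True True
```

The carbon count makes the logic of the cycle visible: one C5 acceptor plus one fixed CO2 gives 6 carbons, split into two C3 products, so six turns are needed to fix the carbons of one hexose.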
Photosynthesis evolved 3.2–3.5 billion years ago soon after life itself came into existence (143). It is utilized by cyanobacteria (blue-green algae), green algae, and plants as the most efficient way to trap solar energy and provide energy for the cell. In eukaryotes photosynthesis probably arose through endosymbiosis of a cyanobacterial-like organism (144). Amazingly, the smallest photosynthetic organism known,
Given their incredible structural diversity, it stands to reason that glycans have a tremendous number of functions indispensable for life (50), some of which have been discussed above. Some of the broad categories are the following. (i) Glycomolecules are structural components essential for the survival of a cell or organism. Examples of such functions are seen for RNA and DNA; polysaccharides, such as cellulose and chitin; and extracellular matrix (ECM) components, such as the polysaccharide HA and the glycoprotein collagen. Microbes that have peptidoglycans in their cell walls require them for osmotic integrity, for environmental responses, and in microbial ecology. Glycolipids overall are essential in biological membranes for membrane integrity, membrane functions, and biological topology. Similarly, GPI-anchored glycoproteins, expressed in eukaryotes and likely in Archaea but not in bacteria, are critical for membrane organization and survival (146). (ii) Glycans in glycoproteins are commonly found to be essential to one or more features, such as protein structure, oligomerization, trafficking, turnover, and resistance of proteins to proteases, free radical attack, etc., in what might be called
Glycolipids are important structural components of cellular membranes (156), but they are also important for sugar metabolism and glycomolecule biosynthesis. All organisms have within their membranes one or more types of glycolipids, which include those derived from fatty acids, glycerol, sphingolipids, phospholipids, polyisoprenols, and steroids, all of which can be glycosylated, e.g., glucosylated cholesterol (157). In evolution, no specific glycolipid is conserved in all organisms, but each organism has essential glycolipids, and their classes are often unique to certain organisms. For example, bacteria have lipopolysaccharides (LPS), and animals have glycosphingolipids, such as gangliosides (158); plants and algae lack these, but have galactolipids such as mono- and digalactosyldiacylglycerol (MGDG, DGDG) and the sulfolipid sulfoquinovosyldiacylglycerol (SQDG), not found in other organisms (159). Amazingly, MGDG and DGDG are the major glycolipids in the thylakoid membranes of all photosynthetic organisms. All such glycolipids have important biological functions, too vast to consider in detail here. Glucosylceramide is one of the simplest glycolipids and is expressed by fungi, plants, and animals; it is essential for normal development in all such organisms, as shown in genetic studies in
The polyisoprenoids represent a unique class of lipids for glycomolecule generation. In animals they are represented by dolichol, a polyisoprenoid 95 to 125 carbons in length (C95–C125) (163, 165,166,167,168,169); the bacterial counterparts are the C55 undecaprenols, e.g., bactoprenol (169). These are present in all organisms and, when carrying monosaccharides or oligosaccharides, serve as lipid-linked precursors essential for biosynthesis and for sugar addition to other molecules. The polyisoprenoids are derived by condensation of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), the universal 5-carbon precursors of all isoprenoids, including steroids such as cholesterol (170).
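Because polyisoprenoids are assembled from C5 units derived from IPP and DMAPP, the carbon counts quoted above translate directly into numbers of isoprene units; this is where the name of the C55 bacterial carrier lipid comes from ("undeca" meaning eleven). A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
ISOPRENE_CARBONS = 5  # each isoprene unit (from IPP or DMAPP) contributes 5 carbons


def isoprene_units(carbons: int) -> int:
    """Number of C5 isoprene units in a linear polyisoprenoid of the given length."""
    if carbons % ISOPRENE_CARBONS:
        raise ValueError(f"C{carbons} is not a multiple of C5")
    return carbons // ISOPRENE_CARBONS


print(isoprene_units(55))                        # 11, hence "undeca"-prenol
print(isoprene_units(95), isoprene_units(125))   # 19 25 (dolichol range)
```

So the animal carrier, dolichol, is roughly twice as long as its bacterial counterpart, with 19 to 25 isoprene units versus 11.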
It is thought that a momentous step in evolution was the formation of membranes to encapsulate molecular activities associated with life (171, 172). But compartmentalization of metabolic activities also created the problem of topology (Figure 4). Many glycomolecules are present on the outer surface or membranes of cells, which has been termed the glycocalyx (173), but the sugar precursors to generate them arise mainly in the cytoplasm. Sugars are hydrophilic and averse to the hydrophobic membrane environment and do not diffuse through lipid membranes. The polyisoprenoids function as reusable precursors that allow the protein-controlled

Cells are bounded by membranes and glycomolecules are typically on the outer membrane of cells or secreted from the cells; however, the precursors to such glycomolecules are formed in the cytoplasm. One approach found in all organisms to overcome this topological barrier is to generate unique types of lipid-linked sugars. Within the cytoplasmic side of a membrane, cells can generate a lipid-linked sugar, and it can then be flipped to the opposite side (external or lumenal). In the two examples, taken from animal cells, but common to eukaryotes, glucosylceramide is synthesized on the cytoplasmic side of the Golgi apparatus using UDP-glucose as a donor, and the product is then flipped to the lumen of the Golgi by protein-assisted pathways. Similarly, for
Such biosynthetic recycling, however, does not occur in glycosphingolipid biosynthesis using ceramide; the common precursor glucosylceramide is generated from UDP-Glc and ceramide on the cytoplasmic side of the Golgi apparatus and is then flipped into the Golgi lumen (177,178,179), where it can be extended to generate many other glycolipids (Figure 4). Thus, the lipid is not immediately recycled, but of course it may be
Other pathways also evolved in eukaryotes to synthesize glycomolecules within compartments (organelles) and exploit the topological barrier. Specific nucleotide sugar transporters move nucleotide sugar precursors made in the cytoplasm into lumenal compartments, e.g., the ER and Golgi apparatus. [An exception to this rule in eukaryotes is the nucleotide sugar CMP-sialic acid, which is generated in the cell nucleus (182).] Within those compartments glycomolecule biosynthesis can proceed in a more regulated fashion, and the nucleotide by-products are transported back to the cytoplasm for recycling (183). The newly synthesized glycomolecules are either soluble or membrane-bound within the compartment, yet topologically outside the cytoplasm, and may reach the cell exterior through membrane fusion events during secretion.
A completely different pathway arose, however, for the three giant polysaccharides found in nature: cellulose, chitin, and HA. Interestingly, all of them contain either Glc or the Glc derivatives GlcA or GlcNAc. For HA synthesis, as shown in Figure 4, hyaluronic acid synthase (HAS) in microbes and animals is a transmembrane enzyme that extrudes HA into the extracellular space as it is synthesized; within its single polypeptide the enzyme has two catalytic activities that alternately use nucleotide sugar precursors to generate the repeating HA disaccharide GlcNAc-GlcA (22, 184). Chitin and cellulose biosynthesis also proceeds through such transmembrane channel enzymes at the cell membrane (184, 185). As for HA, these glycans have lengths that greatly exceed the dimensions of the cell synthesizing them, but their sizes are somewhat indeterminate. These processes contrast with those used for linear microbial LPS and capsular polysaccharide (CP) synthesis, which are assembled at the membrane and typically use lipid-linked sugar donors (186, 187); however, their exact lengths can also be indeterminate (Figure 4). It is interesting that polysaccharides are the only polymers in nature that can have thousands of repeats of a single building block, e.g., glucose in cellulose or starch. No such giant counterparts exist for proteins or nucleic acids composed of a single repeating building block, although some polypeptides, relatively short compared with the giant polysaccharides, can be generated from a single amino acid, e.g., epsilon-poly-
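The two features emphasized for HAS, a single enzyme alternating between two sugar-transfer activities and the absence of a template that fixes chain length, can be illustrated with a toy simulation. This is a hypothetical sketch: the function name, the per-repeat release probability, and the random termination rule are assumptions for illustration only, not a kinetic model of any real synthase.

```python
import random


def toy_has_chain(stop_probability: float, rng: random.Random,
                  max_residues: int = 100_000) -> list[str]:
    """Hypothetical toy of a dual-activity synthase.

    The two 'activities' alternate strictly (GlcNAc, then GlcA, then GlcNAc, ...),
    and after each completed disaccharide the chain is released with a fixed
    probability, so chain length is set by chance rather than by a template.
    """
    chain: list[str] = []
    next_sugar = "GlcNAc"
    while len(chain) < max_residues:
        chain.append(next_sugar)
        # hand off to the other catalytic activity for the next addition
        next_sugar = "GlcA" if next_sugar == "GlcNAc" else "GlcNAc"
        # release is only considered once a full GlcNAc-GlcA repeat is complete
        if next_sugar == "GlcNAc" and rng.random() < stop_probability:
            break
    return chain


rng = random.Random(0)
repeats_per_chain = [len(toy_has_chain(0.001, rng)) // 2 for _ in range(5)]
print(repeats_per_chain)  # disaccharide repeats in each of five simulated chains
```

Every simulated chain is a perfect alternating polymer, yet the chains differ in length; in this toy, sequence fidelity comes from the enzyme's alternation and length variability comes from the stochastic release step.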
The essential nature of protein modifications with sugars, as discussed above regarding microbial glycoproteins, also includes the common
Many viruses contain glycoproteins and other glycomolecules, such as capsid or envelope glycoproteins. Glycoproteins are present in small capsid viruses, such as the adenoviruses, and in large enveloped viruses, such as HIV-1 (196, 197). For many viruses, their glycomolecules are synthesized in the infected host using virus-encoded proteins and host enzymes. The genomes of most animal viruses lack genes or machinery to dictate the glycosylation of viral molecules; instead, these viruses depend completely on the host cells they infect to add the needed glycans to their viral glycoproteins (197, 198), in what may be classified as host-dependent viral glycosylation. Well-studied examples of such viruses that infect animals include HIV-1, influenza viruses, and, of course, coronaviruses, such as the notorious SARS-CoV-2. Enveloped viruses, as they bud from cellular membranes, may also acquire glycolipids and other host glycomolecules present in those membranes (199). The glycomolecules in viruses are often key to their interactions with host cells and to their interactions with, and evasion of, the human immune system (197, 200, 201). It is interesting to speculate that animal viruses, which can infect animals that have immune systems, were selected specifically to be unable to glycosylate their own proteins and instead to use the glycosylation systems of their host cells. Such acquired host glycans might therefore be less likely to provoke immune responses that block virus replication, acting as a
There are a few animal viruses whose genomes encode glycosyltransferase enzymes that can modify glycans. The
By contrast, the genomes of many viruses that infect organisms other than animals encode enzymes for producing their own varieties of glycomolecules. The genomes of bacteriophages, which are among the smallest viruses, encode a variety of glycosyltransferases and glyco-modifying enzymes active on their own molecules, including their DNA (212). Other viruses also encode glycogenes, e.g., glycosyltransferases, capable of modifying their own proteins and host proteins. Genomic sequencing of giant viruses revealed extensive numbers of such glycogenes (212,213,214,215). Giant viruses are ubiquitous in fresh water and infect freshwater eukaryotic algae; they are known as nucleocytoplasmic large DNA viruses (NCLDVs) (216). The chlorovirus
Many studies have documented the tremendous differences in glycomolecules between different unicellular organisms, and certainly between bacteria and archaea (9, 98, 221,222,223). Bacteria lack a nucleus and are bounded by a single membrane (plus an outer membrane in Gram-negative bacteria) in which glycolipids are embedded, e.g., LPS in Gram-negative and lipoteichoic acids (LTAs) in Gram-positive bacteria. As noted above, glycoproteins are also produced by both Gram-negative and Gram-positive bacteria. In regard to polysaccharides, bacteria are amazing in their ability to generate diversity from a common theme. LPS, which is essential for bacterial viability, is densely expressed on the surface and carries a linear polysaccharide; depending on the organism, this polysaccharide may have repeating units ranging from di- to nonasaccharides (two to nine sugars), and there may be several million copies of LPS on the surface of a single bacterium (224, 225). There may also be non-stoichiometric modifications of the LPS, which are partly regulated by small non-coding RNAs (sRNAs) (226). But the genes encoding the enzymes responsible for such linear LPS molecules, for example, are remarkably few, and each enzyme independently adds a sugar from a precursor intermediate. Thus, each microbial strain may synthesize a dominant but unique LPS or CP of varying length, leading to their recognition as serotypes, based on antibodies that can distinguish the antigenic glycan components (227). The strain differences, as for
Archaea lack both LPS and LTA, common components of Gram negative and Gram positive bacteria, respectively (223, 232). Archaeal glycans are much more complicated in any individual organism compared to bacteria; archaea also lack conventional murein or peptidoglycan, but instead make a tremendous variety of glycomolecules related to peptidoglycan in their cell wall. Archaea can also synthesize glycoproteins, and a major feature of the archaeal cell surface is the S-layer (233). They also generate glycoproteins containing
There are many pathways of glycomolecule biosynthesis that plants share with all eukaryotes, but some are specific to plants and even to individual plant species (234, 235). For example, the initial process of modifying Asn residues in glycoproteins in plants is essentially similar to that in all eukaryotes, i.e., it uses a dolichol-linked glycan precursor to co-translationally add sugars to proteins entering the ER (236). Plants also have GPI-anchored glycoproteins made using a lipid-linked precursor similar to those of yeast and animals (237). However, the types of mature
Plants generate enormously complex polysaccharides with unique structures not found in animals. A great example of a unique set of polysaccharides synthesized by many plants is pectin, found in cell walls (241, 242). Pectin includes galacturonic acid-rich polysaccharides such as homogalacturonan (HG), rhamnogalacturonan I (RG-I), and the substituted galacturonans rhamnogalacturonan II (RG-II) and xylogalacturonan (XGA) (243). Its biosynthesis requires nearly 70 different transferases, such as glycosyltransferases and methyltransferases, which function in the Golgi to synthesize pectin that is then exported outside of cells (242). Pectin is composed of 17 different monosaccharides in an amazing variety of linkages, at least 20 types (244). Plants also synthesize a dazzling variety of polysaccharides in general, such as xylans, mannans, galactans, arabinoxylans, and many others (Figure 1) (11). In addition, collectively plants make a nearly incomprehensible variety of GBPs (lectins) (245), which are commonly found in their seeds, roots, and bark and are important in seed survival, plant protection, nitrogen assimilation, and many other pathways (246). They are considered part of the innate immune system of plants, aiding in resistance to viruses, bacteria, fungi, and animals. Amazingly, each plant species appears to generate unique varieties of lectins. Because of their abundance and ability to bind many different monosaccharides and glycans, plant lectins have become primary reagents for studying glycosylation pathways in other organisms, as the lectins seem to recognize specific glycosylation patterns and each lectin is unique in some way. Plants exemplify the principle of stringent adaptation of glycan form and structure to function in each organism competing to survive.
Thus, while some might consider a vertebrate polysaccharide such as HS to represent the pinnacle of complexity, pectin easily rivals it. Furthermore, a unicellular organism may generate a relatively small repertoire of glycans, e.g.,
The current evidence indicates that the most important aspect of the glycome of an organism is the uniqueness of its collective glycan structures. Glycomolecules in all types of organisms may overlap in size and complexity per se, i.e. made up of similar parts, but their specific structures are generally unique. Thus, if we consider existing organisms as representing the outcome of evolution, we can see that each lineage of organisms has conserved the synthesis of certain classes of glycomolecules, e.g., polysaccharides, glycolipids, glycoproteins, etc., although their structures, scale and diversity, deviate considerably and have expanded along with the evolution of increasing genetic and organismal complexity. In addition, organisms have evolved many different classes of glycomolecules, uniquely functional for their biological niches, yet closely related organisms share general pathways for glycomolecule production.
These facts also suggest that there are key forces driving each organism's evolution toward the generation of glycomolecules uniquely suited for its development and survival. Thus, it is not surprising that genetic mutations in biological pathways that affect glycomolecule elaboration in an organism disrupt its growth and survival. This is well documented in microbes and plants for many different glycomolecules, such as glucosylceramide, discussed above, and cellulose (248), and also in studies of humans with Congenital Disorders of Glycosylation (CDGs), which, depending on the gene or pathway affected, result in unique and diverse pathologies (249, 250). Horvitz’s pioneering work on GAGs in nematodes also showed that single-gene mutations that alter pathways of glycosylation are associated with developmental defects and infertility (251). In addition, in animals and humans any disruption in the degradation of glycans, as seen in lysosomal storage disorders (LSDs), invariably results in pathology (183). All such genetic evidence, from unicellular organisms to metazoans, unequivocally validates the vast and important roles of glycomolecules, their biosynthesis, and their turnover in development and survival.
There may be multiple evolutionary pressures on organisms to evolve their glycomes. Glycomolecule diversity likely arises from biological needs, including the advantages of enhanced efficiency of biological functions and the generation of novel functions. A readily identifiable driver of glycan diversity is the interaction of a host with a potential invader, e.g., a bacteriophage, virus, bacterium, parasite, or some other pathogen, or even with commensal organisms, e.g., the microbiome (Figure 5). A primitive example is the interaction of bacteria with the invaders known as bacteriophages (252, 253). Phages express GBPs that bind to glycans expressed on the bacterial surface. Interestingly, there is evidence that phage GBPs can also bind other glycans, perhaps even those expressed by animal hosts of microbes, as in gastrointestinal tract mucins (254).

Evolving nature of host glycan structures that may be recognized by an invading organism. In this host vs invader model, the host may survive by modifying the glycans, e.g., glycan
An interesting historical observation of the interplay of host vs invader was made a century ago by Heidelberger and Avery, who reported in 1923 that the capsule of
These historical studies are important to consider in the light of evolution. Bacteriophages are among the most abundant biological entities on Earth, yet their existence depends on first binding to a bacterium and then replicating within it. This typically involves binding to bacterial surface molecules, including cell surface polysaccharides and porins, e.g., OmpC (261,262,263). One of the most studied phages is the tailed bacteriophage T4, which has the classic appearance of a spaceship with its landing legs; these legs are long tail fibers that attach to the LPS of bacteria (264, 265). The ends of these fibers carry a GBP essential for this interaction with the bacterial surface. Bacteria can, of course, evolve over time to produce a different polysaccharide to which the phage cannot bind. The phage may in turn evolve new binding activities for the different polysaccharide, apparently ad infinitum (253). This is thought to be an evolutionary driving force that expanded the diversity and biological niches occupied by bacteria (253, 266).
Thus, an interesting aspect of host vs invader interactions is that the invader may express a GBP or lectin that interacts with host glycans in a destructive process (see glycan 1 in Figure 5). If this is pathological, there may be selective pressure favoring the survival of cells that express a different glycan (glycan 2) that is not bound well by the original invader. The original invader may then evolve a GBP that can bind to the novel glycan 2, or another invader may bind to glycan 2. Such modifications may continue, leading to novel glycans and GBPs, in the process termed the
Such host vs invader interactions involving glycomolecules likely also have broader consequences. The evolving glycans may not only help hosts evade (or attract) invaders, but may also acquire additional and important biological features independent of the invaders. In fact, they may become essential components of the molecules to which they are attached. This is an interesting concept to consider in light of the divergence of glycan modifications of proteins and the essential roles that glycans have acquired in protein function and folding. Mammalian immune receptors that recognize glycans of diverse microbial pathogens, including viruses and fungi, also show rapid adaptive evolution (269). Thus, both glycan structures and the proteins that recognize them may rapidly evolve and adapt in never-ending host-invader interactions.
There is also the interesting possibility that an invader presents glycomolecules that are themselves pathological to the host or recognized by the host immune system. For example, human and animal parasites, such as protozoans and helminths, have unique glycomes characterized by features that may benefit the parasite but compromise the host. Some of these glycans might be recognized by the host immune system. Yet overall, the parasite may survive, and the host lives long enough to promote parasite survival. Some human pathogenic bacteria make CPs containing sialic acid, and humans express sialic acid-binding proteins, or Siglecs (270). Because Siglecs may send inhibitory signals upon binding their ligands, bacteria may use sialic acid as a type of glycan mimicry to evade immune attack. However, many parasites, such as the helminths or worms, do not express sialic acid. Instead, they synthesize unusual glycans that are immunogenic (271,272,273). These glycans are also capable of interacting with different glycan receptors on immune cells to induce tolerance and evade immunity, a type of glycan gimmickry (274). If the titer and specificity of the resulting anti-carbohydrate antibodies are sufficient, they can result in killing of the invader, e.g., a bacterium or parasite. This is the basis of anti-carbohydrate-based vaccines, such as the Hib vaccine given to children, which induces antibodies that target the surface polysaccharide of the deadly
Another selective force for glycan diversity may be the enhancement and diversification of molecular interactions important in biological processes. For example, consider a common feature of protein-bound polysaccharides in vertebrates and invertebrates: a short ‘core’ structure to which a longer polysaccharide is linked, often composed of repeating disaccharide units.
This is seen in GAGs linked to soluble and membrane-bound proteoglycans (280), in the membrane-bound glycoprotein α-dystroglycan (281), and in plant cell wall glycoproteins, such as the extensins and arabinogalactan proteins (AGPs) (11, 282, 283). In proteoglycans, a common core containing Xyl-Ser is modified by multiple polysaccharides, e.g., CS or HS. AGPs are the most highly glycosylated proteins in plants. Their hydroxyPro, Ser, and Thr residues are modified with a galactan core, from which highly branched extensions of galactose and arabinose are formed and further modified by other monosaccharides. Because the glycan component accounts for >90% of the mass of AGPs, the protein itself is considered to be simply a carrier for the glycans with no other specific function. α-Dystroglycan, by contrast, is glycosylated at multiple sites and carries a unique polysaccharide termed matriglycan (281, 284, 285). The extended glycans of proteoglycans, plant cell wall glycoproteins, and α-dystroglycan serve as platforms on which multiple GBPs can recognize specific features. This allows a single glycoprotein to partner with a wide range of other proteins in regulating biological activities.
While not all such glycoproteins can be discussed here, α-dystroglycan provides an interesting example of glycan diversity associated with diverse functions. α-Dystroglycan has multiple functions in all animal tissues. It is a key component of the surface of muscle cells and many other cells, mediating their interactions with the ECM by linking the actin cytoskeleton, via dystroglycan, to the membrane and ECM. The glycan of α-dystroglycan is
A main function of the glycans on α-dystroglycan is to provide recognition sites for the binding of ECM proteins such as laminin and others that contain laminin G-like (LG) domains; these include agrin, perlecan, neurexin-1α, and many others (287). Their interactions and the complexes they create are essential for the formation and stability of skeletal muscle and for other cellular interactions with the ECM. The large number of proteins that bind to glycans of α-dystroglycan, and the different features of the disorders arising from defects in α-dystroglycan glycosylation, suggest possible evolutionary drivers for their generation. The long and varied glycans of metazoan cells provide a wide range of interactions of different affinities to regulate the complex interplay of cells with the ECM, which may differ throughout animal tissues. Extended matriglycan may also permit simultaneous interactions with multiple LG proteins. Interestingly, while α-dystroglycan and its core glycosylation are conserved in invertebrates, such as Drosophila, the extensive matriglycan elongated from the
Interestingly, the evolution of GAGs and matriglycan may also involve host-invader mechanisms. GAGs are recognized by a wide range of viruses, bacteria, fungi, and parasites (288). Matriglycan was recently shown to be a receptor for Lassa fever virus (LFV) (289) and for the organism
Metazoans are defined as organisms with a body composed of differentiated, specialized cells. It is noteworthy that the evolution of single cells or unicellular organisms to metazoans included the
This theme of using extracellular polysaccharides to enhance adhesivity is also present in metazoan cells. Sponges are among the simplest multicellular organisms, and in them glycomolecules were long ago shown to be key to survival. There are currently ~8000 species of living sponges; they are ancient, dating back to the Precambrian age, and perhaps a similar number of species can be found in fossils (293, 294). They are in the phylum Porifera, which derives from the Latin
Spongin and its collagen cousins are the most abundant and likely the most ancient glycoproteins in all metazoans. Collagens are essential for the normal survival of organisms, as seen in the effects of collagen gene mutations in humans (297). The main glyco-component of spongin and all collagens, glucosyl-galactosyl-hydroxylysine, is also conserved in evolution. This pathway and glyco-modification may be considered among the most ancient in metazoans (298). As noted above, even giant viruses make a form of collagen, but in this case the sugar linked to hydroxylysine is glucose. Amazingly, some bacteria also synthesize collagen-like glycoproteins capable of forming triple-helical structures; their
Regarding the molecules holding sponge cells together, the classic experiments of Wilson in 1907 are noteworthy. Wilson demonstrated that when cells from two species of sponges are dissociated and mixed, they can reassociate over time, i.e., reaggregate, to regenerate the distinct sponges (302). This whole-body regeneration (WBR) is one of the remarkable attributes of sponges, as some animals can regenerate only parts of their body (303). The factors important in specific sponge cell aggregation are termed aggregation factors (AFs). These are proteoglycans carrying sulfated polysaccharides that mediate calcium-dependent polyvalent interactions between glycans, i.e., homophilic interactions, as well as interactions with cell surface receptors (304,305,306,307). Sponges synthesize sulfated polysaccharides, but they do not synthesize GAGs of the types found in other animals (24). The genetic evolution and diversification of AFs have been well studied and arise through high allelism combined with domain shuffling (308). This diversification promotes allorecognition (self vs non-self) in sponges, as originally discovered by Wilson. It is also an example of carbohydrate-carbohydrate interactions, which have been proposed to be a driving force of glycomolecular functions in all organisms (309). This primitive type of allorecognition involving glycomolecules and their interactions is thus carried throughout metazoan evolution.
After the above considerations, it seems fitting to conclude with a discussion of human glycomolecules, as the themes and questions that arise in other organisms apply to them as well. A vast literature has documented the structures and functions of glycomolecules in mammalian and human cells in terms of their involvement in basement membranes, connective tissues, cell adhesion, cell signaling, nutrition, and many other pathways (50, 147, 310,311,312,313,314,315,316). A preceding section on
Much is being learned about the
There are some modifications of mammalian glycans in other primates that are not found in human biosynthetic pathways, e.g., α-linked galactose (321) and
While the genome and transcriptome of an individual cell may be deciphered in part through analysis of its DNA and RNA, because these molecules can be amplified enzymatically, the complete glycome of a single cell is not knowable at present. An adult human may contain ~37 trillion cells, not including the microbiota (325,326,327), and there may be 200–500 defined major cell types. Thus, while all these cells share the genome, their glycomes are likely to be decidedly different. Most analyses of the human glycome have focused on selected biosynthetic pathways identified through genetic analysis (315). A vast number of studies on glycan structures have focused on soluble glycomolecules. When membrane glycoproteins have been studied, such as the large families of integrins, cadherins, and protocadherins, the number of different
Although limited and not systematically performed, these studies all indicate that the glycomes of human cells and organs are varied, showing unique differences as well as some similarities, both quantitatively and qualitatively. Moreover, the glycoproteomes of cells and organs, i.e., the glycoproteins carrying the glycans, also differ significantly among all cell lines and organs examined. While important strides in this area are being made (343, 344), most studies are incomplete, owing to the limitations of available analytical tools and the typical focus on the most abundant, identifiable glycans. Few glycan standards are available, particularly for glycans with unique modifications, e.g., sulfation or phosphorylation, which complicates rigorous analysis. Studies also seldom compare all the glycans in all classes in one individual or human donor tissue with those in another. Furthermore, glycomics is often focused on glycans obtainable by enzymatic release, e.g., with PNGase F, but enzymes to release all types of glycans, e.g.,
There are many challenges to identifying the total human glycome, including technical and accessibility issues (147). Even with our limited knowledge, however, it is clear that there are age-, gender-, and disease-specific differences in the glycome, as seen, for example, for the serum
Variations in the human glycome may arise from the differential expression of so-called
It may be concluded from studies on human glycans that many human glycans differ structurally even from those of other primates, and certainly from those of non-primates (273, 361) (Figure 1). The results suggest that human evolution is associated with discrete changes and diversity in the glycome. However, while humans have a highly diverse set of pathways for generating glycomolecules (362), most of these are conceptually similar to those in other primates and even most higher vertebrates.
The hundreds of human glycosyltransferase genes also include many polymorphic alleles (363). One of the best examples of polymorphism is the most ancient and best studied genetic feature of human glycosylation, the ABO blood group system. It comprises carbohydrate antigens on red blood cells (RBCs) and some other cells, including intestinal cells. These antigens are composed of Gal, GalNAc, GlcNAc, and Fuc in unique linkages that define each antigen (364). While the ABO blood group is relatively common among primates, its expression varies tremendously, and many hundreds of polymorphic alleles are involved in ABO generation. There is also evidence that diversity in the ABO system is driven by the microbiome (324), which may be a version of host-invader interactions. This is interesting, as there is growing evidence that microbial expression of ABO-like antigens led to the generation of anti-ABO antibodies in cognate individuals. These antibodies are, however, restricted to the antigens not expressed in that individual (365). In addition, galectins, an ancient class of metazoan GBPs, can recognize blood group antigens and kill microbes expressing them in a type of host vs invader response (366). Thus, the microbiome of an individual may not only shape that person's antibody repertoire, but also shape the types of glycans expressed on RBCs, in the intestinal tract, and on other associated cells.
Even for this relatively simple case of ABO-containing glycans, we know little about the ABO glycome of humans, or of any particular individual. In particular, we know little about the exact structures, either quantitatively or qualitatively, made by one individual compared with another, despite strong evidence of multiple disease susceptibilities associated with ABO blood group status (367). Only recently have studies provided insight into the
Therefore, many questions remain about the human glycome. We can frame these questions using the ABO system, but they apply to any specific aspect of the human glycome. As there are antibodies to many glycan determinants in the blood group systems, such tools could help answer these questions. What glycans express ABO determinants in the human body? Where are they located? Which glycolipids, GAGs, and glycoproteins carry them? How do they vary between people? Do they change over time and with age? How does the ABO glycome of a person affect health and disease? Such simple questions could be asked about any cell in the body and any class of glycans. Unfortunately, the availability of diverse human materials for structural analyses is limited. The technologies also do not yet scale sufficiently, or perhaps perform reliably enough, to address the complete human glycome (370), at least compared with the technologies used to study the human genome and proteome. For example, consider a model organism that is well studied and exploited in immunology and in all types of questions in mammalian biology, the laboratory mouse
The above considerations suggest that many forces are at play in the evolution of glycomolecules and that there are clearly major differences in glycomes among all unicellular organisms and metazoans. Some of the major roles of glycomolecules in these ancient and ongoing processes leading to evolutionary change among organisms are summarized below.
Glycosylation and sugar production are universal and appear at the earliest stages of macromolecule evolution.
Glycosylation enhances diversity of protein modifications and functions independently of the genetic code.
Carbohydrates with their multiple –OH and –NH2 groups and their nature as aldoses and ketoses permit multiple modifications and branching to generate large spatial structures.
Glycomolecules coat the surfaces of all cells, providing a plethora of biological capabilities and potential interactions, in which changes in the glycocalyx, independently of surface protein expression per se, can lead to changes in cellular interactions with other cells and their microenvironment.
Glycosylation of surface molecules participates in host-invader dynamics and provides survival advantages to hosts, as glycans are recognized by GBPs, toxins, and antibodies.
Glycomolecules can function as intercellular adhesion molecules and comprise the ECM of multicellular organisms.
The acquisition of new functions by glycans is likely associated with changes in their structures and increasing complexity.
For microbes, glycomolecules make up their biofilms, both providing protection and promoting microbial ecology.
Glycans are ancient forms of nutrient storage.
Glycomolecules such as chitin and cellulose are essentially insoluble and provide a structural component to organisms that is both easy to generate without a template and modular, i.e., size distributions can be enormous.
The anomeric nature of monosaccharide linkages, α or β, provides a novel secondary structural feature of glycans that can alter the biological functions of monosaccharides.
Many monosaccharides, being already relatively oxidized, are rather stable to heat, light, and even wide pH ranges, with stability comparable to that of other building blocks.
Carbohydrates readily participate in H-bonds, which can promote intramolecular structure and intermolecular associations and can enhance the solubility of glycomolecules that might otherwise be relatively insoluble.
The compartmentalization of glycomolecules and their biosynthesis within eukaryotic cells promotes novel biosynthetic pathways and allows control over assembly, storage, and release.
Many enzymes and proteins involved in glycomolecule metabolism use different metals as key cofactors, and variations in the availability of biometals may regulate glycomolecule generation and enzyme activities (376).
Understanding evolution is unimaginable without considering the emergence of carbohydrates and their key roles in structure and metabolism. Carbohydrates became essential components of all types of glycomolecules beyond the nucleic acids, as well as central players in metabolism. The elaboration of glycomolecules beyond simple sugars and monosaccharides exploited the many advantages of carbohydrates over other types of building blocks. These include their inherent multivalency due to multiple functional groups, their solubility, and their ease of biosynthesis and metabolism. They also include the low energy considerations of the glycosidic linkage compared with other bonds, e.g., peptide bonds, and the template-independent generation and modulation of glycans. Competition in host-invader interactions and the emergence of diverse glycan functions associated with unique biological pathways are among several drivers of glycomolecule diversity in viruses and all living cells. Emerging information about the glycogenomes of organisms, the genetic regulation of glycosylation, and the unique glycomes of each organism is improving our understanding of the complex evolution of glycan structure and function. However, as the glycome cannot yet be predicted directly from the genome, new tools and analytical technologies are likely to further illuminate how we think about glyco-evolution and the dynamic landscape of glycan expression, recognition, and function in all life forms.