Open access

Evolution and Diversity of Glycomolecules from Unicellular Organisms to Humans

Introduction: Glycomolecules are ubiquitous in nature

All cells of all organisms express a dazzling variety of glycomolecules on their membranes and in their secretions (1, 2). In addition, the two monosaccharides ribose and deoxyribose comprise about one-third of all nucleic acids in ribonucleic acid (RNA) and deoxyribonucleic acid (DNA), respectively. Glycomolecules other than nucleic acids include glycoproteins, glycolipids, oligosaccharides, polysaccharides, glycosylphosphatidylinositol (GPI)-lipid anchored glycoproteins, and the newly discovered glycosylated RNA (glycoRNA) (3, 4). These classifications are somewhat deceiving, as within any class there are many subclasses and the diversity is extraordinary. The term glycan refers to the glyco component of glycomolecules, and generally denotes all kinds of saccharide classes (1). Unlike nucleic acids and proteins, glycomolecules are not generated from a template; thus, there is no information within the genomes of organisms that can accurately predict either glycan structure or diversity. The diversity of glycomolecules in nature is only partly known, as only a tiny fraction of living things has been analyzed. Nevertheless, available information supports the startling fact that all organisms make relatively unique glycans. Glycans in unicellular organisms, such as bacteria (prokaryotes), protists (protozoans), and yeast, are very different from those in plants and even the simplest invertebrates, and certainly different from those in vertebrates and humans (Figure 1).

Figure 1.

Depicted here are a few examples of the diversity of glycans found in glycomolecules of different organisms: mycobacteria, yeast, plants, nematodes, and humans. More details about these structures and discussions about their biosynthesis and function can be found in the following references (5,6,7,8,9,10,11,12). For more information about the symbols used above to designate monosaccharides see https://www.ncbi.nlm.nih.gov/glycans/snfg.html. Hundreds of different monosaccharides are known, and they can be found in multiple and unique combinations in all forms of life (13, 14). As depicted, the monosaccharide linkages are either α or β, and the linkage number, e.g., 2, 3, or 4, etc., refers to the carbon number on the monosaccharide to which the preceding monosaccharide is linked. Asn, asparagine.

This article highlights several concepts regarding the diversity and evolution of glycomolecules, but the complete scope of the subject would require a much larger treatise. References are provided to help guide readers to additional points of interest and further details regarding the subject, and are not meant to be comprehensive.

Unique non-template nature of glycan synthesis and glycan size

While the diversity of nucleic acids and proteins is vast, they arise from a relatively small number of building blocks. Moreover, the macromolecules themselves are made in a linear form arising from a template, either DNA or RNA, directing their specific sequences. However, glycans are created without templates, and collectively in nature they are built from hundreds of different monosaccharides, and perhaps even thousands of them (13, 14). A single organism may have a set of monosaccharides and derivatives that can differ greatly from that in another organism. Thus, because the building blocks can vary, and without a template guiding the synthesis, there appears to be no end to the complexity of glycans generated in nature. In contrast to nucleic acids and proteins, glycans may have multiple branches and each branch can convey unique determinants. Using just six different monosaccharides occurring within a hypothetical hexasaccharide there are 1.05 × 10¹² possible oligosaccharide isomeric products, both branched and linear, including their α and β anomers (15). By contrast, even using all 20 amino acids, there are only 6.4 × 10⁷ possible linear hexapeptide products.
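The hexapeptide figure follows directly from counting sequences, and a minimal Python sketch makes the comparison concrete. The glycan count is simply taken from the value cited above (15) and is not derived here, since it depends on assumptions about ring forms, linkage positions, and branching; the variable names are illustrative only.

```python
# Count linear hexapeptides: 20 choices at each of 6 positions.
AMINO_ACIDS = 20
PEPTIDE_LENGTH = 6
hexapeptides = AMINO_ACIDS ** PEPTIDE_LENGTH  # 20^6

# Value quoted in the text from reference (15); taken as given, not derived.
HEXASACCHARIDE_ISOMERS = 1.05e12

# How many hexasaccharide isomers exist per possible hexapeptide sequence.
ratio = HEXASACCHARIDE_ISOMERS / hexapeptides

print(f"linear hexapeptides: {hexapeptides:.1e}")
print(f"hexasaccharide isomers per hexapeptide: {ratio:,.0f}")
```

On these numbers, the hexasaccharide isomers outnumber the hexapeptides by roughly four orders of magnitude.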

The term glycosylation denotes an enzymatic attachment of sugars to molecules, which is the focus of this article, whereas the term glycation denotes the non-enzymatic, chemical reaction of monosaccharides with amines in side chains of amino acids. This enzymatic control over glycosylation allows cells to generate specific glycan structures in an orchestrated fashion. In glycation, however, the initial chemical reaction is reversible and not specific, yet rearrangements can occur to generate irreversible adducts (16). Glycation is considered a pathological modification, as exemplified by glycation of hemoglobin and recognized as HbA1c (17), which is monitored in diabetes. Glycation might have been an early feature, however, of glycomolecular evolution. For example, pentoses, such as d-ribose, are more reactive than hexoses, such as d-glucose, but the potential of such evolutionary contributions of glycation is not well understood.

The sizes of glycomolecules in nature vary enormously; they may be tiny or massive, regardless of the organism making them. Amazingly, even the tiniest infectious and transmissible pathogen, the prion, is a single glycoprotein with multiple glycans enzymatically attached by infected cells; importantly, its glycosylation is essential for its neurotoxicity (18). Finally, transfer RNA (tRNA) (19) and non-coding RNAs can be glycosylated (3), and even DNA itself can be glycosylated, as observed for the occurrence of glucosyl-5-hydroxymethylcytosines in phage DNA, which is added by phage-encoded α- and β-glucosyltransferases (20).

In terms of size, consider the common linear polysaccharide hyaluronic acid (HA), which is synthesized by both microbes and animals. In animals HA is just one among the many glycosaminoglycans (GAGs), e.g., chondroitin sulfate (CS) and heparan sulfate (HS), that arose early in evolution (21,22,23,24). Microbial and human HA are identical in structure, although the lengths may be different; a single HA molecule can have ~25,000 repeat units of the disaccharide N-acetylglucosamine-glucuronic acid (GlcNAc-GlcA), thus ~50,000 monosaccharide units, and be 0.25–25 μm in length with a size up to 10,000 kDa (25). This is larger than titin, the largest protein found in nature, which is ~3800 kDa and contains ~34,000 amino acids (26). In animals, mucins represent some of the largest glycoproteins, with among the largest being MUC16, a mucin of ~22,152 amino acids and up to 2000 kDa; >50% of its mass is carbohydrate and it can be 1–5 μm in length (27,28,29,30). Gel-forming mucins can be cross-linked to make gigantic polymers of 100 MDa or more (31, 32). Remarkably, such mucins are synthesized intracellularly within the tiny Golgi apparatus, and are packaged and released to the membrane by secretory vesicles; however, how the folding and organization of such large glycomolecules is coupled to their synthesis and cross-linking is only partly understood (33,34,35,36).
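The HA figures quoted above can be checked with back-of-the-envelope arithmetic. The sketch below is only a rough consistency check: the per-repeat mass (~400 Da) and per-repeat contour length (~1 nm) are round-number assumptions introduced here for illustration, not values taken from the cited references.

```python
# Rough size check for a long hyaluronic acid (HA) chain.
REPEAT_UNITS = 25_000            # GlcNAc-GlcA disaccharide repeats (from the text)
MONOSACCHARIDES_PER_REPEAT = 2   # one GlcNAc plus one GlcA

# ASSUMPTION: ~400 Da per disaccharide repeat within the polymer
# (a round figure; the exact value depends on counter-ions).
REPEAT_MASS_DA = 400
# ASSUMPTION: ~1 nm of contour length per disaccharide repeat.
REPEAT_LENGTH_NM = 1.0

monosaccharides = REPEAT_UNITS * MONOSACCHARIDES_PER_REPEAT
mass_kda = REPEAT_UNITS * REPEAT_MASS_DA / 1_000
length_um = REPEAT_UNITS * REPEAT_LENGTH_NM / 1_000

TITIN_KDA = 3_800  # from the text

print(f"monosaccharides: {monosaccharides}")
print(f"estimated mass: {mass_kda:.0f} kDa")
print(f"estimated contour length: {length_um:.0f} um")
print(f"mass relative to titin: {mass_kda / TITIN_KDA:.1f}x")
```

Under these assumptions the estimate lands at the upper end of the quoted ranges, which suggests the quoted monosaccharide count, mass, and length are mutually consistent.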

Microbes can also generate enormous networks of covalently linked glycomolecules. For example, the cross-linked network of peptidoglycan with repeats of N-acetylglucosamine-N-acetylmuramic acid (GlcNAc-MurNAc) that encases bacteria allows them to resist osmotic pressure. Its overall size can be immense, as it is isolatable as peptidoglycan sacculi, in the range of 3 × 10⁹ Da (37). Finally, the most prevalent polysaccharide on earth, cellulose, comprised of linear, repeating units of the monosaccharide glucose, can be up to 10,000 monosaccharides in length (38). At the other extreme, the smallest glycopeptides in nature include the glycopeptide antibiotics, such as vancomycin, made by the soil bacterium Amycolatopsis orientalis, which contains a disaccharide l-vancosaminyl-α1-2-d-glucopyranosyl linked to a heptapeptide (39). The smallest free glycans, which are disaccharides, include the common disaccharide lactose, found only in mammalian milk, and the disaccharide sucrose, made by plants and microbes.

Thus, glycan diversity is a feature of all organisms. For any particular organism the assortment of glycans is referred to as the glycome (40), with the understanding that such glycans can be linked to other molecules. In higher metazoans the diversity of an organism’s glycans, in which each is considered independently as a bioinformatic unit, rivals, if not exceeds, the diversity of information units of either nucleic acids or proteins, represented by individual genes and their encoded proteins.

Evolutionary considerations in glycomolecule diversity: The genetic code limitation

From these general considerations a simple question arises. What has driven the expansion and diversity of the building blocks (monosaccharides) and the resultant glycomolecules in nature? One likely explanation is that glycan diversity is driven by the same forces, e.g., mutation and natural selection, that drive evolution of other types of macromolecules, such as proteins (41). There have been many excellent reviews about aspects of glycomolecular evolution, often focusing on specific classes of glycomolecules, certain types of glycan modifications, evolution of genes for enzymes, eukaryotic evolution, and other issues [some of them are cited here (6, 42,43,44,45,46,47)]. There are too many genetic pathways for glycomolecule generation to consider them all here. Thus, rather than covering all the interesting genetic aspects of glyco-evolution, this article takes a complementary and slightly different approach, focusing more on the nature and diversity of glycomolecules in all forms of life and on the evolutionary challenges and opportunities associated with their generation.

The concept of evolution driving diversity of glycomolecules implies that there are pressures on organisms to evolve their glycomes in ways that enhance the possibility of their survival. The genome itself presents an interesting conundrum in this regard, which may have promoted the evolution of glycans very early. Consider the limitations inherent in the genetic code. The DNA genetic code is comprised of codons of three nitrogenous bases selected among the four: adenine (A), cytosine (C), guanine (G) and thymine (T). But this three-base codon, when transcribed into RNA with its four nitrogenous bases (adenine, cytosine, uracil, and guanine), can only accommodate 20 amino acids (plus selenocysteine and pyrrolysine), which universally make up all template-derived, complex proteins. Considering, however, all the vast roles that proteins play in biology, this relatively small diversity in side chains of amino acids limits the information content available to proteins.
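The size of the codon space behind this limitation is easy to enumerate. The following minimal Python sketch lists every three-base RNA codon; it illustrates only the counting argument and does not model the codon-to-amino-acid assignment itself.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

# Every ordered triplet of bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Two-base "codons" would be too few to specify 20 amino acids.
doublets = len(BASES) ** 2

print(f"three-base codons: {len(codons)}")
print(f"two-base combinations: {doublets}")
```

Three positions are the minimum that can specify 20 amino acids with four bases, and the resulting space of 4³ = 64 codons still caps the number of distinct side chains that a template can encode directly.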

One way of envisioning this is to imagine a polypeptide as a rope, linear yet very flexible, but restricted in the diversity of shapes and spatial organization it can assume. However, now imagine a rope carrying a virtually unlimited number of attached modifications that represent glycosylation, each with a potentially unique structure and composition at various branchpoints along its backbone; these create new physical properties of the rope. Thus, the potential advantages complementing protein structure that arise from the linkages of carbohydrates to proteins are obvious. The same protein may, when expressed in different cells, carry entirely different glycan structures at various branchpoints, and thus a single gene product, a polypeptide, may give rise to an incredible variety of modified forms.

Because of the differences between polypeptides and glycans, they evolved very different sets of functions. For example, some of the fundamental roles of proteins are to bind other molecules and to function in catalysis. Polypeptides are readily adaptable for such a function, both conformationally and structurally, to promote microcatalytic activities, as well as interactions with small molecules and intermediates, by which their unique amino acid chains are exploited (48). RNA is also adaptable for catalytic activity as ribozymes, which are considered to be important in early evolution (49). However, glycans, with their large monosaccharide building blocks, cannot easily adopt relatively rigid microstructures comparable to those in proteins, and so far no catalytic glycan has been observed. Thus, glyco-components have what may be termed macrobiological activities. These include protecting the cell from environmental stress, protecting proteins from proteolysis, modulation of protein immunogenicity, cell-cell interactions, organization of cellular matrices and cell walls, skeletal structure, interactions with proteins, e.g., growth factors, lectins, etc., and other molecules, as well as facilitating larger aspects of protein conformation and function, and even modifying other macromolecules, such as lipids, to directly link membrane architecture to the extracellular space (50). In regard to the outer membrane of cells, a universal feature of all organisms is the asymmetric expression of glycomolecules; they are concentrated on the outer leaflet of the membrane, rather than the inner leaflet, whereas proteins can be on either side and even transmembrane in nature (51).

The modification of a protein with another molecule is termed a post-translational modification (PTM), although some can occur co-translationally (52). As the diversity and availability of monosaccharides in nature evolved, PTM-based mechanisms evolved to overcome limitations inherent in the structures and bioactivities of polypeptides (and lipids). To fully appreciate the concept of protein glycosylation as a PTM, consider first the phosphorylation of protein as a simple PTM. This is among the simplest and most common PTMs of animal proteins, but it typically occurs inside the cell. It is a small, reversible modification, added and removed by enzymes (kinases and phosphatases), and regulates protein interactions. Other PTMs of proteins include acetylation, methylation, hydroxylation, and ubiquitination, among >500 known PTMs (52, 53). But all such modifications are either small in size, or specifically fixed in size, e.g., ubiquitin or SUMO proteins, and typically reversible. However, in regard to secreted and membrane glycoproteins, glycosylation is an essentially irreversible modification and the sizes can range enormously.

Of course, there are also processes of intracellular glycosylation, and unlike most PTMs involving glycosylation, some of these are reversible. An essential type of intracellular protein glycosylation in most metazoans (multicellular organisms) and some pathogenic fungi (54), and which can reside on cytoplasmic, nuclear, and mitochondrial proteins, is the addition of the monosaccharide GlcNAc in linkage to Ser/Thr residues (O-GlcNAc). This pathway uses UDP-GlcNAc as a donor for a novel O-GlcNAc transferase; the modification is reversible, being subject to removal by O-GlcNAcase (55). This modification occurs on thousands of proteins, including most nuclear proteins and nuclear pore proteins, transcription factors, histones, and RNA polymerase II (56). O-GlcNAc addition, along with phosphorylation, helps to organize and link nutrition, metabolism, cell signaling, and differentiation (57,58,59). O-Fucosylation by addition of fucose (Fuc) residues to nuclear pore glycoproteins has also been observed in the parasite Toxoplasma gondii (60), and also in nucleocytoplasmic proteins in the plant Arabidopsis thaliana, driven by the O-fucosyltransferase POFUT (61). Other intracellular protein glycosylation pathways also arose in protists to assist in oxygen sensing and involve modifications of hydroxylated proline (62).

Intracellular glycosylation of proteins is actually an ancient theme of glyco-evolution. The most famous and universal type of intracellular glycosylation in prokaryotes and animals involves the production of glycogen (63, 64), and in plants and algae the production of starch (65). In animal cells glycogen may originate from the protein glycogenin, which has inherent glucosyltransferase activity and auto-glucosylates a single tyrosine residue of itself using UDP-Glc to generate O-Glc; this is then extended and branched by enzymes to produce the polysaccharide glycogen (66). Interestingly, glycogen in the mammalian brain has both glucose and glucosamine residues, derived from UDP-glucosamine (67). Glycogen in animals can be a glycoprotein and reside in the cytoplasm of cells as large particles (β particles), some of which can be ~10⁸ Da and ~300 nm in diameter (68). However, whether glycogenin is absolutely required for glycogen synthesis is unclear, as some humans and engineered mice can generate glycogen in the absence of glycogenin-1 (69, 70). Starch synthesis arises by extending malto-oligosaccharides (MOS) or existing starch primers (71).

As mentioned above, the addition of one or more monosaccharides to an amino acid allows the sizes to vary from a single monosaccharide to its elongation by addition of other monosaccharides up to many hundreds, far beyond any other type of PTM in size and diversity. This breakthrough in evolution allows even the tiniest of the amino acids, such as serine or threonine, to become giants, as seen for the O-glycans in nature (Figure 2). The addition of sugar to an amino acid may be thought of as the generation of a new amino acid with a novel side chain that can range in size from small to truly enormous. Interestingly, of the 20 canonical amino acids, 10 of them or their derivatives may be glycosylated, including asparagine (Asn), arginine (Arg), cysteine (Cys), glutamine (Gln), hydroxylysine (HydroxyLys), hydroxyproline (HydroxyPro), serine (Ser), threonine (Thr), tryptophan (Trp), and tyrosine (Tyr) (72,73,74,75,76,77), and all but the glycosylation of HydroxyPro have been documented in animal cells. Glycosylation most commonly occurs on –OH groups of certain amino acids (O-linked), or on amide groups (N-linked), but can also occur in other ways, including C-linked, S-linked, etc. Thus, the DNA template need not encode any additional amino acids, as the cell can easily expand its repertoire by modifying amino acids. Glyco-modifications of proteins in this way are the only PTMs in nature that can be vastly larger collectively than the protein to which they are linked and can occur at multiple sites on a protein. Multiple modifications of a single polypeptide at different sites can generate a tremendous array of what are termed glycoforms, i.e., different glycosylation of a protein with the same polypeptide backbone. For example, a single glycoprotein, the recombinant human enzyme Myozyme, was recently shown to contain more than 10,000 different glycoforms (78).
Also, the most infamous glycoprotein in modern times is the spike glycoprotein of SARS-CoV-2 virus, responsible for the recent worldwide COVID-19 pandemic. While this membrane glycoprotein is 17% by weight carbohydrate, its N- and O-glycans cover ~42% of its surface (79), and the total number of glycoforms is likely to be enormous.
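The way glycoform numbers escalate can be illustrated with simple combinatorics. The sketch below uses entirely hypothetical site counts (they are not measured data for Myozyme, the spike glycoprotein, or any other protein) and assumes that each site varies independently, which real glycoproteins need not do.

```python
from math import prod

# HYPOTHETICAL example: number of distinct glycan structures that can
# occupy each glycosylation site of an imaginary glycoprotein.
glycans_per_site = [10, 10, 10, 10]

# ASSUMPTION: sites vary independently, so the number of glycoforms is
# the product of the per-site possibilities.
glycoforms = prod(glycans_per_site)

print(f"sites: {len(glycans_per_site)}, glycoforms: {glycoforms}")
```

Even four sites with ten options each already give 10,000 combinations on this simplified model, so a protein with more sites or more heterogeneous sites can reach very large glycoform numbers.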

Figure 2.

Modifications of proteins with carbohydrates can increase both the complexity and size, as illustrated here for simple sugar modifications of the amino acid serine, as often seen in O-glycans of animals. While the largest one shown here has a tetrasaccharide attached, some O-glycans in animals can reach enormous sizes and contain many hundreds of monosaccharide moieties. Moreover, multiple serine, threonine or tyrosine residues may also be O-glycosylated on a single protein, thus further enhancing diversity and size.

The ancient nature of glycosylation

While glycomolecules are ubiquitous in nature, until a few years ago it was unclear whether organisms in all domains of life, Eukarya, Bacteria and Archaea (80), exhibited commonalities in the classes or types of glycomolecule they express. For example, it was commonly believed not long ago that bacteria did not synthesize glycoproteins. It is now clear, however, that glycoproteins are found in all bacteria, and in fact collectively they express amazing varieties of glycoproteins (81,82,83,84,85,86,87,88). Examples of glycoproteins expressed by Gram-negative bacteria are S-layer glycoproteins in outer membranes (82, 89, 90), the multiple and essential glycoproteins produced by a single strain of Bacteroides fragilis (91), and the wide range of O-glycosylated (Ser/Thr) glycoproteins in the periplasm of Flavobacterium johnsoniae (92). Gram-positive bacteria also express S-layer glycoproteins, and other surface glycoproteins, e.g., serine-rich repeat glycoproteins (93). Thus, expression of glycoproteins is a universal feature of life. As in all organisms, bacterial glycoproteins occur in secretions and on outer membranes. Similarly, as for animal and plant glycoproteins, protein glycosylation in microbes has highly specialized biological functions. For example, the Neisseria gonorrhoeae pilin glycoprotein has a critical Ser-linked disaccharide needed for adhesion (94), the AIDA-I glycoprotein is necessary for pathogenic Escherichia coli adhesion (95), and the many dozens of glycoproteins containing O- and N-glycans in Mycobacterium tuberculosis affect immune reactions to the microbes (96, 97). There are ~80 glycoproteins that contain an Asn-linked heptasaccharide generated by Campylobacter jejuni, which are required for efficient cell invasion and colonization (98, 99). Such examples illustrate that glycoproteins are made by all organisms but their structural diversity and functions are often vastly different and specific to the organism.

A common biosynthetic concept for the elaboration of glycoproteins shared by bacteria and all other organisms regards the synthesis of lipid-linked glycans, broadly classified as glycolipids; they serve both as mature products made by cells and as precursors for generating other glycomolecules. A common type of protein glycosylation requires a lipid-linked sugar precursor. Such a preformed glycan can then be added to a protein or another glycolipid. For bacterial glycoproteins, this occurs such that the glycoprotein is modified within the periplasm at the membrane interface, as observed in C. jejuni for their Asn-linked pathway (100), and for S-layer glycoprotein generation (101). However, bacteria can also directly glycosylate proteins in some cases without preformed precursors. For example, sublancin, which is generated by Bacillus subtilis 168, has an S-linked (Cys-linked)-Glc, which is added by the enzyme SunS using UDP-Glc as a donor; this modification is essential for its antimicrobial activity (102). Also, all of the glycopeptide antibiotics made by bacteria are synthesized in the cytoplasm using nucleotide sugar donors and glycosyltransferases in pathways that are genetically traceable to be hundreds of millions of years old (103).

Cytoplasmic glycosylation of proteins also occurs in eukaryotes, as for the addition of O-GlcNAc and O-Glc, as discussed above. O-GlcNAc-containing glycoproteins also occur within the compartments of the mitochondria (104). This ancient nature of protein glycosylation in all organisms also includes the biosynthesis of nucleotide sugars; these are used to generate preformed lipid-linked sugar precursors or to directly glycosylate a protein (105, 106). The universality of glycosylating peptides and proteins either in the cytoplasm or elsewhere using nucleotide sugars and lipid-linked glycans has been extended and modified over time so that all types of cells, especially mammalian cells, can generate an astonishing number and variety of unique glycan modifications on an individual cell. Taken together, if we believe that longevity and ubiquity are any measures of success in molecular evolution, DNA and RNA are the most successful, but glycomolecules are a very close second.

Monosaccharide evolution

Monosaccharide evolution likely coincided with the formation of ribose and deoxyribose as these were incorporated into macromolecules essential for reproduction. Ribose is easily produced chemically by simple formose reactions involving formaldehyde and water, which proceed through glycolaldehyde and the formation of aldoses and ketoses (107, 108) (Figure 3). Glyoxylate and C6 aldonates, which can transform to ribose, are also candidates as evolutionary precursors of sugars (109, 110). Multiple lines of evidence indicate the primitive nature of formation of sugars in the prebiotic world (109, 111,112,113). The Murchison meteorite contained both aldonic acids and ribose (114, 115). Glycolaldehyde, the smallest possible molecule with both an aldehyde and an –OH group, has been detected in interstellar space and could be a source of pre-RNA intermediates (116). It is also interesting that sugar or alcohol alternatives to ribose and deoxyribose were probably available for inclusion in nucleic acids, e.g., threose nucleic acid or glycerol nucleic acid. However, neither their structural features, nor those of any other monosaccharide, competed effectively with ribose and deoxyribose for the formation of RNA and DNA, respectively (107, 117,118,119). Deoxyribose within DNA might provide greater flexibility and a higher stability to alkaline conditions compared to RNA, among other differences (120). Early in evolution, it is likely that mechanisms for the synthesis of the nitrogenous bases, select amino acids, lipids, and monosaccharides were established and conditions were such that these building blocks were at least semi-stable.

Figure 3.

Examples of formose reactions leading to the generation of d-ribose from formaldehyde. The generation of glycolaldehyde, glyceraldehyde and their condensation can lead to the formation of the 5-carbon pentulose, a ketose, which can epimerize to the straight chain aldose d-ribose, which itself is in equilibrium with its ring form, as shown for β-d-ribose. See the text and references therein for a more complete discussion of the origin and interconversions of monosaccharides.

The generation of additional monosaccharides, d-fructose and d-glucose, can arise by related pathways, and also through reactions of aldoses with hydrogen cyanide (HCN), which can extend their length by a carbon through generation of cyanohydrins (121). Monosaccharides will also polymerize chemically under the proper conditions through glycal and epoxide intermediates, which also gives them attractive structural features. These are just some among the multiple considerations regarding ribose and glucose and the intersection of their metabolism in early life forms, pertinent to the ascension of ribose (and deoxyribose) as central players in nucleic acids, and glucose as a central player in metabolism (122). Nevertheless, early in evolution glucose became the leading portable hexose building block and nutrient, and the most abundant building block on earth.

In terms of carbohydrate stereochemistry, the dominance of the stereoisomeric l-amino acids and d-sugars, the typical enantiomers of the building blocks on earth for proteins and glycomolecules, may have occurred early in chemical and biological evolution (123). The reason for the chosen chiralities is unclear and debatable (124). l-amino acids and d-sugars may have dominated by random chance, or because their energy states differ slightly from their mirror images, or because of differential reactivities, etc.; in any case, once established, their recognition by other chiral molecules, e.g., proteins, would likely co-evolve to be specific to a restricted enantiomer. As for d-ribose, it occurs in macromolecules as β-d-ribofuranose (119, 123, 125). [It is interesting that both l-ribose and l-glucose can be easily chemically synthesized, but neither is found in organisms, and only d-ribose and d-glucose occur. d-Amino acids do not typically occur in messenger RNA (mRNA)-directed proteins, but they do occur in the peptide linkages within bacterial peptidoglycans (126).]

Furthermore, a monosaccharide such as d-glucose can readily be enzymatically isomerized with low energy costs into various epimers, such as the 4-epimer d-galactose, and the 2-epimer d-mannose, both of which are also incredibly common in nature. It is noteworthy that d-glucose, d-fructose, and d-mannose can slowly isomerize in certain solutions without enzymes (127). With such considerations it is easy to envision the generation of the vast array of monosaccharides found in nature (13).

The unique nature of monosaccharides drives structural diversity

Monosaccharides linked together in glycomolecules can generate linear or branched structures, there may be different stereoisomers, and they may occur in different anomeric linkages, either α or β, thus permitting a jigsaw puzzle of structural variety (Figure 1). Consider that within starch and glycogen d-glucose occurs as an α-linked polysaccharide; it is both linear and highly branched, which promotes its compact folding. These features reduce the water content of such structures, which allows this high density nutrient to be compartmentalized within cells (128,129,130). However, too compact a structure creates a pathological condition associated with glycogen storage disorders, such as Lafora disease, caused by accumulation of dense, improperly branched, essentially insoluble glycogen aggregates termed Lafora bodies; these can occur in brain cells and lead to severe neurological disorders (131). By contrast, the isomeric structure β-linked d-glucose is the basis of the linear, unbranched, giant polymer cellulose that occurs outside of cells; it is insoluble and has a profound structural role in cell wall architecture. Interestingly, the evolution of starch is thought to coincide with the endosymbiotic event ~1 billion years ago, in which an ancestor of cyanobacteria was internalized by a eukaryotic cell (132); both cyanobacteria and green algae can accumulate starch (133). The continued evolution of photosynthesis for the production of triose phosphates as precursors to d-glucose and its isomer d-fructose, which can be combined to produce sucrose by the ancient sucrose synthases (134), led to sucrose becoming the universal portable energy source in plants.

The ubiquity of sucrose and its unusual structure raise some interesting questions about its origin and dominance, as many other potential disaccharides could supply a similar amount of energy. Sucrose is a non-reducing disaccharide comprised of d-fructose and d-glucose, i.e., it lacks any aldehyde or ketone groups, and cannot generate glycation products through common Maillard reaction mechanisms, as it is not reactive with lysine residues or amino groups in nucleotides or nucleic acids (135, 136). Although sucrose might accumulate in high concentrations in organisms, it is highly soluble but non-toxic in that regard, and thus, it could have been favored as a portable energy source over other competing disaccharides. A similar feature occurs in trehalose, in that case a disaccharide composed only of d-glucose, but also in a non-reducing form. Trehalose is produced in high amounts by many invertebrates, is a major source of energy for flying insects, and also has nutritional importance and preservative properties in cells (137, 138). By contrast, animals in general do not biosynthesize free disaccharides, except for mammals that generate high amounts of a reducing disaccharide lactose. It is comprised of d-galactose and d-glucose, and has an aldehyde group that is reactive with proteins by glycation (139). However, the restricted lactation-dependent and periodic accumulation of milk, as well as its restriction to the glandular products, may minimize the degree of potential pathology associated with its ability to glycate proteins.

Thus, while the glyco components of glycomolecules may vary in linkage, anomericity, monosaccharide units, and branching, they may also be modified to become even more complex, as they may contain aglycone components, e.g., methylation, acetylation, phosphorylation, sulfation, phosphorylcholination, etc. Each glycan linkage and modification is catalyzed by discrete enzymes, but even relatively few enzymes, acting combinatorially but in an orchestrated fashion, can create an amazing variety of novel molecular structures. The types of structures shown in Figure 1 illustrate the terrific diversity of structures possible by linking monosaccharides to each other in such ways and further modifying the sugars. In that sense, if glycosylation is thought of as a PTM, the further modification of glycans themselves may be a post-PTM. Thus, glycomolecules present an imposing ensemble of structural diversity across all forms of life. Glycosylation of amino acids within proteins and the generation of oligo- and polysaccharides are ancient processes, and all organisms have multiple glycosylation pathways (46).

Photosynthetic systems for generating monosaccharides

The issue of sucrose raises another evolutionary consideration, which is key to the development of glycomolecules, and that is photosynthesis and the harvesting of energy to generate sugars (140, 141). This singular process, which is thought to have arisen in microorganisms, combines carbon dioxide and the 5-carbon sugar ribulose-1,5-bisphosphate to generate, after reduction of the initial 3-phosphoglycerate products, two molecules of d-glyceraldehyde 3-phosphate (G3P), which is also a monosaccharide (142). These condense to generate the 6-carbon d-glucose phosphate. Thus, the overall equation is simple; two relatively abundant and simple precursors, water and carbon dioxide, combine to produce glucose and oxygen. Through such processes the synthesis of trioses, riboses, and hexoses undoubtedly provided immense new possibilities for the evolution of sugars, as photosynthetic organisms had a virtually unlimited supply of them. A dramatic variety of enzymatic modifications and condensations can generate additional monosaccharides, depending on the organism.
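For reference, the overall equation and the carbon bookkeeping described in this paragraph can be written out. This is the standard textbook summary of oxygenic photosynthesis rather than a mechanism; it omits ATP, NADPH, and the regeneration of ribulose-1,5-bisphosphate.

```latex
% Overall stoichiometry of oxygenic photosynthesis (net reaction)
\[
6\,\mathrm{CO_2} + 6\,\mathrm{H_2O}
\;\xrightarrow{\ \text{light}\ }\;
\mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}
\]

% Carbon bookkeeping for the fixation and condensation steps
\[
\underbrace{\mathrm{C_5}}_{\text{ribulose-1,5-bisphosphate}}
+ \underbrace{\mathrm{C_1}}_{\mathrm{CO_2}}
\;\longrightarrow\; 2 \times \mathrm{C_3}\ (\text{triose phosphate}),
\qquad
\mathrm{C_3} + \mathrm{C_3} \;\longrightarrow\; \mathrm{C_6}\ (\text{hexose phosphate})
\]
```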

Photosynthesis evolved 3.2–3.5 billion years ago soon after life itself came into existence (143). It is utilized by cyanobacteria (blue-green algae), green algae, and plants as the most efficient way to trap solar energy and provide energy for the cell. In eukaryotes photosynthesis probably arose through endosymbiosis of a cyanobacterial-like organism (144). Amazingly, the smallest photosynthetic organism known, Prochlorococcus, is a marine cyanobacterium which lives in oceans and produces up to 20% of the oxygen in our biosphere, and an amazing amount of glucose and macromolecules for the marine food chain (145). An interesting Gedankenexperiment or thought experiment is to imagine a photosynthetic process, perhaps discarded in evolution, that might generate some molecule that is not a sugar, such as an amino acid, a lipid or a nucleotide; any of these could theoretically be energy sources and precursors for other molecules. However, the champions of evolution were the simple trioses, glucose building blocks, and specifically d-glucose. Its attributes include its incredible water solubility and multiple equatorial –OH groups; its ring form occurs as a glucopyranose, but it can also exist as a straight chain aldehyde. It can be easily degraded and further modified. Such kaleidoscopic features of d-glucose likely provided advantages over other small molecules, exploiting its manageable reactivity, and providing a stable building block and precursor for many other glycomolecules.

Glycan and glycomolecule functions

Given their incredible structural diversity, it stands to reason that glycans have a tremendous number of functions indispensable for life (50), and some have been discussed above. Some of the broad categories include the following. (i) Glycomolecules are structural components, essential for survival of a cell or organism. Examples of such functions are seen for RNA and DNA, and in polysaccharides, such as cellulose and chitin, and extracellular matrix (ECM) components, such as the polysaccharide HA and the glycoprotein collagen. Microbes that have peptidoglycans in their cell walls require them for osmotic integrity and environmental responses, and microbial ecology. Glycolipids overall are essential in biological membranes for membrane integrity and membrane functions and biological topology. Similarly, GPI-anchored glycoproteins expressed in eukaryotes and likely in Archaea, but not bacteria, are critical for membrane organization and survival (146). (ii) Glycans in glycoproteins are commonly found to be essential to one or more features, such as protein structure, oligomerization, trafficking, turnover, resistance of proteins to proteases, free radical attack, etc., in what might be called indirect functions (147, 148). This is evidenced by phenotypes of cells and organisms in which protein glycosylation is defective; the constituent proteins are often unstable or poorly functional. Glycans at specific sites can modulate functions of proteins even if the glycans themselves are not directly recognized. Such is seen for chemokine interactions with their receptors, in which different structures of glycans modulate the specificity and affinity of binding (149). It is also seen in collagen, where the modifications of hydroxylysine are indirectly required for correct organization of collagen fibrils (150). 
O-Fucosylation of Ser/Thr residues in Notch receptors regulates interactions with their ligands, which in turn regulate expression of a tremendous variety of other genes and biological pathways (151). (iii) Glycans can also have direct functions, e.g., be directly recognized by proteins, glycan-binding proteins (GBPs) that include lectins, adhesins, antibodies, etc. In such cases, the glycans may be recognized rather independently. Consider the tens of thousands of GBPs and the carbohydrate-binding modules (CBMs) within them, expressed by bacteria and used to adhere to specific glycomolecules (152). Also, the formation of the lymphatic system is regulated by direct interactions of the O-glycans of the membrane glycoprotein podoplanin with the platelet GBP termed CLEC-2 (153). (iv) Glycomolecules may also be sources of energy, such as provided by starch, glycogen, lactose, sucrose, etc. (v) Finally, glycomolecules, such as the glycopeptide antibiotics and aminoglycoside antibiotics, are toxic and can allow an organism to control its environment (154, 155). There are many other categories of glycomolecule function, but these illustrate the diverse evolutionary variety of glycomolecules to provide unique structural features and biological activities.

Glycolipid biosynthesis, diversity and commonality in organisms

Glycolipids are important structural components of cellular membranes (156), but are also important for sugar metabolism and glycomolecule biosynthesis. All organisms have within their membranes one or more different types of glycolipids, which include those derived from fatty acids, glycerol, sphingolipids, phospholipids, polyisoprenols, and steroids, all of which can be glycosylated, e.g., glucosylated cholesterol (157). In evolution, no specific glycolipid is conserved in all organisms, but each organism has essential glycolipids, and their classes are often unique to certain organisms. For example, bacteria have lipopolysaccharides (LPS), and animals have glycosphingolipids, such as gangliosides (158); plants and algae lack these, but have galactolipids such as mono- and digalactosyldiacylglycerol (MGDG, DGDG) and the sulfolipid sulfoquinovosyldiacylglycerol (SQDG), not found in other organisms (159). Amazingly, MGDG and DGDG are the major glycolipids in the thylakoid membranes of all photosynthetic organisms. All such glycolipids have important biological functions, too vast to consider in detail here. Glucosylceramide is one of the simplest glycolipids and is expressed by fungi, plants, and animals; it is essential for normal development in all such organisms, as shown in genetic studies in Caenorhabditis elegans, plants, and mice (160,161,162). Thus, an important theme of all cells is that they express glycolipids, but these may differ in many ways, undoubtedly with vastly different functions, depending on the organism (163). Bacteria can also synthesize an amazing variety of very complex glycolipids, but also simple glycolipids unlike those in any animal or plant (164). These include small glycans on fatty acids, fatty alcohols, sterols, hopanoids and carotenoids, that can vary in degrees of substitutions, chain lengths, etc.

The polyisoprenoids represent a unique class of lipids for glycomolecule generation. In animals they are represented by dolichol, a polyisoprenoid of C95-C125, i.e., 95 to 125 carbons in length (163, 165,166,167,168,169); the bacterial counterparts are the C55 undecaprenoids, e.g., bactoprenol (169). These are present in all organisms and serve as lipid-linked precursors that may contain monosaccharides and oligosaccharides essential for biosynthesis and sugar addition to other molecules. The polyisoprenoids are derived by condensation of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), the universal 5-carbon precursors of all isoprenoids, which also give rise to steroids such as cholesterol (170).
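Because every isoprenoid is assembled from 5-carbon units, the chain lengths quoted above translate directly into numbers of isoprene units. The short sketch below simply does that arithmetic; the function name is hypothetical, and the carbon counts are the ones given in the text.

```python
ISOPRENE_CARBONS = 5  # IPP and DMAPP each contribute a 5-carbon unit


def isoprene_units(total_carbons):
    """Return how many 5-carbon isoprene units make up a polyisoprenoid chain."""
    units, remainder = divmod(total_carbons, ISOPRENE_CARBONS)
    if remainder:
        raise ValueError("chain length is not a multiple of 5 carbons")
    return units


if __name__ == "__main__":
    # Chain lengths as quoted in the text.
    for name, carbons in [("bacterial undecaprenoid", 55),
                          ("dolichol, shortest quoted", 95),
                          ("dolichol, longest quoted", 125)]:
        print(f"{name}: C{carbons} = {isoprene_units(carbons)} isoprene units")
```

The C55 bacterial lipid corresponds to 11 units, consistent with the prefix "undeca", while the dolichol range quoted here corresponds to 19 to 25 units.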

It is thought that a momentous step in evolution was the formation of membranes to encapsulate molecular activities associated with life (171, 172). But compartmentalization of metabolic activities also created the problem of topology (Figure 4). Many glycomolecules are present on the outer surface or membranes of cells, which has been termed the glycocalyx (173), but the sugar precursors to generate them arise mainly in the cytoplasm. Sugars are hydrophilic and averse to the hydrophobic membrane environment and do not diffuse through lipid membranes. The polyisoprenoids function as reusable precursors that allow the protein-controlled flipping of glycans across membranes as lipid-linked sugars, from the cytoplasmic side of the cell to the opposite side of the membrane, e.g., the lumen of an organelle such as the endoplasmic reticulum (ER), or the outer membrane of microbes (Figure 4). In this ancient process, a glycolipid precursor of some type is made on the cytoplasmic side of a membrane and then flipped or translocated using protein-assisted channeling pathways (174) to the outside of the membrane, as seen in microbes, or to the lumen of an organelle, e.g., ER (175). For example, this flipping process is used by microbes for the synthesis of LPS. While membrane-associated glycolipids are typically asymmetrically presented on the outside of the membrane lipid bilayer, an interesting exception is an inside-facing diacylglycerol-based glycolipid with a short oligosaccharide structure termed the membrane protein integrase or MPIase. In E. coli it is required for general integration of proteins into the inner membrane (176). In most biosynthetic processes, following glycan transfer from the lipid-sugar intermediate, the released lipid, such as a polyisoprenoid, flips back to the opposite side of the membrane for recharging to continue the cycle.

Figure 4.

Cells are bounded by membranes and glycomolecules are typically on the outer membrane of cells or secreted from the cells; however, the precursors to such glycomolecules are formed in the cytoplasm. One approach found in all organisms to overcome this topological barrier is to generate unique types of lipid-linked sugars. Within the cytoplasmic side of a membrane, cells can generate a lipid-linked sugar, and it can then be flipped to the opposite side (external or lumenal). In the two examples, taken from animal cells, but common to eukaryotes, glucosylceramide is synthesized on the cytoplasmic side of the Golgi apparatus using UDP-glucose as a donor, and the product is then flipped to the lumen of the Golgi by protein-assisted pathways. Similarly, for N-glycosylation pathways in animals, one example is shown here. The dolichol-pyrophosphoryl-GlcNAc2Man4 is synthesized on the cytoplasmic face of the ER, and finally modified to dolichol-pyrophosphoryl-GlcNAc2Man5 through the donor GDP-mannose. This particular product is then flipped to the ER lumen for further modifications there and eventual transfer of its preformed oligosaccharide to the newly synthesized protein, either co- or post-translationally. This basic concept of such topological resolutions also occurs in prokaryotes, as seen for LPS. Also shown is the direct synthesis at the plasma membrane of HA in both prokaryotes and eukaryotes using cytoplasmic precursors and extrusion of the product into the extracellular space. ER, endoplasmic reticulum; GlcNAc, N-acetylglucosamine; HA, hyaluronic acid; LPS, lipopolysaccharides.

Such biosynthetic recycling, however, is not the case for glycosphingolipid biosynthesis using ceramide; the common precursor glucosyl-ceramide, is generated from UDP-Glc and ceramide on the cytoplasmic side of the Golgi apparatus and is then flipped into the Golgi lumen (177,178,179), and within the Golgi apparatus it can be extended to generate many other glycolipids (Figure 4). Thus, the lipid is not immediately recycled, but of course it may be reutilized after glycolipid degradation in the lysosome. Oddly, the topological issue of generating the isomeric galactosyl-ceramide in animals is completely different from that for generating glucosyl-ceramide. Galactosyl-ceramide is generated within the lumen of the ER by the enzyme UDP-Gal:ceramide galactosyltransferase which uses UDP-Gal imported there as a donor and no flipping of the product is needed (180). Thus, the co-evolution of such glycolipids along with monosaccharides allowed cells to both exploit the membrane-bound and compartmentalized nature of early metabolism, as well as to preserve and target forms of biosynthesis of specific biomolecules for presentation on the inside and outside of cells. Interestingly, enzymes for synthesizing glucosyl-ceramide and its simple extensions have been reported to be present in the mitochondria-associated endoplasmic-reticulum subcompartment (MEM), which may be a site for their production and movement into the mitochondria, as this organelle has been reported to contain small quantities of such glycolipids (181).

Compartmentalization for intracellular glycomolecule synthesis and membrane-directed synthesis for special polysaccharides

Other pathways also evolved in eukaryotes to synthesize glycomolecules within compartments (organelles) and exploit the topological barrier. Specific nucleotide sugar transporters allow the movement of nucleotide sugar precursors made in the cytoplasm into lumenal compartments, e.g., ER and Golgi apparatus. [An exception to this rule in eukaryotes is the nucleotide sugar CMP-sialic acid, which is generated in the cell nucleus (182).] Within those compartments glycomolecule biosynthesis can proceed in a more regulated fashion and the nucleotide by-products are retro-transported to the cytoplasm for recycling (183). The newly synthesized glycomolecules are either soluble or membrane bound within the compartment, yet topologically outside of the cytoplasm, and may reach the outside through membrane fusion events during secretion.

A completely different pathway arose, however, for the three giant polysaccharides found in nature: cellulose, chitin, and HA. Interestingly, all of them have either Glc or the Glc derivatives GlcA or GlcNAc. For HA synthesis, as shown in Figure 4, hyaluronic acid synthase (HAS) in microbes and animals is a transmembrane enzyme that extrudes HA co-synthetically to the extracellular space; within its single polypeptide the enzyme has two catalytic activities that can alternately use nucleotide precursors to generate the repeating HA disaccharide GlcNAc-GlcA (22, 184). Chitin and cellulose biosynthesis also proceeds through such transmembrane channel enzymes at the cellular membrane (184, 185). As for HA, the glycans have lengths that greatly exceed the dimensions of the cell synthesizing them, but the sizes are somewhat indeterminate. These processes contrast with those used for linear microbial LPS and capsular polysaccharide (CP) synthesis, where they are generated at the membrane coupled with synthesis and typically use lipid-linked sugar donors (186, 187); however, their exact lengths can also be indeterminate (Figure 4). It is interesting that polysaccharides are the only polymers in nature that can have thousands of repeats of a single building block, e.g., glucose in cellulose or starch. No such giant counterparts exist for proteins or nucleic acids comprised of a single repeating building block, although some polypeptides, relatively short by comparison to the giant polysaccharides, can be generated from a single amino acid, e.g., epsilon-poly-l-lysine (ɛ-PL) generated by non-ribosomal pathways (188).

The complex nature of glycosylating nascent polypeptides during their biosynthesis

The essential nature of protein modifications with sugars, as discussed above regarding microbial glycoproteins, also includes the common N-glycosylation of eukaryotic glycoproteins that occurs in the ER. There are many conceptual aspects of protein glycosylation that are shared between prokaryotes and eukaryotes. The linkage of oligomannose to Asn residues in the eukaryotic ER represents an en bloc addition of a large glycan of 14 sugar residues comprised of Glc3Man9GlcNAc2 from the precursor Glc3Man9GlcNAc2-P-P-dolichol, where the sugars are linked by a pyrophosphoryl linkage to the giant polyisoprenoid dolichol (8) (Figure 4). Such glycosylation often occurs at multiple Asn sites on emerging nascent polypeptides, and thereby proteins can acquire glycans co-translationally (and also post-translationally) (189). During this process, a glycoprotein must fold upon itself, as well as interact with chaperones and other proteins in the densely packed ER. One can imagine the potentially intense intramolecular and intermolecular interplay between glycan and protein in this dense ER environment. This potential chaos in such an environment is controlled through evolution by the generation of host protein chaperones that bind proteins, and GBPs that bind glycans in the ER that act as molecular glycan chaperones, interacting with the newly transferred glycans independently of the protein, in a reversible manner, as other protein chaperones do by peptide recognition (190). Such binding, release, rebinding, and remodeling of the ER glycans through de- and re-glucosylation of the N-glycan, coupled with protein folding pathways, eventually allow a glycoprotein to correctly fold. There are elaborate processes in eukaryotes to deal with misfolded glycoproteins; these are forced to exit the ER and be degraded in the cytoplasm through multiple pathways involving glycan recognition and removal (190). 
Interestingly, such cytoplasmic N-glycosylated glycoproteins must be efficiently and rapidly deglycosylated; genetic defects in this process, which is universal in animals, are associated with severe developmental disorders, such as NGLY1 deficiency (191). Thus, the glycosylation itself becomes not only an essential feature of the protein, but through its quality control attributes, it contributes to synthesizing functional glycoproteins (192). All eukaryotes have evolved these ER GBPs and quality control pathways to regulate their production and folding of glycoproteins in the ER. In the absence of Asn-glycosylation cells die, as they cannot efficiently produce mature glycoproteins, and the subsequent accumulation of misfolded proteins leads to ER stress and associated responses, e.g., ER-associated degradation (ERAD), and apoptosis. In certain protozoa, the same process occurs; however, the precursor lacks Glc residues, and Man9GlcNAc2 is added from the precursor Man9GlcNAc2-P-P-dolichol (193); importantly, in those cases glucosylation and re-glucosylation still occur on the protein glycans to regulate glycoprotein folding (194). In other microbes, the precursor N-glycans can be even shorter, e.g., in Plasmodium falciparum the precursor is only GlcNAc2-P-P-dolichol and lacks mannose entirely (195). Thus, this ancient pathway is fundamental to protein folding in the secretory pathway and is preserved in all its basic elements throughout eukaryotic life, but the sizes and functions of both the precursor and mature N-glycans in these eukaryotes have evolved dramatically.
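The compositions quoted for the different precursors can be compared by counting residues. The following is a minimal sketch, assuming the shorthand used in the text, in which each residue abbreviation is followed by an explicit count (e.g., Glc3Man9GlcNAc2); the function names are hypothetical, and the parser handles only this simple notation.

```python
import re


def parse_composition(formula):
    """Parse shorthand such as 'Glc3Man9GlcNAc2' into {residue: count}.

    Assumes every residue abbreviation is followed by an explicit number.
    """
    counts = {}
    for residue, number in re.findall(r"([A-Za-z]+)(\d+)", formula):
        counts[residue] = counts.get(residue, 0) + int(number)
    return counts


def total_residues(formula):
    """Total number of monosaccharide residues in the shorthand composition."""
    return sum(parse_composition(formula).values())


if __name__ == "__main__":
    # Precursor glycans named in the text (lipid carrier omitted).
    for glycan in ("Glc3Man9GlcNAc2", "Man9GlcNAc2", "GlcNAc2"):
        print(glycan, parse_composition(glycan), total_residues(glycan))
```

This reproduces the 14 residues stated for Glc3Man9GlcNAc2 and gives 11 and 2 residues for the shorter protozoan and Plasmodium falciparum precursors, respectively.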

Glycomolecules in viruses

Many viruses contain glycoproteins and other glycomolecules, such as capsid or envelope glycoproteins. Glycoproteins are present in small capsid viruses, such as the adenoviruses, and in large enveloped viruses, such as HIV-1 (196, 197). For many viruses, their glycomolecules are synthesized by the infected host using virus-encoded proteins and host enzymes. The genomes of animal viruses lack encoded genes or machinery to dictate the glycosylation of viral molecules, but instead completely depend on the host cells they infect to add needed glycans to their viral glycoproteins (197, 198), in what may be classified as host-dependent viral glycosylation. Well studied examples of such viruses that infect animals include HIV-1, influenza viruses, and of course, coronaviruses, such as the notorious SARS-CoV-2. Enveloped viruses, as they bud from cellular membranes, may also acquire glycolipids and other host glycomolecules present in membranes (199). The glycomolecules in viruses are often key to their interactions with host cells or their interactions with and evasion of the human immune system (197, 200, 201). It is interesting to speculate that animal viruses, which can infect animals that have immune systems, were selected to specifically not be able to glycosylate their own proteins, but use the glycosylation systems in the host cells. Such acquired host glycans, therefore, might be less likely to provoke immune responses and block virus replication, acting as a shield (202). Yet, the virus may not always gain the advantage in this system, as glycan presentation on the viral glycoprotein can be unique in conformation and organization and induce antibodies to the carbohydrates, as seen for broadly neutralizing antibodies (bnAbs) to HIV (203). These bnAbs bind to common oligomannose N-glycans, but bind uniquely because of their conformational presentation on the envelope protein surface.
Interestingly, while not encoding genes to synthesize glycans, influenza virus binds to sialic acid-containing glycans and other host receptors through its virus-encoded hemagglutinin (viral HA), and then can destroy the sialic acid-containing receptor through its virus-encoded neuraminidase/sialidase (viral NA) (204, 205).

There are a few animal viruses whose genomes encode glycosyltransferase enzymes that can modify glycans. The Bo17 gene of bovine herpesvirus 4, a bovine Rhadinovirus, encodes a β-1,6-N-acetylglucosaminyltransferase (β-1,6GnT) that is 81.1% identical on the amino acid level with the human homolog β-1,6GnT-M. It may have been acquired from its bovine host ~1.5 million years ago (206). The enzyme provided by the virus can be alternatively spliced, and participate in remodeling of the infected host cell glycome (207). The genome of the South American myxoma virus, which infects forest rabbits, encodes an α2,3-sialyltransferase that is associated with enhanced virulence (208). The well-known baculovirus Autographa californica infects juvenile insects, and upon infection expresses a virus-encoded soluble glucosyltransferase that glucosylates the molting hormone, leading to its inactivation, and a block of pupation (209, 210). Finally, in a related manner, some viruses, e.g., human T-cell leukaemia virus (HTLV-1), upon infection induce expression of the host fucosyltransferase VII, leading to elevation of the surface expression of the carbohydrate antigen sialyl-Lewis x (sLex) on host cell glycomolecules, a potential ligand for various selectins that interact with leukocytes (211).

By contrast, the genomes of many viruses that infect organisms other than animals encode enzymes for producing their own varieties of glycomolecules. The genomes of bacteriophage, which are among the smallest viruses, encode a variety of glycosyltransferase and glyco-modifying genes active on their own molecules, including their DNA (212). But other viruses also encode glycogenes, e.g., glycosyltransferases, capable of modifying their own proteins and host proteins. The genomic sequencing of giant viruses revealed extensive numbers of such glycogenes (212,213,214,215). Giant viruses are ubiquitous in all fresh water, and infect freshwater eukaryotic algae; they are known as nucleocytoplasmic large DNA viruses (NCLDVs) (216). The chlorovirus Paramecium bursaria chlorella virus (PBCV-1) synthesizes unusual Asn-linked glycans unlike any of those in animals. The glycans are linked to Asn through a β-glucose linkage, and are highly branched, even containing a hyperglycosylated fucose modified at all sites, and all the glycoforms contain a dimethylated rhamnose as the capping residue of the main chain (217) (Figure 1). The genome of the related Marseillevirus marseillevirus, a type of giant mimivirus, encodes several glycosyltransferases (214, 218), that both hydroxylate lysine residues in proteins and then add a collagen-like linkage of Glc (animals use Gal) to the hydroxylysine. The enzymes can modify their own collagen-like proteins and even animal collagens (219). These viruses are also rare non-animal organisms that generate collagen-like molecules. Amazingly, animals exposed to such mimivirus collagens produce anti-collagen antibodies that may promote arthritis (220).

Microbial polysaccharides

Many studies have documented the tremendous differences in glycomolecules between different unicellular organisms, and certainly between bacteria and archaea (9, 98, 221,222,223). Bacteria lack a nucleus and are bounded by a single membrane (and an outer membrane in Gram negative) in which are embedded glycolipids, e.g., LPS in Gram negative and lipoteichoic acids (LTAs) in Gram positive. As noted above, glycoproteins are also produced by all types of bacteria, Gram negative and positive. In regard to polysaccharides, bacteria are amazing in their ability to generate diversity from a common theme. LPS, which is essential for bacterial viability, is densely expressed on the surface and is a linear polysaccharide; depending on the organism, it may have repeating units of di- to nonasaccharides (two to nine), and there may be several million copies of LPS expressed on the surface of a single bacterium (224, 225). There may also be non-stoichiometric modifications of the LPS, which are partly regulated by small non-coding sRNAs (226). But the genes encoding the enzymes responsible for such linear LPS molecules, for example, are remarkably few, and each enzyme independently adds a sugar from a precursor intermediate. Thus, each microbial strain may synthesize a dominant but unique LPS or a CP of varying lengths, leading to their recognition as serotypes, based on antibodies that can distinguish the antigenic glycan components (227). The strain differences, as for E. coli, are usually based on variations in the LPS structure or O-antigen, which arise from variations in the linear genes in the operon responsible for LPS production (228). Over 700 different serotypes of E. coli are known and ~470 of these produce Shiga toxin (229). For Salmonella, there are >2400 serotypes (230). Although common bacteria, even E. coli, can also express other glycomolecules and glycoproteins, their glycome is typically dominated by their unique LPS.
By contrast, LPS is not synthesized by cyanobacteria, and they lack the core sugars of Gram negative LPS such as 3-deoxy-d-manno-octulosonic acid (KDO) and heptoses found in typical LPS (231). Their exopolysaccharides can be assembled stepwise by enzymes in the cytoplasm and then translocated to the outside through the membrane.

Archaea lack both LPS and LTA, common components of Gram negative and Gram positive bacteria, respectively (223, 232). Archaeal glycans are much more complicated in any individual organism compared to bacteria; archaea also lack conventional murein or peptidoglycan, but instead make a tremendous variety of glycomolecules related to peptidoglycan in their cell wall. Archaea can also synthesize glycoproteins, and a major feature of the archaeal cell surface is the S-layer (233). They also generate glycoproteins containing N-glycans linked to asparagine, a modification found in protists, along with unusual heteropolysaccharides, methanochondroitin, and many other types of polysaccharides (223). Overall, as might be expected, therefore, genomes of archaea are much more complex and enlarged in terms of glycogenes responsible for elaborating their glycomolecules, compared to eubacteria.

Plant glycomolecules

There are many pathways of glycomolecule biosynthesis that plants share with all eukaryotes, but there are some that are specific to plants and even a plant species (234, 235). For example, the initial process of modifying Asn residues in glycoproteins in plants is essentially similar to the processes in all eukaryotes, i.e., it utilizes a dolichol-linked glycan precursor to co-translationally add sugars to proteins entering the ER (236). Plants also have GPI-anchored glycoproteins using a lipid-linked precursor similar to yeast and animals (237). However, the types of mature N-glycans resulting in plants vs mammals, while similar in some ways, are very different in others (235, 238). Plant N-glycans do not contain sialic acid, which is a common terminal modification of N-glycans in mammals. Plant N-glycans often contain xylose β1,2-linked to the core mannose, and fucose α1,3-linked to the core chitobiosyl unit, but these are not found in mammalian N-glycans. In animals O-glycosylation pathways involve addition of N-acetylgalactosamine (GalNAc) to Ser/Thr residues, but this does not happen in plants; instead, they modify Ser, Thr, and hydroxyproline residues in their glycoproteins with other sugars, e.g., arabinose and galactose (239). In addition, keratan sulfate, which can occur in N-glycans of animals, and the Ser-linked GAGs within proteoglycans, such as HS on Syndecan-1, do not exist in plants but are major contributors to many biological pathways in animals (240).

Plants generate enormously complex polysaccharides with unique structures not found in animals. A great example of a unique set of polysaccharides synthesized by many plants is pectin, found in cell walls (241, 242). Pectin includes galacturonic acid-rich polysaccharides such as homogalacturonan (HG), rhamnogalacturonan I (RG-I), and the substituted galacturonans rhamnogalacturonan II (RG-II) and xylogalacturonan (XGA) (243). Its biosynthesis requires nearly 70 different transferases, such as glycosyltransferases and methyltransferases, which function in the Golgi to synthesize pectin that is then exported outside of cells (242). Pectin is comprised of 17 different monosaccharides in an amazing variety of linkages, at least 20 types (244). Plants also synthesize a dazzling variety of polysaccharides in general, such as xylans, mannans, galactans, arabinoxylans, and many others (Figure 1) (11). In addition, collectively plants make a nearly incomprehensible variety of GBPs (lectins) (245), which are commonly found in their seeds, roots, and bark, and which are important in seed survival, plant protection, nitrogen assimilation, and many other pathways (246). They are considered to be a part of the innate immune system of plants, aiding in resistance to viruses, bacteria, fungi and animals. Amazingly, each species of plant appears to generate unique varieties of lectins. Because of their abundance and ability to bind many different monosaccharides and glycans, such plant lectins have become primary reagents to study glycosylation pathways in other organisms, as the lectins seem to recognize specific glycosylation patterns and each lectin is unique in some way. Plants represent the principle of stringent adaptation of glycan form and structure to function for each organism competing to survive.

Distinguishing features of glycomolecules in evolution are their structural complexities

Thus, while some might consider a vertebrate polysaccharide such as HS to be incredibly complex and an example of the pinnacle of complexity, pectin easily rivals it. Furthermore, a unicellular organism may generate a relatively small repertoire of glycans, e.g., E. coli, or it can generate an amazing variety of small and gigantic glycans, e.g., Mycobacterium tuberculosis (Mtb) (247). The glycomolecules in Mtb, which uses humans as its obligate host, are incredibly complex and varied in expression and the types generated, most of which are different from anything found in an animal. Its glycome is so complex that it could perhaps rival that in any single eukaryotic cell, even a single human cell type.

The current evidence indicates that the most important aspect of the glycome of an organism is the uniqueness of its collective glycan structures. Glycomolecules in all types of organisms may overlap in size and complexity per se, i.e. made up of similar parts, but their specific structures are generally unique. Thus, if we consider existing organisms as representing the outcome of evolution, we can see that each lineage of organisms has conserved the synthesis of certain classes of glycomolecules, e.g., polysaccharides, glycolipids, glycoproteins, etc., although their structures, scale and diversity, deviate considerably and have expanded along with the evolution of increasing genetic and organismal complexity. In addition, organisms have evolved many different classes of glycomolecules, uniquely functional for their biological niches, yet closely related organisms share general pathways for glycomolecule production.

These facts also suggest that there are key forces driving evolution for each organism leading to its generation of glycomolecules uniquely suited for its development and survival. Thus, it is not surprising that genetic mutations in biological pathways that affect glycomolecule elaboration in an organism lead to disruptions in its growth and survival. In microbes and plants this is well documented for many different glycomolecules, such as glucosyl-ceramide discussed above and cellulose (248). It is also seen in studies of humans with Congenital Disorders of Glycosylation (CDGs), which, depending on the gene or pathway affected, result in unique and diverse pathologies (249, 250). Horvitz’s pioneering work on GAGs in nematodes also showed that single gene mutations that alter pathways of glycosylation are associated with developmental defects and infertility (251). In addition, in animals and humans any disruption in the degradation of glycans, such as seen for lysosomal storage disorders (LSDs), invariably results in pathology (183). All such genetic evidence from unicellular organisms to metazoans unequivocally validates the vast and important roles of glycomolecules, their biosynthesis and turnover, in development and survival.

Host vs invader and relationships to glycomolecule evolution

There may be multiple evolutionary pressures on organisms to evolve their glycomes. Glycomolecule diversity likely arises from biological needs, including advantages of enhanced efficiency of biological functions and generation of novel functions. A readily identifiable driver of glycan diversity is the interaction of a host with a potential invader, e.g., a bacteriophage, virus, bacterium, parasite, or some other pathogen, or even commensal organisms, e.g., the microbiome (Figure 5). A primitive example of this is the interaction of bacteria with their invaders, known as bacteriophages (252, 253). The phages express GBPs that bind to glycans expressed on the bacterial surface. Interestingly, there is evidence that phage GBPs can also bind other glycans, perhaps even those expressed by animal hosts of microbes, as in the gastrointestinal tract mucins (254).

Figure 5.

Evolving nature of glycan structures by the host which may be recognized by an invading organism. In this host vs invader model, the host may survive by modifying the glycans, e.g., glycan 1 to glycan 2, to which the invader may not bind well. The invader or other invaders may evolve the ability to bind the new glycan, leading the host to again respond with a modified glycan, e.g., glycan 3 or 4. This cycle may be endless, and leads to great diversity and complexity of both glycans through time on the host and multiple types of binders produced by the invaders.

An interesting historical observation of the interplay of host vs invader was made decades ago by Heidelberger and Avery, who reported in 1923 that the capsule of Streptococcus pneumoniae contains a surface polysaccharide (255). Avery and Goebel later reported in 1933 that infected patients made antibodies to the CP (256). Follow-up studies to those of Griffith (257) indicated that smooth vs rough appearing colonies of bacteria differed in production of polysaccharide; the former produced surface polysaccharides giving the bacteria a smooth, shiny appearance (S), whereas the rough colonies (R) lack this polysaccharide; the R strains are also non-virulent. In other studies Hershey and Chase confirmed that DNA is the genetic material by demonstrating that bacteriophage T2 injects its DNA into bacteria, and that new phage emerge from the injected DNA (258). Thus, the transforming principle was proven to be DNA through studies of polysaccharide expression. Interestingly, this concept was generally rejected for a considerable amount of time by the scientific community, as DNA was considered to be a boring, uninteresting molecule (259). Over time all objections were overcome, and this famous discovery by Avery, MacLeod, and McCarty, based partly on the nature of glycomolecule synthesis, is a cornerstone of modern genetics (260).

These historical studies are important to consider in the light of evolution; bacteriophages are among the most abundant organisms on earth, yet their existence depends on first binding to a bacterium and then replicating within it. This typically involves binding to bacterial surface molecules including cell surface polysaccharides and porins, e.g., OmpC (261,262,263). One of the most studied phages is the tailed bacteriophage T4, which has the classic appearance of a spaceship with its landing legs; these legs are long tail fibers that attach to the LPS of bacteria (264, 265). The ends of these fibers have a GBP essential for this interaction with the bacterial surface. Of course, bacteria can evolve over time to develop a different polysaccharide to which phage cannot bind, and the phage may in turn evolve new binding activities for the different polysaccharide, apparently ad infinitum (253). This is thought to be an evolutionary driving force that expanded the diversity and biological niches occupied by bacteria (253, 266).

Thus, an interesting aspect of host vs invader involves the invader expressing a GBP or lectin that may interact with host glycans in a destructive process (see glycan 1 in Figure 5). If this is pathological, there may be selective pressure for survival of cells that express a different glycan (glycan 2) that is not bound well by the original invader. The original invader may evolve to generate a GBP that can bind to the novel glycan 2, or another invader may bind to glycan 2. Such modifications may continue, leading to novel glycans and GBPs, in the process termed the evolutionary glycan arms race (267). This has also been referred to as the Red Queen effect, after the character in Lewis Carroll's Through the Looking-Glass, where evolution is a never-ending race between hosts evolving in response to rapidly evolving invaders in their interplay, both for good and bad (43). An interesting example of the evolving nature of such interactions is also seen for the tectonins, which are lectins conserved throughout evolution. They are found in bacteria (e.g., actinomycetes), fungi, invertebrates, fish, and humans. Tectonins are proteins with protective and toxic activities, such as fungal tectonin-2, which binds to methylated monosaccharides within glycans expressed in invading microbes and nematodes (268). This concept of host vs invader is inherent in the nature of life, as all organisms compete for niches, and glycomolecules participate in these continuing challenges.

Such host vs invader interactions involving glycomolecules also likely have larger outcomes, as the evolving glycans may not only help hosts evade (or attract) invaders, but the glycans may acquire additional and important biological features independent of the invaders, and in fact, become essential components of the molecules to which they are attached. This is an interesting concept to consider in light of the divergence of glycan modifications of proteins, and the essential roles that glycans have acquired in protein functions and protein folding. There is also the interesting rapid adaptive evolution in mammals and their immune receptors that recognize glycans of diverse microbial pathogens, including viruses and fungi (269). Thus, both glycan structures and proteins that recognize glycans may rapidly evolve and adapt in the never-ending host-invader interactions.

There is also the interesting possibility of an invader presenting glycomolecules that themselves can be pathological to the host or recognized by the host immune system. For example, human and animal parasites, such as protozoans and helminths, have unique glycomes, characterized by features that may benefit the parasite, but compromise the host. Some glycans might be recognized by the host immune system; yet overall, the parasite may survive and the host lives long enough to promote parasite survival. Some human pathogenic bacteria make CPs with sialic acid, but humans express sialic acid-binding proteins or Siglecs (270). Because the Siglecs may function to send inhibitory signals upon binding ligands, bacteria may use sialic acid as a type of glycan mimicry to evade immune attack. However, many parasites, such as the helminths or worms, do not express sialic acid and instead synthesize unusual glycans that are immunogenic (271,272,273), but also capable of interacting with different glycan receptors on immune cells to induce tolerance and evade immunity, a type of glycan gimmickry (274). If the titer and specificity of the anti-carbohydrate antibodies induced by immunogenic glycans are sufficient, these can result in killing of the invader, e.g., a bacterium or parasite. This is the basis of anti-carbohydrate-based vaccines, such as the Hib vaccine given to children, which induces antibodies that target the surface polysaccharide of the deadly Haemophilus influenzae type b (275). This issue of glycan expression in all organisms and their interplay with other organisms, whether commensal, parasitic, and/or pathogenic, is extensive and requires a much larger discussion than can be afforded here. A currently compelling area of great interest in this regard is animal and human microbiomes and the roles of host and microbial glycans, and dietary glycans, in modulating and regulating normal homeostasis, as well as their roles in dysbiosis (276,277,278,279).

Glycomolecule evolution associated with enhanced functions

Another selective force for glycan diversity may be the enhancement and diversification of molecular interactions important in biological processes. For example, consider a common feature of protein-bound polysaccharides in vertebrates and invertebrates, which is the presence of a short ‘core’ structure to which is linked a longer polysaccharide, often of a repeating nature with disaccharide repeats.

This is seen in GAGs linked to soluble and membrane-bound proteoglycans (280), in the membrane-bound glycoprotein α-dystroglycan (281), and in plant cell wall glycoproteins, such as the extensins and arabinogalactan proteins (AGPs) (11, 282, 283). In proteoglycans a common core containing Xyl-Ser is modified by multiple polysaccharides, e.g., CS or HS. In AGPs, which are the most highly glycosylated proteins in plants, hydroxyPro, Ser and Thr residues are modified with a galactan core and then highly branched extensions are formed with galactose and arabinose, and further modified by other monosaccharides. Because the glycan component accounts for >90% of an AGP, the protein itself is considered to simply be a carrier for the glycans with no other specific function. The glycoprotein α-dystroglycan, however, is glycosylated at multiple sites and contains a unique polysaccharide termed matriglycan (281, 284, 285). The extended glycans of proteoglycans, plant cell wall glycoproteins, and α-dystroglycan become platforms on which multiple GBPs may recognize specific features within the glycans, thus allowing a single glycoprotein to partner with a wide range of other proteins in regulating biological activities.

While not all such glycoproteins can be discussed here, α-dystroglycan provides an interesting example of glycan diversity associated with diverse functions. α-Dystroglycan has multiple functions in all animal tissues, and is a key component of the surface of muscle cells and many other cells for their interactions with the ECM, linking the cytoskeleton and actin via dystroglycan to the membrane and ECM. The glycan of α-dystroglycan is O-linked via mannose, and includes the core of Man, which may be phosphorylated, GlcNAc, GalNAc, and ribitol-phosphate, to which is linked a repeating polysaccharide termed matriglycan composed of Xyl-GlcA repeats, which may vary in length due to the activity of the bifunctional glycosyltransferase LARGE1 (284). The generation of the core plus matriglycan requires at least 18 genes in vertebrates, and defects in any of these lead to a set of disorders termed α-dystroglycanopathies (286). Interestingly, specific genetic defects, recognized as congenital muscular dystrophies (CMDs), lead to different outcomes and clinical phenotypes, such as Walker-Warburg Syndrome, Muscle-eye-brain disease (MEB), Fukuyama CMD, and many others.

A main function of the glycans on α-dystroglycan is to provide recognition sites for binding of ECM proteins such as laminin and others that contain laminin G-like domains (LG domains); these include agrin, perlecan, neurexin1-α, and many others (287). Their interactions and the complexes they create are essential for formation and stability of skeletal muscle and other cellular interactions with the ECM. The vast number of proteins that bind to glycans of α-dystroglycan, and the different features of the disorders arising from defects in glycosylation of α-dystroglycan, suggest possible evolutionary drivers for their generation. The long and varied glycans in metazoan cells provide a wide range of interactions of different affinities to regulate the complex interplay of cells with the ECM, which may differ throughout all animal tissues. Extended matriglycan may also permit multiple interactions simultaneously with different LG proteins. Interestingly, while α-dystroglycan and its core glycosylation is conserved in invertebrates, such as Drosophila, the extensive matriglycan elongated from the O-Man core is lacking in Drosophila (285). Thus, the modifications and extended lengths of such glycans may have arisen through specific interactions that increased as the complexity of the glycans increased. In that sense, glycan complexity for all the glycans discussed thus far may provide opportunities for participation in and regulation of many biological pathways.

Interestingly, the evolution of GAGs and matriglycan may also involve host-invader mechanisms. GAGs are recognized by a large number of different viruses, bacteria, fungi and parasites (288). Matriglycan was recently shown to be a receptor for Lassa fever virus (LFV) (289) and for the organism M. leprae, which causes leprosy (290). The combination of glycan complexity driving biological pathways and the opportunistic interactions that arise with invaders is consistent with the roles of glycomolecules in the natural competition for survival.

Roles of glycomolecules in the success of metazoans

Metazoans are defined as organisms with a body composed of differentiated or specialized cells. It is noteworthy that the evolution of single cells or unicellular organisms to metazoans included the conceptual conservation of certain pathways for glycomolecule synthesis, along with the development of new ones. One conceptual relationship, for example, concerns the generation of extracellular polysaccharides of functional importance. Cyanobacteria (blue-green algae) bind together to form floating mats or aggregates through interactions with the extracellular sulfated polysaccharides they produce. Such mats can trap oxygen that the organisms produce and thereby increase their buoyancy, e.g., cyanobacterial blooms (291). This evolutionary model of using glycomolecules to enhance adhesive interactions is also a common theme in all types of bacterial biofilms (292), which have multiple functions, protect microorganisms from challenges, and promote adhesion and colonization. Overall, all organisms make some type of extracellular polysaccharide with different biological functions.

This theme of using extracellular polysaccharides to enhance adhesivity is also present in metazoan cells. The sponge is the simplest multicellular organism in which glycomolecules were long ago shown to be key to survival. There are currently ~8000 species of living sponges; they are ancient, dating back to the Precambrian age, and perhaps a similar number of species can be found in fossils (293, 294). They are in the phylum Porifera, which derives from the Latin porus, highlighting the pores throughout their bodies. Interestingly, while sponges lack organized tissues, they do have specialized cells, e.g., choanocytes, porocytes, amoebocytes, and pinacocytes. Amazingly, the visible body of a sponge is a molecular skeleton, a collection of cells encased within mineralized polysaccharides and other glycomolecules. These include spongin, which is a collagen-like glycoprotein that supports the spicules. Spongin has many similarities to human collagen, and is a member of the superfamily of collagens found in the ECM of all animals (295), but not found in prokaryotes. The commercial, flexible bath sponge is produced by processing the animal to generate an acellular, spongin-rich fibrous skeleton (296).

Spongin and its collagen cousins are the most abundant and likely most ancient glycoproteins in all metazoans. Collagens are essential for normal survival of organisms, as seen in mutations of collagen genes in humans (297). There is also conservation in evolution of the main glyco-component of spongin and all collagens, which is glucosyl-galactosyl-hydroxylysine, a pathway and glyco-modification that may also be considered among the most ancient in metazoans (298). As noted above, even giant viruses make a form of collagen, but for them the linkage sugar to hydroxylysine is glucose. Amazingly, some bacteria also synthesize collagen-like glycoproteins capable of forming triple-helical structures; the O-glycosylation of the collagen-like region of BclA in B. anthracis is unlike that in animals and consists of short glycans containing rhamnose (299). Animal collagens also have conserved N-glycosylation features, with important proteostatic functions (300). Certain bacteria that inhabit sponges, such as the Poribacteria, have genomes encoding a wide variety of glycohydrolases and sulfatases that aid in their scavenging of the sponge ECM and carbohydrates for their unique survival there, but which may also participate in host-invader processes (301).

Regarding the molecules holding sponge cells together, the classic experiments of Wilson in 1907 are noteworthy. Wilson demonstrated that when cells of two species of sponges are dissociated and mixed, they can reassociate over time, i.e., reaggregate, to regenerate the distinct sponges (302). This whole-body regeneration (WBR) in sponges is one of their amazing attributes, as some animals can regenerate only parts of their body (303). The factors important in specific sponge cell aggregation are termed aggregation factors (AFs), which are types of proteoglycans with sulfated polysaccharides attached, and which mediate the calcium-dependent polyvalent interactions between glycans, i.e., homophilic interactions, and interactions with cell surface receptors (304,305,306,307). Sponges synthesize sulfated polysaccharides, but they do not synthesize GAGs of the types found in other animals (24). The genetic evolution and diversification of AFs have been well studied, and arise through high allelism combined with domain shuffling (308). This leads to diversification that promotes allorecognition (self-non-self) in sponges, as originally discovered by Wilson. This is also an example of carbohydrate-carbohydrate interactions, which have been proposed to be a driving force of glycomolecular functions in all organisms (309). This primitive type of allorecognition involving glycomolecules and their interactions is thus carried throughout metazoan evolution.

Human glycomolecules and the human glycome

After the above considerations, it seems fitting to conclude with a discussion of human glycomolecules, as the themes and questions that arise in other organisms are applicable to human glycomolecules. A vast literature has documented the structures and functions of glycomolecules in mammalian and human cells in terms of their involvement in basement membranes, connective tissues, cell adhesion, cell signaling, nutrition, and many other pathways (50, 147, 310,311,312,313,314,315,316). A preceding section on Glycan and Glycomolecule Functions discussed specific roles of glycans in a molecular context. The human glycome and the genome encoding the information for its production overlap in many ways with those in other mammals (50, 317). Types of such overlapping glycomolecules include many different types of membrane-associated and soluble glycoproteins, e.g., mucins, proteoglycans/GAGs, GPI-anchored glycoproteins, intracellular glycoproteins (O-GlcNAc), various types of glycolipids, e.g., ceramide-based, polyisoprenoid-based, and phosphatidyl-based, and an amazing variety of oligosaccharides and polysaccharides (50). Virtually all membrane proteins and most secreted proteins in mammalian and human cells are glycosylated, often on multiple amino acids (318). The GAGs especially illustrate the non-templated nature of human glycomolecules. HS, which is essential for normal survival of an organism (319), is a linear polysaccharide made by most animal cells. It has a tremendous number of glycan modifications, e.g., acetylation, sulfation, and monosaccharide isomerism, which regulate its interactions with a variety of proteins, and yet these structural features are not encoded within a template and the absolute sequences cannot be predicted.

Much is being learned about the functions of glycans in humans, which is accelerating due to identifications of genetic mutations that alter glycomolecule production and turnover, and engineered mutations in animal models (183, 250). However, remarkably little is known about the structures of glycans within the entire human glycome, as there are no systematic studies in which all glycans in all glycomolecules throughout development and in each cell type in an adult are defined. It is common to think that the glycome of the human is the most complex of all, but since all glycomes of all animals are not known, this is perhaps an overstatement. There have been attempts to estimate just the number of glycan determinants in the human glycome, defined as the linkages of 2–6 monosaccharides recognizable by a GBP or having a specific function (320); such estimates suggest >7000 determinants, but when linked multiply on branched and large glycans, the number of glycans themselves likely reaches the millions. It may be fairer to state that the human glycome, like the glycomes of other vertebrates, is extraordinarily complex, and each differs in specific ways as the animals adapted to different environmental and developmental stresses. Some core pathways are similar in humans to those in many other eukaryotes, but some are unique to mammals.

There are some modifications of mammalian glycans in other primates that are not found in human biosynthetic pathways, e.g., α-linked galactose (321) and N-glycolylneuraminic acid (Neu5Gc) (322). The general nature and biosynthetic pathways overall are relatively conserved in all mammals, and also highly related to those in other vertebrates. The major differences between the glycome of humans and those of other organisms are in the structures of the glycans making up their glycomes. Also of interest in light of the many types of monosaccharides found in nature, human glycans are composed of a relatively unique set of monosaccharide building blocks (13), similar to those in many other higher animals, but different from those in monera, protists, fungi, and plants. Few if any pathways of glycosylation are found only in Homo sapiens and missing in other mammals. But there are some clearly unique human glycans not found in other mammals. Human milk oligosaccharides comprise hundreds of free glycans; however, significant structural differences are seen in human milk oligosaccharides, compared, for example, to those in milk from Canoidea, e.g., raccoon, mink, and dog, etc. (323). Several terminal glycan modifications on glycoproteins and glycolipids are rather uniquely human or at least unique to certain primate lineages that include humans. These include the ABO blood group system (324), the absence of α-linked galactose in humans and Old-World primates, and the absence of Neu5Gc in humans.

While the genome and transcriptome of an individual cell may be deciphered in part through the analysis of the DNA and RNA, because of the ability to amplify these molecules enzymatically, the complete glycome of a single cell is not knowable at present. An adult human may contain ~37 trillion cells, not including the microbiota (325,326,327), and there may be 200–500 defined major cell types. Thus, while the genome is shared by all these cells, the glycomes are likely to be decidedly different. Most analyses of the human glycome have focused on select biosynthetic pathways from a genetic analysis (315). A vast number of studies on glycan structures have focused on glycomolecules that are soluble. When membrane glycoproteins have been studied, such as the large families of integrins, cadherins, and protocadherins, the number of different N- and O-glycans and their diversity is astonishing (328, 329). Soluble glycomolecules include serum glycoproteins, where there is often a focus on N-glycans (330), and the free glycans in human milk (323, 331, 332). Multiple studies have also been done on cultured human cells; examples include the N-glycans in glycoproteins of HeLa cells, which represent cervical carcinoma cells (333), human embryonic kidney (HEK) cells (334), and human colorectal cell lines (335). There have also been studies on N- and O-glycomes in a few human organs or their components, e.g., brain (336,337,338,339,340), liver (341), and intestine (342).

While limited and not really systematically performed, these studies all indicate that the glycomes of human cells and organs are varied and reveal unique differences and some similarities, both quantitatively and qualitatively. Moreover, the glycoproteomes of cells and organs, i.e. the glycoproteins carrying the glycans, are also significantly different in all cell lines and organs examined. While important strides in this area are developing (343, 344), most studies are generally incomplete, due to the limitations of the analytical tools available, and the typical focus of most studies on the most abundant, identifiable glycans. There is also a limited diversity of standards available for glycans, and certainly those with unique modifications, e.g., sulfation or phosphorylation, which complicates the rigor of analyses. Studies also seldom compare all the glycans in all classes in one individual or human donor tissue to another. Furthermore, glycomics is often focused on glycans obtainable by enzyme release, e.g., PNGase F, but enzymes to release all types of glycans, e.g., O-linked GalNAc, O-linked Man, O-linked Fuc, and unusual modified amino acids mentioned above, are not available. It has been estimated that the human glycome may comprise many thousands of terminal sequences recognizable as determinants by antibodies and lectins (320). The human genome also encodes many types of GBPs that recognize microbial glycans and can distinguish human from microbial glycans (345,346,347). Such limitations of studies on the human glycome are not unique, however, as they apply to any animal glycome, and thus it is fair to say we know little overall about the total glycome of any animal.

Individual variations in the human glycome

There are many challenges to identifying the total human glycome, including technical and accessibility issues (147). Even with our limited knowledge, however, it is clear that there are age-, gender-, and disease-specific differences in the glycome, as seen, for example, for the serum N-glycome (348, 349), as well as significant differences in the human milk glycome from one individual to another (350), and from one mammalian species to another (323, 351). Such major differences in milk oligosaccharides between mammals highlight the importance of glycan structure as a key feature of glycan evolution.

Variations in the human glycome may arise from the differential expression of so-called glycogenes, of which there may be >700 (147, 352). These are genes encoding enzymes and other proteins that are involved in biosynthesis, turnover, and elaboration of glycans of all classes; this would also include their polymorphic alleles and splice forms (353). However, there are likely to be many additional genes encoding modifiers, i.e., pathways altering the glycan structure or expression independently of creating a glycosidic bond, e.g., unusual acetyltransferases, methylases, sulfotransferases, kinases, isomerases, etc. (354). These lists do not include the many hundreds of genes encoding proteins that bind to specific glycans, which could also influence the diversity and expression of glycans. The constellation of genetic factors that dictate the eventual glycomolecule production both qualitatively and quantitatively in humans is being studied, as has been done for the genetic regulation of plasma protein glycosylation (355); however, it is difficult to imagine gaining, in the near future, comprehensive knowledge about all the structures of glycomolecules in all cells and tissues. Additionally, there is strong evidence that microRNAs (miRNA, miR), which are non-coding RNAs, play important roles in regulating protein glycosylation and the overall glycome (356, 357). There are also interesting aspects of monosaccharide metabolism relative to glycomolecule production, as seen for the unusual sugar 2-keto-3-deoxy-d-glycero-d-galacto-2-nonulosonic acid (KDN) (358). KDN is a component of glycomolecules in cold-blooded vertebrates but has not yet been found in those of humans, although it is found in human plasma and tissues, and may serve to help regulate mannose metabolism. By contrast, Neu5Gc, which humans cannot synthesize, is found in some human glycomolecules through consumption of Neu5Gc in the diet and uptake by cells, and its presence induces anti-Neu5Gc antibodies (359, 360). Thus, there are continuing uncertainties about even the entire monosaccharide constellation of human glycans.

It may be concluded that studies on human glycans reveal that many glycans have structures clearly different even from those in other primates, and certainly different from those in non-primates (273, 361) (Figure 1). The results suggest that human evolution is associated with discrete changes and diversity in the glycome. However, while humans have a highly diverse set of pathways for generating glycomolecules (362), most of these are conceptually similar to those in other primates and even most higher vertebrates.

Human glycome and ABO blood groups

Among the hundreds of human glycosyltransferase genes are also many polymorphic alleles (363). One of the best examples of polymorphism is the most ancient and best studied genetic feature of human glycosylation, which is the ABO blood group system. It represents carbohydrate antigens on red blood cells (RBCs) and some other cells, including intestinal cells. These antigens are comprised of Gal, GalNAc, GlcNAc, and Fuc monosaccharides in unique linkages that define each antigen (364). While the ABO blood group is relatively common among primates, there are tremendous variations in expression, and there are many hundreds of polymorphic alleles involved in ABO generation. There is also evidence that diversity in the ABO system is driven by the microbiome (324), which may be a version of host-invader interactions. This is interesting, as there is growing evidence that microbial expression of ABO-like antigens led to the generation of anti-ABO antibodies in cognate individuals; these are restricted, however, to recognize the non-expressed antigens in an individual (365). In addition, an ancient class of GBPs in metazoans is galectins, which interestingly can recognize blood group antigens and kill microbes expressing them in a type of host vs invader response (366). Thus, it is possible that the microbiome of an individual may not only shape the antibody repertoire in that person, but also shape the types of glycans expressed in RBCs, and the intestinal tract and other associated cells.

Even for the seemingly simple case of ABO-containing glycans, we know little about the ABO glycome of humans, or about the exact structures made by any particular individual compared to another, either quantitatively or qualitatively, despite strong evidence of multiple disease susceptibilities associated with ABO blood group status (367). Only recently have studies provided insight into the N-glycans on RBC glycoproteins (368), and, remarkably, individual differences were observed between donors with different blood types. Humans also vary in many other blood types, often thought of only in regard to transfusion medicine; these include the Lewis, P, and recently described Forssman blood groups (317, 369). One could easily imagine that no two individuals would express exactly the same amounts or structures of all their blood group-related glycans, and especially the same constellation of all blood group antigens. Thus, in some ways the total glycome, if it could be known, might be as unique to an individual as the genome or fingerprints.

Therefore, many questions remain about the human glycome. We can frame these questions using the ABO system, but they apply to any specific aspect of the human glycome. As there are antibodies to many glycan determinants in the blood group systems, such tools could be used to help answer these questions. What glycans express ABO determinants in the human body? Where are they located? Which glycolipids, GAGs, and glycoproteins carry these? How do they vary between people? Do they change over time and with age? How does the ABO glycome of a person impact health and disease? Such simple questions could be asked about any cell in the body and any class of glycans. Unfortunately, the availability of multiple human materials for structural analyses is limited, and the technologies do not yet scale sufficiently to address the complete human glycome adequately, or perhaps reliably (370), at least compared to the technologies useful for studies on the human genome and proteome. For example, consider the laboratory mouse Mus musculus, a model organism that is well studied and exploited in immunology and in all types of questions in mammalian biology. While we know its genome well, we know very little about its total glycome, though some systematic studies are underway (371,372,373,374). Not surprisingly, these are revealing major differences between the glycomes of each mouse organ, and between mouse glycans and human glycans. Even different strains of mice can differ in certain aspects of the N-glycosylation of the Fc domain of their IgG (375).

Constellation of evolutionary factors combining to generate glycomolecule diversity

The above considerations imply that there are many forces at play in the evolution of glycomolecules and that there are clearly major differences in glycomes among all unicellular organisms and all metazoans. Some of the major roles of glycomolecules in these ancient and ongoing processes leading to evolutionary change among organisms are summarized below.

Some molecular considerations driving the evolution of glycomolecules

Glycosylation and sugar production are universal and appear at the earliest stages of macromolecule evolution.

Glycosylation enhances diversity of protein modifications and functions independently of the genetic code.

Carbohydrates with their multiple –OH and –NH2 groups and their nature as aldoses and ketoses permit multiple modifications and branching to generate large spatial structures.

Glycomolecules coat the surfaces of all cells, providing a plethora of biological capabilities and potential interactions, in which changes in the glycocalyx, independently of surface protein expression per se, can lead to changes in cellular interactions with other cells and their microenvironment.

Glycosylation of surface molecules participates in host-invader dynamics and provides survival advantages to the host, as glycans are recognized by GBPs, toxins, and antibodies.

Glycomolecules can function as intercellular adhesion molecules and comprise the ECM of multicellular organisms.

Acquired functions for glycans are likely associated with changes in their structures and increasing complexity.

For microbes, glycomolecules make up their biofilms, and both provide protection and promote microbial ecology.

Glycans are ancient forms of nutrient storage.

Glyco components on a single species of protein or lipid can vary tremendously, giving rise to glycoforms that greatly multiply the information content of a single encoded gene product.

Glycomolecules, such as chitin and cellulose, are essentially insoluble and provide a structural component to organisms that is both easy to generate without a template and modular, i.e. size distributions can be enormous.

The anomeric nature of monosaccharide linkages, α or β, provides a novel secondary structural feature of glycans that changes the biological functions of monosaccharides.

Many monosaccharides, being already relatively oxidized, are rather stable to heat and light and even wide pH ranges, comparable to other building blocks.

Carbohydrates readily participate in H-bonds and can promote intramolecular structure, extramolecular associations, and the solubility of glycomolecules that might otherwise be relatively insoluble.

The compartmentalization of glycomolecules and their biosynthesis within eukaryotic cells promotes novel biosynthetic pathways and allows control over assembly, storage, and release.

Many enzymes and proteins involved in glycomolecule metabolism engage different metals as key cofactors, and variations in the availability of biometals may regulate glycomolecule generation and enzyme activities (376).

Conclusion

Understanding evolution is unimaginable without considering the emergence of carbohydrates and their key roles in structure and metabolism. Carbohydrates became essential components of all types of glycomolecules beyond the nucleic acids, as well as being central players in metabolism. The elaboration of glycomolecules beyond simple sugars and monosaccharides exploited the many advantages of carbohydrates over other types of building blocks, including their inherent multivalency due to multiple functional groups, their solubility, their ease of biosynthesis and metabolism, the low energy requirements of the glycosidic linkage compared to other bonds, e.g., peptide bonds, and the template-independent nature of glycan generation and modulation. The evolutionary factors involving competition in host-invader interactions and the emergence of the diverse functions of glycans associated with unique biological pathways are among several drivers of the diversity of glycomolecules in viruses and all living cells. Emerging information about the glycogenomes of organisms, the genetic regulation of glycosylation, and the unique glycomes of each organism is contributing to our understanding of the complex evolution of glycan structure and function. However, as the glycome cannot yet be directly predicted from the genome, new tools and analytical technologies are likely to further illuminate how we think about glyco-evolution and the dynamic landscape of glycan expression, recognition, and function in all life forms.

eISSN:
2719-8634
Language:
English
Publication frequency:
Volume Open
Journal subjects:
Chemistry, Biochemistry, Life Sciences, Evolutionary Biology, Philosophy, History of Philosophy, other, Physics, Astronomy and Astrophysics