Structure and function of carbohydrates
Subtopics
Definition and classification of carbohydrates
Monosaccharides: classification, physical and chemical properties
Disaccharides: maltose, lactose, sucrose
Oligosaccharides and their nomenclatures
Polysaccharides: starch, glycogen and cellulose
Glycoproteins: simple glycoproteins, proteoglycans and mucins
Introduction to carbohydrates
Overview
Carbohydrates or saccharides (Greek: sakcharon, sugars) are polyhydroxy aldehydes or ketones, or
substances that yield such compounds upon hydrolysis. They are carbon-based biomolecules containing
an aldehyde or ketone functional group and two or more hydroxyl groups. Carbohydrates are the most
abundant class of biomolecules on Earth. The name carbohydrate, which literally means “carbon
hydrate,” stems from the empirical formula for many carbohydrates, which is roughly (C·H2O)n, where n
≥ 3. Some carbohydrates contain nitrogen, phosphorus, or sulfur. There are three major size classes of
carbohydrates. These are monosaccharides, oligosaccharides, and polysaccharides.
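The empirical formula above can be checked with a short sketch. The function names are illustrative and not part of the original notes; the atom counts used for glucose (C6H12O6) and sucrose (C12H22O11) are standard values. The example shows why the formula holds only "roughly": a disaccharide loses one H2O when its glycosidic bond forms.

```python
def carbohydrate_formula(n: int) -> str:
    """Return the molecular formula implied by (C.H2O)n, e.g. C6H12O6 for n = 6."""
    if n < 3:
        raise ValueError("the empirical formula applies for n >= 3")
    return f"C{n}H{2 * n}O{n}"


def fits_empirical_formula(c: int, h: int, o: int) -> bool:
    """True when the atom counts match (C.H2O)n exactly for some n >= 3."""
    return c >= 3 and h == 2 * c and o == c


print(carbohydrate_formula(6))              # C6H12O6
print(fits_empirical_formula(6, 12, 6))     # glucose: True
print(fits_empirical_formula(12, 22, 11))   # sucrose: False (one H2O short)
```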
Monosaccharides are the simplest building blocks of carbohydrates; they consist of a single polyhydroxy
aldehyde or ketone unit. They typically contain from three to nine carbon atoms that are bound to
multiple hydroxyl groups. They vary in the number of carbon atoms (size) and in the stereochemical
configuration at one or more carbon centers. They cannot be further hydrolyzed to simpler forms.
Monosaccharides of more than four carbons tend to have cyclic structures. The six-carbon sugar D-
glucose or dextrose is the most abundant monosaccharide in nature. It is produced from CO2 and H2O in
the process of photosynthesis. Many monosaccharides are also synthesized from simpler substances in a
process named gluconeogenesis. Nearly all biological molecules can be generated from the products of
photosynthesis and gluconeogenesis. Monosaccharides are important components of nucleic acids and
complex lipids.
Oligosaccharides are short chains of monosaccharide units, or residues, covalently linked together by
characteristic linkages called glycosidic bonds. A few types of monosaccharide units can be joined to
form a large variety of oligosaccharides and polysaccharides. The most abundant oligosaccharides are
the disaccharides, with two monosaccharide units. Sucrose (cane sugar), which consists of the six-carbon
sugars D-glucose and D-fructose, is a typical disaccharide. Oligosaccharides are often associated with
proteins (glycoproteins) and lipids (glycolipids) in which they have both structural and regulatory
functions. Glycoproteins and glycolipids are collectively called glycoconjugates.
Polysaccharides are carbohydrate polymers containing 20 or more monosaccharide units. Insoluble
polysaccharides such as cellulose in plants, glycosaminoglycans (GAGs) in animals, and chitin in fungi are
important structural materials. GAGs, large polymers made up of many repeats of dimers, are key
components of cartilage. These polymers serve as structural and protective elements in the cell walls of
bacteria and plants and in the connective tissues of animals. Others such as starch and glycogen are
storage polysaccharides. Starch is an important nutritional reservoir in plants, whereas glycogen serves this role in
animals. Still other polysaccharide polymers lubricate skeletal joints. Structural polysaccharides such as
cellulose have linear chains whereas storage polysaccharides such as glycogen and starch have branched
chains. Both glycogen and cellulose consist of recurring units of D-glucose, but they differ in the type of
glycosidic linkage and consequently have strikingly different properties and biological roles.
Carbohydrates are not only the most important sources of energy for all living organisms; they are also
the most complex sources of information for molecular recognition. The oxidation of carbohydrates
through metabolic breakdown of monosaccharides is the central energy-yielding pathway of cellular life
forms. All cells are coated in a dense and complex coat of oligosaccharides. The diverse oligosaccharide
structures displayed on cell surfaces are well-suited as sites of interaction between cells and their
environments. Oligosaccharide-containing glycoproteins are important mediators of cell–cell
recognitions such as fertilization, differentiation and the aggregation of cells to form tissues and organs.
Glycoproteins are also implicated in interactions between a variety of pathogenic bacteria and viruses and
their host cells. They are the basis of human blood groups.
A key property of carbohydrates is the possibility of attaining tremendous structural complexity and
diversity. The branched structures of oligosaccharides greatly increase their complexity and diversity.
The sheer diversity and complexity of oligosaccharides and polysaccharides suggest that carbohydrates
are information-rich molecules. They can augment the already immense diversity of proteins. Cell-
surface oligosaccharides are important for intercellular communications but not for intracellular
housekeeping function. Besides, secreted proteins are often glycoproteins that are extensively
decorated with oligosaccharides essential to their structures and functions. Oligosaccharides covalently
attached to proteins or lipids act as signals that participate in recognition and adhesion between cells.
The intracellular location or metabolic fate of these hybrid molecules, glycoconjugates, is determined by
carbohydrates.
Glycobiology
Glycobiology is the field of study that characterizes the structure and function of carbohydrates and
their conjugates. It deals with the synthesis and degradation as well as how carbohydrates are attached
to and recognized by other molecules such as proteins. Glycome refers to all of the carbohydrates and
carbohydrate-associated molecules that cells produce. It complements genomics (for DNA) and
proteomics (for proteins). It is clear that the glycome is very dynamic. It varies depending on the
species and on the cellular or environmental conditions.
The branched structure of oligosaccharides greatly increases their complexity and hence the difficulty in
determining their sequences. The complexity of an organism’s glycome greatly exceeds that of its
proteome due to the diversity of the glycome’s constituent monosaccharides and the number of ways they
can interact with one another and with proteins. Characterization of oligosaccharides is complicated by
microheterogeneity, which often has biological significance. Because the biosynthesis of carbohydrates
is not under direct genetic control, there is currently no method for amplifying them. Methods for
synthesizing specific oligosaccharides have been hampered by extensive branching of oligosaccharides,
their large number of functional groups that must be differentially protected during elongation
reactions, and the chiral nature of the glycosidic bond. Thus, the only way of obtaining sufficient quantities
of a particular polysaccharide has been to isolate it from natural sources.
A carbohydrate microarray consists of several thousand different oligosaccharides (baits) that are
covalently or physically immobilized at specific sites on a solid surface (glass slide). It is probed with a
fluorescently labeled prey (protein, RNA, or cell type) to identify the carbohydrates that specifically bind
to that prey. The mixture is incubated, rinsed, and subsequently the oligosaccharides to which the
prey binds are identified by the fluorescence at their corresponding positions. Carbohydrate microarrays
have been employed in both basic and applied research including disease diagnosis and the
development of carbohydrate-based drugs and vaccines.
Monosaccharides
Monosaccharides are aldehyde or ketone derivatives of straight-chain polyhydroxy alcohols
containing at least three carbon atoms. They are colorless, sweet and crystalline solids that are freely
soluble in water but insoluble in nonpolar solvents. Their backbones are unbranched and all the carbon
atoms are linked by single bonds. The carbons of a monosaccharide are numbered beginning at the end
of the chain nearest to the carbonyl group. Many of the hydroxyl groups are attached to chiral carbon
centers generating several sugar stereoisomers found in nature. There are two families of
monosaccharides based on the chemical nature of their carbonyl group. Aldoses are monosaccharides in
which the carbonyl group is at an end of the carbon chain and hence have an aldehyde functional group.
The smallest aldose is glyceraldehyde (GAL), which has D- and L-forms. Ketoses are monosaccharides which
have a ketone functional group. The smallest ketose is dihydroxyacetone (DHA).
Classification
Monosaccharides are classified according to the number of their carbon atoms. The most common
monosaccharides with three, four, five, six, and seven carbon atoms in their backbones are called,
respectively, trioses, tetroses, pentoses, hexoses, and heptoses. The hexoses are the most common
monosaccharides in nature. The terms indicating the two families can be combined with those terms
indicating the number of carbon atoms. For example, glucose is an aldohexose whereas ribulose is a
ketopentose.
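The combined naming described above can be sketched as a small lookup. The dictionary and function names are hypothetical helpers introduced for illustration; the prefixes simply restate the group names given in the text (trioses through heptoses).

```python
# Stems of the group names for three to seven carbon atoms (tri-ose, tetr-ose, ...).
CARBON_STEM = {3: "tri", 4: "tetr", 5: "pent", 6: "hex", 7: "hept"}


def class_name(family: str, carbons: int) -> str:
    """Combine the family ("aldo" or "keto") with the carbon-count stem."""
    if family not in ("aldo", "keto"):
        raise ValueError("family must be 'aldo' or 'keto'")
    if carbons not in CARBON_STEM:
        raise ValueError("only three to seven carbons are covered here")
    return f"{family}{CARBON_STEM[carbons]}ose"


print(class_name("aldo", 6))  # glucose is an aldohexose
print(class_name("keto", 5))  # ribulose is a ketopentose
print(class_name("aldo", 3))  # glyceraldehyde is an aldotriose
```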
Configuration and conformation
Configuration refers to the covalent binding pattern in a molecule and it is constant for a molecule.
Configuration is static; the primary structure is the configuration. Conformation refers to the three-
dimensional structure of a molecule and it depends on molecular environment (temperature, pH,
dielectric constant, ionic strength and interaction with other molecules). Conformation is dynamic; there
is always a range of different structures that the molecules sample at equilibrium. Secondary, tertiary
and quaternary structures are conformations. Two conformations of a molecule can be interconverted
without the breakage of covalent bonds whereas two configurations can be interconverted only by
breaking covalent bonds.
Stereoisomers
Isomers are two or more molecules that have the same molecular formula but different structures. They
have different physical and chemical properties. Constitutional isomers differ in the order of attachment
of atoms. For example, DHA and GAL are constitutional isomers. On the other hand, stereoisomers have
atoms that are connected in the same order but differ in spatial arrangement. Because chiral stereoisomers are
optically active, they are also called optical isomers. All monosaccharides except dihydroxyacetone contain one or
more asymmetric (chiral) carbon centers. Therefore, they exist in a variety of stereochemical forms. The
structural formula of aldohexoses indicates that all but two of their carbon atoms, C1 and C6, are chiral
centers. In general, n-carbon aldoses have n − 2 chiral centers and 2^(n−2) stereoisomers. Glyceraldehyde has
three carbons, one chiral center and 2^1 = 2 stereoisomers; aldohexoses have six carbons, four chiral
centers and 2^4 = 16 stereoisomers, of which glucose is one. The position of the carbonyl
group gives keto sugars one less asymmetric center than their isomeric aldoses. Therefore, n-carbon
ketoses have n − 3 chiral centers and 2^(n−3) stereoisomers. The most common ketoses have their ketone
group at C2.
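The counting rules above can be written out as a short worked example. The function names are illustrative only, and the sketch assumes straight-chain sugars with the ketone group of ketoses at C2, as stated in the text.

```python
def chiral_centers(carbons: int, family: str) -> int:
    """Number of chiral centers: n - 2 for aldoses, n - 3 for ketoses (ketone at C2)."""
    if carbons < 3:
        raise ValueError("monosaccharides have at least three carbon atoms")
    if family == "aldose":
        return carbons - 2
    if family == "ketose":
        return carbons - 3
    raise ValueError("family must be 'aldose' or 'ketose'")


def stereoisomer_count(carbons: int, family: str) -> int:
    """Each chiral center doubles the number of stereoisomers."""
    return 2 ** chiral_centers(carbons, family)


for carbons, family in [(3, "aldose"), (6, "aldose"), (3, "ketose"), (6, "ketose")]:
    print(carbons, family, chiral_centers(carbons, family), stereoisomer_count(carbons, family))
```

For glyceraldehyde this gives one chiral center and 2 stereoisomers, for aldohexoses four centers and 16 stereoisomers, and for dihydroxyacetone no chiral center and a single form.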
By convention, the asymmetric center farthest removed or most distant from the carbonyl group of a
monosaccharide of any carbon-chain length is called reference carbon. The stereoisomers of a
monosaccharide can be divided into two groups that differ in the configuration at the reference chiral
center. Sugars that have the same absolute configuration at the reference carbon as that of D-
glyceraldehyde are designated D isomers, and those with the same configuration as L-glyceraldehyde
are L isomers. Enantiomers are isomers that are non-superimposable mirror images of each other. D- and
L-glucose are enantiomers: they differ in configuration at every chiral center (C2 through C5). Fischer projection formulas can be used to represent
three-dimensional sugar structures on paper. When the hydroxyl group on the reference carbon is on
the right in the projection formula, the sugar is the D isomer; when on the left, it is the L isomer. As for
other biomolecules with chiral centers, the absolute configurations of monosaccharides are determined
by X-ray crystallography.
Diastereoisomers are stereoisomers that are not mirror images of each other. Epimers are two sugars that
differ in the configuration at only a single asymmetric carbon atom. Thus, D-glucose and D-mannose are
epimers with respect to C2, whereas D-glucose and D-galactose are epimers with respect to C4. Epimers
are diastereomers. Isomeric forms of monosaccharides that differ only in their configuration about the
hemiacetal or hemiketal carbon atom are called anomers. The hemiacetal (or carbonyl) carbon atom is
called the anomeric carbon. Anomers are isomers that differ at a new asymmetric carbon atom formed
upon ring closure.
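The relationships among enantiomers, epimers and other diastereoisomers can be illustrated by comparing Fischer projections carbon by carbon. In the sketch below, the table records on which side of the Fischer projection the hydroxyl group sits at C2 to C5 of four aldohexoses; these are standard textbook configurations rather than data from these notes, and the names `FISCHER`, `differing_carbons` and `relationship` are hypothetical.

```python
# Side of the hydroxyl group in the Fischer projection at each chiral carbon.
FISCHER = {
    "D-glucose":   {2: "right", 3: "left",  4: "right", 5: "right"},
    "D-mannose":   {2: "left",  3: "left",  4: "right", 5: "right"},
    "D-galactose": {2: "right", 3: "left",  4: "left",  5: "right"},
    "L-glucose":   {2: "left",  3: "right", 4: "left",  5: "left"},
}


def differing_carbons(a: str, b: str) -> list:
    """Chiral carbons at which the two sugars have opposite configurations."""
    return [c for c in sorted(FISCHER[a]) if FISCHER[a][c] != FISCHER[b][c]]


def relationship(a: str, b: str) -> str:
    """Classify two aldohexoses from the number of differing chiral centers."""
    diff = differing_carbons(a, b)
    if not diff:
        return "identical"
    if len(diff) == len(FISCHER[a]):
        return "enantiomers"          # mirror images: every center inverted
    if len(diff) == 1:
        return f"epimers at C{diff[0]}"
    return "diastereoisomers"         # neither mirror images nor epimers


print(relationship("D-glucose", "D-mannose"))    # epimers at C2
print(relationship("D-glucose", "D-galactose"))  # epimers at C4
print(relationship("D-glucose", "L-glucose"))    # enantiomers
print(relationship("D-mannose", "D-galactose"))  # diastereoisomers
```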
D-sugars are biologically much more abundant than L-sugars. D-Glucose commonly occurs in nature as a
monosaccharide. Several other aldoses such as D-GAL, D-ribose, D-mannose, and D-galactose are
important components of larger biological molecules. The four- and five-carbon ketoses are designated
by inserting “ul” into the name of a corresponding aldose. The most abundant ketohexose is D-fructose
(Latin: fructus, fruit). Fructose is commonly used as a sweetener and fruits are rich in fructose. Fructose
is converted into glucose derivatives inside the cell. Other biologically prominent ketoses include DHA,
D-ribulose, D-xylulose and D-sorbose. Sorbus is the genus of mountain ash which has berries rich in the
related sugar alcohol sorbitol. Some sugars such as L-arabinose occur naturally in their L forms.
Table of aldose sugars
Carbon atoms   Group name   Examples (suffix -ose)
3              Trioses      Glyceraldehyde
4              Tetroses     Erythrose, Threose
5              Pentoses     Ribose (Rib), Arabinose (Ara), Xylose (Xyl), Lyxose (Lyx)
6              Hexoses      Allose, Altrose, Glucose (Glc), Mannose (Man), Gulose, Idose, Galactose (Gal), Talose
7              Heptoses
Simple sugars serve not only as fuel molecules but also as fundamental constituents of biological
macromolecules. Glucose is an essential energy source for virtually all forms of life. RNA has a backbone
consisting of alternating phosphoryl groups and D-ribose, a cyclic five-carbon sugar. D-deoxyribose is the
monosaccharide component of DNA.
Table of ketose sugars
Carbon atoms   Group name   Examples (-ul- inserted before the suffix)
3              Trioses      Dihydroxyacetone
4              Tetroses     Erythrulose
5              Pentoses     Ribulose, Xylulose
6              Hexoses      Psicose, Fructose, Sorbose, Tagatose
7              Heptoses
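The "ul" convention for four- and five-carbon ketoses can be expressed as a simple string rule. This is only a sketch with a hypothetical function name; it does not apply to the trioses or to the ketohexoses in the table, which carry trivial names such as dihydroxyacetone, fructose and sorbose.

```python
def ketose_name(aldose: str) -> str:
    """Insert "ul" before the -ose suffix of a four- or five-carbon aldose name."""
    if not aldose.endswith("ose"):
        raise ValueError("expected a sugar name ending in -ose")
    return aldose[:-3] + "ulose"


for aldose in ("erythrose", "ribose", "xylose"):
    print(aldose, "->", ketose_name(aldose))
```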
Chemical properties
Monosaccharides are chemically reactive molecules. The chemistry of monosaccharides is largely the
function of their carbonyl and hydroxyl groups. The three most common reaction partners are alcohols,
phosphates, and amines. Monosaccharides can also undergo oxidation-reduction reactions. The
carbonyl groups of aldoses can be oxidized to aldonic acids or reduced to alditols, whereas the
terminal primary hydroxyl groups can be oxidized to uronic (glycuronic) acids.
Reaction with alcohols
Reaction of monosaccharides with alcohol functional groups is the chemical basis for the formation of
cyclic sugar structures and polymerization of monosaccharides through glycosidic bonds. The carbonyl groups
of aldehydes and ketones react with alcohols in a 1:1 ratio to yield hemiacetals and hemiketals,
respectively. Likewise, the aldehyde or the ketone functional groups of monosaccharides can react
intramolecularly with hydroxyl groups along the chain to form cyclic intramolecular hemiacetals and
hemiketals. This is the chemical basis for the spontaneous conversion of the open-chain form of
monosaccharides into ring or cyclic form.
Cyclic structures
Many common monosaccharide units occur predominantly as cyclic structures in aqueous solutions and
inside the cells. For an aldohexose such as D-glucose, the C1 aldehyde in the open-chain form reacts
with the C5 hydroxyl group to form an intramolecular hemiacetal. The resulting cyclic, six-membered
ring is called D-glucopyranose because it resembles pyran, the simplest compound containing such a six-membered ring.
Only aldoses having five or more carbon atoms can form pyranose rings. For a ketohexose such as
fructose, the C2 keto group in the open-chain form reacts with either the C6 or C5 hydroxyl group to
form a six- or five-membered ring. D-Fructose readily forms the five-membered D-fructofuranose
ring, named in analogy with furan.
The formation of cyclic or ring form of a monosaccharide renders the former carbonyl carbon
asymmetric generating yet another type of diastereoisomeric pairs. The creation of a new chiral center
adds further stereochemical complexity to this class of compounds. The resulting pairs of diastereomers,
designated as α- and β-forms, are known as anomers. In this designation, α means that the hydroxyl
substituent to the anomeric carbon is on the opposite side of the sugar ring from the CH2OH group at the
chiral center that designates the D or L configuration (C5 in hexoses), whereas β means that the hydroxyl
group is on the same side of the ring as that CH2OH group (C6 in hexoses). The Haworth perspective
formula is a very convenient representation of the configurations of
the substituents to each carbon atom and the stereochemistry of the cyclic form of a monosaccharide.
In these depictions, the carbon atoms in the ring are not written out. The plane of the ring is nearly
perpendicular to the plane of the paper and heavy lines on the ring show projections toward the reader.
The biological properties and functions of monosaccharide units are determined by their specific three-
dimensional conformations.
Hexoses and pentoses may each assume pyranose or furanose forms. The stabilities of the five- and six-
membered rings of these sugars are so great that their cyclic forms predominate in aqueous solution
and inside the cell. The linear forms are normally present in only minute amounts. The six-membered
aldopyranose ring is much more stable than the aldofuranose ring and predominates in aldohexose
solutions. Glucose almost exclusively assumes its D-glucopyranose form in aqueous solutions. In glucose,
C1 (the carbonyl carbon atom in the open-chain form), which becomes a new asymmetric center, is
called anomeric carbon atom. The two anomeric forms are α-D-glucopyranose and β-D-glucopyranose.
An equilibrium mixture of D-glucose contains approximately one-third α-anomer, two-thirds β-anomer,
and less than 1% of the open-chain form. The two anomeric forms interconvert in aqueous solution
only by passing through the open-chain form of glucose, a process called mutarotation.
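The one-third to two-thirds split can be reproduced from optical rotation data. The sketch below is a worked example under stated assumptions: the specific rotations (+112.2° for pure α-D-glucopyranose, +18.7° for pure β-D-glucopyranose, +52.7° at equilibrium) are commonly quoted textbook values that do not appear in these notes, and the tiny open-chain fraction is neglected so that the two anomer fractions sum to one.

```python
def anomer_fractions(rot_alpha: float, rot_beta: float, rot_equilibrium: float):
    """Return (alpha_fraction, beta_fraction) from specific rotations in degrees.

    Assumes the observed rotation is the fraction-weighted average of the two
    anomers: rot_eq = f_alpha * rot_alpha + f_beta * rot_beta, f_alpha + f_beta = 1.
    """
    if rot_alpha == rot_beta:
        raise ValueError("the two anomers must have different specific rotations")
    beta = (rot_alpha - rot_equilibrium) / (rot_alpha - rot_beta)
    return 1.0 - beta, beta


alpha, beta = anomer_fractions(112.2, 18.7, 52.7)
print(f"alpha: {alpha:.3f}, beta: {beta:.3f}")  # alpha: 0.364, beta: 0.636
```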
Fructose forms both six-membered pyranose and five-membered furanose rings. In fructose, C2 (the
carbonyl carbon atom in the open-chain form), which becomes a new asymmetric center, is the
anomeric carbon atom. Fructose that is free in solution is 67% D-fructopyranose and 33% D-
fructofuranose. In each case, both α and β anomers are possible. The β-D-fructopyranose form that
predominates in aqueous solution is one of the sweetest chemicals known. It is commonly used as
sweetener. It is found in honey and high fructose corn syrup. Heating converts β-D-fructopyranose form
into the β-D-fructofuranose form, reducing the sweetness of the solution. That is why corn syrup with a
high concentration of fructose in the β-D-fructopyranose form is used as a sweetener in cold, but not
hot, drinks. Excessive fructose consumption has been linked to fatty liver, insulin insensitivity and obesity. On the
other hand, the furanose form predominates in many fructose derivatives. The most common anomer
of fructose in combined forms or in derivatives is β-D-fructofuranose. Similarly, the pyranose form of
ribose predominates in aqueous solution whereas its furanose form predominates in many ribose
derivatives such as DNA and RNA. Ribose that is free in solution is 75% pyranose and 25% furanose.
The use of Haworth projections may lead to the erroneous impression that furanose and pyranose rings
are planar. They are not planar because of the sp3-hybridized tetrahedral geometry of their saturated
carbon atoms. The nonplanar pyranose ring, like the substituted cyclohexane ring, may adopt two
classes of conformations, termed chair and boat because of the resemblance to these objects. The
substituents to the ring on the chair conformer can have two orientations: axial (ax) and equatorial (eq).
Axial groups are nearly perpendicular to the average plane of the ring or parallel to the threefold rotational
axis, whereas equatorial groups are roughly parallel to this plane. Axial substituents are rather close-
fitting and sterically hinder each other if they emerge on the same side of the ring. In contrast,
equatorial substituents are staggered and therefore minimally crowded. The chair form of β-D-glucopyranose
is the more stable conformer because only hydrogen atoms occupy its axial positions, whereas the boat form is
disfavored because of steric hindrance. Note that β-D-glucose is the only D-aldohexose that can
simultaneously have all five non-H substituents in equatorial positions.
Like pyranose rings, furanose rings are not planar. They can be puckered so that four atoms are nearly
coplanar and the fifth atom is about 0.5 Å away from this plane. This conformation is called an envelope
form because the structure resembles an opened envelope with the back flap raised. In the ribose
moiety of most biomolecules, either C2 or C3 is out of the plane and on the same side as C5. These
conformations are called C2-endo and C3-endo, respectively. The remaining four atoms lie
approximately in a plane.
Glycosidic bonds
Addition of an alcohol to the hemiacetal or hemiketal form of a monosaccharide produces an acetal or ketal.
When the alcohol functional group is part of another sugar molecule, the bond produced is called a glycosidic
bond. If it connects the anomeric carbon atom of a carbohydrate to the oxygen atom of an alcohol, it is called
an O-glycosidic bond. In an acid-catalyzed condensation, the anomeric hydroxyl substituent of a sugar
reversibly reacts with alcohols to form α- and β-glycosidic bonds. This reaction represents the formation
of an acetal from a hemiacetal and an alcohol (a hydroxyl group). Individual monosaccharide units are
held together by O-glycosidic bonds to form long polysaccharide polymers. Therefore, the glycosidic bond is
the carbohydrate analog of the peptide bond in proteins. Some oligosaccharides are linked to proteins
by O-glycosidic bonds.
If a bond joins the anomeric carbon atom of a sugar to a nitrogen atom, as in glycoproteins and nucleotides,
it is called an N-glycosidic bond. N-glycosidic bonds link ribose residues to nitrogenous
bases in nucleosides. The reversal of glycoside formation is hydrolysis, that is, attack by H2O on the glycosidic bond.
Glycosidic bonds are readily hydrolyzed by acid but resist cleavage by base. Disaccharides can be
hydrolyzed to yield their free monosaccharide components by boiling with dilute acid. Glycosidases
catalyze the hydrolysis of glycosidic bonds. Glycosidic bonds do not undergo mutarotation. Glycosidases
differ in specificity according to the identity and anomeric configuration of the glycoside.
Reactions with phosphates and amines
The biochemical properties of monosaccharides can be modified by reaction with other molecules such
as alcohols, amines and phosphates. Monosaccharides are modified by alcohols and amines through the
formation of glycosidic bonds. On the other hand, they are modified by phosphates through the
formation of phosphate ester bonds. Monosaccharides can also be modified by the addition of
functional groups to carbons other than the anomeric carbon. These modifications increase the
biochemical versatility of carbohydrates, enabling them to serve as signal molecules or facilitating their
metabolism. Monosaccharides in which a hydroxyl group in the parent compound is replaced with other
substituents are called sugar derivatives. Living organisms contain a wide variety of biologically
important sugar derivatives.
Reactions with phosphates
Monosaccharide units in which a phosphoryl group is transferred from ATP to a hydroxyl group of the
sugar are known as phosphorylated sugars. Phosphorylation is a common modification of sugars.
Phosphorylated sugars are critical intermediates in energy generation and biosynthesis. Phosphorylated
sugars are reactive intermediates that undergo metabolism more readily than their parent sugars. The conversion of
glucose into glucose 6-phosphate is an example of metabolite activation. Besides, several multiply
phosphorylated derivatives of ribose play key roles in the biosynthesis of purine and pyrimidine
nucleotides. Phosphorylation not only activates sugars for subsequent chemical transformation but it is
also used as a regulatory mechanism. Phosphorylation makes sugars anionic at neutral pH, which traps
the phosphorylated sugar inside the cell. Phosphorylated sugars are prevented from spontaneously
leaving the cell by crossing lipid bilayer membranes. Phosphorylated sugars are unable to interact with
transporters of the unmodified sugars. Most cells do not have membrane transporters for
phosphorylated sugars.
Reactions with amines
Monosaccharide units in which one or more hydroxyl groups are replaced by an often acetylated amino
group are called amino sugars. Substitution of an NH2 group for an OH group in the parent monosaccharide produces
an amino sugar. The hydroxyl group at C2 of the parent compound is replaced with an amino group in α-
D-glucosamine (2-amino-2-deoxy-α-D-glucopyranose), α-D-galactosamine (2-amino-2-deoxy-α-D-
galactopyranose), and D-mannosamine. D-glucosamine and D-galactosamine are critical components of
numerous biologically important polysaccharides. The amino group is nearly always condensed with
acetic acid, as in N-acetylglucosamine (GlcNAc, NAG), N-acetylgalactosamine (GalNAc), N-acetylmuramic
acid (Mur2Ac, NAM) and N-acetylneuraminic acid (Neu5Ac, NANA) which is also known as sialic acid
(Sia). NAM consists of D-lactic acid (a three-carbon carboxylic acid) ether-linked to the oxygen at C3
of NAG. NANA is derived from N-acetylmannosamine and pyruvic acid. NAG and NAM are prominent
components of the peptidoglycan of bacterial cell walls. N-acetylgalactosamine is the carbohydrate moiety
bound to the protein in mucins. NANA is an important constituent of glycoproteins and glycolipids.
Oxidation–reduction reactions
Mild oxidation of the carbonyl (aldehyde) carbon of aldoses into carboxylic acids, either chemically or
enzymatically, yields aldonic acids. The systematic name of an aldonic acid is formed by appending the suffix
-onic acid to the root name of the parent aldose. An example is D-gluconic acid. The hydroxyl group attached
to the carbon atom at the other end of the backbone chain, such as C6 of glucose, is called a primary
alcohol. Oxidation of the primary alcohol group at the terminal carbon of aldoses yields uronic acids,
which are named by appending -uronic acid to the root name of the parent aldose. D-Glucuronic acid, D-
galacturonic acid and D-mannuronic acid are important components of many polysaccharides. At
physiologically important pH, the carboxylic acid groups of the acidic sugar derivatives are ionized. These
acidic sugars contain a carboxylate group, which confers a negative charge at neutral pH. Therefore, the
resulting compounds are also named as the carboxylates—glucuronate, galacturonate, and so forth.
Both aldonic and uronic acids form lactones. Lactones are stable intramolecular esters or cyclic esters.
D-Glucono-δ-lactone is formed by an ester linkage between the C1 carboxylate group and the C5 (also
known as the δ-carbon) hydroxyl group of D-gluconate.
Mild reduction of the carbonyl carbon of aldoses and ketoses into hydroxyl groups, either chemically by
treatment with NaBH4 or enzymatically, yields acyclic polyhydroxy alcohols known as alditols. The
systematic naming of alditols involves appending the suffix -itol to the root name of the parent aldose.
Ribitol is a component of flavin coenzymes FMN and FAD. Glycerol and the cyclic polyhydroxy alcohol
myo-inositol are important lipid components. Xylitol is a sweetener that is used in “sugarless” gum and
candies. D-Glucitol, also known as sorbitol, is produced by reduction of D-glucose.
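The three naming rules in this section (-onic acid, -uronic acid and -itol appended to the root name of the parent aldose) can be summarized in a short sketch. The dictionary and function names are hypothetical, and the rule shown takes the root to be the aldose name without its -ose ending, which reproduces the examples given in the text.

```python
# Suffix appended to the root name for each class of oxidized or reduced derivative.
SUFFIX = {
    "aldonic acid": "onic acid",    # carbonyl carbon oxidized to a carboxyl group
    "uronic acid": "uronic acid",   # terminal primary alcohol oxidized
    "alditol": "itol",              # carbonyl carbon reduced to a hydroxyl group
}


def derivative_name(aldose: str, kind: str) -> str:
    """Build the derivative name from the parent aldose, e.g. glucose -> glucitol."""
    if not aldose.endswith("ose"):
        raise ValueError("expected an aldose name ending in -ose")
    return aldose[:-3] + SUFFIX[kind]


print(derivative_name("D-glucose", "aldonic acid"))   # D-gluconic acid
print(derivative_name("D-glucose", "uronic acid"))    # D-glucuronic acid
print(derivative_name("D-galactose", "uronic acid"))  # D-galacturonic acid
print(derivative_name("xylose", "alditol"))           # xylitol
```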
Reducing sugars
Saccharides bearing free anomeric carbon atoms that are capable of reducing relatively mild oxidizing
agents such as silver (Ag+), ferric (Fe3+) or cupric (Cu2+) ion are called reducing sugars. In the hemiacetal
(ring) form, the anomeric carbon of a reducing sugar cannot be oxidized to a carboxyl
group (aldonic acid). Because the cyclic form exists in equilibrium with the linear form, oxidation proceeds
through the open-chain form, in which the carbonyl group is free. This property is the basis of Tollens’ reaction, a
classic test for the presence of a reducing sugar. The reduction of Ag+ in an ammonia solution (Tollens’
reagent) yields a metallic silver mirror lining on the inside of the reaction vessel. Another qualitative test
for the presence of a reducing sugar is Fehling’s reaction. Reducing sugars reduce cupric ion (Cu2+) in
Fehling’s solution into cuprous ion (Cu+) while being oxidized to aldonic acid. The cuprous ion (Cu+)
produced under alkaline conditions forms a red cuprous oxide precipitate.
Reducing sugars can often nonspecifically react with a free amino group of proteins to form a stable
covalent bond. These modifications, known as advanced glycation end products (AGEs), have been
implicated in aging, arteriosclerosis, and diabetes, as well as other pathological conditions. Glucose
reacts with hemoglobin to form glycosylated hemoglobin. Monitoring changes in the amount of
glycosylated hemoglobin is an especially useful means of assessing the effectiveness of treatments for
diabetes mellitus. Blood glucose concentration is commonly determined by measuring the amount of
H2O2 produced in the reaction catalyzed by glucose oxidase. In the reaction mixture, a second enzyme,
peroxidase, catalyzes reaction of the H2O2 with a colorless compound to produce a colored compound,
the amount of which is then measured spectrophotometrically.
Deoxy sugars
Monosaccharide units in which an OH group is replaced by H are known as deoxy sugars. Deoxy sugars
are found in plant polysaccharides and in the complex oligosaccharide components of glycoproteins and
glycolipids. The most important deoxy sugar is β-D-2-deoxyribose. It is the sugar part of the sugar–
phosphate backbone of DNA. L-Rhamnose (6-deoxy-L-mannose) and L-fucose (6-deoxy-L-galactose, Fuc)
are widely occurring components of biologically important polysaccharides. Note that both of these deoxy
sugars occur in nature as the L isomers.
Oligosaccharides
Oligosaccharides are short polymers of monosaccharide units that are often attached to proteins or
lipids at the cell surface. They have both structural and regulatory functions. Some oligosaccharides,
with six or more different sugars connected in branched chains, carry information. They function as
specific cellular signals and biological markers to provide highly specific points of recognition in many
cellular processes. Disaccharides such as maltose, lactose, and sucrose are the smallest and most
abundant oligosaccharides. They are formed from two monosaccharides joined covalently by an O-
glycosidic bond. An O-glycosidic bond is formed when an alcohol (OH) of one monosaccharide
condenses with the intramolecular hemiacetal anomeric carbon of another monosaccharide with
elimination of H2O.
Nomenclature
A systematic name is required to describe complex oligosaccharides unambiguously. It involves
identifying all the component monosaccharides, specifying their anomeric forms, their ring types and
how they are linked together. By convention, the first and last monosaccharide units are located at the
left (nonreducing end) and the right (reducing end) respectively. The end of an oligosaccharide chain
with a free anomeric carbon (one not involved in a glycosidic bond) is called the reducing end and it
contains the reducing residue. Three-letter abbreviations for the monosaccharides are often used in
systematic names.
The three-letter codes for the common sugars include ribose (Rib), xylose (Xyl), arabinose (Ara), glucose
(Glc), mannose (Man), galactose (Gal), rhamnose (Rha), fructose (Fru), fucose (Fuc) and abequose (Abe).
Sugar derivatives can follow the same nomenclature. Acids can be described as glucuronic acid (GlcA),
iduronic acid (IdoA), muramic acid (Mur) and neuraminic acid (Neu). Simple amines have identifying N
such as glucosamine (GlcN) and galactosamine (GalN). On the other hand, acetylated amines can be
described as N-acetylglucosamine (GlcNAc), N-acetylgalactosamine (GalNAc), N-acetylmuramic acid
(Mur2Ac) and N-acetylneuraminic acid (Neu5Ac).
There are four useful conventional rules for constructing the systematic name of a complex oligosaccharide.
First, name the configuration of the anomeric carbon of the nonreducing residue. This is specified by
either α- or β-form. Second, name the first monosaccharide unit and its ring type using the prefix
“furano” or “pyrano” to distinguish five- and six-membered rings. Third, name the carbon atoms
involved in the glycosidic bond linking together the first and the second monosaccharide units. Indicate
in parentheses using an arrow to connect the two numbers for the carbon atoms. For example, (1→4)
indicates that C1 of the first-named sugar residue is joined to C4 of the second. Fourth, name the second
monosaccharide unit following the same rules as before.
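The four rules can be sketched as a small name builder. This is only an illustration for disaccharides of D sugars in which the second residue keeps a free anomeric carbon. The function names, the tuple layout and the stem spellings ("gluco", "galacto") are assumptions made for the example.

```python
ANOMER = {"alpha": "α", "beta": "β"}
RING = {5: "furano", 6: "pyrano"}  # five- and six-membered rings


def residue_name(anomer, stem, ring_size, suffix):
    """Name one D-sugar residue, e.g. ('alpha', 'gluco', 6, 'se')."""
    return f"{ANOMER[anomer]}-D-{stem}{RING[ring_size]}{suffix}"


def disaccharide_name(first, link, second):
    """Apply the rules in order: anomer, first residue, linkage, second residue.

    first and second are (anomer, stem, ring_size) tuples and link is
    (carbon of first residue, carbon of second residue).
    The first (nonreducing) residue takes the ending 'syl'; the second
    residue keeps 'se' because its anomeric carbon is assumed to be free.
    """
    left = residue_name(*first, "syl")
    right = residue_name(*second, "se")
    return f"O-{left}-({link[0]}→{link[1]})-{right}"


maltose = disaccharide_name(("alpha", "gluco", 6), (1, 4), ("alpha", "gluco", 6))
lactose = disaccharide_name(("beta", "galacto", 6), (1, 4), ("beta", "gluco", 6))
print(maltose)
print(lactose)
```

Nonreducing disaccharides, which are named as glycosides, would need a different ending for the second residue and are deliberately left out of this sketch.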
The most common glucosyl–glucose disaccharides include maltose [O-α-D-glucopyranosyl-(1→4)-α-D-
glucopyranose], isomaltose [O-α-D-glucopyranosyl-(1→6)-α-D-glucopyranose], trehalose [O-α-
D-glucopyranosyl-(1→1)-α-D-glucopyranoside] and cellobiose [O-β-D-glucopyranosyl-(1→4)-β-D-
glucopyranose]. Isomaltose and the limit dextrins are fragments of amylopectin that are not easily digested
because of their α-(1→6) branch points. They are hydrolyzed by α-dextrinase or debranching enzyme. Trehalose,
Glc (α1↔α1) Glc, is a nonreducing sugar. It is a major constituent of the circulating fluid (hemolymph) of
insects, serving as an energy-storage compound.
Maltose
Maltose is a disaccharide formed from two molecules of D-glucose joined by a glycosidic bond between
C1 (the anomeric carbon) of one glucose residue and C4 of the other. The configuration of the anomeric
carbon atom forming the glycosidic bond is α. Hence, the abbreviated systematic name for maltose is
Glc (α1→4) Glc. Since maltose retains a free anomeric carbon (C1) of the second glucose residue on the
right, it is a reducing sugar. Maltose is hydrolyzed by maltase, an enzyme found in lysosomes. Maltase is
α-1→4 glucosidase. Enzymatic hydrolysis of starch produces maltose.
Lactose
Lactose or milk sugar is a disaccharide formed from one molecule of D-galactose and one molecule of D-
glucose joined by a glycosidic bond between C1 of galactose residue and C4 of the glucose residue. The
configuration of the anomeric carbon atom forming the glycosidic bond is β. Hence, the systematic
name for lactose is [O-β-D-galactopyranosyl-(1→4)-β-D-glucopyranose] or in short Gal (β1→4) Glc. The
free anomeric carbon of its glucose residue makes lactose a reducing sugar. Lactose is hydrolyzed to its
component monosaccharides for absorption into the blood stream by the intestinal enzyme lactase or β-
D-galactosidase. Infants normally express lactase. However, most African and almost all Asian adults
have very low levels of lactase. Consequently, much of the lactose in milk products moves through their
digestive tract to the large intestine. Lactose is a good source of energy for bacterial fermentation in the colon.
Bacterial fermentation converts lactose into irritating lactic acid and large quantities of CH4, CO2 and H2.
These gases cause flatulence. Lactic acid draws water into the intestine by osmosis, resulting in diarrhea.
The resulting abdominal cramps and diarrhea are the symptoms of lactose intolerance. Lactase-containing pills and
lactose free milk products are now widely available.
Sucrose
Sucrose or table sugar is a disaccharide of one molecule of D-glucose and one molecule of D-fructose
joined by a glycosidic bond between C1 of the glucose residue and C2 of the fructose residue. The
configuration of the anomeric carbon atom forming the glycosidic bond is α. Hence, the systematic
name for sucrose is [O-α-D-glucopyranosyl-(1→2)-β-D-fructofuranoside] or in short Glc (α1↔2β) Fru.
Sucrose has no free anomeric carbon and hence it is a nonreducing sugar. Nonreducing disaccharides
are named as glycosides. Sucrose can also be represented as Fru (2β↔α1) Glc. Sucrose is the most
abundant disaccharide. It is the major intermediate product of photosynthesis throughout the plant
kingdom. Sucrose is synthesized in the cytoplasm and transported from the leaves to other parts of the
plant body. However, sucrose cannot be synthesized in animals. Sucrose is hydrolyzed to its component
monosaccharides D-glucose and D-fructose by sucrase or β-D-fructofuranosidase (Fru (2β→ α1) Glc).
The hydrolysis of sucrose is accompanied by a change in optical rotation from dextro to levo.
Consequently, hydrolyzed sucrose is sometimes called invert sugar and the enzyme that catalyzes this
process is archaically named invertase.
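The reducing or nonreducing character of the disaccharides described above follows from a single check: whether the second residue still has a free anomeric carbon. The sketch below encodes that rule. It assumes that the first residue always donates its anomeric carbon to the glycosidic bond, as in every example in this section; the dictionary and function names are hypothetical.

```python
# Anomeric carbon: C1 for the aldoses, C2 for the ketose fructose.
ANOMERIC_CARBON = {"Glc": 1, "Gal": 1, "Man": 1, "Fru": 2}


def is_reducing(first, from_c, to_c, second):
    """Return True if the second residue keeps a free anomeric carbon."""
    if from_c != ANOMERIC_CARBON[first]:
        raise ValueError("first residue must be linked through its anomeric carbon")
    return to_c != ANOMERIC_CARBON[second]


DISACCHARIDES = {
    "maltose": ("Glc", 1, 4, "Glc"),
    "lactose": ("Gal", 1, 4, "Glc"),
    "sucrose": ("Glc", 1, 2, "Fru"),
    "trehalose": ("Glc", 1, 1, "Glc"),
}

for name, linkage in DISACCHARIDES.items():
    label = "reducing" if is_reducing(*linkage) else "nonreducing"
    print(f"{name}: {label}")
```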
Polysaccharides
Polysaccharides or glycans are large polymers of monosaccharides linked together by glycosidic bonds
which can be made to any of the hydroxyls of a monosaccharide. Unlike proteins and nucleic acids,
polysaccharides can form branched as well as linear polymers. Given the variety of different
monosaccharides that can be put together in any number of arrangements, the number of possible
polysaccharides is huge. Homopolysaccharides consist of only one type of monosaccharide residue, whereas
heteropolysaccharides consist of more than one type. Homopolysaccharides may be
further classified on the basis of their monomeric unit. Polymers of glucose are called glucans, whereas
polymers of galactose are called galactans.
Storage polysaccharides
Glucose is an important source of energy for virtually all life forms. However, free glucose molecules
cannot be stored in large amounts without upsetting the osmotic balance of the cell. Glucose is instead stored in the form of
readily accessible storage polysaccharides such as starch and glycogen. The storage of glucose as
polysaccharides greatly reduces the osmotic pressure that the same amount of free glucose would generate. Starch and glycogen are
glucans in which the component glucose molecules are linked together by α-1→4 glycosidic bonds. The
α-1→4 glycosidic bonds are responsible for the hollow helical structures of starch and glycogen. This
compact and accessible hollow helix is well-suited for storage.
Starch
Starch is the nutritional reservoir in plants which is synthesized and deposited in the cytoplasm of plant
cells as insoluble granules composed of α-amylose and amylopectin. The α-amylose form is a linear
polymer of several thousand glucose residues linked by α-(1→4) glycosidic bond. It is the unbranched
form of starch. Amylopectin is the branched form of starch containing about one α-(1→6) branching
point per approximately 27 α-(1→4) linkages. The α-amylose form adopts an irregularly aggregating
helically coiled conformation due to α-(1→4) glycosidic bonds. This left-handed helix has 6 regularly
repeating glucose residues per turn.
Starch is a major source of carbohydrates for animals. It is rapidly hydrolyzed by α-amylase, an enzyme
secreted by the salivary glands and the pancreas. Salivary and pancreatic α-amylases hydrolyze all α-(1→
4) glycosidic bonds of starch except the outermost bonds and those next to branches. Salivary α-amylase
is inactivated by the low pH in the stomach. These enzymes degrade starch to a mixture of the
disaccharide maltose, the trisaccharide maltotriose and the oligosaccharide dextrin. Maltotriose
contains three α-(1→ 4) linked glucose residues whereas dextrin contains three α-(1→ 4) linked glucose
residues and at least one α-(1→6) branch. Dextrin is the end product of the exhaustive digestion of
amylopectin by α-amylase. Maltose is cleaved into two glucose molecules by maltase. Maltotriose and
other undigested oligosaccharides are digested by α-glucosidase which removes one glucose residue at
a time from oligosaccharides. Dextrin is hydrolyzed by α-dextrinase or debranching enzyme which
hydrolyzes both α-(1→6) and α-(1→4) bonds. These specific enzymes are contained in the intestinal
brush border (the fingerlike microvilli of intestinal epithelial cells). The resulting monosaccharides are
absorbed by the intestinal membrane and transported to the bloodstream.
Glycogen
Glycogen is the major storage form of carbohydrate in animals which is synthesized and deposited in the
cytoplasm of animal cells as cytoplasmic granules of a large, branched polymer of glucose residues. It is
present in all cells but is most prevalent in skeletal muscle and liver cells. Most of the glucose units in
glycogen are linked by α-(1→4) glycosidic bonds. The branches are formed by α-(1→6) glycosidic bonds,
present about once in 12 units. Glycogen has a structure similar to that of amylopectin but is more highly
branched. The highly branched structure of glycogen generates many nonreducing ends, permitting the
rapid mobilization of glucose in times of metabolic need. Glycogen phosphorylase phosphorolytically
degrades α-(1→4) bonds in glycogen sequentially inward from nonreducing ends to yield glucose-1-
phosphate. The remaining α-(1→6) branches of glycogen are cleaved by a debranching enzyme.
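To make the effect of branching concrete, the sketch below compares the approximate number of branch points and nonreducing ends in amylopectin and glycogen, using the branching frequencies quoted in this text (about one branch per 27 linkages and about one per 12 units). It assumes a single molecule with evenly spaced branches, in which every branch adds one nonreducing end to the one end of an unbranched chain. The molecule size of 10,000 residues is arbitrary.

```python
def branch_estimate(n_residues, residues_per_branch):
    """Return (branch points, nonreducing ends) for one polymer molecule."""
    branches = n_residues // residues_per_branch
    nonreducing_ends = branches + 1  # the unbranched chain already has one
    return branches, nonreducing_ends


N = 10_000  # arbitrary molecule size chosen for illustration
for polymer, spacing in (("amylopectin", 27), ("glycogen", 12)):
    branches, ends = branch_estimate(N, spacing)
    print(f"{polymer}: {branches} branch points, {ends} nonreducing ends")
```

Under these assumptions glycogen offers more than twice as many nonreducing ends as amylopectin of the same size, which is where glycogen phosphorylase begins its attack.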
Structural polysaccharides
Storage polysaccharides such as starch and glycogen are formed by α-(1→4) glycosidic bond. In contrast,
structural polysaccharides such as cellulose and chitin are formed by β-(1→4) linkages. This simple
difference in stereochemistry results in two groups of molecules with very different structural properties
and biological functions. Cellulose is the primary structural component of plant cell walls. Chitin is an
important component of exoskeletons. Chitin is the second most abundant polysaccharide in the
biosphere next to cellulose.
Cellulose
Cellulose is a linear polymer of β (1→4)-linked D-glucose residues found in plant cell walls. Hence, it is an
unbranched glucan. The primary structure of cellulose fiber was determined through methylation
analysis whereas its secondary structure was determined by X-ray fiber diffraction. Each successive
β (1→4)-linked glucose residue in a chain is flipped 180° with respect to the preceding residue, so
that the linear polymer assumes a very long, straight and fully extended conformation. Each
strand is held in this extended conformation by intrachain hydrogen bonds.
About 36 parallel straight chains arranged in an extended fashion are easily packed into a rigid almost
crystalline cellulose fiber. Parallel strands in each cellulose fibril or cellulose cable interact with one
another through intermolecular hydrogen bonds between glucose units of neighboring chains.
Multiple cellulose fibers line up laterally to form sheets, and these sheets stack vertically. The entire
assembly is stabilized by intramolecular and intermolecular hydrogen bonds. Cellulose is insoluble in
water, despite the highly hydrophilic nature of its constituent D-glucose, because of extensive hydrogen
bonding. This highly cohesive, hydrogen-bonded structure gives cellulose fibers exceptionally high
tensile strength suitable for a supportive function. The rigid plant cell wall is able not only to withstand
high osmotic pressure but also to bear loads.
In the plant cell wall, the fibrous cellulose structure is cross-linked by other polysaccharides and embedded in an
amorphous cementing matrix containing lignin and pectin. Lignin is a plastic-like phenolic polymer
derived from phenylalanine and tyrosine. Pectin is a polygalacturonic acid polysaccharide that gives
tomatoes and other fruits their firmness. It is naturally depolymerized by the enzyme polygalacturonase
(PG). As pectin is hydrolyzed, the tomatoes soften, making them difficult to ship without damage. Inhibiting PG
can delay ripening, facilitating storage and shipment. Cellulose also occurs in the stiff outer mantles of
marine tunicates.
Vertebrates, including herbivorous mammals, do not themselves possess any enzyme that hydrolyzes the β (1→4)-
glycosidic bonds of cellulose. Hence, they are unable to digest wood and vegetable fibers on their own. However, herbivores and
termites contain symbiotic microorganisms that secrete a series of enzymes, collectively known as
cellulases. Since cellulose is tightly packed and the D-glucose units are not easily accessible for enzymes,
the degradation of cellulose is a very slow process. Cellulose and other insoluble fibers can minimize
exposure time to toxins in the diet by increasing the rate at which digestion products pass through the
large intestine. On the other hand, pectin (polygalacturonic acid) and other soluble fibers allow
improved digestion and absorption of nutrients by slowing down the movement of food through the
gastrointestinal tract.
Glycoproteins
Glycoproteins are conjugate proteins formed by the covalent attachment of carbohydrate groups to
proteins. Most secreted eukaryotic proteins are glycoproteins including blood proteins such as
antibodies and hormones, milk proteins such as lactalbumin and proteins contained in lysosomes and
some of the proteins secreted by the pancreas such as ribonuclease. Besides, almost all membrane-
associated eukaryotic proteins are glycosylated. Indeed, protein glycosylation is more abundant than all
other types of posttranslational modifications combined. Glycosylation greatly increases the complexity
of the proteome. No generalization can be made about the effects of glycosylation on protein
properties; they must be experimentally determined on a case-by-case basis. In many cases the
functions of the carbohydrate moieties of glycoproteins remain enigmatic.
Nevertheless, it is becoming increasingly evident that oligosaccharides tend to extend from the surfaces
of proteins rather than participate in their internal structures owing to their hydrophilic character. Both
experimental and theoretical studies indicate that oligosaccharides have mobile and rapidly fluctuating
conformations which account for the difficulty in crystallizing glycoproteins. Structures of most
glycoproteins are unaffected by the removal of their associated oligosaccharides. Glycosylation can
affect protein properties in many ways, including protein folding, oligomerization, physical stability,
specific bioactivity, rate of clearance from the bloodstream, and protease resistance.
N- and O-linked glycoproteins
Oligosaccharides in glycoproteins can be linked by glycosidic bond formed either to the amide nitrogen
atom in the side chain of asparagine (termed an N-linkage) or to the oxygen atom in the side chain of
serine or threonine (termed an O-linkage). N-linked glycans are around 5-fold more common than O-
linked glycans. Sequence analyses of glycoproteins showed that the amide nitrogen of an Asn residue is
β-linked to an N-acetylglucosamine (NAG) residue of an oligosaccharide only if Asn is part of an Asn-X-Ser or Asn-X-Thr sequence,
where X is any amino acid residue except Pro. N-linked glycans tend to attach to proteins at sequences
that form β-bends. All N-linked oligosaccharides have a distinctive pentasaccharide core containing
branched (Man)3 (NAG)2 residues. The innermost pentasaccharide core serves as a common foundation
for attachment of additional sugars to form a wide variety of N-linked oligosaccharide patterns found in
glycoproteins.
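The sequence requirement for N-linkage (Asn-X-Ser or Asn-X-Thr, where X is not Pro) can be searched for with a regular expression. The sketch below scans a protein sequence written in one-letter code and reports every candidate site. The peptide used is hypothetical, and a match only marks a potential site: the text above does not imply that every such sequence is actually glycosylated.

```python
import re

# Asn (N), then any residue except Pro (P), then Ser (S) or Thr (T).
# The lookahead allows overlapping candidate sites to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide, not a real protein sequence
print(find_sequons("MKNGSAANPTLLNNTSQ"))
```

The Asn at position 8 is skipped because it is followed by Pro, whereas the adjacent Asn residues at positions 13 and 14 are both reported because the two candidate sites overlap.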
Conversely, O-linked oligosaccharides tend to be clustered into segments of polypeptide chains rich in
Ser, Thr, Pro and other helix-breaking residues. Hence, O-linked glycans tend to attach to proteins at
sequences that form intrinsically disordered regions. The carbohydrates’ hydrophilic and steric
interactions further stabilize the extended conformations of heavily glycosylated regions. The most
common O-linked attachment involves the disaccharide core β-galactosyl-(1→3)-α-N-
acetylgalactosamine α-linked to the OH group of either Ser or Thr. Other hydroxyl-bearing amino acid
side chains such as Tyr, 5-hydroxy-Lys (Hyl) and 4-hydroxy-Pro (Hyp) occasionally form O-glycosidic
bonds. O-linked glycoproteins often have protective functions.
Synthesis
The protein moieties of glycoproteins are synthesized under genetic control. In contrast, the
carbohydrate moieties are enzymatically synthesized. Endoplasmic reticulum (ER) and Golgi complex
(GC) are the major organelles that play central roles in protein glycosylation and trafficking. The
polypeptide chain, synthesized by ribosomes attached to the cytoplasmic face of the ER, is either passed
into the lumen or inserted into the ER membrane. Proteins in the lumen of the ER and in the ER
membrane are transported to the Golgi complex. Synthesis of the carbohydrate chain for the N–linked
glycosylation begins in the ER and continues in the GC, whereas the O-linked glycosylation takes place
exclusively in the GC. Carbohydrate units of glycoproteins are altered and elaborated in the Golgi
complex.
The endoplasmic reticulum is the most extensive membrane in the cell that forms compartments for the
synthesis of lipids and proteins. The rough endoplasmic reticulum is studded with ribosomes that are
engaged in the synthesis of proteins that are either membrane-bound or destined for secretion. The ER
lumen has an oxidizing environment for the formation of disulfide bonds in proteins destined for
secretion. The smooth endoplasmic reticulum is the site of lipid synthesis. Dolichol phosphate is a
specialized lipid molecule, located in the ER membrane, that contains about 20 isoprene (C5) units.
Isoprene units are also key building blocks for many important biomolecules such
as steroids and secondary metabolites in all life forms. The terminal phosphate group of the dolichol
phosphate is the site of attachment and assembly of an oligosaccharide destined for attachment to the
asparagine residue of a protein. This activated (energy-rich) form of the oligosaccharide is subsequently
transferred to a specific asparagine residue of the polypeptide chain by an enzyme located on the
lumenal side of the ER.
The Golgi complex is a stack of flattened membranous sacs that functions as a major sorting center of
the cell for targeting proteins to lysosomes, secretory vesicles, and the plasma membrane. Different sets
of vesicles transfer proteins from the endoplasmic reticulum to the cis-face of the Golgi complex, from
one compartment of the Golgi complex to another, and from the trans-face of Golgi complex to target
sites. Proteins proceed from the Golgi complex to different target sites, according to signals encoded
within their amino acid sequences and three-dimensional structures. Complex oligosaccharides are
synthesized in the Golgi complex through the action of glycosyltransferases and activated sugar nucleotides,
such as UDP-glucose. Glycosyltransferases catalyze the formation of glycosidic bonds using
carbohydrates donated by activated sugar nucleotides.
Classification
There are three classes of glycoproteins. The first class, in which the protein constituent is the largest
component by weight, is simply referred to as glycoproteins. Many glycoproteins are components of cell
membranes, where they take part in cell adhesion such as the binding of sperm to egg and cell-cell
communication. The second class of glycoproteins comprises the proteoglycans. In proteoglycans, the
core protein component is conjugated to a particular type of polysaccharide called glycosaminoglycan
(GAG). The GAG moiety commonly makes up a much larger percentage by weight of the proteoglycan
and it determines the structures and biological activities of proteoglycans. The oligosaccharides in
glycoproteins are smaller, more structurally diverse and less monotonous than the glycosaminoglycan chains
of proteoglycans. A third class of glycoproteins is the mucins (mucoproteins). Mucins are glycoprotein
components of mucus. They are abundant in saliva where they function as lubricants.
Glycoproteins
Glycoproteins are carbohydrate-protein conjugates in which the carbohydrate moieties are small but
structurally diverse. Glycoproteins have one or several oligosaccharides of varying complexity joined
covalently to a protein. They are found on the outer face of the plasma membrane, in the extracellular
matrix, and in the blood. Inside cells they are found in specific organelles such as Golgi complexes,
secretory granules, and lysosomes. They are rich in information, forming highly specific sites for
recognition and high-affinity binding by other proteins.
RNase B
One of the simplest glycoproteins is bovine pancreatic ribonuclease B (RNase B), which differs from the
carbohydrate-free RNase A only by a single attached oligosaccharide chain. The oligosaccharide
does not affect the native enzyme’s conformation, substrate specificity, or catalytic properties.
However, RNase A folds to its native state more slowly than does RNase B and tends to aggregate. This
suggests that the oligosaccharide functions similarly to a molecular chaperone, most likely by shielding a
hydrophobic patch on the protein surface.
Erythropoietin
Erythropoietin is a vital hormone present in the blood serum that has dramatically improved treatment
for anemia. Erythropoietin (EPO) is secreted by the kidneys and stimulates the production of red blood
cells. EPO is composed of 165 amino acids and is N-glycosylated at three asparagine residues and O-
glycosylated on a serine residue. The mature EPO is 40% carbohydrate by weight, and glycosylation
enhances the stability of the protein in the blood. Artificial EPO can be distinguished from natural EPO in
athletes by detecting differences in their glycosylation patterns through the use of isoelectric focusing.
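The figures quoted for EPO allow a rough mass estimate. The sketch below assumes an average mass of about 110 Da per amino acid residue, a common rule of thumb that is not stated in this text, and treats the 40% carbohydrate content as a fraction of the total mass. The function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110  # assumed average mass of one amino acid residue


def glycoprotein_mass(n_residues, carbohydrate_fraction):
    """Return (protein, carbohydrate, total) masses in daltons."""
    if not 0 <= carbohydrate_fraction < 1:
        raise ValueError("carbohydrate_fraction must be in the range [0, 1)")
    protein = n_residues * AVERAGE_RESIDUE_MASS_DA
    total = protein / (1 - carbohydrate_fraction)
    return protein, total - protein, total


protein, carbohydrate, total = glycoprotein_mass(165, 0.40)
print(f"protein ~{protein / 1000:.0f} kDa, "
      f"carbohydrate ~{carbohydrate / 1000:.0f} kDa, "
      f"total ~{total / 1000:.0f} kDa")
```

Under these assumptions the polypeptide contributes roughly 18 kDa and the attached oligosaccharides roughly 12 kDa, for a total of about 30 kDa.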
Glycophorin A
Glycophorin A is one of the best-characterized membrane glycoproteins of red blood cells. It contains
60% carbohydrate by mass, in the form of 16 oligosaccharide chains covalently attached to the N-
terminal domain of the membrane protein. Fifteen of the oligosaccharide chains are O-linked to Ser or
Thr residues, and one is N-linked to an Asn residue. Plasmodium falciparum invades erythrocytes by
using a glycan-binding protein to bind to the carbohydrate moiety of glycophorin A. Disrupting this
interaction is therefore of clinical interest.
Granulocyte–macrophage colony-stimulating factor
Human granulocyte–macrophage colony-stimulating factor (GM-CSF) is a 127-residue protein growth
factor that promotes the development, activation, and survival of the white blood cells known as
granulocytes and macrophages. It is variably glycosylated at two N-linked sites and five O-linked sites.
The lifetime of GM-CSF in the bloodstream increases with its level of glycosylation. However, GM-CSF
that is produced in E. coli and hence is unglycosylated (bacteria rarely glycosylate the proteins they
synthesize) has a 20-fold higher specific biological activity than does the naturally occurring
glycoprotein.
The sugar code
The sugar code refers to the use of carbohydrates as carriers of chemical information for molecular and
cell-cell interactions. All biological polymers can be reasonably assumed to be built from 4 different
nucleotide subunits, 20 different common amino acids and 20 different basic monosaccharide units.
However, monosaccharides can be assembled into an almost limitless variety of oligosaccharides, which
differ in the stereochemistry and position of glycosidic bonds, the type and orientation of substituent
groups, and the number and type of branches. Therefore, oligosaccharides are by far the richest
information molecules in the cell. Carbohydrates surpass proteins in information density by two orders
of magnitude.
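A simple count illustrates why oligosaccharides offer so many more possibilities than linear nucleic acids or peptides of the same length. The sketch below compares linear hexamers only. The figure of eight linkage types per glycosidic bond (two anomeric configurations times four acceptor hydroxyl groups) is an assumption chosen for illustration, and branching is ignored, so the sugar count is an underestimate. The result depends strongly on these assumptions and is not meant to reproduce the information-density figure quoted above.

```python
def linear_oligomers(n_monomers, length, linkages_per_bond=1):
    """Count linear sequences of a given length.

    linkages_per_bond is the number of distinct ways two adjacent units
    can be joined (1 for the phosphodiester or peptide bond).
    """
    return n_monomers ** length * linkages_per_bond ** (length - 1)


hexanucleotides = linear_oligomers(4, 6)
hexapeptides = linear_oligomers(20, 6)
# Assumption: 2 anomers x 4 acceptor hydroxyls = 8 linkage types per bond
hexasaccharides = linear_oligomers(20, 6, linkages_per_bond=8)

print(f"hexanucleotides: {hexanucleotides:,}")
print(f"hexapeptides:    {hexapeptides:,}")
print(f"hexasaccharides: {hexasaccharides:,}")
```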
Molecular recognition
Glycoproteins are important constituents of plasma membranes. Eukaryotic cells have a thick and fuzzy
coating of glycoproteins and glycolipids, named the glycocalyx, that prevents close contact between
cells. Many cell-surface receptor proteins have relatively short and presumably stiff O-glycosylated
regions that link their membrane-bound domains to their functional domains. This type of
arrangement extends the functional domains in a lollipop-like manner above the cell’s densely packed
glycocalyx. The oligosaccharide markers of membrane-bound and secreted glycoproteins mediate a
variety of intracellular and intercellular interactions.
Molecular tags
Cells tend to synthesize a large repertoire of a given glycoprotein, in which each variant species differs
somewhat in the sequences, locations, and numbers of its covalently attached oligosaccharides. The
many different glycosylated forms of a given protein with several potential glycosylation sites and
patterns are called glycoforms. Each glycoform can be generated only in a specific cell type, tissue type
or developmental stage. Thus, the species-specific and tissue-specific distribution of glycoforms that
each cell synthesizes endows it with a characteristic spectrum of biological properties. Similar “ticketing”
mechanisms govern the compartmentalization (molecular zip codes) and degradation (molecular timers)
of glycoproteins within cells. Therefore, a variety of glycoforms for a given glycoprotein ensures that it
has a range of cellular distributions and lifetimes. The protein component of glycoforms is identical but
the composition of the carbohydrate component is highly variable. This phenomenon, which compounds
the difficulties in the purification and characterization of proteins with different patterns of posttranslational modification (PTM), is
known as microheterogeneity.
Molecular zip codes
Glycosylation functions as a molecular zip code for identifying the particular spatial destination of a
protein during protein trafficking. A mannose-6-phosphate-containing oligosaccharide marks newly
synthesized proteins in the Golgi complex for transfer to the lysosome. This marker is acquired in the
Golgi complex in a two-step process. First, GlcNAc phosphotransferase adds a phospho-N-
acetylglucosamine unit to the 6-OH group of a mannose. Next, an N-acetylglucosaminidase removes the
added sugar to generate a mannose 6-phosphate residue in the core oligosaccharide.
Molecular timer
Glycosylation functions as a molecular timer for specifying the age of a particular protein. The residues of
Neu5Ac (a sialic acid) situated at the ends of the oligosaccharide chains of many plasma glycoproteins
such as ceruloplasmin protect them from uptake and degradation in the liver. Removal of the sialic acid
residues by the enzyme sialidase is one way in which the body marks “old” proteins for destruction and
replacement. A similar mechanism is apparently responsible for removing old erythrocytes from the
mammalian bloodstream.
Nutrient sensing
Glycosylation can also function as a molecular sensor for identifying the energy status of the cell.
GlcNAcylation is an especially important glycosylation reaction involving the covalent attachment of N-
acetylglucosamine (GlcNAc) to serine or threonine residues of cellular proteins when nutrients are
abundant. The concentration of GlcNAc reflects the active metabolism of carbohydrates, amino acids
and fats: glucose signals carbohydrate availability, acetate signals fatty acid availability, and nitrogen signals
protein availability. The combination indicates that nutrients are abundant. The reaction is catalyzed by
O-GlcNAc transferase. The GlcNAcylation sites are also potential phosphorylation sites. Hence, O-GlcNAc
transferase and protein kinases may be involved in cross talk to modulate one another’s
signaling activity. Like phosphorylation, GlcNAcylation is reversible, with GlcNAcase catalyzing the
removal of the carbohydrate. Dysregulation of GlcNAc transferase has been linked to insulin resistance,
diabetes, cancer and neurological pathologies.
Lectins
The high density of information encoded by oligosaccharides provides the sugar code with an essentially
unlimited number of unique “words” small enough to be read and decoded by carbohydrate-binding
proteins called lectins. Lectins (Latin: legere, to select) are a special class of carbohydrate-binding
proteins that recognize one or more specific monosaccharides with particular linkages to other sugars in
oligosaccharides, usually with exquisite specificity. They were first discovered in plants but are now
known to occur in all organisms. Major functions of animal lectins include adhesion, cell–cell recognition
and targeting of newly synthesized proteins to specific cellular locations. For cell-cell recognition, lectins
on the surface of one cell interact with arrays of monosaccharides displayed on the surface of another
cell. In vertebrates, oligosaccharide tags read by lectins govern the rate of degradation of certain
peptide hormones, circulating proteins, and blood cells.
Protein–carbohydrate interactions typically include multiple weak noncovalent interactions that ensure
specificity yet permit reversible binding. Interactions include multiple hydrogen bonds, which often
include bridging water molecules, and the packing of hydrophobic sugar faces against aromatic side
chains. The carbohydrate-binding specificity of a particular lectin is determined by the amino acid
residues that bind the carbohydrate. A molecule of lectin usually contains two or more carbohydrate-
binding sites. Many sugars have a more polar and a less polar side. The more polar side interacts with
the lectin by hydrogen bonds, while the less polar side undergoes hydrophobic interactions with
nonpolar amino acid residues. Each interaction is weak, but the composite produces high-affinity and
high specificity binding characterizing the unique flow of information central to many physiological
processes.
Classification
Lectins can be grouped into different classes. The C type (for calcium requiring) lectins found in animals
use calcium ions as a bridge for direct interactions with OH groups on mannose residues of the sugar. C-
type lectins function in cell–cell recognition and receptor-mediated endocytosis, which is a process by
which soluble molecules are bound to receptors on the cell surface and subsequently internalized.
Selectins are a family of plasma membrane lectins that bind to different components of the immune
system. The L, E, and P forms of selectins bind specifically to carbohydrates on lymph-node vessels (L),
endothelium (E) and activated blood platelets (P), respectively. Selectins are members of the C-type
family of lectins. The L–lectins are readily available in the seeds of leguminous plants, and many of the
initial biochemical characterizations of lectins were performed on L-lectins. Although the exact role of
lectins in plants is unclear, they can potentially serve as potent insecticides. Other L-type lectins, such as
calnexin and calreticulin, are prominent chaperones in the eukaryotic endoplasmic reticulum that
facilitate the folding of other proteins.
Applications
Purification of carbohydrates and glycoproteins is a prerequisite for carbohydrate analysis. Immobilized
lectins have been extensively used for the purification of carbohydrates and glycoproteins using affinity
chromatography. Concanavalin A (ConA) from jack bean, wheat germ agglutinin (WGA) and Mannose-
binding protein A (MBP-A) are among the best characterized lectins. ConA specifically binds α-D-glucose
and α-D-mannose residues whereas WGA specifically binds β-N-acetylmuramic acid and α-N-
acetylneuraminic acid. WGA causes cells to agglutinate or clump together. MBP binds specifically to
high-mannose octasaccharide. Glycoproteins can also be labeled with lectins that have been conjugated
(covalently cross-linked) to ferritin, an iron-storage protein that is readily visible in the electron
microscope. Such experiments, with lectins of different specificities and with a variety of cell types, have
demonstrated that the carbohydrate groups of membrane-bound glycoproteins are, for the most part,
located on the external surfaces of cell membranes.
Viral infection
Many pathogens gain entry into specific host cells by adhering to cell-surface carbohydrates. Influenza
virus, the causative agent of the respiratory tract infection, binds to sialic acid residues present on cell-
surface glycoproteins of the host cell. Hemagglutinin is the viral protein that binds to these sugars. Viral
infection involves binding to the target cell, endocytosis, budding and release. After binding, the virus is
engulfed by the cell and begins to replicate. Viral assembly results in the budding of the viral particle
from the cell. Another viral protein, neuraminidase (sialidase), cleaves the glycosidic bonds between the
sialic acid residues and the rest of the cellular glycoprotein, freeing the virus to infect new cells, and thus
spreading the virus. Inhibitors of this enzyme such as oseltamivir (Tamiflu) and zanamivir (Relenza) are
important anti-influenza agents. The carbohydrate-binding specificity of viral hemagglutinin may play an
important role in species specificity of infection and ease of transmission. Avian influenza H5N1 (bird flu)
is especially lethal and is readily spread from bird to bird but only rarely to humans.
Bacterial infection
Helicobacter pylori, the causative agent of gastric ulcers, adheres to the inner surface of the stomach by
interactions involving lectins on the bacterial membrane and specific oligosaccharides of membrane
glycoproteins of the gastric epithelial cells. The oligosaccharide that H. pylori recognizes is part of the
type O blood group determinant. Consequently, people of blood type O show a several-fold greater
incidence of gastric ulcers than those of type A or B. Similarly, the cholera toxin molecule produced by
Vibrio cholerae enters intestinal cells after binding to the pentasaccharide of ganglioside GM1, a
membrane glycolipid on target cells.
Blood groups
Some membrane lipids of eukaryotic cells have covalently bound carbohydrates in which a complex
oligosaccharide functions as the polar head group. Gangliosides are glycolipids whose head groups are
sialic acid-containing oligosaccharides, some of which determine human blood groups. Blood groups are based on protein
glycosylation patterns. The human ABO blood groups illustrate the effects of glycosyltransferases on the
formation of glycoproteins. Each blood group is designated by the presence of one of the three different
carbohydrates, termed A, B, or O, attached to glycoproteins and glycolipids on the surfaces of red blood
cells.
These structures have in common an oligosaccharide foundation called the O (or sometimes H) antigen.
The A and B antigens differ from the O antigen by the addition of one extra monosaccharide, either
N-acetylgalactosamine (for A) or galactose (for B), through an α-1,3 linkage to a galactose moiety of the O
antigen. Specific glycosyltransferases add the extra monosaccharide to the O antigen. Each person
inherits the gene for one glycosyltransferase of this type from each parent. The type A transferase
specifically adds N-acetylgalactosamine, whereas the type B transferase adds galactose. The O
phenotype is the result of a mutation in the O transferase that results in the synthesis of an inactive
enzyme. These structures have important implications for blood transfusions and other transplantation
procedures.
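The inheritance rule described above (one transferase allele from each parent, with the O allele encoding an inactive enzyme) can be sketched as a small lookup. This is only an illustrative model: the names `ADDED_RESIDUE` and `blood_group` are hypothetical, and the sketch ignores rare variants and other blood group systems.

```python
# Residue that each transferase allele adds to the O (H) antigen, as stated
# in the text. The O allele encodes an inactive enzyme, so it adds nothing.
ADDED_RESIDUE = {
    "A": "N-acetylgalactosamine",
    "B": "galactose",
    "O": None,
}


def blood_group(maternal: str, paternal: str) -> str:
    """Return the ABO phenotype for one allele inherited from each parent."""
    alleles = {maternal.upper(), paternal.upper()}
    unknown = alleles - set(ADDED_RESIDUE)
    if unknown:
        raise ValueError(f"unknown allele(s): {sorted(unknown)}")
    # Only alleles that encode an active transferase contribute an antigen.
    active = sorted(a for a in alleles if ADDED_RESIDUE[a] is not None)
    return "".join(active) or "O"


def added_residues(maternal: str, paternal: str) -> list[str]:
    """List the monosaccharides added to the O antigen for a genotype."""
    group = blood_group(maternal, paternal)
    return [] if group == "O" else [ADDED_RESIDUE[a] for a in group]
```

For example, `blood_group("A", "O")` gives type A because only the A transferase is active, while `blood_group("A", "B")` gives type AB because both residues are added.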
Cancer
Some pathological conditions such as cancer alter the glycosylation patterns of membrane proteins.
Normal cells stop growing when they touch each other, a phenomenon known as contact inhibition.
Cancer cells, however, are under no such control and therefore form malignant tumors. There are
significant differences in the distributions of cell-surface carbohydrate between cancerous and
noncancerous cells.
Inborn errors of glycosylation
Errors in glycosylation can result in pathological conditions. There is an entire family of severe inherited
human diseases called congenital disorders of glycosylation (CDG). These pathological conditions reveal
the importance of proper modification of proteins by carbohydrates and their derivatives. I-cell disease
(also called mucolipidosis II), is a lysosomal storage disease. Lysosomes are organelles that degrade and
recycle damaged cellular components or material brought into the cell by endocytosis. In patients with I-
cell disease, lysosomes contain large inclusion bodies of undigested glycosaminoglycans and glycolipids.
These inclusion bodies are present because the enzymes normally responsible for the degradation of
glycosaminoglycans are missing from affected lysosomes.
The responsible enzymes contain a mannose-6-phosphate residue as a component of an N-linked
oligosaccharide that serves as the marker directing the enzymes from the Golgi complex to lysosomes.
I-cell patients are deficient in the N-acetylglucosamine phosphotransferase that catalyzes the first step in the
addition of the phosphoryl group; the consequence is the mistargeting of eight essential enzymes. The
enzymes are present at very high levels in the blood and urine since active enzymes are synthesized, but
in the absence of appropriate glycosylation, they are secreted instead of being exported to lysosomes. In
other words, in I-cell disease, a whole series of enzymes are incorrectly addressed and delivered to the
wrong location.
Proteoglycans
Proteoglycans are macromolecules comprising a core protein to which one or more glycosaminoglycan
chains are covalently attached. The protein components of proteoglycans such as aggrecan and
syndecan range in molecular weight from 40 to 200 kDa. The polysaccharide components of
proteoglycans, such as keratan sulfate and chondroitin sulfate, are called glycosaminoglycans (GAGs).
Glycosaminoglycans (mucopolysaccharides) are large, linear polymers of repeating disaccharide units.
One of the two monosaccharides is either N-acetylglucosamine or N-acetylgalactosamine. The other is
often a uronic acid, usually D-glucuronic or L-iduronic acid. Hence, GAGs are heteropolysaccharides. In
most GAGs, at least one of the hydroxyl groups of the amino sugar in the repeating unit is esterified with sulfate.
Consequently, GAGs are anionic with a high density of carboxylate and sulfate groups. GAGs assume an
extended conformation in solution to minimize the repulsive forces among neighboring charged groups.
Many proteoglycans are secreted into the extracellular matrix, but some are integral membrane
proteins exposed at the cell surface. The main biological role of proteoglycans is the provision of
multiple binding sites, rich in opportunities for hydrogen bonding and electrostatic interactions. The
specific patterns of sulfated and nonsulfated sugar residues in GAGs generate specific molecular
recognition sites for a wide variety of ligands that bind to them by hydrogen bonding and electrostatic
interactions. GAGs also function as structural components and lubricants. They provide viscosity,
adhesiveness, and tensile strength to the extracellular matrix. The major glycosaminoglycan in animals is
hyaluronate. Others include chondroitin sulfate (CS), keratan sulfate (KS), dermatan sulfate (DS),
heparan sulfate (HS) and heparin. Other glycosaminoglycans differ from hyaluronate in two respects:
they are generally much shorter polymers and they are covalently linked to specific proteins as
proteoglycans. Chitin, found in exoskeletons, is also a glycosaminoglycan.
Hyaluronic acid
Hyaluronic acid molecules (also called hyaluronan) are composed of a large number of β (1→4)-linked
disaccharide units that consist of D-glucuronic acid and N-acetyl-D-glucosamine linked by β (1→3) bond.
At physiological pH, hyaluronic acid exists as hyaluronate anion that binds tightly to cations such as K+,
Na+, and Ca2+. The hyaluronate polyanion forms an extended, left-handed, single-stranded helix with 3
disaccharide units per turn stabilized by intramolecular hydrogen bonds.
Hyaluronate has structural features that suit its biological function. It is a rigid and highly hydrated
molecule with high molecular weight and numerous mutually repelling anionic groups. Consequently,
hyaluronate solutions have a shear-dependent viscosity with high tensile strength and elasticity. At low
shear rates, the hyaluronate molecules form tangled masses that greatly impede flow. At high shear
rates, the stiff rod-like hyaluronate molecules tend to line up with the flow and thus offer less resistance
to it. This viscoelastic behavior makes hyaluronate solutions excellent biological shock absorbers and
lubricants. Hence, hyaluronate is an important GAG component of ground substance (the extracellular
matrix of cartilage and tendons), synovial fluid (the fluid that lubricates the joints), and the vitreous
humor of the eye. It also occurs in the capsules surrounding certain pathogenic bacteria.
Some proteoglycans can form aggregates in the extracellular matrix. These aggregates are formed by
association of a hundred or more molecules of the core protein aggrecan, all bound to a single, very long
molecule of hyaluronate. Each aggrecan molecule is decorated by many covalently bound chondroitin
sulfate and keratan sulfate chains. Link proteins situated at the junction between each core protein and
the hyaluronate backbone mediate the core protein–hyaluronate interaction. The β (1→4)-linkages of
hyaluronic acid and other GAGs are hydrolyzed by hyaluronidase. This enzyme occurs in sperm cells and
a variety of animal tissues, in some pathogenic bacteria and in snake and insect toxins. In sperm cells, it
hydrolyzes an outer glycosaminoglycan coat around the ovum allowing sperm penetration. In
pathogenic bacteria, it renders animal tissues more susceptible allowing bacterial invasion.
Chondroitin-4-sulfate
Chondroitin-4-sulfate molecules (Greek: chondros, cartilage) are composed of D-glucuronic acid and N-
acetyl-D-galactosamine-4-sulfate joined by β (1→3)-bond. The disaccharide units are linked by β (1→4)-
bond. When the N-acetyl-D-galactosamine moiety is sulfated at the C6 position, the copolymer is
chondroitin-6-sulfate. These two types of chondroitin sulfates occur separately or in mixtures depending
on the tissue. Chondroitin-4-sulfate is a major component of cartilage, tendons, ligaments, and the walls
of the aorta. It contributes to the tensile strength of these connective tissues. Chondroitin sulfate is
connected to a Ser residue in the core protein via a typical trisaccharide linker. The xylose residue at the
reducing end of the linker is joined by its anomeric carbon to the hydroxyl of the Ser residue.
Dermatan-4-sulfate
Dermatan sulfate molecules (Greek: derma, skin) are polymers of L-iduronate and N-acetyl-D-
galactosamine-4-sulfate joined by α (1→3)-bond. The disaccharide units are linked by β (1→4)-bond.
Dermatan-4-sulfate differs from chondroitin-4-sulfate only by an inversion of configuration about C5 of
the β-D-glucuronate (GlcA) residues to form α-L-iduronate (IdoA). Enzymatic epimerization of these
residues occurs after the formation of chondroitin. The epimerization is usually incomplete, so dermatan
sulfate also contains glucuronate residues. Dermatan sulfate is found predominantly in skin, blood
vessels and heart valves. It contributes to the pliability of skin.
Keratan-6-sulfates
Keratan sulfate molecules (Greek: keras, horn) are polymers mainly of alternating β (1→4)-linked D-
galactose and N-acetyl-D-glucosamine-6-sulfate residues. The disaccharide units are linked by β (1→3)-
bond. Some disaccharides in these polymers contain small amounts of fucose, mannose, N-
acetylglucosamine, and sialic acid. Consequently, their sulfate content is variable and they form a very
heterogeneous group. Keratan sulfates are present in cartilage, bone, cornea, as well as a variety of
horny structures formed of dead cells: hair, nails, horn, hoofs, and claws.
Heparin
Heparin molecules (Greek: hepar, liver) are variably sulfated polymers of mainly alternating α
(1→4)-linked L-iduronate-2-sulfate and N-sulfo-D-glucosamine-6-sulfate residues. The disaccharide units
are linked by α (1→4)-bond. Heparin contains primarily sulfated iduronic acid (IdoA) and a smaller
proportion of glucuronic acid (GlcA), and is generally highly sulfated and heterogeneous in length. It has
an average of 2.5 anionic sulfate groups per disaccharide unit, which makes it the most negatively
charged polyelectrolyte of any known biological macromolecule. Unlike other GAGs, heparin is not a
constituent of connective tissue, but occurs almost exclusively in the intracellular granules of a type of
leukocytes called mast cells. Mast cells are found near the walls of arterial blood vessels and on the
surfaces of endothelial cells especially in the liver, lungs, and skin. They release heparin and other
molecules from their dense granules into the extracellular space when triggered.
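The repeating units described so far can be collected into one lookup table, which makes the linkage patterns easier to compare side by side. The sketch below only transcribes the residues and linkages given in the text; the names `GagRepeat`, `GAG_REPEATS`, and `render_chain` are hypothetical, and each GAG is shown in its predominant form, ignoring the heterogeneity noted above.

```python
from typing import NamedTuple


class GagRepeat(NamedTuple):
    first: str    # uronic acid (or galactose, in keratan sulfate)
    second: str   # amino sugar
    within: str   # linkage joining the two residues of one disaccharide
    between: str  # linkage joining successive disaccharide units


# Transcribed from the descriptions in the text (predominant forms only).
GAG_REPEATS = {
    "hyaluronate": GagRepeat(
        "D-glucuronate", "N-acetyl-D-glucosamine", "β(1→3)", "β(1→4)"),
    "chondroitin-4-sulfate": GagRepeat(
        "D-glucuronate", "N-acetyl-D-galactosamine-4-sulfate", "β(1→3)", "β(1→4)"),
    "dermatan sulfate": GagRepeat(
        "L-iduronate", "N-acetyl-D-galactosamine-4-sulfate", "α(1→3)", "β(1→4)"),
    "keratan sulfate": GagRepeat(
        "D-galactose", "N-acetyl-D-glucosamine-6-sulfate", "β(1→4)", "β(1→3)"),
    "heparin": GagRepeat(
        "L-iduronate-2-sulfate", "N-sulfo-D-glucosamine-6-sulfate", "α(1→4)", "α(1→4)"),
}


def render_chain(name: str, n_units: int = 2) -> str:
    """Write out n_units disaccharide repeats of the named GAG as text."""
    r = GAG_REPEATS[name]
    unit = f"{r.first} {r.within} {r.second}"
    return f" {r.between} ".join([unit] * n_units)


def lacks_uronic_acid() -> list[str]:
    """Names of GAGs in the table whose first residue is not a uronic acid."""
    return [n for n, r in GAG_REPEATS.items() if "uronate" not in r.first]
```

Calling `lacks_uronic_acid()` on this table singles out keratan sulfate, the one entry whose first residue is galactose rather than a uronic acid.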
Heparin acts as an anticoagulant by increasing the rate of formation of irreversible complexes between
antithrombin III and thrombin, a serine protease essential for blood clotting. The strongly electrostatic
heparin binding causes antithrombin III to bind to and inhibit thrombin. Purified heparin is routinely
added to clinical blood samples as an anticoagulant to prevent clotting.
Plasmodium falciparum, the parasitic protozoan that causes malaria, relies on glycan binding to infect
and colonize its host. Glycan-binding proteins of the parasitic form initially injected by the mosquito bind
to the glycosaminoglycan heparan sulfate on liver cells, initiating the parasite’s entry into the cell.
Heparan sulfate
Heparan sulfate is initially synthesized as a long polymer of alternating N-acetylglucosamine (GlcNAc)
and glucuronic acid (GlcA) residues. This simple chain is then enzymatically modified at specific regions.
First, some of the acetyl groups of the GlcNAc residues are replaced with sulfates by N-deacetylase/N-
sulfotransferase. This enzyme generates clusters of N-sulfated glucosamine (GlcN) residues. Second,
these clusters attract an epimerase, which converts GlcA to IdoA, and sulfotransferases that produce
sulfate esters at the C2 hydroxyl of IdoA and the C6 hydroxyl of N-sulfated GlcN. The resulting heparan
sulfate molecules are polymers of alternating α (1→4)-linked L-iduronate and N-acetyl-D-glucosamine-
6-sulfate residues. The disaccharide units are linked by α (1→4)-bond. Heparan sulfate has highly
sulfated (S) domains alternating with domains having unmodified GlcNAc and GlcA residues (NA
domains).
Heparan sulfate is structurally similar to heparin but has a higher proportion of N-acetyl groups and
fewer N-and O-sulfate groups, arranged in a less regular pattern. The high density of negative charges in
heparan sulfate brings positively charged molecules of lipoprotein lipase into the vicinity and holds them
by electrostatic interactions as well as by sequence-specific interactions with S domains. Heparan sulfate
is a ubiquitous component of cell surfaces as well as of the extracellular matrix (basal lamina) in blood
vessel walls, epithelial cells and brain. The basal lamina is a sheet-like layer of extracellular matrix that
separates organized groups of cells. Epithelial cells are the cells lining body cavities and free surfaces.
Heparan sulfates in proteoglycans bind to a variety of extracellular ligands and growth factors and
thereby modulate the ligands’ interaction with specific receptors of the cell surface. Growth factors are
proteins that function to induce the growth, differentiation, and migration of their specific target cells.
They are expressed in specific spatial and temporal patterns in embryos and adults. The S domains bind
specifically to extracellular ligands and signaling molecules to alter their biological activities. The change
in activity may result from three different mechanisms of action. The first mechanism is conformational
change induced by binding to S domains. The second mechanism is by the brokering action of S
domains. Brokering is the ability of adjacent domains of heparan sulfate to bind to two different
proteins and enhance protein-protein interactions by bringing them into close proximity. A third
mechanism is by acting as a coreceptor. The S domains bind to signaling molecules electrostatically to
maintain their high local concentrations and enhance their interaction with receptors on cell surfaces.
Extracellular matrix
In multicellular animals, the extracellular space surrounding cells inside tissues is filled with a gel-like
material called the extracellular matrix or the ground substance. It holds the cells together and provides
a porous pathway for the diffusion of nutrients and oxygen to individual cells. It is composed of an
interlocking meshwork of proteoglycans and fibrous proteins. Connective tissues such as cartilage,
tendon, skin, and blood vessel walls are composed of fibrous proteins such as collagen, elastin,
fibronectin, and laminin embedded in the ground substance. Some of these proteins are multiadhesive,
a single protein having binding sites for several different matrix molecules. Proteoglycans in the
extracellular matrix act as tissue organizers, influence the development of specialized tissues, and
regulate the extracellular assembly of collagen fibrils.
The proteoglycan aggrecan and the protein collagen are the major components of cartilage which is
made largely of a cross-linked meshwork of collagen fibrils and proteoglycans. Aggrecans interact strongly
with collagen in the extracellular matrix of cartilage, contributing to the development and tensile
strength of this connective tissue. However, the characteristic resilience of cartilage results from its high
proteoglycan content. The triple helix of collagen provides structure and tensile strength, whereas
aggrecan serves as a shock absorber. Furthermore, GAGs are important components of synovial fluid
(the fluid that lubricates the joints), and the vitreous humor of the eye.
Aggrecan is one of the best-characterized proteoglycans of the extracellular matrix. The
protein component of aggrecan is a very large molecule. It has three globular domains named N-
terminal domain (domain 1), central domain (domain 2) and C-terminal domain (domain 3). Domain 1 is
the hyaluronic acid–binding domain since it binds noncovalently to hyaluronic acid. This attachment is
stabilized by a link protein. Domain 2 has no particular known function. Domain 3 contains a lectin-like
module, which binds certain monosaccharide units. The highly extended region between globular
domains 2 and 3 is the site of glycosaminoglycan attachment. This linear region contains highly
repetitive amino acid sequences, which are sites for the covalent attachment of keratan sulfate and
chondroitin sulfate.
The core protein has three carbohydrate binding regions. These are an N-terminal (inner) region,
oligosaccharide-rich (central) region and a C-terminal (outer) region. The N-terminal region binds
relatively few oligosaccharides, predominantly through the amide N atoms of specific Asn residues. It
overlaps with the globular domain 1. The central region serves as anchor point for keratan sulfate chains
through the side chain O atoms of Ser and Thr residues. It overlaps with the globular domain 2. The C-
terminal region mainly binds to chondroitin sulfate chains through the side chain O atoms of Ser
residues in Ser-Gly dipeptides via a galactose–galactose–xylose trisaccharide linker.
Many proteoglycans can form huge complexes. Aggrecans have a bottlebrush-like supramolecular
architecture. Many molecules of aggrecan subunits “bristle” are noncovalently attached through their
first globular domain to a very long, central and filamentous hyaluronic acid “backbone”. Many aggrecan
monomers emerge laterally at regular intervals from opposite sides of a central hyaluronate filament.
Each aggrecan chain (projection) is made up of a core protein to which several bushy chains of
keratan sulfate and chondroitin sulfate are covalently bound. This heterogeneous assembly
accounts for the enormous molecular masses of aggrecans and for their high degree of polydispersity
(range of molecular masses).
A large amount of water is attracted and absorbed into the many negative charges of GAGs. Aggrecans
can cushion compressive forces because of this absorbed water. Water is squeezed from GAGs when
pressure is exerted and returns to GAGs when pressure is released. The application of pressure on
cartilage squeezes water away from these charged regions until charge–charge repulsions prevent
further compression. Water enables aggrecans to spring back after having been deformed, generating a
cushioning effect. Cartilage in the joints, which lacks blood vessels, is nourished by this flow of liquid
caused by body movements. This explains why long periods of inactivity cause joint cartilage to become
thin and fragile. The most common form of arthritis is osteoarthritis. It results from the loss of water
from proteoglycan with aging. Other forms of arthritis can result from the proteolytic degradation of
aggrecan and collagen in the cartilage. Mucopolysaccharidoses are a collection of diseases, such as
Hurler disease, that result from the inability to degrade glycosaminoglycans. They result in skeletal
deformities and reduced life expectancies.
Adhesion
Proteoglycans not only function as lubricants and structural components of the extracellular matrix, but
they also mediate adhesion of cells to the extracellular matrix, and bind factors that regulate cell
proliferation. Some proteoglycans provide points of adhesion, recognition, and information transfer
between cells, or between the cell surface and the extracellular matrix. Glycosaminoglycan-containing
macromolecules are also found on cell surfaces. Proteoglycans at the cell surface mediate the
activities of various growth factors. The core proteins of membrane bound proteoglycans can be
grouped into three classes. These are integral membrane proteins, integral membrane lipoproteins and
peripheral membrane or extracellular matrix proteins.
Syndecan is a core protein and an integral membrane protein. It has an amino-terminal extracellular
domain and a single transmembrane domain. The amino-terminal domain on the extracellular surface is
covalently attached to three chains of heparan sulfate and two chains of chondroitin sulfate, each
attached to a Ser residue by trisaccharide linkers. It is anchored to the plasma membrane by a single
transmembrane helix. Glypicans are integral membrane lipoprotein core proteins. They are anchored to
the plasma membrane by a glycolipid, a derivative of the membrane lipid phosphatidylinositol.
Fibronectin is a peripheral multidomain core protein that can be released into the extracellular space
where it forms part of the basement membrane.
Molecular recognition
Molecular recognitions between cells and proteoglycans of the extracellular matrix are mediated by
integral membrane proteins and extracellular matrix proteins. The overall cell-matrix interactions serve
not merely to anchor cells to the extracellular matrix but also to provide paths that direct the migration
of cells in developing tissue. Integrins are a family of integral membrane proteins that act both to attach
cells to extracellular matrix and to mediate signaling between the cell interior and the extracellular
matrix. They are heterodimeric proteins in which each subunit is anchored to the plasma membrane by
a single hydrophobic transmembrane helix. The large extracellular domains of the heterodimer combine
to form a specific binding site for extracellular proteins such as collagen and fibronectin. Integrins have
binding sites for a number of intracellular and extracellular macromolecules giving them the ability to
convey information in both directions across the plasma membrane.
Integrins interact with macromolecular components of the extracellular matrix and convey instructions
on adherence to the matrix or cell migration to the cytoskeletal system. The most common sequence
determinant for integrin binding to extracellular proteins is RGD (Arg–Gly–Asp). Integrins regulate many
processes including platelet aggregation, tissue repair, the activity of immune cells, and the invasion of
tissue by a tumor. Fibrinogen is converted into fibrin during blood clotting at the site of wound.
Integrins are also used to anchor peripheral proteins such as fibronectin to actin filaments inside the
membrane. Fibronectin is an extracellular protein with binding sites for both integrins and
proteoglycans in the extracellular matrix. It has separate binding domains for different GAGs such as
heparan sulfate and fibrous proteins such as fibrin and collagen.
Fibroblast growth factor (FGF)
FGF is an extracellular signaling protein that stimulates cell division. It regulates a variety of critical
biological processes through the four FGF cell-surface receptors (FGFR1–4). FGF only binds to FGFRs in
complex with heparin or with heparan sulfate moieties of syndecan in cell surfaces of the target cells.
Heparan sulfate is joined to syndecan through a trisaccharide bridge commonly at a Ser residue in the
general sequence Ser–Gly–X–Gly. FGFR dimerization in solution requires the presence of heparan sulfate
proteoglycans in addition to FGF. The binding of FGF to heparin or heparan sulfate protects FGF from
degradation. The release of active FGF–FGFR complexes from the extracellular matrix by the proteolysis
or by the partial degradation of heparan sulfate could be an important activation mechanism. Several other
growth factors interact similarly with proteoglycans.
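Two short sequence determinants are named in this part of the text: RGD (Arg–Gly–Asp) for integrin binding and Ser–Gly–X–Gly for heparan sulfate attachment to syndecan. A minimal sketch of how such motifs could be located in a protein sequence written in one-letter code follows. The function name `find_motifs` and the example peptide are hypothetical, and a sequence match alone does not show that a site is actually used in the cell.

```python
import re

# Sequence motifs named in the text, as regular expressions over one-letter
# amino acid codes. The "X" (any residue) position is written as ".".
MOTIFS = {
    "RGD (integrin binding)": "RGD",
    "Ser-Gly-X-Gly (heparan sulfate attachment)": "SG.G",
}


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif, overlaps included."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead reports overlapping occurrences too.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical peptide, invented purely for illustration.
example = "AARGDSGAGK"
print(find_motifs(example))
```

The lookahead form is used so that tandem Ser–Gly repeats, in which candidate sites overlap, are all reported rather than only the first.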
Infection
Electrostatic and sequence specific interactions with S domains are central in the first step in the entry
of certain viruses (such as herpes simplex viruses HSV-1 and HSV-2) into cells. Lectins on the surface of
HSV-1 and HSV-2 (the causative agents of oral and genital herpes, respectively) bind specifically to
heparan sulfate on the cell surface as a first step in their infection cycle. Infection requires precisely the
right pattern of sulfation on heparan sulfate polymer. Sequence heterogeneity generates different
ability to bind to specific proteins.
Cell walls
Bacteria are surrounded by rigid cell walls. Proteoglycans are important components of bacterial and
fungal cell walls. Bacteria can be classified as gram-positive or gram-negative depending on whether or not
they take up gram stain. Gram-positive bacteria have a thick cell wall surrounding their plasma
membrane. Their cytoplasm is surrounded by plasma membrane and peptidoglycan cell wall. In contrast,
gram-negative bacteria such as Escherichia coli and Salmonella typhimurium have a thin cell wall covered
by a complex outer membrane. Their cytoplasm is surrounded by plasma membrane, periplasmic space,
peptidoglycan cell wall and outer membrane. The periplasmic space is an aqueous compartment that
lies between the plasma membrane and the peptidoglycan cell wall. It contains proteins that transport
sugars and other nutrients. The outer membrane functions as a barrier to exclude harmful substances
(such as gram stain). Characteristic antigens of bacteria, which are responsible for bacterial virulence,
are components of bacterial cell walls and capsules.
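The two envelope architectures described above can be summarized as ordered lists of layers, from the cytoplasm outward. This is a simplified sketch of the description in the text only; the names `ENVELOPE_LAYERS`, `classify`, and `takes_up_gram_stain` are hypothetical.

```python
# Envelope layers from the cytoplasm outward, as described in the text.
ENVELOPE_LAYERS = {
    "gram-positive": [
        "plasma membrane",
        "peptidoglycan cell wall (thick)",
    ],
    "gram-negative": [
        "plasma membrane",
        "periplasmic space",
        "peptidoglycan cell wall (thin)",
        "outer membrane",
    ],
}


def classify(layers: list[str]) -> str:
    """Classify an envelope by the presence of an outer membrane."""
    return "gram-negative" if "outer membrane" in layers else "gram-positive"


def takes_up_gram_stain(layers: list[str]) -> bool:
    """Per the text, the outer membrane acts as a barrier that excludes the stain."""
    return "outer membrane" not in layers
```

The outer membrane is used as the single distinguishing feature because, in the description above, it is what excludes the stain.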
Peptidoglycans
Atomic force microscopy (AFM) is an imaging technique based on the variation in the force between a
probe that is several nanometers in diameter and a surface of interest as the probe is scanned over the
surface. It can be used to study the structure of bacterial cell walls. The bacterial cell wall is
hierarchically organized into peptidoglycan repeating units, peptidoglycan chains and peptidoglycan cables.
Peptidoglycan chains are proteoglycans. The cell walls of both gram-positive and gram-negative bacteria
are made up of a peptidoglycan framework that completely encases the cell. Peptidoglycans or mureins
are covalently linked polysaccharide and polypeptide chains.
The peptidoglycan repeating unit of a single peptidoglycan chain is a disaccharide of β (1→4)-linked N-
acetylglucosamine (NAG) and N-acetylmuramic acid (NAM) and a tetrapeptide of L-Ala-D-isoglutamyl-L-
Lys-D-Ala. The disaccharide units are linked by β (1→4)-bond. The lactyl side chain of NAM forms an
amide bond with the branching tetrapeptide chain. The D-amino acids of peptidoglycans render them
resistant to proteases. However, lysozyme catalyzes the hydrolysis of the β (1→4) glycosidic linkage
between NAM and NAG. It is found in tears, mucus, and other vertebrate body secretions, as well as in
egg whites.
A long and linear chain of alternating NAG–NAM disaccharides forms a monotonous polysaccharide to
which short peptide branches are covalently attached. Multiple neighboring parallel peptidoglycan
chains are covalently cross-linked by a short connecting bridge through their tetrapeptide side chains to
form a right-handed helical peptidoglycan cable. The most common connecting bridge is pentaglycine
(Gly5) bridge that extends from the terminal carboxyl group of one tetrapeptide to the ε-amino group of
the Lys in a neighboring tetrapeptide. This bridge may sometimes contain different amino acid residues
such as Ala or Ser. Several helical peptidoglycan cables wrap about the plasma membrane to form the
rigid framework of a bacterial cell wall.
Penicillin
In 1928, Fleming noticed that the chance exposure of a growing bacterial culture plate to the mold
Penicillium notatum resulted in lysis of nearby bacteria. Penicillin kills bacteria by disrupting the normal
balance between cell wall biosynthesis and degradation. Penicillin contains a thiazolidine ring fused to a
β-lactam ring. It specifically binds to and inactivates enzymes that function to cross-link the
peptidoglycan strands of bacterial cell walls. Most bacteria that are resistant to penicillin secrete a β-
lactamase (penicillinase) enzyme, which inactivates penicillin by hydrolytically cleaving the amide bond
of its β-lactam ring to form penicillinoic acid.
Teichoic acids
The outer surfaces of gram-positive bacteria are covered by teichoic acids. Teichoic acids are polymers
of glycerol or ribitol linked by phosphodiester bridges often terminating in lipopolysaccharides. The
hydroxyl groups of this sugar–phosphate chain are substituted by D-Ala residues and saccharides such as
glucose or NAG. Teichoic acids are anchored to the peptidoglycans via phosphodiester bonds to the C6-
OH groups of their NAG residues.
O-Antigens
Lipopolysaccharides are the dominant surface feature of the outer membrane of gram-negative
bacteria. The outer membranes of gram-negative bacteria are composed of complex
lipopolysaccharides, proteins, and phospholipids that are organized in a complicated manner. They are
decorated with complex and often unusual polysaccharides known as O-antigens that uniquely mark
each bacterial strain and elicit an immune response in the host. O-antigens are subjected to
rapid selection pressure for mutational alteration as part of the ongoing biological warfare between
pathogen and host. The mutations are in the genes specifying the enzymes that synthesize the O-
antigens so as to generate new bacterial strains that are not recognized by the host.
Chitin
Chitin is a homopolymer of β (1→4)-linked N-acetyl-D-glucosamine residues. Hence, it is an unbranched
glycosaminoglycan. Chitin is the principal structural component of the exoskeletons of invertebrates
such as insects, crustaceans, and arachnids. The razor-sharp beaks of squid used for disabling and
consuming prey are made of extensively cross-linked chitin. Chitin is also a major cell wall constituent of
most fungi and many algae.
Mucins
Mucins or mucoproteins are O-linked glycoproteins whose N-acetylgalactosamine-containing carbohydrate
chains are often sulfated and hence mutually repelling. They are synthesized by specialized cells in the
tracheobronchial, gastrointestinal, and genitourinary tracts, which makes them common in mucous secretions.
They play important roles in cell adhesion, the immune response, and fertilization. At their physiological
concentrations, mucins form entangled networks that function as a protective barrier and lubricant for
various epithelial cells. The defining feature of the protein component of mucins is a region of the
protein backbone termed the variable number of tandem repeats (VNTR) region. The VNTR is rich in serine
and threonine residues that are extensively O-glycosylated. High
glycosylation of the VNTR forces mucins into an extended conformation. The Cys-rich domains and the
D domain facilitate the polymerization of many molecules. Mucins can be membrane-bound or
secreted. In the evolutionary struggle between pathogens and their hosts, mucins have evolved to
contain the target oligosaccharides of certain pathogens to function as decoys.
Carbohydrate analysis
Carbohydrate analysis involves the development of methods for analyzing the structure,
stereochemistry, and function of complex oligosaccharides. Unlike nucleic acids and proteins,
oligosaccharides can be branched and joined by a variety of linkages, which complicates their
analysis. Carbohydrate analysis requires four basic processes. First, a target glycoprotein or
lipopolysaccharide is purified from its natural or heterologous source. Second, oligosaccharides are
removed from their protein or lipid conjugates for further analysis. Third, the purified oligosaccharides
are hydrolyzed in strong acid to determine the relative compositions of the various monosaccharides.
Fourth, oligosaccharides are subjected to stepwise degradation with specific reagents that reveal the
position and stereochemistry of glycosidic bonds.
Highly purified lectins, attached covalently to an insoluble support, are useful reagents for detecting and
separating glycoproteins. Once the target glycoprotein is purified, the points of attachment and the
structures of its oligosaccharides can be systematically determined. Characterization of an
oligosaccharide requires elucidating the identities, anomers, linkages, and the order of its component
monosaccharides. The next step is to detach the oligosaccharide moieties from their protein or lipid
conjugates using purified glycosidases or lipases, respectively. N-linked oligosaccharides can be released
from proteins by peptide N-glycosidase F, which cleaves the N-glycosidic bonds.
Large oligosaccharides can also be converted into smaller, more easily analyzed oligosaccharides chemically
or with sequence-specific endoglycosidases. Hydrolysis of oligosaccharides yields a mixture of
monosaccharides that can be modified for chromatographic separation, identification, and
quantification. Establishing the complete structure of an oligosaccharide requires determination of
branching positions, the sequence in each branch, the configuration of each monosaccharide unit at
anomeric and other carbons, and the positions of the glycosidic links. Oligosaccharides can be
“sequenced” by a combination of methylation analysis and enzymatic degradation supported by mass
spectrometry and high-resolution NMR spectroscopy.
Methylation analysis
Methylation analysis locates glycosidic bonds by converting all free hydroxyls to acid-stable methyl
ethers: the intact oligosaccharide is treated with methyl iodide in a strongly basic medium. The
exhaustively methylated oligosaccharide is subsequently hydrolyzed in acid. Unlike glycosidic bonds,
methyl ethers not at the anomeric C atom are resistant to acid hydrolysis. Consequently, the free OH
groups on the resulting methylated monosaccharides mark the former positions
of the glycosidic bonds. Methylated monosaccharides are often identified by gas–liquid chromatography
coupled to mass spectrometry.
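The bookkeeping behind methylation analysis can be illustrated with a minimal sketch. It assumes that every residue is an aldohexopyranose, so that the hydroxyls available for methylation sit at C2, C3, C4, and C6. The function name `methylation_products` and the layout of the linkage tuples are illustrative choices, not part of any standard tool. The sketch runs in the forward direction (known linkages, predicted products); in the laboratory the inference runs the other way.

```python
# Hydroxyl-bearing positions of an aldohexopyranose (assumption: C1 is the
# anomeric carbon and the C5 oxygen is part of the ring).
HEXOPYRANOSE_OH = frozenset({2, 3, 4, 6})


def methylation_products(residues, linkages):
    """Predict the methylated monosaccharides obtained after exhaustive
    methylation followed by acid hydrolysis.

    residues: list of residue names, e.g. ["Glc", "Glc"].
    linkages: list of (donor_index, acceptor_index, acceptor_position);
              the donor's anomeric carbon is bonded to the hydroxyl at
              acceptor_position of the acceptor residue.
    Returns one (name, methylated_positions, free_oh_positions) per residue.
    """
    substituted = {i: set() for i in range(len(residues))}
    for donor, acceptor, position in linkages:
        if position not in HEXOPYRANOSE_OH:
            raise ValueError(f"position {position} carries no free hydroxyl")
        substituted[acceptor].add(position)

    products = []
    for i, name in enumerate(residues):
        # Hydroxyls that were free get methylated and survive hydrolysis.
        methylated = sorted(HEXOPYRANOSE_OH - substituted[i])
        # Hydroxyls that were in glycosidic bonds reappear as free OH groups.
        free_oh = sorted(substituted[i])
        products.append((name, methylated, free_oh))
    return products


# Hypothetical branched tetrasaccharide: residue 1 carries residue 0 at C4
# and residue 3 at C6; residue 2 is the reducing end.
example = methylation_products(
    ["Glc", "Glc", "Glc", "Glc"],
    [(0, 1, 4), (1, 2, 4), (3, 1, 6)],
)
for name, methylated, free_oh in example:
    print(name, "methylated at", methylated, "free OH at", free_oh)
```

In this example the branch-point residue is recovered as the 2,3-di-O-methyl derivative with free OH groups at C4 and C6, whereas the two nonreducing terminal residues are methylated at all four positions.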
Stepwise enzymatic degradation
Stepwise enzymatic degradation is the process of using exoglycosidases of known specificity to remove
residues one at a time from the intact oligosaccharide to determine the sequence of monosaccharide
units and potential branching points. Exoglycosidases are enzymes that hydrolyze specific
monosaccharides from the nonreducing ends of oligosaccharides (analogous to exopeptidases). The
enzyme β-1,4-galactosidase cleaves the terminal β-glycosidic bond exclusively at galactose residues,
whereas α-1,2-mannosidase does so with the α-anomers of mannose. Cleaving the intact
oligosaccharide with exoglycosidases of varying specificities often allows the deduction of the position
and stereochemistry of the linkages. The repetition of this process with the use of an array of enzymes
of different specificity will eventually reveal the sequence and configuration of anomeric carbons of the
oligosaccharide. The hydrolysis products can again be analyzed by mass spectrometry. However, the
enzymes are generally not available in sufficient purity to ensure uniform degradation
products.
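The deduction described above can be mimicked with a small simulation. This is only a sketch under simplifying assumptions: the oligosaccharide is linear, each enzyme is characterized solely by the sugar and anomeric configuration it accepts (linkage position is ignored), and the names `PANEL`, `digest`, and `sequence_by_exoglycosidases` are hypothetical. The hidden chain stands in for the unknown sample; the experimenter only sees which enzyme releases residues at each round.

```python
# Hypothetical enzyme panel: enzyme name -> (sugar, anomer) it removes
# from the nonreducing end. Specificities are deliberately simplified.
PANEL = {
    "beta-galactosidase": ("Gal", "beta"),
    "beta-N-acetylhexosaminidase": ("GlcNAc", "beta"),
    "alpha-mannosidase": ("Man", "alpha"),
}


def digest(chain, specificity):
    """Remove consecutive matching residues from the nonreducing end.

    chain: list of (sugar, anomer) tuples, nonreducing end first.
    Returns (number_of_residues_released, remaining_chain).
    """
    released = 0
    while released < len(chain) and chain[released] == specificity:
        released += 1
    return released, chain[released:]


def sequence_by_exoglycosidases(chain, panel):
    """Apply the panel repeatedly and record what each round releases.

    Returns (deduced_sequence, undigested_remainder). The remainder is
    non-empty when no enzyme in the panel can act on the exposed residue.
    """
    deduced = []
    remaining = list(chain)
    while remaining:
        for specificity in panel.values():
            released, rest = digest(remaining, specificity)
            if released:
                deduced.extend([specificity] * released)
                remaining = rest
                break
        else:
            break  # no enzyme acts on the exposed terminal residue
    return deduced, remaining


sample = [("Gal", "beta"), ("GlcNAc", "beta"), ("Man", "alpha"), ("Man", "alpha")]
deduced, leftover = sequence_by_exoglycosidases(sample, PANEL)
print(deduced, leftover)
```

If the sample carries a terminal residue that no enzyme in the panel recognizes, digestion stalls and the whole chain is returned as the undigested remainder, which is why an array of enzymes of different specificity is needed.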
Mass spectrometry and NMR
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and other
mass spectrometric techniques are very sensitive methods for determining the mass of the molecular ion (the
entire oligosaccharide chain). Tandem mass spectrometry (MS/MS) reveals the mass of the molecular
ion and many of its fragments, which are usually the result of breakage of the glycosidic bonds. Although
all sugar isomers have identical molecular masses, they have characteristic fragmentation patterns.
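A short calculation illustrates why mass alone cannot distinguish isomeric sugars. The sketch below computes the monoisotopic mass of a neutral oligosaccharide as the sum of its free monosaccharide masses minus one water per glycosidic bond. The residue table and the function names are illustrative assumptions; ion adducts (for example Na+) and derivatization are ignored.

```python
# Monoisotopic atomic masses (u) of the most abundant isotopes.
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental formulas of the free monosaccharides (illustrative selection).
FORMULA = {
    "Glc": {"C": 6, "H": 12, "O": 6},
    "Gal": {"C": 6, "H": 12, "O": 6},
    "Man": {"C": 6, "H": 12, "O": 6},
    "GlcNAc": {"C": 8, "H": 15, "N": 1, "O": 6},
}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def oligosaccharide_mass(residues):
    """Mass of a neutral oligosaccharide built from the named residues.

    Each glycosidic bond is formed with loss of one water molecule, so a
    chain (linear or branched) of n residues has lost n - 1 waters.
    """
    if not residues:
        raise ValueError("at least one residue is required")
    total = sum(formula_mass(FORMULA[name]) for name in residues)
    return total - (len(residues) - 1) * formula_mass(WATER)


# Two isomeric disaccharide compositions give the same mass.
print(round(oligosaccharide_mass(["Gal", "Glc"]), 4))
print(round(oligosaccharide_mass(["Glc", "Glc"]), 4))
```

Both disaccharides give the same value, so distinguishing them requires the fragmentation patterns or the complementary methods described in this section.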
NMR analysis of oligosaccharides of moderate size can provide detailed information about sequence,
linkage position, and anomeric carbon configuration.