
Defence Research and Development Canada Reference Document

DRDC-RDDC-2020-D079

August 2020

CAN UNCLASSIFIED


An introduction to COVID-19 biology

Brad Berger DRDC – Suffield Research Centre

The body of this CAN UNCLASSIFIED document does not contain the required security banners according to DND security standards. However, it must be treated as CAN UNCLASSIFIED and protected appropriately based on the terms and conditions specified on the covering page.


© Her Majesty the Queen in Right of Canada (Department of National Defence), 2020



IMPORTANT INFORMATIVE STATEMENTS

This document was reviewed for Controlled Goods by Defence Research and Development Canada (DRDC) using the Schedule to the Defence Production Act.

Disclaimer: This publication was prepared by Defence Research and Development Canada an agency of the Department of National Defence. The information contained in this publication has been derived and determined through best practice and adherence to the highest standards of responsible conduct of scientific research. This information is intended for the use of the Department of National Defence, the Canadian Armed Forces (“Canada”) and Public Safety partners and, as permitted, may be shared with academia, industry, Canada’s allies, and the public (“Third Parties”). Any use by, or any reliance on or decisions made based on this publication by Third Parties, are done at their own risk and responsibility. Canada does not assume any liability for any damages or losses which may arise from any use of, or reliance on, the publication.

Endorsement statement: This publication has been published by the Editorial Office of Defence Research and Development Canada, an agency of the Department of National Defence of Canada. Inquiries can be sent to: [email protected].


Abstract

The recent COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, has been associated with an increased interest in the basic biology related to this virus and disease. This report reviews the fundamental properties of SARS-CoV-2 and COVID-19 at a level suitable for a broad range of educational backgrounds.


Résumé

The recent COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, has been associated with increased interest in the basic biology of this virus and this disease. This report reviews the fundamental properties of SARS-CoV-2 and COVID-19 for an audience with varied levels of education.


Table of Contents

Abstract
Résumé
Table of Contents
List of Figures
1 Introduction
2 Basic Concepts
    2.1 DNA
    2.2 RNA
    2.3 Proteins
    2.4 Membranes
    2.5 Information Processing in Human Cells
    2.6 Viruses
3 Coronaviruses
    3.1 General Information
    3.2 Genome and Products
    3.3 Life Cycle
        3.3.1 Binding and Uptake
        3.3.2 Transcription and Translation
        3.3.3 Assembly and Processing
4 Covid-Specific Issues
    4.1 Pathogenesis
    4.2 Treatment and Prophylaxis
    4.3 Detection
    4.4 Relationships To Other Coronaviruses
    4.5 Mutation
References


List of Figures

Figure 1: The structure of DNA.
Figure 2: The structure of RNA.
Figure 3: The structure of proteins.
Figure 4: The structure of cell membranes.
Figure 5: DNA to RNA to proteins in mammalian cells.
Figure 6: The Coronaviridae.
Figure 7: The structure of a coronavirus.
Figure 8: The coronavirus genome.
Figure 9: The non-structural proteins 1a and 1ab.
Figure 10: S protein.
Figure 11: M protein.
Figure 12: E protein.
Figure 13: N protein.
Figure 14: The coronavirus life cycle.
Figure 15: Ribosomal slipping between 1a and 1b.
Figure 16: Structures of antiviral compounds.
Figure 17: The reverse transcriptase real-time PCR assay for SARS-CoV-2.
Figure 18: Chemiluminescence assay for anti-SARS-CoV-2 antibodies.
Figure 19: ELISA assay for anti-SARS-CoV-2 antibodies.
Figure 20: Test strip assay for anti-SARS-CoV-2 antibodies.
Figure 21: Relationships amongst the betacoronaviruses.


1 Introduction

The recent COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been accompanied by a large increase in requests for information related to specific aspects of the disease and the virus. Such requests have been handled on an ad hoc basis and include regular updates on scientific and medical advances, support to forward planning, support to intelligence reporting, and scientific triage of popular news reporting. Much of this assistance is for individuals with a limited background in biology, or those whose knowledge of the subject is out of date. This report aims to provide a very basic foundation of the key concepts needed to understand SARS-CoV-2 biology and its relevance to COVID-19 for those who would like a better understanding of the material they are encountering on a daily basis.

Section 2, below, provides a short introduction to some basic biological concepts that are essential to understanding the subsequent material. Section 3 covers coronaviruses and their structure and replication. Section 4 looks at the pathogenesis, treatment, detection, and other topics specific to COVID-19. As the report is aimed at readers with a broad spectrum of previous knowledge, feel free to skip over sections that you are already comfortable with. Should anything later prove to be unclear, return to the initial sections to refresh the relevant basic information. For the basic concepts, only those aspects needed to understand later discussions related to coronaviruses are presented. Anything extra is omitted, and readers should understand that there is much more to even the basic concepts, which can be explored by reading foundational textbooks. Similarly, to improve the flow of the text, source citations are not provided for every scientific concept and are reserved for specific, recent SARS-CoV-2 issues. Where possible, there is a focus on SARS-CoV-2, but information will also be presented relative to the Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) coronaviruses, as these two have been known for much longer and have a more substantial experimental literature base. Due to the recent emergence of SARS-CoV-2, there has been a flood of information that has not undergone normal scientific peer review, and many contradictory ideas have been made public. It is possible that some of the information related to COVID-19 presented below may be corrected by future studies or may already be subject to some debate. The information cut-off date for this report is June 23, 2020.


2 Basic Concepts

2.1 Deoxyribonucleic acid (DNA)

Deoxyribonucleic acid (DNA) is the central information storage molecule for a wide range of organisms

ranging from some viruses and all bacteria to plants and humans. DNA is a polymer consisting of four

different nucleotides (Figure 1) each of which consists of the sugar deoxyribose, phosphate, and a

nitrogen-containing aromatic called a base. The phosphate position (referred to as 5') binds to the hydroxyl

end of another deoxyribose (referred to as 3') yielding a repeating phosphate-sugar-phosphate-sugar

backbone. The four bases are adenine, guanine, cytosine, and thymine (Figure 1) and these protrude from

the phosphosugar backbone. Adenine and thymine can form two hydrogen bonds between them while

guanine and cytosine can form three (Figure 1). Doing so stabilizes two complementary DNA strands

together, one in the 5'–3' direction (called the coding or sense strand) and the other in the 3'–5' direction

(called the complementary or antisense strand). This stable association is the famous double helix structure

(Figure 1).
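The pairing rules described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical, and the hydrogen bond count simply applies the two-bond (adenine-thymine) and three-bond (guanine-cytosine) rule from the text.

```python
# Base pairing as described in the text: A pairs with T (2 hydrogen bonds)
# and G pairs with C (3 hydrogen bonds).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def antisense_strand(sense: str) -> str:
    """Return the complementary strand, written in the 5'-3' direction.

    The two strands run in opposite directions, so the paired bases are
    reversed to keep the usual 5'-3' writing convention.
    """
    return "".join(PAIR[base] for base in reversed(sense.upper()))


def hydrogen_bonds(sense: str) -> int:
    """Count the hydrogen bonds holding the two strands together."""
    return sum(H_BONDS[base] for base in sense.upper())


sense = "ATGCGT"  # hypothetical example sequence
print(antisense_strand(sense))
print(hydrogen_bonds(sense))
```

A sequence rich in guanine and cytosine gives a higher bond count than an adenine and thymine rich sequence of the same length, which is one reason such regions of the double helix are harder to separate.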

2.2 Ribonucleic acid (RNA)

Ribonucleic acid (RNA) differs from DNA only in that the sugar is ribose (has an extra hydroxyl group)

and the base uracil replaces thymine (Figure 2). These small changes yield a dramatic difference in

properties. While the phosphosugar backbone assembles the same way, the strand remains single and does

not pair off with a complementary strand (Figure 2). The molecule can internally hydrogen bond if

complementary sequences exist, yielding complex secondary structures (Figure 2). A single RNA molecule

may have multiple possible secondary structures of varying stability, while some may stay in one specific

conformation. RNA can play many roles biologically. In human cells, it is a short-lived intermediate coding

material that passes information from DNA as a template for the creation of proteins (messenger RNA). It

also plays a direct role in the assembly of proteins by carrying individual amino acids (transfer RNA) and

assembling the amino acids (ribosomal RNA). In addition, some viruses have an RNA genome instead of

one made of DNA.

2.3 Proteins

Proteins consist of a series of amino acids that have an amino group (-NH2) and a carboxylic acid group

(-COOH) (Figure 3). The amino group of one amino acid can bond stably with the carboxyl group of another, giving a peptide bond (-CO-NH-). Each amino acid also has a functional group, which can be acidic, basic, neutral, thiol, or

aromatic (Figure 3). A chain of amino acids thus ends up with a variety of local interactions (charge,

hydrogen bonding, hydrophobicity, disulfide bond formation), which can drive secondary and tertiary

structure formation (Figure 3). Proteins fold into the most stable conformation for their environment at a

given time and form a huge variety of complex shapes allowing them to act as structural materials, enzymes,

receptors, transporters, etc. Enzymes are proteins that can catalyze specific reactions. A specific class of

enzyme, proteases, will be referred to numerous times below. A protease is a protein that can cleave another

protein. Another important type of enzyme, polymerases, can replicate DNA or RNA from a template.

The sequence of amino acids in a protein is directly determined by the nucleotide sequence in DNA. Since

there are 20 amino acids found in proteins, each individual amino acid is coded by three sequential DNA

bases (for example, ATG encodes methionine). A triplet code allows for 64 combinations, so there is

redundancy in the system for some of the amino acids (for example AGA, AGG, CGA, CGC, CGG, and


CGT all code for arginine). A functional consequence of this redundancy is that it is possible to have a

mutational change in DNA sequence without changing the resulting amino acid in the encoded protein. It

is also possible that a DNA mutation can give rise to a major change in amino acid.
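The redundancy of the triplet code can be explored with a small sketch. It builds the standard genetic code as a lookup table (the compact table layout and the function names are choices made for this illustration, not something taken from this report) and then checks whether a change in a codon is silent.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid: str) -> list[str]:
    """List every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def is_silent(codon_before: str, codon_after: str) -> bool:
    """True if a mutation leaves the encoded amino acid unchanged."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


print(len(CODON_TABLE))          # number of triplet combinations
print(synonymous_codons("R"))    # the arginine codons
print(is_silent("CGA", "CGG"))   # arginine to arginine
print(is_silent("CGA", "CAA"))   # arginine to glutamine
```

The first comparison changes only the third base and leaves arginine in place, while the second changes the middle base and swaps in a different amino acid.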

Since there are three nucleotides coding for one amino acid, there are three separate possibilities (or frames)

for where to begin reading the code. For example, take the sequence ACGACGACGACG: read from the first nucleotide, this translates as TTTT (threonines). Starting on the second nucleotide (CGACGACGACG), it translates as RRR (arginines). Starting on the third nucleotide (GACGACGACG), it translates as DDD

(aspartates). These are the three frames for this sequence. This concept is utilized by some viruses to

minimize the size of their genomes. Some genes will overlap and the resulting proteins are synthesized

from different frames. Also, changes in frame during protein production can be used as a loose mechanism

to regulate how much of a protein is produced (as described below).
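The idea of reading frames can be made concrete with a short sketch that translates the example sequence ACGACGACGACG from each of its three starting positions. The table is the standard genetic code, under which ACG encodes threonine (T), CGA arginine (R), and GAC aspartate (D); the `translate` helper and its `frame` parameter are names assumed for this illustration.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).

    Only complete codons are translated; one or two leftover bases at the
    end of the sequence are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


sequence = "ACGACGACGACG"
for frame in range(3):
    print(f"frame {frame + 1}: {translate(sequence, frame)}")
```

The same twelve nucleotides yield three unrelated peptides depending on where reading starts, which is what allows overlapping genes to encode different proteins.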

2.4 Membranes

Individual cells are bounded by a lipid membrane (Figure 4). The main components are phospholipids, which have a hydrophilic head and hydrophobic tails. In a water-based environment, these will self-assemble into a bilayer with the heads facing out and the tails facing inside (Figure 4). Membranes also contain other lipid

types, sterols (such as cholesterol), and specific types of protein. Membranes behave as a two-dimensional

fluid with the lipids, and items embedded in them, freely moving around on their side of the bilayer. Some

proteins may bridge both layers of the membrane. Human cells not only are bounded by an external

membrane (cytosolic membrane), but have numerous membranous bodies inside that compartmentalize cell

functions (some of which will be encountered below). The endoplasmic reticulum is an elongated vesicular

system where many, but not all, proteins are synthesized and modified in the human cell. Proteins move in

small vesicles from the endoplasmic reticulum to the Golgi body, which is a separate system of elongated

membranes where proteins are further modified. Small vesicles may internalize from the outer cell

membrane (endocytosis) or may fuse into it (exocytosis). Many viruses use the former route as a way into

target cells and use the latter as a way for progeny to escape.

2.5 Information Processing in Human Cells

Genetic information is stored long term in DNA and is organized in discrete portions called genes. Genes

have a number of important components, of which only a few need to be understood here. A gene will have

a transcriptional start and stop signal, which tells the protein enzyme RNA polymerase (more accurately

DNA-dependent RNA polymerase) where to start producing an RNA copy and where to stop doing so. The

gene will also have a promoter, which is a stretch of sequence before the transcriptional start site that

regulates when the gene is allowed to be transcribed to RNA. Control of the promoter can occur via a

number of different interactions, which are often driven by the binding or release of proteins to the promoter

site. Binding can be due to the presence of an important compound (such as a nutrient) or the lack of it, or

some change made to the protein, such as phosphorylation, during interaction with other proteins.

When conditions are such that the promoter is in the “on” state, the DNA around the gene will unwind and

RNA polymerase can bind to the transcriptional start site (Figure 5). The enzyme starts to copy the

anti-sense strand of the DNA with complementary RNA nucleotides and moves along the DNA strand

while doing so. During this process, the 5' end of the emerging RNA molecule is capped with an unusual

7-methylguanine nucleotide attached via 3 phosphate groups. When the transcriptional stop site is reached,

the polymerase comes off the DNA and releases the RNA strand. The latter immediately undergoes

processing at the 3' end where a string of adenosine nucleotides is added (the poly-A tail). This RNA is

now able to act as messenger RNA (mRNA).
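As a rough sketch of the copying step, the code below pairs each base of the antisense (template) strand with its complementary RNA nucleotide and then appends a poly-A tail. The function names, the example gene fragment, and the tail length are assumptions made for illustration only, and the 5' cap is not modelled.

```python
# DNA base pairing, used to derive the antisense (template) strand.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# RNA nucleotide inserted opposite each template DNA base (U replaces T).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Return the antisense strand, aligned base-for-base with the coding
    strand (so it reads 3'-5' when the coding strand reads 5'-3')."""
    return "".join(DNA_PAIR[base] for base in coding.upper())


def transcribe(template: str, poly_a_length: int = 20) -> str:
    """Copy the template into RNA and add a poly-A tail.

    poly_a_length is a hypothetical value chosen for readability; real
    tails are much longer.
    """
    body = "".join(TEMPLATE_TO_RNA[base] for base in template.upper())
    return body + "A" * poly_a_length


coding = "ATGGCC"  # hypothetical gene fragment, sense strand
mrna = transcribe(template_strand(coding), poly_a_length=5)
print(mrna)
```

Because the polymerase copies the antisense strand, the resulting RNA carries the same sequence as the sense strand, with uracil in place of thymine.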


The mRNA contains within it the coding sequence for a protein with a translational start site and termination

site. Some genes have additional on/off control at this stage due to the secondary conformation potentials

of RNA molecules (Figure 5). Under certain conditions the RNA folds to prevent access to the translational

start site, while under others it folds to permit such access. Alternatively, under certain conditions, proteins can bind to the RNA and block access to the translational start site. When translation is permitted,

ribosomes (a large complex of proteins and ribosomal RNA) bind to the mRNA and coordinate the insertion

of corresponding transfer RNAs, which are carrying the appropriate amino acid (Figure 5). The peptide

bond is created between adjacent amino acids and the ribosome moves down the RNA chain. A polypeptide

chain emerges until the ribosome hits the translational stop signal and drops off the mRNA.

The polypeptide chain undergoes folding, with or without the assistance of other proteins, to yield its normal

configuration (Figure 5). Many proteins have small stretches of sequence (usually at the amino end, but sometimes at the carboxyl end or even in the middle) that signal where the protein needs to go in the cell. The destination can be secretion across the cytosolic membrane into the external environment, or delivery into a specific membranous sub-compartment of the cell.

2.6 Viruses

Viruses are the simplest biological entities known. At their most minimal, they consist only of nucleic acid

encoding as little genetic information as possible along with a protective protein shell. More complicated

viruses may have outer membranes derived from their host cells and additional proteins for binding and

invasion. The most complicated viruses may also carry a few extra enzymatic proteins they need to perform

specific functions that the host cell may not provide. Viruses are known as obligate parasites as they are

completely reliant on the machinery of the host cell to complete their life cycle. Outside of the host cell,

viruses might not even be considered alive, as they perform no active biological functions. All life forms on Earth, from bacteria to humans, are subject to viral infection.

In general, viruses tend to be broadly classified based on their type of genome and replication strategy:

1. Replicates DNA to DNA.
   a. Single-stranded DNA genome: such as Parvoviruses.
   b. Double-stranded DNA genome: such as Herpesviruses, Orthopoxviruses, Adenoviruses.
2. Replicates DNA to RNA to DNA, or RNA to DNA to RNA.
   a. Double-stranded DNA genome: such as Hepadnaviruses.
   b. Positive, single-stranded RNA genome: such as Retroviruses.
3. Replicates RNA to RNA.
   a. Positive, single-stranded RNA genome: such as Coronaviruses, Picornaviruses, Flaviviruses.
   b. Double-stranded RNA genome: such as Reoviruses.
   c. Negative, single-stranded RNA genome: such as Filoviruses, Rhabdoviruses, Paramyxoviruses.


In this classification, positive means the RNA strand is in the sense direction (like an mRNA), while

negative means the strand is in the antisense direction and cannot directly produce protein in the host cell.

The genome size of viruses is quite small and varies from as little as 1.7 kbases (kb) for Circovirus (which

infects pigs) to 2,500 kb for Pandoravirus (which infects amoebae). For RNA genomes, the largest are those

of the Coronaviruses that can reach 30 kb. Compare this to the genome of Escherichia coli at 4,600 kb,

humans at 3,300,000 kb, and wheat at 17,000,000 kb. Physically, viruses are also very small and are a

fraction of the size of bacteria. In fact, viruses were first discovered as infectious agents that passed through filters that screened out bacteria. Viruses have a variety of structural forms, but the most common way

protein assembles to protect the nucleic acid is by forming an icosahedral shell with the genome inside or

by attaching directly to the genome to form a beads-on-a-string appearance.
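To put these genome sizes in perspective, the following sketch expresses each one as a multiple of a 30 kb coronavirus genome. The figures are the ones quoted in the paragraph above; the dictionary keys and the `fold_difference` helper are names chosen for this illustration.

```python
# Genome sizes in kilobases (kb), as quoted in the text.
GENOME_SIZES_KB = {
    "Circovirus": 1.7,
    "Coronavirus": 30,
    "Pandoravirus": 2_500,
    "Escherichia coli": 4_600,
    "Human": 3_300_000,
    "Wheat": 17_000_000,
}


def fold_difference(larger: str, smaller: str) -> float:
    """How many times bigger one genome is than another."""
    return GENOME_SIZES_KB[larger] / GENOME_SIZES_KB[smaller]


for name, size_kb in GENOME_SIZES_KB.items():
    ratio = fold_difference(name, "Coronavirus")
    print(f"{name:<18} {size_kb:>14,.1f} kb  {ratio:>12,.2f} x coronavirus")
```

On these numbers, the human genome is roughly 110,000 times the size of the largest coronavirus genome, and the smallest viral genome listed is less than a tenth of it.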

Viruses have a very wide range of strategies for organizing their genome, producing protein, replicating,

and spreading. As a generality, a virus infects its host cell by specific interaction with a surface receptor.

The distribution of the specific receptor on the surface of varying cell types explains the host cell range the

virus can infect. For example, polio virus binds to CD155, found only on primate epithelial cells, which

explains its primary infection of the intestinal epithelium. Influenza virus binds to any cell surface

displaying the carbohydrate sialic acid, which explains its ability to infect birds and mammals. Once inside

the cell, a virus uncoats its genome and hijacks the host cell apparatus to replicate the genome, produce

protein, and assemble complete virions.


3 Coronaviruses

3.1 General Information

The first coronaviruses were discovered in the 1930s from chickens suffering from bronchitis, pigs with

gastroenteritis, and mice with hepatitis. In the 1960s, electron microscopy allowed the first visualization of

these viruses, which were found to share a unique morphology that resembled a solar corona. For many

subsequent years, it was thought that only two coronaviruses infected humans and were the cause of a

percentage of (but not all) common colds. More recently, two more human coronaviruses have been found

that can cause the common cold. In 2002, a completely new disease, SARS, was found to be caused by a

novel coronavirus. This virus mysteriously disappeared from human circulation within one year. In 2012,

another novel coronavirus was found to cause the new disease Middle East respiratory syndrome (MERS),

and this virus has continued to circulate at low levels to the present day. In late 2019, yet another novel disease (COVID-19) erupted, again caused by a novel coronavirus. The long-term persistence

of this virus is not yet clear.

Coronaviruses belong to the order Nidovirales, which contains the families Coronaviridae, Arterioviridae, Roniviridae, and Mesoniviridae. The latter three groupings are not of concern for human health. All of these viruses

share several common traits (all of which will be enlarged upon later):

1. The same genomic organization.

2. The expression of a large polyprotein via ribosomal slipping/frameshifting.

3. Specific enzymatic activities within the polyprotein.

4. The production of nested sub-genomic mRNAs.

The Coronaviridae are divided into the Coronavirinae and the Torovirinae, the latter of which are of no concern to human health. Examination of the complete genome sequences of members of the Coronavirinae

clearly shows that there are four distinct groupings of coronavirus: Alphacoronavirus, Betacoronavirus,

Gammacoronavirus, and Deltacoronavirus (Figure 6). Among the alphacoronaviruses, the type specimen

is the pig transmissible gastroenteritis virus, and the group also contains the human coronaviruses 229E and

NL63, which cause the common cold. Among the betacoronaviruses, the type specimen is the mouse

hepatitis virus, and the group also includes the human coronaviruses HKU1 and OC43 (which cause the

common cold), SARS-CoV, MERS-CoV, and SARS-CoV-2 (which cause severe respiratory syndromes).

Both Alphacoronavirus and Betacoronavirus also contain a large number of bat viruses. Among the

gammacoronaviruses, the type specimen is the chicken infectious bronchitis virus, and there are no

human-infecting members. For the deltacoronaviruses, the type specimen is Bulbul virus HKU11, and there

are no human-infecting members.

Coronaviruses have a distinctive structure (Figure 7). The outer layer of the virus is a membrane derived

from the infected host cell. More specifically, it is an inside-out portion of the endoplasmic reticulum-Golgi intermediate compartment. This is a membranous vesicle moving from the endoplasmic reticulum (where

many proteins get made) to the Golgi (where many proteins get modified). It is presumed that the

constituents of the membrane are typical for the host cell (e.g., lipid types, cholesterol content) but it is

difficult to find any information to confirm or contradict this assumption. In this membrane are embedded


three viral proteins: the spike (S) protein, which gives the coronavirus its halo appearance; the membrane

(M) protein; and the envelope (E) protein. Some betacoronaviruses, such as human coronavirus OC43 and

HKU1, have an additional membrane protein called hemagglutinin-esterase (HE). SARS-CoV,

MERS-CoV, and SARS-CoV-2 do not have the HE protein. Internal to the membrane is the genomic RNA

covered with the nucleocapsid (N) protein. The average virus is about 120 nm in diameter, which includes

the 20 nm the S protein extends beyond the membrane.

3.2 Genome and Products

The positive, single-stranded RNA genome of coronaviruses ranges from 25 to 32 kb, with SARS-CoV-2 being 29.9 kb in length, SARS-CoV 29.8 kb, and MERS-CoV 30.1 kb. The genomic RNA has a 5'-methylguanine cap and a poly-A tail (as described above), and can thus act directly as an mRNA in the infected cell. The genomes have a highly similar, but not identical, structure (Figure 8). In addition to the polyprotein (rep1a and rep1b) and structural genes (S, E, M, and N), there are a variable number of

additional genes known as accessory genes. All accessory genes appear to be expressed during the virus

life cycle, but the functions are not always clear. The polyprotein and structural genes are absolutely

essential for completion of the viral life cycle, but deletion of accessory genes has no effect on completion

of the SARS-CoV life-cycle in vitro. However, deletion of accessory proteins does appear to have negative

effects on viral replication in vivo, suggesting some modulatory effect on the host.

The very large genes 1a and 1b produce what is called the replicase-transcriptase complex. Through a

mechanism known as ribosomal slipping (discussed further below), the genes are translated as two large

polyproteins: 1a and 1ab (Figure 9). The SARS-CoV-2 1a protein is 4044 amino acids in length while the

1ab is 7095 amino acids. The polyproteins consist of a number of segments (non-structural proteins) labelled nsp1 to nsp16. A protease domain within nsp3 cuts the junctions between nsp1 and nsp2, nsp2 and nsp3, and nsp3 and nsp4. A different protease in nsp5 cuts all the remaining junctions, so that each segment is released as a separate protein. These processed nsps self-assemble to form the

replicase-transcriptase complex with several components (nsp3, nsp4, nsp6) embedding in the membrane

of the endoplasmic reticulum in the host cell to create special double-membrane vesicles dedicated to viral

replication. This complex is responsible for the copying of the genomic RNA and also the production of

sub-genomic RNAs needed for the production of S, M, E, and N proteins. Of the proteins in the complex,

nsp12 protein is the RNA-dependent RNA polymerase and binds the genomic RNA with nsp7 and nsp8.

The polymerase catalyzes the formation of a complementary, negative strand of RNA. The nsp13 protein

has helicase activity for unwinding RNA and also has an RNA 5'-triphosphatase involved in the synthesis

of the 5'-cap. The nsp14 protein and nsp16 protein are methyltransferases for making the methylguanine

found in the 5'-cap. The nsp14 protein also has an exonuclease activity that allows proof-reading of the

synthesized RNA product. Such an RNA proof-reading ability is highly unusual among RNA viruses, and coronaviruses have a much lower mutation rate than viruses such as influenza because errors can be fixed by the

exonuclease. The nsp1 protein is an inhibitor of host cell protein synthesis that allows resources to be used

by the viral replicase. The nsp15 protein is an RNA nuclease that has an unclear function. It may be involved

in degrading incompletely copied or excess viral RNAs.
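The division of labour between the two viral proteases can be sketched as a simple rule applied to each junction in a polyprotein. The function and variable names are hypothetical. The segment order used for polyprotein 1ab (nsp1 to nsp10 followed by nsp12 to nsp16, with nsp11 found only at the end of polyprotein 1a) is an assumption taken from the general coronavirus literature rather than from this report.

```python
# Junctions cut by the protease domain in nsp3, as described in the text.
NSP3_JUNCTIONS = {("nsp1", "nsp2"), ("nsp2", "nsp3"), ("nsp3", "nsp4")}


def assign_proteases(segments: list[str]) -> dict[tuple[str, str], str]:
    """Map each junction between adjacent segments to the protease that
    cuts it: nsp3 for the first three junctions, nsp5 for the rest."""
    return {
        (left, right): "nsp3" if (left, right) in NSP3_JUNCTIONS else "nsp5"
        for left, right in zip(segments, segments[1:])
    }


# Assumed segment order for polyprotein 1ab (nsp11 omitted, see lead-in).
PP1AB = [f"nsp{i}" for i in range(1, 17) if i != 11]

cuts = assign_proteases(PP1AB)
for (left, right), protease in cuts.items():
    print(f"{left} | {right}: cut by the {protease} protease")

print(sum(p == "nsp5" for p in cuts.values()), "junctions cut by nsp5")
```

Under these assumptions the nsp5 protease handles most of the junctions, including the ones that release nsp5 itself from the chain.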

The spike (S) protein found in the outer membrane of the virus binds to specific host cell surface receptors

to initiate cell infection. The S protein from SARS-CoV-2 is 1273 amino acids in length, SARS-CoV 1255

amino acids, and MERS-CoV 1353 amino acids. A single S protein molecule consists of two distinct

domains of equal size, called S1 and S2 (Figure 10). S1 is quite variable across coronaviruses and represents

the receptor-binding area of the protein, while S2 is quite highly conserved and provides the membrane

anchor. Three S monomers self-assemble as a trimeric complex in the membrane, with the S1 trimers

forming a bulb-like shape, and the S2 trimers forming a stem. During synthesis and assembly, the S protein

is glycosylated by the addition of sugars (predominantly sialic acid) to free amino groups on the outside of


the protein. The protein also contains sites for cleavage by the host cell surface protease furin. Some, but

not all, coronaviruses have a furin site at the border of the S1 and S2 domains. Cleavage of this site during

virus production aids, but is not essential for, virus uptake into the host cell. There is a different furin site

in the S2 domain (called S2’), which must be cleaved during receptor binding for virus uptake to occur.

The membrane (M) protein is the most abundant of the proteins found in the membrane. The M protein

from SARS-CoV-2 is 222 amino acids in length, SARS-CoV 221 amino acids, and MERS-CoV 219 amino

acids. Only a small portion of the M protein protrudes outside the membrane, with the bulk of the protein

extending under the membrane into the virion (Figure 11). This internal portion has two important regions,

one of which interacts with a similar area on the S protein, and the other of which interacts with the N

protein coating the genomic RNA. Like the S protein, the external portion of the M protein is modified by

glycosylation. The M protein appears to be assembled as a dimer into the membrane.

The envelope (E) protein is the least abundant of the membrane proteins. The E protein from SARS-CoV-2

is 75 amino acids in length, SARS-CoV 76 amino acids, and MERS-CoV 82 amino acids. Only a very small portion of the E protein protrudes outside of the membrane, and, unlike S and M, the E protein is not glycosylated (Figure 12). The E protein has a large internal portion extending under the membrane into the virion. The E protein appears to assemble into oligomers, and a pentameric assembly has been determined for SARS-CoV. In vitro, the E protein from SARS-CoV has been shown to act as an ion channel for sodium

and potassium ions. The role of the E protein is not completely understood, but it is essential for formation

of the membrane envelope of the virus. Whether oligomerization is necessary, or whether ion channel

activity is needed is not clear.

The nucleocapsid (N) protein is the only protein that binds to the genomic RNA of the virus. The N protein

from SARS-CoV-2 is 419 amino acids in length, SARS-CoV 422 amino acids, and MERS-CoV 411 amino

acids. The N protein contains two RNA binding domains (Figure 13), and is modified by phosphorylation

at a small number of threonine and serine sites. It is hypothesized, but not proven, that phosphorylation

enhances RNA binding of the protein. The N protein coats the RNA in a helical manner, but the N-RNA

complex appears to remain flexible. The C-terminal RNA binding domain is known to form dimers and

then these dimers can form higher order oligomers. It is thought that this C-terminal domain forms an

internal scaffold for the RNA to bind around, with the N-terminal domain acting as an outside “cover”

(Figure 13).

Only the M and E proteins are absolutely essential to form the virus envelope. Eukaryotic cells that have

been genetically modified to express only the coronavirus M and E proteins will produce empty membrane

envelopes known as virus-like particles (VLP). Additional expression of the N protein will enhance the

production of VLP and the expression of S protein will allow insertion of S into the VLP envelope. In all

these cases, there is no viral genome and the VLP are non-infectious.

As mentioned above, there are a number of additional accessory genes in the genome of coronaviruses. The

function of many, but not all, has been studied in SARS-CoV:

3a: Up-regulates pro-inflammatory cytokines during infection, and up-regulates fibrinogen levels

in the lungs. The former may be involved in “cytokine storm” seen in some patients while the

latter may explain the clotting syndrome seen in others.

3b: Up-regulates cytokines.

6: Suppresses interferon responses during infection. These responses are known to have a general

antiviral effect.


7a: Up-regulates cytokines.

7b: Unknown.

8a: Enhances efficiency of viral replication in some manner. Induces cell death of host cells. In

vitro, the protein has been shown to form an ion channel.

8b: Induces inflammation in the host.

9b: Induces cell death of host cells. Induced cell death may be a mechanism to avoid a wider

antiviral response in the host.

The accessory proteins in SARS-CoV-2 are not all identical to those seen in SARS-CoV and their function

has not yet been elucidated. The SARS-CoV-2 accessory protein genes are designated 3a, 6, 7a, 7b, 8, and

10. Based on structure, 3a, 6, and 7a are thought to perform the same tasks as the corresponding proteins in

SARS-CoV. Very recent work [1] has cast doubt on whether 10 is produced.

3.3 Life Cycle

3.3.1 Binding and Uptake

The infectious coronavirus binds via the S protein to a specific cell-surface receptor. In the case of both

SARS-CoV-2 and SARS-CoV the receptor is the human angiotensin-converting enzyme 2 (ACE2). The

ACE2 enzyme is an important cell-surface protease involved in the regulation of blood-pressure and is

expressed primarily on epithelial cells of the airway and the small intestine, with lower levels found on the

heart and kidney. MERS-CoV S protein binds to the human dipeptidyl peptidase 4 (DPP4), which is also

called CD26. DPP4 is a cell surface protease with a much wider tissue distribution.

After initial binding, a host cell-surface protease (more specifically the serine protease TMPRSS2) cleaves

the S2’ furin site in the S2 domain (Figure 14). This cleavage changes the configuration of the S protein to

expose membrane fusion domains and brings the virus outer membrane and the host cell membrane into

contact. The lipid bilayers mix and the viral membrane is effectively incorporated into the host cell

membrane allowing the nucleocapsid to enter the cell cytoplasm.

As an alternative uptake route, the entire virus may be endocytosed after making contact with the ACE2

cell surface receptor. Inside the cell, furin in the vesicle membrane will cleave the S2’ furin site and the

conformation change of the S protein will bring the viral membrane into contact with the vesicle membrane.

The membranes fuse and the nucleocapsid enters the cell cytoplasm.

3.3.2 Transcription and Translation

Once inside the cell, the N protein comes off the genomic RNA (Figure 14). Due to the presence of a

5'-cap and 3'-polyA tail on the genomic RNA, it is recognized by the host cell ribosomes as an mRNA.

Starting at the 5' end of the genomic RNA, the host cell ribosomes synthesize the 1a and 1ab polyproteins.

These are the only proteins directly produced from the genomic RNA. The production of two different

lengths of polyprotein from one large gene by ribosomal shifting is believed to be an easy way to regulate

the production of proteins that are needed in larger amounts from those needed in lesser amounts. Polyprotein

1a is produced about 75% of the time and 1ab about 25% of the time, yielding about 4 times as many “a”

proteins as “b” proteins.
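
The arithmetic behind the roughly four-to-one ratio can be checked with a minimal sketch. It assumes only the approximate 75%/25% split quoted above, and the function name is illustrative rather than taken from any library. Every polyprotein (1a or 1ab) carries the "a" proteins, whereas only 1ab carries the "b" proteins.

```python
def a_to_b_ratio(fraction_1ab):
    """Ratio of "a" proteins to "b" proteins per translation event.

    Assumes each ribosome yields either polyprotein 1a or 1ab and that
    every polyprotein is fully processed (a simplification).
    """
    a_copies = 1.0            # both 1a and 1ab contain the "a" proteins
    b_copies = fraction_1ab   # only 1ab contains the "b" proteins
    return a_copies / b_copies


print(a_to_b_ratio(0.25))  # 4.0
```

A frameshift frequency other than 25% would change the ratio accordingly, for example a 20% frequency would give five "a" proteins for each "b" protein.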


Ribosomal slipping (Figure 15) is a translational strategy that is not very common but is effective for viruses

that do not encode more complex regulatory systems. Ribosomal frameshifting has two

elements: a “slippery” sequence of nucleotides (in the case of SARS-CoV-2 it is UUUAAAC) just before

a semi-stable RNA secondary structure, which can hide the 1a stop signal in one conformation. The

ribosome comes along in the 1a frame making polyprotein 1a, causes the secondary structure to relax (75%

of the time) and hits the stop signal, which is in the 1a frame. The ribosome then falls off and the polyprotein

is done as 1a. In the other 25% of the time, the ribosome comes along making 1a and is obstructed by the

RNA secondary structure. The ribosome steps back 1 nucleotide in the “slippery” sequence entering the 1b

frame. The secondary structure then resolves but the stop signal is no longer in frame and the ribosome

continues on its way adding 1b to 1a. The functional consequence of this approach is that the proteins needed

for cutting the polyproteins and assembling the replicase complex (nsps 1–10) are in higher abundance

than the proteins needed to replicate the genome (nsps 12–16).
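
The two outcomes of frameshifting can be illustrated with a toy translation sketch. The short RNA used here is an invented sequence, not the viral genome; only the UUUAAAC slippery sequence is taken from the text. The function names are hypothetical, and the RNA secondary structure is not modelled: the one-nucleotide step back is simply applied at the end of the slippery sequence.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna, start=0):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the ribosome falls off
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_frameshift(rna, slippery="UUUAAAC"):
    """Translate in frame up to the end of the slippery sequence, then
    step back one nucleotide (-1 frameshift) and continue."""
    site = rna.find(slippery)
    if site == -1:
        raise ValueError("slippery sequence not found")
    end = site + len(slippery)
    if end % 3 != 0:
        raise ValueError("slippery sequence must end on a codon boundary")
    upstream = translate(rna[:end])            # original reading frame
    downstream = translate(rna, start=end - 1)  # re-read the last nucleotide
    return upstream + downstream


# Invented toy RNA (hypothetical, for illustration only).
toy_rna = "AUGGCUUUAAACGGGUUUGCGGUGUAAGUGCAGCCUGA"

print(translate(toy_rna))                  # MALNGFAV (stops at the in-frame UAA)
print(translate_with_frameshift(toy_rna))  # MALNRVCGVSAA (reads past that stop)
```

In the shifted frame the original stop codon is no longer read as a codon, so translation continues until the next stop codon in the new frame.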

Once the replicase-transcriptase complex has been assembled, the genomic RNA is replicated back and

forth positive strand to negative strand to positive strand. Full length positive-stranded genomic RNAs are

then 5'-capped and 3'-polyadenylated. The virus now also uses another approach to regulate abundance of

proteins and genomes. The RNA-dependent RNA polymerase rarely makes it the full length of the RNA

template to make a complete genome copy. At specific sites at the end of each gene, the RNA polymerase

can come off the template RNA. In this manner, it makes a series of nested sub-genomic RNAs (sgRNA)

that encode an increasing number of genes but are found in decreasing abundance. The sgRNA for N is

the most abundant, then the sgRNA for N-M, then the sgRNA for N-M-E, then the sgRNA for N-M-E-S,

and lastly the whole genomes (omitting, for clarity, the accessory genes). The sgRNAs are then used as

mRNA templates by the host cell ribosomes to make the structural and accessory proteins for the virus.

This nested transcriptional approach is another low-burden mechanism to regulate which proteins will get

made in the highest abundance. In this case, it would be N, which is needed in large numbers to coat the

genomic RNA.
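
The effect of nested transcription on abundance can be sketched with a simple model. The model is an assumption made for illustration and is not described in the source: the polymerase is given the same probability of coming off the template at each gene boundary. Both the function name and the value of 0.6 are hypothetical.

```python
def nested_rna_abundance(genes_3prime_first, p_release):
    """Relative abundance of each nested RNA under a constant-release model.

    genes_3prime_first: gene names in the order the polymerase passes them.
    p_release: assumed probability of leaving the template at each boundary.
    """
    abundances = {}
    still_copying = 1.0  # fraction of polymerases still on the template
    covered = []
    for gene in genes_3prime_first:
        covered.append(gene)
        abundances["-".join(covered)] = still_copying * p_release
        still_copying *= 1.0 - p_release
    abundances["full genome"] = still_copying
    return abundances


# Accessory genes are omitted for clarity, as in the text.
for rna, fraction in nested_rna_abundance(["N", "M", "E", "S"], 0.6).items():
    print(f"{rna:12s} {fraction:.3f}")
```

Under this assumption the RNA encoding only N is the most abundant and the full-length genome the least, matching the ordering described above. The actual release probabilities are not given in the source and are unlikely to be identical at every site.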

3.3.3 Assembly and Processing

The N proteins that have been made coat the entirety of complete copies of the positive-stranded genomic

RNA (Figure 14). The newly synthesized M, E, and S proteins are inserted into the inner face of the

endoplasmic reticulum and get glycosylated. Vesicles with these three embedded proteins move from the

endoplasmic reticulum towards the Golgi body, becoming the endoplasmic reticulum Golgi intermediate

compartment. It is at this point that the nucleocapsid interacts with the viral membrane proteins and the

membrane closes around the nucleocapsid. The precise mechanism by which this occurs is not clear. M and

E proteins are essential for the formation of a closed viral membrane, and the M protein is known to have

a domain that directly interacts with the N protein. How the process avoids incorporating incomplete

positive-stranded RNAs (including sgRNA) or any of the negative-stranded intermediates produced during

replication is not known. It is possible that there is a packaging signal present at the 5' end of a complete

genome, but this has not been conclusively demonstrated for most betacoronaviruses (including

SARS-CoV-2).

The fully assembled virus now sits within a membranous vesicle that moves towards the outer cell

membrane (Figure 14). The process of targeting to the outer membrane and the subsequent release are not

well understood. The vesicle membrane merges with the outer cell membrane and the progeny virus is

released into the extracellular environment to go and infect another cell.


4 Covid-Specific Issues

4.1 Pathogenesis

In comparison to the common cold coronaviruses, which infect respiratory epithelial cells of the upper

respiratory tract (nose, throat, trachea), SARS-CoV, MERS-CoV, and SARS-CoV-2 initiate an infection in

the upper respiratory epithelium and then spread into the lower respiratory tract (lungs) to cause severe

illness. The disease is spread via respiratory droplets produced during coughing, sneezing, talking, spitting,

etc. These droplets can be encountered directly as an airborne aerosol or indirectly after settling/adhering

to objects that are touched. In the former case droplets are inhaled or contact mucous membranes, and in

the latter, after touching a contaminated object, an individual makes contact with their own face. The

infection becomes established in the upper airway, which may be associated with mild or no symptoms.

The virus may become established in the lower respiratory tract and other tissues that express the ACE2

cell-surface protein. In the lung these cells are primarily type II pneumocytes and vascular endothelium,

while the proximal tubular epithelium in the kidney, myocardium in the heart, and enterocytes in the

intestinal tract may also become infected.

Cell death from the virus and the over-production of inflammatory cytokines leads to fever and muscle

aches, while white-blood cell infiltration in the lungs leads to airway irritation and coughing. Loss of taste

and/or smell may occur. In more severe cases, fluid accumulates in the lung alveoli leading to lowered

blood O2 saturation, breathing difficulties, and pneumonia. Myocardial cell damage can lead to heart

arrhythmia, and kidney infection can lead to acute organ damage. An out-of-control inflammatory/cytokine

response can lead to shock. Peripheral blood clots have also been reported in some patients. These symptoms

may be exacerbated by pre-existing conditions such as hypertension, heart disease, diabetes, or lung disease

(including damage from smoking or vaping). Age is an important factor in the development of the disease,

with the majority of deaths occurring in the elderly and most children suffering minor symptoms. In fatal

cases, death may be from pneumonia, severe respiratory distress, or heart or kidney failure. The fatality rate

is often given as around 1–3% of clinical cases, but the true fatality rate may be substantially lower if one

takes into account asymptomatic or mild symptom infections. SARS was estimated to have a case fatality

rate of around 10–14% whereas MERS has one around 30%, so COVID-19 is clearly less pathogenic than

the previous acute respiratory syndromes.

The length of time from initial infection to the onset of symptoms (incubation period) is generally 5–6 days,

although 2–14 days can occur. Using the nasopharyngeal swab test and reverse-transcriptase PCR assay for

the presence of the viral genome, patients generally stop producing positive tests 14 days after the end of

symptoms. However, cases where positive tests persist for months after the end of symptoms have been

reported. A recent report [2] that examined viral loads in throat swabs calculated that patients were

infectious as early as 2.3 days before the onset of symptoms and viral loads peaked at 0.7 days before the

onset of symptoms. If this work is confirmed, it is possible for people to be infectious before the onset of

symptoms and this state may play a role in the rapid spread of the disease. A study [3] that isolated infectious

viruses via tissue culture found an inability to do so 8 days after the onset of symptoms and calculated

(based on their sample size) that infectious virus is shed for about 10 days after the onset of symptoms.

The issue of asymptomatic or very mild cases is important but not completely understood. A recent study

in the city of Geneva [4] using a validated serological immunoassay for antibodies against SARS-CoV-2

found that there were ten times as many seropositive individuals as those who were identified as “cases” in

the city's clinics. Another recent study in Belgium found that there were 15 times as many seropositive


individuals as those who were identified as “cases” during hospital admission [5]. These results suggest

that the rate of mild infections is substantial and may also contribute to the rapid spread of the disease. If

the approximate 10:1 ratio holds true, then there will have been substantially more transmission of

SARS-CoV-2 than the numbers reported as clinical cases. Wider use of serological testing will help clarify

this issue.
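
The effect of undetected infections on the apparent fatality rate can be sketched numerically. The sketch assumes that all deaths occur among identified clinical cases and that the seropositive-to-case ratio applies uniformly; both are simplifications, and the function name is illustrative.

```python
def infection_fatality_rate(case_fatality_rate, infections_per_case):
    """Fatality rate per infection, given the rate per identified case.

    Assumes every death is counted among the identified cases.
    """
    if infections_per_case < 1:
        raise ValueError("infections_per_case must be at least 1")
    return case_fatality_rate / infections_per_case


# Case fatality rates of 1% and 3%, with 10 or 15 infections per case.
for cfr in (0.01, 0.03):
    for ratio in (10, 15):
        ifr = infection_fatality_rate(cfr, ratio)
        print(f"CFR {cfr:.0%}, {ratio} infections per case: about {ifr:.2%}")
```

With these inputs the fatality rate per infection falls well below 1%, which is consistent with the statement that the true rate may be substantially lower than the rate calculated from clinical cases alone.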

In the natural progression of the disease, the initial host response is from the innate immune system. Viral

RNA should be recognized as foreign, leading to interferon production that stimulates cytokine production

and white blood cell activation (specifically T cells and macrophages). The T cells play a large role in

destroying infected cells and removing circulating virus. The body needs to balance this response with a

corresponding “dampening” to prevent excessive inflammation and cytokine overproduction. After this

initial antiviral response, the adaptive immune system, where antibodies are produced against foreign

proteins, will prevent recrudescence of the virus. As mentioned above, coronaviruses produce a number of

proteins that can interfere with the innate immune response. A recent study [6] suggests that

SARS-CoV-2 is particularly good at preventing the production of interferons while stimulating

inflammatory cytokine release.

4.2 Treatment and Prophylaxis

Current treatment of COVID-19 is entirely based on alleviating symptoms, and prevention relies on social

distancing and avoidance of transmission. There is, however, intense interest in, and much research into,

drugs to treat the disease and vaccines to prevent it. Antiviral drug therapy is difficult when compared to

other types of pathogen because viruses present very few drug targets owing to their organizational and

structural simplicity. In the case of SARS-CoV-2, there are the two proteases needed to cleave protein 1ab,

the RNA-dependent RNA polymerase, the proof-reading exonuclease, a helicase, and two

methyltransferases. In addition, one can interfere with the normal transiting of the virus in the host cell or

physically prevent binding of the virus to the cell-surface receptor. This last possibility is very tricky for

SARS-CoV-2 as the receptor is ACE2, which has an important role in blood-pressure regulation. It would

be necessary to find a compound that could interfere with viral S protein–ACE2 interactions without

disturbing ACE2–angiotensin interactions.

The most obvious target for an antiviral is the unique RNA-dependent RNA polymerase needed to replicate

the genome. RNA polymerases have been successfully targeted in other viruses such as influenza and Ebola

virus, where favipiravir has been approved for use. For coronaviruses, the compound remdesivir (Figure

16) has been demonstrated to inhibit the polymerase and also virus replication in vitro. Based on work done

with SARS-CoV and MERS-CoV polymerases, remdesivir is known to be incorporated into the growing

RNA chain by the polymerase, but then prevents further addition of other nucleotides and prevents genome

replication. This type of inhibition is known as chain termination. Moreover, the compound was used in

clinical trial during the 2014–16 west African Ebola virus outbreak and was found to be safe (albeit not

effective enough against Ebola virus). Remdesivir is currently undergoing clinical trial for use in

COVID-19 infections and early indications are that it does decrease both the duration of symptoms and

mortality rate [7]. The older antiviral ribavirin (Figure 16) interferes with the polymerase and also with the

formation of the 5'-cap on the genomic and sgRNAs. Ribavirin has in vitro antiviral activity against

SARS-CoV-2 and is in clinical trial, particularly in combination with protease inhibitors. However, past

clinical experience with ribavirin has shown that the drug is poorly tolerated by many patients.

Viral protease inhibition is a mainstay of anti-HIV therapy. The coronavirus polyprotein proteases are

essential for formation of the replicase-transcriptase complex and viral replication cannot occur without

cleavage of the polyprotein. As such, the proteases are attractive targets for drug therapy. The combination


of lopinavir/ritonavir (Figure 16) can inhibit viral replication in vitro and is undergoing clinical trial on

its own and in combination with other types of antiviral. Early reports [8] seem to indicate that use of

lopinavir/ritonavir did not noticeably shorten clinical progression or reduce the fatality rate.

The nsp13 helicase is not as well studied as an antiviral target, but work has been done with the SARS-CoV

enzyme. Compounds were discovered that inhibited helicase activity and also inhibited viral proliferation in

vitro. These compounds have yet to be further tested in vivo. There are a number of FDA approved

compounds that target various helicases, but these have not been screened against SARS-CoV-2. The nsp14

methyltransferase from SARS-CoV has been similarly studied and compounds that inhibit enzyme activity

in vitro have been discovered.

Compounds that exert antiviral effect by interfering with the normal transport of the virus into, within, or

out of the host cell have received the most public attention, albeit not always in the most positive manner.

In 2004 it was shown that chloroquine (Figure 16) could prevent viral proliferation of SARS-CoV in vitro

and it was subsequently found to prevent the course of disease in mouse models of SARS.

Hydroxychloroquine is a closely related analogue that is better tolerated in people. As these two antimalarial

compounds are known to alter the pH within certain membranous cell compartments, it was thought that

the antiviral activity was due to this pH change preventing normal transiting of the virus in vesicles within

the infected cell. Regardless of the lack of clarity on antiviral mechanism of action, the compounds have

been examined in numerous clinical trials. Unfortunately, the risk of potentially lethal cardiac side-effects

was found to be too great and the amount of clinical improvement too little to warrant further use [9].

Hydroxychloroquine was also found to have little effect as a prophylactic in preventing the development

of COVID-19 after exposure [10]. The antihelminthic drug ivermectin has similarly been shown to exert

antiviral activity in vitro and is currently being examined in clinical trial. Early indications have been

promising, with ivermectin treatment associated with a decrease in the fatality rate [11].

One issue to keep in mind for the development of antiviral drugs is that, to date, the greatest successes have

come in the treatment of chronic viral infections such as cold sores or HIV. It appears to be quite

difficult to create an effective antiviral drug for acute viral infections that last a short period of time (two

weeks for SARS-CoV-2). By the time people know they are sick, there is often a substantial viral load and

a short time to natural resolution. As an illustration, one can see the process with Tamiflu (oseltamivir for

influenza). This was an effective antiviral in vitro and in animal models. However, in clinical use, the drug

had to be taken as soon as symptoms were noticeable and then only led to a 50% chance of shortening the

length of symptoms. Similarly, favipiravir performed very well in vitro against Ebola virus, but only gave

a marginal improvement in the course of infection during clinical trial in the 2014–16 west African

outbreak.

When compared to antivirals, vaccines for the prevention of acute viral diseases have a much more

successful track record. There are no existing vaccines for any human coronaviruses, as the common-cold

coronaviruses are considered a minor nuisance, SARS disappeared, and MERS has occurred in very low

numbers. In terms of commercial veterinary vaccines, there is a whole killed virus vaccine for canine enteric

coronavirus (an alphacoronavirus), an attenuated live virus vaccine for feline infectious peritonitis virus (an

alphacoronavirus), an attenuated live virus vaccine for bovine coronavirus (a betacoronavirus), a whole

killed virus vaccine for porcine epidemic diarrhea virus (an alphacoronavirus), and an attenuated live virus

vaccine for avian infectious bronchitis virus (a gammacoronavirus). So, there is no reason, beyond safety,

that a vaccine is not possible for SARS-CoV-2.

The entire basis for vaccine efficacy against coronaviruses is the antibody response against the S protein of

the virus. Natural infection with the common cold coronaviruses leads to a robust immune response [12],


albeit of fairly short duration (antibodies are detected one week post-infection, peak two weeks after

infection, and are insufficient to prevent reinfection by one year). This is why areas can be re-infected with

the common cold coronavirus on a regular basis. Studies on survivors of SARS showed that a longer

immune response was obtained with neutralizing antibodies against the viral S protein: antibodies appear

10–15 days post-infection, peak around four months and are still detectable in 84% of patients at 36 months

[13, 14]. About 10% of patients were antibody-negative between 16–24 months post-infection. It is not

clear if the antibody levels at 36 months would be sufficient to prevent reinfection. Very recent work with

survivors of COVID-19 [15] shows that antivirus antibodies peak shortly after the end of infection (day 17)

and remain at that level out to day 49. The ultimate length of protection is yet unknown. A study in Chinese

COVID-19 patients has found that asymptomatic cases yield a lower immune response that is shorter-lived

[16]. Approximately 40% of these asymptomatic cases were antibody-negative by eight weeks

post-infection. Therefore, it is possible that immunity from natural infection may be limited in duration. It is also

possible that vaccine formulations can be found that provide a longer-lived immune response, or it may be

necessary to be re-vaccinated on a regular basis. Individuals that work with anthrax are used to being

boosted on a yearly basis and many people are happy receiving influenza shots yearly. Therefore, a short-

term vaccine should still be more than suitable although many may complain. Once a vaccine platform has

been approved and used successfully for SARS-CoV-2, then production of the vaccine for the next

unknown severe coronavirus will be possible on a much shorter time-frame.

There are a large number of vaccine candidates undergoing clinical trial. The most advanced utilizes a

chimpanzee adenovirus modified to express the SARS-CoV-2 S protein [17]. This platform was previously

used to create a test vaccine for MERS-CoV that has just recently started clinical trials and had some

preliminary human safety data, which allowed it to move quickly forward for SARS-CoV-2. Very recent

early results on this vaccine are promising [17], but multi-dose administration (prime-boost) is already

being examined to improve the immune response [18]. In addition, there are a number of competing

adenovirus-based vaccines, vesicular-stomatitis virus modified to carry the SARS-CoV-2 S protein (this

platform was also used to create the Ebola vaccine), virus-like particles (see above) with the S protein,

injected mRNA encoding the S protein, DNA encoding the S protein, killed whole virus vaccines, and

attenuated live vaccines. In addition, work is being done on testing additives (adjuvants), which can boost

the strength and duration of the immune response.

4.3 Detection

Infection by SARS-CoV-2 is currently determined by a molecular assay that detects the presence of

coronavirus RNA, known as real-time reverse transcriptase polymerase chain reaction (rtPCR; Figure 17).

Test subjects are swabbed using a deep nasopharyngeal swab that is sent to a properly equipped laboratory.

The total RNA (both human and coronavirus) is isolated from the sample using any of a number of

commercial RNA extraction kits. The RNA is treated with an enzyme called reverse transcriptase, which

copies the RNA to complementary DNA strands. This DNA is then subjected to real-time PCR where specific

nucleotide primers that target coronavirus genomic sequence are added. Control primers that target a

specific human sequence are also included to confirm that RNA was properly extracted and added to the

test reaction. Repetitive cycles of DNA amplification then occur, which also incorporate a fluorescent dye.

Eventually, enough copies of the targets are produced to allow fluorescent detection in the test device. The

number of amplification cycles that it takes for fluorescence to become detectable is directly related to the

number of initial copies of the target in the reaction mix.
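
The relationship between starting copy number and the number of cycles needed for detection can be sketched as follows. The sketch assumes idealized amplification in which the target grows by a constant factor every cycle; the detection threshold of 1e10 copies is a hypothetical value chosen for illustration, not a property of any particular instrument.

```python
import math


def cycles_to_threshold(initial_copies, threshold_copies, efficiency=1.0):
    """Cycles needed for the target to reach the detection threshold.

    efficiency=1.0 means perfect doubling each cycle (an idealization);
    each cycle multiplies the copy number by (1 + efficiency).
    """
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    if initial_copies >= threshold_copies:
        return 0.0
    growth_needed = threshold_copies / initial_copies
    return math.log(growth_needed) / math.log(1.0 + efficiency)


THRESHOLD = 1e10  # hypothetical detection threshold, in copies

for copies in (10, 100, 1000, 10000):
    cycles = cycles_to_threshold(copies, THRESHOLD)
    print(f"{copies:>6} starting copies: about {cycles:.1f} cycles")
```

Under perfect doubling, each ten-fold increase in starting copies lowers the cycle count by about 3.3 cycles, so a sample with more viral RNA becomes detectable earlier.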

At present, there is no single set of agreed upon target sequences in this type of assay. Many countries or

individual health jurisdictions may use different target sequences. The human control target is usually the

RNase P gene. Two or three different SARS-CoV-2 targets are normally used in an individual test to


decrease the likelihood of a false positive or negative. The USA CDC test targets two sequences in the N

gene, while the French Pasteur Institute test targets sequences in the 1b and E genes. One commercial test

mixture, for example, targets 1ab, S, and N genes. In general, the tests are very accurate and specific (in

excess of 98%) when tested on mock samples and are highly sensitive (detecting as few as 10 gene copies

per reaction). The main places where issues arise are improper swabbing, storage, and transport problems,

or nucleic acid extraction problems.

One key issue with the rtPCR test is that it does not assay virus viability. The amplified regions in the

test are quite small (50–100 bases), so fragmented viral RNA can still give a positive reaction. It is thought

that this may be the explanation for why some patients keep producing positive tests long after the end of

COVID-19 symptoms. Fragmented viral RNA may be persisting in their bodies at very low levels.

Unfortunately, with PCR, the longer a target sequence being amplified, the harder it is to reproducibly

amplify it and clinical testing requires very high reproducibility.

Another issue with rtPCR testing is that it is unable to detect people that have been previously infected but

are now cured. This type of testing is important in terms of tracking the true spread of any disease and also

in determining the degree and length of immunity that might result. Normally, post-infection monitoring is

via serological testing, where one assays the levels in the blood of antibodies specific against the virus.

Since COVID-19 is a new disease, such tests need to be created and then studied for their accuracy,

specificity, and sensitivity. A large number of tests are currently undergoing this process and a few have

been approved by regulatory bodies.

In Canada, to date, the only serological tests that have been approved for use are laboratory-based

automated chemiluminescence assays on proprietary platforms. In these systems, magnetic beads are coated

with SARS-CoV-2 S protein and added to a blood sample (Figure 18). Antibodies in the blood that bind

the S protein will attach to the beads, which are then held magnetically and washed. An anti-human IgG or

anti-human IgM antibody, which has been conjugated with a luminescent reagent, is then added and binds

to any SARS-CoV-2 antibodies on the beads. The beads are then washed again and the remaining

luminescence reagents added. Light is produced in direct proportion to the number of antibodies on the

beads and is quantified by the machine. The antibody titres can then be quantitatively determined.

The most useful generic serological test is the enzyme-linked immunosorbent assay (ELISA), which requires

laboratory capabilities but yields quantifiable results. Some ELISA tests have been approved for clinical

application in other parts of the world, but are not yet in widespread use. In an ELISA (Figure 19), a test

plate is coated with the SARS-CoV-2 S protein (or the portion of it where antibodies normally bind), blood

samples are added and then washed away. Antibodies in the blood that bind the S protein will stay attached

and then a secondary antibody is added that binds to human antibodies. This secondary antibody carries a

tag that can produce colour or fluorescence allowing detection, and the intensity of the detected signal is

directly proportional to the amount of specific anti-S antibody present in the blood sample. Therefore,

antibody titres can then be quantitatively determined. Eventually, with enough research data, we will know

how high the antibody titres need to be in order to protect against reinfection.
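
Because the signal is described as proportional to the amount of antibody, quantification can be sketched as a straight-line calibration through the origin. The standards and readings here are invented numbers in arbitrary units, and the function names are illustrative; real assays typically use a measured standard curve that is not strictly linear over its whole range.

```python
def fit_proportional(concentrations, signals):
    """Least-squares slope for signal = slope * concentration."""
    numerator = sum(c * s for c, s in zip(concentrations, signals))
    denominator = sum(c * c for c in concentrations)
    if denominator == 0:
        raise ValueError("need at least one non-zero standard")
    return numerator / denominator


def estimate_concentration(signal, slope):
    """Invert the calibration line to estimate antibody concentration."""
    return signal / slope


# Hypothetical calibration standards (arbitrary units).
standard_concentrations = [1.0, 2.0, 4.0, 8.0]
standard_signals = [0.1, 0.2, 0.4, 0.8]

slope = fit_proportional(standard_concentrations, standard_signals)
sample_estimate = estimate_concentration(0.5, slope)
print(f"estimated concentration: {sample_estimate:.2f} units")
```

A reading that falls outside the range of the standards should not be extrapolated with this line, since proportionality is only assumed within the calibrated range.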

A simpler serological test system is the lateral flow immunochromatography strip (Figure 20) where the

blood sample is added to a small wick that has immobilized S protein at a target spot. As the liquid flows

by, the antibodies in the blood will bind the S protein and also become immobilized. A secondary antibody

that carries a colloidal gold tag can then bind to the first antibody giving a visible coloured band. This type

of test does not require any special laboratory capability, but questions have been raised as to whether the

accuracy and sensitivity are sufficient. It also does not quantify the level of antibodies present and provides only

a yes/no answer. Because the N protein is produced at high levels during infection, antibodies are also produced against

it, and several test strip manufacturers have opted to use this protein in their product. Antibodies

against N are not protective, so their presence may be diagnostic but not indicative of immunity. Also, the

N protein is more highly conserved than the S protein, increasing the possibility that infection with other

coronaviruses could lead to a false positive reaction.

4.4 Relationships To Other Coronaviruses

One of the key questions relating to SARS-CoV-2 is how the virus arose and entered into humans. It is

known that a large number of betacoronaviruses infect bats and many such viruses have been isolated from

wild bat populations in China. In the case of SARS and MERS, very closely related viruses have been

isolated from civet cats and camels, respectively, suggesting that those viruses spread from bats to

civets/camels to humans. In the case of SARS-CoV-2 such a closely related virus has not yet been

discovered. The closest virus, at 96.0% sequence identity, is the bat coronavirus RaTG13, which was

discovered in the intermediate horseshoe bat in Yunnan, China, in 2013. More recently, several slightly less

closely related viruses have been discovered in Malayan pangolins trafficked into China [19]. Of these viruses,

the closest is M789, which is 89.1% identical. It has been suggested that SARS-CoV-2 migrated from bats

to pangolins and then into people. While this might be true, no closer intermediate to SARS-CoV-2 has

been isolated from pangolins to date.

In examining the relationship between the betacoronaviruses based on the genomic sequence (Figure 21),

it can be seen that SARS-CoV-2 is more closely related to RaTG13 and pangolin coronaviruses than to

SARS-CoV (79.4% identical) and is even more distantly related to MERS-CoV (55.4% identical) or the

common cold coronaviruses OC43 (54.2% identical) and HKU1 (54.8% identical). It is clear that

SARS-CoV-2 is not a direct derivative of either SARS-CoV or MERS-CoV but represents a unique

emergence. A paper [20] has made the case that the common cold coronavirus OC43 emerged from bovine

coronavirus in the late 1800s and may have initially been associated with a severe disease event. It is very

likely that more novel coronaviruses will cross into the human population in the future. Based on the limited

data from SARS, MERS, and COVID-19, the interval before the next unique human coronavirus emerges may be

roughly 10 years.
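The percent-identity figures quoted in this section come from comparing aligned genome sequences position by position. A minimal sketch of that calculation is given below. The two short fragments are invented for illustration and are not real viral sequence, and skipping gapped columns is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns in which both sequences carry the same base.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical).
seq_x = "ATGGCTTACG-TAGC"
seq_y = "ATGGATTACGATAGT"
print(f"{percent_identity(seq_x, seq_y):.1f}% identical")
```

For whole genomes of about 30 kb the alignment itself is the hard part and is normally produced by dedicated alignment software before a calculation like this is applied.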

Recombination is a biological process where two genomes (or portions of a genome) that have stretches of

sequence in common exchange intervening sequence. In the case of coronaviruses, this exchange can

happen in a host that is infected with two different coronaviruses at the same time. As noted above,

coronaviruses have portions of the genome that are highly conserved and regions that are much more

variable. A recent publication [21] has suggested that SARS-CoV-2 arose by recombination

between bat and pangolin coronaviruses. The bulk of the SARS-CoV-2 genome is derived from bat viruses, but

two specific regions (including a part of the S gene important for binding to ACE2) are more closely related

to pangolin viruses. Reference texts state that recombination is an important driver of coronavirus evolution.
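One simple way to look for the pattern described above is to slide a window along the aligned genomes and ask, window by window, which candidate parent the query most resembles. The sketch below does this on toy sequences. The sequences, the names `bat_like` and `pangolin_like`, and the window size are all hypothetical, and published recombination analyses use more rigorous statistical methods.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_parent_per_window(query, parents, window, step):
    """Return (window_start, best_parent_name) for each window along the query.

    All sequences are assumed to be pre-aligned and of equal length.
    """
    calls = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        segment = query[start:end]
        best = max(
            parents,
            key=lambda name: identity(segment, parents[name][start:end]),
        )
        calls.append((start, best))
    return calls


# Toy data: the query matches one parent in its first half and the other
# parent in its second half, as a recombinant would.
parents = {
    "bat_like": "ACGTACGTACGTACGT",
    "pangolin_like": "ACCTAGGTTCGAACTT",
}
query = "ACGTACGTTCGAACTT"

calls = closest_parent_per_window(query, parents, window=8, step=8)
print(calls)
```

A switch in the closest parent part-way along the genome is the signature that suggests recombination, although it can also arise from uneven rates of change in different regions.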

One specific piece of sequence that does not fit neatly into this account is the presence of a 12-nucleotide

insertion (CCTCGGCGGGCA) at the junction of the S1 and S2 domains in the S protein. This insertion

yields the additional amino acids PRRA in the S protein. Together with the next two amino acids (RS), the

sequence PRRARS results. This sequence is a furin cleavage site, allowing the S protein to be cleaved at

the S1-S2 junction by the host cell. None of the closest relatives to SARS-CoV-2 discovered so far have

this furin site (Figure 21). In examining the corresponding sequence in Figure 21, it would appear that the

furin site has arisen or been eliminated multiple times in the evolution of betacoronaviruses. This stretch of

the S protein appears to be highly tolerant of sequence modification and may be one of the central drivers

of viral host range. The presence of the furin site in SARS-CoV-2 but not in bat RaTG13 or pangolin M789 has fueled some

undue speculation that SARS-CoV-2 arose due to deliberate manipulation. However, as noted, this stretch

of sequence is highly variable and it is likely that the full range of betacoronaviruses has simply not yet been

discovered. A recent paper [22] describing a different, closely related bat betacoronavirus found a different

motif altogether at this site in the S protein.
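The translation of the 12-nucleotide insertion can be checked directly with the standard genetic code. The sketch below reads the insertion as four codons, appends the next two amino acids given in the text (RS), and tests for the minimal furin motif R-X-X-R. The motif pattern is an assumption made for this illustration; furin recognition in the cell depends on more than this four-residue pattern.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides):
    """Translate a nucleotide string codon by codon ('*' marks a stop)."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


insertion = "CCTCGGCGGGCA"          # 12-nucleotide insertion quoted in the text
inserted_aa = translate(insertion)  # CCT CGG CGG GCA
site = inserted_aa + "RS"           # next two amino acids, as given in the text

# Minimal furin motif R-X-X-R (assumed pattern for this sketch).
has_motif = re.search(r"R..R", site) is not None
print(inserted_aa, site, has_motif)
```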

4.5 Mutation

With regard to the ongoing COVID-19 pandemic, one important question is whether mutant viruses are

arising and whether this makes them more or less dangerous. The mutation rate for coronaviruses is

estimated at 2 × 10⁻⁶ to 9 × 10⁻⁷ per site per replication (compared with 10⁻³ to 10⁻⁵ for most RNA viruses and

10⁻⁹ to 10⁻¹¹ for humans) [23]. Given the large number of viral progeny generated during an infection it

follows that mutant viruses are always being created. In most cases the alteration is redundant (meaning it

leads to no change in amino acid sequence of the resulting protein) or conservative (meaning that there is

an amino acid change but it is unlikely to make any functional difference). In other cases, the alteration is

sufficiently deleterious to the virus (such as loss of function of a critical protein) that it is an immediate

dead end. Finally, in some cases, there is a change that the virus tolerates that also has some functional

effect. During the COVID-19 pandemic many virus isolates have been fully sequenced, providing the best

opportunity to date to follow virus alteration in close to real time. The initial sequenced virus from

Wuhan, China and the main one circulating in Europe have some differences. In the United States, the

Chinese sequence entered the west coast and the European one entered the east coast. The east coast of the

USA has experienced more rapid spread and higher mortality than the west coast, leading to speculation

that sequence differences may be important.
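To give a sense of scale for the mutation rate quoted earlier in this section, the arithmetic below multiplies the per-site rate by the genome length of 29.9 kb shown in Figure 8. It is a back-of-envelope sketch only: it assumes every site is equally likely to change and treats each new genome copy as one replication event.

```python
GENOME_LENGTH_NT = 29_900   # SARS-CoV-2 genome length (29.9 kb, Figure 8)
RATE_LOW = 9e-7             # lower estimate, per site per replication [23]
RATE_HIGH = 2e-6            # upper estimate, per site per replication [23]


def expected_mutations_per_copy(rate_per_site, genome_length_nt):
    """Expected number of new mutations in one freshly copied genome."""
    return rate_per_site * genome_length_nt


for label, rate in (("low", RATE_LOW), ("high", RATE_HIGH)):
    per_copy = expected_mutations_per_copy(rate, GENOME_LENGTH_NT)
    print(f"{label} estimate: {per_copy:.3f} mutations per genome copy, "
          f"about one mutation every {1 / per_copy:.0f} copies")
```

Under these assumptions roughly one genome copy in every 17 to 37 carries a new mutation, which is why mutant viruses are always present during an infection even though the per-site rate is low.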

In examining the key differences in the two viral populations, one change in particular has drawn attention:

D614G in the S protein [24]. This means that position 614 of the S protein has been changed from an

aspartate to a glycine. This position is thought to have a role in the binding of the S protein to ACE2 and it

has been suggested that the selection and spread of this particular mutation is due to an increased

transmissibility of the mutant virus. Others have published that this mutation is not associated with a change

in transmissibility [25]. More recent studies have shown that this mutation appears to be associated with

increased cell infectivity in tissue culture [26] and higher virus titres in clinical patients [27].
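The shorthand used for D614G (original amino acid, position, new amino acid) is the usual way of naming amino acid substitutions, and it can be unpacked mechanically. The helper below is a small illustrative sketch; the function names are hypothetical and it handles only simple single-residue substitutions.

```python
import re

AMINO_ACID_NAMES = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def parse_substitution(label):
    """Split a label such as 'D614G' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def describe(label):
    """Spell out a substitution label in words."""
    original, position, replacement = parse_substitution(label)
    return (f"position {position}: {AMINO_ACID_NAMES[original]} "
            f"to {AMINO_ACID_NAMES[replacement]}")


print(describe("D614G"))
```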

The virus will continue to mutate at a natural rate and variants will continue to emerge. It is possible that

prolonged continuous transmission may ultimately yield a lower pathogenicity virus (essentially a new

common cold coronavirus) but this outcome is not guaranteed or even, necessarily, likely. To date, there

has been no regularly circulating human coronavirus that causes severe disease. SARS-CoV disappeared

and MERS-CoV has a very low incidence rate. Whether SARS-CoV-2 follows either of these examples is

yet to be seen.


Figure 1: The structure of DNA. (A) The four nucleoside bases (shown as monophosphates). (B) How the

nucleosides are attached in a strand and how two complementary strands associate via hydrogen bonds.

(C) The secondary, helical structure of DNA (public domain image).

[Figure 1 labels: deoxyadenosine monophosphate (A), deoxythymidine monophosphate (T), deoxyguanosine monophosphate (G), deoxycytidine monophosphate (C); panels A, B, and C; strand ends 5’ and 3’.]


Figure 2: The structure of RNA. (A) The four nucleoside bases (shown as monophosphates). (B) How the

nucleosides attach to form a single strand and an example of an RNA secondary structure

(public domain image).

[Figure 2 labels: adenosine monophosphate (A), uridine monophosphate (U), guanosine monophosphate (G), cytidine monophosphate (C); panels A and B; strand ends 5’ and 3’.]


Figure 3: The structure of proteins. (A) The individual amino acids that make up proteins. (B) An

example of a polypeptide, with the peptide bonds shown in red. (C) The levels of protein structure. A

polypeptide can fold via hydrogen bonds to give two main types of secondary structure. Mixtures of

secondary structure give rise to tertiary structure.

[Figure 3 labels: panels A, B, and C; example polypeptide Lys-Gly-Asp-Glu-Glu-Ser-Leu-Ala (KGDEESLA); primary structure; secondary structure (beta sheet, alpha helix); tertiary structure.]


Figure 4: The structure of cell membranes. A phospholipid bilayer is shown with two proteins. Outside of

the cell is the top and inside the cell is the bottom of the bilayer in this example. Below is shown the

chemical structure of the most common phospholipid found in cell membranes.

[Figure 4 labels: phospholipid (phosphatidylcholine), hydrophilic head, hydrophobic tail, carbohydrate, proteins, cholesterol, phospholipid bilayer.]


Figure 5: DNA to RNA to proteins in mammalian cells. The descriptions go from the top to the bottom.

(1) A DNA strand. (2) The DNA partially unwinds, revealing transcriptional start (green) and stop (red)

sites. (3) RNA polymerase (brown oval) binds to the transcriptional start site on the antisense strand of the

DNA. (4) RNA starts to be synthesized (blue). (5) The emerging RNA strand gets modified by the addition

of a 5’-cap (blue circle). (6) The RNA polymerase hits the transcriptional stop site. (7) The polymerase

comes off the DNA strand and the full length RNA is released. (8) The RNA is modified by the addition of

a 3’ poly-A tail. (9) The mRNA has translational start (green) and stop (red) sites, where the start site can

be blocked by RNA secondary structure formation. (10) The ribosome (yellow ovals) binds to the

translational start site. (11) The ribosome starts to assemble amino acids into protein (purple). (12) The

ribosome moves down the mRNA extending the protein. (13) The ribosome hits the translational stop site.

(14) The ribosome comes off the mRNA and the full length protein is released. The protein goes on to get

folded and the mRNA gets degraded.

[Figure 5 labels: steps (1) to (14); poly-A tails (AAAAA); RNA degradation; protein folding.]


Figure 6: The Coronaviridae. Shown are the type strains for each of the four genera of coronavirus

plus all of the human infective members.

[Figure 6 labels: Bulbul Virus HKU11; MERS-CoV; SARS-CoV; SARS-CoV-2; Common Cold HKU1; Mouse Hepatitis Virus; Common Cold OC43; Avian Infectious Bronchitis Virus; Pig Infectious Enteritis Virus; Common Cold NL63; Common Cold 229E; groups δ, γ, β, α.]


Figure 7: The structure of a coronavirus. (A) A computer model of a coronavirus (public domain image).

(B) A cross-sectional view of a coronavirus.

[Figure 7 labels: panels A and B; S, M, E, membrane, N + genomic RNA.]


Figure 8: The coronavirus genome. Shown are the full length and final 30% of the genomes for the three

coronaviruses causing severe acute respiratory syndrome in humans. The genes encoded are labelled.

[Figure 8 labels: full-length genomes on a 0 to 30 kb scale for SARS-CoV-2 (29.9 kb), SARS-CoV (29.8 kb), and MERS-CoV (30.1 kb), each with a 5’ MeG-PPP cap, genes 1a, 1b, and S, and a 3’ poly-A tail (AAAAA); expanded 20 to 30 kb region for the same three viruses, with gene labels as extracted: “S 3a E M 6 7a 7b 8 N 10”; “S 3a E M N 3b 6 7a 7b 8a 8b 9b”; “S 3a E M N 4a 4b 5 8b 9”.]

Figure 9: The non-structural proteins 1a and 1ab. The organization of the two polyproteins is shown with

nsp 1-16 labelled. The open triangles are cut sites for nsp 3 while the closed triangles are cut sites for

nsp 5. The functions of the resulting nsp’s are described in Section 3.2.

[Figure 9 labels: polyprotein 1a with nsp 1 to 11; polyprotein 1ab with nsp 1 to 10 and 12 to 16.]


Figure 10: S protein. (Top) The organization of the S protein. The open triangle is the S1–S2 furin site

while the closed triangle is the S2’ furin site. (Middle left) How the S protein trimer sits

in the viral membrane. (Right) Side and top-down view of the actual structure of the S protein ectodomain

(portion that sits outside of the viral membrane) as determined by X-ray crystallography

(image from NCBI structural database, public domain).

[Figure 10 labels: S1, S2; signal peptide, receptor binding domain, fusion peptide, transmembrane domain, endodomain (M interacting); S1, S2, membrane; S1, S2 ectodomain, membrane.]


Figure 11: M protein. (Top) The organization of the M protein. (Bottom) How the protein might fit into

the viral membrane. The physical structure of the M protein has not been determined to date.

Figure 12: E protein. (Top) The organization of the E protein. (Lower left) How the protein fits into the

viral membrane. The view is a cross-section and the E inserts as a pentamer. (Lower right) The actual

structure of most of the E pentamer shown from the side and looking down at the top (where a pore is

visible in the middle). The structure was determined by NMR spectroscopy (image from NCBI structural

database, public domain).

[Figure 11 labels: transmembrane domains, ectodomain, endodomain, N interacting, S interacting, membrane.]

[Figure 12 labels: ectodomain, endodomain, transmembrane domain, membrane.]


Figure 13: N protein. (Top) Organization of the N protein. (Middle left) A view of an N dimer interacting

with RNA strands. (Middle right) A view of stacked N dimers from the side and from the top looking

down. The RNA strand is under the NTD portion (lighter colour). (Bottom) The actual structure of the N

protein NTD (left) and a CTD dimer (right) as determined by X-ray crystallography (image from NCBI

structural database, public domain). The protein regions that join NTD and CTD have no stable

secondary structure and do not crystallize.

[Figure 13 labels: N-terminal RNA binding domain (NTD), C-terminal RNA binding domain (CTD), M interacting, genomic RNA strand.]


Figure 14: The coronavirus life cycle. The host cell membranes are shown in light blue. The virus binds to

ACE2 (green) and is processed in one of two ways. On the cell surface, the S protein may be cleaved by the

host enzyme furin (black), which brings the viral membrane into contact with the host cell membrane and

leads to membrane fusion. Alternatively, the bound virus may be endocytosed into an intracellular vesicle

where furin cleaves the S protein and membrane fusion occurs. In both cases, the nucleocapsid-coated

genomic RNA (brown) enters the host cell cytoplasm. The nucleocapsid protein comes off, leaving the naked

genomic RNA (dark blue). Ribosomes (light brown ovals) bind the genomic RNA and produce the polyproteins

1a and 1ab (light green). The polyproteins are cleaved into nsp 1–16. The nsp 3, 4, 6 proteins embed into the

host cell endoplasmic reticulum and cause membrane remodelling to give rise to double-membrane vesicles.

The nsp 8–16 proteins attach to these vesicles and the genomic RNA, and then produce copies of the genomic

RNA (dark blue) and also sub-genomic RNAs (red). The sub-genomic RNAs act as templates for the

production of S, E, M, N, and accessory proteins. The N protein coats the full-length genomic RNA copies,

while the S, E, and M proteins embed into the inner face of the endoplasmic reticulum. As this portion of the

endoplasmic reticulum moves towards the Golgi body, the nucleocapsid-coated genomic RNA interacts with

the embedded S and M proteins. The membrane pinches off and the complete progeny virus is now within a

cell vesicle. This vesicle fuses with the outer cell membrane and the virus is released to infect a new cell.

[Figure 14 labels: 1a, 1ab; nsp1 (inhibit host cell protein synthesis); nsp3, nsp4, nsp6; ER; nsp8 to nsp16; ERGIC; poly-A tails (AAAAA).]


Figure 15: Ribosomal slipping between 1a and 1b. (A) The production of polyprotein 1a, which occurs about

75% of the time. The ribosome, making the 1a protein, approaches a semi-stable RNA hairpin (known as

a knot) and causes the knot to resolve into a linear RNA stretch. The ribosome continues until it hits a

translational stop site (UAA, red) and then falls off, leaving protein 1a. (B) The production of polyprotein

1ab, which occurs about 25% of the time. The ribosome, making protein 1a, approaches the RNA knot,

which does not immediately resolve. The ribosome steps back 1 nucleotide in the “slippery sequence”

(blue) and is now in a new frame. The RNA knot resolves into a linear RNA stretch and the ribosome

continues on, but now in the 1b frame. In this frame it does not align with the stop site and continues

onward making the full-length 1ab.

[Figure 15 labels: RNA sequence ...UUUAAACGGGUUUGCGGUGUAAG... with the label GAAUG. (A) Growing peptide ...DAQSFL, ...DAQSFLN, ...DAQSFLNG, ...DAQSFLNGFAV; “X only makes 1a”. (B) Growing peptide ...DAQSFL, ...DAQSFLN, “step back 1 base”, ...DAQSFLNR, ...DAQSFLNRVCGV, ...DAQSFLNRVCGVS...; “1b keeps going to complete 1ab by adding 3045 more amino acids”.]
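The two outcomes in Figure 15 can be reproduced from the sequence shown in the figure. The sketch below translates the stretch UUUAAACGGGUUUGCGGUGUAAG in the 1a frame, and then again with a one-nucleotide step back after the second codon. The function name and the choice of starting offset are assumptions made for this illustration. Only the sequence shown in the figure is used, so the shifted reading stops when the fragment runs out rather than continuing through 1b.

```python
# Standard genetic code in RNA letters, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_with_slip(rna, start, slip_after_codons=None):
    """Translate from `start` until a stop codon or the end of the fragment.

    If `slip_after_codons` is given, the reading position steps back one
    nucleotide after that many codons have been read (a -1 frameshift).
    """
    peptide = []
    pos = start
    codons_read = 0
    while pos + 3 <= len(rna):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
        pos += 3
        codons_read += 1
        if slip_after_codons is not None and codons_read == slip_after_codons:
            pos -= 1
    return "".join(peptide)


fragment = "UUUAAACGGGUUUGCGGUGUAAG"  # sequence shown in Figure 15

# 1a frame: codons UUA AAC GGG UUU GCG GUG, then the UAA stop.
no_slip = translate_with_slip(fragment, start=1)

# 1ab frame: read UUA AAC, step back one base, then CGG GUU UGC GGU GUA.
with_slip = translate_with_slip(fragment, start=1, slip_after_codons=2)

print(no_slip, with_slip)
```

The two results, LNGFAV and LNRVCGV, match the ends of the peptides labelled in the figure (...DAQSFLNGFAV and ...DAQSFLNRVCGV).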


Figure 16: Structures of antiviral compounds. The drugs shown here have all been used in clinical trials

for treatment of COVID-19. Some of them have been found to be ineffective.

[Figure 16 labels: remdesivir, favipiravir, ribavirin, lopinavir, ritonavir, chloroquine, hydroxychloroquine, ivermectin.]


Figure 17: The reverse transcriptase real-time PCR assay for SARS-CoV-2. Total RNA is isolated from a

test swab and converted to DNA using the enzyme reverse transcriptase. This DNA is used in a real-time

PCR reaction. The DNA is heated to separate the strands, and then specific primers are allowed to bind.

The middle primer, the probe, has a fluorescent compound and a quenching compound attached. When the

two are close together spatially, the quencher prevents the fluorophore from being fluorescent. As the

PCR reaction proceeds, the extending polymerase digests the probe primer and releases the fluorophore

and quencher. No longer in close proximity to the quencher, the fluorophore fluoresces. The PCR

reaction occurs repetitively and the number of products increases exponentially. On the bottom right is a

graph of how this reaction would appear, with a positive reaction in green and a negative in red. The

more SARS-CoV-2 RNA present, the sooner the green curve rises.

[Figure 17 labels: nasopharyngeal swab; total RNA purification kit; RNA molecules; reverse transcriptase enzyme; DNA copies of the RNA; real-time polymerase chain reaction; denature 95 °C; anneal primers 58 °C; polymerase extension 58 °C; legend: fluorescent tag, quencher (prevents fluorescence when near); polymerase digests internal probe, frees tag from the quencher, fluorescence occurs; graph axes: PCR cycle number and fluorescence.]
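The statement in the Figure 17 caption that more starting RNA makes the curve rise sooner follows from the exponential nature of PCR. The sketch below counts the cycles needed for the product to cross a detection threshold. The threshold, the starting copy numbers, and the assumption of perfect doubling in every cycle are illustrative only; real assays are calibrated against standards.

```python
def threshold_cycle(starting_copies, threshold_copies, efficiency=1.0, max_cycles=45):
    """First cycle at which the product count reaches the threshold.

    Each cycle multiplies the product by (1 + efficiency); an efficiency of
    1.0 means perfect doubling. Returns None if the threshold is never
    reached within max_cycles (a negative result).
    """
    copies = float(starting_copies)
    for cycle in range(1, max_cycles + 1):
        copies *= 1.0 + efficiency
        if copies >= threshold_copies:
            return cycle
    return None


THRESHOLD = 1e9  # hypothetical detection threshold, in product copies

for start in (1000, 10, 0):
    print(f"{start} starting copies: threshold cycle {threshold_cycle(start, THRESHOLD)}")
```

With these numbers a sample starting with 1000 copies crosses the threshold at cycle 20, one starting with 10 copies crosses at cycle 27, and a sample with no target never crosses, which corresponds to the flat negative curve in the figure.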


Figure 18: Chemiluminescence assay for anti-SARS-CoV-2 antibodies. A number of proprietary

platforms use this approach to measure specific antibody levels in blood samples. The blood and special

magnetic beads are mixed in a tube. The beads are coated with the SARS-CoV-2 S protein. Antibodies in

the blood that bind the S protein (red) will attach, while those that are specific for other proteins (green,

blue) will not. The beads are then magnetically held in place while unbound material is washed away.

Animal antibodies that bind to human antibodies (for example mouse anti-human IgG) are then added.

These antibodies are tagged with a luminol derivative. The beads are again washed and then a second

chemical reagent is added, causing a reaction that produces visible light. The amount of light produced is

measured and is directly related to the amount of anti-SARS-CoV-2 antibody present. There may be some

differences in the set-up of this process from platform to platform.

[Figure 18 labels: blood, magnetic beads.]


Figure 19: ELISA assay for anti-SARS-CoV-2 antibodies. Blood is added to microtitre plates that have

been coated with SARS-CoV-2 S protein. Antibodies in the blood that bind the S protein (red) will attach,

while those that are specific for other proteins (green, blue) will not. The plate is washed and animal

antibodies that bind to human antibodies (for example mouse anti-human IgG) are then added. These

antibodies are tagged with a chromophore or a fluorophore. The plate is again washed and then a second

chemical reagent is added, causing a reaction that produces visible colour or fluorescence. The amount

of colour or fluorescence produced is measured and is directly related to the amount of anti-SARS-CoV-2

antibody present. In addition to what is shown here, there are other ways to set up ELISA plates.

[Figure 19 labels: blood; colour or fluorescence or luminescence.]


Figure 20: Test strip assay for anti-SARS-CoV-2 antibodies. In an immunochromatographic test strip,

blood is added to a well, which has an excess of animal antibodies that bind to human antibodies (such as

mouse anti-human IgG). The animal antibodies are tagged with colloidal gold, which is visibly red when

present in sufficient amounts. The animal antibodies bind to all the human antibodies present. Liquid then

causes all the well material to wick up the test strip. The liquid front crosses immobilized SARS-CoV-2 S

protein and then immobilized antibody that binds the previous animal antibody (such as rabbit anti-

mouse IgG). Antibodies in the blood that bind the S protein (red) will attach at that place on the strip.

Antibodies in the blood that bind other proteins (blue) will wash past both binding zones. Free, tagged

animal antibody from the well will bind to the second binding zone, which acts as a control. If enough

material binds, you will see a red line from the colloidal gold. In addition to what is shown here, there

are other ways to set up test strips.

[Figure 20 labels: blood; strip zones T and C.]


Figure 21: Relationships amongst the betacoronaviruses. Shown are all the betacoronaviruses that have

a complete genome sequence. An exception is for four additional pangolin viruses that are nearly

identical to GX-5L, which are omitted to prevent over-representation. (A) Neighbor-joining tree of the

alignment of the complete genome at the nucleotide level. (B) Neighbor-joining tree of the alignment of

the entire S protein at the amino acid level. The formation of nearly identical trees demonstrates that the

S protein is not evolving independently of the rest of the genome and no large-scale sequence transfer has

occurred in the S protein. The blue sequence is that of the location of the S1-S2 furin site. The most

parsimonious explanation in this tree is that there was a primary differentiation between furin-positive

and furin-negative viruses, followed by three separate losses of the site in the former and two acquisitions

of the site in the latter.


References

General References

For a better understanding of basic molecular and cell biology a variety of excellent texts are available. Any

introductory biology text will also provide suitable background. For this document, “Essential Cell Biology,

3rd Edition, by Alberts, Bray, Hopkin, Johnson, Lewis, Raff, Roberts, and Walter, 2009, Garland Science,

NY, USA” was used as it was at hand.

For the basics of virology “Principles of Virology, 4th Edition, by Flint, Racaniello, Rall, and Skalka, 2015,

ASM Press, Washington, USA” is a clear understandable text that was published before COVID-19 existed.

For more detailed, in-depth discussion of virology “Fields Virology, 6th Edition, by Knipe and Howley,

2013, Wolters Kluwer, Philadelphia, USA” is authoritative but also published before COVID-19.

For specifics on coronaviruses, the new edition of Fields Virology (7th edition, 2020) is starting to be

released and the chapter “Coronaviridae: The Viruses and Their Replication, by Perlman and Masters” is

present in Volume 1 released February 2020. While this release is post-emergence of COVID-19, the chapter,

unfortunately, does not have any information on that virus.

Specific References

[1] Kim, D, JY Lee, JS Yang, JW Kim, VN Kim, H Chang. 2020. The architecture of SARS-CoV-2

transcriptome. Cell 181:914.

[2] He, X, EHY Lau, P Wu, X Deng, et al. 2020. Temporal dynamics in viral shedding and

transmissibility of Covid-19. Nat Med 26:672.

[3] Wolfel, R, VM Corman, W Guggemos, et al. 2020. Virological assessment of hospitalized patients

with Covid-19. Nature 581:465.

[4] Stringhini, S, A Wisniak, G Piumatti, et al. 2020. Repeated seroprevalence of anti-SARS-CoV-2 IgG

antibodies in a population-based sample. medRxiv doi.org/10.1101/2020.05.02.20088898.

[5] Sereina, H, DB Jessie, A Steven, et al. 2020. Seroprevalence of IgG antibodies against SARS

coronavirus 2 in Belgium – a prospective cross-sectional study of residual samples. medRxiv

doi.org/10.1101/2020.06.08.20125179.

[6] Blanco-Melo, D, BE Nilsson-Payant, WC Liu, et al. 2020. Imbalanced host response to SARS-CoV-2

drives development of Covid-19. Cell 181:1036.

[7] Beigel, JH, KM Tomashek, LE Dodd, et al. 2020. Remdesivir for the treatment of

Covid-19 – preliminary report. New Engl J Med 10.1056/NEJMoa2007764.

[8] Cao, B, Y Wang, D Wen, et al. 2020. A trial of lopinavir-ritonavir in adults hospitalized with severe

Covid-19. New Engl J Med 10.1056/NEJMoa2001282.


[9] Mahevas, M, VT Tran, M Roumier, et al. 2020. Clinical efficacy of hydroxychloroquine in patients

with Covid-19 pneumonia who require oxygen: observational comparative study using routine care

data. Br Med J 369:m1844.

[10] Boulware, DR, MF Pullen, AS Bangdiwala, et al. 2020. A randomized trial of hydroxychloroquine

as postexposure prophylaxis for Covid-19. New Engl J Med 10.1056/NEJMoa2016638.

[11] Rajter, JC, MS Sherman, N Fatteh, F Vogel, J Sacks, JJ Rajter. 2020. ICON (Ivermectin in Covid

Nineteen) study: use of ivermectin is associated with lower mortality in hospitalized patients with

Covid-19. medRxiv doi.org/10.1101/2020.06.06.20124461.

[12] Callow, KA, HF Parry, M Sergeant, DAJ Tyrrell. 1990. The time course of the immune response to

experimental coronavirus infection in man. Epidemiol Infect 105:435.

[13] Cao, WC, W Liu, PH Zhang, F Zhang, JH Richardus. 2007. Disappearance of antibodies to

SARS-associated coronavirus after recovery. New Engl J Med 357:1162.

[14] Huang, AT, B Garcia-Carreras, MDT Hitchings, et al. 2020. A systematic review of antibody

mediated immunity to coronaviruses: antibody kinetics, correlates of protection, and association of

antibody responses with severity of disease. medRxiv doi.org/10.1101/2020.04.14.20065771.

[15] Tan, W, Y Lu, J Zhang, et al. 2020. Viral kinetics and antibody responses in patients with Covid-19.

medRxiv doi.org/10.1101/2020.03.24.20042382.

[16] Long, QX, XJ Tang, QL Shi, et al. 2020. Clinical and immunological assessment of asymptomatic

SARS-CoV-2 infections. Nat Med doi.org/10.1038/s41591-020-0965-6.

[17] van Doremalen, N, T Lambe, A Spencer, et al. 2020. ChAdOx1 nCoV-19 vaccination prevents

SARS-CoV-2 pneumonia in rhesus macaques. bioRxiv doi.org/10.1101/2020.05.13.093195.

[18] Graham, SP, RK McLean, AJ Spencer, et al. 2020. Evaluation of the immunogenicity of

prime-boost vaccination with the replication-deficient viral vectored COVID-19 vaccine candidate

ChAdOx1 nCoV-19. bioRxiv doi.org/10.1101/2020.06.20.

[19] Lam, TTY, MHH Shum, HC Zhu, et al. 2020. Identifying SARS-CoV-2 related coronaviruses in

Malayan pangolins. Nature doi.org/10.1038/s41586-2169-0.

[20] Vijgen, L, E Keyaerts, E Moes, et al. 2005. Complete genome sequence of human coronavirus

OC43: molecular clock analysis suggests a relatively recent zoonotic coronavirus transmission

event. J Virol 79:1595.

[21] Li, X, EE Giorgi, MH Marichannegowda, et al. 2020. Emergence of SARS-CoV-2 through

recombination and strong purifying selection. Sci Adv 10.1126/sciadv.abb9153.

[22] Zu, H, X Chen, T Hu, et al. 2020. A novel bat coronavirus closely related to SARS-CoV-2 contains

natural insertions at the S1/S2 cleavage site of the spike protein. Curr Biol

doi.org/10.1016/j.cub.2020.05.023.


[23] Smith, EC, NR Sexton, MR Denison. 2014. Thinking outside the triangle: replication fidelity of the

largest RNA viruses. Ann Rev Virol 1:111.

[24] Korber, B, WM Fischer, S Gnanakaran, et al. 2020. Spike mutation pipeline reveals the emergence

of a more transmissible form of SARS-CoV-2. bioRxiv doi.org/10.1101/2020.04.29.069054.

[25] van Dorp, L, D Richard, CCS Tan, et al. 2020. No evidence for increased transmissibility from

recurrent mutations in SARS-CoV-2. bioRxiv doi.org/10.1101/2020.05.21.108506.

[26] Zhang, L, CB Jackson, H Mou, et al. 2020. The D614G mutation in the SARS-CoV-2 spike protein

reduces S1 shedding and increases infectivity. bioRxiv doi.org/10.1101/2020.06.12.148726.

[27] Lorenzo-Redondo, R, HH Nam, SC Roberts, et al. 2020. A unique clade of SARS-CoV-2 viruses is

associated with lower viral loads in patient upper airways. medRxiv

doi.org/10.1101/2020.05.19.20107144.

DOCUMENT CONTROL DATA *Security markings for the title, authors, abstract and keywords must be entered when the document is sensitive

1. ORIGINATOR (Name and address of the organization preparing the document. A DRDC Centre sponsoring a contractor's report, or tasking agency, is entered in Section 8.)

DRDC – Suffield Research Centre Defence Research and Development Canada P.O. Box 4000, Station Main Medicine Hat, Alberta T1A 8K6 Canada

2a. SECURITY MARKING (Overall security marking of the document including special supplemental markings if applicable.)

CAN UNCLASSIFIED

2b. CONTROLLED GOODS

NON-CONTROLLED GOODS DMC A

3. TITLE (The document title and sub-title as indicated on the title page.)

An introduction to COVID-19 biology

4. AUTHORS (Last name, followed by initials – ranks, titles, etc., not to be used)

Berger, B.

5. DATE OF PUBLICATION (Month and year of publication of document.)

August 2020

6a. NO. OF PAGES

(Total pages, including Annexes, excluding DCD, covering and verso pages.)

43

6b. NO. OF REFS

(Total references cited.)

27

7. DOCUMENT CATEGORY (e.g., Scientific Report, Contract Report, Scientific Letter.)

Reference Document

8. SPONSORING CENTRE (The name and address of the department project office or laboratory sponsoring the research and development.)

DRDC – Suffield Research Centre Defence Research and Development Canada P.O. Box 4000, Station Main Medicine Hat, Alberta T1A 8K6 Canada

9a. PROJECT OR GRANT NO. (If appropriate, the applicable research and development project or grant number under which the document was written. Please specify whether project or grant.)

00ca - SDS3 Project 00ca

9b. CONTRACT NO. (If appropriate, the applicable number under which the document was written.)

10a. DRDC PUBLICATION NUMBER (The official document number by which the document is identified by the originating activity. This number must be unique to this document.)

DRDC-RDDC-2020-D079

10b. OTHER DOCUMENT NO(s). (Any other numbers which may be assigned this document either by the originator or by the sponsor.)

11a. FUTURE DISTRIBUTION WITHIN CANADA (Approval for further dissemination of the document. Security classification must also be considered.)

Public release

11b. FUTURE DISTRIBUTION OUTSIDE CANADA (Approval for further dissemination of the document. Security classification must also be considered.)

12. KEYWORDS, DESCRIPTORS or IDENTIFIERS (Use semi-colon as a delimiter.)

Coronavirus; COVID-19; Review

13. ABSTRACT (When available in the document, the French version of the abstract must be included here.)

The recent COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, has been associated with an increased interest in the basic biology related to this virus and disease. This report reviews the fundamental properties of SARS-CoV-2 and COVID-19 at a level suitable for a broad range of educational background.

La récente pandémie de COVID-19, cause par le coronavirus SARS-CoV-2, a été associée à un intérêt accru pour la biologie de base liée à ce virus et à cette maladie. Ce rapport passe en revue les propriétés fondamentales du SARS-CoV-2 et de la COVID-19 pour une audience de niveau d’éducation diverse.
