Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
1
Title: Strains used in whole organism Plasmodium falciparum vaccine trials differ in 1
genome structure, sequence, and immunogenic potential 2
3
Authors: Kara A. Moser1‡, Elliott F. Drábek1, Ankit Dwivedi1, Jonathan Crabtree1, Emily 4
M. Stucke2, Antoine Dara2, Zalak Shah2, Matthew Adams2, Tao Li3, Priscila T. 5
Rodrigues4, Sergey Koren5, Adam M. Phillippy5, Amed Ouattara2, Kirsten E. Lyke2, Lisa 6
Sadzewicz1, Luke J. Tallon1, Michele D. Spring6, Krisada Jongsakul6, Chanthap Lon6, 7
David L. Saunders6¶, Marcelo U. Ferreira4, Myaing M. Nyunt2§, Miriam K. Laufer2, Mark 8
A. Travassos2, Robert W. Sauerwein7, Shannon Takala-Harrison2, Claire M. Fraser1, B. 9
Kim Lee Sim3, Stephen L. Hoffman3, Christopher V. Plowe2§, Joana C. Silva1,8 10
11
Corresponding author: Joana C. Silva, [email protected] 12
Affiliations: 13 1 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, 14
MD, 21201 15 2 Center for Vaccine Development and Global Health, University of Maryland School of 16
Medicine, Baltimore, MD, 21201 17 3 Sanaria, Inc. Rockville, Maryland, 20850 18 4 Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, 19
São Paulo, Brazil 20 5 Genome Informatics Section, Computational and Statistical Genomics Branch, 21
National Human Genome Research Institute, Bethesda, Maryland, USA 20892 22 6 Armed Forces Research Institute of Medical Sciences, Department of Bacterial and 23
Parasitic Diseases, Bangkok, Thailand 24 7 Department of Medical Microbiology, Radboud University Medical Center, Nijmegen, 25
Netherlands 26 8 Department of Microbiology and Immunology, University of Maryland School of 27
Medicine, Baltimore, MD, 21201 28
29 ‡Current address: Institute for Global Health and Infectious Diseases, University of 30
North Carolina Chapel Hill 31 ¶Current address: Warfighter Expeditionary Medicine and Treatment, US Army Medical 32
Material Development Activity 33 §Current address: Duke Global Health Institute, Duke University, Durham, North 34
Carolina, 27708 35
36
Author emails (in order of author list above): 37
[email protected], [email protected], [email protected], 38
[email protected], [email protected], [email protected], 39
[email protected], [email protected], [email protected], 40
[email protected] , [email protected], [email protected], 41
[email protected], [email protected], 42
[email protected], [email protected], 43
[email protected], [email protected], [email protected], 44
[email protected], [email protected], [email protected], 45
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
2
[email protected] , [email protected], 46
[email protected], [email protected], 47
[email protected], [email protected], [email protected], 48
[email protected], [email protected] 49
50
51
52
Abstract 53
Background: Plasmodium falciparum (Pf) whole-organism sporozoite vaccines have 54
provided excellent protection against controlled human malaria infection (CHMI) and 55
naturally transmitted heterogeneous Pf in the field. Initial CHMI studies showed 56
significantly higher durable protection against homologous than heterologous strains, 57
suggesting the presence of strain-specific vaccine-induced protection. However, 58
interpretation of these results and understanding of their relevance to vaccine efficacy 59
(VE) have been hampered by the lack of knowledge on genetic differences between 60
vaccine and CHMI strains, and how these strains are related to parasites in malaria 61
endemic regions. 62
Methods: Whole genome sequencing using long-read (Pacific Biosciences) and short-63
read (Illumina) sequencing platforms was conducted to generate de novo genome 64
assemblies for the vaccine strain, NF54, and for strains used in heterologous CHMI (7G8 65
from Brazil, NF166.C8 from Guinea, and NF135.C10 from Cambodia). The assemblies 66
were used to characterize sequence polymorphisms and structural variants in each strain 67
relative to the reference Pf 3D7 (a clone of NF54) genome. Strains were compared to 68
each other and to clinical isolates from South America, Sub-Saharan Africa, and 69
Southeast Asia. 70
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
3
Results: While few variants were detected between 3D7 and NF54, we identified tens of 71
thousands of variants between NF54 and the three heterologous strains both genome-72
wide and within regulatory and immunologically important regions, including in pre-73
erythrocytic antigens that may be key for sporozoite vaccine-induced protection. 74
Additionally, these variants directly contribute to diversity in immunologically important 75
regions of the genomes as detected through in silico CD8+ T cell epitope predictions. Of 76
all heterologous strains, NF135.C10 consistently had the highest number of unique 77
predicted epitope sequences when compared to NF54, while NF166.C8 had the lowest. 78
Comparison to global clinical isolates revealed that these four strains are representative 79
of their geographic region of origin despite long-term culture adaptation; of note, 80
NF135.C10 is from an admixed population, and not part of recently-formed drug resistant 81
subpopulations present in the Greater Mekong Sub-region. 82
Conclusions: These results are assisting the interpretation of VE of whole-organism 83
vaccines against homologous and heterologous CHMI, and may be useful in informing 84
the choice of strains for inclusion in region-specific or multi-strain vaccines. 85
86
Keywords: P. falciparum, malaria, genome assembly, PfSPZ Vaccine, whole-sporozoite 87
vaccine 88
89
90
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
4
Main Text 91
Introduction 92
The flattening levels of mortality and morbidity due to malaria in recent years (1), 93
which follow a decade in which malaria mortality was cut in half (2), highlight the pressing 94
need for new tools to control the disease. A highly efficacious vaccine against 95
Plasmodium falciparum, the deadliest malaria parasite, would be a critical development 96
for control and elimination efforts. Several variations of a highly promising pre-97
erythrocytic, whole organism malaria vaccine based on P. falciparum sporozoites (PfSPZ) 98
are under development, all based on the same P. falciparum strain, NF54 (3), thought to 99
be of West African origin, and which use different mechanisms for attenuation of PfSPZ. 100
Of these vaccine candidates, Sanaria® PfSPZ Vaccine, based on radiation-attenuated 101
sporozoites, has progressed furthest in clinical trial testing (4–10). Other whole-organism 102
vaccine candidates, including chemo-attenuated (Sanaria® PfSPZ-CVac), transgenic 103
and genetically attenuated sporozoites, are in earlier stages of development (11–13). 104
PfSPZ Vaccine showed 100% short-term protection against homologous 105
controlled human malaria infection (CHMI) in an initial phase 1 clinical trial (6), and 106
subsequent trials have confirmed that high levels of protection can be achieved against 107
both short-term (8) and long-term (7) homologous CHMI. However, depending on the 108
immunization regimen, sterile protection can be significantly lower (8-83%) against 109
heterologous CHMI using the 7G8 Brazilian clone (8,9), and against infection in malaria-110
endemic regions with intense seasonal malaria transmission (29% and 52% by 111
proportional and time to event analysis, respectively) (10). Heterologous CHMI in 112
chemoprophylaxis with sporozoite (CPS) studies, in which immunization is by infected 113
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
5
mosquito bite of individuals undergoing malaria chemoprophylaxis, have been conducted 114
with NF135.C10 from Cambodia (14) and NF166.C8 from Guinea (15), and have had 115
lower efficacy than against homologous CHMI (16,17). One explanation for the lower 116
efficacy seen against heterologous P. falciparum strains is the extensive genetic diversity 117
in this parasite species, which is particularly high in genes encoding antigens (18), which 118
combined with low vaccine efficacy against non-vaccine alleles (19–21) reduces overall 119
protective efficacy and complicates the design of broadly efficacious vaccines (22,23). 120
The lack of a detailed genomic characterization of the P. falciparum strains used in CHMI 121
studies, and the unknown genetic basis of the parasite targets of PfSPZ Vaccine- and 122
PfSPZ CVac-induced protection, have precluded a conclusive statement regarding the 123
cause(s) of variable vaccine efficacy outcomes. 124
The current PfSPZ Vaccine strain, NF54, was isolated from a patient in the Netherlands 125
who had never left the country and is considered a case of “airport malaria”; the exact origin of 126
NF54 is unknown (3), but thought to be from Africa (24,25). NF54 is also the isolate from which 127
the P. falciparum 3D7 reference strain was cloned (26), and hence, despite having been 128
separated in culture for over 30 years, NF54 and 3D7 are assumed to be genetically identical, 129
and 3D7 is often used in homologous CHMI (6,8). Several issues hinder the interpretation of both 130
homologous and heterologous CHMI experiments conducted to date. It remains to be confirmed 131
that 3D7 has remained genetically identical to NF54 genome-wide, or that the two are at least 132
identical immunogenically. Indeed, NF54 and 3D7 have several reported phenotypic differences 133
when grown in culture, including the variable ability to produce gametocytes (27). In addition, 134
7G8, NF166.C8, and NF135.C10 have not been rigorously compared to each other or to NF54 to 135
confirm that they are adequate heterologous strains, even though they do appear to have distinct 136
infectivity phenotypes when used as CHMI strains (15). While the entire sporozoite likely offers 137
multiple immunological targets, no high-confidence correlates of protection currently exist. In part 138
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
6
because of the difficulty of studying hepatic parasite forms and their gene expression profiles in 139
humans, it remains unclear which parasite proteins are recognized by the human immune system 140
during that stage, and elicit protection, upon immunization with PfSPZ vaccines. Both humoral 141
and cell-mediated responses have been associated with protection against homologous CHMI 142
(6,7), although studies in rodents and non-human primates point to a requirement for cell-143
mediated immunity (specifically through tissue-resident CD8+ T cells) in long-term protection 144
(5,9,28,29). In silico identification of CD8+ epitopes in all strains could highlight critical differences 145
of immunological significance between strains. Finally, heterologous CHMI results cannot be a 146
reliable indicator of efficacy against infection in field settings unless the CHMI strains used are 147
characteristic of the geographic region from which they originate. These issues could impact the 148
use of homologous and heterologous CHMI, and the choice of strains for these studies, to predict 149
the efficacy of PfSPZ-based vaccines in the field (30). 150
These knowledge gaps can be addressed through a rigorous description and 151
comparison of the genome sequence of these strains. High-quality de novo assemblies 152
allow characterization of genome composition and structure, as well as the identification 153
of genetic differences between strains. However, the high AT content and repetitive 154
nature of the P. falciparum genome greatly complicates genome assembly methods (31). 155
Recently, long-read sequencing technologies have been used to overcome some of these 156
assembly challenges, as was shown with assemblies for 3D7, 7G8, and several other 157
culture-adapted P. falciparum strains generated using Pacific Biosciences (PacBio) 158
technology (32–34). However, NF166.C8 and NF135.C10 still lack whole-genome 159
assemblies; in addition, while an assembly for 7G8 is available (33), it is important to 160
characterize the specific 7G8 clone used in heterozygous CHMI, from Sanaria’s working 161
bank, as strains can undergo genetic changes over time in culture (35). Here, reference 162
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
7
assemblies for NF54, 7G8, NF166.C8, and NF135.C10 (hereafter referred to as PfSPZ 163
strains) were generated using approaches to take advantage of the resolution power of 164
long-read sequencing data and the low error rate of short-read sequencing platforms. 165
These de novo assemblies allowed for the thorough genetic and genomic characterization 166
of the PfSPZ strains and will aid in the interpretation of results from CHMI studies. 167
Materials and Methods 168
Study design and samples 169
This study characterized and compared the genomes of four P. falciparum strains 170
used in whole organism malaria vaccines and controlled human malaria infections using 171
a combination of long- and short-read whole genome sequencing platforms (see below). 172
In addition, these strains were compared to P. falciparum clinical isolates collected from 173
patients in malaria-endemic regions globally, using short-read whole genome sequencing 174
data. Genetic material for the four PfSPZ strains were provided by Sanaria, Inc. Clinical 175
P. falciparum isolates from Brazil, Mali, Malawi, Myanmar, Thailand, Laos, and Cambodia 176
were collected between 2009 and 2016 from cross-sectional surveys of malaria burden, 177
longitudinal studies of malaria incidence, and drug efficacy studies done in collaboration 178
with the Division of Malaria Research at the University of Maryland Baltimore, or were 179
otherwise provided by collaborators (Supplemental Dataset 1). All samples met the 180
inclusion criteria of the initial study protocol with prior approval from the local ethical 181
review board. Parasite genomic sequencing and analyses were undertaken after prior 182
approval of the University of Maryland School of Medicine Institutional Review Board. 183
These isolates were obtained by venous blood draws; almost all samples were processed 184
using leukocyte depletion methods to improve the parasite-to-human ratio before 185
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
8
sequencing. The exceptions were samples from Brazil and Malawi, which were not 186
leukocyte depleted upon collection. These samples underwent a selective whole genome 187
amplification step before sequencing, modified from (36) (the main modification being a 188
DNA dilution and filtration step using vacuum filtration prior to selective whole genome 189
amplification). Cambodia isolates had been sequenced for a previous study (37). In 190
addition to these samples, samples for which whole genome short-read sequencing has 191
previously been generated were obtained from NCBI’s Short Read Archive (SRA) to 192
supplement malaria-endemic regions not represented in our data set (Peru, Columbia, 193
French Guiana, Guinea, Papua New Guinea) and regions where PfSPZ trials are ongoing 194
(Burkina Faso, Kenya, and Tanzania) (Supplemental Dataset 1). 195
Whole genome sequencing 196
Genetic material for whole genome sequencing of the PfSPZ strains was 197
generated from a cryovial of each strain’s cell bank with the following identifiers: NF54 198
Working Cell Bank (WCB): SAN02-073009; 7G8 WCB: SAN02-199
021214; NF135.C10 WCB: SAN07-010410; NF166.C8 Mother Cell Bank: SAN30-200
020613. Each cryovial was thawed and maintained in human O+ red blood cells 201
(RBCs) at 2% hematocrit (Hct) in complete growth medium (RPMI 1649 with L-glutamine 202
and 25 mM HEPES supplemented with 10% human O+ serum and hypoxanthine) in a 6-203
well plate in 5% O2, 5% CO2 and 90% N2 at 204
37°C. The cultures were then further expanded by adding fresh RBCs every 3-4 205
days and increased culture hematocrit (Hct) to 5% Hct using a standard method (38). 206
The complete growth medium was replaced daily. When the PfSPZ strain culture volume 207
reached 300-400 mL and a parasitemia of more than 1.5%, the culture suspensions 208
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
9
were collected and the parasitized RBCs were pelleted down by centrifugation at 1800 209
rpm for 5 min. Aliquots of 0.5 mL per cryovial of the parasitized RBCs were stored at -210
80°C prior to extraction of genomic DNA. Genomic DNA was extracted using the Qiagan 211
Blood DNA Midi Kit (Valencia, CA, USA). Pacific Biosciences (PacBio) sequencing was 212
done for each PfSPZ strain. Total DNA was prepared for PacBio sequencing using the 213
DNA Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA). DNA was fragmented 214
with the Covaris E210, and the fragments were size selected to include those >15 Kbp in 215
length. Libraries were prepared per the manufacturer’s protocol. Four SMRT cells were 216
sequenced per library, using P6C4 chemistry and a 120-minute movie on the PacBio RS 217
II (Pacific Biosystems, Menlo Park, CA). 218
Short-read sequencing was done for each PfSPZ strain and for our collection of 219
clinical isolates using the Illumina HiSeq 2500 or 4000 platforms. Prepared genomic DNA, 220
extracted from cultured parasites, leukocyte-depleted samples, or from samples that 221
underwent sWGA (see above), was used to construct DNA libraries for sequencing on 222
the Illumina platform using the KAPA Library Preparation Kit (Kapa Biosystems, Woburn, 223
MA). DNA was fragmented with the Covaris E210 or E220 to ~200 bp. Libraries were 224
prepared using a modified version of manufacturer’s protocol. The DNA was purified 225
between enzymatic reactions and the size selection of the library was performed with 226
AMPure XT beads (Beckman Coulter Genomics, Danvers, MA). When necessary, a PCR 227
amplification step was performed with primers containing an index sequence of six 228
nucleotides in length. Libraries were assessed for concentration and fragment size using 229
the DNA High Sensitivity Assay on the LabChip GX (Perkin Elmer, Waltham, MA). Library 230
concentrations were also assessed by qPCR using the KAPA Library Quantification Kit 231
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
10
(Complete, Universal) (Kapa Biosystems, Woburn, MA). The libraries were pooled and 232
sequenced on a 100 bp paired-end Illumina HiSeq 2500 or 4000 run (Illumina, San Diego, 233
CA). 234
Assembly generation and characterization of PfSPZ strains 235
Canu (v1.3) (39) was used to correct and assemble the PacBio reads 236
(corMaxEvidenceErate=0.15 for AT-rich genomes, and using default parameters 237
otherwise). To optimize downstream assembly correction processes and parameters, the 238
percentage of total differences (both in bp and by proportion of the 3D7 genome not 239
captured by the NF54 assembly) between the NF54 assembly and the 3D7 reference 240
(PlasmoDBv24) was calculated after each round of correction. Quiver (smrtanalysis v2.3) 241
(40) was run iteratively with default parameters to reach a (stable) maximum reduction in 242
percent differences between the two genomes and the assemblies were further corrected 243
with Illumina data using Pilon (v1.13) (41) with the following parameters: --fixbases, --244
mindepth 5, --K 85, --minmq 0, and --minqual 35. 245
Assemblies were compared to the 3D7 reference (PlasmoDBv24) using 246
MUMmer’s nucmer (42), and the show-snps function was used to generate a list of SNPs 247
and small (< 50 bp) indels between assemblies. Coding and non-coding variants were 248
classified by comparing the show-snps output with the 3D7 gff3 file using custom scripts. 249
Structural variants, defined as indels, deletions, and tandem or repeat expansion and 250
contractions each greater than 50 bp in length were identified using the nucmer-based 251
Assemblytics tool (43) (unique anchor length: 1 Kbp). Translocations were identified by 252
eye through inspection of mummerplots. Translocations and other large structural 253
variants were confirmed using read-mapping data of the PacBio reads against either the 254
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
11
3D7 reference genome or the respective PfSPZ strain assembly using the NGLMR aligner 255
(44) and visualized using Ribbon (45). Reconstructed exon 1 sequences for var genes, 256
encoding P. falciparum erythrocyte membrane protein 1 (PfEMP1) antigens, for each 257
PfSPZ strain were recovered using the ETHA package (46). As a check for var exon 1 258
sequences that were missed during the generation of the strain’s assembly, a targeted 259
read capture and assembly approach was done using a strain’s Illumina data, wherein 260
var-like reads for each PfSPZ strains were identified by mapping reads against a 261
database of known var exon 1 sequences (47) using bowtie2 (48). Reads that mapped to 262
a known exon 1 sequence were then assembled with Spades (v3.9.0) (49), and the 263
assembled products were blasted against the PacBio reads to identify if they were truly 264
missed exon 1 sequences by the de novo assembly process, or if instead they were 265
chimeras reconstructed by the targeted assembly process. To identify homologs of the 266
61 var exon 1 sequences found in the 3D7 genome, each PfSPZ strain’s set of exon 1 267
amino acid sequences were compared to the 3D7 set using blastp. 268
In silico MHC I epitope predictions 269
Given the reported importance of CD8+ T cell responses towards immunity to 270
whole sporozoites, MHC class 1 epitopes of length 9 amino acids (9mers) were predicted 271
with NetMHCpan (v3.0) (50) for each PfSPZ strain, using inferred amino acid sequences 272
in each genome. HLA types common to African countries where PfSPZ or PfSPZ-CVac 273
trials are ongoing were used for epitope predictions based on frequencies in the Allele 274
Frequency Net Database (51) or from the literature (52,53) (Table S1). Shared epitopes 275
between NF54 and the three heterologous PfSPZ strains were calculated by first 276
identifying epitopes in each gene, and then removing duplicate epitope sequence entries 277
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
12
(caused by recognition by multiple HLA types). Identical epitope sequences that were 278
identified in two or more genes were treated as distinct epitope entries, and all unique 279
epitope-gene combinations were included when calculating the number of shared 280
epitopes between strains. 281
Epitopes were predicted in several gene sets. The first was all protein-coding 282
genes in which no premature stop codons were detected across all four PfSPZ strains 283
(n=4,814). The second was a subset of the above that are expressed as proteins during 284
the sporozoite stage of the parasite in the mosquito salivary glands, as supported by 285
proteomic data (n=1,974) (54,55). Lastly, a list of pre-erythrocytic genes of interest was 286
generated, either from a review of the literature, or genes whose products were 287
recognized by sera from protected vaccinees of whole organism malaria vaccine trials 288
(both PfSPZ and PfSPZ-CVac) (n=42) (11,56). These gene sequences were inspected 289
manually in each assembly, and corrected for any remaining PacBio sequencing error, 290
particularly one bp indels in homopolymer runs. Even though the 42 genes listed above 291
were detected through antibody responses, many have also been shown to have T cell 292
epitopes, such as CSP and LSA-1. 293
Read mapping and SNP calling 294
For the full collection of clinical isolates that had whole genome short-read 295
sequencing data (generated either at IGS or downloaded from SRA), reads were aligned 296
to the 3D7 reference genome (PlasmoDBv24) using bowtie2 (v2.2.4) (48). Samples with 297
less than 10 million reads mapping to the reference were excluded, as samples with less 298
than this amount had reduced coverage across the genome. Bam files were processed 299
according to GATK’s Best Practices documentation (57–59). Joint SNP calling was done 300
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
13
using Haplotype Caller (v4.0). Because clinical samples may be polyclonal (that is, more 301
than one parasite strain may be present), diploid calls were initially allowed, followed by 302
calling the major allele at positions with heterozygous calls. If the major allele was 303
supported by >70% of reads at a heterozygous position, the major allele was assigned 304
as the allele at that position (otherwise, the genotype was coded as missing). Additional 305
hard filtering was done to remove potential false positives based on the following filter: 306
DP < 12 || QUAL < 50 || FS > 14.5 || MQ < 20, and variants were further filtered to remove 307
those for which the non-reference allele was not present in at least three samples 308
(frequency less than ~0.5%), and those with more than 10% missing genotype values 309
across all samples. 310
Principal Coordinates Analyses and Admixture Analyses 311
A matrix of pairwise genetic distances was constructed from biallelic non-312
synonymous SNPs identified from the above pipeline (n=31,761) across all samples 313
(n=654) using a custom Python script, and principal coordinates analyses (PCoAs) were 314
done to explore population structure using cmdscale in R. Additional population structure 315
analyses were done using Admixture (v1.3) (60) on two separate data sets: South 316
America and Africa clinical isolates plus NF54, NF166.C8, and 7G8 (n=461), and 317
Southeast Asia and Oceania plus NF135.C10 (n=193). The data sets were additionally 318
pruned for sites in linkage disequilibrium (window size of 20 Kbp, window step of 2 Kbp, 319
R2 0.1). The final South America/Africa and Southeast Asia/Oceania data set used for 320
the admixture analysis consisted of 16,802 and 5,856 SNPs, respectively. The number of 321
populations, K, was tested for values between K=1 to K=15 and run with 10 replicates for 322
each K. For each population, the cross-validation (CV) error from the replicate with the 323
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
14
highest log-likelihood value was plotted, and the K with the lowest CV value was chosen 324
as the final K. 325
To compare subpopulations identified in our Southeast Asia/Oceania admixture 326
analysis with previously described ancestral, resistant, and admixed subpopulations from 327
Cambodia (61), the above non-synonymous SNP set was used before pruning for LD 328
(n=11,943), and was compared to a non-synonymous SNP dataset (n=21,257) from 167 329
samples used by Dwivedi et al. (62) to describe eight Cambodian subpopulations, in an 330
analysis that included a subset of samples used by Miotto et al. (61) (who first 331
characterized the population structure in Cambodia). There were 5,881 shared non-332
synonymous SNPs between the two datasets, 1,649 of which were observed in 333
NF135.C10. A pairwise genetic distance matrix (estimated as the proportion of base-pair 334
differences between pairs of samples, not including missing genotypes) was generated 335
from the 5,881 shared SNP set, and a dendrogram was built using Ward minimum 336
variance methods in R (Ward.D2 option of the hclust function). 337
Results 338
Generation of assemblies and validation of the assembly protocol 339
To characterize genome-wide structural and genetic diversity of the PfSPZ strains, 340
genome assemblies were generated de novo using whole genome long-read (PacBio) 341
and short-read (Illumina) sequence data (Methods; Table S2; Table S3). Taking 342
advantage of the parent isolate-clone relationship between NF54 and 3D7, we used NF54 343
as a test case to derive the assembly protocol, by adopting, at each step, approaches 344
that minimized the difference to 3D7. The raw assembly of the NF54 genome, while 345
containing 99.9% of the 3D7 genome in 30 contigs, showed ~78K sequence variants 346
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
15
relative to 3D7 (Figure S1). Many of these differences were small (1-3) base pair (bp) 347
indels (insertions or deletions), particularly in AT-rich regions and homopolymer runs, 348
strongly indicative of remaining sequencing errors (63). To maximize error removal with 349
existing tools, Quiver was run iteratively on the NF54 assembly; two consecutive Quiver 350
runs consistently and substantially reduced the number of differences between NF54 and 351
3D7 (Figure S1). Illumina reads from NF54 were used to further polish the assembly 352
using Pilon. The resulting NF54 assembly is 23.4 Mb in length (Figure 1) and, when 353
compared to the 3D7 reference, contains 8,396 indels and 1,386 single nucleotide 354
polymorphisms (SNPs), a rate of ~59 SNPs per million base pairs, the vast majority of 355
which occurred in noncoding regions (Table 1). In fact, only 17 non-synonymous 356
mutations were detected in 15 intact protein-coding single-copy genes between the two 357
genomes (Supplemental Dataset 2). We applied the same assembly protocol to the 358
remaining PfSPZ strains. The resulting assemblies for NF166.C8, 7G8, and NF135.C10 359
are structurally very complete, with the 14 nuclear chromosomes represented by 30, 20, 360
and 19 nuclear contigs, respectively, with each chromosome in the 3D7 reference 361
represented by one to three contigs (Figure 1). Several shorter contigs in NF54 (67,501 362
bps total), NF166.C8 (224,502 bps total), and NF135.C10 (80,944 bps total) could not be 363
unambiguously assigned to a homologous segment in the 3D7 reference genome; gene 364
annotation showed that these contigs mostly contain members of multi-gene families, and 365
therefore are likely part of sub-telomeric regions. The cumulative lengths of the four 366
assemblies ranged from 22.8 Mbp to 23.5 Mbp (Table 1), indicating variation in genome 367
size among P. falciparum strains. In particular, the 7G8 assembly was several hundred 368
thousand base-pairs smaller than the other three assemblies. To confirm that this was 369
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
16
not an assembly error, we compared 7G8 to a previously published 7G8 PacBio-based 370
assembly (PlasmoDBv41) (33). The two assemblies were extremely close in overall 371
genome structure, differing only by 2 Kbp in cumulative length, and also shared a very 372
similar number of SNP and small indel variants relative to 3D7 (Table S4). 373
Structural variations in the genomes of the PfSPZ strains 374
Many structural variants (defined as indels or tandem repeat contractions or 375
expansions, greater than 50 bp) were identified in each assembly by comparison to the 376
3D7 genome, impacting a cumulative length of 199.0 Kbp in NF166.C8 to 340.9 Kbp in 377
NF135.C10 (Table S5). Many smaller variants fell into coding regions (including known 378
pre-erythrocytic antigens), often representing variation in repeat units (Supplemental 379
Dataset 3). Several larger structural variants (>10 Kbp) exist in 7G8, NF166.C8, and 380
NF135.C10 relative to 3D7. Many of these regions contain members of multi-gene 381
families, and as expected the number of these vary between each assembly (Figure S2). 382
While all 61 3D7 var exon 1 sequences were present as almost identical sequences in 383
the NF54 assembly (with the exception of small indels in several exon 1 sequences), the 384
number of 3D7 exon 1 var sequences with identifiable orthologs in the three heterologous 385
PfSPZ strains was much lower. Only a few of the identified orthologs had amino acid 386
sequence identity higher than 75% (Figure S2), reflecting the exceptional sequence 387
diversity that occurs in this gene family in and among P. falciparum strains. A recently 388
characterized PfEMP1 protein expressed on the surface of NF54 sporozoites 389
(NF54varsporo) was shown to be involved in hepatocyte invasion (Pf3D7_0809100), and 390
antibodies to this PfEMP1 blocked hepatocyte invasion (64). No ortholog to NF54varsporo 391
was identified in the var repertoire of 7G8, NF166.C8, or NF135.C10 (Figure S2). It is 392
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
17
possible that a different, strain-specific, var gene fulfills a similar role in each of the 393
heterologous PfSPZ strains. 394
Several other large structural variants impact regions housing non-multi-gene 395
family members, although none known to be involved in pre-erythrocytic immunity. 396
Examples include a 31 Kbp-long tandem expansion of a region of chromosome 12 in the 397
7G8 assembly (also present in the previously published assembly for 7G8 (33)) and a 398
22.7 Kbp-long repeat expansion of a region of chromosome 5 in NF135.C10, both of 399
which are supported by ~200 PacBio reads. The former is a segmental duplication 400
containing a vacuolar iron transporter (PF3D7_1223700), a putative citrate/oxoglutarate 401
carrier protein (PF3D7_1223800), a putative 50S ribosomal protein L24 402
(PF3D7_1223900), GTP cyclohydrolase I (PF3D7_1224000), and three conserved 403
Plasmodium proteins of unknown function (PF3D7_1223500, PF3D7_1223600, 404
PF3D7_1224100). The expanded region in NF135.C10 represents a tandem expansion 405
of a segment housing the gene encoding the Pf multidrug resistance protein, PfMDR1 406
(PF3D7_0523000), resulting in a total of four copies of this gene in NF135.C10. Other 407
genes in this tandem expansion include those encoding an iron-sulfur assembly protein 408
(PF3D7_0522700), a putative pre-mRNA-splicing factor DUB31 (PF3D7_0522800), a 409
putative zinc finger protein (PF3D7_0522900) and a putative mitochondrial-processing 410
peptidase subunit alpha protein (PF3D7_0523100). In addition, the NF135.C10 assembly 411
contained a large translocation involving chromosomes 7 (3D7 coordinates ~520,000 to 412
~960,000) and 8 (start to coordinate ~440,000) (Figure S3); the boundaries of the 413
translocated regions contain members of multi-gene families. Independent assemblies 414
and mapping patterns observed by aligning the PacBio reads against the NF135.C10 and 415
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
18
3D7 assemblies support the translocation (Figure S4). Furthermore, the regions of 416
chromosomes 7 and 8 where the translocation occurs are documented recombination 417
hotspots that were identified specifically in isolates from Cambodia, the site of origin of 418
NF135.C10 (65). 419
Several structural differences in genic regions were also identified between the 420
NF54 assembly and the 3D7 genome (Supplemental Dataset 3); if real, these structural 421
variants would have important implications in the interpretation of trials using 3D7 as a 422
homologous CHMI strain. For example, a 1,887 bp tandem expansion was identified in 423
the NF54 assembly on chromosome 10, which overlapped the region containing liver 424
stage antigen 1 (PfLSA-1, PF3D7_1036400). The structure of this gene in the NF54 strain 425
was reported when PfLSA-1 was first characterized, with unique N- and C-terminal 426
regions flanking a repetitive region consisting of several dozen repeats of a 17 amino acid 427
motif (66,67). The CDS of PfLSA-1 in the NF54 assembly was 5,406 bp in length but only 428
3,489 bp-long in the 3D7 reference. To determine if this was an assembly error in the 429
NF54 assembly, the PfLSA-1 locus from a recently published PacBio-based assembly of 430
3D7 (32) was compared to that of NF54. The two sequences were identical, likely 431
indicative of incorrect collapsing of the repeat region of PfLSA-1 in the 3D7 reference; 432
NF54 and 3D7 PacBio-based assemblies had 79 units of the 17-mer amino acid repeat, 433
compared to only 43 in the 3D7 reference sequence, a result further validated by the 434
inconsistent depth of mapped Illumina reads from NF54 between the PfLSA repeat region 435
and its flanking unique regions in the 3D7 reference (Figure S5). Using the same 436
comparative rationale as above, where structural differences relative to the 3D7 437
reference, but common to both PabBio-based 3D7 and NF54 assemblies, are considered 438
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
19
true, all structural variations identified in NF54 likely represent existing structural errors in 439
the 3D7 reference genome, including 15 variants that affect genic regions (Figure S6). 440
Small sequence variants between PfSPZ strains and the reference 3D7 genome 441
Very few small sequence variants were identified in NF54 compared to the 3D7 442
reference; 17 non-synonymous mutations were present in 15 single-copy non-443
pseudogene-encoding loci (Supplemental Dataset 2). Short indels were detected in 185 444
genes; many of these indels have length that is not multiple of three and occur in 445
homopolymer runs, possibly representing remaining PacBio sequencing error. However, 446
some may be real, as a small indel causing a frameshift in PF3D7_1417400, a putative 447
protein-coding pseudogene that has previously been shown to accumulate premature 448
stop codons in laboratory-adapted strains (68), and some may be of biologic importance, 449
such as those seen in two histone-related proteins (PF3D7_0823300 and 450
PF3D7_1020700). It has been reported that some clones of 3D7, unlike NF54, are unable 451
to consistently produce gametocytes in long-term culture (27,69); no polymorphisms were 452
observed within or directly upstream of PfAP2-G (PF3D7_1222600) (Table S6), which 453
has been identified as a transcriptional regulator of sexual commitment in P. falciparum 454
(70). However, a non-synonymous mutation from arginine to proline (R1286P) was 455
observed in a AP2-coincident C-terminal domain of PfAP2-L (PF3D7_0730300), a gene 456
associated with liver stage development (71); a proline was also present at the 457
homologous position in 7G8, NF166.C8, and NF135.C10. Two additional putative AP2 458
transcription factor genes (PF3D7_0613800 and PF3D7_1317200) contained a three bp 459
deletion in NF54 relative to 3D7. Although knock-downs of PF3D7_1317200 have 460
resulted in a loss of gametocyte production (72), neither indel was located within predicted 461
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
20
AP2 domains, and so their biological relevance remains to be determined. 7G8, NF66.C8, 462
and NF135.C10 also have numerous non-synonymous mutations and indels within 463
putative AP2 genes, a small number of which fell in predicted AP2 domains. Finally, 464
NF135.C10 has an insertion almost 200 base-pairs in length relative to 3D7 in the 3’ end 465
of PfAP2-G; the insertion also carries a premature stop codon, leading to a considerably 466
different C-terminal end for the transcription factor (Figure S7). This allele is also present 467
in previously published assemblies for clones from Southeast Asia (33), including the 468
culture-adapted strain DD2, and variations of this indel (without the in-frame stop codon) 469
are also found in several non-human malaria Plasmodium species) (Figure S7), 470
suggesting an interesting evolutionary trajectory of this sequence. 471
Given that no absolute correlates of protection are known for whole organism P. 472
falciparum vaccines, genetic differences were assessed both across the genome and in 473
pre-erythrocytic genes of interest in the three heterologous CHMI strains. As expected, 474
the number of mutations between 3D7 and these three PfSPZ strains was much higher 475
than observed for NF54, with ~40K-55K SNPs and as many indels in each pairwise 476
comparison, roughly randomly distributed among intergenic regions, silent and non-477
synonymous sites (Table 1, Figure 2), and corresponding to a pairwise SNP density 478
relative to 3D7 of 1.9, 2.1 and 2.2 SNPs/Kbp for 7G8, NF166.C8 and NF135.C10, 479
respectively. NF135.C10 had the highest number of unique SNPs genome-wide (SNPs 480
not shared with other PfSPZ strains), with 5% more unique SNPs than NF166.C8 and 481
33% more than 7G8 (Figure S8). Similar trends were seen when restricting the analyses 482
to non-synonymous SNPs (5% and 26% more than NF166.C8 and 7G8, respectively). 483
The lower number of unique SNPs in 7G8 may be due in part to the smaller genome size 484
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
21
of this strain. Small indels with length multiple of three (but not two) were also common in 485
coding regions across the genome, as expected from purifying selection on mutations 486
that disrupt the reading frame, whereas indels in non-coding regions were primarily of 487
lengths multiple of two, from unequal crossover or replication slippage in the regions of 488
dinucleotide repeats common in the P. falciparum genome (Figure S9), confirming what 489
has been shown in previous work using short-read whole genome sequencing data (73). 490
SNPs were also common in a panel of 42 pre-erythrocytic genes known or 491
suspected to be implicated in immunity to liver-stage parasites (see Methods; Table S7). 492
While the sequence of all these loci was identical between NF54 and 3D7, there was a 493
wide range in the number of sequence variants per locus between 3D7 and the other 494
three PfSPZ strains, with some genes being more conserved than others. For example, 495
PfCSP showed 8, 7 and 6 non-synonymous mutations in 7G8, NF166.C8, and 496
NF135.C10, respectively, relative to 3D7. However, PfLSA-1 had over 100 non-497
synonymous mutations in all three heterologous strains relative to 3D7 (many in the 498
repetitive region of this gene), in addition to significant length differences in the internal 499
repeat region (Figure S10). 500
Immunological relevance of genetic variation among PfSPZ strains 501
The sequence variants mentioned above may impact the ability of the immune 502
system primed with NF54 to recognize the other PfSPZ strains, impairing vaccine efficacy 503
against heterologous CHMI. Data from murine and non-human primate models 504
(5,28,29,74) demonstrate that CD8+ T cells are required for protective efficacy; therefore, 505
the identification of shared and unique CD8+ T cell epitopes across the genome in all four 506
PfSPZ strains may help interpret the differential efficacy seen in heterologous relative to 507
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
22
homologous CHMI. To create a comparable set of genes in which we were confident of 508
the amino acid sequence, we subset the 5,548 protein-coding genes in the 3D7 genome 509
annotation to 4,814 genes that (1) did not have premature stop codons, 2) were annotated 510
with an expected number of mRNA fragments, and (3) met the above requirements in all 511
four PfSPZ strains (in practice, this removed most, if not all, members of multi-gene 512
families, such as vars, rifins, stevors) (Figure 3A). Strong-binding MHC class I epitopes 513
in the protein sequences from these loci were identified using in silico epitope predictions 514
based on HLA types common in sub-Saharan Africa populations (Table S1). 515
Similar total number of epitopes (sum of unique epitopes, regardless of the HLA-516
type, across genes) were identified in the three heterologous CHMI strains, with 517
NF166.C8 having the fewest (492,291) and NF135.C10 having the most (495,285) 518
(Figure 3B). More epitopes were identified in NF54 (n=512,172) than in the three 519
heterologous CHMI strains. This slightly higher number might be due to a more accurate 520
mapping of the 3D7 genome annotation to the NF54 assembly compared to the other 521
three strains (Figure 3A). More than 80% of epitopes detected in each strain were 522
identical (by sequence and by locus) across all four strains, reflecting epitopes predicted 523
in conserved regions of the 4,814 genes used in this analysis. NF54 had the highest 524
proportion of unique epitopes (9,002, or 1.76%) (Figure 3A), i.e., epitopes not shared 525
with the other PfSPZ strains, followed by NF135.C10 (4,123 or 0.83%), 7G8 (3,500 or 526
0.71%), and NF166.C8 (2,878, or 0.58%). After direct comparisons of NF54 to each 527
heterologous PfSPZ strain, NF166 had the lowest proportion of epitopes not shared with 528
NF54 (1.97%, compared to 2.24% and 2.27% for 7G8 and NF135.C10, respectively), as 529
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
23
would follow from a positive correlation between geographic distance and genetic 530
similarity. 531
To focus the analysis on genes that might be important for the protective efficacy 532
of a whole sporozoite vaccine, the above epitope predictions were stratified by two 533
additional gene sets: 1) 1,974 genes that were found to be expressed in the sporozoite 534
stage of the parasite in mosquito salivary glands (54,55), and (2) the 42 pre-erythrocytic 535
genes of interest mentioned above. In the sporozoite-expressed gene set, similar patterns 536
as in the genome-wide gene set above were seen, with NF135.C10 being the 537
heterologous CHMI strain that had the highest number of epitopes not shared with any 538
other strain (1,243, or 0.6%), and NF166 having the smallest (884, or 0.4%) (Figure 3C). 539
Within the 42 pre-erythrocytic gene set, there were 117, 121, and 153 such unique 540
epitopes present in 7G8, NF166.C8, and NF135.C10, respectively (Figure S11, Table 541
S8). Genes where unique epitopes were identified in this gene set were also, on average, 542
more polymorphic (Table S7). 543
Some of these variations in epitope sequences are relevant for the interpretation 544
of the outcome of PfSPZ Vaccine trials. For example, while all four strains are identical in 545
sequence composition in a B cell epitope potentially relevant for protection recently 546
identified in the circumsporozoite protein (PfCSP, PF3D7_0304600) (75), another B cell 547
epitope that partially overlaps it (76) contained an A98G amino acid difference in 7G8 and 548
NF135.C10 relative to NF54 and NF166.C8. There was also variability in CD8+ T cell 549
epitopes recognized in the Th2R region of the protein, some known for decades (77). 550
Specifically, the PfCSP encoded by the 3D7/NF54 allele was predicted to bind to both 551
HLA-A and HLA-C allele types, but the homologous protein segments in NF166.C8 and 552
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
24
NF135.C10 were only recognized by HLA-A allele types, and no epitope was detected 553
(within the HLA types studied) at that position in PfCSP encoded in 7G8 (Figure 4). 554
Expansion of the analyses to additional HLA types revealed an allele (HLA-08:01) that is 555
predicted to bind to the Th2R region of the 7G8-encoded PfCSP; however, HLA-08:01 is 556
much more frequent in European populations (10-15%) than in African populations (1-557
6%) (51). Therefore, if CD8+ T cell epitopes in the Th2R region of 7G8 are important for 558
protection, which is currently unknown, the level of protection against CHMI with 7G8 559
observed in volunteers of European descent may not be correlated with PfSPZ Vaccine 560
efficacy in Africa. Indels and repeat regions also sometimes affected the number of 561
predicted epitopes in each antigen; for example, an insertion in 7G8 near amino acid 562
residue 1,600 in PfLISP-2 (PF3D7_0405300) contained additional predicted epitopes 563
(Figure S12). Similar patterns in variation in epitope recognition and frequency were 564
found in other pre-erythrocytic genes of interest, including PfLSA-3 (PF3D7_0220000), 565
PfAMA-1 (PF3D7_1133400) and PfTRAP (PF3D7_1335900) (Figure S12). 566
PfSPZ strains and global parasite diversity 567
The four PfSPZ strains have been adapted and kept in culture for extended periods 568
of time. To determine if they are still representative of the malaria-endemic regions from 569
which they were collected, we compared these strains to over 600 recent (2007-2014) 570
clinical isolates from South America, Africa, Southeast Asia, and Oceania (Supplemental 571
Dataset 4), using principal coordinates analysis (PCoA) based on SNP calls generated 572
from Illumina whole genome sequencing data. The results confirmed the existence of 573
global geographic differences in genetic variation previously reported (78,79), including 574
clustering by continent, as well as a separation of East from West Africa and of the 575
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
25
Amazonian region from that West of the Andes (Figure 5). The PfSPZ strains clustered 576
with others from their respective geographic regions, both at the genome-wide level and 577
when restricting the data set to SNPs in the panel of 42 pre-erythrocytic antigens, despite 578
the long-term culturing of some of these strains (Figure 5). An admixture analysis of 579
South American and African clinical isolates confirmed that NF54 and NF166.C8 both 580
have the genomic background characteristic of West Africa, while 7G8 is clearly a South 581
American strain (Figure S13). 582
NF135.C10 was isolated in the early 1990s (14), at a time when resistance to 583
chloroquine and sulfadoxine-pyrimethamine resistance was entrenched and resistance 584
to mefloquine was emerging (80,81), and carries signals from this period of drug pressure. 585
Four copies of MDR-1 were identified in NF135.C10 (Table S9); however, two of these 586
copies appeared to have premature stop codons introduced by SNPs and/or indels, 587
leaving potentially only two functional copies in the genome. While NF135.C10 also had 588
numerous point mutations relative to 3D7 in genes such as PfCRT (conveying 589
chloroquine resistance), and PfDHPS and PfDHR (conveying sulfadoxine-pyrimethamine 590
resistance), NF135.C10 was isolated before the widespread deployment of artemisinin-591
based combination therapies (ACTs), and had the wild-type allele in the locus that 592
encodes the kelch protein in chromosome 13 (PfK13), with no mutations known to convey 593
artemisinin resistance detected in the propeller region (Table S10). 594
The emergence in Southeast Asia of resistance to antimalarial drugs, including 595
artemisinins and drugs used in artemisinin-based combination treatments, is thought to 596
underlie the complex and dynamic parasite population structure in the region (82). 597
Several relatively homogeneous subpopulations, whose origin is likely linked to the 598
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
26
emergence and rapid spread of drug resistance mutations, exist in parallel with a sensitive 599
subpopulation that reflects the ancestral population in the region (referred to as KH1), 600
and another subpopulation of admixed genomic background (referred to as KHA), 601
possibly the source of the drug-resistant subpopulations or the result of a secondary mix 602
of resistant subpopulations (61,62,83,84). This has been accompanied by reports of 603
individual K13 mutations conferring artemisinin resistance occurring independently on 604
multiple genomic backgrounds (85). To determine the subpopulation to which NF135.C10 605
belongs, an admixture analysis was conducted using isolates from Southeast Asia and 606
Oceania, including NF135.C10. Eleven total populations were detected, of which seven 607
contained Cambodian isolates (Figure 6). Both admixture and hierarchical clustering 608
analyses suggest that NF135.C10 is representative of the previously described admixed 609
KHA subpopulation (61,62) (Figure 6), implying that NF135.C10 is representative of a 610
long-standing admixed population of parasites in Cambodia rather than one of several 611
subpopulations thought to have arisen more recently following the emergence of drug 612
resistance. 613
Discussion 614
Whole organism sporozoite vaccines have provided variable levels of protection in 615
initial clinical trials; the radiation-attenuated PfSPZ Vaccine has been shown to protect 616
>90% of subjects against homologous CHMI at 3 weeks after the last dose in 5 clinical 617
trials in the U.S. (6,8) and Germany (11). However, efficacy has been lower against 618
heterologous CHMI (8,9), and in field studies in a region of intense transmission, in Mali, 619
at 24 weeks (10). Interestingly, for the exact same immunization regimen, protective 620
efficacy by proportional analysis was greater in the field trial in Mali (29%) than it was 621
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
27
against heterologous CHMI with Pf 7G8 in the US at 24 weeks after last dose of vaccine 622
(8%) (8,10). While evidence shows that whole organism-based vaccine efficacy can be 623
improved by adjusting the vaccine dose and schedule (11), further optimization of such 624
vaccines will be facilitated by a thorough understanding of the genotypic and immunologic 625
differences among the PfSPZ strains and between them and parasites in malaria endemic 626
regions. 627
A recent study examined whole genome short-read sequencing data to 628
characterize NF166.C8 and NF135.C10 through SNP calls, and identified a number of 629
non-synonymous mutations at a few loci potentially important for the efficacy of CPS, the 630
foundation for PfSPZ-CVac (17). The analyses described here, using high-quality de novo 631
genome assemblies, expand the analysis to hard-to-call regions, such as those 632
containing gene families, repeats and other low complexity sequences. The added 633
sensitivity enabled the thorough genomic characterization of these and additional 634
vaccine-related strains, and revealed a considerably higher number of sequence variants 635
than can be called using short read data alone, as well as indels and structural variants 636
between assemblies. For example, the insertion close to the 3’ end of PfAP2-G detected 637
in NF135.C10 and shared by DD2 has not, to the best of our knowledge, been reported 638
before, despite the multiple studies highlighting the importance of this gene in sexual 639
commitment in P. falciparum strains, including DD2 (70). Long-read sequencing also 640
confirmed that differences observed between NF54 and 3D7 in a major liver stage 641
antigen, PfLSA-1, result from one of a small number of errors lingering in the reference 642
3D7 genome, which is being continually updated and improved (34). Confirmation that 643
NF54 and 3D7 are identical at this locus is critical when 3D7 has been used as a 644
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
28
homologous CHMI in whole sporozoite, NF54-based vaccine studies. Furthermore, the 645
comprehensive sequence characterization of variant surface antigen-encoding loci, such 646
as PfEMP1-encoding genes, will enable the use of the PfSPZ strains to study the role of 647
these protein families in virulence, naturally acquired immunity and vaccine-induced 648
protection (86). 649
The comprehensive genetic and genomic studies reported herein were designed 650
to provide insight into the outcome of homologous and heterologous CHMI studies, and 651
to determine whether the CHMI strains can be used as a proxy for strains present in the 652
field. Comparison of genome assemblies confirmed that NF54 and 3D7 have remained 653
genetically very similar over time, and that 3D7 is an appropriate homologous CHMI 654
strain. As expected, 7G8, NF166.C8, and NF135.C10 were genetically very distinct from 655
NF54 and 3D7, with thousands of differences across the genome including dozens in 656
known pre-erythrocytic antigens. The identification of sequence variants (both SNPs and 657
indels) within transcriptional regulators, such as the AP2 family, may assist in the study 658
of different growth phenotypes in these strains. NF166.C8 and NF135.C10 merozoites 659
enter the blood stream several days earlier than those of NF54 (15), suggesting that NF54 660
may develop more slowly in hepatocytes than do the other two strains. Therefore, 661
mutations in genes associated with liver-stage development (as was observed with 662
PfAP2-L) may be of interest to explore further. Finally, comparison of the PfSPZ strains 663
to whole genome sequencing data from clinical isolates shows that, at the whole genome 664
level, they are indeed representative of their geographical regions of origin. 665
These results can assist in the interpretation of CHMI studies in multiple ways. 666
First, in the 42 pre-erythrocytic genes of interest, the Brazilian strain 7G8, one of the first 667
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
29
strains to be used in heterologous CHMI studies, and the Guinean strain NF166.C8 share 668
a similar number of epitopes with NF54. Furthermore, both NF166.C8 and NF135.C10 669
encode more unique non-synonymous SNPs genome-wide (19% and 33%, respectively) 670
and epitopes in these genes relative to NF54 than does 7G8. These results show that the 671
practice of equating geographic distance to genetic differentiation is not always valid, and 672
that the interpretation of CHMI studies should rest upon thorough genome-wide 673
comparisons. Lastly, of all PfSPZ strains, NF135.C10 is the most genetically distinct from 674
NF54, including in point mutations, unique epitopes, and its placement within the global 675
parasite population. We do not know whether the peptide sequences of the actual epitope 676
targets of vaccine-induced protective immunity are equally distinct in NF135.C10. 677
However, if proteome-wide genetic divergence is the primary determinant of differences 678
in protection against different parasites, the extent to which NF54-based immunization 679
protects against CHMI with NF135.C10 is important in understanding the ability of PfSPZ 680
Vaccine and other whole-organism malaria vaccines to protect against diverse parasites 681
present world-wide. These conclusions are drawn from genome-wide analyses and from 682
subsets of genes for which a role in whole-sporozoite-induced protection is suspected but 683
not experimentally established. Conclusive statements regarding cross-protection will 684
require the additional knowledge of the genetic basis of whole-organism vaccine 685
protection. 686
Without more information on the epitope targets of protective immunity induced by 687
PfSPZ vaccines, it is difficult to rationally design multi-strain PfSPZ vaccines. However, 688
these data and analysis approach has the potential to be applied to rationally design multi-689
strain sporozoite-based vaccines once knowledge of those critical epitope sequences is 690
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
30
available, and characterization of a variety of P. falciparum strains may facilitate the 691
development of region-specific or multi-strain vaccines with greater protective efficacy. 692
Support for a genomics-guided approach to guide such next-generation vaccines can be 693
found in other whole organism parasitic vaccines. Field trials testing the efficacy of first-694
generation whole killed-parasite vaccines against Leishmania had highly variable results 695
(87). While most studies failed to show protection, indicating that killed, whole-cell 696
vaccines for leishmaniasis may not produce the necessary protective response, a trial 697
demonstrating significant protection utilized a multi-strain vaccine, with strains collected 698
from the immediate area of the trial (88), highlighting the importance of understanding the 699
distribution of genetic diversity in pathogen populations. In addition, a highly efficacious 700
non-attenuated, three-strain, whole organism vaccine exists against Theileria parva, a 701
protozoan parasite that causes East coast fever in cattle. This vaccine, named Muguga 702
Cocktail, consists of a mix of three live strains of T. parva that are administered in an 703
infection-and-treatment method, similar to the approach utilized by PfSPZ-CVac. It has 704
been shown recently that two of the strains are genetically very similar, possibly clones 705
of the same isolates (89). Despite this, the vaccine remains highly efficacious and in high 706
demand (90). In addition, the third vaccine strain in the Muguga Cocktail is quite distinct 707
from the other two, with ~5 SNPs/kb (89), or about twice the SNP density seen between 708
NF54 and other PfSPZ strains. These observations suggest that an efficacious multi-709
strain vaccine against a highly variable parasite species does not need to be contain a 710
large number of strains, but that the inclusion of highly divergent strains may be 711
warranted. These results also speak to the promise of multi-strain vaccines against highly 712
diverse pathogens, including apicomplexans with large genomes and complex life cycles. 713
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
31
Next-generation whole genome sequencing technology has opened many 714
avenues for infectious disease research and holds great promise for informing vaccine 715
design. While most malaria vaccine development has occurred before the implementation 716
of regular use of whole genome sequencing, the tools now available allow the precise 717
characterization and informed selection of vaccine strains early in the development 718
process. These results will greatly assist in the interpretation of clinical trials using these 719
strains for CHMI. 720
References and Notes: 721
722
1. World Malaria Report 2018. Geneva: World Health Organization; 2018 p. 1–210. 723
2. Bhatt S, Weiss DJ, Cameron E, Bisanzio D, Mappin B, Dalrymple U, et al. The 724
effect of malaria control on Plasmodium falciparum in Africa between 2000 and 725
2015. Nature. 2015 Oct 8;526(7572):207–11. 726
3. Delemarre BJ, van der Kaay HJ. Tropical malaria contracted the natural way in the 727
Netherlands. Ned Tijdschr Geneeskd. 1979 Nov 17;123(46):1981–2. 728
4. Hoffman SL, Billingsley PF, James E, Richman A, Loyevsky M, Li T, et al. 729
Development of a metabolically active, non-replicating sporozoite vaccine to 730
prevent Plasmodium falciparum malaria. Hum Vaccin. 2010 Jan;6(1):97–106. 731
5. Epstein JE, Tewari K, Lyke KE, Sim BKL, Billingsley PF, Laurens MB, et al. Live 732
Attenuated Malaria Vaccine Designed to Protect Through Hepatic CD8+ T Cell 733
Immunity. Science. 2011 Oct 28;334(6055):475–80. 734
6. Seder RA, Chang L-J, Enama ME, Zephir KL, Sarwar UN, Gordon IJ, et al. 735
Protection Against Malaria by Intravenous Immunization with a Nonreplicating 736
Sporozoite Vaccine. Science. 2013 Sep 20;341(6152):1359–65. 737
7. Ishizuka AS, Lyke KE, DeZure A, Berry AA, Richie TL, Mendoza FH, et al. 738
Protection against malaria at 1 year and immune correlates following PfSPZ 739
vaccination. Nat Med. 2016 Jun;22(6):614–23. 740
8. Epstein JE, Paolino KM, Richie TL, Sedegah M, Singer A, Ruben AJ, et al. 741
Protection against Plasmodium falciparum malaria by PfSPZ Vaccine. JCI Insight 742
[Internet]. 2017 Jan 12 [cited 2017 Jan 14];2(1). Available from: 743
http://insight.jci.org/articles/view/89154 744
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
32
9. Lyke KE, Ishizuka AS, Berry AA, Chakravarty S, DeZure A, Enama ME, et al. 745
Attenuated PfSPZ Vaccine induces strain-transcending T cells and durable 746
protection against heterologous controlled human malaria infection. Proc Natl Acad 747
Sci. 2017 Mar 7;114(10):2711–6. 748
10. Sissoko MS, Healy SA, Katile A, Omaswa F, Zaidi I, Gabriel EE, et al. Safety and 749
efficacy of PfSPZ Vaccine against Plasmodium falciparum via direct venous 750
inoculation in healthy malaria-exposed adults in Mali: a randomised, double-blind 751
phase 1 trial. Lancet Infect Dis. 2017 May 1;17(5):498–509. 752
11. Mordmüller B, Surat G, Lagler H, Chakravarty S, Ishizuka AS, Lalremruata A, et al. 753
Sterile protection against human malaria by chemoattenuated PfSPZ vaccine. 754
Nature. 2017 Feb 23;542(7642):445–9. 755
12. Vaughan AM, Sack BK, Dankwa D, Minkah N, Nguyen T, Cardamone H, et al. A 756
Plasmodium Parasite with Complete Late Liver Stage Arrest Protects against 757
Preerythrocytic and Erythrocytic Stage Infection in Mice. Infect Immun. 2018;86(5). 758
13. Mendes AM, Machado M, Gonçalves-Rosa N, Reuling IJ, Foquet L, Marques C, et 759
al. A Plasmodium berghei sporozoite-based vaccination platform against human 760
malaria. NPJ Vaccines [Internet]. 2018 [cited 2019 Jun 25];3. Available from: 761
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109154/ 762
14. Teirlinck AC, Roestenberg M, Vegte-Bolmer M van de, Scholzen A, Heinrichs MJL, 763
Siebelink-Stoter R, et al. NF135.C10: A New Plasmodium falciparum Clone for 764
Controlled Human Malaria Infections. J Infect Dis. 2013 Feb 15;207(4):656–60. 765
15. McCall MBB, Wammes LJ, Langenberg MCC, Gemert G-J van, Walk J, Hermsen 766
CC, et al. Infectivity of Plasmodium falciparum sporozoites determines emerging 767
parasitemia in infected volunteers. Sci Transl Med. 2017 Jun 21;9(395):eaag2490. 768
16. Schats R, Bijker EM, van Gemert G-J, Graumans W, van de Vegte-Bolmer M, van 769
Lieshout L, et al. Heterologous Protection against Malaria after Immunization with 770
Plasmodium falciparum Sporozoites. PLoS ONE. 2015 May 1;10(5):e0124243. 771
17. Walk J, Reuling IJ, Behet MC, Meerstein-Kessel L, Graumans W, van Gemert G-J, 772
et al. Modest heterologous protection after Plasmodium falciparum sporozoite 773
immunization: a double-blind randomized controlled clinical trial. BMC Med. 2017 774
Sep 13;15:168. 775
18. Volkman SK, Hartl DL, Wirth DF, Nielsen KM, Choi M, Batalov S, et al. Excess 776
Polymorphisms in Genes for Membrane Proteins in Plasmodium falciparum. 777
Science. 2002 Oct 4;298(5591):216–8. 778
19. Thera MA, Doumbo OK, Coulibaly D, Laurens MB, Ouattara A, Kone AK, et al. A 779
Field Trial to Assess a Blood-Stage Malaria Vaccine. N Engl J Med. 2011 Sep 780
15;365(11):1004–13. 781
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
33
20. Ouattara A, Takala-Harrison S, Thera MA, Coulibaly D, Niangaly A, Saye R, et al. 782
Molecular Basis of Allele-Specific Efficacy of a Blood-Stage Malaria Vaccine: 783
Vaccine Development Implications. J Infect Dis. 2013 Feb;207(3):511–9. 784
21. Neafsey DE, Juraska M, Bedford T, Benkeser D, Valim C, Griggs A, et al. Genetic 785
Diversity and Protective Efficacy of the RTS,S/AS01 Malaria Vaccine. N Engl J 786
Med. 2015 Oct 21;0(0):null. 787
22. Takala SL, Plowe CV. Genetic diversity and malaria vaccine design, testing, and 788
efficacy: Preventing and overcoming “vaccine resistant malaria.” Parasite Immunol. 789
2009 Sep;31(9):560–73. 790
23. Barry AE, Arnott A. Strategies for Designing and Monitoring Malaria Vaccines 791
Targeting Diverse Antigens. Front Immunol [Internet]. 2014 Jul 28;5. Available 792
from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112938/ 793
24. Preston MD, Campino S, Assefa SA, Echeverry DF, Ocholla H, Amambua-Ngwa A, 794
et al. A barcode of organellar genome polymorphisms identifies the geographic 795
origin of Plasmodium falciparum strains. Nat Commun [Internet]. 2014 Jun 13 [cited 796
2015 Jun 22];5. Available from: 797
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4082634/ 798
25. Drakeley CJ, Duraisingh MT, Póvoa M, Conway DJ, Targett GAT, Baker DA. 799
Geographical distribution of a variant epitope of Pfs4845, a Plasmodium falciparum 800
transmission-blocking vaccine candidate. Mol Biochem Parasitol. 1996 Oct 801
30;81(2):253–7. 802
26. Walliker D, Quakyi IA, Wellems TE, McCutchan TF, Szarfman A, London WT, et al. 803
Genetic Analysis of the Human Malaria Parasite Plasmodium falciparum. Science. 804
1987 Jun 26;236(4809):1661–6. 805
27. Delves MJ, Straschil U, Ruecker A, Miguel-Blanco C, Marques S, Dufour AC, et al. 806
Routine in vitro culture of P. falciparum gametocytes to evaluate novel 807
transmission-blocking interventions. Nat Protoc. 2016 Sep;11(9):1668–80. 808
28. Schofield L, Villaquiran J, Ferreira A, Schellekens H, Nussenzweig R, 809
Nussenzweig V. γ Interferon, CD8+ T cells and antibodies required for immunity to 810
malaria sporozoites. Nature. 1987 Dec 23;330(6149):664–6. 811
29. Weiss WR, Sedegah M, Beaudoin RL, Miller LH, Good MF. CD8+ T cells 812
(cytotoxic/suppressors) are required for protection in mice immunized with malaria 813
sporozoites. Proc Natl Acad Sci. 1988 Jan 1;85(2):573–6. 814
30. Chattopadhyay R, Pratt D. Role of controlled human malaria infection (CHMI) in 815
malaria vaccine development: A U.S. food & drug administration (FDA) 816
perspective. Vaccine. 2017 May 15;35(21):2767–9. 817
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
34
31. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, et al. Genome 818
sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002 Oct 819
3;419(6906):498–511. 820
32. Vembar SS, Seetin M, Lambert C, Nattestad M, Schatz MC, Baybayan P, et al. 821
Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum 822
genome through long-read (>11 kb), single molecule, real-time sequencing. DNA 823
Res. 2016 Jun 26;dsw022. 824
33. Otto TD, Böhme U, Sanders M, Reid A, Bruske EI, Duffy CW, et al. Long read 825
assemblies of geographically dispersed Plasmodium falciparum isolates reveal 826
highly structured subtelomeres. Wellcome Open Res [Internet]. 2018 May 3 [cited 827
2018 Jul 16];3. Available from: 828
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5964635/ 829
34. Böhme U, Otto TD, Sanders M, Newbold CI, Berriman M. Progression of the 830
canonical reference malaria parasite genome from 2002–2019. Wellcome Open 831
Res [Internet]. 2019 Mar 29 [cited 2019 Jun 1];4. Available from: 832
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6484455/ 833
35. Yeda R, Ingasia LA, Cheruiyot AC, Okudo C, Chebon LJ, Cheruiyot J, et al. The 834
Genotypic and Phenotypic Stability of Plasmodium falciparum Field Isolates in 835
Continuous In Vitro Culture. PLoS ONE. 2016 Jan 11;11(1):e0143565. 836
36. Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, et al. 837
Whole genome sequencing of Plasmodium falciparum from dried blood spots using 838
selective whole genome amplification. Malar J. 2016;15:597. 839
37. Agrawal S, Moser KA, Morton L, Cummings MP, Parihar A, Dwivedi A, et al. 840
Association of a Novel Mutation in the Plasmodium falciparum Chloroquine 841
Resistance Transporter With Decreased Piperaquine Sensitivity. J Infect Dis. 2017 842
Aug 15;216(4):468–76. 843
38. Trager W, Jensen JB. Human malaria parasites in continuous culture. Science. 844
1976 Aug 20;193(4254):673–5. 845
39. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: 846
scalable and accurate long-read assembly via adaptive k-mer weighting and repeat 847
separation. Genome Res. 2017 Mar 15;gr.215087.116. 848
40. Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, et al. 849
Nonhybrid, finished microbial genome assemblies from long-read SMRT 850
sequencing data. Nat Methods. 2013 Jun;10(6):563–9. 851
41. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: An 852
Integrated Tool for Comprehensive Microbial Variant Detection and Genome 853
Assembly Improvement. PLoS ONE. 2014 Nov 19;9(11):e112963. 854
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
35
42. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. 855
Alignment of whole genomes. Nucleic Acids Res. 1999 Jan 1;27(11):2369–76. 856
43. Nattestad M, Schatz MC. Assemblytics: a web analytics tool for the detection of 857
variants from an assembly. Bioinformatics. 2016 Oct 1;32(19):3021–3. 858
44. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, Haeseler A, et al. 859
Accurate detection of complex structural variations using single-molecule 860
sequencing. Nat Methods. 2018 Apr 30;1. 861
45. Nattestad M, Chin C-S, Schatz MC. Ribbon: Visualizing complex genome 862
alignments and structural variation. bioRxiv. 2016 Oct 20;082123. 863
46. Dara A, Drábek EF, Travassos MA, Moser KA, Delcher AL, Su Q, et al. New var 864
reconstruction algorithm exposes high var sequence diversity in a single 865
geographic location in Mali. Genome Med. 2017;9:30. 866
47. Rask TS, Hansen DA, Theander TG, Pedersen AG, Lavstsen T. Plasmodium 867
falciparum Erythrocyte Membrane Protein 1 Diversity in Seven Genomes – Divide 868
and Conquer. PLOS Comput Biol. 2010 Sep 16;6(9):e1000933. 869
48. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat 870
Methods. 2012 Apr;9(4):357–9. 871
49. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. 872
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell 873
Sequencing. J Comput Biol. 2012 May;19(5):455–77. 874
50. Nielsen M, Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC 875
class I molecules integrating information from multiple receptor and peptide length 876
datasets. Genome Med. 2016 Mar 30;8:33. 877
51. González-Galarza FF, Takeshita LYC, Santos EJM, Kempson F, Maia MHT, Silva 878
ALS da, et al. Allele frequency net 2015 update: new features for HLA epitopes, 879
KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res. 880
2015 Jan 28;43(D1):D784–8. 881
52. de Pablo R, García-Pacheco J m., Vilches C, Moreno M e., Sanz L, Rementería M 882
c., et al. HLA class I and class II allele distribution in the Bubi population from the 883
island of Bioko (Equatorial Guinea). Tissue Antigens. 1997 Dec 1;50(6):593–601. 884
53. Cao K, Moormann A m., Lyke K e., Masaberg C, Sumba O p., Doumbo O k., et al. 885
Differentiation between African populations is evidenced by the diversity of alleles 886
and haplotypes of HLA class I loci. Tissue Antigens. 2004 Apr 1;63(4):293–325. 887
54. Lasonder E, Janse CJ, Gemert G-J van, Mair GR, Vermunt AMW, Douradinha BG, 888
et al. Proteomic Profiling of Plasmodium Sporozoite Maturation Identifies New 889
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
36
Proteins Essential for Parasite Development and Infectivity. PLOS Pathog. 2008 890
Oct 31;4(10):e1000195. 891
55. Lindner SE, Swearingen KE, Harupa A, Vaughan AM, Sinnis P, Moritz RL, et al. 892
Total and Putative Surface Proteomics of Malaria Parasite Salivary Gland 893
Sporozoites. Mol Cell Proteomics. 2013 May 1;12(5):1127–43. 894
56. Felgner P, Hoffman SL, Seder R. Serum antibody assay for determining protection 895
from malaria, and pre-erythrocytic subunit vaccines [Internet]. US20160216276A1, 896
2016 [cited 2019 May 20]. Available from: 897
https://patents.google.com/patent/US20160216276A1/en 898
57. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The 899
Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation 900
DNA sequencing data. Genome Res. 2010 Sep 1;20(9):1297–303. 901
58. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A 902
framework for variation discovery and genotyping using next-generation DNA 903
sequencing data. Nat Genet. 2011 May;43(5):491–8. 904
59. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine 905
A, et al. From FastQ data to high confidence variant calls: the Genome Analysis 906
Toolkit best practices pipeline. Curr Protoc Bioinforma Ed Board Andreas 907
Baxevanis Al. 2013 Oct 15;11(1110):11.10.1-11.10.33. 908
60. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in 909
unrelated individuals. Genome Res. 2009 Sep 1;19(9):1655–64. 910
61. Miotto O, Almagro-Garcia J, Manske M, MacInnis B, Campino S, Rockett KA, et al. 911
Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. 912
Nat Genet. 2013 Jun;45(6):648–55. 913
62. Dwivedi A, Reynes C, Kuehn A, Roche DB, Khim N, Hebrard M, et al. Functional 914
analysis of Plasmodium falciparum subpopulations associated with artemisinin 915
resistance in Cambodia. Malar J. 2017 Dec 19;16:493. 916
63. Carneiro MO, Russ C, Ross MG, Gabriel SB, Nusbaum C, DePristo MA. Pacific 917
biosciences sequencing technology for genotyping and variation discovery in 918
human data. BMC Genomics. 2012 Aug 5;13:375. 919
64. Zanghì G, Vembar SS, Baumgarten S, Ding S, Guizetti J, Bryant JM, et al. A 920
Specific PfEMP1 Is Expressed in P. falciparum Sporozoites and Plays a Role in 921
Hepatocyte Infection. Cell Rep. 2018 Mar 13;22(11):2951–63. 922
65. Mu J, Myers RA, Jiang H, Liu S, Ricklefs S, Waisberg M, et al. Plasmodium 923
falciparum genome-wide scans for positive selection, recombination hot spots and 924
resistance to antimalarial drugs. Nat Genet. 2010 Mar;42(3):268–71. 925
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
37
66. Guerin-Marchand C, Druilhe P, Galey B, Londono A, Patarapotikul J, Beaudoin RL, 926
et al. A liver-stage-specific antigen of Plasmodium falciparum characterized by 927
gene cloning. Nature. 1987 Sep 10;329(6135):164–7. 928
67. Fidock DA, Gras-Masse H, Lepers JP, Brahimi K, Benmohamed L, Mellouk S, et al. 929
Plasmodium falciparum liver stage antigen-1 is well conserved and contains potent 930
B and T cell determinants. J Immunol. 1994 Jul 1;153(1):190–204. 931
68. Claessens A, Affara M, Assefa SA, Kwiatkowski DP, Conway DJ. Culture 932
adaptation of malaria parasites selects for convergent loss-of-function mutants. Sci 933
Rep. 2017 Jan 24;7:41303. 934
69. Gebru T, Lalremruata A, Kremsner PG, Mordmüller B, Held J. Life-span of in vitro 935
differentiated Plasmodium falciparum gametocytes. Malar J. 2017 Aug 11;16:330. 936
70. Kafsack BFC, Rovira-Graells N, Clark TG, Bancells C, Crowley VM, Campino SG, 937
et al. A transcriptional switch underlies commitment to sexual development in 938
malaria parasites. Nature. 2014 Mar 13;507(7491):248–52. 939
71. Iwanaga S, Kaneko I, Kato T, Yuda M. Identification of an AP2-family Protein That 940
Is Critical for Malaria Liver Stage Development. PLOS ONE. 2012 Nov 941
7;7(11):e47557. 942
72. Ikadai H, Saliba KS, Kanzok SM, McLean KJ, Tanaka TQ, Cao J, et al. Transposon 943
mutagenesis identifies genes essential for Plasmodium falciparum 944
gametocytogenesis. Proc Natl Acad Sci. 2013 Apr 30;110(18):E1676–84. 945
73. Miles A, Iqbal Z, Vauterin P, Pearson R, Campino S, Theron M, et al. Indels, 946
structural variation, and recombination drive genomic diversity in Plasmodium 947
falciparum. Genome Res. 2016 Sep 1;26(9):1288–99. 948
74. Doolan DL, Hoffman SL. The Complexity of Protective Immunity Against Liver-949
Stage Malaria. J Immunol. 2000 Aug 1;165(3):1453–62. 950
75. Kisalu NK, Idris AH, Weidle C, Flores-Garcia Y, Flynn BJ, Sack BK, et al. A human 951
monoclonal antibody prevents malaria infection by targeting a new site of 952
vulnerability on the parasite. Nat Med. 2018 Apr;24(4):408–16. 953
76. Tan J, Sack BK, Oyen D, Zenklusen I, Piccoli L, Barbieri S, et al. A public antibody 954
lineage that potently inhibits malaria infection by dual binding to the 955
circumsporozoite protein. Nat Med. 2018 May;24(4):401–7. 956
77. Doolan DL, Saul AJ, Good MF. Geographically restricted heterogeneity of the 957
Plasmodium falciparum circumsporozoite protein: relevance for vaccine 958
development. Infect Immun. 1992 Feb;60(2):675–82. 959
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
38
78. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, et al. 960
Analysis of Plasmodium falciparum diversity in natural infections by deep 961
sequencing. Nature. 2012 Jul 19;487(7407):375–9. 962
79. Yalcindag E, Elguero E, Arnathau C, Durand P, Akiana J, Anderson TJ, et al. 963
Multiple independent introductions of Plasmodium falciparum in South America. 964
Proc Natl Acad Sci. 2012 Jan 10;109(2):511–6. 965
80. Wongsrichanalai C, Pickard AL, Wernsdorfer WH, Meshnick SR. Epidemiology of 966
drug-resistant malaria. Lancet Infect Dis. 2002 Apr 1;2(4):209–18. 967
81. Plowe CV. The evolution of drug-resistant malaria. Trans R Soc Trop Med Hyg. 968
2009 Apr 1;103(Supplement 1):S11–4. 969
82. Parobek CM, Parr JB, Brazeau NF, Lon C, Chaorattanakawee S, Gosi P, et al. 970
Partner-Drug Resistance and Population Substructuring of Artemisinin-Resistant 971
Plasmodium falciparum in Cambodia. Genome Biol Evol. 2017 Jun 1;9(6):1673–86. 972
83. MalariaGEN Plasmodium falciparum Community (last). Genomic epidemiology of 973
artemisinin resistant malaria. eLife. 2016 Mar 4;5:e08714. 974
84. Cerqueira GC, Cheeseman IH, Schaffner SF, Nair S, McDew-White M, Phyo AP, et 975
al. Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites 976
reveals complex genomic architecture of emerging artemisinin resistance. Genome 977
Biol. 2017;18:78. 978
85. Takala-Harrison S, Jacob CG, Arze C, Cummings MP, Silva JC, Dondorp AM, et 979
al. Independent Emergence of Artemisinin Resistance Mutations Among 980
Plasmodium falciparum in Southeast Asia. J Infect Dis. 2015 Mar 1;211(5):670–9. 981
86. Kraemer SM, Smith JD. A family affair: var genes, PfEMP1 binding, and malaria 982
disease. Curr Opin Microbiol. 2006 Aug 1;9(4):374–80. 983
87. Noazin S, Modabber F, Khamesipour A, Smith PG, Moulton LH, Nasseri K, et al. 984
First generation leishmaniasis vaccines: A review of field efficacy trials. Vaccine. 985
2008 Dec 9;26(52):6759–67. 986
88. Armijos RX, Weigel MM, Aviles H, Maldonado R, Racines J. Field Trial of a 987
Vaccine against New World Cutaneous Leishmaniasis in an At-Risk Child 988
Population: Safety, Immunogenicity, and Efficacy during the First 12 Months of 989
Follow-Up. J Infect Dis. 1998 May 1;177(5):1352–7. 990
89. Norling M, Bishop RP, Pelle R, Qi W, Henson S, Drábek EF, et al. The genomes of 991
three stocks comprising the most widely utilized live sporozoite Theileria parva 992
vaccine exhibit very different degrees and patterns of sequence divergence. BMC 993
Genomics. 2015;16:729. 994
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
39
90. Nene V, Kiara H, Lacasta A, Pelle R, Svitek N, Steinaa L. The biology of Theileria 995
parva and control of East Coast fever - Current status and future trends. Ticks Tick-996
Borne Dis. 2016;7(4):549–64. 997
998
999
Declarations 1000
Ethics approval and consent to participate: Not applicable 1001
Consent for publication: Not applicable 1002
Availability of data and material: Raw sequencing data and assemblies generated for 1003
this manuscript have been uploaded to NCBI. See Dataset 1 for BioSample 1004
identification numbers for assemblies and whole genome sequence data for all samples 1005
used in these analyses, including publicly available data. 1006
Competing interests: T.L., B.K.L.S., and S.L.H. are salaried employees of Sanaria 1007
Inc., the developer and owner of PfSPZ Vaccine, and the supplier of the PfSPZ 1008
strains sequenced in this study. In addition, S.L.H. and B.K.L.S. have a financial 1009
interest in Sanaria Inc. All other authors declare no competing financial interests. 1010
Funding: National Institutes of Health (NIH) awards U19 AI110820 and R01 AI141900, 1011
and a “Dean’s Challenge Award”, an intramural program from the University of Maryland 1012
School of Medicine, provided funds to sequence the PfSPZ strains and Pf clinical 1013
isolates, and support for K.A.M., E.F.D., A.D., M.A., A.O., K.L., L.S., L.T., M.T., S.T.H., 1014
C.M.F., C.V.P. and J.C.S.. A.O. and C.V.P. were also supported by the Howard Hughes 1015
Medical Institute. The parasites provided by Sanaria were produced in part by funding 1016
from two SBIR grants from National Institute of Allergy and Infectious Diseases, NIH, 1017
2R44AI058375 and 5R44AI055229-09A1. Collection of clinical isolates that were 1018
sequenced for this study were funded by U19 AI089683, grants from CNPq 1019
(590106/2011-2), grants from the Armed Forces Health Surveillance Center/Global 1020
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
40
Emerging Infections Surveillance and Response System (WR1576 and WR2017), the 1021
NIH International Centers of Excellence in Malaria Research program (U19 AI089681) 1022
(Brazil), and the Howard Hughes Medical Institute. S.K. and A.M.P. were supported by 1023
the Intramural Research Program of the National Human Genome Research Institute, 1024
NIH (1ZIAHG200398). P.T.R. was supported by a scholarship from the Conselho 1025
Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq), which also provided a 1026
senior researcher scholarship to M.U.F. 1027
Authors' contributions: K.A.M. designed the study, performed analyses, and wrote the 1028
paper. J.C.S., C.V.P., and S.L.H. designed the study, contributed samples and funding, 1029
and wrote the paper. E.F.D., A.D., J.C., E.M.S., A.D., S.K., A.M.P, and A.O. contributed 1030
analyses. Z.S., M.A., T.L., L.S., L.T. preformed sample preparation and sequencing. 1031
P.T.R., K.E.L., M.D.S., K.J., C.L., D.L.S., M.U.F., M.N., M.K.L., M.T., R.W.S., S.T.H., 1032
C.M.F, and B.K.L.S contributed samples, funding, and wrote the paper. 1033
Acknowledgements: We would like to thank the study teams that assisted in sample 1034
collections in Mali, Malawi, Cambodia, Thailand, and Myanmar. We would like to 1035
acknowledge Drs. Terrie Taylor and Don Mathanga for their assistance with collection of 1036
clinical isolates in Malawi. 1037
Authors' information (optional) 1038
Additional author information for M. D. S, K. J., C. L., D. L. S (affiliated with the Armed 1039
Forces Research Institute of Medical Sciences): Material has been reviewed by the 1040
Walter Reed Army Institute of Research. There is no objection to its presentation 1041
and/or publication. The opinions or assertions contained herein are the private views of 1042
the author, and are not to be construed as official, or as reflecting true views of the 1043
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
41
Department of the Army or the Department of Defense. The investigators have adhered 1044
to the policies for protection of human subjects as prescribed in AR 70–25. 1045
Tables 1046
Table 1: The PfSPZ strains differ from 3D7 in terms of genome size and base-pair 1047
content. Characteristics of the PacBio assembly for each strain (first four columns), 1048
with the 3D7 reference genome are shown for comparison. Single-nucleotide 1049
polymorphisms (SNPs) and indels in each PfSPZ assembly as compared to 3D7 are 1050
shown for coding and non-coding regions (last five columns). 1051
Strain # of
Nuclear Contigs1
Cumulative Length2
N503
SNPs4 Indels
Non-coding
Syn. Non-Syn.
Non-coding
Coding
3D7 14 23,332,831 1,687,656 - - - - -
NF54 28 23,426,028 1,527,116 1,278 25 80 8,166 230
7G8 20 22,801,099 1,455,033 22,694 5,675 15,490 40,668 6,730
NF166.C8 30 23,306,751 1,610,926 27,117 6,636 17,258 40,457 6,802
NF135.C10 19 23,540,633 1,631,396 28,791 6,535 18,141 41,435 7,029 1Contigs: number of pieces of continuous sequence in the final assembly. 1052 2Cumulative length: total length of the contigs. 1053 3N50: length of the contig which, along with all contigs larger than itself, contain 50% of the assembly 1054 (larger numbers indicate a more complete assembly). 1055 4Syn SNPs: synonymous SNPs; Non-Syn SNPs: non-synonymous SNPs. 1056
1057
1058
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
42
Figures 1059
1060 Figure 1: PacBio Assemblies for each PfSPZ strain reconstruct entire 1061
chromosomes in one-three continuous pieces. To determine the likely position of 1062
each non-reference contig on the 3D7 reference genome, mummer’s show-tiling program 1063
was used with relaxed settings (-g 100000 -v 50 -i 50) to align contigs to 3D7 1064
chromosomes (top). 3D7 nuclear chromosomes (1-14) are shown in grey, arranged from 1065
smallest to largest, along with organelle genomes (M=mitochondrion, A=apicoplast). 1066
Contigs from each PfSPZ assembly (NF54: black, 7G8: green, NF166.C8: orange, 1067
NF135.C10: hot pink) are shown aligned to their best 3D7 match. A small number of 1068
contigs could not be unambiguously mapped to the 3D7 reference genome (unmapped). 1069
1070
1071
1072
1073
1074
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
43
1075
Figure 2: Distribution of polymorphisms in PfSPZ PacBio assemblies. Single 1076
nucleotide polymorphism (SNP) densities (log SNPs/ 10kb) are shown for each 1077
assembly; the scale [0-3] refers to the range of the log-scaled SNP density graphs - 1078
from 100 to 103. Inner tracks, from outside to inside, are NF54 (black), 7G8 (green), 1079
NF166.C8 (orange), and NF135.C10 (hot pink). The outermost tracks are the 3D7 1080
reference genome nuclear chromosomes (chrm1 to chrm 14, in blue), followed by 3D7 1081
genes on the forward and reverse strand (black tick marks). 1082
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
44
Figure 3: Predicted CD8+ T cell epitopes from amino acid sequences of the four PfSPZ strains. Using amino acid sequences from a subset of the 5,548 protein-coding genes in the 3D7 annotation (PlasmoDBv24) that were deemed to be correctly annotated between all four PfSPZ strains, CD8+ T cell epitopes were predicted in silico using netMHCpan. A. Density plot of amino acid sequence length distributions of 4,814 genes used for in silico epitope predictions. Overall distribution of gene lengths was extremely similar between all four strains, with the exception of the three heterologous CHMI strains having a slightly higher number of amino acid sequences that weren’t quite as long as they were in the NF54 genome, possibly a reflection of difficulty in mapping annotations to genetically diverse genomes. B. Shared and unique epitope-gene combinations in the 4,814 genes. C. Shared and unique epitope-gene combinations in a subset of the 4,814 genes (n=1,974) that have evidence of being expressed in the sporozoite stage of the parasite in the mosquito salivary glands.
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
45
Figure 4: Predicted CD8+ T cell epitopes in the P. falciparum circumsporozoite protein (PfCSP). Protein domain
information based on the 3D7 reference sequence of PfCSP is found in the first track, and the following tracks are
epitopes predicted in the PfCSP sequences of NF54, 7G8, NF166.C8, and NF135.C10, respectively. Each box in the
track is a sequence that was identified as an epitope, and the colors of each box represent the HLA type that identified the
epitope.
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
46
Figure 5: Global diversity of clinical isolates and PfSPZ strains. Principal coordinates analyses (PCoA) of clinical isolates (n=654) from malaria endemic regions and PfSPZ strains were conducted using biallelic non-synonymous SNPs across the entire genome (left, n=31,761) and in a panel of 42 pre-erythrocytic genes of interest (right, n=1,060). For the genome-wide dataset, coordinate 1 separated South American and African isolates from Southeast Asian and Papua New Guinean isolates (27.6% of variation explained), coordinate 2 separated African isolates from South American isolates (10.7%), and coordinate 3 separated Southeast Asian isolates from Papua New Guinea (PNG) isolates (3.0%). Similar trends were found for the first two coordinates seen for the pre-erythrocytic gene data set (27.1 and 12.6%, respectively), but coordinate 3 separated isolates from all three regions (3.8%). In both datasets, NF54 (black cross) and NF166.C8 (orange cross) cluster with West African isolates (isolates labeled in red and dark orange colors), 7G8 (bright green cross) cluster with isolates from South America (greens and browns), and NF135.C10 (hot pink cross) clusters with isolates from Southeast Asia (purples and blues).
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint
47
.
Figure 6: NF135.C10 is part of an admixed population of clinical isolates from
Southeast Asia. Top: admixture plots for clinical isolates from Myanmar (n=16),
Thailand (n=34), Cambodia (n=109), Papua New Guinea (PNG, n=34) and NF135.C10
are shown. Each sample is a column, and the height of the different colors in each
column corresponds to the proportion of the genome assigned to each K population by
the model. Bottom: hierarchical clustering of the Southeast Asian isolates used in the
admixture analysis (n=193; colored by their assigned subpopulation) and previously
characterized Cambodian isolates (n=167; black; labels correspond to clusters identified
in reference 61) place NF135.C10 (star) with samples from the previously identified
KHA admixed population (shown in grey dashed box). Each node ends with a sample,
and the y-axis represents distance between clusters.
certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted June 27, 2019. ; https://doi.org/10.1101/684175doi: bioRxiv preprint