Kara A Moser1,2, Elliott F Drábek1, Ankit Dwivedi1, Emily M Stucke3, Jonathan Crabtree1, Antoine Dara3, Zalak Shah3, Matthew Adams3, Tao Li4, Priscila T Rodrigues5, Sergey Koren6, Adam M Phillippy6, James B Munro1, Amed Ouattara3, Benjamin C Sparklin1, Julie C Dunning Hotopp1, Kirsten E Lyke3, Lisa Sadzewicz1, Luke J Tallon1, Michele D Spring7, Krisada Jongsakul7, Chanthap Lon7, David L Saunders7,8, Marcelo U Ferreira5, Myaing M Nyunt3,9, Miriam K Laufer3, Mark A Travassos3, Robert W Sauerwein10, Shannon Takala-Harrison3, Claire M Fraser1, B Kim Lee Sim4, Stephen L Hoffman4, Christopher V Plowe3,9, Joana C Silva11,12.
Abstract
BACKGROUND: Plasmodium falciparum (Pf) whole-organism sporozoite vaccines have been shown to provide significant protection against controlled human malaria infection (CHMI) in clinical trials. Initial CHMI studies showed significantly higher durable protection against homologous than heterologous strains, suggesting the presence of strain-specific vaccine-induced protection. However, interpretation of these results and understanding of their relevance to vaccine efficacy have been hampered by limited knowledge of the genetic differences between vaccine and CHMI strains, and of how these strains are related to parasites in malaria-endemic regions.
Keywords: Genome assembly; Malaria; P. falciparum; PfSPZ vaccine; Whole-sporozoite vaccine
Year: 2020 PMID: 31915075 PMCID: PMC6950926 DOI: 10.1186/s13073-019-0708-9
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 PacBio assemblies for each PfSPZ strain reconstruct entire chromosomes in one to three continuous pieces. To determine the likely position of each non-reference contig on the 3D7 reference genome, MUMmer’s show-tiling program was used with relaxed settings (-g 100000 -v 50 -i 50) to align contigs to 3D7 chromosomes (top). 3D7 nuclear chromosomes [1–14] are shown in gray, arranged from smallest to largest, along with organelle genomes (M = mitochondrion, A = apicoplast). Contigs from each PfSPZ assembly (NF54: black, 7G8: green, NF166.C8: orange, NF135.C10: hot pink) are shown aligned to their best 3D7 match. A small number of contigs could not be unambiguously mapped to the 3D7 reference genome (unmapped)
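The relaxed show-tiling settings quoted in the Fig. 1 caption can be written out as a command line. The following is a minimal sketch, not the authors' pipeline: the helper name `show_tiling_command` and the delta file name are hypothetical, and it assumes the contigs were first aligned to 3D7 with a MUMmer aligner that writes a `.delta` file. The function only builds the argument list; running it requires MUMmer to be installed.

```python
def show_tiling_command(delta_file, max_gap=100_000, min_coverage=50, min_identity=50):
    """Build a show-tiling argument list with the settings from the caption.

    -g: maximum gap between clustered alignments
    -v: minimum contig coverage (percent)
    -i: minimum percent identity
    """
    return [
        "show-tiling",
        "-g", str(max_gap),
        "-v", str(min_coverage),
        "-i", str(min_identity),
        delta_file,  # hypothetical path to the alignment delta file
    ]


# Hypothetical file name, for illustration only.
command = show_tiling_command("pfspz_contigs_vs_3D7.delta")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where `show-tiling` is on the PATH.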
The PfSPZ strains differ from 3D7 in genome size and sequence. Characteristics of the PacBio assembly for each strain (first four columns), with the Pf 3D7 reference genome shown for comparison (italics). Single nucleotide polymorphisms (SNPs) and indels in each PfSPZ assembly as compared to 3D7, either genome-wide (All) or restricted to the core genome (Core)
| Strain | # of nuclear contigs¹ | Cumulative length² (bp) | N50³ (bp) | SNPs (All) | SNPs (Core⁴) | Indels (All) | Indels (Core⁴) |
|---|---|---|---|---|---|---|---|
| NF54 | 28 | 23,404,633 | 1,527,116 | 1383 | 816 | 8396 | 7848 |
| 7G8 | 20 | 22,807,193 | 1,455,033 | 43,859 | 23,980 | 47,398 | 44,059 |
| NF166.C8 | 30 | 23,281,895 | 1,610,926 | 51,011 | 25,925 | 47,259 | 43,335 |
| NF135.C10 | 21 | 23,508,985 | 1,631,396 | 53,467 | 24,966 | 48,464 | 43,969 |
¹ Contigs: number of pieces of continuous sequence in the final assembly
² Cumulative length: total length of the contigs
³ N50: length of the contig which, along with all contigs larger than itself, contains 50% of the assembly (larger numbers indicate a more complete assembly)
⁴ Core genome as defined in [32]
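The N50 definition in the footnote above can be made concrete with a short calculation. This is a minimal sketch using made-up contig lengths, not the lengths of the PfSPZ assemblies; the function name `n50` is chosen here for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from largest to smallest until at least half of the
    # assembly is covered; the contig that crosses that mark is the N50.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (total 30): the two largest contigs cover 18 of 30 bases.
print(n50([10, 8, 6, 4, 2]))  # prints 8
```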
Fig. 2 Distribution of polymorphisms in PfSPZ PacBio assemblies. Single nucleotide polymorphism (SNP) densities (log SNPs/10 kb) are shown for each assembly; the scale [0–3] refers to the range of the log-scaled SNP density graphs, from 10^0 to 10^3. Inner tracks, from outside to inside, are NF54 (black), 7G8 (green), NF166.C8 (orange), and NF135.C10 (pink). The outermost tracks are the 3D7 reference genome nuclear chromosomes (chrm1 to chrm14, in blue), followed by 3D7 genes on the forward and reverse strand (black tick marks). Peaks in SNP densities mostly correlate with subtelomeric regions and internal multi-gene family clusters
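A log-scaled SNP density track of the kind described in Fig. 2 can be sketched as follows. This assumes 1-based SNP positions on a single chromosome and non-overlapping 10 kb windows; how the authors treated windows with no SNPs is not stated, so returning `None` for them is an assumption made here, as are the function name and the toy positions.

```python
import math
from collections import Counter


def log_snp_density(positions, chrom_length, window=10_000):
    """Return log10(SNP count) per window, or None for windows with no SNPs."""
    n_windows = math.ceil(chrom_length / window)
    # Positions are 1-based, so position 10,000 falls in the first window.
    counts = Counter((pos - 1) // window for pos in positions)
    return [
        math.log10(counts[i]) if counts[i] else None
        for i in range(n_windows)
    ]


# Toy example: three SNPs in window 0, none in window 1, one in window 2.
density = log_snp_density([1, 5_000, 10_000, 25_000], chrom_length=30_000)
print(density)
```

On this scale a value of 0 corresponds to 1 SNP per 10 kb and a value of 3 to 1,000 SNPs per 10 kb, matching the [0–3] range in the caption.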
Fig. 3 Comparison of predicted CD8+ T cell epitopes from pre-erythrocytic antigen amino acid sequences. CD8+ T cell epitopes were predicted in silico for 42 confirmed or suspected pre-erythrocytic antigens (see Additional file 2: Table S7 for a complete list of genes included in this analysis). The plot shows the number of shared or unique epitopes, as compared between different PfSPZ strain groupings. The height of each bar is the number of epitopes that fell into that intersection category, and the horizontal tracks below the bars show the PfSPZ strains that are included in that intersection. For example, the first bar represents the number of epitopes shared between NF54, 7G8, and NF135.C10. At the bottom left, colored tracks represent the total number of epitopes predicted across all genes (> 10,000 for each strain). As the vast majority of predicted epitopes were shared among all four strains, that group was removed from the bar plot to achieve better visual definition for the other comparisons
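The intersection categories behind a plot like Fig. 3 can be computed by grouping each epitope by the exact set of strains in which it was predicted. The sketch below assumes exclusive intersections (each epitope is counted in one category only), which the caption does not state explicitly; the single-letter epitope labels are placeholders, not real predictions.

```python
from collections import Counter


def exclusive_intersections(epitopes_by_strain):
    """Count epitopes by the exact set of strains in which they occur."""
    membership = {}
    for strain, epitopes in epitopes_by_strain.items():
        for epitope in epitopes:
            membership.setdefault(epitope, set()).add(strain)
    return Counter(frozenset(strains) for strains in membership.values())


# Placeholder epitope labels, for illustration only.
toy = {
    "NF54": {"A", "B", "C"},
    "7G8": {"A", "B", "D"},
    "NF166.C8": {"A", "C"},
    "NF135.C10": {"A", "B"},
}
counts = exclusive_intersections(toy)

# Drop the all-strain category, as the caption describes for the bar plot.
all_strains = frozenset(toy)
bars = {group: n for group, n in counts.items() if group != all_strains}
```

Here epitope "B" lands in the NF54, 7G8, and NF135.C10 category, the same grouping the caption uses as its example.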
Fig. 4 Predicted CD8+ T cell epitopes in the P. falciparum circumsporozoite protein (PfCSP). Protein domain information based on the 3D7 reference sequence of PfCSP is shown in the first track. The second track shows previously experimentally validated (Exp. Val.) epitopes (from [59], after removing duplicate epitope sequences and epitopes > 20 amino acids in length), and the following tracks show epitopes predicted in the PfCSP sequences of NF54, 7G8, NF166.C8, and NF135.C10, respectively. Each box is a sequence that was identified as an epitope, and colors represent the HLA type that identified the epitope. The experimentally validated epitopes do not have HLA types reflected and are simply jittered across two rows
Fig. 5 Global diversity of clinical isolates and PfSPZ strains. Principal coordinate analyses (PCoA) of clinical isolates (n = 654) from malaria-endemic regions and PfSPZ strains were conducted using biallelic non-synonymous SNPs across the entire genome (left, n = 31,761) and in a panel of 42 pre-erythrocytic genes of interest (right, n = 1,060). For the genome-wide dataset, coordinate 1 separated South American and African isolates from Southeast Asian and Papua New Guinean isolates (27.6% of variation explained), coordinate 2 separated African isolates from South American isolates (10.7%), and coordinate 3 separated Southeast Asian isolates from Papua New Guinea (PNG) isolates (3.0%). The first two coordinates showed similar trends for the pre-erythrocytic gene dataset (27.1 and 12.6%, respectively), but coordinate 3 separated isolates from all three regions (3.8%). In both datasets, NF54 (black cross) and NF166.C8 (orange cross) cluster with West African isolates (isolates labeled in red and dark orange colors), 7G8 (bright green cross) clusters with isolates from South America (greens and browns), and NF135.C10 (pink cross) clusters with isolates from Southeast Asia (purples and blues)
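The "variation explained" percentages in Fig. 5 come from the eigenvalues of a principal coordinate analysis. The following is a minimal sketch of classical PCoA with NumPy, assuming a symmetric pairwise distance matrix between samples. The distance measure and software the authors used are not given in this record, and clipping negative eigenvalues to zero is a simplification chosen here.

```python
import numpy as np


def pcoa(distance_matrix, n_axes=3):
    """Classical PCoA: return sample coordinates and fraction of variation per axis."""
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumption: treat small or negative eigenvalues as zero.
    positive = np.clip(eigvals, 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive / positive.sum()
    return coords, explained[:n_axes]


# Toy example: three samples lying on a line at positions 0, 1, and 3.
toy_distances = np.array([[0, 1, 3], [1, 0, 2], [3, 2, 0]], dtype=float)
coords, explained = pcoa(toy_distances, n_axes=2)
```

Because the toy samples lie on a single line, the first coordinate accounts for essentially all of the variation and reproduces the input distances.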
Fig. 6 NF135.C10 is part of an admixed population of clinical isolates from Southeast Asia. Top: admixture plots for clinical isolates from Myanmar (n = 16), Thailand (n = 34), Cambodia (n = 109), Papua New Guinea (PNG, n = 34), and NF135.C10 (represented by a star) are shown. Each sample is a column, and the height of the different colors in each column corresponds to the proportion of the genome assigned to each K population by the model. Bottom: hierarchical clustering of the Southeast Asian isolates used in the admixture analysis (branches and leaves colored by their assigned subpopulation) and previously characterized Cambodian isolates (n = 167, black; [64]) places NF135.C10 (star) with samples from the previously identified KHA admixed population (shown in gray dashed box). The y-axis represents distance between clusters