Sarah Auburn, Ulrike Böhme, Sascha Steinbiss, Hidayat Trimarsanto, Jessica Hostetler, Mandy Sanders, Qi Gao, Francois Nosten, Chris I Newbold, Matthew Berriman, Ric N Price, Thomas D Otto.
Abstract
Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite's biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented with over 2500 unassembled scaffolds. Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua, Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% of core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres. An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes was identified in PvP01 compared to 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and PvC01 and PvT01 draft assemblies are important new resources to study vivax malaria. PvP01 is maintained at GeneDB and ongoing curation will ensure continual improvements in assembly and annotation quality.
Keywords: Plasmodium; genome; pir; reference; subtelomere; vir; vivax
Year: 2016 PMID: 28008421 PMCID: PMC5172418 DOI: 10.12688/wellcomeopenres.9876.1
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Features of the new P. vivax assemblies against Salvador-I.
| Genome features | PvP01 [a] | PvC01 | PvT01 | Salvador-I [b] |
|---|---|---|---|---|
| Nuclear genome | | | | |
| Assembly size (Mb) | 29.0 | 30.2 | 28.9 | 26.8 |
| Coverage (fold) | 212 | 56 | 89 | 10 |
| G + C content (%) | 39.8 | 39.2 | 39.7 | 42.3 |
| No. scaffolds | 14 | 14 | 14 | 30 |
| No. unassigned scaffolds | 226 | 529 | 359 | 2745 |
| No. genes [c] | 6,642 | 6,690 | 6,464 | 5,433 |
| No. pir genes | 1,212 | 1,061 | 867 | 346 |
| Mitochondrial genome [d] | | | | |
| Assembly size (bp) | 5,989 | - | - | 5,990 |
| G + C content (%) | 30.5 | - | - | 30.5 |
| Apicoplast genome | | | | |
| Assembly size (kb) | 29.6 | 27.6 [e] | 6.6 [f] | 5.1 [g] |
| G + C content (%) | 13.3 | 12.7 | 19.7 | 17.1 |
| No. genes | 30 | 3 | 0 | 0 |
[a] Genome version 1.09.2016
[b] Published reference sequence [14]
[c] Including pseudogenes and partial genes, excluding non-coding RNA genes.
[d] Mitochondrial genome is not present in PvT01 and PvC01
[e] Scaffold PvC01_00_191
[f] Scaffold PvT01_00_162
[g] Partial apicoplast sequence of the Salvador-I reference assembly has been published (scaffolds AAKM01000417, AAKM01000371)
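The comparison with Salvador-I can be made concrete with a little arithmetic. The sketch below takes the unassigned-scaffold and pir gene counts reported above and expresses each new assembly relative to Salvador-I. The function name `fold_vs_reference` and the dictionary layout are illustrative assumptions, not part of the published analysis.

```python
# Counts copied from the table above; names and layout are illustrative.
unassigned_scaffolds = {"PvP01": 226, "PvC01": 529, "PvT01": 359, "Salvador-I": 2745}
pir_genes = {"PvP01": 1212, "PvC01": 1061, "PvT01": 867, "Salvador-I": 346}


def fold_vs_reference(values, reference="Salvador-I"):
    """Return each assembly's value divided by the reference assembly's value."""
    ref = values[reference]
    return {name: v / ref for name, v in values.items() if name != reference}


pir_fold = fold_vs_reference(pir_genes)
scaffold_fold = fold_vs_reference(unassigned_scaffolds)

for name in pir_fold:
    # Invert the scaffold ratio so it reads as "times fewer" than Salvador-I.
    print(
        f"{name}: {pir_fold[name]:.1f}x the pir genes, "
        f"{1 / scaffold_fold[name]:.1f}x fewer unassigned scaffolds"
    )
```

Ratios of this kind summarise the table but say nothing about why the counts differ; the abstract attributes the gain to improved assembly of the subtelomeres.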
Figure 1. Organization of the subtelomeric regions of chromosome 12 of the PvP01 and Salvador-I P. vivax references illustrating the higher assembly quality of PvP01.
The order and orientation of the genes in the 3’ subtelomeric region of chromosome 12 of PvP01 (top) and Salvador-I (bottom) are shown. Exons are shown as coloured boxes, with introns illustrated by linking lines. Gaps in PvP01 are indicated with a forward slash (“/”). The blue box indicates the start of the telomeric heptamer repeats. The shaded (grey) areas mark the start of the conserved core of the chromosome that shares synteny with other Plasmodium species (e.g. P. falciparum). The black box shows the syntenic area of PvP01 and Salvador-I. The last gene in this syntenic area is fragmented in Salvador-I.
Annotation changes in P. vivax P01 from 1st of September 2015 until 27th of September 2016.
| Annotation event type | PvP01 [a] |
|---|---|
| Assigned or updated product | 408 |
| Product updated from “conserved | 107 |
| Updated GO term | 597 |
| Linked to publication | 291 |
| All unique genes with new functional | 608 |
| All unique genes with new structural | 50 |
[a] Genome version 1.09.2016
Number of most abundant genes in the subtelomeres in the genomes of Salvador-I, PvP01, PvT01 and PvC01.
| Description | Sal-I [a] | PvP01 [b] | PvC01 | PvT01 |
|---|---|---|---|---|
| PIR protein [c] | 346 | 1212 | 1061 | 867 |
| tryptophan-rich protein [d] | 34 | 40 | 40 | 40 |
| lysophospholipase [e] | 11 | 10 | 9 | 8 |
| STP1 protein [f] | 9 | 10 | 11 | 3 |
| early transcribed membrane protein (ETRAMP) | 10 | 9 | 9 | 9 |
| Plasmodium exported protein (PHIST), unknown function [g] | 64 | 84 | 22 | 23 |
| reticulocyte binding protein (RBP) | 9 [h] | 9 [h] | 9 | 8 |
| Plasmodium exported proteins of unknown function [i] | 23 | 447 | 266 | 261 |
| n/a | 497 | 1812 | 1427 | 1219 |
Numbers include pseudogenes and partial genes
[a] Published reference sequence [14]
[b] Genome version 1.09.2016
[c] Other names include VIR protein and Pv-fam-c protein
[d] Other names include Pv-fam-a, trag and tryptophan-rich antigen
[e] Other names include PST-A protein
[f] Other names include PvSTP1
[g] Other names include Phist protein (Pf-fam-b) and RAD protein (Pv-fam-e)
[h] Includes RBP2e (PVP01_0700500) that was not present in the Salvador-I assembly. RBP1b (PVP01_0701100) is complete in PvP01. In Salvador-I RBP1b consists of two partial genes (PVX_098582, PVX_125738)
[i] Other names include Pv-fam-d protein and Pv-fam-c protein
Figure 2. Cluster analysis illustrating the relatedness between the PIR proteins in PvP01 versus Salvador-I (a), and the stability of the major clusters in several other P. vivax assemblies (b).
Panel a) presents a network illustrating the relatedness between the 1063 PIR proteins of PvP01 and 341 PIRs of Salvador-I (Sal-I) with length greater than 150 amino acids. The PvP01 PIRs are illustrated by black dots (nodes). The Sal-I PIRs are illustrated by coloured dots, colour-coded according to the subfamily classification of Lopez et al. [23] as follows: purple = A, pink = B, pale green = C, red = D, pale blue = E, orange = G, green = H, blue = I, white = J, yellow = K, and grey = unassigned genes. Two nodes (PIRs) are connected if they have a global similarity of at least 25%. With the exception of a few proteins, the Sal-I PIRs cluster consistently with the classification of Lopez et al. Five new, interconnected clusters comprising previously unassigned Sal-I PIRs are denoted with a white “X”. In Panel b), a heat map summarises the number of PIRs assigned to the 27 major clusters (minimum 15 PIRs in total) in five geographically divergent P. vivax strains: PvP01 (Papua, Indonesia), PvT01 (Thailand), PvC01 (Central China), Sal-I (El Salvador) and Brazil-I (Brazil). With the exception of Sal-I, which displayed fewer genes than the other isolates in several of the major clusters, the isolates demonstrated similar numbers of genes in most clusters.
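The connection rule in panel a) (an edge between two PIRs whose global similarity is at least 25%) can be illustrated with a minimal sketch. The protein names and similarity scores below are invented toy values, and treating connected components as "clusters" is an assumption made for illustration; the caption does not state how similarities were computed or which clustering method was used.

```python
SIMILARITY_THRESHOLD = 25.0  # percent global similarity, from the caption


def build_network(similarities, threshold=SIMILARITY_THRESHOLD):
    """Build an undirected adjacency map; link pairs with similarity >= threshold."""
    graph = {}
    for (a, b), score in similarities.items():
        # Register both proteins so unconnected ones still appear as nodes.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes reachable from one another (assumed stand-in for clusters)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical pairwise similarities (percent) between toy PIR proteins.
toy_similarities = {
    ("pir_1", "pir_2"): 40.0,
    ("pir_2", "pir_3"): 27.5,
    ("pir_3", "pir_4"): 10.0,  # below threshold: no edge
    ("pir_4", "pir_5"): 25.0,  # exactly at threshold: edge kept
}

network = build_network(toy_similarities)
clusters = connected_components(network)
print(sorted(sorted(c) for c in clusters))
```

In this toy input, pir_3 and pir_4 fall below the threshold, so the five proteins split into two groups. A real analysis would start from an all-against-all comparison of the PIR protein sequences.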