A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes.

Sarah Auburn1, Ulrike Böhme2, Sascha Steinbiss2, Hidayat Trimarsanto3, Jessica Hostetler2,4, Mandy Sanders2, Qi Gao5, Francois Nosten6,7, Chris I Newbold2,8, Matthew Berriman2, Ric N Price1,7, Thomas D Otto2.   

Abstract

Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite's biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented with over 2500 unassembled scaffolds. Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% of core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres. An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes was identified in PvP01 compared to 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and PvC01 and PvT01 draft assemblies are important new resources to study vivax malaria. PvP01 is maintained at GeneDB and ongoing curation will ensure continual improvements in assembly and annotation quality.

Keywords:  Plasmodium; genome; pir; reference; subtelomere; vir; vivax

Year:  2016        PMID: 28008421      PMCID: PMC5172418          DOI: 10.12688/wellcomeopenres.9876.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Introduction

Infection with Plasmodium vivax is associated with significant direct and indirect morbidity that impacts on the poorest communities of malarious countries, with an estimated annual global cost of $1-2.7 billion [1–3]. Accumulating reports of drug-resistant infection and life-threatening disease underscore the urgency to reduce the burden of P. vivax and ensure its ultimate elimination [4–8]. Efforts to contain P. vivax are constrained by a limited understanding of the parasite’s basic biology, in part owing to the inability to maintain this species in continuous ex vivo culture. Genetic studies provide an alternative approach to gain novel insights into the parasite from which epidemiological tools and therapeutic approaches can be developed for clinical application [9–17].

The rapidly declining costs of massively parallel sequencing technologies have made it feasible to undertake whole genome sequencing of hundreds of Plasmodium isolates, with recent population genomic studies of P. vivax revealing novel antimalarial drug resistance and vaccine candidates amongst other biological features of the parasite [16, 17]. However, in order to achieve a comprehensive understanding of the structure and composition of the P. vivax genome, and to improve read mapping efforts to characterise genetic polymorphisms, a high quality reference genome(s) representative of naturally occurring patient isolates is essential. The sequences of 5 monkey-adapted strains including the Salvador-I reference [14] and drafts of Brazil-I, India-VII, North Korea and Mauritania-I [13] have provided important resources for the vivax research community to investigate the core genome of P. vivax. However, over 60% of the genes in the published Salvador-I reference [14] (prior to curation by the authors) had unknown function, limiting insight into underlying biological mechanisms. Furthermore, assembly of the subtelomeric regions is highly fragmented in these strains, with Salvador-I comprising >2500 scaffolds. A subsequent draft assembly of a Cambodian patient isolate (C127) revealed 792 genes not present in Salvador-I, including 366 new pir (Plasmodium interspersed repeat) genes [11].

The pir genes are a highly variable multigene family present in all Plasmodium genomes investigated to date [18]. The function of pir-encoded proteins (PIRs) remains poorly understood, although recent studies suggest roles in mechanisms associated with virulence. In vitro studies of P. vivax have demonstrated PIR-mediated cytoadherence to endothelial cells [19, 20] and a P. chabaudi mouse malaria model demonstrated red blood cell-binding properties consistent with roles in invasion and/or rosette formation [21]. A further P. chabaudi study demonstrated that changes in the expression of the pir gene repertoire following mosquito passage may attenuate virulence [22]. The sequence diversity amongst the pir genes in P. vivax suggests that different subfamilies may have different functions [14]. The published Salvador-I reference sequence revealed 346 pir genes, including 80 fragments and/or pseudogenes, 10 subfamilies and 84 unassigned genes [14]. In the most recent computational classification, Lopez et al. re-classified the Salvador-I pir genes, excluding members of 3 major subfamilies (A, D and H) but including previously unassigned genes, and re-defining 39 genes as encoding PIRs rather than hypothetical proteins [23]. However, given the limited number of PIRs in Salvador-I, further characterisation is required using a reference(s) with a more complete set of genes.

To address the need of the vivax research community for a P. vivax reference with more comprehensive assembly and annotation, we used Illumina genomic data to establish a reference from a Papua Indonesian patient isolate (PvP01). Since P. vivax exhibits marked regional variation in phenotypes such as duration of the dormant liver-stage, drug resistance and disease severity, we compared PvP01 to C127 and the 5 monkey-adapted strains, and generated draft assemblies of patient isolates from Thailand (PvT01) and central China (PvC01). Our sampling focuses on the Asia-Pacific region, where a large burden of P. vivax infection lies [24]. The Indonesian reference provides representation of the island of Papua, the epicentre of multidrug resistance emergence in P. vivax [8]. The draft references from Thailand and Central China provide respective representation of the Mekong region, and the temperate north where long latency phenotypes prevail [25].

Methods

Samples

Three P. vivax field isolates that were judged to be clonal infections following preliminary genomic analysis within the framework of a separate study [17] were selected for assembly. The isolates were sourced from a patient presenting at hospital in northern Australia in December 2012 with a recent travel history to Mimika Regency, Papua Indonesia (strain PvP01), and patients presenting with symptomatic infection to local clinics in Nan Province, Thailand in May 2011 (strain PvT01) and Anhui Province, China, in September 2010 (strain PvC01). Patient blood samples were leukodepleted [26], and DNA extracted using the QIAamp blood midi kit (Qiagen). All samples were collected with written informed consent from the patients within the framework of previous studies.

Ethical approval

Ethical approval was provided by the Human Research Ethics Committee of NT Department of Health and Families and Menzies School of Health Research, Darwin, Australia (HREC-09/83), the Mahidol University Faculty of Medical Technology Ethics Committee, Bangkok, Thailand (MUTM 2011-043-03), and the Institutional Review Board of Jiangsu Institute of Parasitic Diseases, Wuxi, China (IRB00004221).

Sequencing, assembly and annotation

Library preparation and sequencing were performed at the Wellcome Trust Sanger Institute. Genomic DNA was sheared into 300–500 base pair (bp) fragments using ultrasonication (Covaris). Amplification-free Illumina libraries were prepared [27] and 75 bp, 100 bp and 250 bp paired end reads were generated on the Illumina GAII, Hi-Seq 2000 v3 and MiSeq platforms respectively, following the manufacturer’s standard cluster generation and sequencing protocols [28]. Mate-pair libraries with 2–3 kilobase (kb) inserts were additionally prepared for PvP01 and PvT01, using the Illumina mate-pair library preparation kit (v2), and sequenced on the Illumina HiSeq 2500 platform. Prior to assembly, contaminating host-derived sequences were excluded by mapping against the human reference genome (GRCh37: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/) using BWA [29] (version 0.7.4). Assemblies were prepared using Velvet (version 1.2.07, parameters: -exp_cov auto -ins_length 450 -ins_length_sd 30 -cov_cutoff 8, with a k-mer of 71) and MaSuRCA [30, 31] (version 2.0.3.1, default parameters). Post-assembly genome improvements were undertaken using a range of automated configuration tools including ABACAS (version 2), IMAGE (version 2, iterating k-mers from 71 down to 31, 7 iterations), Gapfiller (version 1–11, 14 iterations, parameter n=31) and iCORN (version 2, 7 iterations). PAGIT (version 1) and REAPR (version 1.0.17) were employed to detect assembly errors [32–38]. This was followed by visual inspection using ACT [39] to identify any further assembly anomalies. Annotation was undertaken initially using the automated algorithms, RATT (version 1) and Augustus (version 2.7, trained on 500 manually curated gene models) [38, 40, 41] and further improved by detailed manual inspection performed by an experienced genome curator. PvT01 and PvC01 were annotated using Companion, a new automated annotation tool [42]. RNA-Seq data from asexual blood stage preparations of 4 P. vivax patient isolates from Cambodia (unpublished report, Jessica Hostetler, Lia Chappell, Chanaki Amaratunga, Seila Suon, Thomas D. Otto, Rick Fairhurst and Julian C. Rayner; Accession number ERP017542) were used as supporting evidence to aid the improvement of gene models in PvP01 by manual curation. For comparative analyses, genome assemblies and gene annotations were sourced for 6 additional P. vivax strains: Salvador-I, C127, Brazil-I, India-VII, Mauritania-I and North Korea [9, 13, 14]. The published version of Salvador-I [14] presented in PlasmoDB release 9 was selected for comparison of gene annotations as the additional improvements in release 10 reflected curations performed by the authors. Companion was also used to update the annotation of four previously published genomes (Brazil-I, India-VII, Mauritania-I and North Korea).

OrthoMCL and pir analysis

Comparisons of predicted protein-coding genes between the 9 P. vivax assemblies and P. falciparum 3D7 (Pf3D7) (geneDB.org) were undertaken using OrthoMCL version 1.4 [43] with the default parameter settings. We defined core genes as those with 1:1 orthologues between P. vivax P01 and Pf3D7, 4465 in total. Cluster analysis based on structural and sequence homology was undertaken to compare the subfamily organization of the pirs in the partial (Salvador-I) versus more complete (PvP01) reference. All PIR protein sequences in Salvador-I and PvP01 with length greater than 150 amino acids and not flagged as pseudogenes were included in the analysis. Low complexity regions were excluded using the SEG program [44]. The relatedness between sequences was assessed using BLASTp (parameters -F F -e 1e-6), and the results were visualized as a network constructed in Gephi [45]. After provisional assessment of cluster resolution at different thresholds, a cut-off of 25% of the global similarity was selected for distinguishing different clusters (subfamilies). To aid comparison against the new PIRs identified in PvP01, the Salvador-I PIRs were colour-coded according to the subfamily classification proposed by Lopez et al. [23]. Further investigation of the diversity and relatedness amongst the PIRs was undertaken using the PIR sets from PvP01, PvT01, PvC01, Salvador-I and Brazil-I. Exclusion of proteins with fewer than 150 amino acids, filtering of low complexity sequences and relatedness analysis using BLASTp were performed as described above. A network was constructed from the BLAST output using tribeMCL with an inflation of 1.5 [46]. To aid visualization, clusters with fewer than 15 PIRs were excluded.
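The thresholding step described above (length filter, 25% similarity cut-off, removal of small clusters) can be illustrated with a minimal Python sketch. This is not the authors' Gephi or tribeMCL workflow: it simply treats clusters as connected components of the thresholded network. The function name `cluster_by_similarity`, the input layout and the assumption that a pairwise "global similarity" has already been computed as a fraction between 0 and 1 are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cluster_by_similarity(lengths, hits, min_len=150, min_sim=0.25, min_cluster=1):
    """Group proteins into connected components of a similarity network.

    lengths: dict mapping protein id -> length in amino acids
    hits: iterable of (query_id, subject_id, similarity), similarity in [0, 1]
          (assumed to be precomputed, e.g. from pairwise BLASTp results)
    """
    # Keep only proteins longer than the length cut-off.
    kept = {pid for pid, n in lengths.items() if n > min_len}

    # Union-find structure: every protein starts in its own cluster.
    parent = {pid: pid for pid in kept}

    def find(pid):
        # Follow parent links to the cluster root, compressing the path.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    # Connect two proteins when their similarity reaches the threshold.
    for query, subject, sim in hits:
        if query in kept and subject in kept and query != subject and sim >= min_sim:
            parent[find(query)] = find(subject)

    clusters = defaultdict(set)
    for pid in kept:
        clusters[find(pid)].add(pid)

    # Drop small clusters and report the largest first.
    big_enough = [c for c in clusters.values() if len(c) >= min_cluster]
    return sorted(big_enough, key=len, reverse=True)


if __name__ == "__main__":
    toy_lengths = {"a": 200, "b": 300, "c": 180, "d": 100, "e": 400}
    toy_hits = [("a", "b", 0.4), ("b", "c", 0.3), ("c", "e", 0.1), ("a", "d", 0.9)]
    print(cluster_by_similarity(toy_lengths, toy_hits))
```

Setting `min_cluster=15` mimics the exclusion of clusters with fewer than 15 members. Connected components merge any two groups joined by a single edge, so a graph-clustering method such as MCL would generally give finer clusters than this sketch.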

Dataset validation

The PvP01 assembly was generated as a new reference sequence and is thus a higher quality, more accurately annotated assembly than PvC01 and PvT01, which were both created as draft assemblies for comparative purposes. The PvP01 assembly quality is greatly improved over the previous Salvador-I reference genome, with fragmentation reduced to <250 scaffolds amongst other features (Table 1). At 29 megabases (Mb), the assembly is notably larger than Salvador-I (27 Mb), mainly due to newly assembled subtelomeric sequences. A complete mitochondrial sequence (5 kb) and partial apicoplast sequence (29.6 kb) are also available. As in P. falciparum [47], the apicoplast reference will facilitate efforts to identify geographic surveillance markers for P. vivax.
Table 1.

Features of the new P. vivax assemblies against Salvador-I.

Genome features | PvP01 [a] | PvC01 | PvT01 | Salvador-I [b]
Nuclear genome
Assembly size (Mb) | 29.0 | 30.2 | 28.9 | 26.8
Coverage (fold) | 212 | 56 | 89 | 10
G + C content (%) | 39.8 | 39.2 | 39.7 | 42.3
No. scaffolds assigned to chrom. | 14 | 14 | 14 | 30
No. unassigned scaffolds | 226 | 529 | 359 | 2745
No. genes [c] | 6,642 | 6,690 | 6,464 | 5,433
No. pir genes | 1,212 | 1,061 | 867 | 346
Mitochondrial genome [d]
Assembly size (bp) | 5,989 | - | - | 5,990
G + C content (%) | 30.5 | - | - | 30.5
Apicoplast genome
Assembly size (kb) | 29.6 | 27.6 [e] | 6.6 [f] | 5.1 [g]
G + C content (%) | 13.3 | 12.7 | 19.7 | 17.1
No. genes30300

a Genome version 1.09.2016

b Published reference sequence [14]

c Including pseudogenes and partial genes, excluding non-coding RNA genes.

d Mitochondrial genome is not present in PvT01 and PvC01

e scaffold PvC01_00_191

f scaffold PvT01_00_162

g Partial apicoplast sequence of Salvador-I reference assembly has been published (scaffolds AAKM01000417, AAKM01000371)
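Summary figures of the kind reported in Table 1 (assembly size, number of sequences, G + C content) can be recomputed from a FASTA file. The following is a minimal Python sketch, not the authors' pipeline. The function name `assembly_stats` is hypothetical, and computing G + C content over called A/C/G/T bases only (ignoring N and other ambiguity codes) is an assumption, since the article does not state how gaps were handled.

```python
def assembly_stats(fasta_text):
    """Return sequence count, total length and G + C percentage for FASTA text."""
    n_sequences = 0
    size_bp = 0
    gc = 0
    at = 0
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Each header line starts a new scaffold or chromosome.
            n_sequences += 1
            continue
        seq = line.upper()
        size_bp += len(seq)  # total length includes N and other symbols
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    called = gc + at
    gc_percent = 100.0 * gc / called if called else 0.0
    return {"n_sequences": n_sequences, "size_bp": size_bp, "gc_percent": gc_percent}


if __name__ == "__main__":
    example = ">scaffold_1\nACGT\nNNGG\n>scaffold_2\nAATT\n"
    print(assembly_stats(example))
```

For a real assembly the file would be read in chunks rather than held in memory as one string; the logic is unchanged.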

Whilst the assembly quality in the core region is high in Salvador-I [14], PvP01 displays improved gene models and has more complete subtelomeres. Figure 1 provides a schematic of the right-hand end of chromosome 12 from PvP01 and Salvador-I, illustrating the generally greater extension into the subtelomeric regions of chromosomes in PvP01. Furthermore, owing to detailed manual curation and continuous maintenance within the GeneDB framework, the level of gene annotation in the core genome of PvP01 greatly exceeds that of the other available P. vivax assemblies. The asexual stage P. vivax RNA-Seq data enabled correction of the structure of 377 genes. Of the 4577 core P. vivax genes with 1:1 orthologues in P. falciparum, 3318 genes were transcribed with RPKM (reads per kilobase of transcript per million mapped reads) values greater than 15, and contained a total of 4887 splice sites. Of these splice sites, a total of 4845 (99.1%) were confirmed by ≥10 reads, highlighting the high quality of the structural annotation. Whereas the published Salvador-I reference includes functions attributed to a total of 1783 (38.0%) core genes [14], we have been able to expand this to 2848 (58.6%) in PvP01, as of the latest GeneDB release (1st September 2016). Ongoing curation on PvP01 will yield further improvements to the annotation statistics, and progress is highlighted in Table 2, which summarizes annotation changes over a 12 month period between GeneDB releases in 2015 and 2016.
To date, a total of 1209 genes have been identified in PvP01 that were either completely absent from Salvador-I or have arisen by splitting gene structures that were falsely joined previously (Table 1). Although the majority of newly identified genes belong to subtelomeric gene families, we confirmed the recently identified EBP2 (erythrocyte binding protein 2, PVP01_0102300) and RBP2e (reticulocyte binding protein 2e, PVP01_0700500) genes [11]. These genes are members of families encoding proteins implicated in host cell recognition during red blood cell (RBC) invasion, and present potential vaccine targets [48–51].
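The RPKM measure used as the expression threshold in this section follows directly from its definition (reads per kilobase of transcript per million mapped reads). The Python sketch below shows that arithmetic together with the splice-site confirmation percentage quoted in the text. The function names and the example read counts are hypothetical; only the 4845 and 4887 splice-site figures come from the article.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and mapped-read total must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Hypothetical gene: 300 reads on a 2 kb transcript in a 10 million read library.
    print(rpkm(300, 2000, 10_000_000))
    # Splice sites confirmed by at least 10 reads, figures as quoted in the text.
    print(round(percent(4845, 4887), 1))
```

With these illustrative inputs the gene sits exactly at an RPKM of 15, so it would not pass a strict "greater than 15" filter.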
Figure 1.

Organization of the subtelomeric regions of chromosome 12 of the PvP01 and Salvador-I P. vivax references illustrating the higher assembly quality of PvP01.

The order and orientation of the genes in the 3’ subtelomeric region of chromosome 12 of PvP01 (top) and Salvador-I (bottom) are shown. Exons are shown in coloured boxes, with introns illustrated by linking lines. Gaps in PvP01 are indicated with a forward slash (“/”). The blue box indicates the start of the telomeric heptamer repeats. The shaded (grey) areas mark the start of the conserved core of the chromosome that shares synteny with other Plasmodium species (e.g. P. falciparum). The black box shows the syntenic area of PvP01 and Salvador-I. The last gene in this syntenic area is fragmented in Salvador-I.

Table 2.

Annotation changes in P. vivax P01 from 1st of September 2015 until 27th of September 2016.

Annotation event type | PvP01 [a]
Assigned or updated product | 408
Product updated from “conserved Plasmodium protein, unknown function” | 107
Updated GO term | 597
Linked to publication | 291
All unique genes with new functional annotations, e.g. EC number, gene name | 608
All unique genes with new structural annotations | 50

a Genome version 1.09.2016

As summarised in Table 3, the comparatively high assembly quality in the subtelomeres of PvP01 greatly expanded the repertoire of genes belonging to multigene families in these chromosome regions. Notably, more than 1200 pir genes were identified in PvP01 versus 346 in Salvador-I. To generate a snapshot of the diversity and structural organization of this expanded gene family in P. vivax, we conducted cluster analysis of the PIRs in PvP01 with comparison to previous homology classifications performed by Lopez et al. on the partial set of PIRs from Salvador-I [23]. As illustrated in the network diagram in Figure 2a, the main subfamily clusters defined in earlier classifications are expanded but, on addition of the new PvP01 PIRs, the clusters remained moderately stable with no pooling between or sub-structure within subfamilies. However, the new PvP01 PIRs reveal several large subfamilies containing just 1–4 Salvador-I genes that were previously unclassified (Figure 2a). Additional investigation with the PvC01, PvT01 and Brazil-I assemblies using tribeMCL (also used in Lopez et al.) confirmed the stability of the new subfamilies identified in PvP01 across a geographically divergent collection of isolates (Figure 2b). The analysis conducted here provides a broad overview of the diversity and relatedness amongst the expanded P. vivax pir gene sets; however, further investigation beyond the scope of this study will be required to provide detailed characterisation of this family and its contribution to virulence and pathophysiology.
Table 3.

Number of most abundant genes in the subtelomeres in the genomes of Salvador-I, PvP01, PvT01 and PvC01.

Description | Sal-I [a] | PvP01 [b] | PvC01 | PvT01
Multigene family
PIR protein [c] | 346 | 1212 | 1061 | 867
tryptophan-rich protein [d] | 34 | 40 | 40 | 40
lysophospholipase [e] | 11 | 10 | 9 | 8
STP1 protein [f] | 9 | 10 | 11 | 3
early transcribed membrane protein (ETRAMP) | 10 | 9 | 9 | 9
Plasmodium exported protein (PHIST), unknown function [g] | 64 | 84 | 22 | 23
reticulocyte binding protein (RBP) | 9 [h] | 9 [h] | 9 | 8
Other genes
Plasmodium exported proteins of unknown function [i] | 23 | 447 | 266 | 261
Total | 497 | 1812 | 1427 | 1219

Numbers include pseudogenes and partial genes

a Published reference sequence [14]

b Genome version 1.09.2016

c Other names include VIR protein and Pv-fam-c protein

d Other names include Pv-fam-a, trag and tryptophan-rich antigen

e Other names include PST-A protein

f Other names include PvSTP1

g Other names include Phist protein (Pf-fam-b) and RAD protein (Pv-fam-e)

h Includes RBP2e (PVP01_0700500) that was not present in the Salvador-I assembly. RBP1b (PVP01_0701100) is complete in PvP01. In Salvador-I RBP1b consists of two partial genes (PVX_098582, PVX_125738)

i Other names include Pv-fam-d protein and Pv-fam-c protein

Figure 2.

Cluster analysis illustrating the relatedness between the PIR proteins in PvP01 versus Salvador-I (a), and the stability of the major clusters in several other P. vivax assemblies (b).

Panel a) presents a network illustrating the relatedness between the 1063 PIR proteins of PvP01 and 341 PIRs of Salvador-I (Sal-I) with length greater than 150 amino acids. The PvP01 PIRs are illustrated by black dots (nodes). The Sal-I PIRs are illustrated by coloured dots with colour-coding according to the subfamily classification of Lopez et al. [23] as follows: purple = A, pink = B, pale green = C, red = D, pale blue = E, orange = G, green = H, blue = I, white = J, yellow = K, and grey = unassigned genes. Two nodes (PIRs) are connected if they have a global similarity of at least 25%. With the exception of a few proteins, the majority of Sal-I PIRs demonstrate clustering consistent with the classification of Lopez et al. Five new, interconnected clusters comprising previously unassigned Sal-I PIRs are denoted with a white “X”. In Panel b, a heat map summarises the number of PIRs assigned to the 27 major clusters (minimum 15 PIRs in total) in five geographically divergent P. vivax strains: PvP01 (Papua Indonesia), PvT01 (Thailand), PvC01 (Central China), Sal-I (El Salvador) and Brazil-I (Brazil). With the exception of Sal-I, which displayed fewer genes than the other isolates in several of the major clusters, the isolates demonstrated similar numbers of genes in most clusters.

The PvP01 reference is an important new resource for the vivax research community. It will support studies of the complex subtelomeric regions and provide insights into the mechanisms by which the gene families in this region contribute to virulence-associated functions. It will also allow investigation of an array of other biological functions that will expand with continual improvements in annotation in the core genome. PvP01, PvC01 and PvT01 add new geographic locations to the collection of P. vivax assemblies, facilitating biological studies of the diversity of this phenotypically divergent species.

Data availability

The raw sequence data for PvP01, PvT01 and PvC01 can be retrieved from the European Nucleotide Archive; sample accession numbers PvP01 ERS017708, ERS312161 3kb ERS328510, PvT01 ERS055881, ERS312160 3kb ERS328509 and PvC01 ERS407449. The assemblies can be found under the study PRJEB14589. The individual accession numbers are PvP01 (chromosomes: currently in submission to EBI, files on ftp, contigs: FLZR01000001-FLZR01000226), PvT01 (chromosomes LT615239-LT615252, contigs: FLYH01000001-FLYH01000360) and PvC01 (chromosomes LT615256-LT615269, contigs: FLYI01000001-FLYI01000530). PvP01 is maintained in GeneDB: http://www.genedb.org/Homepage/PvivaxP01 and updates are synchronized to PlasmoDB. This section will be updated with accession numbers for PvP01 chromosomes once available.

This work describes de novo assemblies of three new P. vivax genomes and comparison with the reference Sal I genome and other Pv genomes. Compared with previous annotation of the reference genome, the new assembly of the PvP01 genome for an isolate from Papua Indonesia has reduced the total scaffolds from over 2500 in SalI to 226 (+14). Major improvements are in the subtelomeric regions, where a significantly increased number of pir genes have been discovered. This more in-depth study of the Pv genome and manual curation of genes provide a better resource for biological studies of the vivax parasite.

Comments:

Abstract: The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Perhaps “with fragmentation reduced to 226 unassigned scaffolds in addition to the 14 chromosomal scaffolds” will be more accurate?

Does the “results” section begin at “Dataset validation”?

Table 1 presents a comparison of the genomes of the three new sequences with that of Sal I. The PvC01 and PvT01 sequences contained more assigned scaffolds. Are these located mostly in the telomeric regions? A more detailed comparison of the temperate strain PvC01 with the tropical strains would be more useful. A big-picture type perspective on the C01 and T01 would be nice.

Figure 1 illustrates the extension of the assembled sequences in the subtelomeric region of chrom12 as compared to that in SalI. Are the gap junctions verified by PCR? Also, PvP01 has quite a few gaps. How are these assembled and verified?

The network presentation of the Pir genes is interesting. A link to the alignment of the sequences or a phylogenetic tree-type of presentation (as supplements) would be very useful.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

In this manuscript, Auburn and colleagues describe the generation and initial analysis of a new P. vivax reference sequence. The authors extensively sequenced a single-clone field isolate from Papua Indonesia using Illumina technology. Assembly of these sequence reads led to a much less fragmented and better covered core genome sequence. Furthermore, the improved assembly and annotation resulted in a much more complete overview of the subtelomeric multigene families in this and two other isolates from China and Thailand. Reference genomes are the foundation for all genomic, transcriptomic and proteomic studies. Therefore, this new reference genome sequence is very welcome and will undoubtedly fuel the exploration of the biology and pathogenesis of P. vivax. While it is always difficult to assess the quality of such assemblies based on description only, it is conceivable that the 20x increase in coverage and the use of various post-assembly improvement tools have resulted in a considerably better genome sequence. Furthermore, the manually curated gene models and functional classifications bring substantial added value to this work. Overall this study is well executed and the manuscript is well-written. I have only some minor suggestions for improvement:

It would be important to clarify in the manuscript why PvP01 has been chosen to be the new reference “strain”.

Sequencing and annotation of multigene families is challenging. To fully exclude the possibility that the 5 new clusters of PIR proteins identified in this study are the result of incorrect sequence assembly it would be relevant to PCR amplify and sequence a representative member from each of these families.

In the abstract the authors state that the new reference genome contains 226 scaffolds, while according to Table 1 it appears to be 226+14. Please double-check.

I do not find Table 2 particularly useful/informative. It is basically a tribute to a huge amount of work.

It might not be formally required to include a subheading “Results” in Wellcome Open Research data notes, but nonetheless it would be nice to know where the description of the results begins.

It would bring added value to this article if Table 3 were extended by a description of all and not only the subtelomeric gene families (Table 2 in Tachibana et al., 2012 [1] could provide a nice example). Instead of the extensive footnotes an extra column could be included for alternative names.

In Table 3 it is unclear if other genes includes only “Plasmodium exported protein of unknown function” or also other proteins. If there are indeed a couple of hundred of these proteins encoded in the PvP01 genome and they localize to the subtelomeric regions as Figure 1 suggests, it would perhaps be relevant to discuss them as a gene family. It could even be worthwhile to perform a cluster analysis on this “gene family” similar to the one performed on PIR proteins.

On Figure 2B it would be useful to indicate the correspondence between the cluster numbers of this study and the former classification (A-K). Similarly it would be informative to indicate the cluster numbers on Figure 2A.

From Figure 2B it seems that the cluster 5 PIR gene subfamily has expanded (substantially more numerous) in the Brazilian isolate. This is perhaps worth mentioning or discussing.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.