Shin-Ichiro Tachibana, Steven A Sullivan, Satoru Kawai, Shota Nakamura, Hyunjae R Kim, Naohisa Goto, Nobuko Arisue, Nirianne M Q Palacpac, Hajime Honma, Masanori Yagi, Takahiro Tougan, Yuko Katakai, Osamu Kaneko, Toshihiro Mita, Kiyoshi Kita, Yasuhiro Yasutomi, Patrick L Sutton, Rimma Shakhbatyan, Toshihiro Horii, Teruo Yasunaga, John W Barnwell, Ananias A Escalante, Jane M Carlton, Kazuyuki Tanabe.
Abstract
P. cynomolgi, a malaria-causing parasite of Asian Old World monkeys, is the sister taxon of P. vivax, the most prevalent malaria-causing species in humans outside of Africa. Because P. cynomolgi shares many phenotypic, biological and genetic characteristics with P. vivax, we generated draft genome sequences for three P. cynomolgi strains and performed genomic analysis comparing them with the P. vivax genome, as well as with the genome of a third previously sequenced simian parasite, Plasmodium knowlesi. Here, we show that genomes of the monkey malaria clade can be characterized by copy-number variants (CNVs) in multigene families involved in evasion of the human immune system and invasion of host erythrocytes. We identify genome-wide SNPs, microsatellites and CNVs in the P. cynomolgi genome, providing a map of genetic variation that can be used to map parasite traits and study parasite populations. The sequencing of the P. cynomolgi genome is a critical step in developing a model system for P. vivax research and in counteracting the neglect of P. vivax.
Year: 2012 PMID: 22863735 PMCID: PMC3759362 DOI: 10.1038/ng.2375
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Comparison of genome features between P. cynomolgi, P. vivax and P. knowlesi, three species of the monkey malaria clade.
| Feature | P. cynomolgi | P. vivax | P. knowlesi | |
|---|---|---|---|---|
| Size (Mb) | 26.2 | 26.9 | 23.7 | |
| No. scaffolds | 14 (1,649) | 14 (2,547) | 14 (67) | |
| Coverage (fold) | 161 | 10 | 8 | |
| G+C content (%) | 40.4 | 42.3 | 38.8 | |
| No. genes | 5,722 | 5,432 | 5,197 | |
| Mean gene length | 2,240 | 2,164 | 2,180 | |
| Gene density (bp per gene) | 4,428.2 | 4,950.5 | 4,416.1 | |
| Percentage coding | 51.0 | 47.1 | 49.0 | |
| No. tRNA genes | 43 | 44 | 41 | |
| No. 5S rRNA genes | 3 | 3 | 0 | |
| No. 5.8S/18S/28S rRNA units | 7 | 7 | 5 | |
| No. chromosomes | 14 | 14 | 14 | |
| No. centromeres | 14 | 14 | 14 | |
| Isochore structure | Yes | Yes | No | |
| Mitochondrial genome size (bp) | 5,986 (AB444123) | 5,990 (AY598140) | 5,958 (AB444108) | |
| Mitochondrial genome G+C content (%) | 30.3 | 30.5 | 30.5 | |
| Apicoplast genome size (bp) | 29,297 | 5,064 | Not available | |
| Apicoplast genome G+C content (%) | 13.0 | 17.1 | Not available | |
Small unassigned contigs indicated in parentheses
Sequence gaps excluded
Regions of the genome that differ in their density and are separable by CsCl centrifugation; isochores correspond to domains differing in their GC content
Not present in P. knowlesi assembly version 4.0
Identified in other studies, see Accession Numbers
Partial sequence (~86% complete) identified during this project
Partial sequence of reference genome only published[12]; actual size is ~35 kb
Figure 1. Architecture of the P. cynomolgi genome and associated genome-wide variation data. Each of the 14 P. cynomolgi chromosomes is indicated, and one chromosome slice is shown annotated in the center. The six concentric rings represent: (1) Outer ring: localization of 5,049 P. cynomolgi genes excluding those on small contigs (cyan lines); (2) Second ring: genome features including 14 centromeres (thick black lines), 43 telomeric sequence repeats (short red lines), 43 tRNA genes (red lines), 10 rRNAs (dark blue lines), and several gene family members including: 53 cyir (dark green lines), 8 RBP (brown lines), 13 SERA (serine-rich antigen; pink lines), 25 TRAG (tryptophan-rich antigen; purple lines), 12 MSP3 (merozoite surface protein 3; light grey lines), 13 MSP7 (merozoite surface protein 7; grey lines), 25 RAD (silver lines), 8 etramp (orange lines), 16 Pf-fam-b (light blue lines), 7 Pv-fam-d (light green lines); (3) Third ring: plot of Ds-Dn for 4,605 orthologs depicting genome-wide (i) polymorphism within P. cynomolgi strains B and Berok (black line); and (ii) divergence between P. cynomolgi strains B and Berok, and P. vivax Salvador I (red line); a track above the plot indicates P. cynomolgi genes under positive selection (red) and purifying selection (blue), and a track below the plot indicates P. cynomolgi/P. vivax orthologs under positive selection (red) and purifying selection (blue); (4) Fourth ring: heat map indicating SNP density of three P. cynomolgi strains plotted per 10 kb window: red, 0–83 SNPs/10 kb (regions of lowest SNP density); blue, 84–166 SNPs/10 kb; green, 167–250 SNPs/10 kb; purple, 251–333 SNPs/10 kb; orange, 334–416 SNPs/10 kb; yellow, 417–500 SNPs/10 kb (regions of highest SNP density); (5) Fifth ring: log2 ratio plot of CNVs identified from a comparison of P. cynomolgi strain B with strain Berok; and (6) Inner ring: map of 182 polymorphic intergenic microsatellites (MS; black dots).
Figure was generated using Circos software (see URLs).
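The per-window SNP density shown in the fourth ring can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `snps_per_window` and `colour_for` and the example positions are hypothetical, and only the 10 kb window size and the upper bound of each colour class are taken from the caption.

```python
from collections import Counter

WINDOW_BP = 10_000  # window size stated in the Figure 1 caption

# Inclusive upper bound of each SNP-density class and its colour (from the caption).
COLOUR_CLASSES = [
    (83, "red"),
    (166, "blue"),
    (250, "green"),
    (333, "purple"),
    (416, "orange"),
    (500, "yellow"),
]


def snps_per_window(positions, window_bp=WINDOW_BP):
    """Count SNPs per window; positions are 1-based coordinates on one chromosome."""
    return Counter((pos - 1) // window_bp for pos in positions)


def colour_for(count):
    """Return the heat-map colour class for a SNP count in one window."""
    if count < 0:
        raise ValueError("SNP count cannot be negative")
    for upper, colour in COLOUR_CLASSES:
        if count <= upper:
            return colour
    raise ValueError("count exceeds the highest class listed in the caption")


# Hypothetical SNP positions, for illustration only (not data from the paper).
example_positions = [1, 5_000, 10_000, 10_001, 25_000]
counts = snps_per_window(example_positions)
for window_index in sorted(counts):
    start = window_index * WINDOW_BP + 1
    print(start, counts[window_index], colour_for(counts[window_index]))
```

Windows without SNPs do not appear in `counts`; a plotting step would have to treat them as zero, which falls in the lowest (red) class.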
Figure 2. Genome synteny between six species of Plasmodium parasite. Protein-coding genes of P. cynomolgi are shown aligned with those of five other Plasmodium genomes: two species belonging to the monkey malaria clade (P. vivax and P. knowlesi), two species of rodent malaria (P. berghei and P. chabaudi), and P. falciparum. Highly conserved protein-coding regions between the genomes are colored in order from red (5′ end of chromosome 1) to blue (3′ end of chromosome 14) according to their genomic position in P. cynomolgi. A scale in Mb is shown on top of each genome alignment. This genome-wide view of synteny identified two apparent errors in existing public sequence databases: an inversion in chromosome 3 of P. knowlesi, and an inversion in chromosome 6 of P. vivax.
Multigene families of P. cynomolgi, P. vivax and P. knowlesi differ in their copy number.
| # | Multigene family | Localization | Arrangement | P. cynomolgi | P. vivax | P. knowlesi | Putative function |
|---|---|---|---|---|---|---|---|
| 1 | pir (vir-like) | subtelomeric | scattered/clustered | 254 | 319 | 4 | Immune evasion |
| 2 | pir (kir-like) | subtelomeric/central | scattered/clustered | 11 | 2 | 66 | Immune evasion |
| 3 | SICAvar | subtelomeric/central | scattered/clustered | 2 | 1 | 242 | Antigenic variation, immune evasion |
| 4 | msp3 | central | clustered | 12 | 12 | 3 | Merozoite surface protein |
| 5 | msp7 | central | clustered | 13 | 13 | 5 | Merozoite surface protein |
| 6 | DBL (DBP/EBL) | subtelomeric | scattered | 2 | 1 | 3 | Host cell recognition |
| 7 | RBL (RBP/NBP/Rh) | subtelomeric | scattered | 8 | 10 | 3 | Host cell recognition |
| 8 | Pv-fam-a (PvTRAG) | subtelomeric | scattered/clustered | 36 | 36 | 26 | Tryptophan-rich |
| 9 | Pv-fam-b | central | clustered | 3 | 6 | 1 | Unknown |
| 10 | Pv-fam-c | subtelomeric | unknown | 1 | 7 | 0 | Unknown |
| 11 | Pv-fam-d (HYPB) | subtelomeric | scattered | 18 | 16 | 2 | Unknown |
| 12 | Pv-fam-e (RAD) | subtelomeric | clustered | 27 | 44 | 16 | Unknown |
| 13 | Pv-fam-g | central | clustered | 3 | 3 | 3 | Unknown |
| 14 | Pv-fam-h (HYP16) | central | clustered | 6 | 4 | 2 | Unknown |
| 15 | Pv-fam-i (HYP11) | subtelomeric | scattered | 6 | 6 | 5 | Unknown |
| 16 | Pk-fam-a | central | scattered | 0 | 0 | 12 | Unknown |
| 17 | Pk-fam-b | subtelomeric | scattered | 0 | 0 | 9 | Unknown |
| 18 | Pk-fam-c | subtelomeric | scattered | 0 | 0 | 6 | Unknown |
| 19 | Pk-fam-d | central | scattered | 0 | 0 | 3 | Unknown |
| 20 | Pk-fam-e | subtelomeric | scattered | 0 | 0 | 3 | Unknown |
| 21 | PST-A | subtelomeric/central | scattered | 9 | 11 | 7 | Alpha beta hydrolase |
| 22 | ETRAMP | subtelomeric | scattered | 9 | 9 | 9 | Parasitophorous vacuole membrane |
| 23 | CLAG (RhopH-1) | subtelomeric | scattered | 2 | 3 | 2 | High MW rhoptry antigen complex |
| 24 | PvSTP1 | subtelomeric | unknown | 3 | 10 | 0 | Unknown |
| 25 | PHIST (Pf-fam-b) | subtelomeric | scattered/clustered | 21 | 20 | 15 | Unknown |
| 26 | SERA | central | clustered | 13 | 13 | 8 | Cysteine protease |
Pseudogenes, truncated genes and gene fragments included.
Gene arrangement could not be determined due to localization on unassigned contigs.
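One way to read the copy-number table is to group families by where they are present. The Python sketch below does this for a subset of rows copied from the table, in the order P. cynomolgi, P. vivax, P. knowlesi. The function `classify` and its category labels are illustrative assumptions, not a classification used in the paper.

```python
# Copy numbers (P. cynomolgi, P. vivax, P. knowlesi) copied from selected table rows.
COPIES = {
    "pir (vir-like)": (254, 319, 4),
    "pir (kir-like)": (11, 2, 66),
    "SICAvar": (2, 1, 242),
    "msp3": (12, 12, 3),
    "Pv-fam-c": (1, 7, 0),
    "Pk-fam-a": (0, 0, 12),
    "PvSTP1": (3, 10, 0),
    "ETRAMP": (9, 9, 9),
}


def classify(pcy, pv, pk):
    """Assign a family to an illustrative (hypothetical) presence category."""
    if pcy == pv == pk:
        return "equal in all three"
    if pcy == 0 and pv == 0:
        return "P. knowlesi only"
    if pk == 0:
        return "absent in P. knowlesi"
    return "variable"


def copy_range(counts):
    """Difference between the largest and smallest copy number for a family."""
    return max(counts) - min(counts)


# List families from the largest to the smallest spread in copy number.
for family, counts in sorted(COPIES.items(), key=lambda kv: copy_range(kv[1]), reverse=True):
    print(f"{family}: {counts} -> {classify(*counts)}")
```

Because the counts include pseudogenes, truncated genes and gene fragments, small differences between species should be read with caution.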
Figure 3. A comparison of the genes of P. cynomolgi, P. vivax and P. knowlesi. The three ellipses represent the three genomes, with the total number of genes assigned to chromosomes indicated under the species name. The Venn diagram delineates orthologous and non-orthologous genes between the three genomes, with the number of genes in each indicated and represented graphically by a cylinder of proportional width. In each cylinder, genes are divided into three categories (putatively known function, hypothetical, and members of multigene families) represented by colored bands proportional to their percentage.