Awtum M Brashear1, Adam C Huckaby2, Qi Fan3, Luke J Dillard2, Yubing Hu4, Yuling Li4, Yan Zhao4, Zenglei Wang5, Yaming Cao4, Jun Miao1, Jennifer L Guler2, Liwang Cui1.
Abstract
Plasmodium vivax is increasingly the dominant species of malaria in the Greater Mekong Subregion (GMS), which is pursuing regional malaria elimination. P. vivax lineages in the GMS are poorly characterized. Currently, P. vivax reference genomes are scarce due to difficulties in culturing the parasite and lack of high-quality samples. In addition, P. vivax is incredibly diverse, necessitating the procurement of reference genomes from different geographical regions. Here we present four new P. vivax draft genomes assembled de novo from clinical samples collected in the China-Myanmar border area. We demonstrate comparable length and content to existing genomes, with the majority of structural variation occurring around subtelomeric regions and exported proteins, which we corroborated with detection of copy number variations in these regions. We predicted peptides from all PIR gene subfamilies, except for PIR D. We confirmed that proteins classically labeled as PIR D family members are not identifiable by PIR motifs, and actually bear stronger resemblance to DUF (domain of unknown function) family DUF3671, potentially pointing to a new, closely related gene family. Further, phylogenetic analyses of MSP7 genes showed high variability within the MSP7-B family compared to MSP7-A and -C families, and the result was comparable to that from whole genome analyses. The new genome assemblies serve as a resource for studying P. vivax within the GMS.
Keywords: China-Myanmar border; Plasmodium vivax; genome assembly; malaria; next-generation sequencing
Year: 2020 PMID: 32849480 PMCID: PMC7432439 DOI: 10.3389/fmicb.2020.01930
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1 Circos plots for the four P. vivax genome assemblies: (A) NB45, (B) LZCH1720, (C) LZCH1886, and (D) LZCH1476. The outermost green circle represents genes on each chromosome. The second ring colors select gene families: PIR genes are blue, DUF3671 genes are teal, MSP7 genes are red, RBPs are yellow, PHIST genes are orange, and STP1 genes are green. The third ring highlights regions that share less than 98% identity with PvP01. The innermost circle represents read coverage for the assembly on a scale from 0 to 100. To provide context, the interval between 90 and 100 is shown in green, and the interval between 0 and 10 is shown in red.
Genome statistics for each of the four P. vivax field samples and two reference genomes.
| | Sal-I | P01 | NB45 | LZCH1476 | LZCH1886 | LZCH1720 |
| Reads sequenced | NA | NA | 15343644 | 12553940 | 12799214 | 12929808 |
| Average depth | | | 58.51 | 58.04 | 59.23 | 60.07 |
| Total genome length (Mbp) | 27.01 | 29.05 | 30.05 | 29.88 | 29.53 | 29.84 |
| Chromosome length (Mbp) | 22.62 | 24.21 | 25.00 | 25.54 | 25.61 | 24.93 |
| Ns/100 kbp | 199.53 | 517.08 | 1034 | 1327.3 | 1365.97 | 1337.33 |
| Number of scaffolds | 2748 | 242 | 139 | 73 | 65 | 94 |
| N50 (bp) | 1678596 | 1761288 | 1743222 | 1783114 | 1780500 | 1920463 |
| % Alignment to P01 | 79.31% | 99.48% | 81.43% | 81.50% | 81.06% | 81.36% |
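The N50 values in the table above follow the usual assembly definition: the scaffold length at which scaffolds of that length or longer together cover at least half of the total assembly. A minimal Python sketch of that calculation follows; the function name `n50` and the toy scaffold lengths are illustrative assumptions and are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk scaffolds from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not from the paper).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, an assembly with fewer, longer scaffolds can report a higher N50 without being more complete, which is why the table also lists total length and alignment to P01.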
FIGURE 2 Ortholog presence in each of the four China-Myanmar border P. vivax isolates compared to the reference PvP01 genome. Each color represents the ortholog groups present in one of the five assemblies, with the reference P01 assembly shown in orange.
Gene content of each of 4 genomes chosen for gene analysis compared to references Sal-I and PvP01.
| Annotation | Sal-I | P01* | LZCH1720 | LZCH1886 | NB45 | LZCH1476 |
| Encoded peptides | 5389 | 6677 | 6606 | 6530 | 6655 | 6601 |
| Genes without PvP01 orthologs | 60 | 34* | 712 | 543 | 869 | 548 |
| PIR | 387 | 1181 | 1157 | 1111 | 1204 | 1164 |
| ETRAMP | 10 | 10 | 10 | 10 | 10 | 10 |
| PHIST | 79 | 81 | 77 | 76 | 77 | 76 |
| STP1* | 13 | 10 | 8 | 10 | 16 | 9 |
| RBP* | 10 | 10 | 9 | 9 | 9 | 9 |
| MSP7 | 11 | 11 | 11 | 11 | 11 | 11 |
| Tryptophan-rich antigens | 36 | 40 | 40 | 40 | 40 | 40 |
Count of PIR protein subfamilies within PIR orthologous genes.
| Subfamilies | LZCH1720 | LZCH1886 | NB45 | LZCH1476 |
| PIR A | 32 | 32 | 37 | 32 |
| PIR B | 53 | 47 | 51 | 53 |
| PIR C | 227 | 217 | 225 | 227 |
| PIR D | 0 | 0 | 0 | 0 |
| PIR E | 231 | 225 | 256 | 238 |
| PIR G | 117 | 109 | 130 | 114 |
| PIR H | 51 | 47 | 47 | 51 |
| PIR I | 124 | 122 | 128 | 124 |
| PIR J | 138 | 133 | 143 | 138 |
| PIR K | 154 | 144 | 154 | 154 |
| Unassigned | 38 | 35 | 33 | 38 |
FIGURE 3 Maximum-likelihood PIR family structure within two assembled genomes: (A) NB45 and (B) LZCH1720. Each color represents a separate subfamily as denoted by the color key on the right.
FIGURE 4 Genetic diversity within P. vivax assemblies. (A) Genetic diversity within the entire genome of the four assemblies compared to PvP01 based on SNPs from alignment to Sal-I. (B) Diversity within msp7 genes between different assemblies, Sal-I and PvP01. P. cynomolgi MSP7-A was used as an outgroup. Colors: black, P01; blue, Sal-I; orange, NB45; red, LZCH1720; cyan, LZCH1476; purple, LZCH1886.
Copy Number Variations detected within the 4 isolates compared to PvP01.
| Supporting Samples | Type | Chr. | Start | End | Length |
| LZCH1886, NB45 | Amplification | 4 | 38123 | 124716 | 86593 |
| LZCH1476, LZCH1720, NB45 | Amplification | 9 | 79721 | 135397 | 55676 |
| LZCH1720, NB45 | Amplification | 9 | 79721 | 94710 | 14989 |
| LZCH1720, NB45 | Amplification | 9 | 94119 | 135206 | 41087 |
| LZCH1476, LZCH1720 | Amplification | 14 | 82 | 35455 | 35373 |
| NB45 | Deletion | 1 | 58587 | 62858 | 4271 |
| LZCH1476, LZCH1720, LZCH1886 | Deletion | 2 | 15890 | 19643 | 3753 |
| LZCH1476, LZCH1720, LZCH1886 | Deletion | 2 | 776331 | 785302 | 8971 |
| LZCH1720 | Deletion | 5 | 1010209 | 1013244 | 3035 |
| LZCH1720, LZCH1886 | Deletion | 7 | 1531179 | 1630181 | 99002 |
| LZCH1720 | Deletion | 8 | 1647230 | 1662256 | 15026 |
| LZCH1476, LZCH1720 | Deletion | 9 | 135520 | 190955 | 55435 |
| LZCH1886 | Deletion | 9 | 135520 | 182436 | 46916 |
| NB45 | Deletion | 9 | 141424 | 191395 | 49971 |
| LZCH1476, LZCH1720, LZCH1886 | Deletion | 10 | 91532 | 96146 | 4614 |
| LZCH1720, LZCH1886 | Deletion | 11 | 2054089 | 2057879 | 3790 |
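The Length column in the table above should equal End minus Start for every call. A short Python sketch of that consistency check follows, using the rows transcribed from the table; the variable names are illustrative, and the per-type counts are a simple tally rather than an analysis from the paper.

```python
# Rows transcribed from the CNV table:
# (type, chromosome, start, end, reported length)
cnvs = [
    ("Amplification", 4, 38123, 124716, 86593),
    ("Amplification", 9, 79721, 135397, 55676),
    ("Amplification", 9, 79721, 94710, 14989),
    ("Amplification", 9, 94119, 135206, 41087),
    ("Amplification", 14, 82, 35455, 35373),
    ("Deletion", 1, 58587, 62858, 4271),
    ("Deletion", 2, 15890, 19643, 3753),
    ("Deletion", 2, 776331, 785302, 8971),
    ("Deletion", 5, 1010209, 1013244, 3035),
    ("Deletion", 7, 1531179, 1630181, 99002),
    ("Deletion", 8, 1647230, 1662256, 15026),
    ("Deletion", 9, 135520, 190955, 55435),
    ("Deletion", 9, 135520, 182436, 46916),
    ("Deletion", 9, 141424, 191395, 49971),
    ("Deletion", 10, 91532, 96146, 4614),
    ("Deletion", 11, 2054089, 2057879, 3790),
]

# Any row whose reported length differs from end - start is flagged.
mismatches = [row for row in cnvs if row[3] - row[2] != row[4]]

# Tally calls per type and find the longest call.
counts = {}
for kind, _chrom, _start, _end, _length in cnvs:
    counts[kind] = counts.get(kind, 0) + 1
largest = max(cnvs, key=lambda row: row[4])

print("mismatched rows:", mismatches)
print("calls per type:", counts)
print("largest call:", largest)
```

Several calls on chromosome 9 overlap one another, so summing the lengths would double-count sequence; the sketch therefore reports counts rather than a total affected length.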
FIGURE 5 Structural variation within assembled genomes. (A) Locations of all predicted copy number variations. Each circle represents a different sample in this order: NB45, LZCH1476, LZCH1720, and LZCH1886. Pink wedges represent predicted deletions while blue wedges represent predicted amplifications. Outer ring shows gene density wherein green highlights are individual genes. (B) CNVs on chromosome 2 with wedges representing their location within PvP01 and lines connecting them to homologous regions on respective assemblies when applicable. Coverage support plots were included for the four new assemblies in the second ring. (C) Normalized depth for both discordant and properly paired reads around a chromosome 2 deletion predicted for 3 of the 4 samples. Black lines denote the predicted deletion boundary.
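Panel (C) plots normalized depth around a predicted deletion. The caption does not state how depth was normalized. One common choice, assumed here purely for illustration, is to divide the depth in each window by the median depth of the flanking windows, so that values near 1 indicate typical coverage and values near 0 are consistent with a deletion. The function name `normalize_depth` and the toy depths are hypothetical.

```python
from statistics import median


def normalize_depth(depths, flank_indices):
    """Scale window depths by the median depth of the flanking windows.

    Assumption: flanking windows lie outside the candidate deletion
    and represent typical coverage for the sample.
    """
    baseline = median(depths[i] for i in flank_indices)
    if baseline == 0:
        raise ValueError("flanking windows have zero median depth")
    return [d / baseline for d in depths]


# Toy per-window depths (hypothetical): windows 3-5 sit inside the deletion.
toy_depths = [60, 58, 62, 3, 2, 4, 61, 59]
flanks = [0, 1, 2, 6, 7]
normalized = normalize_depth(toy_depths, flanks)
print([round(v, 2) for v in normalized])
```

Using the median rather than the mean keeps the baseline stable if a flanking window happens to overlap a repeat or a low-complexity region.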