Antoine Dara, Elliott F Drábek, Mark A Travassos, Kara A Moser, Arthur L Delcher, Qi Su, Timothy Hostelley, Drissa Coulibaly, Modibo Daou, Ahmadou Dembele, Issa Diarra, Abdoulaye K Kone, Bourema Kouriba, Matthew B Laurens, Amadou Niangaly, Karim Traore, Youssouf Tolo, Claire M Fraser, Mahamadou A Thera, Abdoulaye A Djimde, Ogobara K Doumbo, Christopher V Plowe, Joana C Silva.
Abstract
BACKGROUND: Encoded by the var gene family, highly variable Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) proteins mediate tissue-specific cytoadherence of infected erythrocytes, resulting in immune evasion and severe malaria disease. Sequencing and assembling the 40-60 var gene complement for individual infections has been notoriously difficult, impeding molecular epidemiological studies and the assessment of particular var elements as subunit vaccine candidates.
Keywords: ETHA; Malaria; Mali; PfEMP1; Plasmodium falciparum; Plasmodium falciparum erythrocyte membrane protein-1; var; var2csa
Year: 2017 PMID: 28351419 PMCID: PMC5368897 DOI: 10.1186/s13073-017-0422-4
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Whole-genome assembly contigs aligned against the 3D7 reference. Contigs from sample 303_1 (blue) aligned to chromosome 12 of 3D7 (black); var exon 1 sequences extracted from 303_1 (see “Methods”) and previously identified var exon 1 sequences from 3D7 are marked as red dots on the contigs and chromosome. Exon 1 sequences extracted from the whole-genome assembly of 303_1 fall in chromosomal regions similar to those where 3D7 exon 1 sequences are found. Some lie towards the center of the chromosome, but a portion lie at the chromosome ends, where complex repeat regions and multi-gene families hinder assembly and contigs pile up. Similar layouts for all 14 chromosomes for each of the 12 Malian clinical samples can be found in the Additional files.
Fig. 2 Number and length of var exon 1 sequences. a Complete exon 1 sequences extracted from whole-genome assemblies using standard bioinformatic methods, requiring a minimum sequence length of 2 Kb. b Complete exon 1 sequences reconstructed from whole-genome assemblies by ETHA. The distribution of var exon 1 lengths in each sample is represented by a box-and-whiskers plot: the median is indicated by a dark line, the first and third quartiles by the boundaries of the box, and the minimum and maximum, excluding outliers, by the whiskers. Outliers are points lying below the first quartile or above the third quartile by more than 1.5 times the interquartile range. Clinical samples are colored by clonality (polyclonal: orange, monoclonal: blue). 3D7 is shown in gray. The number of exon 1 sequences per sample, estimated from the counts of extracted exon 1 sequences, is listed at the bottom (see “Methods”).
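The box-and-whiskers convention described in this caption can be illustrated with a minimal sketch. The function name `box_stats` and the example lengths are hypothetical, and the quartile method (the standard library's "inclusive" method) is an assumption; the plotting software used for the figure may compute quartiles slightly differently.

```python
from statistics import quantiles

def box_stats(lengths):
    """Summarise values using the 1.5 x IQR outlier rule (illustrative sketch)."""
    # Quartiles via the "inclusive" method (an assumption, see lead-in).
    q1, median, q3 = quantiles(lengths, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Points beyond the fences are outliers; whiskers span the remaining points.
    inliers = [x for x in lengths if low_fence <= x <= high_fence]
    outliers = [x for x in lengths if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }

# Made-up values, not data from the study.
example = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

With these made-up values, the single extreme point falls above the upper fence and is reported as an outlier, while the whiskers stop at the most extreme remaining points.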
Fig. 3 Overview of the ETHA algorithm for reconstructing exon 1 sequences: Illumina and PacBio sequencing data are used together with previously characterized exon 1 sequences from VarDom [13] as data inputs for reconstructing exon 1 sequences in clinical whole-genome assemblies. PacBio data are assembled and exon 1 ends are identified by mapping known exon 1 sequences from VarDom onto the assembly (steps 1 and 2). Illumina data corresponding to var genes are identified by finding 71 bp segments (71mers) containing var splice site sequences at the end of exon 1 in the assembly and iteratively following possible continuations (new trusted 71mers overlapping previously identified var 71mers by 70 bp) within the Illumina data (steps 3 and 4). This process is extended until a start methionine is reached (step 4). The k-mer walk is then repeated in the opposite direction, from the start methionine to the intron. The collected 71mers are then assembled by generating all possible paths within the de Bruijn graph of 71mers (step 5), and the paths are reconciled with the whole-genome assembly by choosing those that align best with it (steps 6–8). Data inputs are in white; processes are in gray. See “Methods” for additional details.
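The k-mer walk in steps 3 and 4 can be sketched roughly as follows. This is an illustration of the general idea only, not the ETHA implementation: the names `kmers` and `kmer_walk`, the `max_kmers` cap, and the toy reads are hypothetical, k is 5 here rather than 71, and the sketch omits the start-methionine stopping rule and the definition of "trusted" k-mers given in “Methods”.

```python
from collections import deque

def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_walk(seed, trusted, direction="right", max_kmers=10000):
    """Collect k-mers reachable from `seed` through (k-1)-base overlaps.

    `trusted` is the set of k-mers observed in the short reads. All possible
    continuations are followed breadth-first, in one direction only.
    """
    seen = {seed}
    queue = deque([seed])
    while queue and len(seen) < max_kmers:
        current = queue.popleft()
        for base in "ACGT":
            if direction == "right":
                candidate = current[1:] + base   # shift one base downstream
            else:
                candidate = base + current[:-1]  # shift one base upstream
            if candidate in trusted and candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    return seen

# Toy reads (made up); the last read shares no overlap with the others.
trusted = kmers("ATGCGTACGT", 5) | kmers("CGTACGTTAG", 5) | kmers("GGGGGGG", 5)
upstream_to_downstream = kmer_walk("ATGCG", trusted, direction="right")
downstream_to_upstream = kmer_walk("GTTAG", trusted, direction="left")
```

In this toy case both walks collect the same connected set of k-mers and never reach the unrelated `GGGGG` k-mer, which is the property the walk relies on to separate var-derived reads from the rest of the data.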
Fig. 4 Subfamily composition of recovered ETHA exon 1 sequences: relative frequencies of each UPS class (UPSB in red, UPSA in green, UPSC in orange, and UPSE in purple) for each of the 12 Malian samples, compared to the frequencies found by Rask et al. [13]
Fig. 5 Relative distribution of constitutive domains. Relative domain frequencies compared to the reference 3D7
Fig. 6 Sequence similarity among var2csa elements. a PCA of reconstructed and known var2csa exon 1 amino acid sequences: 19 var2csa sequences reconstructed from the 12 Malian samples were aligned to VarDom var2csa sequences, and a Euclidean distance matrix generated from the multiple alignment was used to plot the PCA. The sequences are colored based on their origin: 3D7, FCR3/IT4 (Gambia), HB3 (Honduras), Dd2 (Indochina), PFCLIN (Ghana), IGH (India), RAJ116 (India), AAQ73930 (Malayan Camp), and NHP (Plasmodium reichenowi, a non-human primate parasite var2csa). b Neighbor net of the reconstructed var2csa exon 1 nucleotide sequences based on uncorrected p-distances. The network is inferred from a multiple alignment of the 19 var2csa sequences reconstructed from the 12 Malian samples and 11 var2csa sequences from the VarDom database. The sequences are color-coded based on their origin as in Fig. 6a.
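For readers unfamiliar with the term, an uncorrected p-distance is the proportion of aligned sites at which two sequences differ, with no correction for multiple substitutions. A minimal sketch follows; the function names and the toy alignment are hypothetical, and skipping gapped sites is an assumption, since the caption does not state how gaps were handled.

```python
def p_distance(a, b, gap="-"):
    """Uncorrected p-distance between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    # Assumption: sites with a gap in either sequence are not compared.
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        raise ValueError("no comparable sites")
    differences = sum(1 for x, y in sites if x != y)
    return differences / len(sites)

def distance_matrix(alignment):
    """Pairwise p-distances for a list of aligned sequences."""
    return [[p_distance(a, b) for b in alignment] for a in alignment]

# Toy alignment (made up, not var2csa data).
matrix = distance_matrix(["ACGT-A", "ACCTTA", "ACGTTA"])
```

A matrix of this kind is the usual input to a neighbor-net method; the network construction itself is not sketched here.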