Albert E Zhou, Zalak V Shah, Katie R Bradwell, James B Munro, Andrea A Berry, David Serre, Shannon Takala-Harrison, Timothy D O'Connor, Joana C Silva, Mark A Travassos.
Abstract
BACKGROUND: RIFINs and STEVORs are variant surface antigens expressed by P. falciparum that play roles in severe malaria pathogenesis and immune evasion. These two highly diverse multigene families feature multiple paralogs, making their classification challenging using traditional bioinformatic methods.
Keywords: Bioinformatics; Hidden Markov models; Malaria; Plasmodium falciparum; RIFIN; STEVOR
Year: 2022 PMID: 34991452 PMCID: PMC8733436 DOI: 10.1186/s12859-021-04515-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. General structure of RIFINs and STEVORs. RIFINs and STEVORs are expressed on the surface of an erythrocyte infected with P. falciparum. Protein domains are illustrated as green (signal peptide), grey (variable domains), red (transmembrane domains), blue (25 amino acid insertion), and orange and purple (semi-conserved domains). There are approximately 160 rif genes in the 3D7 reference genome, separated into two subtypes, RIFIN-A and RIFIN-B, depending on sequence and subcellular localization. The FHEYDER motif (in blue) is present in the semi-conserved domain of 36 RIFIN-As in the 3D7 reference strain. STEVORs encompass ~30 genes per genome and are structurally similar to RIFIN-Bs.
Comparison of STRIDE to PFAM and TIGRFAM, using the same parameter values
| Dataset | STRIDE ∑ | STRIDE RIFIN-A | STRIDE RIFIN-B | STRIDE RIFIN | STRIDE Likely RIFIN | STRIDE # FHEYDER | STRIDE STEVOR | PFAM RIFIN | PFAM STEVOR | TIGRFAM RIFIN | TIGRFAM STEVOR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (185 | 182 | 122 | 52 | 4 | 4 | 36 | 38 | 178 | 38 | 177 | 38 |
| (463 | 452 | 250 | 158 | 24 | 20 | 59 | 62 | 416 | 61 | 436 | 62 |
| Validation Dataset (3888 sequences) | 2844 | 1409 | 741 | 318 | 376 | 605 | 696 | 2014 | 693 | 2694 | 700 |
| | 0 | 0 | 0 | 0 | 0 | – | 0 | 0 | 0 | 0 | 0 |
∑ represents the total number of sequences. All three tools had 100% specificity
Fig. 2. Stacked bar graphs of the sequence distribution across all available P. falciparum genomes from PlasmoDB v45 at the conclusion of training each HMM profile. A total of 3536 RIFIN and STEVOR sequences were downloaded from PlasmoDB (Release 45; August 28, 2019). Redundant sequences were clustered with CD-HIT v4.6. Hidden Markov model (HMM) profiles specific for RIFIN-A, RIFIN-B, and STEVOR proteins were created and iteratively trained against subsets of sequences that were not present in the initial seeding. The final datasets comprised 967 RIFIN-A, 495 RIFIN-B, and 229 STEVOR sequences, providing representation of sequences from all genomes. The Malian (ML01) and Togo (TG01) strains were polyclonal and had higher overall numbers of representative sequences. Of the 228 RIFINs and STEVORs annotated in the 3D7 reference genome, STRIDE used 122 3D7 sequences.
Fig. 3. Relationships among HMM scores. Whole-sequence scores are shown in blue and HMM domain scores in red. Datasets of 404 RIFIN-A, 476 RIFIN-B, and 40 STEVOR sequences excluded from HMM training were used to test and define the limit of detection for each profile, shown as a grey line. No RIFIN-A or RIFIN-B sequence met the STEVOR profile's threshold score of 145. The 404 RIFIN-A sequences had whole-sequence (blue) and domain (red) scores that exceeded the thresholds for the RIFIN-A profile. In contrast, none of the 404 RIFIN-A sequences met the classification criteria for RIFIN-Bs, as their domain scores (red) were below the threshold score of 250. The Y-axis shows HMM scores, and the X-axis shows the ordered numerical label of each sequence.
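The threshold logic described in the Fig. 3 caption can be sketched as follows. This is a minimal illustration, not the STRIDE implementation: it assumes that a sequence matches a profile only when both its whole-sequence score and its domain score reach that profile's threshold, and the names `ProfileScores`, `passes`, and `classify` are hypothetical. Only the STEVOR (145) and RIFIN-B (250) thresholds come from the caption; the RIFIN-A threshold is not stated there, so the caller must supply it.

```python
from dataclasses import dataclass

# Thresholds quoted in the Fig. 3 caption.
STEVOR_THRESHOLD = 145.0
RIFIN_B_THRESHOLD = 250.0


@dataclass
class ProfileScores:
    """Whole-sequence and domain HMM scores of one sequence against one profile."""
    sequence: float
    domain: float


def passes(scores: ProfileScores, threshold: float) -> bool:
    # Assumption: both scores must reach the threshold for a match.
    return scores.sequence >= threshold and scores.domain >= threshold


def classify(scores_by_profile: dict, rifin_a_threshold: float) -> list:
    """Return the names of all profiles whose threshold the sequence meets.

    rifin_a_threshold is a caller-supplied value (not given in the caption).
    """
    thresholds = {
        "RIFIN-A": rifin_a_threshold,
        "RIFIN-B": RIFIN_B_THRESHOLD,
        "STEVOR": STEVOR_THRESHOLD,
    }
    return [
        name
        for name, threshold in thresholds.items()
        if name in scores_by_profile and passes(scores_by_profile[name], threshold)
    ]


# Made-up scores for a RIFIN-A-like sequence; 200.0 is a placeholder threshold.
example = {
    "RIFIN-A": ProfileScores(sequence=400.0, domain=300.0),
    "RIFIN-B": ProfileScores(sequence=300.0, domain=200.0),  # domain score below 250
    "STEVOR": ProfileScores(sequence=50.0, domain=40.0),     # both scores below 145
}
calls = classify(example, rifin_a_threshold=200.0)
```

Returning a list rather than a single label leaves open how a sequence that meets more than one threshold would be resolved, since the caption does not describe that case.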
Sensitivity and specificity analyses of STRIDE compared to PFAM# and TIGRFAM# using 3D7^
| Tool | Call | RIFIN (185) | Non-RIFIN | Predictive value | Call | STEVOR (43) | Non-STEVOR | Predictive value |
|---|---|---|---|---|---|---|---|---|
| STRIDE | Called + | 182 | 0 | 100% | Called + | 38 | 0 | 100% |
| | Called − | 3 | 5363 | 100% | Called − | 5 | 5505 | 99.9% |
| | | **100%** | | | | **100%** | | |
| PFAM | Called + | 178 | 0 | 99% | Called + | 38 | 0 | 100% |
| | Called − | 7 | 5363 | 99.8% | Called − | 5 | 5505 | 99.9% |
| | | **99.9%** | | | | **100%** | | |
| TIGRFAM | Called + | 177 | 0 | 100% | Called + | 38 | 0 | 100% |
| | Called − | 8 | 5363 | 99.8% | Called − | 5 | 5505 | 99.9% |
| | | **100%** | | | | **100%** | | |
#We made comparisons across tools using the same parameters as STRIDE
^The curated 3D7 reference genome served as a gold standard. There are a total of 5548 sequences in the P. falciparum 3D7 reference, where 182 sequences are annotated as RIFINs and 43 sequences are annotated as STEVORs (includes 27 RIFIN and 10 STEVOR pseudogenes). Unlike PFAM and TIGRFAM, STRIDE was not trained with the entire 3D7 RIFIN and STEVOR repertoire (Fig. 2). The bolded text illustrates the sensitivity of each program; all three tools had 100% specificity
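As a worked example of how the predictive-value columns relate to the counts, the sketch below applies the standard definitions of positive predictive value, negative predictive value, and specificity to the STRIDE STEVOR counts from the table. The function name `predictive_values` is hypothetical, and the code illustrates the textbook formulas rather than the authors' own calculation.

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix summaries, returned as fractions in [0, 1].

    tp: annotated members that were called positive
    fp: non-members that were called positive
    fn: annotated members that were called negative
    tn: non-members that were called negative
    """
    return {
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# STRIDE, STEVOR side of the table: 38 called +, 5 called -, 0 and 5505 non-STEVORs.
stevor = predictive_values(tp=38, fp=0, fn=5, tn=5505)

print(round(stevor["ppv"] * 100, 1))          # 100.0
print(round(stevor["npv"] * 100, 1))          # 99.9
print(round(stevor["specificity"] * 100, 1))  # 100.0
```

With no false positives, both the positive predictive value and the specificity are 100%, while the five uncalled STEVORs lower the negative predictive value to 5505/5510, or about 99.9%.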