John Juma1,2, Vagner Fonseca3,4,5,6, Samson L Konongoi1,7, Peter van Heusden2, Kristina Roesel1, Rosemary Sang7, Bernard Bett1, Alan Christoffels2, Tulio de Oliveira3,4,8,9, Samuel O Oyola10.
Abstract
Genetic evolution of Rift Valley fever virus (RVFV) in Africa has been shaped mainly by environmental changes, such as abnormal rainfall patterns and climate change, that have occurred over the last few decades. These gradual environmental changes are believed to have effected gene migration from macro (geographical) to micro (reassortment) levels. Presently, 15 lineages of RVFV have been identified as circulating within Sub-Saharan Africa. International trade in livestock and movement of mosquitoes are thought to be responsible for the outbreaks occurring outside endemic or enzootic regions. Virus spillover events contribute to outbreaks, as was demonstrated by the largest epidemic, which occurred in Egypt in 1977. Genomic surveillance of virus evolution is crucial in developing intervention strategies. Therefore, we have developed a computational tool for rapidly classifying and assigning lineages of RVFV isolates. The computational method is presented both as a command line tool and as a web application hosted at https://www.genomedetective.com/app/typingtool/rvfv/. Validation of the tool has been performed on a large dataset using glycoprotein gene (Gn) and whole genome sequences of the Large (L), Medium (M) and Small (S) segments of RVFV retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. Using the Gn nucleotide sequences, the RVFV typing tool was able to correctly classify all 234 RVFV sequences at the species level with 100% specificity, sensitivity and accuracy. All the sequences in lineages A (n = 10), B (n = 1), C (n = 88), D (n = 1), E (n = 3), F (n = 2), G (n = 2), H (n = 105), I (n = 2), J (n = 1), K (n = 4), L (n = 8), M (n = 1), N (n = 5) and O (n = 1) were also correctly classified at the phylogenetic level. Lineage assignment using whole RVFV genome sequences (L, M and S segments) did not achieve 100% specificity, sensitivity and accuracy for all the sequences analyzed.
We further tested our tool using genomic data that we generated by sequencing 5 samples collected following a recent RVF outbreak in Kenya. All 5 samples were assigned to lineage C by both the partial (Gn) and whole genome sequence classifiers. The tool is useful in tracing the origin of outbreaks and supporting surveillance efforts. Availability: https://github.com/ajodeh-juma/rvfvtyping
Keywords: Genomic surveillance; Genotyping; Glycoprotein Gn; L-segment; Lineage; M-segment; RVFV, Rift Valley fever virus; S-segment; Sequencing
Year: 2022 PMID: 35850574 PMCID: PMC9295512 DOI: 10.1186/s12864-022-08764-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Fig. 1 Schematic representation of the command line workflow. The workflow begins with virus classification using DIAMOND and reports the output as a text file with taxonomic information and similarity metrics. Phylogenetic analysis is performed using a default phylogenetic reference dataset generated by Neighbor-Joining (NJ), Maximum likelihood (ML) and Bayesian tree inference. Users can specify which phylogenetic reference dataset to use. Query sequences are aligned to the multiple sequence alignment of the reference dataset with MAFFT, and an ML phylogenetic tree is constructed, followed by lineage assignment. An output file with the lineage assignment, bootstrap values and likelihood ratio test values is generated in comma-separated values (CSV) format
Fig. 2 Screenshot of the web interface for the RVFV typing tool. (A) The web interface offers a portal for users to perform classification and visualize the results. The typing report provides the name of the query sequence, its nucleotide length, an illustration of its position in the virus’ genomic segment, the species assignment and the genotype assignment. A detailed report (B) is provided for the phylogenetic analysis that resulted in this classification. All results can be exported to a variety of file formats (XML, CSV, Excel or FASTA). The detailed HTML report (C) contains the sequence name, length, assigned virus and genotype, an illustration (D) of the position of the sequence in the virus’ genomic segment and the phylogenetic analysis section. The alignment section shows the alignment and the constructed phylogenetic tree
Fig. 4 Distribution of RVFV lineages in Africa and the Middle East. A Lineages reported in Africa and the Middle East (Saudi Arabia), sampled between 1944 and 2016. B Map of Africa and Saudi Arabia indicating the number of RVFV sequences for the M-segment (partial and complete) as of 28th May 2021 for the 129 sequences used in the lineage assignment. C Maximum likelihood phylogenetic tree using glycoprotein (Gn) representative sequences (n = 51) showing the geographical distribution of lineages. The tips of the tree are colored according to their country of origin. CAR, Central African Republic
RVFV lineage-defining single nucleotide polymorphisms (SNPs) in the glycoprotein (Gn) gene. For the sequences of each lineage, SNPs were identified by comparison with the reference (strain ZH-548). Since the reference strain falls within lineage A, no SNPs were observed in that category
| Lineage | SNPs | Total |
|---|---|---|
| A | 1 | |
| B | 830GA;1103TC;1142TC;1304GA | 4 |
| C | 836TA;926GA;1103TC;1163CT;1190TC;1241AG | 6 |
| D | 839TC;926GA;1103TC;1142TC;1163CT;1195GA | 6 |
| E | 854TA;926GA;1103TC;1142TC;1163CT;1166AG | 6 |
| F | 816AG;902GA;926GA;1079GA;1103TC;1106GA;1142TC;1163CT;1253GA | 9 |
| G | 926GA;1103TC;1142TC;1163CT | 4 |
| H | 920AG;926GA;1103TC;1142TC;1157AG;1163CT;1169AT | 7 |
| I | 833CT;920AG;986CT;998TC;1049GA;1103TC;1115GA;1142TC;1163CT;1304GA | 10 |
| J | 836TC;860CT;920AG;926GA;953AG;995GA;1007CA;1055TC;1115GA;1142TC;1154GA;1160GA;1161TC;1163CT;1190TC;1250TC | 16 |
| K | 894CT;1091TC;1115GA;1142TC;1250TC | 5 |
| L | 842GA;866CT;917CT;920AG;926GA;1103TC;1115GA;1122CT;1124AG;1142TC;1163CT;1190TC;1250TC;1274AT;1304GA | 15 |
| M | 857GA;894CT;920AG;924TC;926GA;992GT;1103TC;1115GA;1142TC;1151TC;1163CT;1250TC;1304GA | 13 |
| N | 920AG;926GA;1103TC;1112GA;1115GA;1142TC;1163CT;1187GA;1304GA | 9 |
| O | 920AG;926GA;1103TC;1106GA;1115GA;1142TC;1163CT;1205AG;1243AG;1250TC;1304GA | 11 |
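The SNP strings in the table above appear to encode a nucleotide position followed by the reference base and the observed base (for example, `830GA` read as G to A at position 830 relative to strain ZH-548). That reading is an assumption, since the notation is not spelled out here. Under that assumption, a minimal Python sketch for parsing a table row and comparing two lineages might look like the following; the names `Snp` and `parse_snps` are illustrative and are not part of the RVFV typing tool.

```python
import re
from typing import List, NamedTuple


class Snp(NamedTuple):
    position: int  # position relative to the reference (assumed interpretation)
    ref: str       # base in the reference strain (assumed)
    alt: str       # base observed in the lineage (assumed)


# One token looks like "830GA": digits, then two nucleotide letters.
_SNP_PATTERN = re.compile(r"^(\d+)([ACGT])([ACGT])$")


def parse_snps(field: str) -> List[Snp]:
    """Parse a semicolon-separated SNP field from the table into Snp records."""
    snps = []
    for token in field.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate empty cells or trailing separators
        match = _SNP_PATTERN.match(token)
        if match is None:
            raise ValueError(f"unrecognised SNP token: {token!r}")
        snps.append(Snp(int(match.group(1)), match.group(2), match.group(3)))
    return snps


# Rows B and G copied from the table above.
lineage_b = parse_snps("830GA;1103TC;1142TC;1304GA")
lineage_g = parse_snps("926GA;1103TC;1142TC;1163CT")

# SNPs present in both lineages, sorted by position.
shared = sorted(set(lineage_b) & set(lineage_g))
print(len(lineage_b), len(lineage_g), shared)
```

The length of each parsed list can be checked against the Total column, and set operations give a quick view of which SNPs two lineages have in common.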
Fig. 3 Phylogenetic analysis using Gn and whole genome (L, M & S) segment classifiers. A-D Maximum likelihood (ML) phylogenetic trees inferred from the representative sequences for all lineages: (A) 51 sequences of the glycoprotein (Gn) gene (490 bp), aligned with MAFFT, with the ML tree inferred under the GTR + I + G substitution model; (B) 47 sequences of the Small (S) segment (1690 bp); (C) 47 sequences of the Medium (M) segment (3885 bp); and (D) 47 sequences of the Large (L) segment (6404 bp). All the trees show similar topology for all the lineages
Validation/testing of the RVFV Typing tool to classify partial and whole genome sequences (n = 128) using glycoprotein sequences. The classification results were compared to manual phylogenetic analysis. Abbreviations as used in this table: TP True Positives, TN True Negatives, FP False Positives, FN False Negatives, TPR True Positive Rate, FPR False Positive Rate, ACC Accuracy
| Lineage | Known | TP | TN | FP | FN | TPR | FPR | ACC |
|---|---|---|---|---|---|---|---|---|
| A | 13 | 13 | 115 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| B | 1 | 1 | 127 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| C | 44 | 44 | 84 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| D | 1 | 1 | 127 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| E | 7 | 7 | 121 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| F | 1 | 1 | 127 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| G | 8 | 8 | 120 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| H | 12 | 12 | 116 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| I | 2 | 2 | 126 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| J | 1 | 1 | 127 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| K | 11 | 11 | 117 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| L | 10 | 10 | 118 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| M | 2 | 2 | 126 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| N | 13 | 13 | 115 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| O | 2 | 2 | 126 | 0 | 0 | 100.0 | 0.0 | 100.0 |
Validation/testing of the RVFV Typing tool to classify whole genome sequences (n = 234) using complete L-segment sequences. The classification results were compared to manual phylogenetic analysis. Abbreviations as used in this table: TP True Positives, TN True Negatives, FP False Positives, FN False Negatives, TPR True Positive Rate, FPR False Positive Rate, ACC Accuracy
| Lineage | Known | TP | TN | FP | FN | TPR | FPR | ACC |
|---|---|---|---|---|---|---|---|---|
| A | 10 | 11.0 | 223.0 | 1.0 | 0.0 | 100.0 | 0.45 | 99.57 |
| B | 1 | 1.0 | 233.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| C | 88 | 93.0 | 141.0 | 5.0 | 0.0 | 100.0 | 3.42 | 97.91 |
| D | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| E | 3 | 3.0 | 231.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| F | 2 | 1.0 | 233.0 | 0.0 | 1.0 | 50.0 | 0.0 | 99.57 |
| G | 2 | 2.0 | 232.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| H | 105 | 99.0 | 135.0 | 0.0 | 6.0 | 94.29 | 0.0 | 97.5 |
| I | 2 | 3.0 | 231.0 | 1.0 | 0.0 | 100.0 | 0.43 | 99.57 |
| J | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| K | 4 | 6.0 | 228.0 | 2.0 | 0.0 | 100.0 | 0.87 | 99.15 |
| L | 8 | 10.0 | 224.0 | 2.0 | 0.0 | 100.0 | 0.88 | 99.15 |
| M | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| N | 5 | 5.0 | 229.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| O | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
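The rate columns in the table above appear consistent with the standard confusion-matrix definitions, expressed as percentages rounded to two decimals: TPR = TP / (TP + FN), FPR = FP / (FP + TN) and ACC = (TP + TN) / (TP + TN + FP + FN). The short Python sketch below recomputes them for two rows of the L-segment table. The function name `confusion_rates` is illustrative, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than documented behaviour of the tool.

```python
from typing import Dict


def confusion_rates(tp: float, tn: float, fp: float, fn: float) -> Dict[str, float]:
    """Return TPR, FPR and ACC as percentages rounded to two decimals."""

    def pct(numerator: float, denominator: float) -> float:
        # Assumption: report 0.0 when the denominator is zero.
        if denominator == 0:
            return 0.0
        return round(100.0 * numerator / denominator, 2)

    return {
        "TPR": pct(tp, tp + fn),                   # sensitivity
        "FPR": pct(fp, fp + tn),                   # 100 - specificity
        "ACC": pct(tp + tn, tp + tn + fp + fn),    # overall accuracy
    }


# Counts copied from the L-segment table above (lineages A and H).
lineage_a = confusion_rates(tp=11, tn=223, fp=1, fn=0)
lineage_h = confusion_rates(tp=99, tn=135, fp=0, fn=6)
print(lineage_a)
print(lineage_h)
```

For these two rows the recomputed values match the TPR, FPR and ACC columns of the table.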
Validation/testing of the RVFV Typing tool to classify whole genome sequences (n = 234) using complete M-segment representative sequences. The classification results were compared to manual phylogenetic analysis. Abbreviations as used in this table: TP True Positives, TN True Negatives, FP False Positives, FN False Negatives, TPR True Positive Rate, FPR False Positive Rate, ACC Accuracy
| Lineage | Known | TP | TN | FP | FN | TPR | FPR | ACC |
|---|---|---|---|---|---|---|---|---|
| A | 10 | 12.0 | 222.0 | 2.0 | 0.0 | 100.0 | 0.89 | 99.15 |
| B | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| C | 88 | 89.0 | 145.0 | 1.0 | 0.0 | 100.0 | 0.68 | 99.57 |
| D | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| E | 3 | 3.0 | 231.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| F | 2 | 4.0 | 230.0 | 2.0 | 0.0 | 100.0 | 0.86 | 99.15 |
| G | 2 | 2.0 | 232.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| H | 105 | 102.0 | 132.0 | 0.0 | 3.0 | 97.14 | 0.0 | 98.73 |
| I | 2 | 4.0 | 230.0 | 2.0 | 0.0 | 100.0 | 0.86 | 99.15 |
| J | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| K | 4 | 5.0 | 229.0 | 1.0 | 0.0 | 100.0 | 0.43 | 99.57 |
| L | 8 | 8.0 | 226.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| M | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
| N | 5 | 5.0 | 229.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| O | 1 | 0.0 | 234.0 | 0.0 | 1.0 | 0.0 | 0.0 | 99.57 |
Validation/testing of the RVFV Typing tool to classify whole genome sequences (n = 234) using complete S-segment sequences. The classification results were compared to manual phylogenetic analysis. Abbreviations as used in this table: TP True Positives, TN True Negatives, FP False Positives, FN False Negatives, TPR True Positive Rate, FPR False Positive Rate, ACC Accuracy
| Lineage | Known | TP | TN | FP | FN | TPR | FPR | ACC |
|---|---|---|---|---|---|---|---|---|
| A | 10 | 11.0 | 223.0 | 1.0 | 0.0 | 100.0 | 0.45 | 99.57 |
| B | 1 | 1.0 | 233.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| C | 88 | 88.0 | 146.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| D | 1 | 1.0 | 233.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| E | 3 | 5.0 | 229.0 | 2.0 | 0.0 | 100.0 | 0.87 | 99.15 |
| F | 2 | 2.0 | 232.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| G | 2 | 0.0 | 234.0 | 0.0 | 2.0 | 0.0 | 0.0 | 99.15 |
| H | 105 | 103.0 | 131.0 | 0.0 | 2.0 | 98.1 | 0.0 | 99.15 |
| I | 2 | 2.0 | 232.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| J | 1 | 1.0 | 233.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| K | 4 | 5.0 | 229.0 | 1.0 | 0.0 | 100.0 | 0.43 | 99.57 |
| L | 8 | 7.0 | 227.0 | 0.0 | 1.0 | 87.5 | 0.0 | 99.57 |
| M | 1 | 1.0 | 233.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| N | 5 | 5.0 | 229.0 | 0.0 | 0.0 | 100.0 | 0.0 | 100.0 |
| O | 1 | 2.0 | 232.0 | 1.0 | 0.0 | 100.0 | 0.43 | 99.57 |
Validation/testing of the RVFV Typing tool to classify whole genome sequences (n = 234) using partial glycoprotein representative sequences. The classification results were compared to manual phylogenetic analysis. Abbreviations as used in this table: TP True Positives, TN True Negatives, FP False Positives, FN False Negatives, TPR True Positive Rate, FPR False Positive Rate, ACC Accuracy
| Lineage | Known | TP | TN | FP | FN | TPR | FPR | ACC |
|---|---|---|---|---|---|---|---|---|
| A | 10 | 10 | 224 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| B | 1 | 1 | 233 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| C | 88 | 88 | 146 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| D | 1 | 1 | 233 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| E | 3 | 3 | 231 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| F | 2 | 2 | 232 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| G | 2 | 2 | 232 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| H | 105 | 105 | 129 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| I | 2 | 2 | 232 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| J | 1 | 1 | 233 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| K | 4 | 4 | 230 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| L | 8 | 8 | 226 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| M | 1 | 1 | 233 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| N | 5 | 5 | 229 | 0 | 0 | 100.0 | 0.0 | 100.0 |
| O | 1 | 1 | 233 | 0 | 0 | 100.0 | 0.0 | 100.0 |
RVFV Typing tool lineage assignment analysis. Tabular results of the phylogenetic lineage assignment analysis of query sequences. The following terms are used: Query, sequence identifier/header in the FASTA file; Lineage, assigned/identified lineage of the query sequence; Bootstrap, ultrafast bootstrap approximation support value; Length, length of the nucleotide sequence; Year_first, year when the lineage was first reported; Year_last, year when the lineage was last reported; Countries, countries where the identified lineage has also been reported
| Query | Lineage | Bootstrap | Length | Year_first | Year_last | Countries |
|---|---|---|---|---|---|---|
| DVS-372 | C | 98 | 3885 | 1976 | 2016 | South Africa; Mauritania; Zimbabwe; Uganda; Somalia; Angola; Madagascar; Sudan; Saudi Arabia; Kenya |
| DVS-333 | C | 97 | 3885 | 1976 | 2016 | South Africa; Mauritania; Zimbabwe; Uganda; Somalia; Angola; Madagascar; Sudan; Saudi Arabia; Kenya |
| DVS-356 | C | 91 | 3885 | 1976 | 2016 | South Africa; Mauritania; Zimbabwe; Uganda; Somalia; Angola; Madagascar; Sudan; Saudi Arabia; Kenya |
| DVS-321 | C | 93 | 3885 | 1976 | 2016 | South Africa; Mauritania; Zimbabwe; Uganda; Somalia; Angola; Madagascar; Sudan; Saudi Arabia; Kenya |
| DVS-230 | C | 96 | 3885 | 1976 | 2016 | South Africa; Mauritania; Zimbabwe; Uganda; Somalia; Angola; Madagascar; Sudan; Saudi Arabia; Kenya |
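As a rough illustration of how a report like the one above could be post-processed, the Python sketch below reads the same rows from CSV text and keeps the assignments whose bootstrap support meets a threshold. The exact layout of the tool's CSV output is an assumption here (column names are taken from the table), the cutoff of 95 is an illustrative choice rather than a value stated in the source, and `well_supported` is a hypothetical helper.

```python
import csv
import io
from typing import List, Tuple

# Values copied from the table above; the CSV layout itself is assumed.
COUNTRIES = (
    "South Africa; Mauritania; Zimbabwe; Uganda; Somalia; Angola; "
    "Madagascar; Sudan; Saudi Arabia; Kenya"
)
HEADER = "Query,Lineage,Bootstrap,Length,Year_first,Year_last,Countries"
ROWS = [("DVS-372", 98), ("DVS-333", 97), ("DVS-356", 91),
        ("DVS-321", 93), ("DVS-230", 96)]
REPORT = "\n".join(
    [HEADER] + [f"{query},C,{boot},3885,1976,2016,{COUNTRIES}" for query, boot in ROWS]
)


def well_supported(report_text: str, min_bootstrap: int = 95) -> List[Tuple[str, str]]:
    """Return (query, lineage) pairs with bootstrap support >= min_bootstrap."""
    reader = csv.DictReader(io.StringIO(report_text))
    return [
        (row["Query"], row["Lineage"])
        for row in reader
        if int(row["Bootstrap"]) >= min_bootstrap
    ]


supported = well_supported(REPORT)
# The Countries field is a semicolon-separated list within a single CSV cell.
countries = [name.strip() for name in COUNTRIES.split(";")]
print(supported)
print(len(countries))
```

Lowering `min_bootstrap` to 90 would retain all five samples from the table, since their support values range from 91 to 98.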