Michael Vilsker, Yumna Moosa, Sam Nooij, Vagner Fonseca, Yoika Ghysens, Korneel Dumon, Raf Pauwels, Luiz Carlos Alcantara, Ewout Vanden Eynden, Anne-Mieke Vandamme, Koen Deforche, Tulio de Oliveira.
Abstract
SUMMARY: Genome Detective is an easy-to-use web-based software application that assembles the genomes of viruses quickly and accurately. The application uses a novel alignment method that constructs genomes by reference-based linking of de novo contigs, combining amino acid and nucleotide scores. The software was optimized using synthetic datasets to represent the great diversity of virus genomes. The application was then validated with next-generation sequencing data of hundreds of viruses. User time is minimal and is limited to the time required to upload the data.
Year: 2019 PMID: 30124794 PMCID: PMC6524403 DOI: 10.1093/bioinformatics/bty695
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Validation datasets
| Publication | PMID | Description | Number of datasets | Expected number of viruses | Assigned number of viruses | Average reconstructed genome size (%) | Number of additional viruses |
|---|---|---|---|---|---|---|---|
| 1 | 26559140 | Synthetic virome | 8 | 64 | 57 | 92 | 9 |
| 2 | Pending (bioRxiv) | Single virus: HIV | 14 | 13 | 13 | 93 | 1 |
| Unpublished | PRJNA434385 (SRA) | Single virus: HIV | 94 | 94 | 94 | 95 | 15 |
| 3 | 25609811 | Single virus: RSV | 12 | 12 | 12 | 98 | 1 |
| 4 | 25056894 | Single virus: norovirus | 12 | 12 | 12 | 99 | 7 |
| 5 | 26071329 | Single virus: influenza | 10 | 10 | 10 (80 segments) | 94 | 26 |
| 6 | 24055451 | Single virus: MERS | 14 | 14 | 14 | 94 | 0 |
| 7 | 28748110 | Metagenomic: pig fecal | 20 | 20 | 20 (220 segments) | 90 | 143 |
| 8 | 24695106 | Metagenomic: human fecal | 20 | 66 | 25 | 83 | 35 |
Note: For the validation of Genome Detective (GD) we used 204 datasets from seven studies. This table lists the PMID of each publication, a description of the data, the number of datasets, the number of viruses originally identified, the number of viruses for which GD reconstructed whole genomes (i.e. >80% of the whole genome) and the number of viruses that GD additionally detected (i.e. <80% of the whole genome). Detailed information, such as the list of SRA files and the full results, is available in the Supplementary Material.
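
The counting rule in the note can be illustrated with a minimal Python sketch. It is not part of Genome Detective: the names `DATASETS_PER_ROW`, `WHOLE_GENOME_THRESHOLD` and `classify` are hypothetical, and the handling of a genome reconstructed to exactly 80% is an assumption, since the note only defines the >80% and <80% cases. The dataset counts are copied from the table rows in order.

```python
# Number of datasets in each row of the validation table, in table order.
DATASETS_PER_ROW = [8, 14, 94, 12, 12, 10, 14, 20, 20]

# Threshold from the table note: >80% of the genome counts as a whole genome.
WHOLE_GENOME_THRESHOLD = 80.0


def classify(coverage_percent: float) -> str:
    """Label a detected virus by the share of its genome that was reconstructed."""
    if coverage_percent > WHOLE_GENOME_THRESHOLD:
        return "assigned"  # whole genome reconstructed
    # Assumption: exactly 80% is grouped with the additional detections.
    return "additional"  # detected, but below the whole-genome threshold


# The per-row counts add up to the total stated in the note.
total_datasets = sum(DATASETS_PER_ROW)
print(total_datasets)
print(classify(92.0), classify(45.0))
```

The sum of the per-row counts reproduces the 204 datasets stated in the note.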