Zurab Bzhalava, Emilie Hultin, Joakim Dillner.
Abstract
When human samples are sequenced, many assembled contigs are "unknown", as conventional alignments find no similarity to known sequences. Hidden Markov models (HMM) exploit the positions of specific nucleotides in protein-encoding codons in various microbes. The algorithm HMMER3 implements HMM using a reference set of sequences encoding viral proteins, "vFam". We used HMMER3 analysis of "unknown" human sample-derived sequences and identified 510 contigs distantly related to viruses (Anelloviridae (n = 1), Baculoviridae (n = 34), Circoviridae (n = 35), Caulimoviridae (n = 3), Closteroviridae (n = 5), Geminiviridae (n = 21), Herpesviridae (n = 10), Iridoviridae (n = 12), Marseillevirus (n = 26), Mimiviridae (n = 80), Phycodnaviridae (n = 165), Poxviridae (n = 23), Retroviridae (n = 6) and 89 contigs related to described viruses not yet assigned to any taxonomic family). In summary, we find that analysis using the HMMER3 algorithm and the "vFam" database greatly extended the detection of viruses in biospecimens from humans.
Year: 2018 PMID: 29351302 PMCID: PMC5774701 DOI: 10.1371/journal.pone.0190938
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
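As an illustration of the kind of screening the abstract describes, the following is a minimal sketch of how HMMER3 hits could be post-processed. It assumes the search was run in hmmscan style with HMMER3's `--tblout` option, so that the first column is the matched profile and the third is the contig. The function name `best_hits`, the E-value cutoff of 1e-5, and the example rows (including the `vFam_101`-style profile names) are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def best_hits(tblout_lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Return {query: (target, evalue)}, keeping the lowest E-value hit per query.

    Assumes HMMER3 --tblout layout: whitespace-separated fields where
    field 0 is the target name, field 2 the query name and field 4 the
    full-sequence E-value. The cutoff is an illustrative assumption.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue)
    return best


# Hypothetical rows in --tblout layout (not real results).
example = """\
# target name  accession  query name  accession  E-value  score  bias ...
vFam_101  -  contig_1  -  2.1e-12  45.3  0.1  3.0e-12  44.8  0.1  1.0  1  0  0  1  1  1  1  hypothetical
vFam_202  -  contig_1  -  4.0e-07  28.9  0.0  5.5e-07  28.4  0.0  1.0  1  0  0  1  1  1  1  hypothetical
vFam_303  -  contig_2  -  0.5  8.2  0.0  0.7  7.9  0.0  1.0  1  0  0  1  1  1  1  hypothetical
""".splitlines()

hits = best_hits(example)
for contig, (profile, evalue) in hits.items():
    print(contig, profile, evalue)
```

Keeping only the best-scoring profile per contig is one simple way to assign each contig to a single related family; other assignment rules are equally possible.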
Number of contigs classified into different taxonomy groups by blastn and blastx.
| Project ID | Bacteria | Human | Virus | Other |
|---|---|---|---|---|
| 2011_1 | 3134 | 3515 | 251 | 4967 |
| 2011_2 | 36824 | 106648 | 29 | 6689 |
| 2014_1 | 1863 | 9957 | 81 | 2961 |
| 2014_2 | 4670 | 7845 | 348 | 14344 |
| 2014_3 | 1521 | 25100 | 41 | 1805 |
| 2014_4 | 2002 | 52057 | 129 | 3110 |
| 2014_5 | 3801 | 72491 | 561 | 14662 |
| 2014_6 | 1172 | 5591 | 86 | 862 |
| 2014_7 | 826 | 626 | 7 | 398 |
| 2014_8 | 2635 | 58986 | 247 | 19891 |
| 2015_1 | 0 | 66454 | 0 | 0 |
| 2015_2 | 570 | 47123 | 26 | 741 |
| 2015_3 | 3638 | 63553 | 263 | 21602 |
| 2015_4 | 2118 | 103779 | 383 | 28617 |
| 2014_A1 | 989 | 1227 | 11 | 382 |
| 2015_5_LH | 0 | 206 | 0 | 0 |
| 2014_9 | 17975 | 1687633 | 353 | 5233 |
| 2014_14 | 0 | 1586 | 0 | 0 |
| 2014_15_SR | 25299 | 1178299 | 136 | 7612 |
| 2013_1 | 143 | 4792 | 2 | 604 |
| 2013_2 | 183 | 3740 | 1 | 147 |
| 2012_D3 | 0 | 21 | 0 | 0 |
| 2014K1 | 5 | 647 | 6 | 120 |
| 2014_10 | 275 | 458 | 0 | 194 |
| 2014_11 | 6 | 1128 | 0 | 5 |
| 2014_12 | 154 | 31718 | 12 | 31 |
| Total | 109803 | 3535180 | 2973 | 134977 |
Column “Other” includes contigs that were classified as plants, invertebrates, synthetic, etc. We consider these to be low-quality contigs.
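To put the "Total" row of the blastn/blastx table in perspective, the share of each taxonomy group among all classified contigs can be computed directly from the four totals. This is only a small arithmetic sketch; the variable names are illustrative.

```python
# Totals copied from the "Total" row of the blastn/blastx table.
totals = {"Bacteria": 109803, "Human": 3535180, "Virus": 2973, "Other": 134977}

grand_total = sum(totals.values())

# Fraction of all classified contigs that falls into each group.
shares = {group: count / grand_total for group, count in totals.items()}

for group, share in shares.items():
    print(f"{group}: {share:.4%}")
```

With these totals, human contigs make up well over 90% of the classified contigs, while contigs recognised as viral by conventional alignment account for less than 0.1%.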
Number of contigs, classified as virus-related by HMM, stratified by related virus family and types of samples.
FFPE: Formalin-fixed paraffin-embedded tissue specimens.
| Related Family | Mouth | Cervix | Condyloma | Prostate secretions | Skin (FFPE) | Saliva | Serum | Skin (Fresh) |
|---|---|---|---|---|---|---|---|---|
| Anelloviridae | 2 | 0 | 0 | 0 | 0 | 0 | 9 | 0 |
| Baculoviridae | 27 | 0 | 202 | 3 | 0 | 9 | 11 | 4 |
| Caulimoviridae | 1 | 0 | 0 | 0 | 0 | 0 | 10 | 0 |
| Circoviridae | 31 | 0 | 250 | 12 | 4 | 29 | 2 | 1 |
| Closteroviridae | 89 | 0 | 198 | 0 | 1 | 11 | 7 | 46 |
| Geminiviridae | 7 | 0 | 18 | 0 | 0 | 0 | 11 | 0 |
| Herpesviridae | 16 | 0 | 386 | 6 | 0 | 7 | 7 | 16 |
| Iridoviridae | 11005 | 1 | 190 | 62 | 2 | 6 | 109338 | 22191 |
| Marseillevirus | 155 | 2 | 598 | 7 | 1 | 15 | 18 | 26 |
| Mimiviridae | 556 | 0 | 951 | 10 | 315 | 18 | 102 | 102 |
| Phycodnaviridae | 1162 | 24 | 1223 | 59 | 561 | 266 | 4905 | 235 |
| Poxviridae | 67 | 0 | 129 | 0 | 0 | 5 | 6 | 7 |
| Retroviridae | 82 | 0 | 0 | 2 | 0 | 3 | 590 | 109 |
| Unassigned | 11591 | 111 | 871 | 27 | 37383 | 10215 | 7787 | 81 |
| Total | 24791 | 138 | 5016 | 188 | 38267 | 10584 | 122803 | 22818 |
Fig 1. Maximum likelihood phylogenetic tree (PhyML v3.0, www.atgc-montpellier.fr/phyml/) based on the RCR Rep proteins from GenBank and 21 previously undescribed Rep proteins related to Circoviridae that were found in the present study (shown in black with the prefix SE).
Number of different contigs detected, by the most related virus families identified using HMM and by existence of typical protein sequence motifs.
| Related Family | Tetratricopeptide repeat | Leucine Rich Repeat | Helix-turn-helix | Ankyrin repeat | Methyltransferase domain | Helicases | Rep-like domain | FtsK/SpoIIIE family | Reverse transcriptase | Satellite tobacco necrosis virus coat protein |
|---|---|---|---|---|---|---|---|---|---|---|
| Baculoviridae | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 |
| Caulimoviridae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Circoviridae | 0 | 0 | 7 | 0 | 0 | 0 | 21 | 6 | 1 | 0 |
| Closteroviridae | 0 | 0 | 9 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| Geminiviridae | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 9 |
| Herpesviridae | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Iridoviridae | 15 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 3 | 0 |
| Malacoherpesviridae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Marseillevirus | 9 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mimiviridae | 0 | 53 | 20 | 27 | 10 | 6 | 0 | 1 | 0 | 0 |
| Phycodnaviridae | 15 | 0 | 38 | 10 | 16 | 18 | 0 | 1 | 0 | 0 |
| Poxviridae | 0 | 43 | 1 | 0 | 3 | 2 | 0 | 0 | 0 | 0 |
| Retroviridae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Unassigned | 0 | 0 | 5 | 25 | 4 | 3 | 0 | 0 | 0 | 21 |