Martin Norling, Oskar E Karlsson-Lindsjö, Hadrien Gourlé, Erik Bongcam-Rudloff, Juliette Hayer.
Abstract
Metagenomics, the sequence characterization of all genomes within a sample, is widely used as a virus discovery tool as well as a tool to study the viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and finally bioinformatics. Bioinformatic analysis of metagenomic datasets is in itself a complex process involving few standardized methodologies, which hampers comparison of metagenomics studies between research groups. In this publication the new bioinformatics framework MetLab is presented, aimed at providing scientists with an integrated tool for experimental design and analysis of viral metagenomes. MetLab supports the design of a metagenomics experiment by estimating the sequencing depth needed for the complete coverage of a species. This is achieved by calculating the probability of coverage using an adaptation of Stevens' theorem. It also provides scientists with several pipelines aimed at simplifying the analysis of viral metagenomes, including quality control, assembly and taxonomic binning. We also implement a tool for simulating metagenomics datasets from several sequencing platforms. The overall aim is to provide virologists with an easy-to-use tool for designing, simulating and analyzing viral metagenomes. The results presented here include a benchmark against other existing software, with emphasis on detection of viruses as well as speed of applications. MetLab is packaged as comprehensive software, readily available for Linux and OSX users at https://github.com/norling/metlab.
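As a rough illustration of the coverage estimate mentioned in the abstract, the sketch below evaluates the classical form of Stevens' theorem: the probability that n arcs of equal length, placed uniformly at random on a circle, cover it completely. It assumes a circular genome and fixed-length reads; the abstract does not describe how MetLab adapts the theorem, so this is not MetLab's implementation. The function names `coverage_probability` and `min_reads_for_coverage` are hypothetical.

```python
from fractions import Fraction
from math import comb


def coverage_probability(n_reads: int, read_len: int, genome_len: int) -> float:
    """Classical Stevens formula (assumption: circular genome, equal-length reads).

    P = sum_{k=0}^{floor(1/a)} (-1)^k * C(n, k) * (1 - k*a)^(n-1),
    where a = read_len / genome_len is the arc length as a fraction of the circle.
    """
    if n_reads < 1:
        return 0.0
    if read_len >= genome_len:
        return 1.0  # a single read already spans the whole genome
    a = Fraction(read_len, genome_len)
    # Terms with k > n vanish because C(n, k) = 0, so the sum can stop at n.
    k_max = min(n_reads, genome_len // read_len)
    total = Fraction(0)
    for k in range(k_max + 1):
        # Exact rational arithmetic avoids cancellation in the alternating sum.
        total += (-1) ** k * comb(n_reads, k) * (1 - k * a) ** (n_reads - 1)
    return float(total)


def min_reads_for_coverage(read_len: int, genome_len: int,
                           target: float = 0.95, max_reads: int = 100_000) -> int:
    """Smallest read count whose full-coverage probability reaches `target`."""
    for n in range(1, max_reads + 1):
        if coverage_probability(n, read_len, genome_len) >= target:
            return n
    raise ValueError("target probability not reached within max_reads")


if __name__ == "__main__":
    # Hypothetical toy genome: 1,000 bp covered by 100 bp reads.
    print(min_reads_for_coverage(read_len=100, genome_len=1000, target=0.95))
```

The exact-fraction arithmetic keeps the alternating sum stable but is only practical for modest read counts; a real experiment with millions of reads and mixed species abundances would need an approximation and a per-species scaling that this sketch does not attempt.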
Year: 2016 PMID: 27479078 PMCID: PMC4968819 DOI: 10.1371/journal.pone.0160334
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Statistics of the simulated reads: quality filtering and de novo assembly.
| NGS profile | IonProton | | IonTorrent200 | | IonTorrent400 | | IonTorrent | |
|---|---|---|---|---|---|---|---|---|
| Species Distribution (200 viruses) | | | | | | | | |
| Prinseq quality filtering | 15,399,727 | 14,553,370 | 591,020 | 610,006 | 411,304 | 462,169 | 2,521,607 | 2,623,306 |
| | 144.64 | 144.65 | 244.1 | 243.89 | 325.19 | 325.52 | 198.88 | 198.71 |
| | 84.64% | 85.73% | 88.49% | 88.45% | 83.88% | 83.86% | 85.87% | 85.89% |
| | 153.68 | 152.92 | 237.54 | 237.48 | 336.88 | 336.97 | 215.14 | 215.14 |
| Ray | 2,455 | 3,521 | 1,953 | 7,533 | 2,659 | 8,889 | 1,111 | 3,075 |
| | 2,939,578 | 6,220,833 | 2,361,974 | 6,777,218 | 2,269,692 | 6,662,688 | 1,655,583 | 6,146,569 |
| | 1,197 | 1,766 | 1,209 | 899 | 853 | 749 | 1,490 | 1,998 |
| | 24,608 | 25,523 | 11,761 | 2,077 | 3,566 | 1,350 | 25,150 | 12,567 |
| | 171,369 | 167,708 | 93,770 | 59,708 | 159,613 | 49,652 | 137,229 | 93,761 |
| | 89.93 | 95.93 | 98.2 | 94.37 | 97.98 | 91.51 | 31.19 | 91.35 |
Comparison of time and computing resources used by the compared binning methods.
| | Kraken (superDB) | Kraken (minisuperDB) | RAIphy | FCP Blastn+LCA | Diamond (Megan) | Blastx (Megan & ProViDE) | NBC |
|---|---|---|---|---|---|---|---|
| | 33 mins | < 1 min | 30 mins | 75 mins | 3.3 hrs | > 5 days | NA |
| | 78G | 4.2G | 3G | 2.5G | 9.5G | 10G | NA |
| | 1 | 1 | 1 | 8 | 4 | 8 | NA |
*These data are not available for NBC as it was run online.
Fig 1. Comparison of the binning methods.
The represented percentages are an average of validations obtained with the 8 simulated datasets.
Viral sequences detected in the Spanish Honeybees dataset.
Comparison of the number of reads classified as viruses by Granberg et al. (Blastn-LCA method) and the number of reads classified as viruses by MetLab with Kraken and vFam methods.
| Taxon | Granberg et al.: Blastn-LCA | MetLab: Kraken | MetLab: vFam | MetLab total |
|---|---|---|---|---|
| Secoviridae | 1968 (TuRSV) | 936 (TuRSV) | 279 | 1215 |
| Dicistroviridae | 1048 (IAPV) | 583 (IAPV) | 0 | 583 |
| Dicistroviridae | 664 (ALPV) | 878 (ALPV) | 0 | 878 |
| Tymoviridae | 563 (TYMV) | 0 | 206 | 206 |
| Caudovirales (Phages) | 30 | 22 | 0 | 22 |
| Retroviridae | 16 | 68 | 0 | 68 |
| Lake Sinai Virus | 14 | 38 | 0 | 38 |
| Baculoviridae | 0 | 11 | 535 | 546 |
| Phycodnaviridae | 0 | 8 | 193 | 201 |
| Others | 7 | 769 | 613 | 1382 |
TuRSV: Turnip Ringspot Virus, IAPV: Israeli Acute Paralysis Virus, ALPV: Aphid Lethal Paralysis Virus, TYMV: Turnip Yellow Mosaic Virus.
Fig 2. Main workflow of the MetLab analysis pipelines.
Fig 3. MetLab GUI: the experimental design module.