Daniel H Huson, Sina Beier, Isabell Flade, Anna Górska, Mohamed El-Hadidi, Suparna Mitra, Hans-Joachim Ruscheweyh, Rewati Tappu.
Abstract
There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. Comparing such samples against a protein reference database generates billions of alignments, and analyzing these data is computationally challenging. To address this, we have substantially rewritten and extended our widely used microbiome analysis tool MEGAN to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program, MeganServer, that allows access to metagenome analysis files hosted on a server, we provide a straightforward yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available here: https://github.com/danielhuson/megan-ce.
Year: 2016 PMID: 27327495 PMCID: PMC4915700 DOI: 10.1371/journal.pcbi.1004957
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. (1) The compressed fastq files in a metagenome or metatranscriptome sequencing project are compared against a protein reference database such as NCBI-nr using DIAMOND. (2) Taxonomic and functional analysis is then performed on the resulting DIAMOND files using Meganizer. (3) The meganized DIAMOND files remain on the server and are accessed via the MeganServer software. (4) Researchers work interactively with the data using MEGAN CE.
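As a rough illustration of steps (1) and (2), the following Python sketch only assembles the two command lines for one sample; it does not run them. The file names and the helper `pipeline_commands` are hypothetical. The DIAMOND options shown (`--db`, `--query`, `--out`, `--outfmt 100` for DAA output, `--threads`) follow the current DIAMOND documentation and may differ from the version used in the paper. The Meganizer call is deliberately incomplete: it is assumed to be the `daa-meganizer` tool with its `--in` option, and the mapping-file options it also needs are omitted because they vary between MEGAN releases.

```python
from pathlib import Path


def pipeline_commands(fastq, database, threads=32):
    """Return (diamond_argv, meganizer_argv) for one compressed fastq file.

    Hypothetical helper: it builds argument lists only and runs nothing.
    """
    # Derive "sample.daa" from e.g. "sample.fastq.gz".
    daa = Path(fastq).name.split(".")[0] + ".daa"

    # Step (1): translated alignment of reads against a protein database.
    diamond = [
        "diamond", "blastx",
        "--db", database,
        "--query", str(fastq),
        "--out", daa,
        "--outfmt", "100",          # 100 = DIAMOND alignment archive (DAA)
        "--threads", str(threads),
    ]

    # Step (2): add taxonomic and functional classification to the DAA file.
    # Assumption: mapping-file options are required in practice but are
    # left out here because they differ between MEGAN versions.
    meganizer = ["daa-meganizer", "--in", daa]

    return diamond, meganizer


diamond_argv, meganizer_argv = pipeline_commands("alice_day0.fastq.gz", "nr")
print(" ".join(diamond_argv))
print(" ".join(meganizer_argv))
```

The argument lists could be passed to `subprocess.run` once the options have been checked against the installed tool versions.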
Fig 2. The new InterPro2GO viewer.
High-level nodes represent the metagenomic GO-slim [13], whereas low-level nodes are based on InterPro [5]. Here we have uncollapsed the GO “biological process” domain node to show the second tier nodes attached below it. Each node is labeled by a bar chart representing the number of reads assigned to the node, or below it, for 12 different human stool samples [16]. In this example, 27.5% of 816 million reads are assigned to an InterPro family by MEGAN CE.
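The phrase "reads assigned to the node, or below it" describes a summarized count: each node's own reads plus those of all its descendants. A minimal sketch of that bookkeeping follows, using a made-up three-level hierarchy and made-up counts. The node names, the `parent` and `assigned` dictionaries, and the function `summarized_counts` are illustrative assumptions, not MEGAN CE's data structures.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child -> parent (None marks the root).
parent = {
    "biological process": None,
    "metabolic process": "biological process",
    "transport": "biological process",
    "IPR_example_A": "metabolic process",
    "IPR_example_B": "metabolic process",
    "IPR_example_C": "transport",
}

# Reads assigned directly to each node (made-up numbers).
assigned = {
    "metabolic process": 5,
    "IPR_example_A": 40,
    "IPR_example_B": 25,
    "IPR_example_C": 10,
}


def summarized_counts(parent, assigned):
    """Return, for every node, the reads assigned to it or to any descendant."""
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    totals = {}

    def visit(node):
        # Own reads plus the summarized counts of all child subtrees.
        totals[node] = assigned.get(node, 0) + sum(visit(c) for c in children[node])
        return totals[node]

    for node, par in parent.items():
        if par is None:
            visit(node)
    return totals


totals = summarized_counts(parent, assigned)
print(totals["biological process"], totals["metabolic process"], totals["transport"])
```

With one such count per sample, a node's bar chart is simply the list of its summarized counts across samples.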
Fig 3. Spreadsheet for entry and analysis of metadata associated with samples.
Fig 4. A PCoA of 12 human gut samples [16] computed using species-level profiles and Bray-Curtis distances.
Samples are labeled by subject pseudonym, day (0–34), and whether antibiotics were taken (+) or not (-) on the given day. For both subjects, the plot shows the taxonomic profiles moving progressively further from the original during the course of antibiotics and then returning close to the original by the end of the study. The top five bi-vectors are also shown, labeled by species name.
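To make the computation behind such a plot concrete, here is a minimal sketch of Bray-Curtis dissimilarities followed by classical PCoA (double-centering the squared distances and taking the leading eigenvectors). It uses NumPy and a made-up 4 × 3 count matrix. The function names and the toy data are assumptions for illustration, and the code is not taken from MEGAN CE.

```python
import numpy as np


def bray_curtis(profiles):
    """Pairwise Bray-Curtis dissimilarities between the rows of a count matrix."""
    x = np.asarray(profiles, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = (x[i] + x[j]).sum()
            value = np.abs(x[i] - x[j]).sum() / denom if denom else 0.0
            d[i, j] = d[j, i] = value
    return d


def pcoa(dist, k=2):
    """Classical PCoA: coordinates of each sample on the first k axes."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    # Bray-Curtis is not Euclidean, so small negative eigenvalues can occur;
    # they are clipped to zero here.
    top = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top)


# Made-up species counts: rows are samples, columns are species.
toy_profiles = [
    [120, 30, 50],   # baseline
    [110, 35, 55],   # early treatment
    [10, 150, 40],   # strongly perturbed
    [115, 32, 53],   # after recovery
]

distances = bray_curtis(toy_profiles)
coords = pcoa(distances)
print(np.round(distances, 3))
print(np.round(coords, 3))
```

In this toy example the perturbed sample lies far from the baseline while the recovered sample lies close to it, the same qualitative pattern the caption describes.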
For twelve shotgun metagenome samples [16], we report (a) the number of reads, (b) wall-clock time required to align the reads against NCBI-nr using DIAMOND, (c) the number of matches obtained, (d) the number of reads that have at least one alignment and (e) the time required to run Meganizer to perform taxonomic and functional classification of all reads.
The total wall-clock time is 67 hours on a single server with 32 cores.
| Sample | (a) Reads | (b) DIAMOND (s) | (c) Alignments | (d) Aligned reads | (e) Meganizer (s) |
|---|---|---|---|---|---|
| Alice 0 | 66 393 401 | 19 062 | 627 405 772 | 44 900 227 | 9 299 |
| Alice 1 | 64 923 975 | 15 771 | 595 715 349 | 43 498 105 | 11 338 |
| Alice 3 | 55 092 349 | 13 435 | 515 249 349 | 37 675 494 | 8 621 |
| Alice 6 | 66 289 376 | 16 801 | 910 892 059 | 52 627 776 | 11 771 |
| Alice 8 | 57 957 661 | 14 134 | 790 946 244 | 45 358 448 | 13 911 |
| Alice 34 | 64 380 386 | 15 615 | 608 114 143 | 44 741 897 | 11 962 |
| Bob 0 | 61 232 588 | 14 573 | 825 213 917 | 48 882 884 | 12 058 |
| Bob 1 | 65 763 766 | 16 203 | 841 038 616 | 51 408 892 | 12 270 |
| Bob 3 | 89 034 641 | 34 598 | 1 233 571 041 | 72 017 720 | 15 789 |
| Bob 6 | 89 339 172 | 27 333 | 1 138 796 522 | 70 344 161 | 15 507 |
| Bob 8 | 78 001 118 | 19 734 | 1 049 831 855 | 63 336 241 | 13 423 |
| Bob 34 | 57 627 119 | 15 406 | 780 844 319 | 45 568 158 | 11 433 |
| Total | 816 035 552 | 222 665 | 9 917 619 186 | 620 360 003 | Max: 15 789 |
| Time | | ≈ 62 h | | | ≈ 5 h |
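
The summary figures can be re-derived from the totals row with a few lines of arithmetic. The sketch below copies the totals from the table; the variable names are illustrative. It takes the table at face value in using the maximum, not the sum, of the Meganizer times.

```python
# Totals copied from the table above.
total_reads = 816_035_552
diamond_seconds_total = 222_665
alignments = 9_917_619_186
aligned_reads = 620_360_003
meganizer_seconds_max = 15_789

# Convert the timing columns from seconds to hours.
diamond_hours = diamond_seconds_total / 3600
meganizer_hours = meganizer_seconds_max / 3600

# Derived rates.
aligned_fraction = aligned_reads / total_reads
alignments_per_aligned_read = alignments / aligned_reads
reads_per_second = total_reads / diamond_seconds_total

print(f"DIAMOND total:        {diamond_hours:.1f} h")
print(f"Meganizer maximum:    {meganizer_hours:.1f} h")
print(f"Reads with alignment: {aligned_fraction:.1%}")
print(f"Alignments per aligned read: {alignments_per_aligned_read:.1f}")
print(f"DIAMOND throughput:   {reads_per_second:.0f} reads/s")
```

The DIAMOND total comes to just under 62 hours and the longest Meganizer run to between 4 and 5 hours, consistent with the rounded values in the Time row. About three quarters of all reads have at least one alignment.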