Stuart M Brown, Hao Chen, Yuhan Hao, Bobby P Laungani, Thahmina A Ali, Changsu Dong, Carlos Lijeron, Baekdoo Kim, Claudia Wultsch, Zhiheng Pei, Konstantinos Krampis.
Abstract
BACKGROUND: Current methods used for annotating metagenomics shotgun sequencing (MGS) data rely on a computationally intensive and low-stringency approach of mapping each read to a generic database of proteins or reference microbial genomes.
Keywords: Docker; Galaxy; annotation; cloud computing; metagenomics
Year: 2019 PMID: 30942867 PMCID: PMC6446249 DOI: 10.1093/gigascience/giz020
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Multidimensional scaling (MDS) abundance plot generated with the R package edgeR, using a mixture model with a negative binomial distribution, for the KEGG annotations produced by MGS-Fast from gut microbiome data of healthy individuals (HV) and patients with liver cirrhosis (LD).
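To make the MDS idea behind Figure 1 concrete, here is a minimal, stdlib-only Python sketch of classical (Torgerson) MDS on log2 counts-per-million (CPM) values computed from a sample-by-KEGG-feature count matrix. It is a simplified stand-in, not edgeR's method: edgeR's own `plotMDS` computes distances differently (by default from the leading log-fold-changes between pairs of samples). The pseudocount `prior`, the function names, and the toy count matrix are assumptions for illustration only.

```python
import math
import random


def log_cpm(counts, prior=0.5):
    """log2 counts-per-million per sample (row); `prior` is an assumed pseudocount avoiding log2(0)."""
    result = []
    for row in counts:
        lib_size = sum(row)
        result.append([math.log2((c + prior) / lib_size * 1e6) for c in row])
    return result


def classical_mds(points, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS via double centering and power iteration with deflation."""
    n = len(points)
    # Squared Euclidean distances between samples.
    d2 = [[sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
          for i in range(n)]
    # Double centering: B = -1/2 * J * D2 * J, where J = I - (1/n) * ones * ones^T.
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract in this dimension
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; B is positive semidefinite, so lam >= 0 up to rounding.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

With rows as samples (for example HV and LD microbiomes) and columns as KEGG features, `classical_mds(log_cpm(counts))` returns one 2-D point per sample that can be plotted. Samples with similar functional profiles land close together.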
Processing times for the MGS-Fast pipeline compared with other workflows used for WGS metagenomic analysis (Kraken, HUMAnN2, GOTTCHA)
| Workflow | Data set ID: no. of reads × 2 (paired-end) | Data quality control (FastQC) (min) | Data preparation (Groomer) (min) | Trimming (Trimmomatic) (min) | Human filtering (Bowtie) (min) | Taxonomic classification (MetaPhlAn) (min) | Annotation (Bowtie IGC) (min) | KEGG count (min) | Total excluding preprocessing (min) |
|---|---|---|---|---|---|---|---|---|---|
| MGS-Fast | ERR526291: 15,181,542 × 2 | 2 | 6 | 2 | 6 | 23 | 22 | 4 | 49 (16) |
| Kraken | | N/A | N/A | N/A | N/A | 254 (entire tool run) | | | 254 |
| HUMAnN2 | | N/A | N/A | N/A | N/A | 162 (entire tool run) | | | 162 |
| GOTTCHA | | N/A | N/A | (1*) | N/A | 17 (entire tool run) | | | 17 |
The total time for the data preprocessing steps performed by MGS-Fast (quality control of the metagenomic data and filtering of host DNA) is given in parentheses. All times were measured with 8 threads. N/A: not applicable. (1*): GOTTCHA is the only tool with a built-in trimming option.
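The "49 (16)" entry for MGS-Fast can be reproduced from its per-step times: the three analysis steps (MetaPhlAn 23 + Bowtie IGC annotation 22 + KEGG count 4) sum to 49 min, and the four preprocessing steps (2 + 6 + 2 + 6) sum to 16 min. The short Python sketch below does that arithmetic and, purely as an illustration, expresses the other workflows' reported totals as multiples of the MGS-Fast core time. The dictionary keys are hypothetical labels, and the ratios are simple arithmetic on the reported numbers, not a new benchmark.

```python
# Hypothetical labels; per-step times (min) copied from the MGS-Fast row of the table.
MGS_FAST_STEPS = {
    "fastqc": 2,
    "groomer": 6,
    "trimmomatic": 2,
    "human_filtering": 6,
    "metaphlan": 23,
    "bowtie_igc_annotation": 22,
    "kegg_count": 4,
}
# Steps that the table footnote counts as data preprocessing.
PREPROCESSING = {"fastqc", "groomer", "trimmomatic", "human_filtering"}
# Totals reported for the other workflows (min).
OTHER_TOTALS = {"Kraken": 254, "HUMAnN2": 162, "GOTTCHA": 17}


def split_totals(steps):
    """Return (core analysis time, preprocessing time) in minutes."""
    pre = sum(t for name, t in steps.items() if name in PREPROCESSING)
    core = sum(t for name, t in steps.items() if name not in PREPROCESSING)
    return core, pre


core, pre = split_totals(MGS_FAST_STEPS)
print(f"MGS-Fast: {core} ({pre}) min")  # MGS-Fast: 49 (16) min
for tool, total in OTHER_TOTALS.items():
    # A ratio above 1 means the workflow took longer than the MGS-Fast core steps.
    print(f"{tool}: {total} min = {total / core:.1f}x MGS-Fast core time")
```

This prints ratios of 5.2x for Kraken, 3.3x for HUMAnN2, and 0.3x for GOTTCHA.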
Figure 2: Processing times for all MGS-Fast pipeline steps when used for analysis of different patient metagenomic data sets ranging from 1.5 to 11 GB in size.
Figure 3: Comparison of running times for MGS-Fast, Kraken, and DIAMOND, with each tool using a database built from the same IGC sequence set. All input samples, compiled databases, and outputs are available at http://146.95.173.35:9988/MGS-FAST/.
Bowtie 2 alignment rates of different metagenomic samples against our IGC/HOMD-derived database
| Metagenome | Sample accession/source | Alignment to database (%) |
|---|---|---|
| Human gut | SRR2822459 | 95.62 |
| Human gut, liver disease | ENA ERP005860 | 96.03 |
| Mouse gut | MG-RAST 4535626.3 | 89.71 |
| Human mouth | SRS016533 | 82.32 |
| Human esophagus | SRS065335 | 71.39 |
| Human vagina | SRS014465 | 66.39 |
| Human skin | SRR1646957 | 33.02 |
| Human genome GRCh38 | MetaSim simulated | 7.35 (false positives) |
| | MetaSim simulated | 98.50 |
| HMP Mock | SRR172902 | 28.82 |
| Synthetic microbial reads | SRR3732372 | 10.23 |
| Copper mine waste | MG-RAST 4664533.3 | 8.69 |
| Randomly generated reads | XS simulator | 0.53 |
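The percentages in this table are alignment rates: the share of reads that Bowtie 2 placed on the database. Assuming the alignments are available as SAM text, a minimal stdlib Python sketch of that calculation follows. It counts each primary SAM record (each mate, for paired-end data) once and treats a record as aligned when FLAG bit 0x4 is unset. `alignment_rate` is a hypothetical helper, and Bowtie 2's own printed summary may count paired-end reads somewhat differently.

```python
def alignment_rate(sam_lines):
    """Percent of primary SAM records that aligned (FLAG bit 0x4 unset)."""
    total = aligned = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.rstrip("\n").split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:
            aligned += 1
    return 100.0 * aligned / total if total else 0.0
```

For example, `alignment_rate(open("sample.sam"))` (file name hypothetical) returns a value comparable to the "Alignment to database (%)" column. Applied to reads that should not match the database, such as the simulated GRCh38 reads, the same number is read as a false-positive rate.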
Figure 4: (A) The MGS-Fast pipeline on the Galaxy workflow canvas, running in a Docker container. Read quality tools are outlined in red, score adjustment and trimming in blue, Bowtie 2 alignment to the IGC/HOMD or human reference in green, MetaPhlAn analysis in orange, and annotation parsing from the Bowtie 2 results in yellow. (B) Interface of the MGS-Fast pipeline on the Galaxy web server running in the Docker container. Users select the input data and pipeline parameters through drop-down menus and input boxes (details in the Supplementary Software Manual). (C) Output of the MetaPhlAn tool, visualized within the Galaxy web interface.
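Panel A ends with a step that parses annotations from the Bowtie 2 results, and the timing table lists a separate "KEGG count" step. The following is a minimal Python sketch of that kind of tally, not the MGS-Fast parsing code. It assumes a lookup table from IGC gene IDs (the SAM reference name) to KEGG orthology (KO) IDs is available; `kegg_counts`, `gene_to_ko`, and the example IDs are hypothetical.

```python
from collections import Counter


def kegg_counts(sam_lines, gene_to_ko):
    """Count primary aligned reads per KEGG orthology (KO) ID.

    gene_to_ko is an assumed dict mapping reference gene IDs (SAM RNAME) to KO IDs.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, gene = int(fields[1]), fields[2]
        if flag & (0x4 | 0x100 | 0x800):
            continue  # skip unaligned, secondary and supplementary records
        ko = gene_to_ko.get(gene)
        if ko is not None:  # genes without a KO annotation are ignored
            counts[ko] += 1
    return counts
```

Counting only primary records means each read contributes at most once. Genes annotated with several KOs, or reads with equally good hits to multiple genes, would need an explicit policy that this sketch does not implement.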