Tiphaine C Martin1,2, Alessia Visconti1, Tim D Spector3, Mario Falchi4.
Abstract
Owing to the increased cost-effectiveness of high-throughput technologies, the number of studies focusing on the human microbiome and its connections to human health and disease has recently surged. However, best practices in microbiology and clinical research have yet to be clearly established. Here, we present an overview of the challenges and opportunities involved in conducting a metagenomic study, with a particular focus on data processing and analytical methods.
Keywords: Human microbiome; Metagenomics; Microbiology and clinical research; Next generation sequencing
Year: 2018 PMID: 30078138 PMCID: PMC6153607 DOI: 10.1007/s00253-018-9209-9
Source DB: PubMed Journal: Appl Microbiol Biotechnol ISSN: 0175-7598 Impact factor: 4.813
Fig. 1 Metagenomics analysis pipeline. Hexagons represent the analysis steps. Rectangles and parallelepipeds denote the output data and reports, respectively. Cylinders represent additional data to be provided as input.
Resource usage. The reported figures were obtained by applying the proposed metagenomics pipeline to 842 raw paired-end FASTQ files with an average of 26M reads per sample. Experiments were run on an HPC facility using 4 threads and limiting the available RAM to a maximum of 32 GB.
| Step | Data format | Tool | Virtual memory peak (average [min–max]) | Time (average [min–max]) | Storage (average [min–max]) |
|---|---|---|---|---|---|
| Raw data | (Compressed) FASTQ | – | – | – | 4.48 GB [1.45–9.32 GB] |
| Quality assessment | html + text | FastQC | 385.53 MB [326.00–492.90 MB] | 4 min 12 s [1 min 49 s–7 min 28 s] | 1.05 MB [0.80–1.24 MB] |
| De-duplication | (Compressed) FASTQ | Clumpify | 27.74 GB [18.10–31.40 GB] | 16 min 35 s [6 min 12 s–38 min 50 s] | 3.26 GB [1.05–8.18 GB] |
| Trimming | FASTQ | BBduk | 8.43 GB [8.00–10.90 GB] | 9 min 11 s [3 min 15 s–25 min 24 s] | 13.81 GB [5.23–27.42 GB] |
| Decontamination | FASTQ | BBwrap | 16.82 GB [15.50–23.90 GB] | 30 min 13 s [6 min–59 min 6 s] | 13.80 GB [5.23–27.40 GB] |
| Taxonomic binning and profiling | text + biom | MetaPhlan2 | 1.52 GB [1.30–2.50 GB] | 18 min 4 s [2 min–35 min 21 s] | 93.12 MB [21.77–281.74 MB] |
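
The average timings in the table can be combined into a rough per-sample and whole-cohort estimate. The following is a minimal sketch, not part of the original study: it assumes that the five processing steps run one after another for each sample, that the average times can simply be summed, and that the 842 FASTQ files correspond to 842 independent runs. The names `AVERAGE_STEP_TIMES`, `N_RUNS`, and the helper functions are illustrative only.

```python
# Average wall-clock time per step as (minutes, seconds), copied from the table.
AVERAGE_STEP_TIMES = {
    "Quality assessment (FastQC)": (4, 12),
    "De-duplication (Clumpify)": (16, 35),
    "Trimming (BBduk)": (9, 11),
    "Decontamination (BBwrap)": (30, 13),
    "Taxonomic binning and profiling (MetaPhlan2)": (18, 4),
}

# Assumption: each of the 842 FASTQ files is processed as one independent run.
N_RUNS = 842


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair into seconds."""
    return minutes * 60 + seconds


def total_seconds(step_times):
    """Sum the average time of every step, assuming sequential execution."""
    return sum(to_seconds(m, s) for m, s in step_times.values())


def format_duration(seconds):
    """Render a duration in seconds as 'H h M min S s'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"


per_run_seconds = total_seconds(AVERAGE_STEP_TIMES)
cohort_hours = per_run_seconds * N_RUNS / 3600

print("Average time per run:", format_duration(per_run_seconds))
print(f"Estimated total for {N_RUNS} runs: {cohort_hours:.1f} h (if run serially)")
```

Under these assumptions, the estimate is an upper bound on elapsed time for a serial schedule; on an HPC facility, runs would normally be dispatched in parallel, so the elapsed time would be correspondingly shorter.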