Efthymios Ladoukakis, Fragiskos N Kolisis, Aristotelis A Chatziioannou.
Abstract
The rapid evolution of sequencing technologies, collectively described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with sensitive measuring techniques in order to potentially discover, properly assemble, and map allelic sequences to the correct genomes, achieving particularly high yields at only a fraction of the cost of traditional processes (i.e., Sanger sequencing). From a bioinformatic perspective, this means that many gigabytes of data are generated by each single sequencing experiment, making data management, or even storage, critical bottlenecks for the overall analytical endeavor. The complexity is further aggravated by the variety of processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, and they demand a high level of expertise for their proper application or for the clean implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires extensive computational resources, which makes the use of cloud computing infrastructures the sole realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, critically assessing the available automated pipelines for data management, quality control, and annotation of metagenomic data across various major sequencing technologies and applications.
Keywords: bioinformatics; cloud computing; distributed computing; metagenomics; workflow engines
Year: 2014 PMID: 25478562 PMCID: PMC4237130 DOI: 10.3389/fcell.2014.00070
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
Figure 1. Typical workflow for analysis of metagenomic sequencing data.
Figure 2. Raw sequence reads in FASTA format.
Figure 3. Raw sequence reads in FASTQ format.
Figure 4. Distribution of quality scores of raw sequence reads from FASTQC software.
Figure 5. Taxonomic sorting of sequencing reads from MEGAN software (rank level: “species”).
Display of features of current bioinformatic pipelines for metagenomic data analysis.
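To make the FASTQ format and the per-read quality scores named in the figure captions concrete, the following is a minimal sketch of how raw reads might be parsed and summarized. It is not the method of FASTQC or of any pipeline reviewed in the article. It assumes four-line, unwrapped FASTQ records and Phred+33 quality encoding, and the function names (`read_fastq`, `phred_scores`, `mean_quality_per_read`) are hypothetical, chosen for illustration only.

```python
from io import StringIO
from statistics import mean

# Assumption: qualities use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records.

    Assumes unwrapped records: header, sequence, '+' separator, quality.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality):
    """Decode a quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def mean_quality_per_read(handle):
    """Map each read identifier to the mean Phred score of its bases."""
    return {rid: mean(phred_scores(q)) for rid, _, q in read_fastq(handle)}


# Two made-up reads, used only to exercise the functions above.
example = StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+\n!!II\n"
)
result = mean_quality_per_read(example)
print(result)
```

A per-read mean of this kind is one simple input to a quality-control step, for example to discard reads below a chosen threshold. Dedicated tools report considerably richer statistics, such as per-position score distributions.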