Léa Siegwald1,2,3,4, Christophe Audebert1,2, Gaël Even1,2, Eric Viscogliosi4, Ségolène Caboche2,4, Magali Chabé4.
Abstract
In the past decade, metagenomics studies have become widespread due to the arrival of second-generation sequencing platforms characterized by low costs, high throughput and short read lengths. Today, although benchtop sequencers are considered to be accurate platforms to deliver data for targeted metagenomics studies, the limiting factor has become the analysis of these data. In a previous paper, we performed an Ion Torrent PGM 16S rDNA gene sequencing of faecal DNAs from 48 Blastocystis-colonized patients and 48 Blastocystis-negative subjects, in order to decipher the impact of this widespread protist on gut microbiota composition and diversity. We report here on the Ion Torrent targeted metagenomic sequencing and analysis of these 96 human faecal samples, and the complete datasets from raw to analysed data. We also provide the key steps of the bioinformatic analyses, from library preparation to data filtering and OTU table generation. These data represent a valuable resource for the scientific community, enabling re-processing of these targeted metagenomic datasets through various pipelines and a comparative evaluation of microbiota analysis methods.
Year: 2017 PMID: 28654083 PMCID: PMC5486356 DOI: 10.1038/sdata.2017.81
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Schematic representation of the home-made bioinformatics pipeline for the analysis of targeted metagenomic Ion Torrent sequencing data, which includes several publicly available tools (e.g., Mothur[10], EspritTree[12], QIIME[17] and DESeq2[16]), databases (the Silva small subunit rRNA database[11] and the Ribosomal Database Project (RDP)[13]) and home-made Perl/Python scripts.
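The final stage of a pipeline like the one in Figure 1 turns per-read OTU assignments into a count table. The Python sketch below is illustrative only: it assumes the clustering/classification steps (e.g., EspritTree or Mothur) have already produced hypothetical (sample, OTU) pairs, one per read, and it is not the authors' home-made script. It counts reads per OTU per sample and writes a TSV with OTUs as rows and samples as columns.

```python
from collections import Counter, defaultdict


def build_otu_table(assignments):
    """Build a {sample: Counter({otu: count})} table from (sample, otu) pairs.

    Each pair stands for one read already assigned to an OTU by an upstream
    clustering step (hypothetical input format).
    """
    table = defaultdict(Counter)
    for sample, otu in assignments:
        table[sample][otu] += 1
    return table


def to_tsv(table):
    """Render the table as TSV: one row per OTU, one column per sample."""
    samples = sorted(table)
    otus = sorted({otu for counts in table.values() for otu in counts})
    lines = ["OTU\t" + "\t".join(samples)]
    for otu in otus:
        # Counter returns 0 for OTUs absent from a sample.
        row = [str(table[s][otu]) for s in samples]
        lines.append(otu + "\t" + "\t".join(row))
    return "\n".join(lines)


# Hypothetical read assignments for two samples.
reads = [("S1", "OTU_1"), ("S1", "OTU_2"), ("S1", "OTU_1"), ("S2", "OTU_2")]
print(to_tsv(build_otu_table(reads)))
```

Keeping counts in a `Counter` per sample means absent OTUs read as zero without being stored, which suits the sparse tables typical of 16S data.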
OTU_count_tables files (Data Citation 2) summary.
| Home-made Galaxy pipeline | TSV | 93 | 1 | 25 (s.d.: 258.6) | 1,184 (s.d.: 846) | 0.77 |
Raw Global BIOM file (Data Citation 2) summary.
| Global OTU table | Biom | 93 | 474 | 2,742,108 | 0.187 | 29,485 (s.d.: 5,739) | 29,419 | Taxonomy |
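The column headers of the BIOM summary rows did not survive extraction. The values appear consistent with the usual BIOM table summary fields (number of samples, number of observations/OTUs, total count, table density, mean (s.d.) and median counts per sample, observation metadata); for instance, 2,742,108 / 93 ≈ 29,485. Under that assumption, the Python sketch below shows how such fields can be computed from an OTU count matrix. The function name and the use of the sample (n-1) standard deviation are assumptions, not taken from the source.

```python
import statistics


def summarize_otu_table(matrix):
    """Summarize an OTU count matrix (rows = OTUs, columns = samples).

    Needs at least two samples, because statistics.stdev is the sample
    (n-1) standard deviation.
    """
    n_obs = len(matrix)
    n_samples = len(matrix[0])
    # Total reads per sample (column sums).
    per_sample = [sum(row[j] for row in matrix) for j in range(n_samples)]
    # Density = fraction of matrix cells that are non-zero.
    nonzero = sum(1 for row in matrix for v in row if v > 0)
    return {
        "samples": n_samples,
        "observations": n_obs,
        "total_count": sum(per_sample),
        "density": nonzero / (n_obs * n_samples),
        "mean_per_sample": statistics.mean(per_sample),
        "sd_per_sample": statistics.stdev(per_sample),
        "median_per_sample": statistics.median(per_sample),
    }


# Toy matrix: 2 OTUs x 3 samples (hypothetical counts).
print(summarize_otu_table([[10, 0, 5], [0, 3, 0]]))
```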
Normalized Global BIOM file (Data Citation 2) summary.
| Global OTU table (BIOM format) | Biom | 93 | 405 | 33,624 | 0.217 | 361 (s.d.: 138) | 353 | Taxonomy |
Figure 2. Quality scores across all bases, box-and-whiskers plot (FastQC Read Quality reports, Galaxy Version 0.67). Red line = median value, blue line = mean value, yellow box = inter-quartile range, lower and upper whiskers = 10% and 90% points, respectively.
Read metrics from output PGM quality-approved, trimmed and filtered sequence data.
| Raw data | 272.67 | 273 | 42,603 | 41,956 | 14,617 |
*Three outlier samples (indexes 18, 63 and 50 belonging to groups 1, 2 and 3 respectively) were discarded before the analyses.
Mean read length and read number per index from output PGM quality-approved, trimmed and filtered sequence data in the four groups of patients.
| Mean read number per index | 43,058 | 44,150.04 | 41,374.65 | 41,862.54 |
| Mean read length (bases) | 270 | 273.96 | 275.13 | 271.62 |
*Three outlier samples (indexes 18, 63 and 50 belonging to groups 1, 2 and 3 respectively) were discarded before the analyses.
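Per-group means like those above are computed only after the outlier samples are removed. The Python sketch below shows that step. The per-sample record layout (index, group, read count, mean read length), the group labels and all counts are hypothetical; only the discarded indexes 18, 63 and 50 come from the source. The mean read length here is an unweighted mean of per-sample means, which is an assumption; a read-weighted mean would give somewhat different values.

```python
import statistics


def group_read_metrics(samples, discarded):
    """Mean read count and mean read length per group, skipping outliers.

    samples: iterable of (index, group, read_count, mean_read_length)
    tuples (hypothetical layout). discarded: set of sample indexes to drop.
    """
    groups = {}
    for index, group, n_reads, mean_len in samples:
        if index in discarded:
            continue  # outlier removed before any statistics are computed
        groups.setdefault(group, []).append((n_reads, mean_len))
    return {
        g: {
            "mean_reads": statistics.mean(n for n, _ in rows),
            # Unweighted mean of per-sample mean lengths (assumption).
            "mean_length": statistics.mean(length for _, length in rows),
        }
        for g, rows in sorted(groups.items())
    }


OUTLIERS = {18, 63, 50}  # indexes discarded in the source
toy = [
    (1, 0, 40000, 270.0),
    (2, 0, 46000, 272.0),
    (18, 1, 500, 150.0),   # outlier: excluded
    (19, 1, 44000, 274.0),
]
print(group_read_metrics(toy, OUTLIERS))
```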