Li Shao, Jie Liao, Jingyang Qian, Wenbin Chen, Xiaohui Fan.
Abstract
BACKGROUND: Microbiome big data from population-scale cohorts holds the key to unleashing the power of microbiomes to overcome critical challenges in disease control, treatment and precision medicine. However, variation introduced during data generation and processing limits how far results from independent studies can be compared and interpreted. Although multiple databases have been built as platforms for data reuse, they are of limited value because they consider only raw sequencing files.
DESCRIPTION: Here, we present MetaGeneBank, a standardized database that provides details on sample collection and sequencing, together with abundances of genes, microbiota and molecular functions, for 4470 raw sequencing files (over 12 TB) collected from 16 studies covering over 10 types of diseases and 14 countries, all processed with a unified data-processing pipeline. Tools for browsing and searching by descriptive attributes, gene sequences, microbiota and functions make the database user-friendly. We found that the source of specimen contributes more than sequencing centers or platforms to the variation in microbiota. Special attention should be paid when re-analyzing sequencing files from different countries.
Keywords: Database; Deep sequenced metagenomes; Gut microbiome; Human disease
Year: 2021 PMID: 34592929 PMCID: PMC8485520 DOI: 10.1186/s12866-021-02321-z
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Fig. 1 Overview of the MetaGeneBank database. a General workflow for acquiring, processing and archiving metagenomic data. The workflow consists of four processes. Acquisition of raw sequencing files (FASTQ) and study metadata is followed by data processing and annotation procedures. All outputs are integrated into the MetaGeneBank database. Users can browse, search and download the datasets, annotation results and corresponding figures. b Database schema of MetaGeneBank. The main data structure and the relationships between the different tables are illustrated
Detailed information about the data sources collected in the database
| Assay | Accession number | Number of sequencing files | Number of paired files | Number of single files |
|---|---|---|---|---|
| AS1.as1 | SRP100575 | 211 | 211 | |
| ACD1.as1 | ERP023788 | 385 | 385 | |
| CFS1.as1 | SRP102150 | 100 | 41 | 59 |
| CC1.as1 | ERP008729 | 310 | 154 | 156 |
| IBD1.as1, IBD2.as1, IBD3.as1, IBD4.as1 | PRJNA389280; ERA000116; PRJEB15371; ERP002061 | 1476 | 1476 | |
| LC1.as1 | ERP005860 | 237 | 237 | |
| Obesity1.as1 | ERP003612 | 595 | 595 | |
| RA1.as1 | PRJEB6997 | 137 | 137 | |
| T1D1.as1 | PRJNA231909 | 126 | 126 | |
| T2D1.as1, T2D1.as2, T2D2.as1, T2D3.as1 | SRA050230; SRA045646; ERP004605; ERP02469 | 807 | 802 | 5 |
| NAFLD1.as1 | PRJNA373901 | 86 | 86 | |
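The per-assay counts in the table can be cross-checked against the 4470 sequencing files quoted in the abstract. The sketch below is only an illustration: the row labels are shortened, the `ROWS` list and the `summarize` helper are hypothetical names introduced here, and empty "single files" cells are assumed to mean zero.

```python
# Counts transcribed from the table above:
# (assay group, sequencing files, paired files, single files).
# Assumption: an empty "single files" cell is treated as 0.
ROWS = [
    ("AS1", 211, 211, 0),
    ("ACD1", 385, 385, 0),
    ("CFS1", 100, 41, 59),
    ("CC1", 310, 154, 156),
    ("IBD1-4", 1476, 1476, 0),
    ("LC1", 237, 237, 0),
    ("Obesity1", 595, 595, 0),
    ("RA1", 137, 137, 0),
    ("T1D1", 126, 126, 0),
    ("T2D1-3", 807, 802, 5),
    ("NAFLD1", 86, 86, 0),
]


def summarize(rows):
    """Check each row is internally consistent, then return column totals."""
    for name, total, paired, single in rows:
        if paired + single != total:
            raise ValueError(f"{name}: paired + single != total")
    return {
        "total": sum(r[1] for r in rows),
        "paired": sum(r[2] for r in rows),
        "single": sum(r[3] for r in rows),
    }


totals = summarize(ROWS)
print(totals)
```

Under these assumptions every row is internally consistent and the column total comes to 4470 files, which agrees with the figure given in the abstract.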
Fig. 2 The ‘Dataset’ page in MetaGeneBank database. An illustration of a ‘Metagenomic Studies’, b ‘Metagenomic Assays’, c ‘Metagenomic Samples’, d ‘Metagenomic Data’ views, respectively
Fig. 3 The ‘Search’ page in MetaGeneBank database. An illustration of descriptive attributes (I) and search modes (II)
Fig. 4 An illustration of ‘General’ (a) and ‘Gene’ (b) search modes and corresponding outputs
Fig. 5 PCA score plots for all healthy controls based on microbial abundance at the phylum level. The solid circles in (a-d) are colored according to studies, sequencing centers, sample collection strategies, and sample collection countries, respectively