Die Dai, Jiaying Zhu, Chuqing Sun, Min Li, Jinxin Liu, Sicheng Wu, Kang Ning, Li-Jie He, Xing-Ming Zhao, Wei-Hua Chen.
Abstract
GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purposes are to increase the reusability and accessibility of human gut metagenomic data and to enable cross-project and cross-phenotype comparisons. To achieve these goals, we manually curated the metadata and organized the datasets in a phenotype-centric manner. GMrepo v2 contains 353 projects and 71,642 runs/samples, a substantial increase over the previous version. Of these runs/samples, 45,111 were obtained by 16S rRNA amplicon sequencing and 26,531 by whole-genome metagenomic sequencing. We also increased the number of phenotypes from 92 to 133. In addition, we introduced disease-marker identification and cross-project/phenotype comparison. We first identified disease markers between two phenotypes (e.g. health versus disease) on a per-project basis for selected projects. We then compared the identified markers for each phenotype pair across datasets to facilitate the identification of microbial markers that are consistent across datasets. Finally, we provided a marker-centric view that allows users to check whether a marker shows different trends in different diseases. So far, GMrepo includes 592 marker taxa (350 species and 242 genera) for 47 phenotype pairs, identified from 83 selected projects. GMrepo v2 is freely available at https://gmrepo.humangut.info.
Year: 2022 PMID: 34788838 PMCID: PMC8728112 DOI: 10.1093/nar/gkab1019
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Disease markers identified between two phenotypes in a project. Data from BioProject PRJEB6070 are used as an example; health-enriched and disease (adenoma)-enriched species are plotted in green and pink, respectively. The markers were identified using LEfSe, and LDA (linear discriminant analysis) scores (X-axis) indicate the extent of their enrichment. For whole-genome metagenomic datasets such as PRJEB6070, genus-level markers were also identified. Users can use the widgets (blue buttons) to choose which markers to show. For 16S rRNA datasets, only genus-level markers were identified, so the ‘Species’ button is disabled.
Figure 2. Cross-study comparison of microbial markers. (A) Comparison of marker species for colorectal cancer in seven metagenomic projects. (B) Comparison of marker species for ‘arthritis, rheumatoid’ in two projects. Marker taxa with LDA < −2 are health-enriched, while those with LDA > 2 are disease-enriched. Health- and disease-enriched markers are shown in green and red, respectively, with deeper colors indicating stronger enrichment. To help users explore the markers, several widgets allow them to 1) filter markers by the number of projects in which they were identified, 2) filter markers by absolute LDA score, 3) exclude markers that show inconsistent trends among projects (e.g. those that are significantly decreased in disease in one project but significantly increased in others) and 4) change the size of the tiles. Users can also save the resulting visualization in SVG or PNG format. Please consult https://gmrepo.humangut.info/phenotypes/comparisons/D006262/D015179 and https://gmrepo.humangut.info/phenotypes/comparisons/D006262/D001172 for the interactive versions on our website; for the second link, change the value of the ‘NR.PROJECTS (> = ):’ widget on the webpage to ‘1’ to show the markers.
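The cross-study filters described in the Figure 2 caption (minimum number of projects, minimum absolute LDA score, exclusion of inconsistent trends) can be illustrated with a minimal Python sketch. It assumes that signed LDA scores have already been computed per taxon and project (for example by LEfSe) and that any score beyond ±2 counts as a significant call. `MarkerCall`, `trend` and `filter_markers` are hypothetical names for illustration only; this is not GMrepo's implementation or API, and the toy taxa and projects are placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical record type: one marker call (taxon, project, signed LDA score).
# This is NOT GMrepo's schema; it only illustrates the filtering logic.
@dataclass(frozen=True)
class MarkerCall:
    taxon: str
    project: str
    lda: float  # assumed signed: negative = health-enriched, positive = disease-enriched


LDA_CUTOFF = 2.0  # |LDA| > 2 threshold quoted in the Figure 2 caption


def trend(lda: float, cutoff: float = LDA_CUTOFF) -> Optional[str]:
    """Classify a signed LDA score as 'disease', 'health', or None (no marker)."""
    if lda > cutoff:
        return "disease"
    if lda < -cutoff:
        return "health"
    return None


def filter_markers(
    calls: Iterable[MarkerCall],
    min_projects: int = 1,
    min_abs_lda: float = LDA_CUTOFF,
    drop_inconsistent: bool = True,
) -> Dict[str, Dict[str, float]]:
    """Return {taxon: {project: lda}} for taxa that pass all three filters."""
    by_taxon: Dict[str, Dict[str, float]] = {}
    for call in calls:
        # Keep only calls beyond the marker cutoff and the user's |LDA| filter.
        if trend(call.lda) is None or abs(call.lda) < min_abs_lda:
            continue
        by_taxon.setdefault(call.taxon, {})[call.project] = call.lda

    kept: Dict[str, Dict[str, float]] = {}
    for taxon, scores in by_taxon.items():
        # Filter on the number of projects in which the taxon was called a marker.
        if len(scores) < min_projects:
            continue
        # Drop taxa enriched in disease in one project but in health in another.
        directions = {trend(lda) for lda in scores.values()}
        if drop_inconsistent and len(directions) > 1:
            continue
        kept[taxon] = scores
    return kept


# Toy usage with placeholder taxa and projects (not real GMrepo data).
calls = [
    MarkerCall("taxon_A", "PRJ_1", 3.1),
    MarkerCall("taxon_A", "PRJ_2", 2.5),
    MarkerCall("taxon_B", "PRJ_1", -2.8),  # health-enriched here...
    MarkerCall("taxon_B", "PRJ_2", 2.2),   # ...but disease-enriched here
    MarkerCall("taxon_C", "PRJ_1", 1.5),   # below the cutoff: not a marker
]
print(filter_markers(calls, min_projects=2))  # {'taxon_A': {'PRJ_1': 3.1, 'PRJ_2': 2.5}}
```

Grouping by taxon first and applying the consistency check afterwards mirrors the order of the widgets: a taxon is judged inconsistent only across the projects that survive the LDA filter.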
Figure 3. Cross-disease comparison of marker taxa. (A) Enrichment trends of Fusobacterium nucleatum across diseases and projects. (B) A marker-centric view of Prevotella copri across diseases and projects. Please consult https://gmrepo.humangut.info/taxon/851 and https://gmrepo.humangut.info/taxon/165179 for the online versions.
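The marker-centric view in Figure 3 amounts to collecting every marker call for a single taxon and grouping it by disease and enrichment direction. The following is a minimal Python sketch under the same assumptions as before: precomputed signed LDA scores per (taxon, phenotype, project) and the ±2 cutoff from the Figure 2 caption. `MarkerRecord`, `marker_view` and `has_opposite_trends` are hypothetical names, not part of GMrepo or its web API, and the example records are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical record: a marker call for one taxon, in one project, for one
# disease-versus-health comparison. Field names are illustrative only.
@dataclass(frozen=True)
class MarkerRecord:
    taxon: str
    phenotype: str  # the disease compared against health
    project: str
    lda: float      # signed LDA score, as in the Figure 2 caption


def trend(lda: float, cutoff: float = 2.0) -> Optional[str]:
    """'disease' if LDA > cutoff, 'health' if LDA < -cutoff, else None."""
    if lda > cutoff:
        return "disease"
    if lda < -cutoff:
        return "health"
    return None


def marker_view(records: Iterable[MarkerRecord], taxon: str) -> Dict[str, Dict[str, List[str]]]:
    """Group one taxon's marker calls by phenotype, then by enrichment direction."""
    view: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: {"disease": [], "health": []})
    for rec in records:
        direction = trend(rec.lda)
        # Only taxon-matching calls beyond the cutoff create an entry.
        if rec.taxon == taxon and direction is not None:
            view[rec.phenotype][direction].append(rec.project)
    return dict(view)


def has_opposite_trends(view: Dict[str, Dict[str, List[str]]]) -> bool:
    """True if, across all diseases and projects in the view, the taxon is
    disease-enriched somewhere and health-enriched somewhere else."""
    seen = {
        direction
        for calls in view.values()
        for direction, projects in calls.items()
        if projects
    }
    return len(seen) > 1


# Toy usage with placeholder names (not real GMrepo data).
records = [
    MarkerRecord("taxon_X", "disease_1", "PRJ_1", 3.0),
    MarkerRecord("taxon_X", "disease_1", "PRJ_2", 2.4),
    MarkerRecord("taxon_X", "disease_2", "PRJ_3", -2.6),
]
view = marker_view(records, "taxon_X")
print(view)
print(has_opposite_trends(view))  # True
```

Keeping project IDs in the lists, rather than just counts, preserves the per-project detail that the online marker pages display alongside each disease.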