
GMrepo: a database of curated and consistently annotated human gut metagenomes.

Sicheng Wu¹, Chuqing Sun¹, Yanze Li¹, Teng Wang¹, Longhao Jia¹, Senying Lai¹, Yaling Yang^1,2, Pengyu Luo¹, Die Dai¹, Yong-Qing Yang³, Qibin Luo⁴, Na L Gao^1,5, Kang Ning^1,6, Li-Jie He⁷, Xing-Ming Zhao^8,9, Wei-Hua Chen^1,6,10.

Abstract

GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purpose is to facilitate the reusability and accessibility of the rapidly growing human metagenomic data. This is achieved by consistently annotating the microbial contents of the collected samples using state-of-the-art toolsets and by manually curating the meta-data of the corresponding human hosts. GMrepo organizes the collected samples according to their associated phenotypes and includes all available related meta-data, such as age, sex, country, body-mass-index (BMI) and recent antibiotic usage. To make relevant information easier to access, GMrepo is equipped with a graphical query builder that enables users to make customized, complex and biologically relevant queries. For example, users can find (1) samples from healthy individuals 18 to 25 years old with BMIs between 18.5 and 24.9, or (2) projects related to colorectal neoplasms, each containing >100 samples and including both patients and healthy controls. Precomputed species/genus relative abundances, prevalence within and across phenotypes, and pairwise co-occurrence information are all available on the website and accessible through programmable interfaces. So far, GMrepo contains 58 903 human gut samples/runs (including 17 618 metagenomes and 41 285 amplicons) from 253 projects concerning 92 phenotypes. GMrepo is freely available at: https://gmrepo.humangut.info.



Year: 2020 PMID: 31504765 PMCID: PMC6943048 DOI: 10.1093/nar/gkz764

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Increasing evidence has linked gut microbiota to many aspects of human life, including health (1–3), diseases (4–13), development (14–18), and responses to drugs and treatments (19–23). In recent years, the number and total volume of human gut metagenomic datasets (including both 16S and metagenomic sequencing data) have been increasing rapidly (24). Most of the raw sequencing data have been deposited into general-purpose databases, such as the NCBI Sequence Read Archive (SRA) (25) (https://www.ncbi.nlm.nih.gov/sra) and the European Nucleotide Archive (ENA) (26) (https://www.ebi.ac.uk/ena). A few other databases, including EBI Metagenomics (now MGnify) (24), gcMeta (27), MSE (28) and Qiita (29), provide processed data and organize them according to the habitats from which the samples were taken. These public resources greatly facilitate data reuse, especially meta-analyses across multiple related studies aimed at cross-study validation and the discovery of novel microbial taxa causally underlying certain phenotypes (11,12,30). Despite these efforts to deposit, organize and analyze the rapidly growing human metagenomic data, major obstacles to their reusability and accessibility remain, most notably incomplete and/or inaccurate phenotype information and missing meta-data. Recently, a study reported an initial effort to curate human metagenomic data. However, that collection was limited in size (5716 samples from 26 projects as of January 2017, when the results were first published). It also included metagenomic data from other body sites and could only be accessed using R (31). In addition, there have so far been no systematic efforts to help users filter human gut samples and/or projects using biologically relevant criteria.
For example, none of the existing databases and data sources offers an easy way to find fecal samples taken from healthy individuals aged 18–25 years with a healthy body mass index (BMI, 18.5–24.9). It is also very difficult to find all projects that are related to colorectal neoplasms, contain >100 samples and include both patients and healthy controls. To address these issues, and more importantly to facilitate the reusability and accessibility of the rapidly growing human metagenomic data, we developed GMrepo, a database of curated human gut metagenomic data (including both 16S and metagenomic sequencing data). The main features of GMrepo include:

(i) manually curated phenotype information for each collected run/sample, together with all available related meta-data such as age, sex, country, body-mass-index (BMI) and even recent antibiotic usage (more meta-data may be included in the future);

(ii) consistently annotated microbial contents, including taxonomic assignments of sequencing reads and precomputed species/genus relative abundances obtained with state-of-the-art toolsets;

(iii) collected samples organized according to their associated phenotypes, with statistics including species prevalence, abundances and co-occurrences;

(iv) in addition to the online database, programmable access to most of its contents through representational state transfer (REST) application programming interfaces (APIs);

(v) most importantly, powerful and easy-to-use graphical query builders that allow users to make customized, biologically meaningful queries to the collected samples and projects.

CONSTRUCTION AND CONTENTS OF GMrepo

Figure 1 illustrates the overall workflow of GMrepo, while Figure 2 shows the detailed analysis pipeline of the collected human gut metagenomic data. Below is a brief summary of the materials and methods used in this study.

Figure 1.

Overall workflow of GMrepo. Processing steps are indicated in the blue rounded boxes.

Figure 2.

Schematic representations of the GMrepo metagenomics pipeline for amplicon data (A) and metagenomic data (B). Processing steps are indicated in the blue rounded boxes and tools are marked on the arrows. Input and output files are shown as colored rectangles (black, green, red), and conditional judgments as trapezoids. QC1: a run is marked as ‘failed’ (QCStatus == 0) if fewer than 20k reads, or fewer than 50% of the reads, are retained after trimming. QC2: a run is marked as ‘failed’ (QCStatus == 0) if a single taxon accounts for >99.99% of the total abundance.


Data acquisition of sequencing reads and manual curation of meta-data

Raw sequencing reads were downloaded from the EBI ENA (26) (European Nucleotide Archive, https://www.ebi.ac.uk/ena) and NCBI SRA (25) (Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra) databases. Downloads used the command-line tools enaBrowserTools (https://github.com/enasequence/enaBrowserTools) and SRA-Tools (https://ncbi.github.io/sra-tools/), accelerated by Aspera (a high-speed data transfer tool). Related meta-data were obtained from EBI Metagenomics (now MGnify) (24) and related NCBI databases. These covered the sequencing platforms, the corresponding biosamples, experiments and projects, and the human hosts from whom the fecal samples were taken. Two rounds of manual curation were then performed on the meta-data. In the first round, meta-information such as phenotype (health or disease), age, sex and BMI of the associated samples/runs was extracted using in-house R and Perl scripts. It was then manually curated and supplemented with information from the related publications and/or obtained directly from the authors (Figure 1). The extracted meta-data include sequencing-related meta-data: sequencing platform, type of sequences obtained (i.e. 16S or metagenomic) and number of sequences. They also include host-related meta-data: phenotype (i.e. disease or healthy), age, sex, country, BMI and antibiotic usage. In the second round, curators other than those involved in the first round reviewed the collected meta-data and made necessary corrections.

Processing of raw sequencing reads

FastQC (version 0.11.8, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to evaluate the overall quality of the downloaded data. Trimmomatic (32) was then used to remove sequencing vectors and low-quality bases. Sequences shorter than two-thirds of the original read length were removed from subsequent analysis (Figure 2). For 16S sequences, single-end reads were used directly for subsequent analysis, whereas paired-end reads were first merged using CASPER (33) before downstream processing. Metagenomic sequences were used directly for subsequent analysis, regardless of whether they were single- or paired-end. The processed data are referred to as ‘clean data’. When necessary, Seqtk (https://github.com/lh3/seqtk) was used to convert FASTQ sequences to FASTA format.
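The length filter and the FASTQ-to-FASTA conversion described above are simple enough to sketch in plain Python. This is a minimal illustration, not GMrepo's pipeline, which used Trimmomatic and Seqtk. It assumes reads have already been parsed into hypothetical `(read_id, sequence, quality)` tuples. The helper names `keep_long_reads` and `to_fasta` are made up for this example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory representation of a trimmed FASTQ record.
Read = Tuple[str, str, str]  # (read_id, sequence, quality string)


def keep_long_reads(reads: Iterable[Read], original_length: int) -> Iterator[Read]:
    """Yield only reads that kept at least two-thirds of the original read length."""
    min_len = 2 * original_length / 3
    for read in reads:
        if len(read[1]) >= min_len:
            yield read


def to_fasta(reads: Iterable[Read]) -> str:
    """Render reads as FASTA text: a '>' header line, then the sequence; qualities are dropped."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq, _qual in reads)


# Usage: a 150-bp run in which the second read was trimmed to 90 bp (below the 100-bp cutoff).
reads = [("r1", "A" * 150, "I" * 150), ("r2", "C" * 90, "I" * 90)]
clean = list(keep_long_reads(reads, original_length=150))
print(to_fasta(clean))
```

Because the cutoff is relative to the original read length, a single rule works for runs sequenced at different read lengths. A fixed minimum would have to be chosen per run.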

Taxonomic assignment to processed sequencing reads and calculation of relative abundances

For 16S sequences, MAPseq version 1.2 (34) was used to assign taxonomic classifications to the reads in the clean data. As recommended by the authors of MAPseq, only reads with a combined score higher than 0.4 at the genus level were used for subsequent analysis. Relative abundances were then calculated at the genus and species levels for each sample/run and normalized to sum to 100%. For metagenomic sequences, MetaPhlAn2 (35) was used with default parameters to assign taxonomy to the sequencing reads and to calculate relative abundances at the species and genus levels.
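The 16S abundance step can be sketched as "filter reads by score, count, then normalize to 100%". This is a minimal sketch and does not parse real MAPseq output. It assumes a hypothetical per-read tuple of `(genus, species or None, genus-level combined score)`. How reads without a species label are treated at the species level is not stated in the text. Here they are simply left out of the species-level denominator, which is an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical per-read assignment: (genus, species or None, genus-level combined score).
Assignment = Tuple[str, Optional[str], float]


def to_percent(counts: Counter) -> Dict[str, float]:
    """Convert read counts to percentages that sum to 100."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()} if total else {}


def relative_abundance(assignments: List[Assignment], min_score: float = 0.4
                       ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Genus- and species-level relative abundances (%) for one run.

    Reads whose genus-level combined score is not above `min_score` are discarded.
    Species percentages are normalized over reads that carry a species label
    (an assumption; unresolved reads still count at the genus level).
    """
    kept = [a for a in assignments if a[2] > min_score]
    genus_counts = Counter(genus for genus, _species, _score in kept)
    species_counts = Counter(sp for _genus, sp, _score in kept if sp is not None)
    return to_percent(genus_counts), to_percent(species_counts)


reads = [
    ("Bacteroides", "Bacteroides fragilis", 0.9),
    ("Bacteroides", None, 0.7),
    ("Prevotella", "Prevotella copri", 0.8),
    ("Escherichia", "Escherichia coli", 0.2),  # below the 0.4 cutoff: dropped
]
genus_pct, species_pct = relative_abundance(reads)
print(genus_pct)
print(species_pct)
```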

Two-step quality controls

A two-step quality control process was used to ensure data quality (Figure 2). First, amplicon sequencing samples/runs with <20 000 reads were removed from subsequent analysis and marked as ‘failed QC (QC status = 0)’ in GMrepo. The second step applies to both amplicon and metagenomic sequences. After taxonomic assignment, samples/runs dominated by a single taxon were also marked as ‘failed QC (QC status = 0)’. A run counts as dominated when a single species or genus accounts for at least 99.99% of the total abundance.
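The two QC steps reduce to a small decision function. This is a hedged sketch, not GMrepo's code. The `data_type` labels and the argument names are hypothetical. The 50% read-retention criterion comes from the Figure 2 caption rather than from the text above. Following the text, QC1 is applied to amplicon runs only, and QC2 uses "≥ 99.99%" (the caption says ">99.99%").

```python
from typing import Dict


def qc_status(data_type: str, n_clean_reads: int, fraction_retained: float,
              abundances: Dict[str, float]) -> int:
    """Return 1 (passed QC) or 0 (failed QC) for one run.

    data_type: "amplicon" or "metagenome" (hypothetical labels).
    n_clean_reads / fraction_retained: reads kept after trimming, absolute and relative.
    abundances: relative abundances in percent at the species or genus level.
    """
    # QC1 (amplicon runs): too few reads, or too many reads lost during trimming.
    if data_type == "amplicon" and (n_clean_reads < 20_000 or fraction_retained < 0.5):
        return 0
    # QC2 (all runs): a single taxon accounting for >= 99.99% of the total abundance.
    if abundances and max(abundances.values()) >= 99.99:
        return 0
    return 1


print(qc_status("amplicon", 15_000, 0.9, {"Bacteroides": 60.0, "Prevotella": 40.0}))  # 0
```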

Species co-occurrence analysis

Species co-occurrence analyses were performed separately for each phenotype with more than 50 related samples/runs. For each species-species and genus-genus pair within a phenotype, Fisher's exact test (the fisher.test() function in R) was applied to a 2 × 2 contingency table. The table is built from four counts: the number of samples/runs in which both taxa are found, the two numbers of samples/runs in which only one of the two taxa is found, and the number of samples/runs in which neither taxon is found. Taxon pairs with an odds ratio (OR) larger than 1 and a P-value < 0.05 were considered to significantly co-occur in that phenotype. Beyond this presence/absence information, the relative abundances of the co-occurring pairs were used to calculate Pearson and Spearman correlations. These further describe the direction of the interaction between the two taxa.
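The co-occurrence test for one taxon pair can be sketched in dependency-free Python. Fisher's exact test is computed from the hypergeometric distribution. The two-sided p-value sums all tables no more likely than the observed one, with a small relative tolerance similar to the one R's `fisher.test` uses. Several parts of this sketch are assumptions or simplifications:

- It reports the sample odds ratio (ad/bc), whereas `fisher.test` reports a conditional maximum-likelihood estimate. The two can differ in magnitude but agree on whether the OR exceeds 1.
- Correlations are computed over all runs of the phenotype.
- All function names are hypothetical.
- Callers are expected to apply this only to phenotypes with more than 50 runs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def contingency(a: Sequence[bool], b: Sequence[bool]) -> Tuple[int, int, int, int]:
    """(both present, only A, only B, neither) across the runs of one phenotype."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    return both, only_a, only_b, len(a) - both - only_a - only_b


def fisher_two_sided(n11: int, n12: int, n21: int, n22: int) -> float:
    """Two-sided Fisher's exact p-value: sum over tables no more likely than the observed one."""
    row1, col1, n = n11 + n12, n11 + n21, n11 + n12 + n21 + n22
    denom = math.comb(n, col1)

    def prob(k: int) -> float:  # hypergeometric probability of the table with n11 = k
        return math.comb(row1, k) * math.comb(n - row1, col1 - k) / denom

    p_obs = prob(n11)
    ks = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    return min(1.0, sum(prob(k) for k in ks if prob(k) <= p_obs * (1 + 1e-7)))


def sample_odds_ratio(n11: int, n12: int, n21: int, n22: int) -> float:
    num, den = n11 * n22, n12 * n21
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, ties sharing their average rank (Spearman = Pearson on these ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def co_occurrence(abund_a: Sequence[float], abund_b: Sequence[float],
                  alpha: float = 0.05) -> Dict[str, float]:
    """Test one taxon pair within one phenotype; add correlations if significant."""
    table = contingency([x > 0 for x in abund_a], [y > 0 for y in abund_b])
    result = {"odds_ratio": sample_odds_ratio(*table), "p_value": fisher_two_sided(*table)}
    result["significant"] = result["odds_ratio"] > 1 and result["p_value"] < alpha
    if result["significant"]:
        result["pearson"] = pearson(abund_a, abund_b)
        result["spearman"] = pearson(average_ranks(abund_a), average_ranks(abund_b))
    return result


print(co_occurrence([1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0],
                    [2, 4, 6, 8, 10, 12, 0, 0, 0, 0, 0, 0]))
```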

Database construction and web development

All data were loaded into a MySQL database. The frontend (the webpages) of the website was coded in HTML and JavaScript. The backend was coded in PHP with the Slim framework; it supports queries to the MySQL database and provides representational state transfer (REST) application programming interfaces (APIs) for programmable access to our data. The AngularJS framework was used to bridge the front- and back-ends. D3.js and plotly.js were used for visualizations at the front-end. Various other open-source JavaScript libraries were also used, including jQuery and jQuery QueryBuilder. The website is hosted on an Apache server.
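The paper does not describe GMrepo's table layout, so the following is only a hypothetical, heavily simplified schema. It shows how runs, projects, phenotypes, QC status and abundances might relate, and how per-phenotype counts like those in Table 1 fall out of a single aggregate query. SQLite's in-memory mode stands in for MySQL so the sketch runs without a server. All table and column names are assumptions.

```python
import sqlite3

# Hypothetical, heavily simplified schema; GMrepo's real MySQL schema is not described.
SCHEMA = """
CREATE TABLE project (project_id TEXT PRIMARY KEY);
CREATE TABLE run (
    run_id     TEXT PRIMARY KEY,
    project_id TEXT REFERENCES project(project_id),
    phenotype  TEXT,     -- MeSH term, e.g. 'Health'
    age REAL, sex TEXT, bmi REAL, country TEXT,
    qc_status  INTEGER   -- 1 = passed, 0 = failed, NULL = not yet processed
);
CREATE TABLE abundance (
    run_id TEXT REFERENCES run(run_id),
    taxon  TEXT,
    rel_abundance REAL   -- percent
);
"""


def phenotype_summary(conn: sqlite3.Connection):
    """Per-phenotype run counts in the spirit of Table 1 (runs, processed, valid, failed)."""
    return conn.execute(
        """SELECT phenotype,
                  COUNT(*)                        AS n_runs,
                  COUNT(qc_status)                AS n_processed,
                  COALESCE(SUM(qc_status = 1), 0) AS n_valid,
                  COALESCE(SUM(qc_status = 0), 0) AS n_failed
           FROM run
           GROUP BY phenotype
           ORDER BY n_runs DESC"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO project VALUES ('P1')")
conn.executemany(
    "INSERT INTO run (run_id, project_id, phenotype, qc_status) VALUES (?, 'P1', ?, ?)",
    [("R1", "Health", 1), ("R2", "Health", 0), ("R3", "Health", None),
     ("R4", "Colitis, Ulcerative", 1)],
)
print(phenotype_summary(conn))
```

Storing "not yet processed" as NULL lets `COUNT(qc_status)` give the processed count directly. This mirrors the footnote that not all runs have been processed yet.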

USAGE, UTILITY AND FUTURE DIRECTIONS

Human gut metagenomic data organized according to host phenotypes

Through multiple rounds of manual curation, we collected meta-data for a total of 58 903 runs of human gut metagenomic data from 253 projects, including 17 618 metagenomes and 41 285 amplicons spanning 92 phenotypes (health and 91 diseases). Figure 3 summarizes statistics for some of the collected meta-data. For example, we were able to assign explicit phenotype information to most of the collected samples (88.17%, Figure 3A). However, despite our efforts, all three basic meta-data fields (age, sex and BMI) were available for only about one-third of the samples. As shown in Figure 3B, 30.86% of the samples contained none of these basic meta-data, while 25.95% and 10.31% contained only one or two of them, respectively. These results highlight the challenges in reusing metagenomic data and call for reporting standards that define the minimal meta-data for metagenomic samples.
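The "one-third" figure follows from the reported percentages: 100 − 30.86 − 25.95 − 10.31 = 32.88% of samples carry all three basic fields. The sketch below shows that arithmetic and one way such a breakdown could be tabulated. The sample records and the function name are hypothetical. A field is treated as present when its key exists with a non-None value, which is an assumption about how missing values are encoded.

```python
from typing import Dict, List, Mapping

BASIC_FIELDS = ("age", "sex", "bmi")


def completeness_breakdown(samples: List[Mapping[str, object]]) -> Dict[int, float]:
    """Percentage of samples that carry 0, 1, 2 or 3 of the basic meta-data fields."""
    counts = {k: 0 for k in range(len(BASIC_FIELDS) + 1)}
    for sample in samples:
        counts[sum(sample.get(f) is not None for f in BASIC_FIELDS)] += 1
    return {k: 100.0 * v / len(samples) for k, v in counts.items()}


# Share of samples with all three fields, derived from the percentages reported for Figure 3B.
all_three = round(100 - 30.86 - 25.95 - 10.31, 2)
print(all_three)  # 32.88, i.e. roughly one-third
```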

Figure 3.

Statistics of some of the metadata we collected. (A) An unknown phenotype means that the health status of the sample provider is not clearly indicated. For data from the American Gut Project (AGP), we only used diagnoses made by medical professionals (doctor, physician assistant). Samples with unknown phenotypes are mainly from AGP. (B) The integrity of the metadata is assessed based on age, sex and BMI.

In addition to the project-run relationships, we organized the collected gut metagenomic data according to their associated host phenotypes. We adopted the MeSH system (36) (Medical Subject Headings, a hierarchically organized controlled vocabulary for biomedical information) to describe and organize these phenotypes. Listed in Table 1 are the top 10 phenotypes included in GMrepo.

Table 1.

Top 10 phenotypes included in GMrepo

Phenotype	No. of runs	No. of processed runs	No. of valid runs	No. of failed runs	No. of associated species	No. of associated genera
Health	27 329	20 320	12 485	7835	6189	1613
Colitis, Ulcerative	2509	2440	1175	1265	4183	1285
Irritable Bowel Syndrome	2092	2091	954	1137	3320	1064
Infant, Premature	1443	1443	1240	203	260	97
Colorectal Neoplasms	1374	1374	1256	118	4596	1380
Diarrhea	1355	1354	470	884	2775	906
Constipation	1244	1244	611	633	3146	1022
Migraine Disorders	1235	1235	574	661	2894	964
Lung Diseases	1228	1228	592	636	2817	958
Autoimmune Diseases	1154	1154	547	607	2848	956

No. of runs: all runs with curated meta-data.

No. of processed runs: number of runs whose sequence data have been processed; please note that all runs will eventually be processed.

No. of valid runs: number of runs whose data passed our QC procedure; the corresponding species/genus relative abundances are available in our database.

No. of failed runs: number of runs whose data did not pass our QC procedure.

No. of associated species: number of species associated with the processed and valid runs.

No. of associated genera: number of genera associated with the processed and valid runs.
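The footnotes imply two invariants for every row of Table 1. Each processed run either passed or failed QC, so valid + failed = processed. Unprocessed runs remain, so processed ≤ runs. The short check below copies the table's numbers and confirms both invariants. It also prints the fraction of processed runs that passed QC, a derived quantity not reported in the table itself.

```python
# (phenotype, runs, processed, valid, failed), copied from Table 1.
TABLE_1 = [
    ("Health", 27329, 20320, 12485, 7835),
    ("Colitis, Ulcerative", 2509, 2440, 1175, 1265),
    ("Irritable Bowel Syndrome", 2092, 2091, 954, 1137),
    ("Infant, Premature", 1443, 1443, 1240, 203),
    ("Colorectal Neoplasms", 1374, 1374, 1256, 118),
    ("Diarrhea", 1355, 1354, 470, 884),
    ("Constipation", 1244, 1244, 611, 633),
    ("Migraine Disorders", 1235, 1235, 574, 661),
    ("Lung Diseases", 1228, 1228, 592, 636),
    ("Autoimmune Diseases", 1154, 1154, 547, 607),
]

for name, runs, processed, valid, failed in TABLE_1:
    # Every processed run either passed or failed QC; some runs may not be processed yet.
    assert valid + failed == processed <= runs, name
    print(f"{name}: {100 * valid / processed:.1f}% of processed runs passed QC")
```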

For each phenotype, we summarized the total numbers of associated species and genera. For example, a total of 6189 species (and/or strains) are associated with healthy individuals (https://gmrepo.humangut.info/phenotypes/D006262), and these were assigned to 1613 genera. However, only 389 of these species (∼6.3% of the total), assigned to 91 genera (∼5.6% of the total), were found in more than one sample with a median relative abundance higher than 0.01%. Similar results were found for other phenotypes. These results indicate that most of these taxa were found in only a small number of runs and/or at low abundance. Across all 28 252 valid runs in our database, we found a total of 6973 species assigned to 1710 genera. Among these, 2685 species, assigned to 834 genera, were found in more than one sample with a median relative abundance higher than 0.01% within one or more phenotypes (Figure 4). The phylogenetic relationships of these species were obtained from the NCBI taxonomy database (37) and visualized using Evolview v3 (38). These numbers are close to recently published results (39). Although the prevalence of most species is low, our results expand the known species repertoire of the collective human gut microbiota.
Diet, region, and disease are known to affect the abundance and diversity of the human gut microbiota. We believe that the total number of species/strains in the human gut flora will be further increased as more samples are analyzed in the future.

Figure 4.

Phylogenetic tree comprising the 2685 included species, based on NCBI taxonomy. These 2685 species were found in more than one sample with a median relative abundance higher than 0.01% within one or more phenotypes. The three inner layers show statistics of these species in our database: the median relative abundance of each species (red) and the species prevalence across samples (brown) and phenotypes (yellow). The outermost layer shows the corresponding phyla of these species.

Additional links to NCBI BioProject, NCBI SRA and the NCBI MeSH Browser are also provided for each project, run and phenotype, respectively, to help researchers obtain more information or download raw sequencing data. More external databases will be included in the future.

Species abundance, prevalence and co-occurrence within and across phenotypes

With the availability of precalculated relative abundance information for all valid runs in GMrepo, we allow users to visualize the species/genus abundance distribution in a phenotype of interest as a scatter plot (Supplementary Figure S1); if a user chooses a disease (e.g. Crohn's Disease or colorectal neoplasms), the abundances of the same taxon in healthy controls will also be retrieved and visualized side-by-side with the disease in the scatter plot and boxplot (Figure 5A, B). Visualization and comparison of the taxon abundances across all phenotypes is also supported (Supplementary Figures S2 and S3).

Figure 5.

Details of a species in Crohn's Disease. Faecalibacterium prausnitzii is chosen to show its distribution (A) and relative abundances (B) in Crohn's Disease. For disease phenotypes, the relative abundances of the species of interest in healthy controls (green) are also retrieved and visualized side-by-side with those in the disease (red). (C) A species co-occurrence network constructed from the significantly co-occurring pairs for a phenotype (Crohn's Disease). Nodes: species co-occurring with others in samples of this phenotype, with sizes proportional to the number of connected nodes in the network. Links: co-occurrence relationships between species, with widths proportional to the absolute value of the correlation coefficient (Pearson correlation) and colors indicating positive (green) or negative (red) correlations. Hovering the mouse over a node highlights the node and its direct neighbors and shows their names.

We also calculated the prevalence of each species/genus (Supplementary Figure S4). Based on the presence/absence information, we calculated pairwise co-occurrences within each phenotype for all possible species-species and genus-genus pairs. For significantly co-occurring pairs (see ‘Construction and contents of GMrepo’ for details), we also provide precalculated Pearson and Spearman correlation coefficients based on their relative abundances. These further describe the direction of the interaction between the two taxa. For example, a significant positive correlation may indicate that the two taxa prefer similar environments and/or benefit each other's growth. A significant negative correlation may indicate that the two taxa prefer different environments and/or compete. A co-occurrence network can then be constructed from the significantly co-occurring pairs, as shown in Figure 5C.
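Turning significant pairs into the network of Figure 5C amounts to a degree count plus per-link styling. The sketch produces node and link lists of the kind D3.js force layouts commonly consume (links keyed by `source` and `target`). How GMrepo actually serializes its network is not described, so the field names and the `Edge` tuple are assumptions. The taxon names in the example are placeholders. Node size is the node's degree, which equals the number of connected nodes as long as each pair appears only once.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (taxon_a, taxon_b, pearson_r) for pairs that passed the co-occurrence test.
Edge = Tuple[str, str, float]


def build_network(edges: List[Edge]) -> Dict[str, list]:
    """Node/link lists styled as described for Figure 5C."""
    degree: Counter = Counter()
    links = []
    for a, b, r in edges:
        degree[a] += 1
        degree[b] += 1
        links.append({"source": a, "target": b,
                      "width": abs(r),                       # |correlation|
                      "color": "green" if r > 0 else "red"})  # sign of correlation
    nodes = [{"id": taxon, "size": d} for taxon, d in degree.items()]  # size = degree
    return {"nodes": nodes, "links": links}


def neighbours(edges: List[Edge], taxon: str) -> Set[str]:
    """Direct neighbours of a node, i.e. what a mouse-over would highlight."""
    return {b if a == taxon else a for a, b, _r in edges if taxon in (a, b)}


edges = [("species_A", "species_B", 0.62), ("species_A", "species_C", -0.35)]  # toy values
print(build_network(edges))
print(neighbours(edges, "species_A"))
```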
Additional links to external databases were also provided for each of the species and genera identified in GMrepo, in order to facilitate researchers in obtaining related information on these taxa. So far we have linked GMrepo to NCBI taxonomy, ENA taxonomy, genome annotations (40), microbe to bacteriophage interactions (41), bacteria to drug interactions (http://www.bugdrug-db.info) and a few others (42). More external databases will be included in the future.

Complex and biologically relevant queries to our data are facilitated by graphical query builders

One of the most important features of GMrepo is the collection and manual curation of related meta-data. To take further advantage of these data, we equipped GMrepo with graphical query builders (powered by the jQuery QueryBuilder widget) that allow users to perform complex queries. We provide two query builders, with three examples for each. As shown in Figure 6, the query builders are easy to use thanks to their straightforward, self-explanatory interface. They support complex logical combinations (AND, OR and grouping) that allow users to perform biologically relevant queries. For example:

(A) Figure 6A shows how to find runs/samples from healthy individuals with BMIs between 18.5 and 24.9;

(B) Figure 6B shows how to find fecal samples from Americans who have not used antibiotics recently;

(C) Figure 6C shows how to find projects that are related to neurological diseases (including autism spectrum disorder, bipolar disorder and depression) and that each contain healthy controls.

More examples can be found at https://gmrepo.humangut.info.
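jQuery QueryBuilder represents a query as nested JSON groups. Each group has a `condition` ("AND" or "OR") and a list of `rules`. Each rule names a `field`, an `operator` and a `value`, and a rule may itself be a nested group. The evaluator below only illustrates the semantics of that AND/OR grouping on in-memory records, using a subset of the widget's built-in operator names. How GMrepo executes these queries on its server is not described. The field names (`phenotype`, `age`, `bmi`, `country`) and the records are hypothetical.

```python
from typing import Any, Callable, Dict

# A subset of jQuery QueryBuilder's built-in operator names, evaluated in Python.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equal": lambda v, x: v == x,
    "not_equal": lambda v, x: v != x,
    "in": lambda v, x: v in x,
    "less": lambda v, x: v is not None and v < x,
    "greater": lambda v, x: v is not None and v > x,
    "between": lambda v, x: v is not None and x[0] <= v <= x[1],
    "is_null": lambda v, x: v is None,
}


def matches(record: Dict[str, Any], group: Dict[str, Any]) -> bool:
    """Evaluate a rule group ({"condition": "AND"|"OR", "rules": [...]}) against one record."""
    results = []
    for rule in group["rules"]:
        if "rules" in rule:  # nested group, e.g. (A OR B) AND C
            results.append(matches(record, rule))
        else:
            op = OPERATORS[rule["operator"]]
            results.append(op(record.get(rule["field"]), rule.get("value")))
    return all(results) if group.get("condition", "AND") == "AND" else any(results)


# Healthy adults aged 18 to 25 with a healthy BMI (field names are hypothetical).
query = {"condition": "AND", "rules": [
    {"field": "phenotype", "operator": "equal", "value": "Health"},
    {"field": "age", "operator": "between", "value": [18, 25]},
    {"field": "bmi", "operator": "between", "value": [18.5, 24.9]},
]}
runs = [
    {"run_id": "R1", "phenotype": "Health", "age": 22, "bmi": 21.3},
    {"run_id": "R2", "phenotype": "Health", "age": 31, "bmi": 22.0},
    {"run_id": "R3", "phenotype": "Health", "age": 20, "bmi": None},
]
print([r["run_id"] for r in runs if matches(r, query)])  # ['R1']
```

Runs with a missing value fail any comparison rule on that field. That matches the intuition that a run without a recorded BMI cannot be shown to have a healthy BMI.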

Figure 6.

Graphical selectors and three examples. These selectors support complex logical combinations (AND, OR and grouping) that allow users to perform biologically relevant queries. (A) Shows how to find samples from healthy individuals with BMIs between 18.5 and 24.9; (B) shows how to find fecal samples of Americans who did not recently use antibiotics; (C) shows how to find projects that are related to neurological diseases (e.g. autism spectrum disorder, bipolar disorder and depression) and that each contain healthy controls.

Future directions

In addition to the continuous collection of new human gut metagenomic data in the coming years, we plan to add new contents to GMrepo, including (but not limited to) viral abundances, functional profiles and metabolic pathway profiles for the collected samples. We also plan to include more utilities, allowing users to perform on-site cross-sample comparisons, differential abundance analysis and mathematical modeling. These will further facilitate the reusability and accessibility of human gut metagenomic data and will contribute to better understanding of the relationships between gut microbiota dysbiosis and human diseases.

CONCLUSIONS

In this study, we introduced GMrepo, an online database of curated, consistently annotated meta-data and human gut metagenomic data. With 58 903 samples/runs collected from 253 projects and 92 phenotypes, GMrepo is one of the largest databases dedicated to human gut metagenomes (including both 16S and metagenomic sequences). We carefully curated the meta-data and applied stringent criteria to keep only high-quality data. To facilitate reusability and accessibility, we included precomputed species/genus relative abundances, prevalence within and across phenotypes, and pairwise co-occurrence information. These data are available on the website and can be accessed through programmable interfaces. To make relevant information easier to access, we equipped GMrepo with a graphical query builder that allows users to make customized, complex and biologically relevant queries. We will continue developing GMrepo by including more manually curated human gut metagenomic data, more functionally annotated data and more utilities.

DATA AVAILABILITY

All data are freely accessible to all academic users. This work is licensed under a Creative Commons Attribution-Non-Commercial 3.0 Unported License (CC BY-NC 3.0). In addition to the download functions on many webpages, users can download all data from the ‘Data downloads’ section of the ‘Help’ page. Programmable access through REST APIs is also supported. Detailed instructions on using R, Perl and Python to access our data can be found in the ‘Programmable access’ section of the ‘Help’ page or on our GitHub page: https://github.com/evolgeniusteam/GMrepoProgrammableAccess.

References (42 in total; 10 shown)

1. A human gut microbial gene catalogue established by metagenomic sequencing.

Authors: Junjie Qin; Ruiqiang Li; Jeroen Raes; Manimozhiyan Arumugam; Kristoffer Solvsten Burgdorf; Chaysavanh Manichanh; Trine Nielsen; Nicolas Pons; Florence Levenez; Takuji Yamada; Daniel R Mende; Junhua Li; Junming Xu; Shaochuan Li; Dongfang Li; Jianjun Cao; Bo Wang; Huiqing Liang; Huisong Zheng; Yinlong Xie; Julien Tap; Patricia Lepage; Marcelo Bertalan; Jean-Michel Batto; Torben Hansen; Denis Le Paslier; Allan Linneberg; H Bjørn Nielsen; Eric Pelletier; Pierre Renault; Thomas Sicheritz-Ponten; Keith Turner; Hongmei Zhu; Chang Yu; Shengting Li; Min Jian; Yan Zhou; Yingrui Li; Xiuqing Zhang; Songgang Li; Nan Qin; Huanming Yang; Jian Wang; Søren Brunak; Joel Doré; Francisco Guarner; Karsten Kristiansen; Oluf Pedersen; Julian Parkhill; Jean Weissenbach; Peer Bork; S Dusko Ehrlich; Jun Wang
Journal: Nature Date: 2010-03-04 Impact factor: 49.962

2. Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life.

Authors: Fredrik Bäckhed; Josefine Roswall; Yangqing Peng; Qiang Feng; Huijue Jia; Petia Kovatcheva-Datchary; Yin Li; Yan Xia; Hailiang Xie; Huanzi Zhong; Muhammad Tanweer Khan; Jianfeng Zhang; Junhua Li; Liang Xiao; Jumana Al-Aama; Dongya Zhang; Ying Shiuan Lee; Dorota Kotowska; Camilla Colding; Valentina Tremaroli; Ye Yin; Stefan Bergman; Xun Xu; Lise Madsen; Karsten Kristiansen; Jovanna Dahlgren; Jun Wang; Wang Jun
Journal: Cell Host Microbe Date: 2015-05-13 Impact factor: 21.023

3. Human gut microbes impact host serum metabolome and insulin sensitivity.

Authors: Helle Krogh Pedersen; Valborg Gudmundsdottir; Henrik Bjørn Nielsen; Tuulia Hyotylainen; Trine Nielsen; Benjamin A H Jensen; Kristoffer Forslund; Falk Hildebrand; Edi Prifti; Gwen Falony; Emmanuelle Le Chatelier; Florence Levenez; Joel Doré; Ismo Mattila; Damian R Plichta; Päivi Pöhö; Lars I Hellgren; Manimozhiyan Arumugam; Shinichi Sunagawa; Sara Vieira-Silva; Torben Jørgensen; Jacob Bak Holm; Kajetan Trošt; Karsten Kristiansen; Susanne Brix; Jeroen Raes; Jun Wang; Torben Hansen; Peer Bork; Søren Brunak; Matej Oresic; S Dusko Ehrlich; Oluf Pedersen
Journal: Nature Date: 2016-07-13 Impact factor: 49.962

4. A metagenome-wide association study of gut microbiota in type 2 diabetes.

Authors: Junjie Qin; Yingrui Li; Zhiming Cai; Shenghui Li; Jianfeng Zhu; Fan Zhang; Suisha Liang; Wenwei Zhang; Yuanlin Guan; Dongqian Shen; Yangqing Peng; Dongya Zhang; Zhuye Jie; Wenxian Wu; Youwen Qin; Wenbin Xue; Junhua Li; Lingchuan Han; Donghui Lu; Peixian Wu; Yali Dai; Xiaojuan Sun; Zesong Li; Aifa Tang; Shilong Zhong; Xiaoping Li; Weineng Chen; Ran Xu; Mingbang Wang; Qiang Feng; Meihua Gong; Jing Yu; Yanyan Zhang; Ming Zhang; Torben Hansen; Gaston Sanchez; Jeroen Raes; Gwen Falony; Shujiro Okuda; Mathieu Almeida; Emmanuelle LeChatelier; Pierre Renault; Nicolas Pons; Jean-Michel Batto; Zhaoxi Zhang; Hua Chen; Ruifu Yang; Weimou Zheng; Songgang Li; Huanming Yang; Jian Wang; S Dusko Ehrlich; Rasmus Nielsen; Oluf Pedersen; Karsten Kristiansen; Jun Wang
Journal: Nature Date: 2012-09-26 Impact factor: 49.962

5. A new genomic blueprint of the human gut microbiota.

Authors: Alexandre Almeida; Alex L Mitchell; Miguel Boland; Samuel C Forster; Gregory B Gloor; Aleksandra Tarkowska; Trevor D Lawley; Robert D Finn
Journal: Nature Date: 2019-02-11 Impact factor: 49.962

6. CASPER: context-aware scheme for paired-end reads from high-throughput amplicon sequencing.

Authors: Sunyoung Kwon; Byunghan Lee; Sungroh Yoon
Journal: BMC Bioinformatics Date: 2014-09-10 Impact factor: 3.169

7. proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes.

Authors: Daniel R Mende; Ivica Letunic; Jaime Huerta-Cepas; Simone S Li; Kristoffer Forslund; Shinichi Sunagawa; Peer Bork
Journal: Nucleic Acids Res Date: 2016-10-24 Impact factor: 16.971

8. Identifying and Predicting Novelty in Microbiome Studies.

Authors: Xiaoquan Su; Gongchao Jing; Daniel McDonald; Honglei Wang; Zengbin Wang; Antonio Gonzalez; Zheng Sun; Shi Huang; Jose Navas; Rob Knight; Jian Xu
Journal: MBio Date: 2018-11-13 Impact factor: 7.867

9. EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies.

Authors: Alex L Mitchell; Maxim Scheremetjew; Hubert Denise; Simon Potter; Aleksandra Tarkowska; Matloob Qureshi; Gustavo A Salazar; Sebastien Pesseat; Miguel A Boland; Fiona M I Hunter; Petra Ten Hoopen; Blaise Alako; Clara Amid; Darren J Wilkinson; Thomas P Curtis; Guy Cochrane; Robert D Finn
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

10. gcMeta: a Global Catalogue of Metagenomics platform to support the archiving, standardization and analysis of microbiome data.

Authors: Wenyu Shi; Heyuan Qi; Qinglan Sun; Guomei Fan; Shuangjiang Liu; Jun Wang; Baoli Zhu; Hongwei Liu; Fangqing Zhao; Xiaochen Wang; Xiaoxuan Hu; Wei Li; Jia Liu; Ye Tian; Linhuan Wu; Juncai Ma
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971
