OMARU: a robust and multifaceted pipeline for metagenome-wide association study.

Toshihiro Kishikawa1, Yoshihiko Tomofuji1, Hidenori Inohara2, Yukinori Okada1.   

Abstract

The microbiome is an essential omics layer for elucidating disease pathophysiology. However, we face a challenge of low reproducibility in microbiome studies, partly due to a lack of standard analytical pipelines. Here, we developed OMARU (Omnibus metagenome-wide association study with robustness), a new end-to-end analysis workflow that covers a wide range of microbiome analysis from phylogenetic and functional profiling to case-control metagenome-wide association studies (MWAS). OMARU rigorously controls the statistical significance of the analysis results, including correction for hidden confounding factors and for multiple testing. Furthermore, OMARU can evaluate pathway-level links between the metagenome and the germline genome-wide association study (i.e. MWAS-GWAS pathway interaction), as well as links between taxa and genes in the metagenome. OMARU is publicly available (https://github.com/toshi-kishikawa/OMARU), with a flexible workflow that can be customized by users. We applied OMARU to publicly available schizophrenia (SCZ) and type 2 diabetes (T2D) metagenomic data (n = 171 and 344, respectively), identifying disease biomarkers through comprehensive, multilateral, and unbiased case-control comparisons of the metagenome (e.g. increased Streptococcus vestibularis in SCZ and disrupted diversity in T2D). OMARU improves accessibility and reproducibility in the microbiome research community. The robust and multifaceted results of OMARU reflect the dynamics of the microbiome authentically relevant to disease pathophysiology.
© The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.


Year:  2022        PMID: 35265838      PMCID: PMC8900191          DOI: 10.1093/nargab/lqac019

Source DB:  PubMed          Journal:  NAR Genom Bioinform        ISSN: 2631-9268


INTRODUCTION

The microbiome is one of the major research areas in human disease, with the goal of implementing personalized medicine based on multi-layer omics data. Recent interest has focused on multidimensional integration of metagenome data with other omics layers such as the host genome and metabolome, as well as on deep analysis within the single metagenomic layer (1,2). Analytical approaches to the microbiome are shifting from amplicon sequencing of 16S ribosomal RNA genes to whole-genome shotgun sequencing. However, we face a challenge of low reproducibility in the findings of microbiome studies. Differences in physiological variables and lifestyles among sampled individuals have been reported as one factor contributing to this problem (1,3). In addition, we still lack a gold-standard analytical pipeline that can overcome the problem of low reproducibility (3,4). Here, we introduce OMARU (Omnibus metagenome-wide association study [MWAS] with robustness), a new end-to-end metagenome analysis workflow (Figure 1). Through rigorous quality control (QC) of shotgun sequence reads, samples, clades, and genes, OMARU constructs phylogenetic and functional profiles of the metagenome, its two main analytical pipelines. Three major components of the case–control association tests of MWAS (i.e. phylogenetic, gene, and biological pathway analyses) are subsequently conducted with rigorous handling of false positives in statistical analysis (5–7). In addition to addressing the low reproducibility of metagenomic studies, OMARU provides integrative analyses. For example, OMARU can evaluate pathway-level links between the metagenome and germline genome-wide association studies (GWAS) of the host genome. Furthermore, OMARU identifies links between taxa and genes in the metagenome using the results of the phylogenetic and gene analyses. OMARU also generates figures that give a comprehensive summary of the association test results.
The reference databases, which substantially affect the analytic results, are currently being rapidly expanded (8,9). OMARU is a flexible and extensible workflow that can be customized, for example by adding an up-to-date database.
Figure 1.

OMARU workflow and details as bioinformatics pipelines for the metagenome-wide association study. OMARU workflow. Using shotgun sequencing data, metagenome-wide association studies (phylogenetic, gene and pathway analyses) and additional analyses are performed, including comparing pathway analyses between genome-wide association studies (GWAS) and metagenome-wide association study (MWAS).


MATERIALS AND METHODS

Quality control

OMARU takes shotgun sequencing data in FASTQ format as input (16S rRNA data are currently not supported). QC of the sequencing reads is applied to maximize the quality of the datasets as follows: (i) trimming of low-quality bases using Trimmomatic (10), (ii) identification and masking of human reads using bowtie2 (11) and BMTagger (12) and (iii) removal of duplicated reads using PRINSEQ-lite (13). For QC of samples, four factors are considered when selecting samples to be excluded: (i) overall quality of sequencing reads, (ii) status of phylogenetic abundance and mapping rates, (iii) status of contigs and open reading frames (ORFs) in the assembly-based approach and mapping rates in the mapping-based approach, and (iv) principal component analyses (PCA) of the phylogenetic data and gene abundance data. OMARU sequentially outputs figures and tables of statistical metrics for each procedure, helping users select samples to be excluded at each step (Figures 1 and 2A). Clades and genes detected in fewer than a pre-defined fraction of the samples (e.g. 20%), or in no sample among either the cases or the controls, are removed. In addition, clades with an average relative abundance below a pre-defined threshold of total abundance are removed (default: 0.001%).
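The clade-level filters described above can be illustrated with a minimal Python sketch. This is not OMARU's own code: the function name `filter_clades`, the dictionary layout, and the assumption that abundances are stored as fractions (so that 0.001% corresponds to 1e-5) are hypothetical choices made for illustration.

```python
def filter_clades(abundance, is_case, min_prevalence=0.20, min_mean_abundance=1e-5):
    """Keep clades that pass the three filters described in the text.

    abundance: dict mapping clade name -> list of relative abundances
               (fractions, one value per sample; hypothetical layout).
    is_case:   list of booleans, True for cases, False for controls.
    """
    n = len(is_case)
    kept = {}
    for clade, values in abundance.items():
        detected = [v > 0 for v in values]
        # Filter 1: detected in too few samples overall.
        if sum(detected) / n < min_prevalence:
            continue
        # Filter 2: detected in no case sample or in no control sample.
        in_cases = any(d for d, c in zip(detected, is_case) if c)
        in_controls = any(d for d, c in zip(detected, is_case) if not c)
        if not (in_cases and in_controls):
            continue
        # Filter 3: average relative abundance below the threshold (0.001% = 1e-5).
        if sum(values) / n < min_mean_abundance:
            continue
        kept[clade] = values
    return kept
```

The thresholds are exposed as keyword arguments because the text describes them as pre-defined and user-adjustable.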
Figure 2.

MWAS results of QC and phylogenetic analysis. (A) Principal component analysis (PCA) in phylogenetic and gene abundance of the schizophrenia (SCZ) data. The green dots represent the excluded sample as a result of quality control. (B) Quantile-quantile plots of the phylogenetic MWAS P-values of the clades in the SCZ data. The x-axis indicates log-transformed empirically estimated median P. The y-axis indicates observed -log10(P). The diagonal dashed line represents y = x, which corresponds to the null hypothesis. The left and right figures show the results of including 0 principal components (PCs) and 30 PCs as covariates, respectively, which indicates that PCs suppress the inflation of P-values. (C) A histogram of minimum P-values in the phenotype permutation procedure in the SCZ data. Vertical lines of red and purple indicate an empirical Bonferroni significance threshold at a significance level of 0.05 and a standard Bonferroni significance threshold in multiple comparison procedure (0.05/692 = 7.23 × 10^-5), respectively. (D) A phylogenetic tree. Levels L2–L7 are from the inside layer to the outside layer in the SCZ data. The size and the color of the dots represent relative abundance and effect sizes, respectively. The three clades with significant case–control associations (false discovery rate < 0.05) are outlined in red.


Case-control association test for phylogenetic data

OMARU adopts a mapping-based approach to utilize the advantages of paired-end reads and reduce mapping errors. Users can flexibly replace the reference data, supplied in FASTA format, with an appropriate one; the default is a modified set of DNA sequences from the Unified Human Gastrointestinal Genome (8). After read mapping using bowtie2 (11), the relative abundance of each clade is quantified for each sample at six taxonomic levels (L2: phyla, L3: classes, L4: orders, L5: families, L6: genera and L7: species). Subsequently, the relative abundance profiles are normalized using log transformation. Case–control association tests are performed using the lm function implemented in the R statistical software. Users can incorporate covariates for adjustment, such as sex and age. OMARU generally requires a sufficient number of principal components as covariates to robustly adjust for the effect of hidden confounding factors and suppress P-value inflation (Figure 2B). Empirical null distributions of the minimum P-values (= Pmin) are calculated based on a phenotype permutation procedure (× 10,000 iterations) to control the type I error rates (14). The empirical Bonferroni significance threshold (= Psig) is defined at a significance level of 0.05 as the 95th percentile of -log10(Pmin), which is equivalent to the 5th percentile of Pmin. The 95% confidence interval for Pmin is calculated by a bootstrapping method of the Harrell-Davis distribution-free quantile estimator (Figure 2C). In addition to the standard figures that visualize the distribution of statistics, such as quantile-quantile and volcano plots (Supplementary Figure S1), OMARU draws a phylogenetic tree indicating the case–control association results across the multilayered taxonomic levels (Figure 2D).
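The phenotype permutation procedure can be sketched in Python as follows. This is a simplified illustration rather than OMARU's implementation (which uses R's lm with covariates): it assumes no covariates, derives each P-value from the Pearson correlation with a normal approximation to the t statistic, and uses hypothetical function names throughout.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def association_p(phenotype, feature):
    """Two-sided P for feature ~ phenotype (no covariates).

    Assumption: the t statistic is referred to a standard normal
    distribution, which is only an approximation for small samples.
    """
    n = len(phenotype)
    r = max(min(pearson_r(phenotype, feature), 0.999999), -0.999999)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - _STD_NORMAL.cdf(abs(t)))

def empirical_threshold(phenotype, features, n_perm=1000, alpha=0.05, seed=0):
    """Return (Psig, sorted list of Pmin) from phenotype permutations."""
    rng = random.Random(seed)
    labels = list(phenotype)
    p_min = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the phenotype-feature link
        p_min.append(min(association_p(labels, f) for f in features))
    p_min.sort()
    # alpha-quantile of Pmin (5th percentile for alpha = 0.05)
    idx = max(0, math.ceil(alpha * n_perm) - 1)
    return p_min[idx], p_min

if __name__ == "__main__":
    rng = random.Random(1)
    phenotype = [1] * 20 + [0] * 20
    features = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(20)]
    p_sig, _ = empirical_threshold(phenotype, features, n_perm=500)
    print("empirical threshold:", p_sig, "standard Bonferroni:", 0.05 / 20)
```

Because the whole vector of labels is permuted at once, correlations among features are preserved in each iteration, which is why the empirical threshold can differ from the standard Bonferroni value.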

Case–control association test for functional data (gene and pathways)

Gene abundance data of the metagenome are constructed by the assembly-based approach as follows: (i) de novo assembly of the sequencing reads into contigs using MEGAHIT (15), (ii) prediction of open reading frames (ORFs) on the contigs with Prokka (16), (iii) alignment of the ORFs against an appropriate database (default: UniRef90 (17)) with DIAMOND (18) and (iv) quantification of gene abundance by mapping the sequencing reads to the assembled contigs using bowtie2 (11). Normalization of gene abundance is conducted in two steps. First, the ORF abundance is defined as the depth of each ORF's region in the ORF catalog according to the mapping result, to avoid bias from gene length. Second, the gene abundance is adjusted by the sum of the ORF abundance for each sample, to correct potential bias from heterogeneity in the total amount of sequence reads among the samples. Next, a rank-based inverse normal transformation is applied to correct the heterogeneity of each gene's abundance and distribution. Association tests are performed in the same way as in the phylogenetic analysis, including covariates and the empirical threshold (Figure 3A).
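The per-sample normalization and the rank-based inverse normal transformation can be sketched in Python as follows. The function names are hypothetical, and the Blom offset (c = 3/8) and average ranks for ties are assumptions, since the text does not state which variant OMARU uses.

```python
from statistics import NormalDist

def normalize_by_sample_total(orf_depth):
    """Divide each ORF depth by the sample's total depth.

    orf_depth: list of samples, each a list of ORF depths (hypothetical layout).
    """
    return [[d / sum(sample) for d in sample] for sample in orf_depth]

def rank_inverse_normal(values, c=0.375):
    """Rank-based inverse normal transformation of one gene across samples.

    Ranks are mapped to (rank - c) / (n - 2c + 1) and then to standard
    normal quantiles; tied values receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

The transformation depends only on ranks, so a gene with a heavily skewed abundance distribution and a gene with a symmetric one end up on the same scale before the association test.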
Figure 3.

MWAS results of functional analysis. (A) Results of functional association tests in schizophrenia (SCZ) and type 2 diabetes (T2D). Left figures are quantile-quantile plots of the P-values in the gene association tests. The x-axes indicate empirically estimated median -log10(P). The y-axes indicate observed -log10(P). The diagonal grey lines represent y = x, which correspond to the null hypothesis. The horizontal red lines indicate the empirical Bonferroni-corrected threshold (α = 0.05), and the brown line indicates the empirically estimated FDR threshold (FDR = 0.05). Center figures are volcano plots. The x-axes indicate effect sizes in linear regression. The y-axes, horizontal lines, and dot colors are the same as in the left quantile-quantile plots. Right figures are quantile-quantile plots of the P-values in the pathway association tests. Genes and pathways with P-values less than the Bonferroni threshold are plotted as red dots. Genes and pathways with FDR less than 0.05 are plotted as brown dots, and others are plotted as black dots. FDR; false discovery rate. (B) Links in the metagenome data between taxa and Vpar_1847, one of the schizophrenia-associated genes. Stacked bar graphs indicate the species of origin for each gene and their percentage, divided into cases and controls. The parentheses in each title represent the organism registered as the origin of the genes in the database. (C) Comparison of P-values of GO analyses between the SCZ MWAS and GWAS data. The x-axis indicates the P-value in the SCZ GWAS data. The y-axis indicates the P-value in the SCZ MWAS data. The horizontal and vertical black lines indicate P of 0.05. The overlap of the GO enrichment was evaluated by classifying the GO terms based on the significance threshold of P < 0.05 or P ≥ 0.05 and using Fisher's exact test.

For the pathway analysis, OMARU adopts a gene set enrichment analysis using the ranking of the genes by z-values in the case–control gene association tests. The pathway database can be flexibly customized (the default is Gene Ontology (19)).
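As a rough illustration of enrichment analysis on a z-value ranking, the following Python sketch computes an unweighted running-sum enrichment score with a gene-label permutation P-value. The specific scoring scheme, the permutation strategy, and all names are assumptions made for illustration; the text does not specify which gene set enrichment implementation OMARU uses.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) running-sum enrichment score."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best

def enrichment_p(z_by_gene, gene_set, n_perm=1000, seed=0):
    """Return (observed score, permutation P) for one pathway.

    z_by_gene: dict gene -> z-value from the gene association test.
    """
    ranked = sorted(z_by_gene, key=z_by_gene.get, reverse=True)
    observed = enrichment_score(ranked, gene_set)
    rng = random.Random(seed)
    shuffled = list(ranked)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random ranking as the null
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```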

Links between the microbe MWAS and the germline GWAS of host

OMARU identifies disease-specific biological pathway links between the microbe MWAS and the germline GWAS of host (5–7). The result of pathway analysis using summary statistics of GWAS for the target disease is required as input. OMARU evaluates the overlap between the MWAS and GWAS in the pathway enrichment by Fisher's exact test, based on the classification of pathways with P-value threshold of 0.05 (Figure 3C).
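The overlap test can be sketched as a 2 × 2 classification of shared pathways followed by Fisher's exact test. The following Python example uses only the standard library; the function names and the dictionary inputs are hypothetical, and the two-sided P-value is computed by summing the probabilities of all tables no more likely than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)

def pathway_overlap(p_mwas, p_gwas, threshold=0.05):
    """Classify pathways shared by both analyses and test the overlap.

    p_mwas, p_gwas: dict pathway -> enrichment P-value (hypothetical layout).
    Returns ((a, b, c, d), P), where a counts pathways below the threshold in both.
    """
    a = b = c = d = 0
    for term in p_mwas.keys() & p_gwas.keys():
        sig_m, sig_g = p_mwas[term] < threshold, p_gwas[term] < threshold
        if sig_m and sig_g:
            a += 1
        elif sig_m:
            b += 1
        elif sig_g:
            c += 1
        else:
            d += 1
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)
```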

Links between taxa and genes in the metagenome

The organism of origin of each gene is an important factor in understanding microbiome biology. While gene databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) (20) and UniProt (21) record organisms of origin, such information is based on the specific link between the registered gene and organism, and may not reflect the real link in the target metagenomic sample. By tracing back to the level of sequencing reads, OMARU can directly estimate the organisms of origin for each gene in the target data (Figure 3B, Supplementary Figure S2).

Case–control differences in α-diversity and β-diversity of the metagenome

To calculate diversities, all samples should be down-sampled to the same, appropriately chosen number of reads. OMARU calculates α-diversity (within-sample diversity) as a Shannon index based on the gene abundance and the six levels of phylogenetic relative abundance (L2–L7) for each sample. Case–control comparisons are performed with pre-defined covariates, and the effect size of disease state is evaluated. To evaluate β-diversity, multidimensional scaling (MDS) on the Bray-Curtis dissimilarity is used. To evaluate case–control differences in the dissimilarity, OMARU performs permutational multivariate analysis of variance (PERMANOVA) (22) using the adonis() function in the R package vegan.
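The two diversity measures named above have simple closed forms, shown here as a minimal Python sketch. The function names are hypothetical, and the use of the natural logarithm for the Shannon index is an assumption; the PERMANOVA step is not reproduced here.

```python
import math

def shannon_index(abundance):
    """Shannon index H = -sum(p_i * ln p_i) over features with p_i > 0."""
    total = sum(abundance)
    return -sum((x / total) * math.log(x / total) for x in abundance if x > 0)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis matrix, the usual input to MDS and PERMANOVA."""
    return [[bray_curtis(s, t) for t in samples] for s in samples]
```

For example, a sample with four equally abundant clades has a Shannon index of ln 4, and two samples that share no clade have a Bray-Curtis dissimilarity of 1.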

RESULTS

We used two public fecal metagenomic datasets, for schizophrenia (SCZ; 90 SCZ patients and 81 healthy controls) and type 2 diabetes (T2D; 170 T2D patients and 174 healthy controls), as a practical example of the operation of OMARU (23,24). In sample QC of the SCZ data, we excluded one SCZ sample that had singleton genes beyond four standard deviations and was an outlier in both the phylogenetic and gene abundance data (Figure 2A). We used a phylogenetic reference constructed by integrating the genomes registered by Nishijima et al. (25) with those newly identified from the human gut bacteria projects (9,26,27), as previously described (5,6). We had 692 clades for the SCZ case–control association test, including 10 phyla (L2), 23 classes (L3), 34 orders (L4), 69 families (L5), 156 genera (L6) and 400 species (L7). We adopted sex, age, body mass index (BMI) and the top 30 principal components as covariates. In multiple test correction, the empirically estimated Bonferroni threshold was lower than the standard Bonferroni threshold (Figure 2C). This may reflect the fact that the abundances of clades within an individual are not independent of one another. We identified three clades that were significantly increased in SCZ (FDR < 0.05; Figure 2D, Supplementary Figure S3, Supplementary Table S1). We had 789 clades for the T2D case–control association test and identified no clades with significant association. In both diseases, the numbers of disease-associated clades were considerably lower than those in the reference papers and other metagenome studies (23,24). This result was mainly due to the correction for hidden confounding factors. The quantile-quantile plots of P-values in the SCZ data showed that the analysis without PCs as covariates exhibited severe inflation of P-values and a large number of false positives (Figure 2B).
Streptococcus vestibularis, one of the three SCZ-associated clades identified by OMARU, was reported to induce deficits in social behavior and alter neurotransmitter levels in peripheral tissues in recipient mice (23). Thus, a feature of OMARU is its ability to robustly identify disease-associated clades by optimally adjusting for confounding factors. We selected the KEGG database (20) as the reference for genes and biological pathways. After gene-level QC, we retained 185 663 and 104 487 genes for the SCZ and T2D case–control comparisons, respectively. In the functional association tests, inflation of P-values was suppressed by adjusting for covariates in the same way as in the phylogenetic analyses. We identified four SCZ-associated genes, four SCZ-associated pathways, and four T2D-associated pathways (FDR < 0.05; Figure 3A, Supplementary Tables S2 and S3). In the analysis of links between the phylogenetic and gene data, Vpar_1847, one of the four SCZ-associated genes, was estimated to be derived from multiple Veillonella spp. (Figure 3B, Supplementary Figure S2). These clades were not significantly associated in our phylogenetic analyses, while their increase in SCZ was highlighted in the referenced paper (23). The cross-sectional assessment of OMARU suggests that this gene may be an essential factor in the effect of Veillonella spp. on SCZ pathogenesis. For the MWAS–GWAS interaction, we used PASCAL with summary statistics from the SCZ GWAS (22 778 cases and 35 362 controls) (28) and the T2D GWAS (77 418 cases and 433 5440 controls) (29) to determine GO term enrichment in the human genome. We compared the P-values of each GO term shared between the metagenome data and the GWAS data. We found significant overlaps between the pathways enriched in the MWAS and GWAS (PFisher = 0.011 and 0.008 in SCZ and T2D, respectively; Figure 3C).
Our results suggested that there were disease-specific links between the human genome and the metagenome, namely MWAS–GWAS interaction, in the pathology of SCZ and T2D. We performed case–control comparisons of α-diversity and β-diversity in the phylogenetic data (L2–L7) and in the gene abundance data based on the KEGG database. In SCZ, no significant differences in α-diversity were observed in the phylogenetic data (P > 0.05/6 = 0.0083) or the gene abundance data (P = 0.134), nor in β-diversity (Supplementary Table S4). In T2D, α-diversity at taxonomic levels L3 and L4 (P < 8.3 × 10^-3) and in the gene abundance data (P = 5.1 × 10^-3) was significantly increased, and significant differences in β-diversity were observed at taxonomic levels L5–L7 (P < 8.3 × 10^-3) and in the gene abundance data (P < 1.0 × 10^-4) (Supplementary Figure S4, Supplementary Table S4).

DISCUSSION

While several bioinformatic tools for the microbiome have been developed recently (30–35), OMARU has unique characteristics, highlighted in case–control MWAS analysis using shotgun sequencing data. In contrast to several existing tools that are limited to a single part of the analysis, such as phylogenetic or functional analysis, OMARU provides end-to-end analysis from the processing of sequencing data, such as QC of reads and samples, to the three major analyses and the assessment of diversities. Performing those analyses in a single pipeline, with integrative assessment of the results of each part, provides deep interpretation of case–control differences in the microbiome. Further, evaluation of links between the metagenome and the host genome is one of the novel features of OMARU. We demonstrated that OMARU yields robust and multifaceted results by using public metagenome data. OMARU identified a sample in the SCZ data to be excluded. It is quite difficult to perform sample QC manually in metagenome analyses, and a comprehensive decision based on multiple assessments is required. OMARU can provide users with multifaceted data to help them make that decision. Through the statistical processing in OMARU, including reduction of false positives, SCZ-associated clades were narrowed down to a clade with functional support, which demonstrates the robustness of OMARU in identifying crucial biomarkers. While hidden confounding factors would be better adjusted by integrating the covariates into a case–control model, this is not currently implemented in OMARU and is thus considered one of its limitations. In addition, integrative analyses with multifaceted evaluation, such as the MWAS–GWAS interaction and the links between disease-associated genes and clades, provided a comprehensive understanding of the microbiome-associated pathology.
The SCZ metagenome showed little difference in diversity compared to healthy controls, while the T2D metagenome showed significant differences. Diversity analysis provides evidence of the microbiome's role in disease pathology from a different angle than the other analyses. We note that metagenome analysis is still highly dependent on reference databases, and database development remains a challenge for the future. In conclusion, OMARU, as a well-organized and user-friendly workflow, can improve the accessibility and reproducibility of MWAS in the microbiome research community. The robust and multifaceted results of OMARU, including the association with the host genome, reflect the dynamics of the microbiome authentically relevant to disease pathophysiology, leading to the identification of potential biomarkers.

DATA AVAILABILITY

OMARU is publicly available at https://github.com/toshi-kishikawa/OMARU and can be downloaded in the format of a Conda package.

1.  A metagenome-wide association study of gut microbiota in type 2 diabetes.

Authors:  Junjie Qin; Yingrui Li; Zhiming Cai; Shenghui Li; Jianfeng Zhu; Fan Zhang; Suisha Liang; Wenwei Zhang; Yuanlin Guan; Dongqian Shen; Yangqing Peng; Dongya Zhang; Zhuye Jie; Wenxian Wu; Youwen Qin; Wenbin Xue; Junhua Li; Lingchuan Han; Donghui Lu; Peixian Wu; Yali Dai; Xiaojuan Sun; Zesong Li; Aifa Tang; Shilong Zhong; Xiaoping Li; Weineng Chen; Ran Xu; Mingbang Wang; Qiang Feng; Meihua Gong; Jing Yu; Yanyan Zhang; Ming Zhang; Torben Hansen; Gaston Sanchez; Jeroen Raes; Gwen Falony; Shujiro Okuda; Mathieu Almeida; Emmanuelle LeChatelier; Pierre Renault; Nicolas Pons; Jean-Michel Batto; Zhaoxi Zhang; Hua Chen; Ruifu Yang; Weimou Zheng; Songgang Li; Huanming Yang; Jian Wang; S Dusko Ehrlich; Rasmus Nielsen; Oluf Pedersen; Karsten Kristiansen; Jun Wang
Journal:  Nature       Date:  2012-09-26       Impact factor: 49.962

2.  Large-scale association analyses identify host factors influencing human gut microbiome composition.

Authors:  Alexander Kurilshikov; Carolina Medina-Gomez; Rodrigo Bacigalupe; Djawad Radjabzadeh; Jun Wang; Ayse Demirkan; Caroline I Le Roy; Juan Antonio Raygoza Garay; Casey T Finnicum; Xingrong Liu; Daria V Zhernakova; Marc Jan Bonder; Tue H Hansen; Fabian Frost; Malte C Rühlemann; Williams Turpin; Jee-Young Moon; Han-Na Kim; Kreete Lüll; Elad Barkan; Shiraz A Shah; Myriam Fornage; Joanna Szopinska-Tokov; Zachary D Wallen; Dmitrii Borisevich; Lars Agreus; Anna Andreasson; Corinna Bang; Larbi Bedrani; Jordana T Bell; Hans Bisgaard; Michael Boehnke; Dorret I Boomsma; Robert D Burk; Annique Claringbould; Kenneth Croitoru; Gareth E Davies; Cornelia M van Duijn; Liesbeth Duijts; Gwen Falony; Jingyuan Fu; Adriaan van der Graaf; Torben Hansen; Georg Homuth; David A Hughes; Richard G Ijzerman; Matthew A Jackson; Vincent W V Jaddoe; Marie Joossens; Torben Jørgensen; Daniel Keszthelyi; Rob Knight; Markku Laakso; Matthias Laudes; Lenore J Launer; Wolfgang Lieb; Aldons J Lusis; Ad A M Masclee; Henriette A Moll; Zlatan Mujagic; Qi Qibin; Daphna Rothschild; Hocheol Shin; Søren J Sørensen; Claire J Steves; Jonathan Thorsen; Nicholas J Timpson; Raul Y Tito; Sara Vieira-Silva; Uwe Völker; Henry Völzke; Urmo Võsa; Kaitlin H Wade; Susanna Walter; Kyoko Watanabe; Stefan Weiss; Frank U Weiss; Omer Weissbrod; Harm-Jan Westra; Gonneke Willemsen; Haydeh Payami; Daisy M A E Jonkers; Alejandro Arias Vasquez; Eco J C de Geus; Katie A Meyer; Jakob Stokholm; Eran Segal; Elin Org; Cisca Wijmenga; Hyung-Lae Kim; Robert C Kaplan; Tim D Spector; Andre G Uitterlinden; Fernando Rivadeneira; Andre Franke; Markus M Lerch; Lude Franke; Serena Sanna; Mauro D'Amato; Oluf Pedersen; Andrew D Paterson; Robert Kraaij; Jeroen Raes; Alexandra Zhernakova
Journal:  Nat Genet       Date:  2021-01-18       Impact factor: 41.307

3.  Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

Authors:  Edoardo Pasolli; Duy Tin Truong; Faizan Malik; Levi Waldron; Nicola Segata
Journal:  PLoS Comput Biol       Date:  2016-07-11       Impact factor: 4.475

4.  Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set.

Authors:  Masahiro Kanai; Toshihiro Tanaka; Yukinori Okada
Journal:  J Hum Genet       Date:  2016-06-16       Impact factor: 3.172

5.  UniProt: a worldwide hub of protein knowledge.

Authors: 
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

6.  A human gut bacterial genome and culture collection for improved metagenomic analyses.

Authors:  Samuel C Forster; Nitin Kumar; Blessing O Anonye; Alexandre Almeida; Elisa Viciani; Mark D Stares; Matthew Dunn; Tapoka T Mkandawire; Ana Zhu; Yan Shao; Lindsay J Pike; Thomas Louie; Hilary P Browne; Alex L Mitchell; B Anne Neville; Robert D Finn; Trevor D Lawley
Journal:  Nat Biotechnol       Date:  2019-02-04       Impact factor: 54.908

7.  Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population.

Authors:  Toshihiro Kishikawa; Yuichi Maeda; Takuro Nii; Daisuke Motooka; Yuki Matsumoto; Masato Matsushita; Hidetoshi Matsuoka; Maiko Yoshimura; Shoji Kawada; Satoru Teshigawara; Eri Oguro; Yasutaka Okita; Keisuke Kawamoto; Shinji Higa; Toru Hirano; Masashi Narazaki; Atsushi Ogata; Yukihiko Saeki; Shota Nakamura; Hidenori Inohara; Atsushi Kumanogoh; Kiyoshi Takeda; Yukinori Okada
Journal:  Ann Rheum Dis       Date:  2019-11-07       Impact factor: 19.103

8.  Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox.

Authors:  Jakob Wirbel; Konrad Zych; Morgan Essex; Nicolai Karcher; Ece Kartal; Guillem Salazar; Peer Bork; Shinichi Sunagawa; Georg Zeller
Journal:  Genome Biol       Date:  2021-03-30       Impact factor: 13.583

9.  A Metagenome-Wide Association Study of Gut Microbiome in Patients With Multiple Sclerosis Revealed Novel Disease Pathology.

Authors:  Toshihiro Kishikawa; Kotaro Ogawa; Daisuke Motooka; Akiko Hosokawa; Makoto Kinoshita; Ken Suzuki; Kenichi Yamamoto; Tatsuo Masuda; Yuki Matsumoto; Takuro Nii; Yuichi Maeda; Shota Nakamura; Hidenori Inohara; Hideki Mochizuki; Tatsusada Okuno; Yukinori Okada
Journal:  Front Cell Infect Microbiol       Date:  2020-12-11       Impact factor: 5.293

10.  MetaLAFFA: a flexible, end-to-end, distributed computing-compatible metagenomic functional annotation pipeline.

Authors:  Alexander Eng; Adrian J Verster; Elhanan Borenstein
Journal:  BMC Bioinformatics       Date:  2020-10-21       Impact factor: 3.169
