Jiwoong Kim, Shuang Jiang, Yiqing Wang, Guanghua Xiao, Yang Xie, Dajiang J Liu, Qiwei Li, Andrew Koh, Xiaowei Zhan.
Abstract
In microbiome research, metagenomic sequencing generates enormous amounts of data. These data are typically classified into taxa for taxonomic analysis, or into genes for functional analysis. However, a joint analysis in which the reads are classified into taxa-specific genes is often overlooked. To enable the analysis of this biologically meaningful feature, we developed a novel bioinformatic toolkit, MetaPrism, which analyzes sequence reads for a set of joint taxa/gene analyses to: 1) classify sequence reads and estimate the abundances of taxa-specific genes; 2) tabularize and visualize taxa-specific gene abundances; 3) compare the abundances between groups; and 4) build prediction models for clinical outcomes. We illustrated these functions using a published microbiome metagenomics dataset from patients treated with immune checkpoint inhibitor therapy and showed that the joint features can serve as potential biomarkers to predict therapeutic responses. MetaPrism is a toolkit for joint taxa and gene analysis. It offers biological insights into taxa-specific genes beyond what taxa-alone or gene-alone analysis provides. MetaPrism is open-source software and freely available at https://github.com/jiwoongbio/MetaPrism. An example script to reproduce the analyses in the manuscript is also provided in the same code repository.
Keywords: joint analysis; metagenomics sequence analysis; microbiome biomarker
Year: 2021 PMID: 33713107 PMCID: PMC8049424 DOI: 10.1093/g3journal/jkab046
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. A schematic illustration of the algorithm and the functions in MetaPrism. A) Illustration of the MetaPrism algorithm to infer taxa-specific gene abundances. Functional profiling infers that three reads map to a gene, but it cannot provide further taxonomic information. Through joint profiling, MetaPrism uses de novo assembled contigs to estimate taxa-specific features: two gene copies are from species A and one copy is from species B. B) An overview of the joint analysis workflow in MetaPrism. The hexagons represent functions implemented in MetaPrism.
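The contrast between functional and joint profiling in panel A can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MetaPrism's implementation: the three dictionaries (read-to-contig placement, gene annotation per contig, taxon label per contig) are hypothetical inputs, and abundances are plain read counts rather than depth-normalized values.

```python
from collections import Counter

# Hypothetical inputs (not MetaPrism data structures): each read is placed
# on an assembled contig, and each contig has a gene and a taxon label.
read_to_contig = {"read1": "contig1", "read2": "contig2", "read3": "contig3"}
contig_to_gene = {"contig1": "geneX", "contig2": "geneX", "contig3": "geneX"}
contig_to_taxon = {
    "contig1": "species A",
    "contig2": "species A",
    "contig3": "species B",
}


def gene_counts(read_to_contig, contig_to_gene):
    """Functional profiling: count reads per gene, ignoring taxonomy."""
    return Counter(contig_to_gene[c] for c in read_to_contig.values())


def joint_counts(read_to_contig, contig_to_gene, contig_to_taxon):
    """Joint profiling: count reads per (taxon, gene) pair."""
    return Counter(
        (contig_to_taxon[c], contig_to_gene[c]) for c in read_to_contig.values()
    )


functional = gene_counts(read_to_contig, contig_to_gene)
joint = joint_counts(read_to_contig, contig_to_gene, contig_to_taxon)
```

With these toy inputs, the functional profile sees a single gene with three reads, while the joint profile splits the same reads into two from species A and one from species B, as in the caption.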
Figure 2. Comparison of gene abundances reported by FMAP and MetaPrism. We used simulations to compare the gene abundances estimated by FMAP and MetaPrism. The Pearson correlation coefficients between the true abundances and the software-estimated abundances are listed at the bottom right.
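For reference, the Pearson correlation coefficient used in this comparison can be computed as in the sketch below. The abundance values are made-up numbers for illustration only; they are not taken from the simulations, and the function name `pearson` is our own.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two deviation norms.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Made-up example values, not data from the paper.
true_abundance = [10.0, 20.0, 30.0, 40.0]
estimated_abundance = [12.0, 19.0, 33.0, 41.0]
r = pearson(true_abundance, estimated_abundance)
```

A value of `r` close to 1 indicates that the estimated abundances track the true abundances almost linearly.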
Figure 3. Heatmap of joint features for predicting immune checkpoint therapy response. We used MetaPrism_heatmap.pl to visualize four joint features (taxa-specific gene abundances with variable importance values greater than 50%) in the immune checkpoint therapy study. Colors from red to green represent increasing gene abundance, measured as the mean depth normalized by contig length. P10, P14, P23, P25, P34, and P39 are patients who responded to the therapy; P8, P16, P24, P30, P32, and P42 are patients with progressive outcomes. K00826, branched-chain amino acid aminotransferase; K03205, type IV secretion system protein VirD4; K01006, pyruvate, orthophosphate dikinase; K06187, recombination protein RecR.
Table. Prediction models and performance for taxonomic, functional, and joint analyses. The table lists the details of the prediction models used in the three types of analysis and their prediction performance.
| | Taxonomic profiling | Functional profiling | Joint profiling |
|---|---|---|---|
| Model | Random forest | Random forest | Random forest |
| | 500 | 500 | 500 |
| | 1,048 | 5,227 | 62,086 |
| Features (variable importance) | Chondromyces (100) | K07705 (100) | K00826 |
| | Roseateles (65) | – | K03205 |
| | – | – | K01006 |
| | – | – | K06187 |
| Prediction accuracy | 53.8% | 61.5% | 69.2% |

Note: The variable importance values are listed in parentheses.
Note: Prediction accuracy was evaluated using leave-one-out cross-validation.
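Leave-one-out cross-validation holds out each sample once, trains on the remaining samples, and reports the fraction of held-out samples predicted correctly. The sketch below illustrates that loop only. It assumes a nearest-centroid classifier as a dependency-free stand-in for the random forest used in the study, and the feature values and labels are made-up toy data.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not a random forest): return the label whose
    class centroid is closest to x in squared Euclidean distance."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y, predict):
    """Hold out each sample once, train on the rest, return the hit rate."""
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data: two well-separated groups of three samples each (made up).
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7], [0.8, 1.0]]
y = ["responder"] * 3 + ["progressor"] * 3
accuracy = leave_one_out_accuracy(X, y, nearest_centroid_predict)
```

Because `leave_one_out_accuracy` takes the prediction function as an argument, any classifier with the same `(train_X, train_y, x)` shape could be substituted without changing the evaluation loop.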