Abstract
Microbiome research is going through an evolutionary transition from characterizing reference microbiomes associated with different environments and hosts to translational applications, including using the microbiome for disease diagnosis, improving the efficacy of cancer treatments, and preventing diseases (e.g., using probiotics). Microbial markers have been identified from microbiome data derived from cohorts of patients with different diseases, treatment responsiveness, etc., and predictors based on these markers have often been built to predict host phenotype from a microbiome dataset (e.g., to predict whether a person has type 2 diabetes given his or her microbiome data). Unfortunately, these microbial markers and predictors are often not published and so cannot be reused by others. In this paper, we report the curation of a repository of microbial marker genes and predictors built from these markers for microbiome-based prediction of host phenotype, and a computational pipeline called Mi2P (from Microbiome to Phenotype) for using the repository. As an initial effort, we focus on microbial marker genes related to two diseases, type 2 diabetes and liver cirrhosis, and to immunotherapy efficacy for two types of cancer, non-small-cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We characterized the marker genes from metagenomic data using our recently developed subtractive assembly approach. We showed that predictors built from these microbial marker genes can provide fast and reasonably accurate prediction of host phenotype given microbiome data. As understanding and making use of microbiome data (our second genome) becomes vital in this age of precision health and precision medicine, we believe that such a repository will be useful for enabling translational applications of microbiome data.
Year: 2019 PMID: 30864326 PMCID: PMC6417824
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
Summary of the microbiome datasets for training the predictors.
| Abbr. | Disease | Reference | # of samples | Total base pairs |
|---|---|---|---|---|
| T2D | Type 2 diabetes | [ | 93 | 225 GB |
| Cirrhosis | Liver cirrhosis | [ | 181 | 817 GB |
| NSCLC | Non-small-cell lung cancer | [ | 65 | 153 GB |
| RCC | Renal cell carcinoma | [ | 62 | 147 GB |
Fig. 1: Schematic representations of the model curation based on CoSA (a) and the Mi2P (Microbiome to Phenotype) pipeline (b).
Fig. 2: Receiver operating characteristic (ROC) plots of the liver cirrhosis predictors using different ML approaches. We also tested two feature selection methods, tree-based feature selection and L1-based feature selection; the results are shown in (a) and (b), respectively. The ROC curves were averaged over the five cross-validation results.
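As a rough illustration of how per-fold ROC results can be summarized, the sketch below computes the area under the ROC curve (AUC) for each fold and averages the values. It is a minimal example in plain Python, not the code behind Fig. 2: the function names `roc_auc` and `mean_auc_over_folds` and the toy scores are assumptions made here, and it averages the scalar AUC rather than the curves themselves.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: sequence of 0/1 values (1 = positive class, e.g. cirrhosis)
    scores: predicted scores, where higher means more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_over_folds(folds):
    """Average the per-fold AUCs; each fold is a (labels, scores) pair."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Hypothetical held-out predictions from two folds (toy numbers, not from the paper).
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.3]),
]
toy_mean_auc = mean_auc_over_folds(toy_folds)
```

With five folds, `toy_folds` would simply hold five `(labels, scores)` pairs, one per held-out split.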
Accuracy of microbiome-based predictors for liver cirrhosis.
| Evaluation | Method | # of marker genes | SVM | RF (100 trees) | NN | KN |
|---|---|---|---|---|---|---|
| Cross | Qin et al. | 15 | 0.84 | N/A | N/A | N/A |
| Cross | Our approach | 46 | 0.92 | 0.92 | 0.88 | 0.71 |
| Validation | Qin et al. | 15 | 0.84 | N/A | N/A | N/A |
| Validation | Our approach | 46 | 0.83 | 0.93 | 0.81 | 0.72 |

Cross: the "Cross" rows show the leave-one-out validation results (see Figure 2 (a) for the 5-fold cross-validation results).
Validation: validation using microbiome data unseen in the training of the predictor.
Qin et al.: numbers taken from the paper.[7]
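The leave-one-out procedure mentioned in the notes can be sketched as follows: each sample is held out once, a classifier is trained on the remaining samples, and the fraction of correct held-out predictions is reported. This is only an illustrative sketch in plain Python. The k-nearest-neighbor classifier is used because it is short to write, and it is not claimed to match any predictor in the table; the helper names and the toy abundance profiles are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(features, labels, k=3):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, features[i], k) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy abundance profiles over two hypothetical marker genes (not real data).
toy_features = [
    [0.9, 0.1], [0.8, 0.2], [0.85, 0.15],  # "healthy"-like profiles
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.85],  # "cirrhosis"-like profiles
]
toy_labels = ["healthy"] * 3 + ["cirrhosis"] * 3
toy_accuracy = leave_one_out_accuracy(toy_features, toy_labels, k=1)
```

Leave-one-out uses as much training data as possible per split, which matters for cohorts of this size, at the cost of one model fit per sample.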
Accuracy of microbiome-based prediction of responders versus non-responders to cancer treatment using RF (with 10, 100, and 1000 trees), DT and NN approaches.
| Cancer type | # of marker genes | RF (10 trees) | RF (100 trees) | RF (1000 trees) | DT (mean AUC) | NN (mean AUC) |
|---|---|---|---|---|---|---|
| NSCLC | 116 | 0.86 | 0.91 | 0.89 | 0.72 | 0.81 |
| RCC | 85 | 0.84 | 0.83 | 0.81 | 0.79 | 0.78 |
Examples of microbial marker genes for liver cirrhosis prediction.
| Marker gene | Description | Domain |
|---|---|---|
| Depleted in liver cirrhosis microbiome | | |
| H_k99_23554_31_534_− | Tripartite ATP-independent periplasmic transporters | DctQ |
| H_k99_23763_1365_1613_− | Helix-turn-helix domain | HTH_31 |
| H_k99_38620_1_453_+ | Acyltransferase family | Acyl_transf_3 |
| H_k99_59586_373_654_− | Amidohydrolase | Amidohydro_2 |
| H_k99_64410_1_617_− | REC lobe of CRISPR-associated endonuclease Cas9 | Cas9_REC |
| Enriched in liver cirrhosis microbiome | | |
| L_k99_1592_1_390_− | Polysaccharide biosynthesis C-terminal domain | Polysacc_synt_C |
| L_k99_7366_1_565_− | Carbon starvation protein CstA | CstA |
| L_k99_13622_1_326_+ | Septation ring formation regulator, EzrA | EzrA |
| L_k99_52773_82_623_+ | Sodium:sulfate symporter transmembrane region | Na_sulph_symp |
| L_k99_52825_1_408_+ | D-isomer specific 2-hydroxyacid dehydrogenase | 2-Hacid_dh_C |