Elizabeth M Ross, Peter J Moate, Leah C Marett, Ben G Cocks, Ben J Hayes.
Abstract
Mammals have a large cohort of endo- and ectosymbiotic microorganisms (the microbiome) that potentially influence host phenotypes. There have been numerous exploratory studies of these symbiotic organisms in humans and other animals, often with the aim of relating the microbiome to a complex phenotype such as body mass index (BMI) or disease state. Here, we describe an efficient methodology for predicting complex traits from quantitative microbiome profiles. The method was demonstrated by predicting inflammatory bowel disease (IBD) status and BMI from human microbiome data, and enteric greenhouse gas production from dairy cattle rumen microbiome profiles. The method uses unassembled massively parallel sequencing (MPS) data to form metagenomic relationship matrices (analogous to the genomic relationship matrices used in genomic prediction) to predict IBD, BMI and methane production phenotypes with useful accuracies (r = 0.423, 0.422 and 0.466, respectively). Our results show that microbiome profiles derived from MPS can be used to predict complex phenotypes of the host. Although the number of biological replicates used here limits the accuracy that can be achieved, preliminary results suggest this approach may surpass current prediction accuracies that are based on the host genome. This is especially likely for traits that are largely influenced by the gut microbiota, for example, digestive tract disorders or metabolic functions such as enteric methane production in cattle.
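The key idea in the abstract is a metagenomic relationship matrix built from unassembled reads, playing the role a genomic relationship matrix plays in genomic prediction. The following is a minimal Python sketch, assuming each sample has already been summarized as read counts per reference contig. The counts-per-million log transform, the per-contig standardization and the function name `metagenomic_relationship_matrix` are illustrative assumptions, not the authors' published procedure.

```python
import math


def metagenomic_relationship_matrix(counts):
    """Sample-by-sample relationship matrix from per-contig read counts.

    counts[i][j] = reads from sample i assigned to contig j (ASSUMED input format).
    Returns G with G[a][b] = (1/p) * sum_j z[a][j] * z[b][j], where z holds
    standardized log-abundances and p is the number of informative contigs.
    """
    n = len(counts)
    # Normalize for sequencing depth (counts per million), then log-transform;
    # the +1 pseudocount avoids log(0) for contigs with no reads.
    logab = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("sample with zero reads")
        logab.append([math.log(1.0 + 1e6 * c / total) for c in row])
    # Standardize each contig across samples; drop (near-)constant contigs.
    z_cols = []
    for j in range(len(counts[0])):
        col = [logab[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        if sd > 1e-12:
            z_cols.append([(v - mu) / sd for v in col])
    p = len(z_cols)
    if p == 0:
        raise ValueError("no informative contigs")
    # Cross-product of standardized profiles, averaged over contigs.
    return [[sum(z[a] * z[b] for z in z_cols) / p for b in range(n)] for a in range(n)]
```

Because every contig is centred and scaled to unit variance, the resulting matrix is symmetric, its diagonal averages 1 and each row sums to zero, mirroring common scalings of genomic relationship matrices. Weighting contigs by abundance instead of standardizing them would be an equally plausible design choice.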
Year: 2013 PMID: 24023808 PMCID: PMC3762846 DOI: 10.1371/journal.pone.0073056
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Metagenomic predictions of qualitative and quantitative traits.
| Trait | Validation method | Ref. Pop. (N) | Val. Pop. (N) | Accuracy (r) | 95% CI | Significant |
|---|---|---|---|---|---|---|
| IBD | 3-fold CV | Spain (25+13) | Spain (25+13) | 0.429 | 0.156 to 0.647 | Y |
| BMI | 2-fold CV | Denmark (84) | Denmark (84) | 0.391 | 0.175 to 0.491 | Y |
| BMI | 2 Populations | Denmark (84) | Spain-c (13) | 0.101 | −0.228 to 0.624 | N |
| Methane | 2 Populations | bovGMC (31) | bovFT-t (8) | 0.788 | 0.132 to 0.961 | Y |
| Methane | 2 Populations | bovGMC (31) | bovFT-c (7) | 0.404 | 0.330 to 0.985 | Y |
| Methane | 2 Populations | bovGMC (31) | bovFCE (16) | 0.466 | 0.165 to 0.734 | Y |
| Methane | 2 Populations | bovFT (15) | bovFCE (16) | 0.394 | 0.078 to 0.711 | Y |
| Methane | 2 Populations | bovFT (15) | bovGMC-c (11) | −0.167 | −0.677 to 0.247 | N |
| Methane | 2 Populations | bovFT (15) | bovGMC-t (20) | 0.277 | 0.347 to 0.735 | Y |
| Methane | 2 Populations | bovFCE (16) | bovFT-t (8) | 0.285 | −0.283 to 0.872 | N |
| Methane | 2 Populations | bovFCE (16) | bovFT-c (7) | 0.780 | 0.127 to 0.973 | Y |
| Methane | 2 Populations | bovFCE (16) | bovGMC-c (11) | 0.084 | −0.283 to 0.528 | N |
| Methane | 2 Populations | bovFCE (16) | bovGMC-t (20) | 0.376 | 0.049 to 0.730 | Y |
Accuracy of prediction, with 95% confidence intervals, for human and bovine metagenomic predictions. Phenotypes were predicted from metagenomic profiles using best linear unbiased prediction (BLUP), performed in ASReml. Prediction accuracy was evaluated as the correlation between the predicted phenotype and the measured (real) phenotype. IBD and BMI data are from [8].
95% CI = 95% confidence interval of Pearson's correlation coefficient r, based on 10,000 bootstrap resamples.
Ref. Pop. = Reference population.
Val. Pop. = Validation population.
Spain-c = Control samples from Spain (no IBD).
bovFT-t/bovGMC-t = Animals on the treatment diet only.
bovFT-c/bovGMC-c = Animals on the control diet only.
N = total number of samples used.
CV = Cross Validation.
2 Populations = Validation on a second independent population.
IBD phenotypes were coded as IBD = 0 and non-IBD = 1.
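The table's accuracies come from BLUP predictions correlated with measured phenotypes, with a bootstrap 95% CI on Pearson's r. The sketch below is not ASReml: it assumes the relationship matrix `G` is already built and that the variance ratio `lam` = σe²/σg² is known (ASReml would estimate variance components by REML), and it centres on the reference mean rather than estimating a fixed effect by generalized least squares. The percentile bootstrap over (predicted, observed) pairs is likewise an assumption about how the intervals were formed; all function names are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def blup_predict(G, y_ref, ref, val, lam):
    """g_val = G[val, ref] (G[ref, ref] + lam * I)^-1 (y_ref - mean(y_ref)).

    lam = sigma_e^2 / sigma_g^2 is ASSUMED known here, not REML-estimated.
    """
    mu = sum(y_ref) / len(y_ref)
    A = [[G[i][j] + (lam if i == j else 0.0) for j in ref] for i in ref]
    alpha = solve(A, [y - mu for y in y_ref])
    return [mu + sum(G[v][r] * a for r, a in zip(ref, alpha)) for v in val]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(pred, obs, n_boot=10_000, level=0.95, seed=1):
    """Percentile bootstrap CI for r, resampling (pred, obs) pairs with replacement."""
    n = len(pred)
    if n < 3 or len(set(pred)) < 2 or len(set(obs)) < 2:
        raise ValueError("need at least 3 pairs with non-constant values")
    rng = random.Random(seed)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        p = [pred[i] for i in idx]
        o = [obs[i] for i in idx]
        # Skip degenerate resamples in which r is undefined (zero variance).
        if len(set(p)) > 1 and len(set(o)) > 1:
            rs.append(pearson(p, o))
    rs.sort()
    lo = rs[round((1 - level) / 2 * n_boot)]
    hi = rs[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

In use, `blup_predict` would be fitted on a reference population (for example bovGMC) and applied to a validation population (for example bovFCE); `pearson` of the predictions against the measured values gives the accuracy, and `bootstrap_ci` its interval.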
Figure 1. Effect of reference population characteristics on metagenomic prediction accuracy.
Prediction of residual enteric methane production in cattle (red in panels A-C) and body mass index (BMI) in humans (blue in panels A-C). All bovine predictions use bovGMC as the reference population and bovFCE as the validation population. A) Lines: effect of reference population size on prediction accuracy; each line shows the average accuracy of prediction across 20 random replicate reference populations sampled from the whole dataset. Squares: accuracy of prediction when the samples with the most extreme phenotypes were used as the reference. Triangles: accuracy of prediction when the least extreme samples were used as the reference. B) Comparison of prediction accuracy using the BLUP and randomForest methods. The same reference and validation populations were used for both methods. The randomForest predictions were performed with default settings, and the average correlation over 100 replicate runs is reported. C) Prediction accuracy at different sequencing depths in the bovine dataset; the phenotype is residual methane production, the reference population is bovGMC and the validation population is bovFCE. D) Prediction accuracy when contig databases of different sizes were used; the phenotype is residual methane production, the reference population is bovGMC and the validation population is bovFCE. Blue diamonds: N contigs were randomly selected from the whole dataset. Red triangles: contigs were randomly assigned to 4 groups of 100,000 contigs (no overlap between groups).
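The caption describes two sampling designs: reference populations of varying size and composition (panel A) and contig databases of varying size (panel D). The sketch below reproduces those designs in Python under stated assumptions. "Most extreme" is interpreted here as the largest absolute deviation from the phenotypic mean (it could equally mean the two tails of the distribution), and the function names and seeds are hypothetical. Each returned index set would then be passed to the prediction routine being evaluated.

```python
import random


def reference_subsets(y, size, n_reps=20, seed=0):
    """Reference-population designs for panel A.

    y: phenotypes of all candidate reference samples.
    Returns (random_reps, most_extreme, least_extreme) as sorted index lists.
    """
    if not 0 < size <= len(y):
        raise ValueError("size must be between 1 and len(y)")
    rng = random.Random(seed)
    idx = list(range(len(y)))
    # Lines: n_reps random reference populations of the requested size.
    random_reps = [sorted(rng.sample(idx, size)) for _ in range(n_reps)]
    # Rank samples by distance from the phenotypic mean (ASSUMED meaning of "extreme").
    mean = sum(y) / len(y)
    by_dev = sorted(idx, key=lambda i: abs(y[i] - mean))
    most_extreme = sorted(by_dev[-size:])   # squares
    least_extreme = sorted(by_dev[:size])   # triangles
    return random_reps, most_extreme, least_extreme


def random_contigs(n_contigs, n, seed=0):
    """Blue diamonds in panel D: n contigs drawn at random from the whole set."""
    return sorted(random.Random(seed).sample(range(n_contigs), n))


def disjoint_contig_groups(n_contigs, n_groups=4, group_size=100_000, seed=0):
    """Red triangles in panel D: contigs shuffled into non-overlapping groups."""
    if n_groups * group_size > n_contigs:
        raise ValueError("not enough contigs for non-overlapping groups")
    order = list(range(n_contigs))
    random.Random(seed).shuffle(order)
    return [order[g * group_size:(g + 1) * group_size] for g in range(n_groups)]
```

Shuffling once and slicing guarantees that the contig groups share no members, which is the property the red-triangle comparison relies on.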