Nina Melzer, Dörte Wittenburg, Dirk Repsilber.
Abstract
In this study, the benefit of metabolome-level analysis for the prediction of the genetic value of three traditional milk traits was investigated. Our proposed approach consists of three steps. First, milk metabolite profiles are used to predict three traditional milk traits of 1,305 Holstein cows. Two regression methods, both enabling variable selection, are applied to identify important milk metabolites in this step. Second, the prediction of these important milk metabolites from single nucleotide polymorphisms (SNPs) enables the detection of SNPs with significant genetic effects. Finally, these SNPs are used to predict milk traits. The observed precision of predicted genetic values was compared to the results observed for the classical genotype-phenotype prediction using all SNPs or a reduced SNP subset (reduced classical approach). To enable a comparison between SNP subsets, a special invariable evaluation design was implemented. SNPs close to or within known quantitative trait loci (QTL) were determined. This enabled us to determine whether the detected important SNP subsets were enriched in these regions. The results show that our approach can lead to genetic value prediction, yet requires less than 1% of the total number of SNPs (40,317). Significantly more important SNPs in known QTL regions were detected using our approach compared to the reduced classical approach. In conclusion, our approach allows a deeper insight into the associations between the different levels of the genotype-phenotype map (genotype-metabolome, metabolome-phenotype, genotype-phenotype).
Year: 2013 PMID: 23990900 PMCID: PMC3749218 DOI: 10.1371/journal.pone.0070256
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Scheme of the invariable double 10-fold cross-validation (CV) design.
To obtain the outer 10-fold CV, which represents a classical CV [5], the whole data set is first divided into ten equal parts considering the half-sib structure, which results in the 10 outer test sets. The outer training set for each test set is created by merging the remaining nine test sets. The outer cross-validation is only used for the genetic value prediction. To enable optimization, i.e., in our case to find optimal milk metabolites for a milk trait, an inner CV is necessary. The inner 10-fold CV is created based on the 10 outer training sets, where each outer training set is again divided into 10 equal parts.
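The double cross-validation layout described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that "considering the half-sib structure" means keeping all daughters of one sire in the same fold, and that a fixed random seed is what keeps the design invariable across SNP subsets. The function names `grouped_folds` and `double_cv` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def grouped_folds(animal_ids, sire_of, k=10, seed=0):
    """Split animals into k folds of similar size.

    Assumption: each half-sib family (same sire) stays in one fold.
    """
    families = defaultdict(list)
    for animal in animal_ids:
        families[sire_of[animal]].append(animal)
    fams = list(families.values())
    random.Random(seed).shuffle(fams)      # fixed seed -> reproducible split
    fams.sort(key=len, reverse=True)       # place the largest families first
    folds = [[] for _ in range(k)]
    for fam in fams:
        min(folds, key=len).extend(fam)    # add to the currently smallest fold
    return folds


def double_cv(animal_ids, sire_of, k=10, seed=0):
    """Build k outer test/training sets and k inner folds per outer training set."""
    outer = grouped_folds(animal_ids, sire_of, k, seed)
    design = []
    for i, test in enumerate(outer):
        # Outer training set: the remaining k - 1 outer test sets merged.
        train = [a for j, fold in enumerate(outer) if j != i for a in fold]
        # Inner CV: the outer training set is split again into k parts.
        inner = grouped_folds(train, sire_of, k, seed + i + 1)
        design.append({"outer_test": test, "outer_train": train, "inner_folds": inner})
    return design


# Toy data (hypothetical): 200 cows, 40 sires, 5 daughters each.
animals = list(range(200))
sire_of = {a: a % 40 for a in animals}
design = double_cv(animals, sire_of)
print(len(design), len(design[0]["outer_test"]), len(design[0]["outer_train"]))
```

Because the splits depend only on the animal list, the sire map, and the seed, the same `design` can be reused for every SNP subset under comparison.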
Information about the important metabolites detected within inner 10-fold cross-validation (CV) runs.
| Milk trait | Metabolite | Narrow-sense heritability | P | Counts in 10-CV | No. of SNPs | Prediction precision (all SNPs) | Prediction precision (selected SNPs) | p-value |
|---|---|---|---|---|---|---|---|---|
| Fat content | 1,3-Dihydroxyaceton | 0.09 | 0.19 | 10 | 5.30 | 0.06 | 0.03 | 0.38 |
| | Arabitol | 0.21 | 0.19 | 10 | 16.90 | 0.14 | 0.11 | 0.23 |
| | Aspartic acid | 0.17 | −0.13 | 10 | 29.00 | 0.14 | 0.06 | 0.03 |
| | Butanoic acid, 4-amino- | 0.18 | −0.09 | 1 | 4.00 | 0.13 | −0.04 | n.a. |
| | Galactitol | 0.00 | −0.14 | 10 | 18.10 | −0.01 | −0.08 | 0.06 |
| | Glucaric acid-1,4-lactone | 0.05 | −0.12 | 10 | 7.40 | 0.05 | −0.05 | 0.03 |
| | Muramic acid, N-acetyl- | 0.04 | 0.12 | 1 | 6.00 | −0.03 | −0.02 | n.a. |
| | myo-Inositol-1-phosphate | 0.18 | 0.13 | 8 | 6.88 | 0.12 | 0.03 | 0.01 |
| | Pyroglutamic acid | 0.15 | −0.12 | 10 | 41.50 | 0.15 | 0.06 | 0.01 |
| | Pyruvic acid | 0.08 | 0.11 | 4 | 10.75 | 0.05 | −0.02 | 0.25 |
| | Sedoheptulose, 2,7-anhydro-, beta | 0.00 | −0.10 | 1 | 4.00 | −0.02 | 0.03 | n.a. |
| pH value | Alanine, beta- | 0.22 | −0.18 | 8 | 10.00 | 0.14 | 0.09 | 0.08 |
| | Arabitol | 0.21 | 0.11 | 3 | 18.00 | 0.17 | 0.11 | 0.25 |
| | Glutaric acid, 2-hydroxy- | 0.48 | 0.11 | 4 | 33.25 | 0.38 | 0.32 | 0.12 |
| | Glycerol-2-phosphate | 0.11 | −0.15 | 10 | 25.60 | 0.14 | 0.14 | 1 |
| | Glycerol-3-phosphate | 0.18 | −0.13 | 7 | 53.57 | 0.22 | 0.19 | 0.58 |
| | Glycine | 0.21 | −0.18 | 10 | 20.60 | 0.15 | 0.08 | 0.03 |
| | Phenylalanine | 0.03 | −0.13 | 1 | 8.00 | −0.02 | 0.13 | n.a. |
| | Threonic acid | 0.11 | 0.00 | 1 | 4.00 | −0.01 | 0.04 | n.a. |
| | Tryptophan | 0.05 | −0.11 | 1 | 10.00 | 0.14 | 0.08 | n.a. |
| | Tyrosine | 0.01 | −0.13 | 1 | 15.00 | 0.14 | 0.17 | n.a. |
| Protein content | 2-Piperidinecarboxylic acid | 0.37 | −0.21 | 10 | 23.50 | 0.17 | 0.09 | 0.03 |
| | Adipic acid, 2-amino- | 0.19 | −0.28 | 10 | 24.60 | 0.16 | 0.06 | 0.00 |
| | Alanine | 0.16 | −0.13 | 3 | 9.00 | 0.05 | 0.04 | 1 |
| | Arabitol | 0.21 | 0.34 | 10 | 16.30 | 0.14 | 0.09 | 0.08 |
| | Asparagine | 0.06 | −0.23 | 10 | 12.30 | 0.12 | 0.02 | 0.01 |
| | Aspartic acid | 0.17 | −0.22 | 10 | 28.40 | 0.14 | 0.05 | 0.01 |
| | Butanoic acid, 2-amino- | 0.27 | −0.24 | 10 | 29.10 | 0.14 | 0.07 | 0.05 |
| | Cinnamic acid, 3,4,5-trimethoxy-, trans- | 0.02 | 0.30 | 10 | 4.90 | 0.06 | 0.04 | 0.56 |
| | Glyceric acid-3-phosphate | 0.09 | 0.23 | 4 | 23.50 | 0.10 | 0.08 | 0.88 |
| | Glycerol-2-phosphate | 0.11 | 0.19 | 1 | 29.00 | 0.06 | 0.13 | n.a. |
| | Glycerol-3-phosphate | 0.18 | 0.30 | 10 | 55.10 | 0.22 | 0.17 | 0.11 |
| | myo-Inositol-1-phosphate | 0.18 | 0.27 | 10 | 6.50 | 0.12 | 0.07 | 0.19 |
| | Phosphoenolpyruvic acid | 0.41 | 0.25 | 7 | 36.29 | 0.22 | 0.17 | 0.11 |
| | Pyroglutamic acid | 0.15 | −0.18 | 10 | 41.80 | 0.15 | 0.10 | 0.03 |
| | Spermidine | 0.02 | 0.28 | 10 | 11.20 | 0.07 | 0.02 | 0.28 |
| | Thiazole, 4-methyl-5-hydroxyethyl- | 0.20 | 0.21 | 7 | 11.43 | 0.05 | 0.04 | 0.69 |
The table presents the number of occurrences in 10 CV runs (Counts in 10-CV) as well as the average number of selected SNPs (No. of SNPs) over the corresponding CV runs for each important metabolite. In addition, the results of the genetic value prediction using all SNPs and using selected SNPs are presented for each metabolite, based on the outer 10-fold cross-validation runs. Moreover, the p-values (p-value) obtained from the Wilcoxon signed rank test are listed. The test was applied when a milk metabolite was detected in more than two inner cross-validation runs (i.e., Counts in 10-CV > 2). Additionally, the estimated narrow-sense heritabilities as well as the Pearson correlation coefficients (P; adapted from [6]) obtained between each milk trait and important milk metabolite are presented. For the latter, the whole data set was used.
- significant difference ().
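The paired comparison behind the p-value column can be illustrated with an exact two-sided Wilcoxon signed rank test. The sketch below is a plain-Python version written for illustration only. The source does not say which software or which variant of the test (exact or approximate, handling of ties) was used, and the input values are hypothetical precision values multiplied by 100, not data from the table.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Zero differences are dropped; tied absolute differences get average ranks.
    Returns (sum of positive ranks, p-value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1             # average of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    w = min(w_plus, total - w_plus)
    # Exact null distribution: every rank is positive or negative with prob. 1/2.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical precision values (x100) from 10 outer CV runs.
all_snps = [14, 16, 12, 15, 13, 17, 11, 18, 10, 19]
selected = [11, 10, 12, 9, 14, 8, 10, 12, 9, 11]
print(wilcoxon_signed_rank(all_snps, selected))
```

Under this exact variant, three paired runs without ties cannot give a two-sided p-value below 0.25, so tests on metabolites with very few counts carry little power.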
P-values resulting from rating the important SNPs for the reduced classical approach and the metabolite approach for each outer training set using an over-representation analysis.
| Trait | Reduced classical approach: P-value | Reduced classical approach: Expected | Reduced classical approach: Observed | Metabolite approach: P-value | Metabolite approach: Expected | Metabolite approach: Observed |
|---|---|---|---|---|---|---|
| Fat content | 0.737 | 2.56 | 2 | 0.010 | 7.75 | 15 |
| | 0.588 | 3.01 | 3 | 0.001 | 7.30 | 17 |
| | 0.930 | 2.56 | 1 | 0.032 | 8.13 | 14 |
| | 0.048 | 2.03 | 5 | 0.001 | 9.26 | 20 |
| | 0.897 | 2.18 | 1 | 0.005 | 7.90 | 16 |
| | 0.395 | 2.26 | 3 | 0.003 | 9.56 | 19 |
| | 0.904 | 2.26 | 1 | 0.006 | 9.48 | 18 |
| | 0.395 | 2.26 | 3 | 0.014 | 6.62 | 13 |
| | 0.613 | 2.03 | 2 | 0.008 | 8.28 | 16 |
| | 0.807 | 1.58 | 1 | 0.001 | 6.77 | 16 |
| Protein content | 0.270 | 7.04 | 9 | 0.002 | 20.76 | 35 |
| | 0.400 | 8.91 | 10 | 0.073 | 20.14 | 27 |
| | 0.488 | 8.56 | 9 | 0.017 | 23.35 | 34 |
| | 0.921 | 6.86 | 4 | 0.011 | 24.24 | 36 |
| | 0.815 | 7.93 | 6 | 0.025 | 18.98 | 28 |
| | 0.717 | 7.04 | 6 | 0.004 | 27.36 | 42 |
| | 0.690 | 7.93 | 7 | 0.069 | 28.79 | 37 |
| | 0.268 | 7.93 | 10 | 0.025 | 23.97 | 34 |
| | 0.307 | 7.31 | 9 | 0.013 | 20.41 | 31 |
| | 0.688 | 9.00 | 8 | 0.050 | 18.54 | 26 |
The values of “Expected” and “Observed” correspond to the number of expected and observed important SNPs located in the corresponding QTL regions.
significance level = 0.05.
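How an expected count and an over-representation p-value relate to each other can be sketched as follows. The source does not name the test statistic, so the hypergeometric upper tail used here is an assumption, a common choice for this kind of analysis. The function name `overrepresentation`, the number of SNPs in QTL regions, and the subset size in the example are hypothetical. Only the total of 40,317 SNPs comes from the abstract.

```python
from math import comb


def overrepresentation(n_total, n_in_qtl, n_selected, n_observed):
    """Expected QTL hits and upper-tail hypergeometric p-value P(X >= n_observed).

    Assumption: the selected SNPs are treated as a random draw without
    replacement from all SNPs under the null hypothesis.
    """
    expected = n_selected * n_in_qtl / n_total
    denom = comb(n_total, n_selected)
    upper = min(n_in_qtl, n_selected)
    tail = sum(
        comb(n_in_qtl, k) * comb(n_total - n_in_qtl, n_selected - k)
        for k in range(n_observed, upper + 1)
    )
    return expected, tail / denom


# Hypothetical example: 2,000 of 40,317 SNPs lie in QTL regions,
# 150 SNPs were selected as important, 15 of them fall into QTL regions.
expected, p_value = overrepresentation(40317, 2000, 150, 15)
print(round(expected, 2), p_value)
```

An observed count well above the expected count gives a small p-value, which is the pattern the metabolite approach shows in most outer training sets.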
Figure 2. Boxplots of the observed precision of genetic value prediction over the 10 outer cross-validation runs for the classical approach (all), reduced classical approach (red), metabolite approach (met), QTL approach (QTL) and peakQTL approach (pQTL).
The following milk traits were investigated: fat content (A), protein content (B) and pH value (C). If two approaches differ significantly (significance level 0.05), this is marked with a black dashed line and the observed P-value is given. The gray line represents a possible upper bound for the accuracy of prediction, given as the square root of the estimated narrow-sense heritability based on a sire model.
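The upper bound drawn as the gray line can be written compactly. The notation is assumed here and not taken from the source: $r_{\max}$ denotes the bound on prediction accuracy and $h^2$ the estimated narrow-sense heritability. The numeric value is a hypothetical illustration.

```latex
r_{\max} = \sqrt{h^2}, \qquad \text{e.g. } h^2 = 0.36 \;\Rightarrow\; r_{\max} = 0.6
```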