Mathieu Blondel, Akio Onogi, Hiroyoshi Iwata, Naonori Ueda.
Abstract
BACKGROUND: Genomic selection (GS) is a recent selective breeding method that uses predictive models based on whole-genome molecular markers. Until now, studies have formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess the predictive accuracy of a model, the Pearson correlation between observed and predicted trait values has been used.

CONTRIBUTIONS: In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. The proposed framework allows us to employ machine learning methods for ranking that had not previously been considered in the GS literature. To assess the ranking accuracy of a model, we introduce a measure originating from the information retrieval literature, normalized discounted cumulative gain (NDCG). NDCG rewards more strongly those models that assign a high rank to individuals with high breeding value. NDCG therefore reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value.
Year: 2015 PMID: 26068103 PMCID: PMC4466774 DOI: 10.1371/journal.pone.0128570
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
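The NDCG measure described in the abstract can be illustrated with a short sketch. It assumes a linear gain (the observed trait value itself) and a logarithmic discount 1/log2(rank + 1), which is the common information-retrieval form; the exact gain and discount used in the paper are not stated in this record. The function names `dcg_at_k`, `ndcg_at_k`, and `mean_ndcg` are illustrative, and `mean_ndcg` treats "Mean NDCG@10" as the average of NDCG@k for k = 1 to 10, which is an assumption about how that column is defined.

```python
import math


def dcg_at_k(y_true, y_score, k):
    """Discounted cumulative gain of the k individuals with the highest predicted score.

    Assumption: linear gain (the observed value) and a 1/log2(rank + 1) discount.
    """
    # Indices sorted by predicted score, best first; keep only the top k.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)[:k]
    # enumerate() starts at 0, so position p corresponds to rank p + 1.
    return sum(y_true[i] / math.log2(pos + 2) for pos, i in enumerate(order))


def ndcg_at_k(y_true, y_score, k):
    """DCG@k divided by the DCG@k of the ideal ordering (sorting by observed value)."""
    ideal = dcg_at_k(y_true, y_true, k)
    if ideal <= 0:
        # The normalization is only meaningful for a positive ideal DCG.
        raise ValueError("ideal DCG must be positive; shift or rescale the trait values")
    return dcg_at_k(y_true, y_score, k) / ideal


def mean_ndcg(y_true, y_score, k_max=10):
    """Average of NDCG@k for k = 1..k_max (assumed definition of 'Mean NDCG@10')."""
    return sum(ndcg_at_k(y_true, y_score, k) for k in range(1, k_max + 1)) / k_max


# Hypothetical observed trait values and two sets of predictions.
observed = [3.0, 2.0, 1.0, 0.0]
good_scores = [0.9, 0.7, 0.2, 0.1]   # same ordering as the observed values
bad_scores = [0.1, 0.2, 0.7, 0.9]    # reversed ordering

print(ndcg_at_k(observed, good_scores, 1))
print(ndcg_at_k(observed, bad_scores, 2))
print(mean_ndcg(observed, good_scores))
```

Because the discount shrinks with rank, a mistake at the top of the predicted ranking costs more than the same mistake further down, which is the property the abstract highlights.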
General method ranking, obtained by sorting methods according to their average ranking across 6 datasets.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| Ordinal McRank | 2 | 2 | 3 | 1 | 1 | 1 |
| RF | 3 | 3 | 1 | 1 | 3 | 2 |
| RKHS | 1 | 1 | 4 | 4 | 1 | 3 |
| RankSVM | 4 | 3 | 2 | 3 | 4 | 4 |
| GBRT | 6 | 7 | 5 | 6 | 5 | 5 |
| LambdaMART | - | - | 8 | 10 | 8 | 6 |
| Ridge | 12 | 12 | 7 | 7 | 6 | 7 |
| BL | 7 | 6 | 5 | 5 | 8 | 8 |
| MIX | 11 | 11 | 10 | 11 | 10 | 9 |
| SSVS | 8 | 9 | 11 | 8 | 12 | 10 |
| BayesC | 8 | 8 | 12 | 9 | 11 | 11 |
| EBL | 10 | 10 | 9 | 12 | 7 | 12 |
| wBSR | 4 | 5 | 13 | 13 | 13 | 13 |
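The aggregation behind the general ranking (rank the methods within each dataset, then average the ranks across datasets) can be sketched as follows. The scores are the Mean NDCG@10 values reported in the per-dataset tables of this record, restricted to four methods for brevity; the helper name `average_ranks` and the tie handling (tied methods share the average rank) are assumptions. Because ranks here are computed among four methods rather than all thirteen, the resulting order need not match the table above.

```python
def average_ranks(scores_by_dataset):
    """Mean rank of each method across datasets (rank 1 = highest score).

    scores_by_dataset maps dataset name -> {method name: score}.
    Assumption: tied methods share the average of the ranks they span.
    """
    ranks = {}
    for scores in scores_by_dataset.values():
        values = list(scores.values())
        for method, s in scores.items():
            better = sum(1 for v in values if v > s)
            tied = sum(1 for v in values if v == s) - 1  # exclude the method itself
            ranks.setdefault(method, []).append(1 + better + tied / 2)
    return {m: sum(r) / len(r) for m, r in ranks.items()}


# Mean NDCG@10 values copied from the per-dataset tables (four methods only).
mean_ndcg10 = {
    "Arabidopsis": {"RKHS": 0.883, "RF": 0.877, "Ordinal McRank": 0.873, "RankSVM": 0.869},
    "Wheat (Pérez-Rodríguez)": {"RKHS": 0.981, "RF": 0.979, "Ordinal McRank": 0.979, "RankSVM": 0.980},
    "Barley": {"RKHS": 0.795, "RF": 0.802, "Ordinal McRank": 0.808, "RankSVM": 0.830},
    "Maize": {"RKHS": 0.737, "RF": 0.757, "Ordinal McRank": 0.773, "RankSVM": 0.765},
    "Rice": {"RKHS": 0.936, "RF": 0.941, "Ordinal McRank": 0.940, "RankSVM": 0.937},
    "Wheat (CIMMYT)": {"RKHS": 0.748, "RF": 0.736, "Ordinal McRank": 0.740, "RankSVM": 0.733},
}

avg = average_ranks(mean_ndcg10)
# Sort methods from best (lowest average rank) to worst.
for method in sorted(avg, key=avg.get):
    print(f"{method}: {avg[method]:.3f}")
```

Averaging ranks rather than raw scores keeps datasets with uniformly high NDCG values, such as the Wheat (Pérez-Rodríguez) dataset, from dominating the comparison.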
Cross-validation results on the Arabidopsis thaliana dataset, averaged across 3 traits.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| RKHS | 0.651 | 0.481 | 0.836 | 0.884 | 0.907 | 0.883 |
| SSVS | 0.628 | 0.470 | 0.855 | 0.885 | 0.903 | 0.881 |
| MIX | 0.630 | 0.468 | 0.844 | 0.872 | 0.899 | 0.878 |
| RF | 0.628 | 0.450 | 0.841 | 0.879 | 0.899 | 0.877 |
| LambdaMART | - | - | 0.814 | 0.870 | 0.904 | 0.876 |
| BL | 0.627 | 0.468 | 0.861 | 0.876 | 0.894 | 0.874 |
| Ordinal McRank | 0.636 | 0.455 | 0.833 | 0.879 | 0.899 | 0.873 |
| BayesC | 0.617 | 0.462 | 0.840 | 0.871 | 0.888 | 0.872 |
| RankSVM | 0.594 | 0.437 | 0.825 | 0.877 | 0.892 | 0.869 |
| GBRT | 0.619 | 0.442 | 0.802 | 0.863 | 0.896 | 0.866 |
| EBL | 0.625 | 0.468 | 0.821 | 0.872 | 0.897 | 0.863 |
| wBSR | 0.630 | 0.468 | 0.819 | 0.878 | 0.873 | 0.857 |
| Ridge | 0.464 | 0.319 | 0.839 | 0.846 | 0.866 | 0.848 |
Results on the Wheat (Pérez-Rodríguez) dataset, averaged across 2 traits.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| RKHS | 0.662 | 0.448 | 0.981 | 0.979 | 0.982 | 0.981 |
| RankSVM | 0.649 | 0.448 | 0.969 | 0.981 | 0.982 | 0.980 |
| RF | 0.658 | 0.438 | 0.973 | 0.980 | 0.981 | 0.979 |
| Ordinal McRank | 0.665 | 0.452 | 0.975 | 0.979 | 0.982 | 0.979 |
| Ridge | 0.602 | 0.398 | 0.982 | 0.979 | 0.980 | 0.979 |
| GBRT | 0.649 | 0.434 | 0.976 | 0.977 | 0.980 | 0.976 |
| LambdaMART | - | - | 0.973 | 0.976 | 0.975 | 0.975 |
| BL | 0.608 | 0.398 | 0.972 | 0.977 | 0.975 | 0.974 |
| EBL | 0.596 | 0.387 | 0.959 | 0.969 | 0.975 | 0.969 |
| BayesC | 0.586 | 0.374 | 0.953 | 0.969 | 0.972 | 0.967 |
| MIX | 0.568 | 0.365 | 0.944 | 0.963 | 0.974 | 0.964 |
| SSVS | 0.570 | 0.373 | 0.936 | 0.964 | 0.971 | 0.961 |
| wBSR | 0.578 | 0.381 | 0.915 | 0.951 | 0.965 | 0.959 |
Fig 1. Comparison of RKHS regression and RankSVM when using the RBF kernel with parameter and when varying the regularization parameter , where N = n (RKHS regression) or N = ∣P∣ (RankSVM).
The scores indicated are the test Mean NDCG@10 averaged over 10 CV iterations and across all traits.
Fig 2. Effect of the learning rate (LR) parameter on GBRT when varying the number of trees.
The scores indicated are the test Mean NDCG@10 averaged over 10 CV iterations and across all traits.
Fig 3. Effect of the number of bins on multiclass and ordinal McRank.
The straight line indicates the results of RF for comparison. The scores indicated are the test Mean NDCG@10 averaged over 10 CV iterations and across all traits.
Spearman’s rank correlation coefficient of evaluation measures averaged across 6 datasets.
| Measure | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| Correlation | - | 0.899 | 0.604 | 0.774 | 0.775 | 0.744 |
| Kendall’s τ | 0.899 | - | 0.564 | 0.665 | 0.674 | 0.664 |
| NDCG@1 | 0.604 | 0.564 | - | 0.811 | 0.827 | 0.901 |
| NDCG@5 | 0.774 | 0.665 | 0.811 | - | 0.923 | 0.920 |
| NDCG@10 | 0.775 | 0.674 | 0.827 | 0.923 | - | 0.962 |
| Mean NDCG@10 | 0.744 | 0.664 | 0.901 | 0.920 | 0.962 | - |
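The table above reports how strongly the evaluation measures agree with one another. One way to compute such an agreement score, sketched here under assumptions, is Spearman's rank correlation between the scores that two measures assign to the same set of methods. The example uses the Correlation and Mean NDCG@10 columns of the Barley table for five methods only; the paper's exact procedure (which methods are included and how the per-dataset coefficients are averaged) is not given in this record, and the helper names are illustrative.

```python
import math


def average_rank(values):
    """Ranks starting at 1 for the smallest value; ties share the average rank."""
    return [
        1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
        for x in values
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_rank(x), average_rank(y))


# Barley table, methods in the order: RankSVM, Ordinal McRank, RF, RKHS, GBRT.
correlation = [0.581, 0.566, 0.568, 0.604, 0.554]
mean_ndcg10 = [0.830, 0.808, 0.802, 0.795, 0.775]

rho = spearman(correlation, mean_ndcg10)
print(f"Spearman's rho: {rho:.3f}")
```

A coefficient well below 1 on such a subset indicates that the method preferred by Pearson correlation is not necessarily the one preferred by NDCG, which is the point the table makes across all six datasets.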
Cross-validation results on the Barley dataset.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| RankSVM | 0.581 | 0.436 | 0.816 | 0.832 | 0.850 | 0.830 |
| Ordinal McRank | 0.566 | 0.432 | 0.783 | 0.803 | 0.829 | 0.808 |
| LambdaMART | - | - | 0.729 | 0.824 | 0.804 | 0.805 |
| RF | 0.568 | 0.425 | 0.764 | 0.809 | 0.833 | 0.802 |
| RKHS | 0.604 | 0.447 | 0.766 | 0.799 | 0.834 | 0.795 |
| GBRT | 0.554 | 0.409 | 0.722 | 0.768 | 0.820 | 0.775 |
| SSVS | 0.585 | 0.428 | 0.718 | 0.771 | 0.809 | 0.774 |
| Ridge | 0.572 | 0.421 | 0.700 | 0.756 | 0.820 | 0.763 |
| BL | 0.581 | 0.432 | 0.790 | 0.782 | 0.813 | 0.762 |
| MIX | 0.582 | 0.434 | 0.745 | 0.765 | 0.805 | 0.759 |
| BayesC | 0.593 | 0.438 | 0.682 | 0.765 | 0.814 | 0.756 |
| EBL | 0.578 | 0.419 | 0.764 | 0.744 | 0.808 | 0.746 |
| wBSR | 0.592 | 0.435 | 0.492 | 0.758 | 0.768 | 0.733 |
Cross-validation results on the Maize dataset.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| Ordinal McRank | 0.427 | 0.298 | 0.762 | 0.783 | 0.795 | 0.773 |
| GBRT | 0.419 | 0.283 | 0.793 | 0.721 | 0.768 | 0.768 |
| RankSVM | 0.445 | 0.317 | 0.780 | 0.771 | 0.794 | 0.765 |
| RF | 0.444 | 0.309 | 0.726 | 0.763 | 0.776 | 0.757 |
| LambdaMART | - | - | 0.696 | 0.697 | 0.740 | 0.741 |
| Ridge | 0.403 | 0.255 | 0.716 | 0.740 | 0.743 | 0.741 |
| MIX | 0.361 | 0.229 | 0.603 | 0.702 | 0.733 | 0.739 |
| RKHS | 0.431 | 0.278 | 0.675 | 0.736 | 0.761 | 0.737 |
| BL | 0.383 | 0.241 | 0.626 | 0.755 | 0.725 | 0.725 |
| BayesC | 0.393 | 0.242 | 0.582 | 0.773 | 0.744 | 0.708 |
| wBSR | 0.404 | 0.258 | 0.360 | 0.716 | 0.719 | 0.705 |
| SSVS | 0.398 | 0.240 | 0.592 | 0.734 | 0.730 | 0.687 |
| EBL | 0.390 | 0.236 | 0.654 | 0.725 | 0.739 | 0.644 |
Cross-validation results on the Rice dataset, averaged across 14 traits.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| RF | 0.719 | 0.535 | 0.930 | 0.941 | 0.946 | 0.941 |
| Ordinal McRank | 0.717 | 0.533 | 0.921 | 0.940 | 0.946 | 0.940 |
| RankSVM | 0.702 | 0.525 | 0.921 | 0.937 | 0.944 | 0.937 |
| GBRT | 0.713 | 0.527 | 0.917 | 0.941 | 0.945 | 0.937 |
| RKHS | 0.720 | 0.536 | 0.916 | 0.937 | 0.945 | 0.936 |
| Ridge | 0.694 | 0.511 | 0.909 | 0.932 | 0.940 | 0.930 |
| LambdaMART | - | - | 0.920 | 0.931 | 0.934 | 0.929 |
| BL | 0.714 | 0.529 | 0.896 | 0.924 | 0.938 | 0.924 |
| EBL | 0.708 | 0.526 | 0.889 | 0.921 | 0.935 | 0.922 |
| MIX | 0.676 | 0.502 | 0.886 | 0.923 | 0.933 | 0.920 |
| SSVS | 0.686 | 0.506 | 0.889 | 0.918 | 0.932 | 0.916 |
| BayesC | 0.688 | 0.505 | 0.899 | 0.918 | 0.932 | 0.914 |
| wBSR | 0.693 | 0.508 | 0.830 | 0.909 | 0.925 | 0.904 |
Cross-validation results on the Wheat (CIMMYT) dataset, averaged across 4 traits.
| Method | Correlation | Kendall’s τ | NDCG@1 | NDCG@5 | NDCG@10 | Mean NDCG@10 |
|---|---|---|---|---|---|---|
| RKHS | 0.503 | 0.359 | 0.698 | 0.752 | 0.780 | 0.748 |
| Ordinal McRank | 0.486 | 0.351 | 0.688 | 0.745 | 0.764 | 0.740 |
| RF | 0.482 | 0.346 | 0.697 | 0.739 | 0.760 | 0.736 |
| RankSVM | 0.463 | 0.329 | 0.713 | 0.718 | 0.760 | 0.733 |
| BL | 0.454 | 0.314 | 0.671 | 0.718 | 0.751 | 0.733 |
| GBRT | 0.472 | 0.331 | 0.690 | 0.725 | 0.757 | 0.732 |
| Ridge | 0.451 | 0.310 | 0.651 | 0.724 | 0.755 | 0.719 |
| BayesC | 0.464 | 0.320 | 0.650 | 0.697 | 0.732 | 0.711 |
| SSVS | 0.463 | 0.319 | 0.654 | 0.707 | 0.728 | 0.711 |
| MIX | 0.458 | 0.318 | 0.658 | 0.709 | 0.735 | 0.706 |
| EBL | 0.448 | 0.312 | 0.675 | 0.690 | 0.735 | 0.699 |
| LambdaMART | - | - | 0.636 | 0.695 | 0.715 | 0.697 |
| wBSR | 0.465 | 0.322 | 0.524 | 0.627 | 0.677 | 0.666 |