Juan Manuel González-Camacho, José Crossa, Paulino Pérez-Rodríguez, Leonardo Ornella, Daniel Gianola.
Abstract
BACKGROUND: Multi-layer perceptron (MLP) and radial basis function neural networks (RBFNN) have been shown to be effective in genome-enabled prediction. Here, we evaluated and compared the classification performance of an MLP classifier versus that of a probabilistic neural network (PNN), to predict the probability of membership of one individual in a phenotypic class of interest, using genomic and phenotypic data as input variables. We used 16 maize and 17 wheat genomic and phenotypic datasets with different trait-environment combinations (sample sizes ranged from 290 to 300 individuals) with 1.4 k and 55 k SNP chips. Classifiers were tested using continuous traits that were categorized into three classes (upper, middle and lower) based on the empirical distribution of each trait, constructed on the basis of two percentiles (15-85 % and 30-70 %). We focused on the 15 and 30 % percentiles for the upper and lower classes for selecting the best individuals, as commonly done in genomic selection. Wheat datasets were also used with two classes. The criteria for assessing the predictive accuracy of the two classifiers were the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUCpr). Parameters of both classifiers were estimated by optimizing the AUC for a specific class of interest.
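The categorization of a continuous trait into upper, middle and lower classes described in the abstract can be illustrated with a minimal sketch. The function names, the linear-interpolation percentile rule, the handling of values that fall exactly on a cut-off, and the example trait values are all assumptions made for illustration; the abstract does not specify these details.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def three_classes(values, tail=15):
    """Label each value using the `tail` and `100 - tail` percentiles.

    Assumption: values strictly above the upper cut-off are 'upper',
    values strictly below the lower cut-off are 'lower', the rest 'middle'.
    """
    low_cut = percentile(values, tail)
    high_cut = percentile(values, 100 - tail)
    labels = []
    for v in values:
        if v > high_cut:
            labels.append("upper")
        elif v < low_cut:
            labels.append("lower")
        else:
            labels.append("middle")
    return labels


# Hypothetical trait values (1.0 .. 20.0), not taken from the paper.
trait = [float(i) for i in range(1, 21)]
labels = three_classes(trait, tail=15)
counts = {c: labels.count(c) for c in ("upper", "middle", "lower")}
print(counts)
```

Calling `three_classes(trait, tail=30)` would give the 30-70 % split in the same way; a two-class version would simply merge the middle class into one of the tails.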
Year: 2016 PMID: 26956885 PMCID: PMC4784384 DOI: 10.1186/s12864-016-2553-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Maize datasets – three classes
| Data set | Trait-environment combination | Number of SNP markers | Total number of individuals | Upper 15 % | Upper 30 % | Middle 40 % | Middle 70 % | Lower 15 % | Lower 30 % |
|---|---|---|---|---|---|---|---|---|---|
| GY-HI | Yield in high yielding environment | 46374 | 267 | 40 | 80 | 107 | 187 | 40 | 80 |
| GY-LO | Yield in low yielding environment | 46374 | 269 | 40 | 81 | 107 | 189 | 40 | 81 |
| GY-WW | Yield in well watered | 46374 | 242 | 36 | 73 | 96 | 170 | 36 | 73 |
| GY-SS | Yield in drought stressed | 46374 | 242 | 36 | 73 | 96 | 170 | 36 | 73 |
| ASI-WW | Anthesis-silking interval in well watered | 46374 | 258 | 39 | 79 | 102 | 180 | 39 | 77 |
| ASI-SS | Anthesis-silking interval in drought stressed | 46374 | 258 | 40 | 77 | 103 | 179 | 39 | 78 |
| MFL-WW | Male flowering time in well watered | 46374 | 258 | 40 | 139 | 103 | 178 | 40 | 78 |
| MFL-SS | Male flowering time in drought stressed | 46374 | 258 | 39 | 77 | 104 | 179 | 40 | 77 |
| FFL-WW | Female flowering time in well watered | 46374 | 258 | 39 | 77 | 104 | 179 | 40 | 77 |
| FFL-SS | Female flowering time in drought stressed | 46374 | 258 | 39 | 77 | 104 | 180 | 39 | 77 |
| GLS-1 | Gray leaf spot in environment 1 | 46374 | 272 | 42 | 87 | 68 | 170 | 60 | 117 |
| GLS-2 | Gray leaf spot in environment 2 | 46374 | 280 | 48 | 85 | 77 | 176 | 56 | 118 |
| GLS-3 | Gray leaf spot in environment 3 | 46374 | 278 | 47 | 85 | 107 | 168 | 63 | 86 |
| GLS-4 | Gray leaf spot in environment 4 | 46374 | 261 | 48 | 96 | 74 | 154 | 59 | 91 |
| GLS-5 | Gray leaf spot in environment 5 | 46374 | 279 | 48 | 97 | 84 | 188 | 43 | 98 |
| GLS-6 | Gray leaf spot in environment 6 | 46374 | 281 | 63 | 85 | 90 | 140 | 78 | 106 |
Trait–environment combination, number of markers, total number of individuals, number of individuals in the upper 15 and 30 % classes, in the middle 40 and 70 % classes, and in the lower 15 and 30 % classes from the empirical cumulative distribution function
Wheat datasets – three classes
| Data set | Agronomic management | Site in Mexico | Year | Number of SNP markers | Total number of individuals | Upper 15 % | Upper 30 % | Middle 40 % | Middle 70 % | Lower 15 % | Lower 30 % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GY-1 | Drought-bed | Cd. Obregon | 2009 | 1717 | 306 | 46 | 92 | 119 | 211 | 49 | 95 |
| GY-2 | Drought-bed | Cd. Obregon | 2010 | 1717 | 306 | 47 | 92 | 122 | 213 | 46 | 92 |
| GY-3 | Drought-flat | Cd. Obregon | 2010 | 1717 | 263 | 39 | 80 | 104 | 185 | 39 | 79 |
| GY-4 | Full irrigation-bed | Cd. Obregon | 2009 | 1717 | 304 | 46 | 92 | 120 | 212 | 46 | 92 |
| GY-5 | Full irrigation-bed | Cd. Obregon | 2010 | 1717 | 306 | 46 | 94 | 118 | 214 | 46 | 94 |
| GY-6 | Heat-bed | Cd. Obregon | 2010 | 1717 | 306 | 46 | 94 | 120 | 214 | 46 | 92 |
| GY-7 | Full irrigation-flat | Cd. Obregon | 2010 | 1717 | 263 | 39 | 79 | 105 | 185 | 39 | 79 |
| DTH-1 | Drought-bed | Cd. Obregon | 2009 | 1717 | 306 | 53 | 100 | 93 | 197 | 56 | 113 |
| DTH-2 | Drought-bed | Cd. Obregon | 2010 | 1717 | 306 | 50 | 93 | 117 | 198 | 58 | 96 |
| DTH-3 | Drought-flat | Cd. Obregon | 2010 | 1717 | 263 | 40 | 86 | 77 | 177 | 46 | 100 |
| DTH-4 | Full irrigation-bed | Cd. Obregon | 2009 | 1717 | 306 | 59 | 107 | 107 | 173 | 74 | 92 |
| DTH-5 | Full irrigation-bed | Cd. Obregon | 2010 | 1717 | 306 | 47 | 101 | 105 | 207 | 52 | 100 |
| DTH-6 | Toluca | Toluca | 2009 | 1717 | 306 | 122 | 122 | 75 | 93 | 91 | 109 |
| DTH-7 | El Batan | El Batan | 2009 | 1717 | 306 | 66 | 104 | 101 | 175 | 65 | 101 |
| DTH-8 | Small observation plot | Cd. Obregon | 2009 | 1717 | 301 | 58 | 101 | 100 | 182 | 61 | 100 |
| DTH-9 | Small observation plot | Cd. Obregon | 2010 | 1717 | 263 | 45 | 100 | 76 | 173 | 45 | 87 |
| DTH-10 | Agua Fria | Agua Fria | 2010 | 1717 | 261 | 49 | 81 | 93 | 125 | 87 | 87 |
Environment code of 12 combinations of sites in Mexico, agronomic management, and year for two wheat traits (grain yield, GY, and days to heading, DTH) from [11]. Number of markers, total number of individuals, number of individuals in the upper 15 and 30 % classes, in the middle 40 and 70 % classes, and in the lower 15 and 30 % classes from the empirical cumulative distribution
Wheat datasets – two classes
| Data set | Agronomic management | Site in Mexico | Year | Upper 15 % | Lower 85 % | Upper 30 % | Lower 70 % |
|---|---|---|---|---|---|---|---|
| GY-1 | Drought-bed | Cd. Obregon | 2009 | 46 | 260 | 92 | 214 |
| GY-2 | Drought-bed | Cd. Obregon | 2010 | 47 | 259 | 92 | 214 |
| GY-3 | Drought-flat | Cd. Obregon | 2010 | 39 | 224 | 80 | 183 |
| GY-4 | Full irrigation-bed | Cd. Obregon | 2009 | 46 | 258 | 92 | 212 |
| GY-5 | Full irrigation-bed | Cd. Obregon | 2010 | 46 | 260 | 94 | 212 |
| GY-6 | Heat-bed | Cd. Obregon | 2010 | 46 | 260 | 94 | 212 |
| GY-7 | Full irrigation-flat-borders | Cd. Obregon | 2010 | 39 | 224 | 79 | 184 |
| | | | | Lower 15 % | Upper 85 % | Lower 30 % | Upper 70 % |
| DTH-1 | Drought-bed | Cd. Obregon | 2009 | 53 | 253 | 100 | 206 |
| DTH-2 | Drought-bed | Cd. Obregon | 2010 | 50 | 256 | 93 | 213 |
| DTH-3 | Drought-flat | Cd. Obregon | 2010 | 40 | 223 | 86 | 177 |
| DTH-4 | Full irrigation-bed | Cd. Obregon | 2009 | 59 | 247 | 107 | 199 |
| DTH-5 | Full irrigation-bed | Cd. Obregon | 2010 | 47 | 259 | 101 | 205 |
| DTH-6 | Toluca | Toluca | 2009 | 122 | 184 | 122 | 184 |
| DTH-7 | El Batan | El Batan | 2009 | 66 | 240 | 104 | 202 |
| DTH-8 | Small observation plot | Cd. Obregon | 2009 | 58 | 243 | 101 | 200 |
| DTH-9 | Small observation plot | Cd. Obregon | 2010 | 45 | 218 | 100 | 163 |
| DTH-10 | Agua Fria | Agua Fria | 2010 | 49 | 212 | 81 | 160 |
Environment code of 12 combinations of sites in Mexico, agronomic management, and year for two wheat traits (grain yield, GY, and days to heading, DTH) from [11]. Number of markers, total number of individuals, number of individuals in the upper 15 and 30 % classes, and in the lower 85 and 70 % classes
Fig. 1 Architecture of classifier MLP with the input (markers) layer, hidden layer, and sum-output layer
Fig. 2 Architecture of classifier PNN with the input (markers) layer, pattern layer, and sum-output layer
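To make the three PNN layers named in the Fig. 2 caption more concrete, here is a minimal sketch of a textbook Gaussian-kernel PNN. It is not the authors' implementation: the function name `pnn_predict_proba`, the Gaussian kernel, the single smoothing parameter `sigma`, the equal weighting of classes, and the toy marker data are all assumptions for illustration.

```python
import math


def pnn_predict_proba(train_x, train_y, x, sigma=1.0):
    """Return {class label: membership probability} for one input vector x."""
    kernel_sums = {}
    class_sizes = {}
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: one Gaussian kernel per training individual.
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
        k = math.exp(-sq_dist / (2.0 * sigma ** 2))
        kernel_sums[yi] = kernel_sums.get(yi, 0.0) + k
        class_sizes[yi] = class_sizes.get(yi, 0) + 1
    # Summation layer: average kernel activation within each class.
    density = {c: kernel_sums[c] / class_sizes[c] for c in kernel_sums}
    total = sum(density.values())
    if total == 0.0:
        # All kernels underflowed to zero; fall back to uniform probabilities.
        return {c: 1.0 / len(density) for c in density}
    # Output: normalize so that the class probabilities sum to one.
    return {c: d / total for c, d in density.items()}


# Hypothetical toy data: two markers per individual, two classes.
train_x = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 1.0]]
train_y = ["lower", "lower", "upper", "upper"]
probs = pnn_predict_proba(train_x, train_y, [0.0, 0.5], sigma=1.0)
print(probs)
```

In this sketch `sigma` is the only tunable parameter; a small value makes the classifier behave like a nearest-neighbour rule, while a large value smooths the class densities.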
Description of a confusion matrix for binary classes with observed values and classifier predicted values
| | | Classifier predicted value | | Sum |
|---|---|---|---|---|
| | | 1 | 0 | |
| Observed | 1 | tp | fn | tp + fn |
| | 0 | fp | tn | fp + tn |
| Sum | | tp + fp | fn + tn | n |
tp true positive, fp false positive, fn false negative, tn true negative, n total number of individuals
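As an illustration of how the four counts in the confusion matrix are obtained and used, the sketch below tallies tp, fp, fn and tn from binary observed and predicted labels and derives recall, precision and false positive rate from them. The function names and the example labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(observed, predicted):
    """Count tp, fp, fn, tn for binary labels coded as 1 (class of interest) or 0."""
    tp = fp = fn = tn = 0
    for o, p in zip(observed, predicted):
        if o == 1 and p == 1:
            tp += 1
        elif o == 0 and p == 1:
            fp += 1
        elif o == 1 and p == 0:
            fn += 1
        elif o == 0 and p == 0:
            tn += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Derive recall, precision and false positive rate from the counts."""
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical labels for eight individuals.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(observed, predicted)
metrics = rates(*counts)
print(counts, metrics)
```

Recall and false positive rate are the two axes of the ROC curve, and precision and recall are the two axes of the precision-recall curve.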
Fig. 3 Histograms of the AUC criterion and their standard deviation (error bars) for the maize datasets. (a) grain yield (GY) under optimal conditions (HI and WW) and stress conditions (LO and SS) of classifiers MLP and PNN in the upper 15 and 30 % classes; (b) anthesis-silking interval (ASI) under optimal conditions (WW) and drought stress conditions (SS) of MLP and PNN of the middle 40 and 70 % classes; (c) female flowering time (FFL), male flowering time (MFL) under optimal well-watered (WW) conditions and drought stress conditions (SS) of MLP and PNN of the lower 15 and 30 % classes; (d) gray leaf spot resistance (GLS) in 6 environments (1–6) of MLP and PNN of the lower 15 and 30 % classes
Fig. 4 Histograms of the AUC criterion and their standard deviation (error bars) for the wheat datasets. (a) grain yield (GY) in seven environments (1–7) of classifiers MLP and PNN of the upper 15 and 30 % classes; (b) days to heading (DTH) in ten environments (1–10) of MLP and PNN in the lower 15 and 30 % classes
Maize datasets
| Upper class | MLP15% | (SD) | PNN15% | (SD) | MLP30% | (SD) | PNN30% | (SD) |
|---|---|---|---|---|---|---|---|---|
| GY-HI | 0.235 | (0.126) | | (0.118) | 0.429 | (0.108) | | (0.102) |
| GY-LO | 0.168 | (0.065) | | (0.076) | 0.358 | (0.107) | | (0.107) |
| GY-SS | 0.199 | (0.093) | | (0.110) | 0.363 | (0.111) | | (0.119) |
| GY-WW | 0.239 | (0.131) | | (0.175) | 0.410 | (0.117) | | (0.111) |
| Middle class | MLP40% | (SD) | PNN40% | (SD) | MLP70% | (SD) | PNN70% | (SD) |
| ASI-SS | 0.465 | (0.096) | | (0.092) | 0.724 | (0.076) | | (0.074) |
| ASI-WW | 0.436 | (0.091) | | (0.088) | 0.706 | (0.072) | | (0.084) |
| Lower class | MLP15% | (SD) | PNN15% | (SD) | MLP30% | (SD) | PNN30% | (SD) |
| FFL-SS | 0.185 | (0.087) | | (0.137) | 0.383 | (0.106) | | (0.096) |
| MFL-SS | 0.205 | (0.101) | | (0.149) | 0.421 | (0.119) | | (0.112) |
| FFL-WW | 0.197 | (0.102) | | (0.161) | 0.413 | (0.120) | | (0.133) |
| MFL-WW | 0.199 | (0.094) | | (0.155) | 0.437 | (0.133) | | (0.139) |
| GLS-1 | 0.269 | (0.096) | | (0.135) | 0.476 | (0.096) | | (0.092) |
| GLS-2 | 0.320 | (0.140) | | (0.157) | 0.524 | (0.101) | | (0.093) |
| GLS-3 | 0.372 | (0.138) | | (0.149) | 0.496 | (0.128) | | (0.116) |
| GLS-4 | 0.350 | (0.135) | | (0.147) | 0.439 | (0.110) | | (0.111) |
| GLS-5 | 0.161 | (0.072) | | (0.107) | 0.429 | (0.098) | | (0.118) |
| GLS-6 | 0.320 | (0.091) | | (0.109) | 0.431 | (0.094) | | (0.098) |
Mean values of the area under the precision-recall curve AUCpr (standard deviation in parentheses) of 50 random partitions for 15 and 30 % upper classes for grain yield (GY) in four environments (HI, LO, SS, and WW), for 40 and 70 % middle class for anthesis-silking interval (ASI) in two environments (SS and WW), and for 15 and 30 % lower classes for four traits, female flowering (FFL) and male flowering (MFL) in two environments (SS and WW); for gray leaf spot resistance (GLS) in six environments (1–6) and for classifiers MLP and PNN. Numbers in bold are the highest AUCpr values between MLP and PNN for 15 and 30 %
Wheat datasets
| Data set | MLP15% | (SD) | PNN15% | (SD) | MLP30% | (SD) | PNN30% | (SD) |
|---|---|---|---|---|---|---|---|---|
| Upper class | | | | | | | | |
| GY-1 | 0.204 | (0.084) | | (0.140) | 0.406 | (0.113) | | (0.102) |
| GY-2 | 0.270 | (0.108) | | (0.111) | 0.485 | (0.113) | | (0.116) |
| GY-3 | 0.227 | (0.114) | | (0.108) | 0.366 | (0.100) | | (0.118) |
| GY-4 | 0.242 | (0.110) | | (0.118) | 0.409 | (0.107) | | (0.115) |
| GY-5 | 0.284 | (0.115) | | (0.142) | 0.505 | (0.116) | | (0.107) |
| GY-6 | 0.504 | (0.172) | | (0.157) | 0.637 | (0.115) | | (0.083) |
| GY-7 | 0.199 | (0.091) | | (0.117) | 0.423 | (0.114) | | (0.115) |
| Lower class | | | | | | | | |
| DTH-1 | 0.304 | (0.113) | | (0.124) | 0.522 | (0.107) | | (0.091) |
| DTH-2 | 0.297 | (0.117) | | (0.132) | 0.433 | (0.110) | | (0.104) |
| DTH-3 | 0.364 | (0.149) | | (0.151) | 0.547 | (0.115) | | (0.095) |
| DTH-4 | 0.254 | (0.077) | | (0.089) | 0.297 | (0.070) | | (0.097) |
| DTH-5 | 0.275 | (0.131) | | (0.164) | 0.440 | (0.104) | | (0.087) |
| DTH-6 | 0.380 | (0.091) | | (0.094) | 0.465 | (0.099) | | (0.112) |
| DTH-7 | 0.368 | (0.114) | | (0.113) | 0.521 | (0.124) | | (0.115) |
| DTH-8 | 0.264 | (0.097) | | (0.103) | 0.452 | (0.102) | | (0.095) |
| DTH-9 | 0.261 | (0.103) | | (0.112) | 0.416 | (0.099) | | (0.107) |
| DTH-10 | 0.447 | (0.109) | | (0.114) | 0.462 | (0.112) | | (0.124) |
Mean values of the area under the precision-recall curve AUCpr (standard deviation in parentheses) of 50 random partitions for the 15 and 30 % upper classes for grain yield (GY) in 7 environments (1–7) and 15 and 30 % lower classes for days to heading (DTH) in 10 environments (1–10) for classifiers MLP and PNN. Numbers in bold are the highest AUCpr values between MLP and PNN for 15 and 30 %
Wheat datasets
| Data set | PNN15% (two classes) | (SD) | PNN15% (three classes) | (SD) | PNN30% (two classes) | (SD) | PNN30% (three classes) | (SD) |
|---|---|---|---|---|---|---|---|---|
| Upper class | | | | | | | | |
| GY-1 | 0.658 | (0.140) | | (0.135) | 0.708 | (0.082) | | (0.085) |
| GY-2 | 0.691 | (0.091) | | (0.100) | 0.765 | (0.081) | | (0.076) |
| GY-3 | 0.694 | (0.123) | | (0.120) | | (0.115) | 0.663 | (0.115) |
| GY-4 | 0.674 | (0.120) | | (0.105) | 0.701 | (0.112) | | (0.107) |
| GY-5 | 0.710 | (0.123) | | (0.115) | 0.775 | (0.083) | | (0.089) |
| GY-6 | | (0.097) | 0.878 | (0.100) | 0.830 | (0.075) | | (0.070) |
| GY-7 | 0.649 | (0.160) | | (0.158) | 0.708 | (0.116) | | (0.106) |
| Lower class | | | | | | | | |
| DTH-1 | 0.724 | (0.112) | | (0.109) | | (0.074) | | (0.072) |
| DTH-2 | 0.773 | (0.094) | | (0.090) | | (0.100) | 0.751 | (0.092) |
| DTH-3 | 0.840 | (0.100) | | (0.101) | 0.802 | (0.074) | | (0.074) |
| DTH-4 | 0.584 | (0.098) | | (0.097) | 0.568 | (0.094) | | (0.102) |
| DTH-5 | 0.763 | (0.121) | | (0.128) | 0.754 | (0.075) | | (0.072) |
| DTH-6 | 0.708 | (0.086) | | (0.085) | | (0.097) | | (0.098) |
| DTH-7 | 0.765 | (0.096) | | (0.095) | 0.775 | (0.097) | | (0.088) |
| DTH-8 | 0.750 | (0.080) | | (0.082) | | (0.065) | 0.799 | (0.067) |
| DTH-9 | 0.764 | (0.105) | | (0.090) | 0.736 | (0.088) | | (0.087) |
| DTH-10 | 0.763 | (0.763) | | (0.098) | 0.774 | (0.094) | | (0.102) |
Mean values of the area under the ROC curve AUC (standard deviation in parentheses) of 50 random partitions for the 15 and 30 % upper class for grain yield (GY) in 7 environments (1–7) and for 15 and 30 % lower class for days to heading (DTH) for classifier PNN with two and three classes. Numbers in bold are the highest AUC values
Wheat datasets
| Data set | PNN15% (two classes) | (SD) | PNN15% (three classes) | (SD) | PNN30% (two classes) | (SD) | PNN30% (three classes) | (SD) |
|---|---|---|---|---|---|---|---|---|
| Upper class | | | | | | | | |
| GY-1 | 0.270 | (0.134) | | (0.140) | | (0.118) | 0.475 | (0.102) |
| GY-2 | | (0.118) | 0.307 | (0.111) | 0.538 | (0.117) | | (0.116) |
| GY-3 | | (0.138) | 0.268 | (0.108) | 0.452 | (0.117) | | (0.118) |
| GY-4 | 0.319 | (0.121) | | (0.118) | 0.482 | (0.112) | | (0.115) |
| GY-5 | | (0.161) | 0.326 | (0.142) | 0.545 | (0.104) | | (0.107) |
| GY-6 | | (0.159) | 0.561 | (0.157) | 0.668 | (0.087) | | (0.083) |
| GY-7 | 0.263 | (0.124) | | (0.117) | 0.503 | (0.117) | | (0.115) |
| Lower class | | | | | | | | |
| DTH-1 | 0.370 | (0.114) | | (0.124) | 0.629 | (0.091) | | (0.091) |
| DTH-2 | 0.417 | (0.134) | | (0.132) | | (0.116) | 0.521 | (0.104) |
| DTH-3 | 0.506 | (0.158) | | (0.151) | 0.641 | (0.093) | | (0.095) |
| DTH-4 | 0.292 | (0.090) | | (0.089) | 0.350 | (0.094) | | (0.097) |
| DTH-5 | 0.355 | (0.158) | | (0.164) | | (0.091) | 0.546 | (0.087) |
| DTH-6 | 0.444 | (0.087) | | (0.094) | | (0.119) | 0.520 | (0.112) |
| DTH-7 | 0.462 | (0.116) | | (0.113) | 0.580 | (0.122) | | (0.115) |
| DTH-8 | | (0.104) | 0.382 | (0.103) | | (0.089) | 0.599 | (0.095) |
| DTH-9 | | (0.138) | 0.367 | (0.112) | 0.532 | (0.105) | | (0.107) |
| DTH-10 | | (0.112) | 0.553 | (0.114) | 0.575 | (0.117) | | (0.124) |
Mean values of the area under the precision-recall curve AUCpr (standard deviation in parentheses) of 50 random partitions for the 15 and 30 % upper classes for grain yield (GY) in 7 environments (1–7) and for 15 and 30 % lower classes for days to heading (DTH) for classifier PNN with two and three classes. Numbers in bold are the highest AUCpr values
Fig. 5 The upper curve is the ROC curve (AUC) with recall vs false positive rate. The lower curve is the precision-recall curve AUCpr with precision vs recall for the (a) upper 15 % class of grain yield under well-watered conditions (GY-WW) of classifiers MLP (green) and PNN (blue); (b) upper 30 % class of trait grain yield under well-watered conditions (GY-WW) of MLP (green) and PNN (blue); (c) middle 40 % class of trait anthesis-silking interval under well-watered conditions (ASI-WW) of MLP (green) and PNN (blue); (d) middle 70 % class of trait anthesis-silking interval under well-watered conditions (ASI-WW) of MLP (green) and PNN (blue); (e) lower 15 % class of trait female flowering under well-watered conditions (FFL-WW) of MLP (green) and PNN (blue); (f) lower 30 % class of trait female flowering under well-watered conditions (FFL-WW) of MLP (green) and PNN (blue)
Fig. 6 The upper curve is the ROC curve (AUC) with recall (sensitivity) vs false positive rate. The lower curve is the precision-recall curve AUCpr with precision vs recall for the (a) upper 15 % class of grain yield in environment 6 (GY-6) of classifiers MLP (green) and PNN (blue); (b) upper 30 % class of grain yield in environment 6 (GY-6) of MLP (green) and PNN (blue); (c) lower 15 % class of days to heading in environment 3 (DTH-3) of MLP (green) and PNN (blue); (d) lower 30 % class of days to heading in environment 3 (DTH-3) of MLP (green) and PNN (blue)
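The two curves described in the captions of Figs. 5 and 6 can be traced, and their areas approximated, with a short threshold sweep. The sketch below is only an illustration: the function names, the trapezoidal integration, the convention of starting the precision-recall curve at recall 0 with the precision of the first point, and the example scores are assumptions, and the paper may have computed AUC and AUCpr differently.

```python
def curves(labels, scores):
    """Sweep the decision threshold from high to low scores.

    Returns ROC points (false positive rate, recall) and
    precision-recall points (recall, precision).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    roc = [(0.0, 0.0)]
    pr = []
    tp = fp = 0
    i = 0
    while i < len(order):
        # Consume all individuals tied at this score before emitting a point.
        s = scores[order[i]]
        while i < len(order) and scores[order[i]] == s:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        roc.append((fp / neg, tp / pos))
        pr.append((tp / pos, tp / (tp + fp)))
    # Assumed convention: extend the PR curve horizontally back to recall 0.
    pr.insert(0, (0.0, pr[0][1]))
    return roc, pr


def trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical class labels (1 = class of interest) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc_points, pr_points = curves(labels, scores)
auc = trapezoid(roc_points)
aucpr = trapezoid(pr_points)
print(round(auc, 4), round(aucpr, 4))
```

Linear interpolation between precision-recall points is known to be somewhat optimistic, so values from this sketch are best read as an approximation of AUCpr rather than an exact figure.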