Bahareh Torkzaban, Amir Hossein Kayvanjoo, Arman Ardalan, Soraya Mousavi, Roberto Mariotti, Luciana Baldoni, Esmaeil Ebrahimie, Mansour Ebrahimi, Mehdi Hosseini-Mazinani.
Abstract
Finding efficient analytical techniques is increasingly becoming a bottleneck for the effective use of large biological datasets. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been used less frequently with empirical population genetics data. In this study, we developed a new combined approach to data analysis that applies machine learning algorithms to microsatellite marker data from our previous studies of olive populations. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, '4-targeted' and '16-targeted'. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions according to their regions of origin. A notable highlight of this study was the discovery of the best combination of markers for differentiating populations via machine learning models, which can be exploited to distinguish among other biological populations.
Year: 2015 PMID: 26599001 PMCID: PMC4658005 DOI: 10.1371/journal.pone.0143465
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Map of Iran with the main provinces where olive accessions had been sampled.
Blue) local ecotypes; green) cuspidata specimens.
Fig 2. Examples of indigenous Iranian olive.
A) Torang cuspidata specimen, Kerman; B) Mavi local ecotype, Khuzestan; C) Gardineko local ecotype, Ilam; D) Pirzeytun local ecotype, Fars; adapted from Hosseini-Mazinani et al. [36].
Fig 3. Decision Tree generated model showing separation of olive populations in the 4-targeted (4-t) experiment by different alleles.
In this model, DCA14-149 was selected as the root.
Fig 4. Decision Tree generated model showing separation of olive populations in the 16-targeted (16-t) experiment by different alleles.
In this model, DCA-178 was selected as the main classifying attribute.
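The captions above say that one allele was selected as the root of each tree, but they do not state which splitting criterion was used. As a minimal sketch, assuming an entropy-based criterion (information gain), the following shows how a root attribute could be chosen among binary allele-presence features. The feature names `allele_A` and `allele_B`, the labels, and the data are hypothetical toy values, not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting `labels` on the values of `feature`."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = allele present in the accession, 0 = absent.
labels = ["local", "local", "local", "cuspidata", "cuspidata", "cuspidata"]
alleles = {
    "allele_A": [1, 1, 1, 0, 0, 0],  # separates the two classes perfectly
    "allele_B": [1, 0, 1, 1, 0, 1],  # identically distributed in both classes
}

gains = {name: information_gain(column, labels) for name, column in alleles.items()}
root = max(gains, key=gains.get)  # attribute with the largest gain becomes the root
```

In this toy case the perfectly separating allele gets the full label entropy as its gain and would be placed at the root, while the uninformative allele gets a gain of zero.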
Microsatellite allele lengths, loci and the total alleles.
| Locus | Allele lengths (bp) | Total alleles |
|---|---|---|
| DCA3 | 227- | 32 |
| DCA5 | | 16 |
| DCA9 | 162-164-166-169-170-172-174-176-178-180-182-184-186-188-190-192-194-196-198-200-202-204-206-208-210-214-216-218-220 | 29 |
| DCA14 | | 18 |
| DCA16 | 122-124-126-128-130- | 39 |
| DCA18 | | 21 |
| EMO-90 | 182-184-186-188-190-192-194-196-198-200-202-213 | 12 |
| GAPU71B | 118-121-122-124-126-127-128-130-132-134-136-138-140-142-144-146- | 18 |
| GAPU101 | 182- | 18 |
| GAPU103 | 134-136-139-141-144-146-148-150-154-157-159- | 28 |
| UDO-043 | | 27 |
| Total | | 258 |
Alleles private to the Iranian accessions are highlighted in bold.
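Attribute names such as DCA14-149 in the figure captions suggest that each locus-allele pair was treated as a separate feature. As a sketch under that assumption, a diploid genotype could be turned into a presence/absence vector as follows. The function `encode_genotype` and the example accession are hypothetical; only the allele lengths used in the feature names come from the table above.

```python
def encode_genotype(genotype, features):
    """Return a 0/1 vector: 1 if the accession carries that locus-allele pair."""
    carried = {
        f"{locus}-{allele}"
        for locus, pair in genotype.items()
        for allele in pair
    }
    return [1 if feature in carried else 0 for feature in features]


# Feature names built from allele lengths listed for DCA9 and EMO-90.
features = ["DCA9-162", "DCA9-164", "DCA9-166", "EMO-90-182", "EMO-90-184"]

# Hypothetical diploid genotype (two alleles per locus), not taken from the study.
accession = {"DCA9": (162, 166), "EMO-90": (184, 184)}

vector = encode_genotype(accession, features)
```

A homozygous locus, such as EMO-90 in this example, sets a single feature to 1, so this encoding records which alleles are present but not their dosage.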
Prediction rate (accuracy) details of each decision tree with 10-fold cross validation for each of the populations in the 4-targeted (4-t) experiment, i.e. reference cultivars, Mediterranean varieties, cuspidata specimens, and local ecotypes.
| Predicted \ True | Local ecotypes | cuspidata specimens | Mediterranean varieties | Reference cultivars |
|---|---|---|---|---|
| Local ecotypes | 120 (out of 132) | 3 | 10 | 12 |
| cuspidata specimens | 3 | 32 (out of 37) | 1 | 1 |
| Mediterranean varieties | 5 | 2 | 66 (out of 77) | 1 |
| Reference cultivars | 4 | 0 | 0 | 7 (out of 21) |
Rows show the class predicted by the model for each record (olive accession); columns show the true class, so a record is correctly predicted when its row and column classes match.
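The counts in the 4-t decision tree table can be summarized as overall accuracy and per-class recall and precision. The sketch below copies the counts from that table and assumes the class order implied by the "out of" totals (local ecotypes, cuspidata specimens, Mediterranean varieties, reference cultivars); the variable names are illustrative.

```python
classes = [
    "local ecotypes",
    "cuspidata specimens",
    "Mediterranean varieties",
    "reference cultivars",
]

# Rows = predicted class, columns = true class (4-t decision tree table).
matrix = [
    [120, 3, 10, 12],
    [3, 32, 1, 1],
    [5, 2, 66, 1],
    [4, 0, 0, 7],
]

n = len(classes)
total = sum(sum(row) for row in matrix)          # all accessions
correct = sum(matrix[i][i] for i in range(n))    # matching predicted and true class
accuracy = correct / total

# Recall: correct predictions divided by the true class size (column sum).
recall = {
    classes[j]: matrix[j][j] / sum(matrix[i][j] for i in range(n))
    for j in range(n)
}

# Precision: correct predictions divided by the predicted class size (row sum).
precision = {classes[i]: matrix[i][i] / sum(matrix[i]) for i in range(n)}

for name in classes:
    print(f"{name}: recall={recall[name]:.3f}, precision={precision[name]:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

The column sums reproduce the class sizes given in the abstract (132, 37, 77 and 21), which supports reading the columns as the true classes.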
Prediction rate (accuracy) details of each decision tree with 10-fold cross validation for each of the types in the 16-targeted (16-t) experiment.
| Predicted \ True | Sistan & Baluchestan | Hormozgan | Kerman | Zanjan | Khuzestan | Khorasan e Jonubi | Kermanshah | Fars | Charmahal & Bakhtiari | Yazd | Bushehr | Lorestan | Qom | Kohgiluye & Boyerahmad | Esfahan | Golestan | Ilam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ilam | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | 3 |
| Golestan | 0 | 0 | 0 | 0 | 2 | 0 | 3 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 17 | 4 |
| Esfahan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 |
| Kohgiluye & Boyerahmad | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 1 | 1 | 2 | 0 | 32 | 0 | 1 | 1 |
| Qom | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 |
| Lorestan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| Bushehr | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Yazd | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Charmahal & Bakhtiari | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Fars | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kermanshah | 0 | 0 | 0 | 1 | 2 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Khorasan e Jonubi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Khuzestan | 0 | 0 | 0 | 2 | 4 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Zanjan | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kerman | 3 | 4 | 23 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 |
| Hormozgan | 0 | 6 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sistan & Baluchestan | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Rows show the class predicted by the model for each record (olive accession); columns show the true class, so a record is correctly predicted when its row and column classes match.
Prediction rate (accuracy) details of each Bayesian algorithm with 10-fold cross validation for each of the populations in the 4-targeted (4-t) experiment, i.e. reference cultivars, Mediterranean varieties, cuspidata specimens, and local ecotypes.
| Predicted \ True | Local ecotypes | cuspidata specimens | Mediterranean varieties | Reference cultivars |
|---|---|---|---|---|
| Local ecotypes | 115 (out of 132) | 0 | 2 | 5 |
| cuspidata specimens | 0 | 37 (out of 37) | 0 | 0 |
| Mediterranean varieties | 5 | 0 | 75 (out of 77) | 0 |
| Reference cultivars | 12 | 0 | 0 | 16 (out of 21) |
Rows show the class predicted by the model for each record (olive accession); columns show the true class, so a record is correctly predicted when its row and column classes match.
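The table captions report 10-fold cross validation but do not say how the folds were formed or which software produced them. As a rough sketch, assuming a plain shuffled split without stratification, the 267 accessions could be partitioned as follows. The function `k_fold_indices` and the seed are hypothetical choices for illustration.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 267 accessions, as in the 4-targeted experiment.
splits = list(k_fold_indices(267, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
```

Each accession lands in exactly one test fold, so every record is predicted once by a model that never saw it during training, and the per-fold predictions can be pooled into a single table of predicted versus true classes.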
Prediction rate (accuracy) details of each Bayesian algorithm with 10-fold cross validation for each of the populations in the 16-targeted (16-t) experiment.
| Predicted \ True | Sistan & Baluchestan | Hormozgan | Kerman | Zanjan | Khuzestan | Khorasan e Jonubi | Kermanshah | Fars | Charmahal & Bakhtiari | Yazd | Bushehr | Lorestan | Qom | Kohgiluye & Boyerahmad | Esfahan | Golestan | Ilam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ilam | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 8 |
| Golestan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 16 | 2 |
| Esfahan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| Kohgiluye & Boyerahmad | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 35 | 0 | 1 | 1 |
| Qom | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| Lorestan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 |
| Bushehr | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| Yazd | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Charmahal & Bakhtiari | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Fars | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Kermanshah | 0 | 0 | 0 | 1 | 1 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Khorasan e Jonubi | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Khuzestan | 0 | 0 | 1 | 1 | 10 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Zanjan | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kerman | 2 | 3 | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Hormozgan | 1 | 6 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sistan & Baluchestan | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Rows show the class predicted by the model for each record (olive accession); columns show the true class, so a record is correctly predicted when its row and column classes match.