Amir H Beiki, Saba Saboor, Mansour Ebrahimi.
Abstract
Various methods have been used to identify olive tree cultivars; herein we used different bioinformatics algorithms to propose new tools for classifying 10 olive cultivars based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset assigned every olive cultivar to its correct class. All 176 trees induced by the decision tree models were meaningful, and the UBC841a4 attribute distinguished foreign from domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) also predicted the correct class of the olive cultivars with 100% accuracy. Our results show, for the first time, that data mining techniques can be used effectively to distinguish between plant cultivars, and that the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.
Year: 2012 PMID: 22957050 PMCID: PMC3434224 DOI: 10.1371/journal.pone.0044164
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Names and sequences of the ISSR and RAPD markers.
| ISSR primer | Sequence 5′–3′ | ISSR primer | Sequence 5′–3′ | RAPD primer | Sequence 5′–3′ | RAPD primer | Sequence 5′–3′ |
|---|---|---|---|---|---|---|---|
|  | (AG)8T | UBC-835 | (AG)8CC | OPA-10 |  | OPA08 |  |
|  | (AG)8C | UBC-836 | (AG)8CA | OPA-11 |  | OPD05 |  |
|  | (GA)8T | UBC-841 | (GA)8CC | OPB-01 |  | OPD15 |  |
|  | (GA)8C | UBC-841Y | (GA)8CCY | OPE-06 |  | OPDP6 |  |
|  | (GA)8A | UBC-856 | (AC)8YA | OPE-16 |  | OPD01 |  |
|  | (CA)8G | UBC-868 | (GA)8A | OPF-05 |  | OPA01 |  |
|  | (AG)8CT | UBC-880 | (GGAGA)3 | OPA-04 |  | OPA00 |  |
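The ISSR sequences use the usual microsatellite shorthand: a motif in parentheses followed by its repeat count, optionally with anchor bases, where Y is the IUPAC code for a pyrimidine (C or T). A minimal sketch of how that notation expands, assuming only the motif-repeat pattern and the IUPAC codes listed in the code; the function names are illustrative.

```python
import re
from itertools import product

# IUPAC nucleotide codes needed for the primers in the table (Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}


def expand_repeat(notation: str) -> str:
    """Expand '(AG)8T'-style shorthand: a motif in parentheses repeated n times."""
    return re.sub(r"\(([ACGT]+)\)(\d+)",
                  lambda m: m.group(1) * int(m.group(2)), notation)


def resolve_degenerate(seq: str) -> list[str]:
    """List every concrete sequence that a degenerate primer (e.g. ...CCY) stands for."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


print(expand_repeat("(AG)8T"))      # AGAGAGAGAGAGAGAGT
print(expand_repeat("(GGAGA)3"))    # GGAGAGGAGAGGAGA
# A primer ending in Y stands for two sequences, one ending ...CCC and one ...CCT.
print(resolve_degenerate(expand_repeat("(GA)8CCY")))
```

A degenerate primer such as UBC-841Y is therefore a mixture of two oligonucleotides that differ only at the ambiguous position.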
Number and average weight of the most important alleles (fragments) selected by the different attribute weighting algorithms.
| Allele (fragment) | Number of attribute weightings | Average of attribute weightings | Allele (fragment) | Number of attribute weightings | Average of attribute weightings |
|---|---|---|---|---|---|
|  | 10 | 0.982 |  | 10 | 0.737 |
|  | 10 | 0.982 |  | 10 | 0.737 |
|  | 10 | 0.680 |  | 10 | 0.735 |
|  | 10 | 0.680 |  | 10 | 0.735 |
|  | 10 | 0.680 |  | 10 | 0.735 |
|  | 10 | 0.720 |  | 10 | 0.735 |
|  | 10 | 0.712 |  | 10 | 0.688 |
|  | 10 | 0.712 |  | 10 | 0.688 |
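The table reports, for each allele, how many weighting models selected it and its average weight. A minimal sketch of such a consensus summary follows; the normalised weights, the 0.5 selection cut-off and the three toy models are hypothetical, since the exact selection rule is not given in this section.

```python
from statistics import mean


def consensus(weights_by_model, threshold=0.5):
    """Summarise allele weights across several attribute weighting models.

    weights_by_model maps a model name to {allele: weight}, with weights assumed
    to be normalised to [0, 1]. An allele counts as selected by a model when its
    weight is at least `threshold` (a hypothetical cut-off). Returns
    {allele: (number_of_models_selecting_it, mean_weight)}, highest mean first.
    """
    alleles = {a for weights in weights_by_model.values() for a in weights}
    summary = {}
    for allele in alleles:
        ws = [weights.get(allele, 0.0) for weights in weights_by_model.values()]
        summary[allele] = (sum(w >= threshold for w in ws), round(mean(ws), 3))
    return dict(sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True))


def selected_by_all(summary, n_models):
    """Alleles chosen by every model, mirroring 'selected by all weighting models'."""
    return [a for a, (n, _) in summary.items() if n == n_models]


# Hypothetical weights from three toy models (not data from the paper).
toy = {
    "model_1": {"A1": 1.0, "A2": 0.6, "A3": 0.1},
    "model_2": {"A1": 0.9, "A2": 0.4, "A3": 0.0},
    "model_3": {"A1": 0.95, "A2": 0.7, "A3": 0.2},
}
summary = consensus(toy)
print(summary)                             # {'A1': (3, 0.95), 'A2': (2, 0.567), 'A3': (0, 0.1)}
print(selected_by_all(summary, len(toy)))  # ['A1']
```

Here the average is taken over all models, including those that did not select the allele; averaging only over selecting models would be an equally plausible reading.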
Attribute weighting models, the number of important variables selected by each model, and the most important variables selected by each attribute weighting algorithm.
| Attribute weighting | Number of variables | Most important variables |
|---|---|---|
|  | 16 | UBC841A4; UBC868A7; OPA0A21; OPA0A8 |
|  | 16 | UBC841A4; UBC868A7; OPA0A21; OPA0A8 |
|  | 57 | UBC841A4; UBC868A7; OPA0A21; OPA0A8 |
|  | 160 | UBC808A13; UBC808A15; OPA10A10; OPA11A7 |
|  | 2 | UBC841A4; UBC868A7; UBC841A14; OPA0A21 |
|  | 16 | UBC841A4; UBC868A7; UBC841A14; OPA0A21 |
|  | 16 | UBC841A4; UBC868A7; UBC841A14; OPA0A21 |
|  | 16 | UBC841A4; UBC868A7; UBC841A14; OPA0A21 |
|  | 115 | OPD1A1; OPA0A7; UBC841A4; UBC868A7 |
|  | 76 | UBC834A7; UBC834A8; UBC856A3; UBC856A6 |
|  | 400 |  |
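The weighting models are not named in this extract. As one assumed example of a weighting criterion, information gain (also one of the tree criteria in the cross-validation table) scores a binary band-presence attribute by how much it reduces class entropy. The sketch and its toy labels are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(presence, labels):
    """Entropy reduction obtained by splitting samples on a 0/1 band-presence attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(presence):
        subset = [y for x, y in zip(presence, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical labels: five Iranian and five foreign cultivars.
labels = ["Iranian"] * 5 + ["Foreign"] * 5
perfect = [1] * 5 + [0] * 5          # band present only in Iranian samples
uninformative = [1, 0, 1, 0, 1] * 2  # same pattern in both groups
print(information_gain(perfect, labels))        # 1.0
print(information_gain(uninformative, labels))  # 0.0
```

With two balanced classes the maximum gain is 1 bit, reached only by a fragment whose presence alone separates the classes.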
Numbers of olive cultivars correctly predicted by the three unsupervised clustering algorithms run on all databases.
| Database | Cultivar | K-Means predicted | K-Means correctly predicted | K-Medoids predicted | K-Medoids correctly predicted | SV predicted | SV correctly predicted |
|---|---|---|---|---|---|---|---|
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 7 | 4 | 5 | 1 | Noise | – |
|  | Foreign | 3 | 2 | 5 | 1 | Noise | – |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 7 | 4 | 5 | 4 | Noise | – |
|  | Foreign | 3 | 2 | 5 | 4 | Noise | – |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
|  | Iranian | 7 | 5 | 5 | 5 | Noise | – |
|  | Foreign | 3 | 0 | 5 | 5 | Noise | – |
|  | Iranian | 10 | 5 | 10 | 5 | 0 | 0 |
|  | Foreign | 0 | 0 | 0 | 0 | 10 | 5 |
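The clustering counts can be illustrated with a small K-Medoids sketch. It assumes 0/1 allele presence profiles and Hamming distance, and it finds the exact k-medoids solution by trying every medoid set, which is feasible for about ten cultivars but is a swapped-in search strategy, not a swap heuristic such as PAM. The profiles are hypothetical, and "correctly predicted" is read here, as an assumption, as the number of cluster members belonging to the cluster's majority class.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two 0/1 allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids_exhaustive(points, k):
    """Exact k-medoids: try every set of k medoids and keep the cheapest.

    Cost = total distance of each point to its nearest medoid. Exhaustive
    search is only practical for tiny datasets such as ten cultivars.
    """
    best_cost, best = None, None
    for medoids in combinations(range(len(points)), k):
        cost = sum(min(hamming(p, points[m]) for m in medoids) for p in points)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, medoids
    labels = [min(range(k), key=lambda j: hamming(p, points[best[j]])) for p in points]
    return best, labels


def cluster_report(labels, truth):
    """Per cluster: (majority class, cluster size, members of the majority class)."""
    report = {}
    for c in sorted(set(labels)):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        majority, n_correct = Counter(members).most_common(1)[0]
        report[c] = (majority, len(members), n_correct)
    return report


# Hypothetical presence/absence profiles for five Iranian and five foreign cultivars.
iranian = [(1, 1, 0, 0, 1), (1, 1, 0, 0, 0), (1, 1, 1, 0, 1), (1, 1, 0, 0, 1), (1, 0, 0, 0, 1)]
foreign = [(0, 0, 1, 1, 0), (0, 0, 1, 1, 1), (0, 1, 1, 1, 0), (0, 0, 1, 1, 0), (0, 0, 0, 1, 0)]
points = iranian + foreign
truth = ["Iranian"] * 5 + ["Foreign"] * 5
medoids, labels = k_medoids_exhaustive(points, k=2)
print(medoids)                        # (0, 5)
print(cluster_report(labels, truth))  # {0: ('Iranian', 5, 5), 1: ('Foreign', 5, 5)}
```

Under this reading, a row such as "Iranian 10 / 5" corresponds to a single cluster holding all ten cultivars, only five of which belong to its majority class.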
Figure 1. Applying K-Medoids to the SVM database assigned each cultivar to the correct cluster.
Figure 2. Decision tree generated by three models run with the Gini Index criterion.
As the figure shows, the UBC841A4 and UBC868A7 fragments were the most important alleles for distinguishing Iranian from foreign cultivars.
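Splits such as the one on UBC841A4 are chosen with the Gini Index criterion named in the caption of Figure 2. The sketch below uses hypothetical labels for five Iranian and five foreign cultivars and computes the weighted Gini impurity of the child nodes. An attribute that separates the two classes perfectly lowers it from the parent value of 0.5 to 0.

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_split(presence, labels):
    """Weighted Gini impurity of the child nodes after splitting on band presence (0/1)."""
    n = len(labels)
    total = 0.0
    for value in set(presence):
        child = [y for x, y in zip(presence, labels) if x == value]
        total += len(child) / n * gini(child)
    return total


# Hypothetical labels and band patterns, not data from the paper.
labels = ["Iranian"] * 5 + ["Foreign"] * 5
separating = [1] * 5 + [0] * 5          # band present only in Iranian samples
mixed = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]  # band present in 3 Iranian and 1 foreign
print(gini(labels))                         # 0.5
print(gini_split(separating, labels))       # 0.0
print(round(gini_split(mixed, labels), 4))  # 0.4167
```

The tree inducer picks the attribute with the lowest child impurity, so a perfectly separating fragment becomes the root split.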
Accuracy, precision and recall (%) of the tree induction models on the Final Cleaned database (FCdb), computed with 5-fold cross-validation.
| Models | Algorithm | Gain Ratio | Information Gain | Gini Index | Accuracy |
|---|---|---|---|---|---|
|  |  | 70 | 70 | 70 | 70 |
|  |  | 60 | 60 | 60 | 60 |
|  |  | 80 | 80 | 80 | 80 |
|  |  | 75 | 75 | 75 | 75 |
|  |  | 66.7 | 66.7 | 66.7 | 66.7 |
|  |  | 70 | 70 | 50 | 50 |
|  |  | 60 | 60 | 0 | 0 |
|  |  | 80 | 80 | 100 | 100 |
|  |  | 75 | 75 | unknown | unknown |
|  |  | 66.7 | 66.7 | 50 | 50 |
|  |  | 70 | 70 | 50 | 50 |
|  |  | 60 | 60 | 0 | 0 |
|  |  | 80 | 80 | 100 | 100 |
|  |  | 75 | 75 | unknown | unknown |
|  |  | 66.7 | 66.7 | 50 | 50 |
|  |  | 70 | 70 | 70 | 70 |
|  |  | 60 | 60 | 60 | 60 |
|  |  | 80 | 80 | 80 | 80 |
|  |  | 75 | 75 | 75 | 75 |
|  |  | 66.7 | 66.7 | 66.7 | 66.7 |
|  |  | 70 | 70 | 70 | 70 |
|  |  | 60 | 60 | 60 | 60 |
|  |  | 80 | 80 | 80 | 80 |
|  |  | 75 | 75 | 75 | 75 |
|  |  | 66.7 | 66.7 | 66.7 | 66.7 |
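The metric labels for the rows of this table did not survive extraction. As an illustration only, the sketch below shows how accuracy, per-class recall and per-class precision follow from a confusion matrix. It assumes ten samples with five per class; the two matrices and the class names "A" and "B" are hypothetical and are not the authors' results. The first matrix happens to reproduce the pattern 70 / 60 / 80 / 75 / 66.7. The second, where every sample is predicted as one class, gives 50 / 0 / 100 / undefined / 50, one way a precision can become "unknown".

```python
def class_metrics(confusion, classes):
    """Accuracy, per-class recall and per-class precision (in %) from a confusion matrix.

    confusion[i][j] = number of samples whose true class is classes[i] and whose
    predicted class is classes[j]. A metric whose denominator is zero is returned
    as None (undefined), e.g. precision for a class that is never predicted.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(classes)))
    out = {"accuracy": 100 * correct / total}
    for i, c in enumerate(classes):
        actual = sum(confusion[i])                    # true members of class c
        predicted = sum(row[i] for row in confusion)  # samples predicted as c
        out[f"recall {c}"] = 100 * confusion[i][i] / actual if actual else None
        out[f"precision {c}"] = 100 * confusion[i][i] / predicted if predicted else None
    return out


# Hypothetical 10-sample matrices (5 per class); "A" and "B" are placeholder classes.
print(class_metrics([[3, 2], [1, 4]], ["A", "B"]))
print(class_metrics([[0, 5], [0, 5]], ["A", "B"]))
```

The first call gives accuracy 70, recalls 60 and 80, and precisions 75 and about 66.7. The second gives accuracy 50, recalls 0 and 100, an undefined precision for "A" and 50 for "B".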
Figure 3. Kernel distribution model distinguishing between the two classes of olive cultivars based on allele attribute type.
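A kernel distribution model of this kind is presumably the kernel variant of the Naïve Bayes classifier mentioned in the abstract. In that variant, each class-conditional attribute density is a kernel density estimate rather than a parametric distribution. A minimal sketch follows. It assumes Gaussian kernels, a fixed bandwidth of 0.3 (a hypothetical choice) and hypothetical 0/1 allele profiles. The helper names `kde`, `fit` and `predict` are illustrative and do not belong to any particular library.

```python
from math import exp, log, pi, sqrt


def kde(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from one-dimensional samples."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))
    return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def fit(X, y):
    """Store the training rows of each class and the class priors."""
    rows_by_class = {c: [row for row, label in zip(X, y) if label == c] for c in set(y)}
    priors = {c: len(rows) / len(y) for c, rows in rows_by_class.items()}
    return rows_by_class, priors


def predict(model, x, bandwidth=0.3):
    """Naive Bayes decision: log prior plus the sum of per-attribute log KDE densities."""
    rows_by_class, priors = model
    scores = {}
    for c, rows in rows_by_class.items():
        score = log(priors[c])
        for j, value in enumerate(x):
            score += log(kde(value, [row[j] for row in rows], bandwidth))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical 0/1 allele profiles (not data from the paper).
X = [(1, 1, 0, 0, 1), (1, 1, 0, 0, 0), (1, 1, 1, 0, 1), (1, 0, 0, 0, 1),
     (0, 0, 1, 1, 0), (0, 0, 1, 1, 1), (0, 1, 1, 1, 0), (0, 0, 0, 1, 0)]
y = ["Iranian"] * 4 + ["Foreign"] * 4
model = fit(X, y)
print(predict(model, (1, 1, 0, 0, 1)))  # Iranian
print(predict(model, (0, 0, 1, 1, 0)))  # Foreign
```

Working in log space avoids underflow when many attribute densities are multiplied together. The Gaussian kernel keeps every density strictly positive, so no logarithm of zero occurs.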