Deborah Galpert, Alberto Fernández, Francisco Herrera, Agostinho Antunes, Reinaldo Molina-Ruiz, Guillermin Agüero-Chapin.
Abstract
BACKGROUND: The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We previously introduced a successful supervised pairwise ortholog classification approach, implemented on a big data platform, that considered several pairwise protein features and the low ratio of ortholog pairs found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in that supervised big data approach, all of them were, to some extent, alignment-based, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes.
Keywords: Big data; Imbalance data; Ortholog detection; Pairwise protein similarity measures; Supervised classification
Year: 2018 PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flowchart of Spark imbalanced big data classification pairwise ortholog detection algorithms
Big data supervised algorithms, imbalance management pre-processing methods and parameter values considered in this paper
| N | Algorithms | Pre-processing | Parameter values | |
|---|---|---|---|---|
| 1 | Spark Random Forest (a) | ROS/RUS | ||
| 2 | Spark Decision Trees (b) | ROS/RUS | ||
| 3 | Spark Support Vector Machines (c) | ROS | ||
| 4 | Spark Logistic Regression (d) | ROS | ||
| 5 | Spark Naive Bayes (e) | ROS | ||
| 6 | MapReduce Random Forests (f) | ROS | ||
ROS: Random Oversampling, RUS: Random Undersampling
a https://spark.apache.org/docs/latest/mllib-ensembles.html
b https://spark.apache.org/docs/latest/mllib-decision-tree.html
c https://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machines-svms
d https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
e https://spark.apache.org/docs/latest/mllib-naive-bayes.html
f Random Forest implementation available at https://mahout.apache.org/
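The two pre-processing schemes listed in the table can be illustrated with a minimal sketch. It assumes that the resampling size (as in ROS-100 or ROS-130) is the target size of the minority class expressed as a percentage of the majority class; that reading, the function names, and the toy data are assumptions made for illustration and are not taken from the paper or from Spark.

```python
import random

def random_oversample(majority, minority, size_pct=100, seed=0):
    """ROS sketch: replicate minority examples (with replacement) until the
    minority class reaches size_pct percent of the majority class size.
    The meaning of size_pct is an assumption, not the paper's definition."""
    rng = random.Random(seed)
    target = int(len(majority) * size_pct / 100)
    extra = max(0, target - len(minority))
    resampled = minority + [rng.choice(minority) for _ in range(extra)]
    return majority, resampled

def random_undersample(majority, minority, seed=0):
    """RUS sketch: keep a random subset of the majority class that is as
    large as the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority

# Hypothetical toy data: 1000 non-ortholog pairs and 10 ortholog pairs.
non_orthologs = [("neg", i) for i in range(1000)]
orthologs = [("pos", i) for i in range(10)]

_, ros100 = random_oversample(non_orthologs, orthologs, size_pct=100)
_, ros130 = random_oversample(non_orthologs, orthologs, size_pct=130)
rus_majority, _ = random_undersample(non_orthologs, orthologs)
print(len(ros100), len(ros130), len(rus_majority))
```

Oversampling leaves the majority class untouched and only adds copies of existing minority examples, whereas undersampling discards most of the majority class, which is why the two schemes can differ in training cost.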
Unsupervised reference algorithms and parameter values proposed in [36]
| Algorithms | Parameter values |
|---|---|
| Reciprocal Best Hits (RBH) (a) | |
| Reciprocal Smallest Distance (RSD) (b) | |
| Orthologous MAtrix (OMA) (c) | Default parameter values |
a Matlab script and BLAST program available at http://www.ncbi.nlm.nih.gov/BLAST/
b Python script available at https://pypi.python.org/pypi/reciprocal_smallest_distance/1.1.4/
c Stand-alone version available at http://omabrowser.org/standalone/OMA.0.99z.3.tgz
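As an illustration of the simplest reference method in the table, the reciprocal best hits criterion can be sketched as follows. This is not the Matlab script referenced above: the score dictionaries stand in for BLAST results, and all protein identifiers and scores are hypothetical.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring subject.
    scores: dict {(query, subject): similarity score}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return the pairs (a, b) where b is the best hit of a and a is the
    best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical similarity scores between two small proteomes A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50,
          ("a2", "b2"): 120, ("a3", "b2"): 150}
b_vs_a = {("b1", "a1"): 190, ("b2", "a3"): 150, ("b2", "a2"): 110}

rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh_pairs)
```

Here a2 also has b2 as its best hit, but b2 prefers a3, so only (a1, b1) and (a3, b2) are reported.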
Datasets used in the experiments
| Dataset id | Proteome pair | Number of protein features | Protein pairs per class (non-orthologs; orthologs) | Imbalance ratio |
|---|---|---|---|---|
| | | 29 | (31,218,485; 3062) | 10,195.456 |
| | | 29 | (30,562,272; 2843) | 10,750.008 |
| | | 29 | (27,778,732; 1573) | 17,659.715 |
| | | 29 | (27,772,372; 2606) | 10,657.088 |
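The imbalance ratio column is consistent with the number of non-ortholog pairs divided by the number of ortholog pairs. The short check below recomputes it from the class sizes in the table; the function name is illustrative.

```python
def imbalance_ratio(n_majority, n_minority):
    """Imbalance ratio: majority class size over minority class size."""
    return n_majority / n_minority

# (non-orthologs, orthologs) for the four datasets listed in the table.
class_sizes = [
    (31_218_485, 3062),
    (30_562_272, 2843),
    (27_778_732, 1573),
    (27_772_372, 2606),
]

ratios = [round(imbalance_ratio(neg, pos), 3) for neg, pos in class_sizes]
print(ratios)
```

In every dataset there are more than ten thousand non-ortholog pairs for each ortholog pair, which motivates the resampling schemes used with the supervised classifiers.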
The AUC and G-Mean values of all the algorithms (supervised and unsupervised) in testing datasets
| Algorithm/Dataset | Alignment-based Features | | | | | | Alignment-free Features | | | | | | Alignment-based + Alignment-free Features | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Supervised Algorithms | ||||||||||||||||||
| Spark Random Forest MLlib 1.6 ( | ||||||||||||||||||
| Normal | 0.3853 | 0.3119 | 0.3421 | 0.5742 | 0.5486 | 0.5585 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.6647 | 0.1009 | 0.6104 | 0.7209 | 0.5051 | 0.6863 |
| ROS-100 | 0.9962 | 0.9941 | 0.9966 | 0.9962 | 0.9941 | 0.9966 | 0.9375 | 0.9139 | 0.9186 | 0.9375 | 0.9148 | 0.9189 | 0.9972 | 0.9917 | 0.9950 | 0.9972 | 0.9917 | 0.9950 |
| ROS-130 | | | 0.9974 | | | 0.9974 | 0.9313 | 0.9162 | 0.9166 | 0.9315 | 0.9166 | 0.9166 | 0.9958 | 0.9929 | 0.9945 | 0.9958 | 0.9930 | 0.9945 |
| RUS | 0.9974 | 0.9953 | | 0.9974 | 0.9953 | | 0.9325 | 0.8917 | 0.9152 | 0.9325 | 0.8941 | 0.9153 | 0.9973 | 0.9950 | 0.9973 | 0.9973 | 0.9950 | 0.9973 |
| Spark Random Forest MLlib 1.6 ( | ||||||||||||||||||
| Normal | 0.7457 | 0.0365 | 0.3809 | 0.7780 | 0.1192 | 0.5725 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.0858 | 0.5000 | 0.6001 | 0.0064 | 0.3195 | 0.6801 | 0.0064 | 0.5510 |
| ROS-100 | 0.9971 | 0.9948 | 0.9969 | 0.9971 | 0.9948 | 0.9969 | 0.9333 | | 0.9097 | 0.9333 | | 0.9106 | 0.9971 | 0.9947 | 0.9965 | 0.9971 | 0.9947 | 0.9965 |
| ROS-130 | 0.9974 | 0.9950 | 0.9967 | 0.9974 | 0.9950 | 0.9967 | 0.9267 | 0.9101 | 0.9087 | 0.9267 | 0.9108 | 0.9088 | 0.9975 | | 0.9945 | 0.9975 | | 0.9945 |
| RUS | | 0.9949 | 0.9976 | | 0.9949 | 0.9976 | 0.9396 | 0.9081 | 0.9202 | 0.9397 | 0.9097 | 0.9207 | 0.9974 | 0.9948 | | 0.9974 | 0.9948 | |
| Spark Decision Trees MLlib 1.6 | ||||||||||||||||||
| Normal | 0.3751 | 0.2983 | 0.3301 | 0.5703 | 0.5445 | 0.5545 | 0.3848 | 0.0252 | 0.3548 | 0.5740 | 0.5003 | 0.5629 | 0.6505 | 0.5017 | 0.6107 | 0.7115 | 0.6259 | 0.6865 |
| ROS-100 | 0.9973 | 0.9941 | 0.9960 | 0.9973 | 0.9941 | 0.9960 | | 0.9153 | 0.9258 | | 0.9157 | 0.9262 | | 0.9483 | 0.9954 | | 0.9495 | 0.9954 |
| ROS-130 | 0.9957 | 0.9906 | 0.9961 | 0.9957 | 0.9906 | 0.9961 | 0.9464 | 0.8993 | 0.9293 | 0.9465 | 0.9002 | 0.9293 | 0.9972 | 0.9449 | 0.9965 | 0.9972 | 0.9463 | 0.9965 |
| RUS | 0.9970 | 0.9936 | 0.9975 | 0.9970 | 0.9936 | 0.9975 | 0.9473 | 0.9156 | | 0.9473 | 0.9158 | | 0.9971 | 0.9720 | 0.9966 | 0.9971 | 0.9723 | 0.9966 |
| Spark Support Vector Machines MLlib 1.6 | ||||||||||||||||||
| Normal (0.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| Normal (0.5) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| Normal (1.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 (0.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.8486 | 0.8467 | 0.8482 | 0.8517 | 0.8482 | 0.8496 | 0.9682 | 0.9581 | 0.9677 | 0.9684 | 0.9585 | 0.9679 |
| ROS-100 (0.5) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 (1.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-130 (0.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.7719 | 0.7786 | 0.7779 | 0.7929 | 0.7950 | 0.7961 | 0.9708 | 0.9612 | 0.9683 | 0.9709 | 0.9615 | 0.9685 |
| ROS-130 (0.5) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-130 (1.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| Spark Logistic Regression MLlib 1.6 | ||||||||||||||||||
| Normal | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 | 0.3994 | 0.3663 | 0.3943 | 0.5012 | 0.4848 | 0.4981 | 0.2861 | 0.2867 | 0.2725 | 0.5028 | 0.5032 | 0.4989 | 0.0815 | 0.0665 | 0.0677 | 0.5007 | 0.4995 | 0.4996 |
| ROS-130 | 0.4056 | 0.3925 | 0.4060 | 0.5006 | 0.5089 | 0.5003 | 0.3008 | 0.3091 | 0.2954 | 0.5027 | 0.5054 | 0.5012 | 0.1416 | 0.1173 | 0.1274 | 0.5018 | 0.4987 | 0.4999 |
| Spark Naive Bayes MLlib 1.6 | ||||||||||||||||||
| Normal | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 | 0.4070 | 0.3943 | 0.4002 | 0.4990 | 0.4981 | 0.4949 | 0.4182 | 0.4371 | 0.4164 | 0.5009 | 0.5113 | 0.4999 | 0.1365 | 0.1498 | 0.1180 | 0.4996 | 0.5016 | 0.4972 |
| ROS-130 | 0.0171 | 0.4060 | 0.0172 | 0.5001 | 0.5003 | 0.5001 | 0.4823 | 0.4991 | 0.4825 | 0.4997 | 0.5202 | 0.4985 | 0.2067 | 0.2163 | 0.1953 | 0.5003 | 0.5024 | 0.4979 |
| MapReduce Random Forest Mahout 0.9 | ||||||||||||||||||
| Normal | 0.7178 | 0.6652 | 0.6864 | 0.7576 | 0.7212 | 0.7356 | ||||||||||||
| ROS-100 | 0.9903 | 0.9786 | 0.9859 | 0.9903 | 0.9789 | 0.9860 | ||||||||||||
| ROS-130 | 0.9905 | 0.9783 | 0.9846 | 0.9905 | 0.9785 | 0.9847 | ||||||||||||
| Unsupervised Algorithms | ||||||||||||||||||
| RBH | 0.8069 | 0.8052 | 0.8491 | 0.8255 | 0.8242 | 0.8605 | ||||||||||||
| RSD 0.2 1e-20 | 0.9309 | 0.9038 | 0.9654 | 0.9333 | 0.9092 | 0.966 | ||||||||||||
| RSD 0.5 1e-10 | 0.9426 | 0.9277 | 0.9818 | 0.9442 | 0.9294 | 0.9819 | ||||||||||||
| RSD 0.8 1e-05 | | | | | | | ||||||||||||
| OMA | 0.7311 | 0.7264 | 0.9388 | 0.7673 | 0.9163 | 0.9407 | ||||||||||||
Supervised algorithm performance is presented for the alignment-based, alignment-free and alignment-based + alignment-free feature combinations. The best results in each dataset are in bold face and the general best results are underlined. The Random Oversampling pre-processing (ROS) is accompanied by the corresponding resampling size value. RSD parameter values are the divergence and the E-value thresholds. Support Vector Machines are listed with their regularization parameter values.
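Both quality measures in the table can be computed from a confusion matrix. The sketch below uses the standard definition of the G-Mean and, as an assumption, the single-operating-point form of the AUC, (1 + TPR - FPR) / 2; the record does not state which AUC estimator the authors used, and the counts are hypothetical.

```python
import math

def rates(tp, fn, tn, fp):
    """True positive rate and true negative rate from confusion counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr

def g_mean(tp, fn, tn, fp):
    """Geometric mean of the true positive and true negative rates."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return math.sqrt(tpr * tnr)

def auc_single_point(tp, fn, tn, fp):
    """AUC of the ROC curve through one operating point (an assumed
    estimator): (1 + TPR - FPR) / 2."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return (1 + tpr - (1 - tnr)) / 2

# Hypothetical classifier that labels every pair as non-ortholog.
all_negative = dict(tp=0, fn=100, tn=1_000_000, fp=0)
# Hypothetical classifier that recovers 90 of 100 orthologs.
good = dict(tp=90, fn=10, tn=999_000, fp=1_000)

print(g_mean(**all_negative), auc_single_point(**all_negative))
print(g_mean(**good), auc_single_point(**good))
```

Under these definitions, a classifier that never predicts the ortholog class scores a G-Mean of 0 and an AUC of 0.5, the same pair of values that appears in many rows trained without resampling.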
Percentage of true positives (%TP) identified by both outstanding supervised and unsupervised classifiers when detecting ortholog pairs in the twilight zone (< 30% of identity)
| Algorithm/Dataset | Alignment-based Features | | | Alignment-free Features | | | Alignment-based + Alignment-free Features | | |
|---|---|---|---|---|---|---|---|---|---|
| | %TP | | | %TP | | | %TP | | |
| Supervised Algorithms | | | | | | | | | |
| Spark Random Forest MLlib 1.6 | |||||||||
| Normal | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.54 | 0.00 | 2.25 |
| ROS-100 | 97.43 | 96.26 | 98.03 | 71.06 | 64.29 | 57.87 | 96.14 | 91.84 | 93.54 |
| ROS-130 | 98.71 | | 98.31 | 76.21 | 64.97 | 65.45 | 95.18 | 93.88 | 93.54 |
| RUS | | 96.26 | 98.60 | 74.60 | 64.29 | 61.24 | 96.78 | 93.88 | 95.51 |
| Spark Decision Trees MLlib 1.6 | |||||||||
| Normal | 0.32 | 0.68 | 0.28 | 0.00 | 0.00 | 0.56 | 12.54 | 7.82 | 9.55 |
| ROS-100 | 95.18 | 94.56 | 97.19 | 72.67 | 62.93 | 55.62 | 97.75 | 84.69 | 96.07 |
| ROS-130 | 95.82 | 91.50 | 97.47 | 79.74 | 61.56 | 63.48 | 98.71 | 87.41 | 96.35 |
| RUS | 98.07 | 95.24 | | 76.53 | 67.01 | 65.45 | 98.07 | 90.82 | 97.47 |
| Unsupervised Algorithms | |||||||||
| RBH | 57.56 | 58.84 | 73.31 | ||||||
| RSD 0.2 1e-20 | 46.95 | 45.92 | 62.36 | ||||||
| RSD 0.5 1e-10 | 61.41 | 61.90 | 80.34 | ||||||
| RSD 0.8 1e-05 | 68.17 | 70.41 | 85.96 | ||||||
| OMA | 42.77 | 45.24 | 46.91 | ||||||
The best results in each dataset are in bold face and the general best results are underlined. Supervised algorithm performance is presented for the alignment-based, alignment-free and alignment-based + alignment-free feature combinations
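One plausible reading of the %TP measure is the share of true ortholog pairs with sequence identity below 30% that a classifier labels as orthologs. The sketch below follows that reading; the tuple layout and the example pairs are hypothetical.

```python
def twilight_tp_percentage(pairs, identity_threshold=30.0):
    """Percentage of true ortholog pairs below the identity threshold that
    were predicted as orthologs.
    pairs: iterable of (identity_pct, is_ortholog, predicted_ortholog)."""
    twilight = [p for p in pairs if p[1] and p[0] < identity_threshold]
    if not twilight:
        return 0.0
    hits = sum(1 for _, _, predicted in twilight if predicted)
    return 100.0 * hits / len(twilight)

# Hypothetical labelled pairs: (identity %, true ortholog?, predicted ortholog?)
pairs = [
    (25.0, True, True),
    (28.5, True, False),
    (12.0, True, True),
    (45.0, True, True),   # above the twilight zone, ignored
    (20.0, False, True),  # not a true ortholog, ignored
    (29.9, True, True),
]

tp_pct = twilight_tp_percentage(pairs)
print(tp_pct)
```

Only the four true orthologs under 30% identity enter the denominator, so three hits give 75%.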
Run time values (hh:mm:ss) comprising learning and classifying steps obtained by the highest quality Spark supervised algorithms (Decision Trees and Random Forest) together with the corresponding values of the Hadoop MapReduce Random Forest implementation. Supervised algorithm run time values are presented for the alignment-based, alignment-free and alignment-based + alignment-free feature combinations. The Random Oversampling pre-processing (ROS) is accompanied by the corresponding resampling size value
| Algorithm/Dataset | Alignment-based Features | | | Alignment-free Features | | | Alignment-based + Alignment-free Features | | |
|---|---|---|---|---|---|---|---|---|---|
| Spark Random Forest MLlib 1.6 | |||||||||
| NORMAL Learn | 00:00:49 | 00:00:57 | 00:00:57 | 00:01:03 | 00:01:05 | 00:01:07 | 00:00:57 | 00:00:57 | 00:01:00 |
| NORMAL Classify | 00:00:19 | 00:00:38 | 00:00:24 | 00:00:31 | 00:00:26 | 00:00:25 | 00:00:34 | 00:00:30 | 00:00:32 |
| ROS-100 Learn | 00:01:43 | 00:02:34 | 00:02:29 | 00:01:48 | 00:01:43 | 00:01:47 | 00:01:48 | 00:01:50 | 00:01:48 |
| ROS-100 Classify | 00:00:20 | 00:00:19 | 00:00:19 | 00:00:33 | 00:00:28 | 00:00:29 | 00:00:33 | 00:00:31 | 00:00:31 |
| ROS-130 Learn | 00:02:09 | 00:02:15 | 00:02:43 | 00:02:03 | 00:01:57 | 00:02:00 | 00:02:06 | 00:02:03 | 00:01:57 |
| ROS-130 Classify | 00:00:19 | 00:00:18 | 00:00:18 | 00:00:39 | 00:00:30 | 00:00:34 | 00:00:41 | 00:00:31 | 00:00:34 |
| RUS Learn | | | | | | | | | |
| RUS Classify | | | | | | | | | |
| Spark Decision Trees MLlib 1.6 | |||||||||
| NORMAL Learn | 00:00:31 | 00:00:31 | 00:00:35 | 00:00:35 | 00:00:33 | 00:00:35 | 00:00:49 | 00:00:38 | 00:00:40 |
| NORMAL Classify | 00:00:13 | 00:00:12 | 00:00:15 | 00:00:23 | 00:00:20 | 00:00:20 | 00:00:25 | 00:00:25 | 00:00:24 |
| ROS-100 Learn | | | | | | | | | |
| ROS-100 Classify | | | | | | | | | |
| ROS-130 Learn | 00:00:57 | 00:00:58 | 00:00:57 | 00:01:14 | 00:01:06 | 00:01:05 | 00:01:15 | 00:01:13 | 00:01:16 |
| ROS-130 Classify | 00:00:12 | 00:00:19 | 00:00:11 | 00:00:23 | 00:00:22 | 00:00:22 | 00:00:25 | 00:00:24 | 00:00:23 |
| RUS Learn | | | | | | | | | |
| RUS Classify | | | | | | | | | |
| MapReduce Random Forest Mahout 0.9 | |||||||||
| NORMAL Learn | 23:25:10 | 23:25:10 | 23:25:10 | ||||||
| NORMAL Classify | 00:14:25 | 00:13:07 | 00:13:04 | ||||||
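To compare the hh:mm:ss entries numerically, they can be converted to seconds. The sketch below does this for two learning times taken from the first column of the table (Mahout MapReduce Random Forest and Spark Random Forest, both without resampling); the helper name is illustrative, and the ratio says nothing about whether the two runs used comparable hardware.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds

mahout_learn = to_seconds("23:25:10")    # MapReduce Random Forest, NORMAL Learn
spark_rf_learn = to_seconds("00:00:49")  # Spark Random Forest, NORMAL Learn

ratio = mahout_learn / spark_rf_learn
print(mahout_learn, spark_rf_learn, round(ratio, 1))
```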
Feature importance calculated for the highest quality Spark supervised algorithms (Decision Trees (DT) and Random Forest (RF)). The entropy, the number of nodes that included certain features in the Random Forest building with RUS pre-processing and the average impurity decrease of the MLlib 2.0 Random Forest with ROS variants are presented for the alignment-based, alignment-free and alignment-based + alignment-free feature combinations. The Random Oversampling pre-processing (ROS) is accompanied by the corresponding resampling size value.
| Feature | RUS + DT-Spark Weka: Entropy | RUS + RF-Spark/Gini Weka: Avg. Impurity Decrease | RUS + RF-Spark/Gini Weka: Number of Nodes | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): Normal | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): ROS-100 | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): ROS-130 | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): RUS |
|---|---|---|---|---|---|---|---|
| Alignment-based Features/Algorithm | |||||||
| | | | 42 | | 0.180 | 0.175 | 0.171 |
| | | | | 0.035 | | | |
| | | | | 0.043 | 0.167 | 0.167 | 0.167 |
| | 0.732 | 0.290 | 235 | 0.033 | 0.004 | 0.001 | 0.007 |
| | 0.712 | 0.240 | | 0.080 | 0.008 | 0.010 | 0.008 |
| Alignment-free Features | |||||||
| | | | | 0.033 | | | |
| | 0.000 | 0.310 | 64 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.320 | 75 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | | 1124 | 0.000 | 0.000 | 0.000 | 0.001 |
| | 0.408 | 0.310 | 1012 | | | | |
| | | 0.300 | | | 0.060 | 0.062 | 0.066 |
| | 0.407 | 0.320 | | | 0.030 | 0.029 | 0.033 |
| | 0.529 | 0.290 | | | 0.028 | 0.035 | 0.036 |
| | 0.265 | 0.310 | 1010 | 0.012 | 0.004 | 0.021 | 0.021 |
| | 0.158 | | 954 | 0.022 | 0.003 | 0.003 | 0.002 |
| | 0.000 | 0.320 | 847 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.310 | 768 | 0.001 | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.260 | 772 | 0.000 | 0.000 | 0.000 | 0.001 |
| | 0.078 | | | 0.064 | 0.006 | 0.005 | 0.006 |
| | 0.000 | 0.290 | 600 | 0.001 | 0.000 | 0.000 | 0.001 |
| | 0.000 | 0.270 | 653 | 0.001 | 0.000 | 0.000 | 0.001 |
| | 0.000 | 0.270 | 602 | 0.002 | 0.000 | 0.000 | 0.001 |
| | | | | | | | |
| | 0.109 | 0.260 | 902 | 0.009 | 0.000 | 0.000 | 0.001 |
| | 0.000 | 0.240 | 825 | 0.000 | 0.000 | 0.000 | 0.001 |
| | | | | 0.022 | | | |
| | | | | | | | |
| | 0.280 | 0.240 | 1054 | 0.075 | 0.035 | 0.018 | 0.020 |
| | 0 | 0.250 | 513 | 0.001 | 0.000 | 0.000 | 0.001 |
| Alignment-based + Alignment-free Features/Algorithm | |||||||
| | | 0.280 | 131 | | | | |
| | | | | 0.005 | | | |
| | | 0.280 | | 0.005 | 0.098 | | |
| | | 0.290 | | | | | |
| | | 0.260 | 229 | 0.004 | 0.083 | | |
| | | 0.190 | | | 0.073 | | |
| | 0.000 | 0.300 | 11 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.270 | 11 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | | 147 | 0.001 | 0.000 | 0.000 | 0.000 |
| | 0.411 | 0.360 | 109 | 0.005 | 0.000 | 0.000 | 0.000 |
| | | 0.340 | | | 0.032 | | |
| | 0.411 | 0.390 | 151 | 0.009 | 0.002 | 0.001 | 0.001 |
| | 0.531 | 0.320 | 164 | 0.001 | 0.002 | 0.003 | 0.004 |
| | 0.260 | 0.300 | 154 | 0.005 | 0.000 | 0.000 | 0.001 |
| | 0.155 | 0.200 | 81 | 0.003 | 0.000 | 0.000 | 0.000 |
| | 0.000 | | 104 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | | 98 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | | 82 | 0.001 | 0.000 | 0.000 | 0.000 |
| | 0.074 | 0.230 | 97 | | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.390 | 69 | 0.000 | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.340 | 49 | 0.001 | 0.000 | 0.000 | 0.000 |
| | 0.000 | 0.390 | 59 | 0.001 | 0.000 | 0.000 | 0.000 |
| | | 0.230 | | | 0.012 | | |
| | 0.113 | 0.320 | 101 | 0.001 | 0.000 | 0.000 | 0.001 |
| | 0.000 | 0.310 | 97 | 0.001 | 0.000 | 0.000 | 0.000 |
| | | 0.190 | 142 | 0.009 | 0.006 | 0.007 | 0.004 |
| | | 0.210 | 147 | 0.001 | 0.005 | 0.005 | |
| | 0.286 | 0.270 | 108 | | 0.001 | 0.001 | 0.000 |
| | 0.000 | 0.340 | 47 | 0.000 | 0.000 | 0.000 | 0.000 |
nw: global alignment, sw: local alignment, profile: physicochemical profile from matching regions of aligned sequences at different window sizes (3, 5 and 7), aac: amino acid composition, pseacc: pseudo-amino acid composition at λ = 3, 4 and 10, Auto_Geary: Geary’s auto correlation, Auto_Moran: Moran’s auto correlation, Auto_Total: Total auto correlation, fcm: four-color maps, nandy: Nandy’s descriptors, CTD: Composition, Distribution and Transition (Total), CTD_C: Composition, Distribution and Transition (Composition), CTD_D: Composition, Distribution and Transition (Distributions), CTD_T: Composition, Distribution and Transition (Transition), k-mers: 2-mers and 3-mers, spaced words: 2-mers with “don’t care positions” = 1, 2 and 3 and 3-mers with “don’t care positions” = 1, 2 and 3, QSO: Quasi-Sequence-Order, w = weight factor and maximum lag = 30
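Three of the alignment-free descriptors named in the legend (amino acid composition, k-mers and spaced words) can be sketched directly from their definitions. The record does not say how descriptors of two proteins are combined into a pairwise feature, so the L1 distance at the end is only a placeholder assumption; the pattern notation and the example sequences are also hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino acid composition: relative frequency of the 20 residues."""
    counts = Counter(seq)
    n = len(seq) or 1  # avoid division by zero for an empty sequence
    return [counts[a] / n for a in AMINO_ACIDS]

def kmer_counts(seq, k):
    """Counts of all contiguous words of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spaced_word_counts(seq, pattern):
    """Counts of spaced words. In the pattern, '1' marks a position that is
    kept and '*' a don't-care position, e.g. '1*1' is a 2-mer with one
    don't-care position (notation assumed for this sketch)."""
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return Counter(
        "".join(seq[i + j] for j in keep) for i in range(len(seq) - span + 1)
    )

def l1_distance(u, v):
    """Placeholder pairwise measure between two descriptor vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical short sequences.
seq_a, seq_b = "MKVLA", "MKILA"
pair_feature = l1_distance(aac(seq_a), aac(seq_b))
print(kmer_counts(seq_a, 2))
print(spaced_word_counts(seq_a, "1*1"))
print(round(pair_feature, 3))
```

None of these descriptors requires aligning the two sequences, which is what distinguishes them from the nw, sw and profile features.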