Mohammad Reza Bakhtiarizadeh, Maryam Rahimi, Abdollah Mohammadi-Sangcheshmeh, Vahid Shariati J, Seyed Alireza Salami.
Abstract
Successful spermatogenesis and oogenesis are the two genetically independent processes preceding embryo development. To date, several fertility-related proteins have been described in mammalian species. Nevertheless, further studies are required to discover more proteins associated with germ cell development and embryogenesis, and so shed more light on these processes. This work builds on our previous software (OOgenesis_Pred), focusing mainly on algorithms that go beyond what was previously done, in particular the prediction of new fertility-related proteins and their classes (embryogenesis, spermatogenesis and oogenesis) with a support vector machine (SVM) based on the concept of Chou's pseudo-amino acid composition features. The results of five-fold cross-validation and of the independent test demonstrated that this method can predict fertility-related proteins and their classes with an accuracy of more than 80%. Moreover, feature selection methods identified important properties of fertility-related proteins that allowed for their accurate classification. Based on the proposed method, a two-layer classifier software named "PrESOgenesis" (https://github.com/mrb20045/PrESOgenesis) was developed. At the first layer, the tool identifies a query sequence (protein or transcript) as a fertility-related or non-fertility-related protein; at the second layer, it classifies each predicted fertility-related protein as embryogenesis, spermatogenesis or oogenesis.
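
The two-layer decision flow described in the abstract can be sketched as follows. This is a minimal illustration, not the PrESOgenesis code: the `Layer` interface, the function `two_layer_predict` and the toy rules standing in for trained SVM models are all assumptions, and the abstract does not say how the second layer combines its per-class models.

```python
from typing import Callable, Sequence

# Hypothetical interface: each layer is any callable that maps a feature
# vector to a label. In practice these would wrap trained SVM models.
Layer = Callable[[Sequence[float]], str]

FERTILITY_CLASSES = ("embryogenesis", "spermatogenesis", "oogenesis")


def two_layer_predict(features: Sequence[float], layer1: Layer, layer2: Layer) -> str:
    """Return 'non-fertility' or one of the three fertility classes."""
    # Layer 1: fertility-related vs non-fertility-related.
    if layer1(features) != "fertility":
        return "non-fertility"
    # Layer 2: only reached for sequences predicted as fertility-related.
    label = layer2(features)
    if label not in FERTILITY_CLASSES:
        raise ValueError(f"unexpected class label: {label!r}")
    return label


# Toy stand-ins for trained models (illustrative rules, not from the paper).
def toy_layer1(features: Sequence[float]) -> str:
    return "fertility" if features[0] > 0.5 else "non-fertility"


def toy_layer2(features: Sequence[float]) -> str:
    # Pick the class whose (hypothetical) score in positions 1..3 is largest.
    best = max(range(3), key=lambda i: features[1 + i])
    return FERTILITY_CLASSES[best]


print(two_layer_predict([0.9, 0.1, 0.7, 0.2], toy_layer1, toy_layer2))
print(two_layer_predict([0.2, 0.1, 0.7, 0.2], toy_layer1, toy_layer2))
```

Keeping the layers as plain callables means the same dispatch logic works whether the second layer is a single multi-class model or a wrapper around several binary models.
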
Year: 2018 PMID: 29899414 PMCID: PMC5998058 DOI: 10.1038/s41598-018-27338-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Summary of our pipeline for developing PrESOgenesis.
Five-fold cross-validation and Independent evaluation (IE) test results of the SVM method for oogenesis datasets.
| Dataset | λ* | Five-fold CV Accuracy (%) | Five-fold CV Sensitivity (%) | Five-fold CV Specificity (%) | Five-fold CV MCC (%) | Independent test Accuracy (%) | Independent test Sensitivity (%) | Independent test Specificity (%) | Independent test MCC (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.02 | 82.8 | 82.86 | 83.15 | 65.57 | 83.33 | 84.62 | 80.88 | 66.7 |
| 2 | 0.001 | 84.06 | 83.21 | 85.04 | 68.13 | 83.33 | 84.62 | 80.88 | 66.7 |
| 3 | 0.02 | 85.33 | 83.57 | 86.99 | 70.71 | 84.06 | 86.15 | 81.16 | 68.23 |
| 4 | 0.02 | 82.05 | 80.71 | 83.39 | 63.87 | 86.23 | 86.15 | 84.85 | 72.4 |
| 5 | 0.001 | 81.89 | 81.79 | 82.37 | 63.76 | 82.61 | 84.62 | 79.71 | 65.32 |
| Average | 0.01 | 84 | 83 | 85 | 79.4 | 84 | 86 | 82 | 67.87 |
*The optimum value of the λ parameter of the SVM kernel function, obtained by a grid search based on five-fold cross-validation. The optimum value of the parameter C was 100 in all models.
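
The four measures reported in these tables (accuracy, sensitivity, specificity and MCC) can be computed from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions and scales each value to a percentage, as the tables do. The function name `binary_metrics` and the example counts are illustrative assumptions, not values from the paper.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity and MCC, each as a percentage."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": round(100 * accuracy, 2),
        "sensitivity": round(100 * sensitivity, 2),
        "specificity": round(100 * specificity, 2),
        "mcc": round(100 * mcc, 2),
    }


# Hypothetical counts for one test fold.
example = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(example)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it can sit well below the accuracy value on the same fold.
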
Five-fold cross-validation and Independent evaluation (IE) test results of the SVM method for general datasets.
| Dataset | λ* | Five-fold CV Accuracy (%) | Five-fold CV Sensitivity (%) | Five-fold CV Specificity (%) | Five-fold CV MCC (%) | Independent test Accuracy (%) | Independent test Sensitivity (%) | Independent test Specificity (%) | Independent test MCC (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.001 | 82.94 | 84.27 | 82.48 | 65.87 | 81.02 | 80.41 | 79.48 | 64.96 |
| 2 | 0.01 | 82.56 | 84.34 | 81.84 | 70.1 | 84.46 | 83.92 | 83.19 | 68.82 |
| 3 | 0.03 | 82.21 | 82.85 | 82.23 | 64.42 | 82.26 | 82.75 | 80.17 | 64.48 |
| 4 | 0.04 | 83.31 | 83.46 | 83.63 | 66.62 | 82.53 | 81.29 | 81.52 | 64.93 |
| 5 | 0.05 | 83.87 | 82.85 | 84.98 | 67.76 | 83.77 | 80.12 | 84.57 | 67.41 |
| Average | 0.03 | 82.97 | 83.55 | 83.03 | 66.95 | 82.88 | 81.69 | 81.78 | 66.12 |
*The optimum value of the λ parameter of the SVM kernel function, obtained by a grid search based on five-fold cross-validation. The optimum value of the parameter C was 100 in all models.
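
The footnote describes selecting λ by grid search under five-fold cross-validation. A minimal, library-free sketch of that procedure is given below. The helper names (`k_fold_indices`, `grid_search_cv`), the candidate grid and the scoring callback are assumptions. In a real run, `score_fn` would train an SVM with the candidate λ on the training indices and return its accuracy on the held-out fold; here a toy function stands in for it.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(
    n_samples: int,
    candidates: Sequence[float],
    score_fn: Callable[[float, List[int], List[int]], float],
    k: int = 5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (best parameter, mean CV score) over the candidate grid."""
    folds = k_fold_indices(n_samples, k, seed)
    best_param, best_score = None, float("-inf")
    for param in candidates:
        scores = []
        for i, test_idx in enumerate(folds):
            # Train on every fold except the i-th, test on the i-th.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(score_fn(param, train_idx, test_idx))
        avg = mean(scores)
        if avg > best_score:
            best_param, best_score = param, avg
    return best_param, best_score


# Toy scoring function (hypothetical): pretends accuracy peaks at 0.03.
def toy_score(lam: float, train_idx: List[int], test_idx: List[int]) -> float:
    return 1.0 - abs(lam - 0.03)


grid = [0.001, 0.01, 0.02, 0.03, 0.04, 0.05]
best_lambda, best_cv_score = grid_search_cv(100, grid, toy_score)
print(best_lambda, best_cv_score)
```
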
The top 22 important features selected by attribute weighting feature selection method for general dataset.
| Order | Descriptor | Protein feature | Feature group |
|---|---|---|---|
| 1 | S | Serine | Amino Acid Composition |
| 2 | I | Isoleucine | Amino Acid Composition |
| 3 | IA | Dipeptide Composition (Isoleucine-Alanine) | Amino Acid Composition |
| 4 | solventaccess.Group1 | Solvent Accessibility attribute of Composition | CTD |
| 5 | solventaccess.Group3 | Solvent Accessibility attribute of Composition | CTD |
| 6 | Schneider.Xr.S | QSO in QSOD using Schneider-Wrede distance | Quasi-sequence-order |
| 7 | Grantham.Xr.I | QSO in QSOD using normalized Grantham chemical distance | Quasi-sequence-order |
| 8 | Grantham.Xd.1 | QSO in QSOD using normalized Grantham chemical distance | Quasi-sequence-order |
| 9 | prop7.Tr2332 | Solvent Accessibility attribute of Transition | CTD |
| 10 | prop5.G2.residue0 | Charge attribute of Distribution | CTD |
| 11 | prop5.G2.residue25 | Charge attribute of Distribution | CTD |
| 12 | prop5.G2.residue50 | Charge attribute of Distribution | CTD |
| 13 | prop5.G2.residue75 | Charge attribute of Distribution | CTD |
| 14 | prop5.G2.residue100 | Charge attribute of Distribution | CTD |
| 15 | VS333 | Conjoint Triad | Conjoint Triad |
| 16 | prop2.G1.residue0 | Normalized van der Waals Volume attribute of Distribution | CTD |
| 17 | prop2.G1.residue25 | Normalized van der Waals Volume attribute of Distribution | CTD |
| 18 | prop2.G1.residue50 | Normalized van der Waals Volume attribute of Distribution | CTD |
| 19 | prop2.G1.residue75 | Normalized van der Waals Volume attribute of Distribution | CTD |
| 20 | prop2.G1.residue100 | Normalized van der Waals Volume attribute of Distribution | CTD |
| 21 | Schneider.Xr.I | QSO in QSOD using Schneider-Wrede distance | Quasi-sequence-order |
| 22 | Grantham.Xr.S | QSO in QSOD using normalized Grantham chemical distance | Quasi-sequence-order |
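
Two of the feature groups in this table, amino acid composition (for example serine and isoleucine) and dipeptide composition (for example IA), are simple frequency counts over the sequence. The sketch below shows one common way to compute them. Normalising to fractions of sequence length is an assumption, since the exact scaling used by the authors is not stated here, and the example sequence is made up.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 400 ordered residue pairs (overlapping)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return {a + b: pairs[a + b] / total for a in AMINO_ACIDS for b in AMINO_ACIDS}


# Hypothetical toy sequence, chosen so that S, I and IA all occur.
toy_seq = "MSIASIAK"
aac = amino_acid_composition(toy_seq)
dpc = dipeptide_composition(toy_seq)
print(aac["S"], aac["I"], dpc["IA"])
```
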
Five-fold cross-validation and independent evaluation test results of the SVM method for general datasets with selected features.
| Dataset | λ* | Five-fold CV Accuracy (%) | Five-fold CV Sensitivity (%) | Five-fold CV Specificity (%) | Five-fold CV MCC (%) | Independent test Accuracy (%) | Independent test Sensitivity (%) | Independent test Specificity (%) | Independent test MCC (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.05 | 79.95 | 82.1 | 79.15 | 59.9 | 79.5 | 80.12 | 77.18 | 58.99 |
| 2 | 0.08 | 79.74 | 80.88 | 79.53 | 59.46 | 77.99 | 77.78 | 76 | 55.9 |
| 3 | 0.09 | 80.12 | 81.29 | 79.88 | 60.22 | 77.03 | 78.36 | 74.24 | 54.11 |
| 4 | 0.04 | 79.91 | 82.24 | 79.02 | 59.68 | 77.58 | 78.07 | 75.21 | 55.13 |
| 5 | 0.09 | 80.19 | 81.9 | 79.63 | 60.37 | 79.5 | 78.36 | 78.13 | 50.91 |
| Average | 0.07 | 79.98 | 81.68 | 79.44 | 59.92 | 78.32 | 78.53 | 76.15 | 55 |
*The optimum value of the λ parameter of the SVM kernel function, obtained by a grid search based on five-fold cross-validation. The optimum value of the parameter C was 100 in all models.
Five-fold cross-validation and Independent evaluation (IE) test results of the SVM method for spermatogenesis datasets.
| Dataset | λ* | Five-fold CV Accuracy (%) | Five-fold CV Sensitivity (%) | Five-fold CV Specificity (%) | Five-fold CV MCC (%) | Independent test Accuracy (%) | Independent test Sensitivity (%) | Independent test Specificity (%) | Independent test MCC (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.03 | 82.15 | 82.12 | 82.59 | 64.28 | 85.99 | 85.12 | 85.12 | 71.88 |
| 2 | 0.03 | 85.17 | 81.92 | 88.02 | 70.53 | 84.05 | 84.3 | 82.26 | 68.04 |
| 3 | 0.04 | 82.73 | 81.54 | 83.94 | 65.47 | 84.44 | 84.3 | 82.93 | 68.8 |
| 4 | 0.03 | 83.61 | 82.88 | 84.51 | 67.23 | 88.33 | 86.78 | 88.24 | 76.56 |
| 5 | 0.05 | 84.1 | 81.92 | 86.06 | 68.29 | 81.71 | 80.99 | 80.33 | 63.31 |
| Average | 0.04 | 83.55 | 82.07 | 85.02 | 67.16 | 84.9 | 84.29 | 83.77 | 69.71 |
*The optimum value of the λ parameter of the SVM kernel function, obtained by a grid search based on five-fold cross-validation. The optimum value of the parameter C was 100 in all models.
Five-fold cross-validation and Independent evaluation (IE) test results of the SVM method for embryogenesis datasets.
| Dataset | λ* | Five-fold CV Accuracy (%) | Five-fold CV Sensitivity (%) | Five-fold CV Specificity (%) | Five-fold CV MCC (%) | Independent test Accuracy (%) | Independent test Sensitivity (%) | Independent test Specificity (%) | Independent test MCC (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.02 | 80.23 | 80.74 | 80.38 | 45.08 | 80.12 | 81.41 | 77.44 | 47.33 |
| 2 | 0.03 | 81.05 | 79.7 | 82.39 | 62.14 | 79.22 | 77.56 | 78.06 | 58.26 |
| 3 | 0.001 | 80.75 | 80.59 | 81.32 | 65.2 | 83.43 | 78.85 | 84.83 | 69.42 |
| 4 | 0.03 | 82.33 | 81.19 | 83.54 | 67.79 | 78.92 | 79.49 | 76.54 | 62.23 |
| 5 | 0.001 | 81.43 | 81.19 | 82.04 | 62.85 | 80.42 | 82.05 | 77.58 | 60.91 |
| Average | 0.02 | 81.15 | 80.68 | 81.93 | 60.61 | 80.42 | 79.87 | 78.89 | 59.63 |
*The optimum value of the λ parameter of the SVM kernel function, obtained by a grid search based on five-fold cross-validation. The optimum value of the parameter C was 100 in all models.