Zhandong Li, Wei Guo, Shijian Ding, Kaiyan Feng, Lin Lu, Tao Huang, Yudong Cai.
Abstract
Radiotherapy is a helpful treatment for cancer, but it can also potentially cause changes in many molecules, resulting in adverse effects. Among these changes, the occurrence of abnormal DNA methylation patterns has alarmed scientists. To explore the influence of region-specific radiotherapy on blood DNA methylation, we designed a computational workflow by using machine learning methods that can identify crucial methylation alterations related to treatment exposure. Irrelevant methylation features from the DNA methylation profiles of 2052 childhood cancer survivors were excluded via the Boruta method, and the remaining features were ranked using the minimum redundancy maximum relevance method to generate feature lists. These feature lists were then fed into the incremental feature selection method, which uses a combination of deep forest, k-nearest neighbor, random forest, and decision tree to find the most important methylation signatures and build the best classifiers and classification rules. Several methylation signatures and rules have been discovered and confirmed, allowing for a better understanding of methylation patterns in response to different treatment exposures.
Keywords: childhood cancer radiotherapy; feature selection; machine learning method; methylation; rule learning
Year: 2022 PMID: 35453806 PMCID: PMC9030135 DOI: 10.3390/biology11040607
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Sample sizes of patients treated with different types of radiotherapy (RT).
| Dataset | Positive Sample | Negative Sample | Total |
|---|---|---|---|
| Abdominal RT | 412 | 1640 | 2052 |
| Brain RT | 629 | 1423 | 2052 |
| Chest RT | 577 | 1475 | 2052 |
| Pelvic RT | 352 | 1700 | 2052 |
Figure 1. Computational workflow of this study. The methylation dataset was acquired from a public database in four sections: abdominal RT, brain RT, chest RT, and pelvic RT. The methylation features in each methylation profile were filtered with the Boruta feature selection method and ranked with the minimum redundancy maximum relevance (mRMR) method. The incremental feature selection (IFS) method then used the resulting feature list to identify the optimal number of features and to build the best classifiers and classification rules by combining the synthetic minority oversampling technique (SMOTE) with the classification algorithms.
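
The oversampling step in the workflow can be sketched as follows. This is a minimal, dependency-free illustration of the general SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbors), not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy values are assumptions made for illustration; the source does not state the SMOTE settings that were used.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Hypothetical methylation-like values in [0, 1] for four minority samples.
minority = [[0.10, 0.80], [0.15, 0.75], [0.20, 0.90], [0.12, 0.85]]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the range spanned by the minority class. For scale, matching the 1640 negatives of the abdominal RT dataset with its 412 positives would take 1228 synthetic positives, although the source does not say which class ratio was targeted.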
Figure 2. IFS curves of the different classifiers across different numbers of methylation features for the abdominal RT methylation dataset. Deep forest (DF) achieves the highest Matthews correlation coefficient (MCC) of 0.895 when the top 744 features are used.
Figure 3. IFS curves of the different classifiers across different numbers of methylation features for the brain RT methylation dataset. DF achieves the highest MCC of 0.686 when the top 128 features are used.
Figure 4. IFS curves of the different classifiers across different numbers of methylation features for the chest RT methylation dataset. DF achieves the highest MCC of 0.812 when the top 691 features are used.
Figure 5. IFS curves of the different classifiers across different numbers of methylation features for the pelvic RT methylation dataset. DF achieves the highest MCC of 0.914 when the top 155 features are used.
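
The curves in these figures come from scoring a classifier on the top-k ranked features for increasing k. A rough sketch of that loop is given below. It is illustrative only: the helper names, the `step` parameter, and the toy data are assumptions, and for brevity the sketch scores each subset with the leave-one-out accuracy of a 1-nearest-neighbor classifier, whereas the study reports MCC for DF, kNN, RF, and DT.

```python
def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked, evaluate, step=1):
    """Score the top-k ranked features for k = step, 2*step, ... and
    return (best_k, best_score, curve). Ties keep the smallest k."""
    curve = []
    for k in range(step, len(ranked) + 1, step):
        cols = ranked[:k]
        X_k = [[row[c] for c in cols] for row in X]
        curve.append((k, evaluate(X_k, y)))
    best_k, best_score = max(curve, key=lambda t: t[1])
    return best_k, best_score, curve

# Hypothetical data: 6 samples, 3 features, already ranked as [0, 1, 2].
X = [
    [0.10, 0.20, 0.90],
    [0.20, 0.25, 0.10],
    [0.55, 0.30, 0.50],
    [0.50, 0.70, 0.85],
    [0.80, 0.75, 0.15],
    [0.90, 0.80, 0.55],
]
y = [0, 0, 0, 1, 1, 1]
best_k, best_score, curve = incremental_feature_selection(
    X, y, ranked=[0, 1, 2], evaluate=loo_1nn_accuracy
)
```

The `curve` list holds the (k, score) pairs that an IFS plot would display, and `best_k` marks its peak. Passing the scorer as a callable keeps the loop independent of the classifier, so the same function could be reused with any of the four algorithms.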
Detailed performance of different classifiers on four methylation datasets.
| Dataset | Classifiers | Number of Features | Accuracy | Sensitivity | Specificity | MCC |
|---|---|---|---|---|---|---|
| Abdominal RT | DF | 744 | 0.966 | 0.910 | 0.980 | 0.895 |
| | kNN | 10 | 0.846 | 0.971 | 0.814 | 0.662 |
| | RF | 753 | 0.903 | 0.913 | 0.900 | 0.739 |
| | DT | 761 | 0.791 | 0.825 | 0.783 | 0.515 |
| Brain RT | DF | 128 | 0.869 | 0.736 | 0.928 | 0.686 |
| | kNN | 8 | 0.749 | 0.863 | 0.699 | 0.519 |
| | RF | 115 | 0.811 | 0.765 | 0.832 | 0.577 |
| | DT | 150 | 0.690 | 0.693 | 0.688 | 0.355 |
| Chest RT | DF | 691 | 0.925 | 0.828 | 0.963 | 0.812 |
| | kNN | 12 | 0.804 | 0.945 | 0.749 | 0.627 |
| | RF | 234 | 0.851 | 0.823 | 0.862 | 0.654 |
| | DT | 489 | 0.762 | 0.747 | 0.768 | 0.478 |
| Pelvic RT | DF | 155 | 0.976 | 0.923 | 0.986 | 0.914 |
| | kNN | 9 | 0.841 | 0.977 | 0.813 | 0.637 |
| | RF | 31 | 0.896 | 0.906 | 0.894 | 0.702 |
| | DT | 77 | 0.798 | 0.795 | 0.798 | 0.487 |
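
The four metrics in this table all follow from a binary confusion matrix, which the short sketch below makes explicit. The counts are not taken from the source: they are approximate values back-calculated from the reported sensitivity and specificity of DF on abdominal RT and the 412 positive and 1640 negative samples. The source does not state how the metrics were aggregated, so the result should only be expected to land close to the reported figures. The function name `binary_metrics` is likewise an assumption.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Approximate counts reconstructed from the DF row for abdominal RT
# (an assumption, not the authors' confusion matrix).
n_pos, n_neg = 412, 1640
tp = round(0.910 * n_pos)  # reported sensitivity times positives
tn = round(0.980 * n_neg)  # reported specificity times negatives
metrics = binary_metrics(tp=tp, fn=n_pos - tp, tn=tn, fp=n_neg - tn)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a more informative summary for imbalanced datasets such as these. This is consistent with kNN showing high sensitivity but a much lower MCC than DF in the table.
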
Total number of rules and the number of rules for each class generated by the optimal DT classifier on the four datasets.
| Dataset | Number of Rules | Number of Rules for Positive Class | Number of Rules for Negative Class |
|---|---|---|---|
| Abdominal RT | 151 | 87 | 64 |
| Brain RT | 239 | 132 | 107 |
| Chest RT | 166 | 93 | 73 |
| Pelvic RT | 183 | 99 | 84 |
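
A decision tree yields one rule per leaf: the conjunction of the split conditions along the path from the root to that leaf. The sketch below shows how such rules could be enumerated and counted by class, as in the table above. The tree, the feature names `cg_A` and `cg_B`, and the thresholds are hypothetical; the source excerpt does not list the actual rules.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule per leaf of a binary tree."""
    if "label" in node:  # leaf: the path so far is a complete rule
        return [(list(conditions), node["label"])]
    feat, thr = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feat} <= {thr}",))
    right = extract_rules(node["right"], conditions + (f"{feat} > {thr}",))
    return left + right

# Hypothetical two-split tree over made-up methylation features.
tree = {
    "feature": "cg_A", "threshold": 0.5,
    "left": {"label": "negative"},
    "right": {
        "feature": "cg_B", "threshold": 0.3,
        "left": {"label": "positive"},
        "right": {"label": "negative"},
    },
}

rules = extract_rules(tree)
n_positive = sum(label == "positive" for _, label in rules)
n_negative = sum(label == "negative" for _, label in rules)
```

The total number of rules equals the number of leaves, and the positive and negative counts always sum to that total, as they do in each row of the table (for example, 87 + 64 = 151 for abdominal RT).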