Maoxuan Miao¹, Jinran Wu², Fengjing Cai¹, You-Gan Wang².
Abstract
Selecting the minimal best subset from a huge number of candidate factors that influence a response is a fundamental and very challenging NP-hard problem: many redundant genes easily lead to over-fitting, missing an important gene can have an even more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome two weaknesses of the traditional genetic algorithm: limited exploitation capability and poor dimension reduction in the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement also comes from two further novel aspects: (a) updating subsets of genes iteratively by splicing until the loss function no longer decreases, which increases the probability of selecting the true subset of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of gene subsets. Moreover, this splicing-based local search replaces the mutation operator to enhance exploitation capability, and it is also used to improve the initial individuals so that the search is more efficient. A dataset of the body weights of Hu sheep was used to evaluate the modified MA against the genetic algorithm. According to our experimental results, the proposed optimizer obtains a better minimal subset of genes within a few iterations than all the algorithms considered, including the most advanced adaptive best-subset selection algorithm.
Keywords: gene selection; local search operator; memetic algorithm; modifications; sheep weight
Year: 2022 PMID: 35049823 PMCID: PMC8772977 DOI: 10.3390/ani12020201
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 2.752
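The abstract describes the optimizer as a genetic algorithm in which a splicing-based local search replaces the mutation operator and also refines the initial individuals. The sketch below shows only that control flow, under stated assumptions: tournament selection, one-point crossover and elitist replacement are generic GA choices assumed here rather than details taken from the paper, and the toy fitness and local search (`hamming_fitness`, `flip_local_search`, `TARGET`) are hypothetical stand-ins for the MSE-based fitness and the improved splicing method.

```python
import random
from typing import Callable, List

Mask = List[int]  # one bit per gene: 1 = gene selected, 0 = gene dropped


def tournament(pop: List[Mask], fitness: Callable[[Mask], float],
               rng: random.Random, k: int = 3) -> Mask:
    """Return the fittest (lowest-loss) of k randomly drawn individuals."""
    return min(rng.sample(pop, k), key=fitness)


def memetic_search(n_genes: int,
                   fitness: Callable[[Mask], float],
                   local_search: Callable[[Mask], Mask],
                   pop_size: int = 20, n_iter: int = 30,
                   crossover_rate: float = 0.9, seed: int = 0) -> Mask:
    """Minimise `fitness` with a GA whose mutation step is replaced by `local_search`."""
    rng = random.Random(seed)
    # Initial individuals are random masks refined by the local search.
    pop = [local_search([rng.randint(0, 1) for _ in range(n_genes)])
           for _ in range(pop_size)]
    for _ in range(n_iter):
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)       # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            children.append(local_search(child))      # local search instead of mutation
        # Elitist replacement: keep the best pop_size of parents and children.
        pop = sorted(pop + children, key=fitness)[:pop_size]
    return pop[0]


# Hypothetical toy problem: recover a known 3-gene mask out of 12 genes.
TARGET = [1 if i in (2, 5, 7) else 0 for i in range(12)]


def hamming_fitness(mask: Mask) -> float:
    """Number of genes on which the mask disagrees with TARGET."""
    return float(sum(a != b for a, b in zip(mask, TARGET)))


def flip_local_search(mask: Mask) -> Mask:
    """Greedy hill climbing: keep any single-bit flip that lowers the fitness."""
    mask = mask[:]
    for i in range(len(mask)):
        candidate = mask[:]
        candidate[i] ^= 1
        if hamming_fitness(candidate) < hamming_fitness(mask):
            mask = candidate
    return mask


if __name__ == "__main__":
    print(memetic_search(len(TARGET), hamming_fitness, flip_local_search))
```

Because replacement is elitist, the best individual found so far is never lost, so `pop[0]` after the last generation is the best mask seen.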
Figure 1. Example of the improved splicing method. (a) Individual. (b) Individual segmentation: the individual is divided into an active set and an inactive set. (c) Evaluation: the score of each gene in the active set is evaluated in terms of backward sacrifice. (d) Add and del operators: some genes with the lowest scores in the active set are deleted and then added to the inactive set. (e) Swap: the gene with the lowest score in the active set is swapped with the gene with the highest score in the inactive set. (f) Merge: the active and inactive sets are merged into a new individual.
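The caption walks through one splicing step. A minimal sketch of those steps on a 0/1 gene mask follows, assuming a squared-error loss from an ordinary least-squares refit and computing each sacrifice exactly by refitting. The paper's actual loss, any scoring approximation, and the exact trigger for deletion are not given here, so `ols_loss`, `splice`, the absolute threshold `del_tol` and the stopping rule are hypothetical. Only deletion is modelled (a deleted gene simply joins the inactive set); the loop repeats until a swap no longer reduces the loss, in the spirit of aspect (a) of the abstract.

```python
import numpy as np


def ols_loss(X: np.ndarray, y: np.ndarray, active: set) -> float:
    """Mean squared residual of a least-squares fit on the columns in `active`."""
    if not active:
        return float(np.mean(y ** 2))
    XA = X[:, sorted(active)]
    beta, *_ = np.linalg.lstsq(XA, y, rcond=None)
    return float(np.mean((y - XA @ beta) ** 2))


def splice(X, y, mask, del_tol=1e-6, max_rounds=100):
    """Hypothetical splicing local search on a 0/1 gene mask."""
    p = X.shape[1]
    active = {j for j in range(p) if mask[j]}              # (b) segmentation
    loss = ols_loss(X, y, active)
    for _ in range(max_rounds):
        inactive = set(range(p)) - active
        if not active or not inactive:
            break
        # (c) backward sacrifice: loss increase caused by removing each active gene
        backward = {j: ols_loss(X, y, active - {j}) - loss for j in active}
        weakest = min(backward, key=backward.get)
        # (d) del (assumed rule): drop the weakest gene if removing it costs < del_tol
        if backward[weakest] < del_tol:
            active = active - {weakest}
            loss = ols_loss(X, y, active)
            continue
        # forward sacrifice: loss decrease from adding each inactive gene
        forward = {j: loss - ols_loss(X, y, active | {j}) for j in inactive}
        strongest = max(forward, key=forward.get)
        # (e) swap the weakest active gene for the strongest inactive gene
        candidate = (active - {weakest}) | {strongest}
        cand_loss = ols_loss(X, y, candidate)
        if cand_loss >= loss:
            break                                          # no further reduction in loss
        active, loss = candidate, cand_loss
    return [1 if j in active else 0 for j in range(p)]     # (f) merge into one individual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 10))
    y = 3 * X[:, 1] - 2 * X[:, 4] + 1e-4 * rng.standard_normal(60)
    start = [1 if j in (0, 1, 2) else 0 for j in range(10)]
    print(splice(X, y, start))
```

On this synthetic data the response depends only on genes 1 and 4; starting from genes {0, 1, 2}, the swap is expected to bring gene 4 in and the del step to then drop the remaining irrelevant gene, leaving a two-gene subset.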
Figure 2. Flow chart of the proposed optimizer.
Characteristics of the dataset for three body weight (BW) measures of Hu sheep.
| Type of BW | Number of Genes | Number of Instances |
|---|---|---|
| Birth weight | 54,183 | 240 |
| Six-month weight | 54,183 | 240 |
| Weaning weight | 54,183 | 240 |
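To make the claim that exhaustive search is prohibitive concrete, a quick count helps. With 54,183 candidate genes, the number of 9-gene subsets (9 being the subset size reported for ABSS in the performance table) is C(54183, 9), which is on the order of 10^37, and the number of subsets of all sizes, 2^54183, has more than 16,000 decimal digits. The variable names below are illustrative.

```python
import math

n_genes = 54_183   # genes per dataset (table above)
subset_size = 9    # subset size returned by ABSS in the performance table

# Number of distinct 9-gene subsets an exhaustive search would have to fit.
n_subsets = math.comb(n_genes, subset_size)
print(f"C({n_genes}, {subset_size}) = {n_subsets:.3e}")

# Subsets of every size: 2**n_genes, reported by its number of decimal digits.
n_all_digits = int(n_genes * math.log10(2)) + 1
print(f"2**{n_genes} has {n_all_digits} digits")
```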
Parameter values used in the proposed method for analyzing the body weights.
| Type of BW | | | C | | | T | N | s |
|---|---|---|---|---|---|---|---|---|
| Birth Weight | 30 | 0.5 | 0.1 | 0.01 | | 30 | 20 | 2000 |
| Six-Month Weight | 50 | 0.5 | 1 | 0.01 | 0.01 | 30 | 20 | 2000 |
| Weaning Weight | 50 | 0.2 | 0.1 | 0.01 | | 30 | 20 | 2000 |
Figure 3. Mean squared error for different parameter combinations in analyzing the body weights on three occasions.
Performance of feature selection methods for predicting the body weights on three occasions.
| Method | Birth Weight MSE | Birth Weight NumF | Six-Month Weight MSE | Six-Month Weight NumF | Weaning Weight MSE | Weaning Weight NumF |
|---|---|---|---|---|---|---|
| SVR | 0.2393 | All | 5.6800 | All | 15.9099 | All |
| SWA | 0.2392 | 2053 | 5.6424 | 1947 | 15.8819 | 2018 |
| ABC | 0.2390 | 1890 | 5.6264 | 2025 | 15.8762 | 1947 |
| SCA | 0.2391 | 1954 | 5.6433 | 2022 | 15.8815 | 1964 |
| GA | 0.2393 | 1995 | 5.6470 | 1905 | 15.8831 | 1952 |
| BPSO | 0.2392 | 1923 | 5.6464 | 2080 | 15.8835 | 1922 |
| | 0.2389 | 1928 | 5.6385 | 1995 | 15.8799 | 1984 |
| ABSS | 0.2292 | 9 | 5.8182 | 9 | 15.8993 | 9 |
Note: NumF, number of features selected; "All" means that all features were used.
Figure 4. Mean fitness value of GA and the proposed method for the body weight datasets on three occasions.
Figure 5. Minimum fitness value of GA and the proposed method for the body weights on each of the three occasions.
Genes selected by the proposed method for the body weights on the three occasions.
| Type of BW | Selected Genes |
|---|---|
| Birth Weight | OAR1_103547224.1,OAR1_156571804.1,OAR1_174344110.1, |
| Six-Month Weight | OAR1_103051402.1,OAR1_138627292.1,OAR1_214050298.1, |
| Weaning Weight | OAR1_174220716.1,OAR1_193099978.1,OAR1_208929906.1, |
Figure 6. Heatmap of the actual expression profiles for the best subset of genes obtained by the proposed method.
Average MSE rankings of the eight algorithms on the three datasets (Friedman test).
| | Proposed | SWA | ABC | SCA | GA | BPSO | | ABSS |
|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 5.17 | 2.67 | 4.67 | 7 | 6.5 | 3 | 6 |
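Friedman average ranks can be checked against the performance table. A minimal sketch, assuming ranks are taken per dataset with lower MSE ranked better and ties sharing their average rank: the proposed method's MSE row did not survive extraction, so only the seven remaining methods are ranked (`"unnamed"` is a hypothetical placeholder for the method whose name was lost). Since the proposed method's average rank is exactly 1, it must rank first on every dataset, so adding it shifts every other average rank up by exactly one.

```python
from statistics import mean

# MSE values from the performance table (birth, six-month, weaning weight).
MSE = {
    "SWA":     [0.2392, 5.6424, 15.8819],
    "ABC":     [0.2390, 5.6264, 15.8762],
    "SCA":     [0.2391, 5.6433, 15.8815],
    "GA":      [0.2393, 5.6470, 15.8831],
    "BPSO":    [0.2392, 5.6464, 15.8835],
    "unnamed": [0.2389, 5.6385, 15.8799],  # placeholder: name lost in extraction
    "ABSS":    [0.2292, 5.8182, 15.8993],
}


def ranks_with_ties(values):
    """1-based ranks (lower value = better); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def average_ranks(scores):
    """Mean rank of each method across datasets, as used by the Friedman test."""
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    per_dataset = [ranks_with_ties([scores[m][d] for m in names])
                   for d in range(n_datasets)]
    return {m: mean(r[idx] for r in per_dataset) for idx, m in enumerate(names)}


# Shift by one for the proposed method, which ranks first on every dataset.
print({m: round(r + 1, 2) for m, r in average_ranks(MSE).items()})
```

After the shift, the printed values reproduce the Rank row of the table (for example, SWA at 5.17 and ABC at 2.67).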
Post hoc Holm test (α = 0.1).
| Comparison | Result |
|---|---|
| Proposed vs. SWA | 0.428 |
| Proposed vs. ABC | 0.900 |
| Proposed vs. SCA | 0.583 |
| Proposed vs. GA | 0.055 |
| Proposed vs. BPSO | 0.108 |
| Proposed vs. | 0.900 |
| Proposed vs. ABSS | 0.195 |
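The caption names Holm's step-down procedure at level 0.1. A minimal sketch of that correction for a family of m comparisons (here, the proposed method against each competitor): sort the raw p-values, multiply the k-th smallest by m - k + 1, cap at 1, and enforce monotonicity so an adjusted p-value never falls below the one before it. The function names and the p-values in the example are hypothetical and are not taken from the table.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is scaled by the hypotheses still in play.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)   # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def holm_reject(p_values, alpha=0.1):
    """Reject each null hypothesis whose Holm-adjusted p-value is <= alpha."""
    return [p <= alpha for p in holm_adjust(p_values)]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]               # hypothetical unadjusted p-values
    print(holm_adjust(raw))
    print(holm_reject(raw, alpha=0.1))
```

Comparing adjusted p-values with α is equivalent to the classic formulation that stops at the first sorted p-value exceeding α / (m - k + 1).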
Parameter settings of the other feature selection methods used to analyze the body weights.
| Method | Parameter |
|---|---|
| GA | |
| BPSO | |
| SCA | |
| ABC | |
| SWA | |
| ABSS | |