Abstract
BACKGROUND: The development of a disease is a complex process that may result from the joint effects of multiple genes. In this article, we propose the overlapping group screening (OGS) approach to identify active genes and gene-gene interactions by incorporating prior pathway information. The OGS method is developed to overcome two challenges in genome-wide data analysis: the number of genes and gene-gene interactions far exceeds the sample size, and pathways generally overlap with one another. The OGS method is further proposed for predicting patients' survival based on gene expression data.
Keywords: Gene-gene interaction; Lasso; Overlapping group; Survival prediction
Year: 2018 PMID: 30241463 PMCID: PMC6150983 DOI: 10.1186/s12859-018-2372-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The natural hierarchical structure of genes related to pathways with the clinical outcome
Data structure in Simulation 1
| Pathway | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Gene Size | 7 | 14 | 21 | 28 | 35 |
| Overlapping | 3 | 5 | 7 | 9 | |
Fig. 2 The gene indices of the pathways considered in Simulation 1
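The pathway layout of Simulation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each "Overlapping" value is taken to be the number of genes shared by two consecutive pathways, and the shared genes are placed at the end of one pathway and the start of the next. The function name `build_chained_pathways` is hypothetical and does not come from the article.

```python
from itertools import combinations


def build_chained_pathways(sizes, overlaps):
    """Return 1-based gene-index lists in which consecutive pathways share genes.

    Assumption (not stated in the source): overlaps[k] is the number of genes
    shared by pathway k and pathway k + 1, taken from the tail of pathway k.
    """
    if len(overlaps) != len(sizes) - 1:
        raise ValueError("need exactly one overlap per pair of consecutive pathways")
    pathways = []
    start = 1
    for k, size in enumerate(sizes):
        pathways.append(list(range(start, start + size)))
        if k < len(overlaps):
            # The next pathway starts inside the current one.
            start = start + size - overlaps[k]
    return pathways


sizes = [7, 14, 21, 28, 35]   # "Gene Size" row of the Simulation 1 table
overlaps = [3, 5, 7, 9]       # "Overlapping" row of the Simulation 1 table

pathways = build_chained_pathways(sizes, overlaps)
genes = sorted(set().union(*pathways))
n_genes = len(genes)
n_pairs = len(list(combinations(genes, 2)))  # candidate gene-gene interactions

print(n_genes, n_pairs, n_genes + n_pairs)
```

Under these assumptions the five pathways cover 81 distinct genes and 3240 candidate pairwise interactions, which illustrates how quickly the number of candidate effects outgrows a typical sample size.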
Data structure in Simulation 2
| Pathways | 1–3 | 4–6 | 7–9 | 10–12 | 13–15 | 16–18 | 19–21 | 22–24 |
|---|---|---|---|---|---|---|---|---|
| Gene Size (each pathway) | 3 | 6 | 9 | 15 | 24 | 36 | 45 | 60 |
| Overlapping | 1, 1 | 2, 2 | 3, 3 | 5, 5 | 8, 8 | 12, 12 | 15, 15 | 20, 20 |
Fig. 3 The gene indices of the pathways considered in Simulation 2
Results of Simulation 1 (1): The performances of OGS compared with other approaches under gene-gene interactions within one pathway
| | Oracle | Uni. Sel. | Ordinary Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| censoring rate = 50% | | | | | | |
| RMSE.M | 0.422 | 0.425 | 0.355 | 0.340 | 0.435 | 0.325 |
| T.model | 1 | 0 | 1 | 0.625 | 0.650 | 0.650 |
| Tint.model | 1 | 0.075 | 1 | 0.625 | 0.650 | 0.650 |
| Sen. | 1 | 0.783 | 1 | 0.974 | 0.977 | 0.976 |
| Spe. | 1 | 0.999 | 0.954 | 0.964 | 0.775 | 0.970 |
| S.model | 45 | 37.895 | 197.165 | 160.375 | 781.805 | 142.245 |
| Deviance | −128.514 | −108.289 | −281.603 | −282.783 | −50.658 | −294.313 |
| | 0.925 | 0.891 | 0.984 | 0.983 | 0.855 | 0.985 |
| censoring rate = 65% | | | | | | |
| RMSE.M | 0.421 | 0.424 | 0.375 | 0.364 | 0.436 | 0.348 |
| T.model | 1 | 0 | 1 | 0.805 | 0.815 | 0.815 |
| Tint.model | 1 | 0.070 | 1 | 0.805 | 0.815 | 0.815 |
| Sen. | 1 | 0.764 | 1 | 0.986 | 0.988 | 0.987 |
| Spe. | 1 | 0.999 | 0.962 | 0.965 | 0.686 | 0.968 |
| S.model | 45 | 37.855 | 170.655 | 157.450 | 1072.52 | 148.745 |
| Deviance | −123.803 | −102.527 | −231.398 | −240.500 | −45.181 | −250.026 |
| | 0.928 | 0.898 | 0.983 | 0.984 | 0.849 | 0.986 |
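The selection-accuracy rows of the table can be illustrated with a small Python sketch. It assumes that Sen. and Spe. denote the sensitivity and specificity of variable selection and that S.model is the number of selected terms; these readings are inferred from the abbreviations rather than defined in this excerpt, and the function name `selection_metrics` and the toy feature names are hypothetical.

```python
def selection_metrics(selected, truly_active, all_features):
    """Compute selection sensitivity, specificity, and model size.

    Assumed definitions (not given in the source excerpt):
      sensitivity = truly active terms that were selected / all truly active terms
      specificity = inactive terms that were not selected / all inactive terms
      model_size  = number of selected terms
    """
    selected = set(selected)
    truly_active = set(truly_active)
    inactive = set(all_features) - truly_active
    true_pos = len(selected & truly_active)
    true_neg = len(inactive - selected)
    return {
        "sensitivity": true_pos / len(truly_active),
        "specificity": true_neg / len(inactive),
        "model_size": len(selected),
    }


# Toy example with ten candidate terms, four of them truly active.
all_features = [f"x{i}" for i in range(1, 11)]
truly_active = ["x1", "x2", "x3", "x4"]
selected = ["x1", "x2", "x3", "x7", "x9"]

metrics = selection_metrics(selected, truly_active, all_features)
print(metrics)
```

In a simulation study such quantities would typically be averaged over replications, which would explain non-integer model sizes such as those reported in the table.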
Results of Simulation 1 (2): The performances of OGS compared with other approaches under gene-gene interactions across two pathways
| | Oracle | Uni. Sel. | Ordinary Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| censoring rate = 50% | | | | | | |
| RMSE.M | 0.418 | 0.424 | 0.382 | 0.361 | 0.436 | 0.349 |
| T.model | 1 | 0 | 1 | 0.905 | 0.915 | 0.915 |
| Tint.model | 1 | 0.035 | 1 | 0.905 | 0.915 | 0.915 |
| Sen. | 1 | 0.746 | 1 | 0.992 | 0.994 | 0.994 |
| Spe. | 1 | 0.999 | 0.963 | 0.965 | 0.666 | 0.966 |
| S.model | 45 | 36.480 | 165.850 | 158.610 | 1139.54 | 155.235 |
| Deviance | −133.967 | −102.900 | −219.334 | −241.541 | −42.702 | −252.120 |
| | 0.944 | 0.899 | 0.980 | 0.984 | 0.842 | 0.986 |
| censoring rate = 65% | | | | | | |
| RMSE.M | 0.414 | 0.422 | 0.398 | 0.390 | 0.436 | 0.382 |
| T.model | 1 | 0 | 0.974 | 0.909 | 0.909 | 0.909 |
| Tint.model | 1 | 0.035 | 1 | 0.909 | 0.909 | 0.909 |
| Sen. | 1 | 0.729 | 0.999 | 0.992 | 0.994 | 0.992 |
| Spe. | 1 | 0.999 | 0.968 | 0.970 | 0.558 | 0.971 |
| S.model | 45 | 36.970 | 150.597 | 141.459 | 1494.17 | 140.481 |
| Deviance | −127.447 | −92.867 | −169.725 | −182.585 | −39.018 | −191.849 |
| | 0.949 | 0.903 | 0.974 | 0.979 | 0.829 | 0.981 |
Results of Simulation 1 (3): The performances of OGS compared with other approaches under coexistence of within- and between-pathway gene-gene interactions
| | Oracle | Uni. Sel. | Ordinary Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| censoring rate = 50% | | | | | | |
| RMSE.M | 0.457 | 0.463 | 0.400 | 0.430 | 0.471 | 0.393 |
| T.model | 1 | 0 | 1 | 0.500 | 0.525 | 0.525 |
| Tint.model | 1 | 0 | 1 | 0.500 | 0.525 | 0.525 |
| Sen. | 1 | 0.708 | 1 | 0.959 | 0.966 | 0.963 |
| Spe. | 1 | 0.999 | 0.956 | 0.968 | 0.672 | 0.969 |
| S.model | 48 | 36.970 | 191.050 | 151.135 | 1119.675 | 146.475 |
| Deviance | −127.602 | −95.760 | −266.727 | −206.468 | −46.208 | −254.344 |
| | 0.924 | 0.871 | 0.983 | 0.961 | 0.826 | 0.978 |
| censoring rate = 65% | | | | | | |
| RMSE.M | 0.455 | 0.461 | 0.418 | 0.412 | 0.471 | 0.401 |
| T.model | 1 | 0 | 1 | 0.675 | 0.715 | 0.715 |
| Tint.model | 1 | 0 | 1 | 0.675 | 0.715 | 0.715 |
| Sen. | 1 | 0.694 | 1 | 0.973 | 0.980 | 0.977 |
| Spe. | 1 | 0.999 | 0.963 | 0.968 | 0.614 | 0.969 |
| S.model | 48 | 37.070 | 168.335 | 150.420 | 1308.87 | 147.785 |
| Deviance | −126.277 | −92.721 | −220.297 | −223.262 | −43.936 | −235.996 |
| | 0.929 | 0.878 | 0.979 | 0.978 | 0.831 | 0.982 |
Results of Simulation 2 (1): The performances of OGS compared with other approaches under gene-gene interactions within one pathway
| | Oracle | Uni. Sel. | Ordinary Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| censoring rate = 50% | | | | | | |
| RMSE.M | 0.065 | 0.067 | 0.067 | 0.067 | 0.069 | 0.066 |
| T.model | 1 | 0 | 0 | 0.005 | 0.435 | 0 |
| Tint.model | 1 | 0 | 0.545 | 0.360 | 0.435 | 0.415 |
| Sen. | 1 | 0.213 | 0.592 | 0.660 | 0.980 | 0.722 |
| Spe. | 1 | 1 | 0.999 | 0.999 | 0.760 | 0.999 |
| S.model | 84 | 19.275 | 152.135 | 151.280 | 25,704 | 159.140 |
| Deviance | −136.422 | −44.754 | −73.930 | −88.464 | −4.454 | −100.153 |
| | 0.917 | 0.766 | 0.853 | 0.868 | 0.583 | 0.885 |
| censoring rate = 65% | | | | | | |
| RMSE.M | 0.064 | 0.067 | 0.067 | 0.067 | 0.069 | 0.067 |
| T.model | 1 | 0 | 0 | 0 | 0.560 | 0.005 |
| Tint.model | 1 | 0 | 0.420 | 0.410 | 0.560 | 0.505 |
| Sen. | 1 | 0.204 | 0.511 | 0.600 | 0.984 | 0.660 |
| Spe. | 1 | 1 | 0.999 | 0.999 | 0.761 | 0.999 |
| S.model | 84 | 19.095 | 141.940 | 141.745 | 25,586 | 149.070 |
| Deviance | −128.513 | −39.108 | −59.558 | −74.966 | −12.148 | −85.540 |
| | 0.925 | 0.769 | 0.842 | 0.860 | 0.605 | 0.877 |
Results of Simulation 2 (2): The performances of OGS compared with other approaches under gene-gene interactions across two pathways
| | Oracle | Uni. Sel. | Ordinary Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| censoring rate = 50% | | | | | | |
| RMSE.M | 0.064 | 0.067 | 0.067 | 0.067 | 0.069 | 0.067 |
| T.model | 1 | 0 | 0 | 0 | 0.172 | 0 |
| Tint.model | 1 | 0 | 0.098 | 0.064 | 0.172 | 0.078 |
| Sen. | 1 | 0.200 | 0.504 | 0.586 | 0.977 | 0.659 |
| Spe. | 1 | 1 | 0.999 | 0.999 | 0.767 | 0.999 |
| S.model | 84 | 18.529 | 137.623 | 140.039 | 24,970 | 145.250 |
| Deviance | −136.378 | −38.797 | −57.657 | −71.756 | −11.113 | −83.105 |
| | 0.928 | 0.765 | 0.838 | 0.856 | 0.600 | 0.875 |
| censoring rate = 65% | | | | | | |
| RMSE.M | 0.063 | 0.067 | 0.068 | 0.067 | 0.069 | 0.067 |
| T.model | 1 | 0 | 0 | 0 | 0.279 | 0 |
| Tint.model | 1 | 0 | 0.051 | 0.051 | 0.279 | 0.084 |
| Sen. | 1 | 0.180 | 0.435 | 0.519 | 0.982 | 0.573 |
| Spe. | 1 | 1 | 0.999 | 0.999 | 0.756 | 0.999 |
| S.model | 84 | 17.284 | 127.991 | 130.405 | 26,132 | 137.153 |
| Deviance | −124.043 | −30.843 | −44.032 | −55.482 | −5.697 | −62.328 |
| | 0.936 | 0.761 | 0.822 | 0.845 | 0.598 | 0.860 |
Results of Simulation 2 (3): The performances of OGS compared with other approaches under coexistence of within- and between-pathway gene-gene interactions
| | Oracle | Uni. Sel. | Ordinary Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| censoring rate = 50% | | | | | | |
| RMSE.M | 0.068 | 0.071 | 0.071 | 0.070 | 0.072 | 0.070 |
| T.model | 1 | 0 | 0 | 0 | 0.045 | 0 |
| Tint.model | 1 | 0 | 0.085 | 0.030 | 0.045 | 0.035 |
| Sen. | 1 | 0.185 | 0.533 | 0.575 | 0.955 | 0.632 |
| Spe. | 1 | 1 | 0.999 | 0.999 | 0.763 | 0.999 |
| S.model | 87 | 17.625 | 147.060 | 137.595 | 25,425 | 146.765 |
| Deviance | −135.986 | −38.636 | −65.861 | −73.924 | −5.523 | −83.062 |
| | 0.916 | 0.751 | 0.839 | 0.845 | 0.587 | 0.859 |
| censoring rate = 65% | | | | | | |
| RMSE.M | 0.067 | 0.071 | 0.071 | 0.071 | 0.072 | 0.070 |
| T.model | 1 | 0 | 0 | 0 | 0.104 | 0 |
| Tint.model | 1 | 0 | 0.010 | 0.035 | 0.104 | 0.050 |
| Sen. | 1 | 0.177 | 0.464 | 0.518 | 0.961 | 0.582 |
| Spe. | 1 | 1 | 0.999 | 0.999 | 0.759 | 0.999 |
| S.model | 87 | 17.094 | 134.752 | 133.153 | 25,793 | 139.218 |
| Deviance | −128.426 | −33.808 | −52.046 | −60.786 | −7.674 | −70.564 |
| | 0.925 | 0.752 | 0.826 | 0.837 | 0.601 | 0.855 |
Results of prediction accuracies of different methods based on DLBCL data
| | Uni. Sel. | Ordinary Lasso | Overlap Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| Cox-test | 0.8173 | 0.4487 | 0.0087 | 0.1102 | 0.3828 | 0.0007 |
| LR-test | 0.5854 | 0.2220 | 0.0152 | 0.4029 | 0.1945 | 0.0085 |
| Deviance | 183.1428 | −0.4282 | −6.4363 | −1.9859 | 2.3566 | −10.6504 |
| | 0.5136 | 0.5367 | 0.5842 | 0.5468 | 0.5568 | 0.6001 |
Fig. 4 Kaplan-Meier curves for the 207 subjects in the DLBCL testing data. Good (blue) and poor (red) groups are identified by the median of the PIs in the testing dataset
Results of prediction accuracies of different methods based on NSCLC data (using the training and test sets as in Chen et al. [14])
| | Uni. Sel. | Ordinary Lasso | Overlap Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| Cox-test | 0.8381 | 0.6215 | 0.3441 | 0.8467 | 0.2372 | 0.2484 |
| LR-test | 0.3205 | 0.7046 | 0.3921 | 0.6216 | 0.3254 | 0.3254 |
| Deviance | 40.1323 | 0.4820 | −0.3605 | 3.9135 | −1.3311 | −1.0551 |
| | 0.4485 | 0.5565 | 0.5775 | 0.5394 | 0.5966 | 0.5966 |
| LR-test_3 | 0.3205 | 0.5369 | 0.2351 | 0.8505 | 0.0818 | 0.0818 |
Fig. 5 Kaplan-Meier curves for the 62 subjects in the NSCLC testing data. Good (blue), medium (red) and poor (green) groups are identified by the tertiles of the PIs in the test dataset
Fig. 6 Kaplan-Meier curves for the 62 subjects in the NSCLC testing data. Good (blue) and poor (red) groups are identified by the median of the PIs in the test dataset
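The grouping used for the Kaplan-Meier figures can be sketched as a quantile split of the test-set PIs. The sketch below assumes that PI is a per-subject prognostic index for which larger values mean poorer prognosis, and that subjects whose PI equals a cut point fall into the lower-risk group; neither convention is stated in this excerpt. The function name `risk_groups` and the example values are hypothetical.

```python
import numpy as np


def risk_groups(prognostic_index, n_groups=2):
    """Assign each subject to a risk group by quantiles of the prognostic index.

    n_groups=2 splits at the median; n_groups=3 splits at the tertiles.
    Group 0 holds the lowest PIs (assumed here to be the "good" group).
    """
    pi = np.asarray(prognostic_index, dtype=float)
    # Interior cut probabilities, e.g. [0.5] for two groups, [1/3, 2/3] for three.
    probs = np.linspace(0, 1, n_groups + 1)[1:-1]
    cuts = np.quantile(pi, probs)
    # Values equal to a cut point are placed in the lower group.
    return np.searchsorted(cuts, pi, side="left")


# Made-up prognostic indices for six test subjects.
pi_example = [0.1, -0.5, 1.2, 0.7, -1.0, 0.3]

median_groups = risk_groups(pi_example, n_groups=2)
tertile_groups = risk_groups(pi_example, n_groups=3)
print(median_groups, tertile_groups)
```

The resulting group labels could then be passed to any Kaplan-Meier or log-rank routine to compare survival between groups.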
Results of prediction accuracies of different methods based on NSCLC data with a 10-fold cross-validation procedure
| | Uni. Sel. | Ordinary Lasso | Overlap Lasso | TS-GSIS Lasso | OGS Ridge | OGS Lasso |
|---|---|---|---|---|---|---|
| Cox-test | 0.1688 | 0.7781 | 0.1734 | 0.4678 | 0.1435 | 0.1426 |
| LR-test | 0.6795 | 0.5696 | 0.1120 | 0.5337 | 0.8289 | 0.4356 |
| Deviance | 22.4633 | 1.5506 | −0.1409 | −0.5405 | −1.0941 | −1.4853 |
| | 0.7273 | 0.3333 | 0.6970 | 0.6061 | 0.6235 | 0.7576 |
| LR-test_3 | 0.1997 | 0.1990 | 0.1194 | 0.1053 | 0.1150 | 0.1085 |