Yitan Zhu¹, Thomas Brettin¹, Yvonne A Evrard², Fangfang Xia¹, Alexander Partin¹, Maulik Shukla¹, Hyunseung Yoo¹, James H Doroshow³, Rick L Stevens¹,⁴.
Abstract
The co-expression extrapolation (COXEN) method has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug treatment. Here, we enhance the COXEN method to select genes that are predictive of the efficacies of multiple drugs, for building general drug response prediction models that are not specific to a particular drug. The enhanced COXEN method first ranks the genes by their prediction power for each individual drug and then takes the union of the top predictive genes of all the drugs; from this pool, the algorithm further selects genes whose co-expression patterns are well preserved between cancer cases for building prediction models. We apply the proposed method to benchmark in vitro drug screening datasets and compare the performance of prediction models built on the genes selected by the enhanced COXEN method with that of models built on genes selected by the original COXEN method and on randomly picked genes. Models built with the enhanced COXEN method always show a statistically significant improvement in prediction performance (adjusted p-value ≤ 0.05). Our results demonstrate that the enhanced COXEN method can dramatically increase the power of gene expression data for predicting drug response.
Keywords: co-expression extrapolation (COXEN); gene selection; general drug response prediction model; precision oncology
Year: 2020 PMID: 32933072 PMCID: PMC7565427 DOI: 10.3390/genes11091070
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Analysis flowchart of the COXEN framework. The original and enhanced COXEN methods differ in the two boxes with dashed borders.
The original COXEN gene selection method.
| Given: | Dataset 1 that includes an … |
| Step 1: | On dataset 1, for every gene, calculate the gene’s prediction power for drug response using a defined PPM. Select … |
| Step 2: | For each of the …: (a) calculate its Pearson correlation coefficients with the other …; (b) calculate its Pearson correlation coefficients with the other …; (c) calculate the Pearson correlation coefficient between … |
| Step 3: | Among the … |
Figure 2. Illustration of how the candidate gene pool is generated by taking the union of the top predictive genes for each drug.
Numbers of cancer cell lines (CCLs), drugs, and experiments (drug-CCL pairs) in the datasets.
| Dataset | # CCLs | # Drugs | # Experiments |
|---|---|---|---|
| GCSI | 357 | 16 | 5647 |
| CCLE | 474 | 24 | 10,971 |
| GDSC | 659 | 238 | 125,712 |
Figure 3. Histograms of drug response AUC values in the datasets. The mean and standard deviation (std) of the AUC values are shown under each histogram.
Comparison of between-class and within-class drug response variation, with either CCLs or drugs taken as the classes.
| Dataset | Total Variation | Between-Class Variation (CCL) | Within-Class Variation (CCL) | Between-Class Variation (Drug) | Within-Class Variation (Drug) |
|---|---|---|---|---|---|
| CCLE | 282.97 | 21.93 | 261.05 | 192.60 | 90.37 |
| GCSI | 203.14 | 22.43 | 180.71 | 122.85 | 80.30 |
| GDSC | 3413.74 | 218.11 | 3195.64 | 1954.38 | 1459.37 |
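In every row, the total equals the between-class part plus the within-class part up to rounding (for GCSI with CCLs as classes, 22.43 + 180.71 = 203.14), which is consistent with the standard sum-of-squares decomposition. Assuming that is the measure used, it can be computed as in this sketch; `variation_decomposition` is a hypothetical helper, and the toy AUC values are made up.

```python
from collections import defaultdict


def variation_decomposition(values, labels):
    """Return (total, between-class, within-class) sums of squares.

    values: drug response values (e.g., AUC); labels: the class of each value
    (a CCL or a drug).
    """
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[label].append(v)

    total = sum((v - grand_mean) ** 2 for v in values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        class_mean = sum(members) / len(members)
        # Spread of class means around the grand mean, weighted by class size.
        between += len(members) * (class_mean - grand_mean) ** 2
        # Spread of values around their own class mean.
        within += sum((v - class_mean) ** 2 for v in members)
    return total, between, within


auc = [0.2, 0.4, 0.6, 0.8]
drugs = ["d1", "d1", "d2", "d2"]
total, between, within = variation_decomposition(auc, drugs)
# total equals between + within, up to floating-point rounding
```

With drugs as classes, the between-class part is the larger share in all three datasets (e.g., 192.60 of 282.97 for CCLE), whereas with CCLs as classes most of the variation is within-class.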
Comparison of the prediction performance of models built on genes selected by the enhanced COXEN method with that of models built on genes selected by the baseline gene selection methods.
| Data |  |  | Enhanced COXEN R² | Random All R² | Random All Adjusted p-value | Random All PIP | Random LINCS R² | Random LINCS Adjusted p-value | Random LINCS PIP | Original COXEN R² | Original COXEN Adjusted p-value | Original COXEN PIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CCLE | 1600 | 800 | 0.725 (0.018) | 0.715 (0.019) | 4.32 × 10⁻²⁵ | 27.3% | 0.716 (0.018) | 1.57 × 10⁻²⁴ | 23.0% | 0.719 (0.018) | 2.58 × 10⁻¹⁹ | 16.0% |
| CCLE | 800 | 400 | 0.724 (0.018) | 0.712 (0.019) | 7.04 × 10⁻³⁰ | 33.5% | 0.714 (0.018) | 8.04 × 10⁻²⁴ | 26.5% | 0.717 (0.018) | 1.36 × 10⁻¹⁵ | 16.3% |
| CCLE | 400 | 200 | 0.721 (0.019) | 0.710 (0.019) | 5.74 × 10⁻²⁴ | 33.5% | 0.711 (0.018) | 9.66 × 10⁻²⁰ | 29.9% | 0.715 (0.019) | 2.81 × 10⁻¹¹ | 14.8% |
| CCLE | 200 | 100 | 0.719 (0.019) | 0.706 (0.019) | 6.23 × 10⁻²⁰ | 43.5% | 0.708 (0.020) | 1.74 × 10⁻¹⁸ | 37.2% | 0.713 (0.018) | 2.81 × 10⁻¹⁰ | 17.9% |
| GCSI | 1600 | 800 | 0.678 (0.032) | 0.670 (0.031) | 1.03 × 10⁻¹⁰ | 11.7% | 0.669 (0.032) | 2.60 × 10⁻¹³ | 14.4% | 0.676 (0.032) | 4.10 × 10⁻³ | 3.4% |
| GCSI | 800 | 400 | 0.678 (0.034) | 0.666 (0.032) | 6.64 × 10⁻¹³ | 18.0% | 0.666 (0.033) | 1.54 × 10⁻¹³ | 18.0% | 0.675 (0.031) | 2.21 × 10⁻² | 3.0% |
| GCSI | 400 | 200 | 0.675 (0.036) | 0.661 (0.033) | 3.97 × 10⁻¹³ | 23.8% | 0.661 (0.032) | 2.60 × 10⁻¹³ | 22.7% | 0.672 (0.033) | 1.62 × 10⁻² | 4.0% |
| GCSI | 200 | 100 | 0.672 (0.036) | 0.653 (0.033) | 1.59 × 10⁻¹⁵ | 35.1% | 0.654 (0.033) | 9.29 × 10⁻¹⁸ | 32.1% | 0.668 (0.032) | 1.00 × 10⁻² | 5.6% |
| GDSC | 1600 | 800 | 0.625 (0.017) | 0.618 (0.017) | 7.40 × 10⁻²⁴ | 15.0% | 0.619 (0.017) | 1.64 × 10⁻²⁰ | 12.2% | 0.621 (0.017) | 9.52 × 10⁻¹² | 6.3% |
| GDSC | 800 | 400 | 0.624 (0.018) | 0.616 (0.017) | 2.35 × 10⁻²⁰ | 15.5% | 0.617 (0.017) | 4.99 × 10⁻¹⁸ | 13.0% | 0.619 (0.018) | 3.86 × 10⁻¹² | 8.4% |
| GDSC | 400 | 200 | 0.622 (0.017) | 0.614 (0.018) | 3.95 × 10⁻²⁵ | 19.9% | 0.615 (0.018) | 1.35 × 10⁻¹⁸ | 15.3% | 0.617 (0.017) | 3.55 × 10⁻¹⁶ | 11.1% |
| GDSC | 200 | 100 | 0.620 (0.018) | 0.610 (0.018) | 4.65 × 10⁻²⁵ | 24.7% | 0.612 (0.018) | 1.76 × 10⁻¹⁵ | 18.1% | 0.614 (0.017) | 5.21 × 10⁻¹⁹ | 13.4% |
Enhanced COXEN, Original COXEN, Random All, and Random LINCS refer to the enhanced COXEN method, the original COXEN method, genes randomly picked from all available genes, and genes randomly picked from the LINCS set, respectively. In the R² columns, the number before the parentheses is the average R² across cross-validation trials and the number in parentheses is the standard deviation.
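The R² entries summarize one score per cross-validation trial as "mean (standard deviation)". A minimal sketch of that summary, assuming R² is the usual coefficient of determination, 1 − SS_res/SS_tot, and that the reported spread is the sample standard deviation (neither detail is stated here); `r_squared` and `summarize_trials` are hypothetical helpers.

```python
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize_trials(trials):
    """trials: list of (y_true, y_pred) pairs, one per cross-validation trial.

    Returns "mean (std)" in the tables' format; stdev() is the sample
    standard deviation and needs at least two trials.
    """
    scores = [r_squared(t, p) for t, p in trials]
    return f"{mean(scores):.3f} ({stdev(scores):.3f})"
```

For example, one perfect trial (R² = 1) and one trial that predicts the mean (R² = 0) summarize to `"0.500 (0.707)"`.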
Comparison of the enhanced COXEN method with the baseline gene selection methods in predicting the response to individual drugs.
| Data |  |  | Enhanced COXEN R² | Random All R² | Random All Adjusted p-value | Random LINCS R² | Random LINCS Adjusted p-value | Original COXEN R² | Original COXEN Adjusted p-value |
|---|---|---|---|---|---|---|---|---|---|
| CCLE | 1600 | 800 | 0.160 (0.110) | 0.147 (0.100) | 2.63 × 10⁻³ | 0.139 (0.091) | 4.72 × 10⁻⁴ | 0.147 (0.103) | 1.66 × 10⁻² |
| CCLE | 800 | 400 | 0.153 (0.110) | 0.143 (0.096) | 8.80 × 10⁻² | 0.136 (0.091) | 7.26 × 10⁻³ | 0.144 (0.103) | 1.68 × 10⁻¹ |
| CCLE | 400 | 200 | 0.141 (0.109) | 0.135 (0.093) | 2.85 × 10⁻¹ | 0.133 (0.087) | 1.80 × 10⁻¹ | 0.138 (0.102) | 7.42 × 10⁻¹ |
| CCLE | 200 | 100 | 0.130 (0.105) | 0.126 (0.088) | 5.21 × 10⁻¹ | 0.129 (0.087) | 9.21 × 10⁻¹ | 0.129 (0.097) | 7.98 × 10⁻¹ |
| GCSI | 1600 | 800 | 0.222 (0.103) | 0.215 (0.104) | 3.05 × 10⁻¹ | 0.207 (0.102) | 1.93 × 10⁻² | 0.213 (0.101) | 2.50 × 10⁻¹ |
| GCSI | 800 | 400 | 0.219 (0.107) | 0.206 (0.100) | 1.10 × 10⁻¹ | 0.206 (0.100) | 4.97 × 10⁻² | 0.216 (0.102) | 7.42 × 10⁻¹ |
| GCSI | 400 | 200 | 0.208 (0.107) | 0.198 (0.096) | 2.30 × 10⁻¹ | 0.190 (0.097) | 1.93 × 10⁻² | 0.205 (0.100) | 7.42 × 10⁻¹ |
| GCSI | 200 | 100 | 0.203 (0.101) | 0.181 (0.095) | 4.16 × 10⁻² | 0.178 (0.092) | 1.03 × 10⁻² | 0.195 (0.095) | 5.34 × 10⁻¹ |
| GDSC | 1600 | 800 | 0.085 (0.118) | 0.074 (0.113) | 2.99 × 10⁻²⁵ | 0.073 (0.116) | 1.78 × 10⁻³⁸ | 0.080 (0.116) | 1.37 × 10⁻¹¹ |
| GDSC | 800 | 400 | 0.083 (0.117) | 0.074 (0.111) | 9.36 × 10⁻²¹ | 0.073 (0.113) | 4.54 × 10⁻²⁶ | 0.077 (0.114) | 1.38 × 10⁻¹⁰ |
| GDSC | 400 | 200 | 0.083 (0.115) | 0.070 (0.110) | 4.26 × 10⁻²⁶ | 0.072 (0.113) | 4.92 × 10⁻²² | 0.075 (0.114) | 5.24 × 10⁻¹⁴ |
| GDSC | 200 | 100 | 0.080 (0.114) | 0.067 (0.105) | 1.14 × 10⁻²⁰ | 0.069 (0.109) | 4.62 × 10⁻²² | 0.071 (0.113) | 3.70 × 10⁻¹⁷ |
Enhanced COXEN, Original COXEN, Random All, and Random LINCS refer to the enhanced COXEN method, the original COXEN method, genes randomly picked from all available genes, and genes randomly picked from the LINCS set, respectively. In the R² columns, the number before the parentheses is the average R² across drugs and the number in parentheses is the standard deviation.
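The per-drug R² values in this table are far lower than the R² values computed over all experiments in the previous comparison. One way to see how that can happen, consistent with drugs accounting for the majority of the variation in the between-class/within-class table, is a toy predictor that knows only each drug's mean response: its overall R² equals the between-drug share of the total variation, while its R² within each drug is zero. This is a hypothetical illustration on made-up numbers, not the authors' evaluation code; `r_squared` and `overall_and_per_drug_r2` are assumed names.

```python
from collections import defaultdict
from statistics import mean


def r_squared(y_true, y_pred):
    # Coefficient of determination; undefined if all y_true values are equal.
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def overall_and_per_drug_r2(drugs, y_true, y_pred):
    """Return (R² over all experiments, mean of the per-drug R² values)."""
    by_drug = defaultdict(lambda: ([], []))
    for d, t, p in zip(drugs, y_true, y_pred):
        by_drug[d][0].append(t)
        by_drug[d][1].append(p)
    per_drug = [r_squared(t, p) for t, p in by_drug.values()]
    return r_squared(y_true, y_pred), mean(per_drug)


# Toy data: a "model" that only predicts each drug's mean response.
drugs = ["d1", "d1", "d2", "d2"]
auc = [0.2, 0.4, 0.6, 0.8]
drug_mean = {d: mean(a for dd, a in zip(drugs, auc) if dd == d) for d in set(drugs)}
pred = [drug_mean[d] for d in drugs]
overall, per_drug = overall_and_per_drug_r2(drugs, auc, pred)
# overall is about 0.8 (the between-drug share); per_drug is about 0.0
```

Per-drug R² therefore measures only how well a model ranks cell lines within a drug, which is the harder part of the task.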
Comparison of the performance of prediction models built on genes selected by the enhanced COXEN method with that of models built on hub genes derived from network analysis.
| Data |  |  | R² (Enhanced COXEN) | R² (Hub Genes) | Adjusted p-value | PIP |
|---|---|---|---|---|---|---|
| CCLE | 90 | 45 | 0.716 (0.019) | 0.709 (0.020) | 1.14 × 10⁻¹⁰ | 25.2% |
| GCSI | 90 | 45 | 0.668 (0.035) | 0.655 (0.033) | 5.07 × 10⁻⁹ | 23.3% |
| GDSC | 90 | 45 | 0.616 (0.018) | 0.617 (0.017) | 6.75 × 10⁻¹ | −0.7% |
Enhanced COXEN and Hub Genes refer to the enhanced COXEN method and the hub genes identified from co-expression network analysis [24], respectively. In the R² columns, the number before the parentheses is the average R² across cross-validation trials and the number in parentheses is the standard deviation.