Michio Iwata, Longhao Yuan, Qibin Zhao, Yasuo Tabei, Francois Berenger, Ryusuke Sawada, Sayaka Akiyoshi, Momoko Hamano, Yoshihiro Yamanishi.
Abstract
MOTIVATION: Genome-wide identification of the transcriptomic responses of human cell lines to drug treatments is a challenging issue in medical and pharmaceutical research. However, drug-induced gene expression profiles are largely unobserved across combinations of drugs and human cell lines, which is a serious obstacle in practical applications.
Year: 2019 PMID: 31510663 PMCID: PMC6612872 DOI: 10.1093/bioinformatics/btz313
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of our proposed approach: (a) tensor-train (TT) decomposition, (b) mode-n matricization (inspired by Cichocki) and (c) application of tensor decomposition to drug-induced transcriptome data comprising drugs, genes and cell lines
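The two operations named in panels (a) and (b) can be illustrated on a toy tensor. The following is a minimal NumPy sketch, not the authors' implementation: the helper names `unfold` and `tt_reconstruct`, the tensor sizes and the TT ranks are all assumptions chosen for illustration, and the column ordering of the unfolding is one convention among several.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tt_reconstruct(cores):
    """Contract TT cores of shape (r_prev, n_k, r_next), with boundary ranks of 1."""
    result = cores[0]                          # shape (1, n_1, r_1)
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading rank index.
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result[0, ..., 0]                   # drop the two boundary ranks

rng = np.random.default_rng(0)
shape, ranks = (4, 5, 3), (1, 2, 2, 1)         # toy drugs x genes x cell lines
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1]))
         for k in range(len(shape))]
X = tt_reconstruct(cores)                      # full tensor of shape (4, 5, 3)
X_genes = unfold(X, 1)                         # genes as rows, shape (5, 12)
```

Each entry of `X` is a product of one matrix slice per core, so the number of stored parameters grows with the mode sizes and ranks rather than with the full tensor size.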
Fig. 2. Strategies for generating artificial missing values
Performance evaluation of data completion by tensor decomposition algorithms for third-order transcriptome data (drugs, genes and cell lines) with different rates of artificial missing values
| Cell line | 10% missing, standard imputation | 10% missing, CP (baseline) | 10% missing, TT (proposed) | 50% missing, standard imputation | 50% missing, CP (baseline) | 50% missing, TT (proposed) | 90% missing, standard imputation | 90% missing, CP (baseline) | 90% missing, TT (proposed) |
|---|---|---|---|---|---|---|---|---|---|
| Total cell lines | 0.0750 | 0.0765 | | 0.0837 | 0.0798 | | NA | 0.0820 | |
| MCF7 | 0.0634 | 0.0616 | | 0.0735 | 0.0658 | | NA | 0.0681 | |
| PC3 | 0.0648 | 0.0650 | | 0.0742 | 0.0673 | | NA | 0.0699 | |
| A375 | 0.0832 | 0.0862 | | 0.0929 | 0.0906 | | NA | 0.0930 | |
| HA1E | 0.0744 | 0.0759 | | 0.0842 | 0.0796 | | NA | 0.0819 | |
| HT29 | 0.0773 | 0.0777 | | 0.0853 | 0.0810 | | NA | 0.0831 | |
| A549 | 0.0755 | 0.0785 | | 0.0833 | 0.0812 | | NA | 0.0822 | |
| VCAP | 0.0643 | 0.0710 | | 0.0703 | 0.0723 | | NA | 0.0740 | |
| YAPC | 0.0728 | 0.0738 | | 0.0840 | 0.0786 | | NA | 0.0810 | |
| HELA | 0.0701 | 0.0715 | | 0.0800 | 0.0749 | | NA | 0.0772 | |
| HCC515 | 0.0986 | 0.0994 | | 0.1068 | 0.1039 | | NA | 0.1049 | |
| HEPG2 | 0.0948 | 0.0954 | | 0.1012 | 0.0978 | | NA | 0.0990 | |
| HS578T | 0.0407 | 0.0420 | | 0.0431 | 0.0432 | | NA | 0.0445 | |
| MCF10A | 0.0480 | 0.0476 | | 0.0496 | 0.0482 | | NA | 0.0496 | |
| MDAMB231 | 0.0432 | 0.0440 | | 0.0475 | 0.0467 | | NA | 0.0490 | |
| SKBR3 | | 0.0440 | 0.0416 | 0.0426 | 0.0432 | | NA | 0.0450 | |
| BT20 | 0.0433 | 0.0441 | | 0.0443 | 0.0443 | | NA | 0.0468 | |
Note: Missing values were generated by the ‘random missing’ strategy. RSEs between the original and reconstructed data from tensor decomposition were calculated for missing values only. The proposed TT-WOPT method and the baseline CP-WOPT method are denoted as TT and CP, respectively. Artificially generated missing rates of 10, 50 and 90% were tested. Cell lines are listed in order of increasing original missing rates. Bold indicates the best result.
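The evaluation described in the note can be sketched in a few lines. This is a hedged illustration only: it assumes that the 'random missing' strategy hides entries independently at the stated rate, and that RSE is the Frobenius-norm relative error between original and reconstructed values, since the formula is not given here. The helper names and the noisy stand-in for a completed tensor are hypothetical.

```python
import numpy as np

def random_missing_mask(shape, rate, rng):
    """Boolean mask, True where an entry is artificially hidden."""
    return rng.random(shape) < rate

def rse(original, reconstructed, mask=None):
    """Relative error ||X - X_hat|| / ||X||, optionally restricted to masked entries."""
    if mask is None:
        mask = np.ones(original.shape, dtype=bool)
    diff = (original - reconstructed)[mask]
    return np.linalg.norm(diff) / np.linalg.norm(original[mask])

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 30, 4))              # toy drugs x genes x cell lines
hidden = random_missing_mask(X.shape, 0.5, rng)   # 50% artificial missing rate
X_hat = X + 0.1 * rng.standard_normal(X.shape)    # stand-in for a completed tensor

rse_hidden = rse(X, X_hat, hidden)                # error on hidden entries only
# Per-cell-line error, restricted to the hidden entries of each slice.
rse_per_cell = [rse(X[..., c], X_hat[..., c], hidden[..., c])
                for c in range(X.shape[-1])]
```

Restricting the error to hidden entries measures how well values that the model never saw are recovered, rather than how well the observed entries are fitted.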
Performance evaluation of data completion by tensor decomposition algorithms for third-order transcriptome data (drugs, genes and cell lines) with artificial missing values
| Artificial missing cell | (a) RSEs for all values, CP (baseline) | (a) RSEs for all values, TT (proposed) | (b) RSEs for missing values, CP (baseline) | (b) RSEs for missing values, TT (proposed) |
|---|---|---|---|---|
| MCF7 | 0.1811 | | 0.6673 | |
| PC3 | 0.2170 | | 0.8199 | |
| A375 | 0.2216 | | 0.8122 | |
| HA1E | 0.2495 | | 0.9562 | |
| HT29 | 0.2577 | | 0.9910 | |
| A549 | 0.2401 | | 0.9157 | |
| VCAP | 0.2196 | | 0.8329 | |
| YAPC | 0.2604 | | 1.0015 | |
| HELA | 0.2695 | | 1.0390 | |
| HCC515 | 0.2109 | | 0.7910 | |
| HEPG2 | 0.1657 | | 0.5855 | |
| HS578T | 0.2281 | | 0.8655 | |
| MCF10A | 0.2157 | | 0.8139 | |
| MDAMB231 | 0.2134 | | 0.8029 | |
| SKBR3 | 0.2208 | | 0.8307 | |
| BT20 | 0.2238 | | 0.8500 | |
Note: Missing values were generated by the ‘cell-based missing’ strategy. RSEs between the original and reconstructed data from tensor decomposition were calculated for (a) all values and (b) missing values only. The proposed TT-WOPT method and the baseline CP-WOPT method are denoted as TT and CP, respectively. Cell lines are listed in order of increasing original missing rates. Bold indicates the best result.
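The difference between the (a) and (b) columns can be made concrete with a toy example. The sketch below assumes that the 'cell-based missing' strategy hides every entry belonging to one cell line, which is an interpretation and not something the note spells out. The helper `cell_based_missing_mask`, the tensor sizes and the stand-in reconstruction are hypothetical, and RSE is again assumed to be the Frobenius-norm relative error.

```python
import numpy as np

def cell_based_missing_mask(shape, cell_index, cell_axis=2):
    """Boolean mask, True for every entry of one cell line (hypothetical helper)."""
    mask = np.zeros(shape, dtype=bool)
    index = [slice(None)] * len(shape)
    index[cell_axis] = cell_index
    mask[tuple(index)] = True
    return mask

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 30, 4))           # toy drugs x genes x cell lines
hidden = cell_based_missing_mask(X.shape, cell_index=1)

# Stand-in for a completed tensor: exact on observed entries,
# zero everywhere on the hidden cell line.
X_hat = np.where(hidden, 0.0, X)

# (a) error over all values, (b) error over the hidden cell line only.
rse_all = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
rse_missing = np.linalg.norm((X - X_hat)[hidden]) / np.linalg.norm(X[hidden])
```

In this toy case a reconstruction that recovers nothing about the hidden cell line gives an error of exactly 1 on the missing values, while the error over all values is diluted by the entries that were observed, so the two columns sit on different scales.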
Performance evaluation of data completion by tensor decomposition algorithms for fourth-order transcriptome data (drugs, genes, cell lines and time points) with different rates of artificial missing values
| Cell line | 10% missing, standard imputation | 10% missing, CP (baseline) | 10% missing, TT (proposed) | 50% missing, standard imputation | 50% missing, CP (baseline) | 50% missing, TT (proposed) | 90% missing, standard imputation | 90% missing, CP (baseline) | 90% missing, TT (proposed) |
|---|---|---|---|---|---|---|---|---|---|
| Total cell lines | 0.00271 | 0.0031 | | 0.0028 | 0.0030 | | NA | 0.0036 | |
| MCF7 | 0.00195 | 0.0031 | | 0.00242 | 0.0028 | | NA | 0.0037 | |
| PC3 | 0.0024 | 0.0027 | | 0.0026 | 0.0029 | | NA | 0.0036 | |
| A375 | 0.00288 | 0.0032 | | 0.0028 | 0.0030 | | NA | 0.0035 | |
| HA1E | 0.0033 | 0.0032 | | 0.0029 | 0.0032 | | NA | 0.0037 | |
| HT29 | 0.0022 | 0.0022 | | 0.00195 | 0.0023 | | NA | 0.0030 | |
| A549 | 0.0027 | 0.0027 | | 0.0033 | 0.0035 | | NA | 0.0039 | |
| VCAP | 0.0028 | 0.0036 | | 0.0031 | 0.0033 | | NA | 0.0037 | |
| YAPC | 0.0036 | 0.0037 | | 0.0037 | 0.0037 | | NA | 0.0041 | |
| HELA | 0.0043 | 0.0035 | | 0.0040 | 0.0040 | | NA | 0.0042 | |
| HCC515 | 0.0023 | 0.0023 | | 0.0021 | 0.0023 | | NA | 0.0031 | |
| HEPG2 | 0.0011 | 0.0017 | | 0.00142 | 0.0017 | | NA | 0.0030 | |
| HS578T | 0.0019 | 0.0018 | | 0.0011 | 0.0020 | | NA | 0.0031 | |
| MCF10A | 0.0016 | 0.0016 | | 0.0009 | 0.0017 | | NA | 0.0032 | |
| MDAMB231 | 0.0006 | 0.0017 | | 0.0008 | 0.0020 | | NA | 0.0028 | |
| SKBR3 | | 0.0017 | 0.00036 | 0.0012 | 0.0022 | | NA | 0.0029 | |
| BT20 | | 0.0018 | 0.0008 | 0.0009 | 0.0020 | | NA | 0.0030 | |
Note: Missing values were generated by the ‘random missing’ strategy. RSEs between the original and reconstructed data from tensor decomposition were calculated for missing values only. The proposed TT-WOPT method and the baseline CP-WOPT method are denoted as TT and CP, respectively. Artificially generated missing rates of 10, 50 and 90% were tested. Cell lines are listed in order of increasing original missing rates. Bold indicates the best result.
Performance evaluation of data completion by tensor decomposition algorithms for fourth-order transcriptome data (drugs, genes, cell lines and time points) with artificial missing values
| Artificial missing cell | (a) RSEs for all values, CP (baseline) | (a) RSEs for all values, TT (proposed) | (b) RSEs for missing values, CP (baseline) | (b) RSEs for missing values, TT (proposed) |
|---|---|---|---|---|
| MCF7 | 0.2693 | | 1.0749 | |
| PC3 | 0.2215 | | 0.8859 | |
| A375 | 0.1811 | | 0.7245 | |
| HA1E | 0.2568 | | 1.0273 | |
| HT29 | 0.2950 | | 1.1522 | |
| A549 | 0.2222 | | 0.8887 | |
| VCAP | 0.1543 | | 0.6172 | |
| YAPC | 0.1838 | | 0.7352 | |
| HELA | 0.2073 | | 0.8291 | |
| HCC515 | 0.3141 | | 1.0315 | |
| HEPG2 | 0.2077 | | 0.8308 | |
| HS578T | 0.1887 | | 0.7548 | |
| MCF10A | 0.1678 | | 0.6713 | |
| MDAMB231 | 0.2241 | | 0.8964 | |
| SKBR3 | 0.2164 | | 0.8654 | |
| BT20 | 0.2711 | | 1.0127 | |
Note: Missing values were generated by the ‘cell-based missing’ strategy. RSEs between the original and reconstructed data from tensor decomposition were calculated for (a) all values and (b) missing values only. The proposed TT-WOPT method and the baseline CP-WOPT method are denoted as TT and CP, respectively. Cell lines are listed in order of increasing original missing rates. Bold indicates the best result.
Fig. 3. Performance comparison of drug indication prediction among the inverse signature, XSum and multitask learning methods with and without tensor decomposition. Each box plot represents AUC scores for all cell lines. The horizontal gray line corresponds to random inference
Fig. 4. Performance comparison of drug indication prediction among the inverse signature, XSum and multitask learning methods with and without tensor decomposition. The top panel shows the AUC score calculated using all prediction scores for all drug–disease pairs. The middle panel shows the average of AUC scores calculated using all prediction scores for individual diseases. The bottom panel shows the missing rate in each cell line. Cell lines are listed in increasing order of missing rates
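The two AUC summaries in the top and middle panels differ only in how the prediction scores are pooled. The following sketch illustrates that difference on made-up numbers. The score matrix, the labels and the helper `auc` are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, not with the authors' evaluation code.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return np.nan                      # undefined without both classes
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

# Hypothetical toy data: rows are drugs, columns are diseases.
scores = np.array([[0.9, 0.2],
                   [0.8, 0.7],
                   [0.1, 0.6],
                   [0.3, 0.4]])
labels = np.array([[1, 0],
                   [1, 1],
                   [0, 0],
                   [0, 1]], dtype=bool)

# Top-panel style: one AUC over all drug-disease pairs pooled together.
global_auc = auc(scores.ravel(), labels.ravel())

# Middle-panel style: one AUC per disease, then the average over diseases.
per_disease = [auc(scores[:, j], labels[:, j]) for j in range(scores.shape[1])]
mean_disease_auc = np.nanmean(per_disease)
```

A score of 0.5 corresponds to random ranking. The pooled and per-disease values generally differ, because pooling also compares scores across diseases.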