Chifeng Ma1, Konduru S Sastry2,3, Mario Flore1, Salah Gehani4, Issam Al-Bozom4, Yusheng Feng5, Erchin Serpedin6, Lotfi Chouchane2, Yidong Chen7,8, Yufei Huang9,10.
Abstract
BACKGROUND: We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. Conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets have the same distribution under all conditions, because the class-specific gene signatures change with the condition. A classifier trained on the reference dataset may therefore work well under one condition but not under another.
Year: 2016 PMID: 27556419 PMCID: PMC5001207 DOI: 10.1186/s12864-016-2903-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 General idea of CL. Due to condition-specific biases, existing normalization algorithms might fail to normalize the distributions of class-specific gene signatures in the reference and prediction datasets. A classifier trained on the reference dataset would therefore not work well for the prediction dataset (top right figure). Unlike normalization-based approaches, CL exploits the fact that the signature is unique to its associated class under any condition, and employs an unsupervised clustering algorithm to discover this unique signature and hence the class label (bottom right figure)
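The contrast described in the Fig. 1 caption can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' CL algorithm: the data are synthetic and deterministic, the signature gene indices, the attenuation level `0.6`, and the helper names (`make_dataset`, `signature_score`, `two_means`) are all hypothetical, and a simple one-dimensional 2-means stands in for whatever clustering method CL actually uses.

```python
from statistics import mean

SIGNATURE = [0, 1, 2, 3]  # hypothetical indices of the class-specific signature genes
N_GENES = 8

def make_dataset(n_per_group, class_level):
    """Deterministic toy profiles: the first half belong to the class, the rest do not."""
    samples, labels = [], []
    for k in range(2 * n_per_group):
        in_class = k < n_per_group
        profile = []
        for j in range(N_GENES):
            jitter = 0.05 * (((k * 7 + j * 3) % 5) - 2)  # fixed perturbation in [-0.1, 0.1]
            boost = class_level if (in_class and j in SIGNATURE) else 0.0
            profile.append(boost + jitter)
        samples.append(profile)
        labels.append(in_class)
    return samples, labels

def signature_score(profile):
    """Mean expression over the signature genes."""
    return mean(profile[j] for j in SIGNATURE)

def two_means(values, iters=20):
    """1-D 2-means; returns True for values assigned to the higher-scoring cluster."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        high = [v for v in values if abs(v - hi) < abs(v - lo)]
        low = [v for v in values if abs(v - hi) >= abs(v - lo)]
        if not high or not low:
            break
        lo, hi = mean(low), mean(high)
    return [abs(v - hi) < abs(v - lo) for v in values]

def accuracy(calls, labels):
    return sum(c == y for c, y in zip(calls, labels)) / len(labels)

# Reference condition: strong signature. Prediction condition: the same signature, attenuated.
ref_x, ref_y = make_dataset(10, class_level=2.0)
pred_x, pred_y = make_dataset(10, class_level=0.6)

ref_scores = [signature_score(p) for p in ref_x]
pred_scores = [signature_score(p) for p in pred_x]

# "Trained classifier": a threshold halfway between the two reference group means.
threshold = (mean(s for s, y in zip(ref_scores, ref_y) if y)
             + mean(s for s, y in zip(ref_scores, ref_y) if not y)) / 2
threshold_acc = accuracy([s > threshold for s in pred_scores], pred_y)

# Clustering on the prediction data alone: the high-scoring cluster is called the class.
cluster_acc = accuracy(two_means(pred_scores), pred_y)

print(f"fixed threshold: {threshold_acc:.2f}, clustering: {cluster_acc:.2f}")
```

On this toy data the reference-derived threshold misses every attenuated class sample, so only half of the calls are correct, while clustering within the prediction dataset recovers all labels because the signature still separates the two groups.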
Distribution of patients on PAM50 subtypes and ER-PR status
| ER/PR status | LumA | LumB | Her2 | Basal | Normal |
|---|---|---|---|---|---|
| ER+, PR+ | 246 | 188 | 78 | 4 | 23 |
| ER+, PR- | 12 | 51 | 33 | 3 | 6 |
| ER-, PR+ | 15 | 5 | 3 | 4 | 1 |
| ER-, PR- | 4 | 17 | 60 | 59 | 2 |
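As a reading aid for the table above, the following sketch totals each PAM50 column and reports the most common ER/PR status within it. The counts are copied from the table; the variable names are illustrative only, and the per-subtype shares are simple arithmetic rather than a result reported in the source.

```python
# Counts copied from the table: rows are ER/PR status, columns are PAM50 subtypes.
subtypes = ["LumA", "LumB", "Her2", "Basal", "Normal"]
counts = {
    "ER+, PR+": [246, 188, 78, 4, 23],
    "ER+, PR-": [12, 51, 33, 3, 6],
    "ER-, PR+": [15, 5, 3, 4, 1],
    "ER-, PR-": [4, 17, 60, 59, 2],
}

# Number of patients per subtype (column sums).
column_totals = [sum(row[j] for row in counts.values()) for j in range(len(subtypes))]

# Most common ER/PR status within each subtype, with its share of that subtype.
dominant = {}
for j, name in enumerate(subtypes):
    status = max(counts, key=lambda s: counts[s][j])
    dominant[name] = (status, counts[status][j] / column_totals[j])
    print(f"{name}: n={column_totals[j]}, most common {status} ({dominant[name][1]:.0%})")
```

The column sums give 814 patients in total; Basal is the only subtype in which ER-, PR- is the most common status.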
Fig. 2 Comparison of CL and seven cross-platform normalization + SVM algorithms for PAM50 classification accuracy. The horizontal axis represents the experimental bias level and the vertical axis represents the classification accuracy
The size of CL selected gene set for PAM50 classification
| Subtype | Selected gene set size |
|---|---|
| LumA | 60 |
| LumB | 60 |
| Her2 | 63 |
| Basal | 299 |
| Normal | 52 |
Impact of different thresholds on selected gene set size, smallest absolute expression value, and corresponding classification accuracy
| T1, T2 combination | Selected gene size | Smallest absolute expression | Classification accuracy |
|---|---|---|---|
| 0.1, 0.1 | 790 | 0.21 | 79.66 % |
| 0.3, 0.1 | 637 | 0.28 | 74.02 % |
| 0.5, 0.1 | 441 | 0.37 | 72.99 % |
| 0.7, 0.1 | 292 | 0.48 | 72.99 % |
| 0.9, 0.1 | 189 | 0.66 | 74.53 % |
| 1.1, 0.1 | 123 | 0.73 | 63.42 % |
| 0.3, 0.3 | 634 | 0.30 | 74.02 % |
| 0.3, 0.5 | 600 | 0.50 | 75.56 % |
| 0.3, 0.7 | 532 | 0.70 | 73.85 % |
| 0.3, 0.9 | 442 | 0.80 | 70.09 % |
| 0.1, 0.8 (selected) | 534 | 0.80 | 80.00 % |
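The table marks the (0.1, 0.8) combination as selected without stating the selection rule. The sketch below assumes the rule is simply "highest classification accuracy" and checks that this assumption reproduces the marked row; the `Row` type and field names are hypothetical, and the values are copied from the table.

```python
from typing import NamedTuple

class Row(NamedTuple):
    t1: float
    t2: float
    n_genes: int
    min_abs_expr: float
    accuracy: float  # percent

rows = [
    Row(0.1, 0.1, 790, 0.21, 79.66),
    Row(0.3, 0.1, 637, 0.28, 74.02),
    Row(0.5, 0.1, 441, 0.37, 72.99),
    Row(0.7, 0.1, 292, 0.48, 72.99),
    Row(0.9, 0.1, 189, 0.66, 74.53),
    Row(1.1, 0.1, 123, 0.73, 63.42),
    Row(0.3, 0.3, 634, 0.30, 74.02),
    Row(0.3, 0.5, 600, 0.50, 75.56),
    Row(0.3, 0.7, 532, 0.70, 73.85),
    Row(0.3, 0.9, 442, 0.80, 70.09),
    Row(0.1, 0.8, 534, 0.80, 80.00),
]

# Assumed selection rule: keep the combination with the highest accuracy.
best = max(rows, key=lambda r: r.accuracy)
print(f"T1={best.t1}, T2={best.t2}: {best.n_genes} genes, {best.accuracy:.2f} % accuracy")
```

Under this assumption the maximum falls on the row marked as selected, with 534 genes. The margin over the (0.1, 0.1) row is small (80.00 % vs 79.66 %), so other criteria such as gene set size may also have played a role.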
Fig. 3 Comparison of CL selected PAM50 signature and PAM50 signature
Classification accuracy of PAM50 classification of GSE2740
| Algorithm | Accuracy |
|---|---|
| CL | 73 % |
| EB | 55 % |
| GQ | 55 % |
| DWD | 56 % |
| XPN | 57 % |
| DisTran | 53 % |
| MRS | 57 % |
| QD | 56 % |
Fig. 4 Classification accuracy vs ISEP for the simulation case. The horizontal axis represents the classification accuracy and the vertical axis represents the corresponding ISEP
Fig. 5 Plot of ISEP against experimental bias for CL and seven cross-platform normalization algorithms + SVM in the simulation case. The horizontal axis represents the experimental bias level and the vertical axis represents the ISEP values
ISEP of PAM50 prediction for CL and seven cross platform normalization algorithms + SVM for GSE10797
| Algorithm | ISEP |
|---|---|
| CL | 5.67 |
| EB | 3.61 |
| GQ | 3.86 |
| DWD | 3.66 |
| XPN | 4.09 |
| DisTran | 5.27 |
| MRS | 5.12 |
| QD | 5.71 |
| PAM50 | 3.3 |
CL selected signature gene set size for cancer 2000
| Subtype | Selected gene set size |
|---|---|
| Class 1 | 367 |
| Class 2 | 3111 |
| Class 3 | 98 |
| Class 4 | 207 |
| Class 5 | 981 |
| Class 6 | 501 |
| Class 7 | 265 |
| Class 8 | 247 |
| Class 9 | 773 |
| Class 10 | 286 |
Fig. 6 Comparison of Cancer 2000 classification between CL and seven cross-platform normalization algorithms + SVM in the simulation case. The horizontal axis represents the experimental bias level and the vertical axis represents the classification accuracy
Fig. 7 Cancer2000 classification for the TCGA-BRCA dataset. The horizontal axis represents the number of samples classified for each cancer 2000 cluster. Colors indicate the PAM50 class label
Selected cancer 2000 classes and their characteristics
| Cancer 2000 cluster (Selected) | Class 2 | Class 3 | Class 5 | Class 6 |
|---|---|---|---|---|
| Characteristics | ER+ | Luminal A | ER-, Her2 enriched | Luminal Samples, ER+ |
Comparison of cancer2000 prediction results between CL and 7 alternative cross-platform normalization algorithms
| Cancer2000 class | Class 2 | Class 3 | Class 5 ER- | Class 5 Her 2 | Class 6 Luminal | Class 6 ER+ |
|---|---|---|---|---|---|---|
| CL | 36/41 | 24/26 | 20/28 | 21/28 | 24/26 | 26/26 |
| EB | 8/10 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| GQ | 37/69 | 75/126 | 4/4 | 0/4 | 20/23 | 21/23 |
| DWD | 98/111 | 52/105 | 14/15 | 0/15 | 19/24 | 19/24 |
| XPN | 8/10 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| DisTran | 0/0 | 256/256 | 0/0 | 0/0 | 0/1 | 0/1 |
| MRS | 0/0 | 29/68 | 0/0 | 0/0 | 0/0 | 0/0 |
| QD | 2/3 | 1/11 | 0/0 | 0/0 | 0/0 | 0/0 |
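The cells in the table above are ratios, and several are 0/0. The sketch below converts each cell to a proportion while keeping 0/0 as undefined. Reading a cell as "samples matching the class characteristic / samples assigned to the class" is an assumption, since the source does not define the ratio; `parse_ratio` and the dictionary layout are illustrative.

```python
def parse_ratio(cell):
    """Turn 'a/b' into a proportion; return None when b is 0 (no samples to assess)."""
    hits, total = (int(part) for part in cell.split("/"))
    return None if total == 0 else hits / total

# Rows copied from the table; columns in the order shown there.
table = {
    "CL":      ["36/41", "24/26", "20/28", "21/28", "24/26", "26/26"],
    "EB":      ["8/10", "0/0", "0/0", "0/0", "0/0", "0/0"],
    "GQ":      ["37/69", "75/126", "4/4", "0/4", "20/23", "21/23"],
    "DWD":     ["98/111", "52/105", "14/15", "0/15", "19/24", "19/24"],
    "XPN":     ["8/10", "0/0", "0/0", "0/0", "0/0", "0/0"],
    "DisTran": ["0/0", "256/256", "0/0", "0/0", "0/1", "0/1"],
    "MRS":     ["0/0", "29/68", "0/0", "0/0", "0/0", "0/0"],
    "QD":      ["2/3", "1/11", "0/0", "0/0", "0/0", "0/0"],
}

proportions = {alg: [parse_ratio(c) for c in cells] for alg, cells in table.items()}

# Algorithms with a non-zero denominator in every column.
complete = [alg for alg, vals in proportions.items() if all(v is not None for v in vals)]

for alg, vals in proportions.items():
    shown = ", ".join("n/a" if v is None else f"{v:.2f}" for v in vals)
    print(f"{alg:8s} {shown}")
```

Only CL, GQ, and DWD have a non-zero denominator in every column, which is worth keeping in mind when comparing proportions across rows.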
Breast cancer subtype classification of QNRF
| QNRF sample | ER | PR | PAM50 R call | PAM50 CL call | Cancer 2000 CL call |
|---|---|---|---|---|---|
| B10 | + | + | LumB | Basal | cancer2000 icluster 1 |
| B13 | + | + | LumA | Basal | cancer2000 icluster 1 |
| B14 | NA | NA | LumB | Basal | cancer2000 icluster 3 |
| B17 | + | + | Normal | HER2 | cancer2000 icluster 3 |
| B18 | + | + | Normal | HER2 | cancer2000 icluster 3 |
| B19 | NA | NA | LumB | HER2 | cancer2000 icluster 3 |
| B20 | + | + | Basal | HER2 | cancer2000 icluster 3 |
| B21 | + | + | LumB | LumB | cancer2000 icluster 3 |
| B22 | - | - | LumB | Basal | cancer2000 icluster 3 |
| B23 | + | + | LumB | Basal | cancer2000 icluster 3 |
| B24 | + | + | LumB | Basal | cancer2000 icluster 3 |
| B25 | - | - | LumB | Basal | cancer2000 icluster 3 |
| B26 | + | + | Basal | Basal | cancer2000 icluster 3 |
| B27 | + | + | Basal | Basal | cancer2000 icluster 3 |
| B2 | - | - | LumB | Basal | cancer2000 icluster 1 |
| B3 | NA | NA | LumB | Basal | cancer2000 icluster 1 |
| B4 | + | + | LumB | HER2 | cancer2000 icluster 5 |
| B5 | + | - | LumB | Basal | cancer2000 icluster 1 |
| B6 | - | - | Basal | Basal | cancer2000 icluster 1 |
| B7 | NA | NA | LumA | Basal | cancer2000 icluster 1 |