Nikhil Cheerla, Olivier Gevaert.
Abstract
BACKGROUND: The current state of the art in cancer diagnosis and treatment is not ideal: diagnostic tests are accurate but invasive, and treatments are "one-size-fits-all" rather than personalized. Recently, miRNAs have garnered significant attention as cancer biomarkers, owing to their stability and ease of access (miRNAs circulate in the blood). Many studies have shown the effectiveness of miRNA data in diagnosing specific cancer types, but few explore the role of miRNA in predicting treatment outcome.
Keywords: Cancer diagnosis; Pan-cancer; TCGA dataset; miRNA
Year: 2017 PMID: 28086747 PMCID: PMC5237282 DOI: 10.1186/s12859-016-1421-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Distribution of the cancer and normal samples in the dataset used to build the predictive model (TCGA cancer classifier)
| Organ/System | Cancer | Cancer acronym | Normal samples | Cancer samples |
|---|---|---|---|---|
| Thymus | Thymoma | THYM | 2 | 124 |
| Lung | Lung Squamous Cell Carcinoma | LUSC | 45 | 342 |
| Pancreas | Pancreatic Adenocarcinoma | PAAD | 4 | 179 |
| GI tract | Cholangiocarcinoma | CHOL | 9 | 36 |
| GI tract | Esophageal Carcinoma | ESCA | 13 | 185 |
| GI tract | Stomach Adenocarcinoma | STAD | 41 | 395 |
| Liver | Liver Hepatocellular Carcinoma | LIHC | 50 | 374 |
| Thyroid | Thyroid Carcinoma | THCA | 59 | 510 |
| Adipose | Adrenocortical Carcinoma | ACC | 0 | 80 |
| Lymph | Diffuse Large B-cell Lymphoma | DLBC | 0 | 47 |
| Heart | Mesothelioma | MESO | 0 | 87 |
| Reproductive | Cervical Squamous Cell and Endocervical Adenocarcinoma | CESC | 3 | 309 |
| Reproductive | Ovarian Serous Cystadenocarcinoma | OV | 0 | 461 |
| Reproductive | Testicular Germ Cell Tumors | TGCT | 0 | 156 |
| Urinary | Uterine Carcinosarcoma | UCS | 0 | 56 |
| Kidney | Kidney Chromophobe | KICH | 25 | 66 |
| Kidney | Kidney Renal Papillary Cell Carcinoma | KIRP | 34 | 292 |
| Brain | Brain Lower Grade Glioma | LGG | 0 | 526 |
| Peripheral Nervous System | Pheochromocytoma and Paraganglioma | PCPG | 3 | 184 |
| Epidermis | Skin Cutaneous Melanoma | SKCM | 2 | 450 |
| Epidermis | Uveal Melanoma | UVM | 0 | 80 |
Note that not all cancer types have normal samples. Although the TCGA dataset covers about 33 cancer types, many were removed for lack of data (fewer than 5 samples), leaving the 21 cancer types listed in this table for classification.
Fig. 2 Mammalian developmental tree. Each of the 21 cancer types was assigned to its appropriate leaf node of the tree.
Fig. 1 Performance metrics for multiclass classifiers. Accuracy and kappa statistics for the seven multiclass classifiers we evaluated; boxplots show classifier variability over multiple runs.
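The kappa statistic reported next to accuracy corrects for agreement that would occur by chance. Assuming it is Cohen's kappa, it equals (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected from the true and predicted label frequencies alone. The following is a minimal sketch; the function name and the toy labels are illustrative, not data from the paper (scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same quantity).

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what the label frequencies alone would give."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(y_true)
    # Observed agreement p_o is plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e: probability that a random true label and a random
    # predicted label coincide, given their marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both label lists use one identical class
    return (p_o - p_e) / (1 - p_e)


# Toy example: one ESCA sample mislabelled as STAD.
y_true = ["LUSC", "LUSC", "STAD", "STAD", "ESCA", "ESCA"]
y_pred = ["LUSC", "LUSC", "STAD", "STAD", "STAD", "ESCA"]
print(round(cohens_kappa(y_true, y_pred), 2))  # 0.75
```

Here accuracy is 5/6 (about 0.83) while kappa is 0.75, because one third of the agreement would be expected by chance from the label frequencies.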
SVM classifier performance
Per-cancer performance metrics for the SVM classifier with all features and with various feature subsets selected by two-stage feature selection algorithms. The cells shaded in pink are the cancer types with sensitivities below 90%
Confusion matrix for the SVM classifier
This matrix aggregates the results of 10-fold cross-validation repeated 10 times. Rows are the predicted classes and columns are the true classes. Each entry is the fraction of all samples of the column's cancer type that were predicted as the row's cancer type. Cells shaded in orange-red mark misclassifications exceeding 5% of the samples of that cancer type. For example, 11% of ESCA samples were misclassified as STAD.
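A confusion matrix with this layout (rows predicted, columns true, each column normalized to sum to 1) can be built from pooled predictions. The sketch assumes that the out-of-fold predictions from every fold of every repeat are concatenated into `y_true` and `y_pred` before the call, which is one plausible reading of "aggregating"; the function names and toy data are hypothetical.

```python
from collections import defaultdict


def column_normalized_confusion(y_true, y_pred):
    """Return {predicted: {true: fraction}}; each true-class column sums to 1."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[p][t] += 1  # row = prediction, column = truth
        totals[t] += 1
    return {p: {t: c / totals[t] for t, c in row.items()} for p, row in counts.items()}


def flag_misclassifications(matrix, threshold=0.05):
    """List (true, predicted, fraction) for off-diagonal cells above threshold."""
    flagged = []
    for pred, row in matrix.items():
        for true, frac in row.items():
            if pred != true and frac > threshold:
                flagged.append((true, pred, frac))
    return sorted(flagged)


# Toy pooled predictions: 1 of 9 ESCA samples predicted as STAD (about 11%).
y_true = ["ESCA"] * 9 + ["STAD"] * 10
y_pred = ["ESCA"] * 8 + ["STAD"] + ["STAD"] * 10
m = column_normalized_confusion(y_true, y_pred)
print(flag_misclassifications(m))
```

Normalizing by column rather than by the grand total keeps small cancer types such as CHOL on the same scale as large ones such as LGG.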
SVM classifier performance at different stages of the embryonic development tree
| Stage I subtype | Sensitivity | Specificity | Stage II subtype | Sensitivity | Specificity | Stage III subtype | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|
| Endoderm | 0.99 | 1 | GI Tube | 0.99 | 0.99 | Thymus | 0.98 | 1 |
| | | | | | | Lung | 0.98 | 1 |
| | | | | | | Pancreas | 0.96 | 1 |
| | | | | | | GI Tract | 0.98 | 1 |
| | | | | | | Liver | 0.98 | 1 |
| | | | | | | Thyroid | 1 | 1 |
| Mesoderm | 0.99 | 1 | Lateral Plate Mesoderm | 0.97 | 1 | Adipose | 0.98 | 1 |
| | | | | | | Lymph | 0.96 | 1 |
| | | | | | | Heart | 0.98 | 1 |
| | | | Intermediate Mesoderm | 0.99 | 1 | Reproductive | 0.99 | 1 |
| | | | | | | Urinary | 0.98 | 1 |
| | | | | | | Kidney | 0.99 | 1 |
| Ectoderm | 1 | 1 | Neural Ectoderm/Neural Tube | 1 | 1 | Brain | 1 | 1 |
| | | | Neural Crest | 0.99 | 1 | Peripheral Nervous System | 0.99 | 1 |
| | | | Surface Ectoderm | 1 | 1 | Epidermis | 1 | 1 |
Moving up the embryonic development tree, SVM classifiers were built at each stage to classify the cancers at different granularities. Stage IV identifies the actual cancer type, the Stage III classifier classifies cancers at the tissue/organ level, and the Stage I classifier assigns each cancer to one of the germ layers.
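Training or scoring a classifier at a given stage requires mapping each sample's cancer type to its ancestor at that stage. The sketch below transcribes the tree from the sample-distribution and stage tables above into a hypothetical `TREE` dictionary and computes the one-vs-rest sensitivity and specificity reported per subtype. It shows only label mapping and scoring; the per-stage SVMs are not reproduced, and collapsing a single leaf-level classifier's predictions (as in the usage example) is a simpler alternative to the paper's separately trained stage classifiers.

```python
# (Stage I germ layer, Stage II subtype, Stage III tissue/organ) per cancer type,
# transcribed from the tables above.
TREE = {
    "THYM": ("Endoderm", "GI Tube", "Thymus"),
    "LUSC": ("Endoderm", "GI Tube", "Lung"),
    "PAAD": ("Endoderm", "GI Tube", "Pancreas"),
    "CHOL": ("Endoderm", "GI Tube", "GI Tract"),
    "ESCA": ("Endoderm", "GI Tube", "GI Tract"),
    "STAD": ("Endoderm", "GI Tube", "GI Tract"),
    "LIHC": ("Endoderm", "GI Tube", "Liver"),
    "THCA": ("Endoderm", "GI Tube", "Thyroid"),
    "ACC": ("Mesoderm", "Lateral Plate Mesoderm", "Adipose"),
    "DLBC": ("Mesoderm", "Lateral Plate Mesoderm", "Lymph"),
    "MESO": ("Mesoderm", "Lateral Plate Mesoderm", "Heart"),
    "CESC": ("Mesoderm", "Intermediate Mesoderm", "Reproductive"),
    "OV": ("Mesoderm", "Intermediate Mesoderm", "Reproductive"),
    "TGCT": ("Mesoderm", "Intermediate Mesoderm", "Reproductive"),
    "UCS": ("Mesoderm", "Intermediate Mesoderm", "Urinary"),
    "KICH": ("Mesoderm", "Intermediate Mesoderm", "Kidney"),
    "KIRP": ("Mesoderm", "Intermediate Mesoderm", "Kidney"),
    "LGG": ("Ectoderm", "Neural Ectoderm/Neural Tube", "Brain"),
    "PCPG": ("Ectoderm", "Neural Crest", "Peripheral Nervous System"),
    "SKCM": ("Ectoderm", "Surface Ectoderm", "Epidermis"),
    "UVM": ("Ectoderm", "Surface Ectoderm", "Epidermis"),
}
STAGE_INDEX = {"I": 0, "II": 1, "III": 2}


def labels_at_stage(cancer_labels, stage):
    """Map leaf (Stage IV) cancer labels to their ancestor at the given stage."""
    if stage == "IV":
        return list(cancer_labels)
    i = STAGE_INDEX[stage]
    return [TREE[c][i] for c in cancer_labels]


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy example: an ESCA/STAD confusion is an error at Stage IV but not at Stage III.
t = ["ESCA", "STAD", "LUSC", "SKCM"]
p = ["STAD", "STAD", "LUSC", "UVM"]
print(sensitivity_specificity(t, p, "ESCA"))  # (0.0, 1.0)
print(labels_at_stage(t, "III") == labels_at_stage(p, "III"))  # True
```

This illustrates why coarser stages can score higher: confusions between cancers that share an organ or germ layer disappear once labels are mapped upward.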
Distribution of the samples used in prognosis prediction and treatment recommendation models
| Cancer type | # of patients | # of unique treatments per cancer | # of recurrence cases |
|---|---|---|---|
| CESC | 52 | 4 | 9 |
| ESCA | 23 | 4 | 5 |
| LGG | 137 | 6 | 88 |
| LUSC | 15 | 7 | 3 |
| OV | 111 | 9 | 23 |
| PAAD | 45 | 4 | 32 |
| STAD | 37 | 5 | 13 |
| TGCT | 51 | 4 | 6 |
| UCS | 13 | 2 | 3 |
Preprocessing the cohort of 710 patients with full clinical and treatment information yielded a smaller subset of 476 patients
Fig. 3 Cox scaling map. A graphical representation of the Cox scaling map of the treatment space, drawn with the MATLAB “jet” colormap: black and red mark more prevalent treatments, and green and blue mark less prevalent ones. The (x, y, z) axes are simply the three Cox-scaled coordinates assigned to each treatment. Treatments with edit distances below 25 were further merged to form 29 unique treatments.
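The caption merges treatments whose edit distance falls below 25. The sketch below assumes each treatment is encoded as a string (for example, a concatenation of drug names) and uses Levenshtein distance. The merging rule, single-linkage grouping with a union-find, is an assumption, because the caption does not say how near-duplicate treatments were combined, and the treatment strings in the example are hypothetical. The scaling step that assigns the 3-D coordinates is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def merge_similar(treatments, threshold=25):
    """Group treatments linked by a chain of pairwise distances below threshold."""
    parent = list(range(len(treatments)))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(treatments)):
        for j in range(i + 1, len(treatments)):
            if edit_distance(treatments[i], treatments[j]) < threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, t in enumerate(treatments):
        groups.setdefault(find(i), []).append(t)
    return list(groups.values())


# Hypothetical treatment strings; a small threshold keeps the toy example readable.
print(merge_similar(["cisplatin", "cisplatin+rt", "temozolomide"], threshold=4))
```

Single linkage can chain distant treatments together through intermediate ones; a stricter alternative would merge only pairs that are all within the threshold of one another.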
Prognosis predictor classifier performance
| Metric | SVM with miRNA features | SVM without miRNA features |
|---|---|---|
| Disease sensitivity | 0.86 | 0.76 |
| Disease specificity | 0.84 | 0.77 |
| Accuracy | 0.85 | 0.77 |
| Kappa statistic | 0.71 | 0.53 |
| Accuracy standard deviation | 0.03 | 0.05 |
| Kappa statistic standard deviation | 0.05 | 0.09 |
| Tuning parameters (Sigma) | 0.01 | 0.16 |
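The sigma tuning parameter is characteristic of a Gaussian radial basis function (RBF) SVM kernel. Assuming the parameterization used by R's kernlab (`rbfdot`, which caret's `svmRadial` method tunes through `sigma` and `C`), the kernel is k(x, y) = exp(-sigma * ||x - y||^2); the source does not confirm which toolkit was used. A minimal sketch of that kernel:

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel in the kernlab convention: exp(-sigma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


# Two points at squared distance 10 under the two tuned sigma values.
x, y = [0.0, 0.0], [1.0, 3.0]
print(round(rbf_kernel(x, y, 0.01), 3))  # 0.905
print(round(rbf_kernel(x, y, 0.16), 3))  # 0.202
```

Under this convention a larger sigma makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood of the feature space.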