Abedalrhman Alkhateeb1, Iman Rezaeian1, Siva Singireddy1, Dora Cavallo-Medved2, Lisa A Porter2, Luis Rueda1.
Abstract
Prostate cancer is one of the most common types of cancer among Canadian men. Next-generation sequencing using RNA-Seq provides large amounts of data that may reveal novel and informative biomarkers. We introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may be valuable in guiding treatment decisions. Analysis of matched normal versus malignant prostate data sets indicates differential expression of the genes HEATR5B, DDC, and GABPB1-AS1, which makes them potential prostate cancer biomarkers. Our study also supports PTGFR, NREP, SCARNA22, DOCK9, FLVCR2, IKZF3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease.
Keywords: RNA-Seq analysis; machine learning; prostate cancer progression; transcriptomics signature
Year: 2019 PMID: 30890858 PMCID: PMC6416685 DOI: 10.1177/1176935119835522
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. A schematic view of the proposed workflow for finding differential transcripts between benign versus malignant tumours and across various stages of prostate cancer.
Distribution of Long’s data set[17] samples in various stages of prostate cancer.
| Prostate stages | Description | No. of patients |
|---|---|---|
| T1c | The tumor is found by a needle biopsy performed because of an elevated PSA level, but it cannot be detected by imaging tests. | 14 |
| T2 | The tumor is found only in the prostate. | 10 |
| T2a | The tumor occupies half or less of one side of the prostate. | 23 |
| T2b | The tumor occupies more than half of one side of the prostate. | 11 |
| T2c | The tumor exists in both sides of the prostate. | 30 |
| T3 | The tumor has grown through the prostate tissue to the outside of the gland. | 2 |
| T3a | The tumor has grown through the prostate on one or both sides. | 6 |
| T3b | The tumor has spread into the seminal vesicles. | 8 |
| T4 | The tumor has spread to other organs. | 1 |
Abbreviation: PSA, prostate-specific antigen.
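The per-stage counts above can be pooled to check the group sizes used in the stage comparisons. The sketch below is only an arithmetic check: it assumes that the pooled "T3/T4" group consists of the T3, T3a, T3b, and T4 rows, which the table does not state explicitly. The names `stage_counts` and `pooled_count` are illustrative and do not come from the study.

```python
# Patient counts per stage, copied from the table above.
stage_counts = {
    "T1c": 14, "T2": 10, "T2a": 23, "T2b": 11, "T2c": 30,
    "T3": 2, "T3a": 6, "T3b": 8, "T4": 1,
}


def pooled_count(stages):
    """Sum the patient counts over a group of stages."""
    return sum(stage_counts[s] for s in stages)


# Assumption: the pooled late-stage group is every T3* row plus T4.
late_stage = pooled_count(["T3", "T3a", "T3b", "T4"])
total = pooled_count(stage_counts)

print("T2c versus T3/T4:", stage_counts["T2c"], "versus", late_stage)
print("total patients:", total)
```

Under that assumption the pooled group has 17 patients, which agrees with the "T2C-T3/T4 (30 versus 17)" row of the method comparison table.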
Data sets used in this study for malignant versus normal analysis with the number of samples in each data set.
| Data set | No. of malignant tumor samples | No. of matched normal samples | Reference |
|---|---|---|---|
| Kim | 7 | 4 | Kim et al[ |
| Ren | 14 | 14 | Ren et al[ |
| Kannan | 10 | 10 | Kannan et al[ |
Differentially expressed transcripts identified in Kannan’s, Kim’s, and Ren’s data sets.
| Data set | Transcript ID | Gene name | Gene description |
|---|---|---|---|
| Kannan et al[ | NM_019024 | HEATR5B | HEAT repeat containing 5B |
| | NM_001242889 | DDC | Dopa decarboxylase, transcript variant 6 |
| | NM_152228 | TAS1R3 | Taste 1 receptor member 3 |
| | NM_001204401 | XIAP | X-linked inhibitor of apoptosis, transcript variant 2 |
| Kim et al[ | NR_024490 | GABPB1-AS1 | GABPB1 antisense RNA 1 |
| | NM_001242889 | DDC | Dopa decarboxylase, transcript variant 6 |
| | NM_019024 | HEATR5B | HEAT repeat containing 5B |
| | NM_032415 | CARD11 | Caspase recruitment domain family member 11, transcript variant 2 |
| Ren et al[ | NR_024490 | GABPB1-AS1 | GABPB1 antisense RNA 1 |
| | NM_000424 | KRT5 | Keratin 5 |
| | NM_001128826 | NCS1 | Neuronal calcium sensor 1, transcript variant 2 |
| | NM_000494 | COL17A1 | Collagen type XVII alpha 1 chain |
| | NM_000700 | ANXA1 | Annexin A1 |
| | NM_005567 | LGALS3BP | Galectin 3 binding protein |
Transcript IDs that start with the prefix NM are mRNAs, whereas those that start with NR are non-coding RNAs such as lncRNAs.
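The prefix rule in the note above can be applied mechanically to the transcript IDs in these tables. The following is a minimal sketch, not part of the study's pipeline; the function name `refseq_molecule_type` is hypothetical. In RefSeq, the NR_ prefix covers non-coding RNAs in general, of which lncRNAs are one class, so the sketch uses the broader label.

```python
def refseq_molecule_type(accession):
    """Classify a RefSeq transcript accession by its prefix.

    Only the two prefixes that occur in the tables are handled;
    anything else is reported as "unknown".
    """
    prefix = accession.split("_", 1)[0].upper()
    return {"NM": "mRNA", "NR": "non-coding RNA"}.get(prefix, "unknown")


# Transcript IDs taken from the table above.
for accession in ["NM_019024", "NR_024490", "NM_001242889"]:
    print(accession, refseq_molecule_type(accession))
```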
Figure 2. Genes corresponding to the differentially expressed transcripts identified in Kannan’s, Kim’s, and Ren’s data sets.
Figure 3. Expression of transcripts in malignant versus matched normal samples.
Figure 4. Performance of 5 different classifiers for matched normal versus malignant classification.
The list of the transcripts that differentiate stage T1C from T2.
| Transcript | Chr. | Gene | Gene description |
|---|---|---|---|
| NR_003669 | 16 | MT1IP | Metallothionein 1I, pseudogene (MT1IP), transcript variant 1 |
| NM_001160393 | 11 | TRPT1 | tRNA phosphotransferase 1 (TRPT1), transcript variant 6 |
| NM_001161345 | 12 | CHFR | Checkpoint with forkhead and ring finger domains, E3 ubiquitin protein ligase (CHFR), transcript variant 2 |
| NM_052857 | 17 | ZNF830 | Zinc finger protein 830 |
| NR_003594 | 8 | REXO1L2P | RNA exonuclease one homolog ( |
| NR_033240 | 14 | SLC25A21-AS1 | SLC25A21 antisense RNA 1 |
The list of the transcripts that differentiate stage T2C from T3/T4.
| Transcript | Chr. | Description | Gene |
|---|---|---|---|
| NM_001257413 | 17 | IKAROS family zinc finger 3 (Aiolos), transcript variant 12 | IKZF3 |
| NM_003940 | 3 | Ubiquitin-specific peptidase 13 (isopeptidase T-3) | USP13 |
| NM_001142274 | 2 | Cytoplasmic linker associated protein 1, transcript variant 3 | CLASP1 |
| NM_001199165 | 17 | Centrosomal protein 112kDa, transcript variant 3 | CEP112 |
| NM_052965 | 1 | tRNA splicing endonuclease subunit, transcript variant 1 | TSEN15 |
| NM_001195283 | 14 | Feline leukemia virus subgroup C cellular receptor family, member 2, transcript variant 2 | FLVCR2 |
| NM_001023567 | 15 | Golgin A8 family, member B, transcript variant 1 | GOLGA8B |
| NM_001143766 | 10 | Zinc finger protein 438, transcript variant 1 | ZNF438 |
| NR_003004 | 4 | Small Cajal body-specific RNA 22 | SCARNA22 |
| NM_017753 | 9 | Lipid phosphate phosphatase-related protein type 1, transcript variant 2 | LPPR1 |
| NM_000959 | 1 | Prostaglandin F receptor (FP), transcript variant 1 | PTGFR |
| NM_004772 | 5 | Neuronal regeneration related protein, transcript variant 1 | NREP |
Comparison between CuffDiff and our feature-selection method for identifying differentially expressed transcripts between each pair of consecutive stages of prostate cancer.
| Stage | Method | No. of selected transcripts | No. of common transcripts | ACC | FM | MCC | AUC |
|---|---|---|---|---|---|---|---|
| T1C-T2 (14 versus 10) | CuffDiff | 21 | 0 | 70.8% | 0.710 | 0.410 | 0.846 |
| | Proposed method | 6 | 0 | 95.8% | 0.958 | 0.917 | 0.971 |
| T2-T2A (10 versus 23) | CuffDiff | 43 | 0 | 69.7% | 0.650 | 0.159 | 0.580 |
| | Proposed method | 7 | 0 | 93.9% | 0.939 | 0.857 | 0.970 |
| T2A-T2B (23 versus 11) | CuffDiff | 35 | 0 | 64.7% | 0.601 | 0.068 | 0.634 |
| | Proposed method | 6 | 0 | 85.3% | 0.851 | 0.657 | 0.826 |
| T2B-T2C (11 versus 30) | CuffDiff | 38 | 0 | 65.8% | 0.647 | 0.078 | 0.645 |
| | Proposed method | 5 | 0 | 87.8% | 0.880 | 0.699 | 0.885 |
| T2C-T3A (30 versus 8) | CuffDiff | 29 | 0 | 73.7% | 0.722 | 0.130 | 0.612 |
| | Proposed method | 5 | 0 | 89.4% | 0.895 | 0.683 | 0.948 |
| T3A-T3B (8 versus 9) | CuffDiff | 27 | 0 | 58.8% | 0.588 | 0.181 | 0.750 |
| | Proposed method | 3 | 0 | 94.1% | 0.941 | 0.887 | 1.000 |
| T2C-T3/T4 (30 versus 17) | CuffDiff | 49 | 0 | 57.4% | 0.568 | 0.055 | 0.483 |
| | Proposed method | 12 | 0 | 95.7% | 0.957 | 0.908 | 0.988 |
Abbreviations: ACC, accuracy; FM, F-measure; MCC, Matthews correlation coefficient; AUC, area under receiver operating characteristic curve.
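For readers unfamiliar with these measures, the sketch below shows how ACC, FM, and MCC follow from a binary confusion matrix. It rests on several assumptions: the counts are invented for illustration (they total 24 samples, the size of the T1C-T2 comparison, but are not the study's confusion matrix), FM is computed as the F1 score of the positive class (the source does not say how its F-measure is averaged), and the function name `binary_metrics` is hypothetical. AUC is omitted because it requires per-sample classifier scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, F-measure (F1 of the positive class), and MCC."""
    total = tp + fp + fn + tn
    acc = (tp + tn) / total

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fm = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "FM": fm, "MCC": mcc}


# Hypothetical counts: 24 samples with a single misclassification.
metrics = binary_metrics(tp=14, fp=1, fn=0, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC takes all four cells of the confusion matrix into account, so it stays informative when the two groups differ in size, as they do in most of the comparisons above.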
Figure 5. Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T1c and T2.
Figure 11. Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2c and T3/T4.
Figure 9. Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2c and T3a.
The list of the transcripts that differentiate stage T2 from T2A.
| Transcript | Chr. | Gene | Gene description |
|---|---|---|---|
| NM_004860 | 17 | FXR2 | Fragile X mental retardation, autosomal homolog 2 |
| NM_052850 | 19 | GADD45GIP1 | Growth arrest and DNA-damage-inducible, gamma interacting protein 1 |
| NM_001272095 | 16 | STX4 | Syntaxin 4, transcript variant 1 |
| NM_001261390 | 17 | CALCOCO2 | Calcium binding and coiled-coil domain 2, transcript variant 1 |
| NM_153274 | 1 | BEST4 | Bestrophin 4 |
| NM_001252641 | 19 | URI1 | Prefoldin-like chaperone, transcript variant 3 |
| NR_038352 | 5 | DCP2 | Decapping mRNA 2, transcript variant 3 |
The list of the transcripts that differentiate stage T2A from T2B.
| Transcript | Chr. | Gene | Gene description |
|---|---|---|---|
| NM_032023 | 10 | RASSF4 | Ras association (RalGDS/AF-6) domain family member 4 |
| NM_080792 | 20 | SIRPA | Signal-regulatory protein alpha (SIRPA), transcript variant 3 |
| NM_000095 | 19 | COMP | Cartilage oligomeric matrix protein |
| NM_003102 | 4 | SOD3 | Superoxide dismutase 3, extracellular |
| NM_080797 | 20 | DIDO1 | Death inducer-obliterator 1, transcript variant 3 |
| NM_002725 | 1 | PRELP | Proline/arginine-rich end leucine-rich repeat protein, transcript variant 1 |
The list of the transcripts that differentiate stage T2B from T2C.
| Transcript | Chr. | Gene | Gene description |
|---|---|---|---|
| NM_001711 | X | BGN | |
| NM_032023 | 10 | RASSF4 | Ras association (RalGDS/AF-6) domain family member 4 |
| NM_001014443 | 1 | USP21 | Ubiquitin-specific peptidase 21, transcript variant 3 |
| NM_021724 | 17 | NR1D1 | Nuclear receptor subfamily 1 group D, member 1 |
| NM_012098 | 9 | ANGPTL2 | Angiopoietin-like 2 |
The list of the transcripts that differentiate stage T2C from T3A.
| Transcript | Chr. | Description | Gene |
|---|---|---|---|
| NM_001198979 | 1 | Small ArfGAP2 (SMAP2), transcript variant 2 | SMAP2 |
| NM_001099285 | 2 | Prothymosin, alpha (PTMA), transcript variant 1 | TMSA |
| NM_001198899 | 1 | YY1 associated protein 1 (YY1AP1), transcript variant 6 | YY1AP1 |
| NM_001130048 | 13 | Dedicator of cytokinesis 9 (DOCK9), transcript variant 2 | DOCK9 |
| NM_000899 | 12 | KIT ligand (KITLG), transcript variant b | KITLG |
The list of the transcripts that differentiate stage T3A from T3B.
| Transcript | Chr. | Description | Gene |
|---|---|---|---|
| NR_034169 | 2 | Family with sequence similarity 133 member D pseudogene | FAM133DP |
| NM_015380 | 22 | Sorting and assembly machinery component 50 homolog, protein coding | SAMM50 |
| NR_046417 | 15 | Olfactory receptor family 4 subfamily F member 13 pseudogene | OR4F13P |