Transcriptomics Signature from Next-Generation Sequencing Data Reveals New Transcriptomic Biomarkers Related to Prostate Cancer.

Abedalrhman Alkhateeb¹, Iman Rezaeian¹, Siva Singireddy¹, Dora Cavallo-Medved², Lisa A Porter², Luis Rueda¹.

Abstract

Prostate cancer is one of the most common types of cancer among Canadian men. Next-generation sequencing using RNA-Seq provides large amounts of data that may reveal novel and informative biomarkers. We introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have tremendous value in guiding treatment decisions. Analysis of normal versus malignant prostate cancer data sets indicates differential expression of the genes HEATR5B, DDC, and GABPB1-AS1 as potential prostate cancer biomarkers. Our study also supports PTGFR, NREP, SCARNA22, DOCK9, FLVCR2, IKZF3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease.

Keywords: RNA-Seq analysis; machine learning; prostate cancer progression; transcriptomics signature

Year: 2019 PMID: 30890858 PMCID: PMC6416685 DOI: 10.1177/1176935119835522

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Introduction

Prostate cancer is one of the most common types of cancer among men worldwide. It is estimated that more than 1.2 million men were diagnosed with prostate cancer in 2015, resulting in more than 335 000 deaths.[1] A current obstacle in improving patient care is the inability to accurately predict tumors that are at a high risk for progression. Identifying reliable prognostic biomarkers to guide treatment decisions is a high priority in the prostate cancer field. Next-generation sequencing (NGS) has revolutionized genomic and transcriptomic analysis. RNA-Seq reads the transcriptome at a single-nucleotide resolution, revealing unexplored genomic and transcriptomic territories not revealed using conventional technologies, such as microarrays.[2,3] RNA-Seq represents a high-throughput technique capable of identifying nonconventional biomarkers, such as noncoding RNA and alternative splicing events.[2,3] Alternative splicing can produce protein isoforms with potentially different functions from the same DNA sequence. Indeed, approximately half of all active splicing events are altered in ovarian and breast tumors.[4] RNA-Seq can also measure transcriptomic activity and transcriptome assembly to provide a better understanding of the regulation of corresponding protein isoforms.[5-8] A typical RNA-Seq experiment, however, produces a large amount of data and therefore demands considerable computational resources in both time and space. Using machine learning to analyze RNA-Seq data can reduce redundant and irrelevant information while providing a selection of potentially significant biomarkers for biological validation. Optimizing a computational approach to effectively isolate novel splice variants from RNA-Seq data may provide invaluable clues about novel biomarkers for detecting and predicting the progression of prostate cancer. Several studies have used RNA-Seq to identify new potential biomarkers for prostate cancer.
Feng et al[9] presented a comprehensive review of the most recent studies on alternative splicing in cancer using RNA-Seq data. This included an overview of several publicly available RNA-Seq data sets and the most recent open-source bioinformatics tools for RNA-Seq data analysis. Recent studies using RNA-Seq for prostate cancer analysis include genome-wide association and variation studies, noncoding RNAs (eg, microRNA, lincRNA, and siRNA), somatic mutations, chimeric RNA, and gene fusion. Kannan et al[10] used RNA-Seq on 20 human prostate cancer and 10 matched benign prostate tissues from patients who had received no preoperative therapy prior to radical prostatectomy and identified a potential link between increased chimeric RNA events and prostate cancer. Pflueger et al[11] used RNA-Seq data from 25 human prostate cancer samples and isolated 7 novel gene fusions related to prostate cancer, including TMPRSS2-ERG. TMPRSS2-ERG gene fusion is present in 50% to 90% of human prostate cancers and has been identified as an early molecular event associated with invasion of the disease.[12] Ren et al[13] also identified recurrent gene fusions in 14 primary prostate tumors from a Chinese population. Although they found TMPRSS2-ERG fusion to occur at a very low frequency, they isolated additional novel gene fusions, CTAGE5-KHDRBS3 and USP9Y-TTTY15, that frequently occurred in the Chinese cohort. These conflicting reports illustrate that disparity exists among prostate cancer patients of different ethnic backgrounds. In another study, Xu et al[14] identified 92 new genes with somatic mutations in human prostate cancer. Their study used RNA-Seq data from 5 cancer patients to detect variants of chromosomal rearrangements, insertions, and deletions. Of significance, they identified a frame-shift mutation in the coding region of TNFSF10 that disrupts its ability to induce apoptosis, a change that may promote tumor progression.
Prensner et al[15] focused on new noncoding RNA and found an unannotated lincRNA, PCAT-1, a prostate-specific regulator of cell proliferation. Exploiting the high-resolution features of RNA-Seq that allow for reconstructing the transcriptome and inferring protein isoforms and their corresponding functions can offer an integrative approach to better understand the onset and progression of the disease. Thus, in this study, we extended our earlier study[16] for detecting differentially expressed transcripts in prostate cancer using RNA-Seq data. This model identifies transcripts associated with malignant tumors as compared with corresponding matched normal samples and transcripts that are differentially expressed during disease progression through different TNM stages. Our analysis revealed several transcripts that may be used as potential biomarkers for predicting prostate cancer and disease progression.

Methods

Data preprocessing

Figure 1 depicts the pipeline of our proposed model. Initially, samples are pre-processed individually by filtering the mRNA reads of each sample[17] and mapping them to the human genome (hg19) using Tophat2,[18] 2 fast methods for mapping splice junctions and aligning short reads, respectively. In the next step, we use Cufflinks[6] to assemble the transcriptome from the mapped reads of the previous step based on RefSeq annotation.[19] For all samples, we used Cufflinks to estimate the relative abundances of the transcripts in fragments per kilobase of exon per million mapped reads (FPKM). We ran Tophat2 and Cufflinks with their default parameter values.
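The FPKM unit used above can be illustrated with a short calculation. The sketch below is not how Cufflinks works internally (Cufflinks estimates abundances with its own statistical model); it only applies the plain definition of the unit. The function name `fpkm` and the example numbers are hypothetical and are not taken from the study.

```python
def fpkm(fragments: float, transcript_length_bp: int, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of exon per million mapped fragments (plain definition)."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000          # transcript length in kb
    millions = total_mapped_fragments / 1_000_000     # library size in millions
    return fragments / (kilobases * millions)


# Hypothetical example: 500 fragments on a 2,000-bp transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 20_000_000)
print(example_fpkm)
```

Dividing by both transcript length and library size is what makes values comparable across transcripts of different lengths and samples of different sequencing depth.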

Figure 1.

A schematic view of the proposed workflow for finding differential transcripts between benign and malignant tumors and across various stages of prostate cancer.

Obtaining discriminative transcripts

The deliverables of our study are 2-fold. First, we aim to identify a gene signature that predicts prostate cancer by comparing cancer samples with their matched normal counterparts. Second, we focus on the differential expression of gene transcripts in a pairwise analysis of various stages of prostate cancer progression; these transcripts are considered discriminative transcripts for a specific stage. We anticipate that the latter analysis will reveal discriminative transcripts that are potential biomarkers for predicting disease progression. The literature confirms that those discriminative transcripts are strongly related to cancer progression; however, a deeper investigation with wet-lab experiments is required to confirm them as predictive or biomarker transcripts. The products of these biomarkers may then be identified using routine blood or urine tests to predict progression.

Normal versus malignant

We consider the identification of differentially expressed transcripts in normal versus malignant prostate cells as a 2-class classification problem, where each transcript is used as a feature with its FPKM as the feature value. After obtaining the transcripts using Cufflinks, we applied minimum Redundancy Maximum Relevance (mRMR) feature selection.[20] mRMR tries to select a subset of features that maximizes relevance, that is, the correlation of the features with the class, while minimizing redundancy, that is, the correlation among the selected features themselves. The method incorporates a standard classifier and forward-selects the features that improve the classification measurements. After feature selection, we used several standard classifiers to find the best accuracy for classifying the consecutive stages/sub-stages. The selected transcripts from the previous step were used to optimize the classification performance; it is also easier to validate a smaller subset of genes. The classifiers used for comparison include support vector machine (SVM)[21] with radial basis function (RBF), linear, and polynomial kernels; random forest;[22] decision tree;[23] and naïve Bayes.[24]
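The relevance/redundancy trade-off described above can be sketched as a greedy selection loop. This is a simplified illustration, not the implementation used in the study: it scores relevance and redundancy with absolute Pearson correlation (the published mRMR criterion is usually stated with mutual information), it omits the classifier-in-the-loop step, and the transcript names and FPKM values are made up.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mrmr(features, labels, k):
    """Greedily pick k features maximizing (relevance - mean redundancy).

    features: dict mapping a feature name to its per-sample values
    labels:   per-sample class labels encoded as 0/1
    """
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = mean(abs(pearson(features[name], features[s])) for s in selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical FPKM values for 3 normal (0) and 3 malignant (1) samples.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "tx_a": [1, 2, 1, 8, 9, 8],        # strongly class-related
    "tx_b": [2, 4, 2, 16, 18, 16],     # scaled copy of tx_a, hence fully redundant
    "tx_c": [3, 1, 2, 2, 4, 3],        # weaker signal, but not a duplicate
}
picked = mrmr(features, labels, k=2)
print(picked)
```

In this toy input the scaled duplicate adds no new information, so the redundancy penalty steers the second pick toward the non-duplicate transcript even though its individual relevance is lower.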

Prostate cancer progression

We modeled the machine learning task as a series of binary class problems, one for each pair of consecutive stages/sub-stages. We considered the stages/sub-stages from Table 2 as the classes, so we selected T1c versus T2, T2 versus T2a, and so on to create the binary class problems. For each binary class problem, the reconstructed transcripts are the features, the quantified FPKM values of each sample’s transcripts are the feature values, and the labels are the stages/sub-stages of the samples from that pair of consecutive classes. To avoid overfitting, we merged all T3 and its sub-stage (T3a, T3b) samples with T4 samples and labeled the merged samples with the T3/T4 class label. The discriminative transcripts serve as differentially expressed transcripts because they are able to distinguish one class from another.
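As a rough illustration of how one such binary problem can be assembled, the sketch below expands the per-stage sample counts of Table 2 into a label list, applies the T3/T4 merge, and extracts the T2c versus T3/T4 problem. The helper names (`merge_label`, `binary_problem`) are hypothetical, and only stage labels are handled here; in the study each sample would also carry its FPKM feature vector.

```python
from collections import Counter

# Number of samples per stage, as listed in Table 2.
STAGE_COUNTS = {
    "T1c": 14, "T2": 10, "T2a": 23, "T2b": 11, "T2c": 30,
    "T3": 2, "T3a": 6, "T3b": 8, "T4": 1,
}

# Stages pooled into the single T3/T4 class.
MERGED_STAGES = {"T3", "T3a", "T3b", "T4"}


def merge_label(stage):
    """Map T3, T3a, T3b, and T4 to the pooled 'T3/T4' label."""
    return "T3/T4" if stage in MERGED_STAGES else stage


def binary_problem(labels, class_a, class_b):
    """Return sample indices and 0/1 targets for the class_a versus class_b problem."""
    indices = [i for i, s in enumerate(labels) if s in (class_a, class_b)]
    targets = [0 if labels[i] == class_a else 1 for i in indices]
    return indices, targets


# One label per sample, in table order.
labels = [stage for stage, n in STAGE_COUNTS.items() for _ in range(n)]
merged_labels = [merge_label(s) for s in labels]

idx, y = binary_problem(merged_labels, "T2c", "T3/T4")
print(Counter(y))
```

The resulting class sizes (30 and 17) agree with the "30 versus 17" entry reported for the T2c-T3/T4 comparison in Table 11.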

Table 2.

Distribution of Long’s data set[17] samples in various stages of prostate cancer.

Prostate stage	Description	No. of patients
T1c	The tumor is found by a needle biopsy performed because of an elevated PSA level but cannot be detected by imaging tests.	14
T2	The tumor is found only in the prostate.	10
T2a	The tumor occupies half or less of only one side of the prostate.	23
T2b	The tumor occupies more than half of only one side of the prostate.	11
T2c	The tumor exists in both sides of the prostate.	30
T3	The tumor has grown through the prostate tissue to the outside.	2
T3a	The tumor has grown through the prostate on either 1 or both sides of the prostate.	6
T3b	The tumor has spread into the seminal vesicles.	8
T4	The tumor has spread to other organs.	1

Abbreviation: PSA, prostate-specific antigen.

We started the feature selection process with 43 497 reconstructed transcripts, and the number was narrowed down in each binary classification problem to a few discriminative transcripts; we used the Weka[24] data mining tool to run mRMR on the features. We first normalized the features and then ran mRMR with a linear-kernel SVM as the classifier inside the wrapper method. We chose the linear kernel because forward selection in the wrapper method is computationally expensive with polynomial or RBF kernels.
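A wrapper-style forward selection of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the Weka workflow used in the study: it uses min-max normalization (the exact normalization is not specified above), it replaces the linear-kernel SVM with a nearest-centroid classifier so that the code stays dependency-free, it scores candidates with leave-one-out accuracy, and the data are invented.

```python
def min_max(column):
    """Scale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `cols`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

        def dist(label):
            return sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroids[label]))

        predicted = min(sorted(centroids), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def forward_select(X, y, max_features):
    """Add one feature at a time, keeping it only if the wrapped classifier improves."""
    selected, best = [], 0.0
    candidates = list(range(len(X[0])))
    while candidates and len(selected) < max_features:
        # (accuracy, -index): highest accuracy wins, ties go to the lowest index.
        acc, neg_c = max((loo_accuracy(X, y, selected + [c]), -c) for c in candidates)
        if acc <= best:
            break  # no candidate improves the current subset
        best = acc
        selected.append(-neg_c)
        candidates.remove(-neg_c)
    return selected, best


# Hypothetical data: column 0 is noise, column 1 separates the two classes.
raw = [[5, 1], [1, 2], [4, 1], [2, 9], [5, 8], [1, 9]]
y = [0, 0, 0, 1, 1, 1]
columns = [min_max(list(col)) for col in zip(*raw)]
X = [list(row) for row in zip(*columns)]

selected, best = forward_select(X, y, max_features=2)
print(selected, best)
```

Every candidate evaluation retrains the wrapped classifier, which is why the cost of the classifier dominates the run time of wrapper methods and why a cheap classifier is attractive when tens of thousands of features are screened.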

Data Availability

We used 3 data sets, Kim’s,[25] Ren’s,[13] and Kannan’s,[10] each containing matched normal versus malignant prostate cancer tumor samples. Ren’s data set used random hexamer primers, whereas the other 2 data sets used oligo(dT) primers. All these data sets are in Sequence Read Archive (SRA) file format and are publicly available from the National Center for Biotechnology Information (NCBI) repository. Table 1 shows the number of samples in each data set.

Table 1.

Data sets used in this study for malignant versus normal analysis with the number of samples in each data set.

Data set	No. of malignant samples	No. of matched normal samples	References
Kim	7	4	Kim et al[18]
Ren	14	14	Ren et al[13]
Kannan	10	10	Kannan et al[10]

In addition, we used the data set from Long et al[26] which contains prostate cancer progression stages using 104 samples from 100 patients. Table 2 shows the distribution of samples across various stages of prostate cancer in this data set.

Results

Using the proposed model, we conducted 2 different experiments: first, on malignant tumors versus their matched normal counterparts, and second, on samples from various stages of prostate cancer progression.

Malignant versus matched normal comparison

We tested and validated our proposed wrapper-based feature-selection method on 3 different data sets (Kannan’s, Kim’s, and Ren’s). Table 3 and Figure 2 show the differentially expressed transcripts (i.e., malignant versus normal) identified in each data set. Two of the identified transcripts (NM_019024 and NM_001242889, corresponding to the genes HEATR5B and DDC, respectively) were common to Kannan’s and Kim’s data sets, whereas one identified transcript (NR_024490, corresponding to the gene GABPB1-AS1) was common to Kim’s and Ren’s data sets.

Table 3.

Differentially expressed transcripts identified in Kannan’s, Kim’s, and Ren’s data sets.

Data set	Transcript ID	Gene name	Gene description
Kannan et al[10]	NM_019024	HEATR5B	HEAT repeat containing 5B
	NM_001242889	DDC	Dopa decarboxylase, transcript variant 6
	NM_152228	TAS1R3	Taste 1 receptor member 3
	NM_001204401	XIAP	X-linked inhibitor of apoptosis, transcript variant 2
Kim et al[18]	NR_024490	GABPB1-AS1	GABPB1 antisense RNA 1
	NM_001242889	DDC	Dopa decarboxylase, transcript variant 6
	NM_019024	HEATR5B	HEAT repeat containing 5B
	NM_032415	CARD11	Caspase recruitment domain family member 11, transcript variant 2
Ren et al[13]	NR_024490	GABPB1-AS1	GABPB1 antisense RNA 1
	NM_000424	KRT5	Keratin 5
	NM_001128826	NCS1	Neuronal calcium sensor 1, transcript variant 2
	NM_000494	COL17A1	Collagen type XVII alpha 1 chain
	NM_000700	ANXA1	Annexin A1
	NM_005567	LGALS3BP	Galectin 3 binding protein

Transcripts that start with prefix NM are mRNAs, whereas the ones that start with NR are lncRNAs.

Figure 2.

Genes corresponding to the differentially expressed transcripts identified in Kannan’s, Kim’s, and Ren’s data sets.

Figure 3 shows the average transcript abundance for malignant versus matched normal samples. The bars represent mean FPKM values for the 3 common transcripts selected. The averages of FPKM values were calculated for both malignant and matched normal samples in the 3 data sets in such a way that the results of the data sets are comparable on an equal footing. Transcript NM_001242889 (DDC) was found to be differentially expressed in malignant samples compared with matched normal samples. DDC has previously been shown to be over-expressed in cancer samples compared with their matched normal samples.[27] Similar patterns were observed in our results, which suggests that the DDC gene is a relevant biomarker for prostate cancer.

Figure 3.

Expression of transcripts in malignant versus matched normal samples.

Figure 4 shows the performance of 5 different classifiers in discriminating malignant samples from their matched normal counterparts in the 3 data sets. The classifiers were trained with default parameters and validated using 10-fold cross-validation. We used accuracy (ACC) and area under the receiver operating characteristic curve (AUC) to evaluate the classifiers; by both measures, the SVM classifier with a linear kernel outperformed all other classifiers on the 3 data sets. These classification results show that with a handful of transcripts (fewer than 10 for each data set), malignant tumors can be identified with almost perfect accuracy in most cases. This has an important implication in clinical contexts because effective and simple tools for diagnosis and prognosis of the disease can be developed.
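For readers unfamiliar with the two measures, the following sketch computes ACC and AUC from scratch. It is an illustration only: the labels and classifier scores are hypothetical, the 0.5 decision threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a randomly chosen malignant sample receives a higher score than a randomly chosen normal one), which may differ in implementation detail from the tool used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true labels (1 = malignant, 0 = normal) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc_value = accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
print(acc_value, auc_value)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes the ranking quality over all thresholds, which is why the two measures are reported side by side.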

Figure 4.

Performance of 5 different classifiers for matched normal versus malignant classification.

Prostate cancer progression

We applied the proposed method to compare different stages of prostate cancer using the data set from Long et al[26] for this comparison. Our method identified 44 transcripts expressed differentially between pairs of stages (e.g., T1, T2, T3, and T4) or sub-stages of prostate cancer progression (e.g., T2a, T2b, and T2c), collectively. Each pair of consecutive stages, namely, T1c-T2, T2-T2a, T2a-T2b, T2b-T2c, T2c-T3a, T3a-T3b, and T2c-T3/T4 was fed to a classifier, modeled as a 2-class dichotomizer that distinguishes stage A versus stage B, for an A–B pair. Then, mRMR as a wrapper-based feature selection approach was applied to the data set. SVM was used as a classifier with default parameters to obtain the best set of features, where the performance measure is accuracy. As a result of applying the feature selection and classification algorithms, each pair of consecutive stages led to 6, 7, 6, 5, 5, 3, and 12 differentially expressed transcripts, respectively. Tables 4 to 10 provide a list and the corresponding description of the top discriminative transcripts between different pairs of stages/sub-stages of prostate cancer progression. As shown in the tables, the largest number of discriminative transcripts was found between the T2c-T3/T4 pairwise stages.

Table 4.

The list of the transcripts that differentiate stage T1C from T2.

Transcript	Chr.	Gene	Gene description
NR_003669	16	MT1IP	Metallothionein 1I, pseudogene (MT1IP), transcript variant 1
NM_001160393	11	TRPT1	tRNA phosphotransferase 1 (TRPT1), transcript variant 6
NM_001161345	12	CHFR	Checkpoint with forkhead and ring finger domains, E3 ubiquitin protein ligase (CHFR), transcript variant 2
NM_052857	17	ZNF830	Zinc finger protein 830
NR_003594	8	REXO1L2P	RNA exonuclease one homolog (S. cerevisiae)-like 2
NR_033240	14	SLC25A21	SLC25A21 antisense RNA 1

Table 10.

The list of the transcripts that differentiate stage T2C from T3/T4.

Transcript	Chr.	Description	Gene
NM_001257413	17	IKAROS family zinc finger 3 (Aiolos), transcript variant 12	IKZF3
NM_003940	3	Ubiquitin-specific peptidase 13 (isopeptidase T-3)	USP13
NM_001142274	2	Cytoplasmic linker associated protein 1, transcript variant 3	CLASP1
NM_001199165	17	Centrosomal protein 112kDa, transcript variant 3	CEP112
NM_052965	1	tRNA splicing endonuclease subunit, transcript variant 1	TSEN15
NM_001195283	14	Feline leukemia virus subgroup C cellular receptor family, member 2, transcript variant 2	FLVCR2
NM_001023567	15	Golgin A8 family, member B, transcript variant 1	GOLGA8B
NM_001143766	10	Zinc finger protein 438, transcript variant 1	ZNF438
NR_003004	4	Small Cajal body-specific RNA 22	SCARNA22
NM_017753	9	Lipid phosphate phosphatase-related protein type 1, transcript variant 2	LPPR1
NM_000959	1	Prostaglandin F receptor (FP), transcript variant 1	PTGFR
NM_004772	5	Neuronal regeneration related protein, transcript variant 1	NREP

The results of applying the mRMR feature selection method to identify the most differentially expressed transcripts between pairs of consecutive classes were compared with the results obtained after applying CuffDiff,[6] a tool that uses statistical methods to identify differentially expressed transcripts. We selected CuffDiff rather than other state-of-the-art differential expression analysis tools because it outperforms them in isoform analysis, despite reports that it is less accurate and slower than other tools.[28] In each pair of consecutive stages, the proposed model selected fewer transcripts than the CuffDiff model (Table 11). We evaluated the performance of the 2 models using ACC, F-measure (FM), Matthews correlation coefficient (MCC), and AUC. For classification, we used the cost-sensitive meta-classifier model along with a random forest classifier (100 trees) with the same settings for both models. In each case, we obtained a much higher performance using transcripts selected by our feature-selection method than using those selected by CuffDiff. Importantly, we observed no overlap between transcripts detected by the 2 models, stressing the importance of the new method for isolating hits as biomarkers for progression of prostate cancer.

Table 11.

Comparison between CuffDiff and our feature-selection method for identifying differentially expressed transcripts between each pair of consecutive stages of prostate cancer.

Stage	Method	No. of selected transcripts	ACC	FM	MCC	AUC
T1c-T2 (14 versus 10)	CuffDiff	21	70.8%	0.710	0.410	0.846
T1c-T2 (14 versus 10)	Proposed method	6	95.8%	0.958	0.917	0.971
T2-T2a (10 versus 23)	CuffDiff	43	69.7%	0.650	0.159	0.580
T2-T2a (10 versus 23)	Proposed method	7	93.9%	0.939	0.857	0.970
T2a-T2b (23 versus 11)	CuffDiff	35	64.7%	0.601	0.068	0.634
T2a-T2b (23 versus 11)	Proposed method	6	85.3%	0.851	0.657	0.826
T2b-T2c (11 versus 30)	CuffDiff	38	65.8%	0.647	0.078	0.645
T2b-T2c (11 versus 30)	Proposed method	5	87.8%	0.880	0.699	0.885
T2c-T3a (30 versus 8)	CuffDiff	29	73.7%	0.722	0.130	0.612
T2c-T3a (30 versus 8)	Proposed method	5	89.4%	0.895	0.683	0.948
T3a-T3b (8 versus 9)	CuffDiff	27	58.8%	0.588	0.181	0.750
T3a-T3b (8 versus 9)	Proposed method	3	94.1%	0.941	0.887	1.000
T2c-T3/T4 (30 versus 17)	CuffDiff	49	57.4%	0.568	0.055	0.483
T2c-T3/T4 (30 versus 17)	Proposed method	12	95.7%	0.957	0.908	0.988

Abbreviations: ACC, accuracy; FM, F-measure; MCC, Matthews correlation coefficient; AUC, area under receiver operating characteristic curve.
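The F-measure and MCC columns of Table 11 can be related to a confusion matrix with the short sketch below. It uses the standard binary definitions; the tool used in the study may average per-class values differently, and the confusion-matrix counts in the example are hypothetical rather than taken from the study.

```python
from math import sqrt


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when a marginal is empty."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Hypothetical confusion matrix for a 24-sample problem with one error.
tp, fn, fp, tn = 14, 0, 1, 9

acc_value = (tp + tn) / (tp + tn + fp + fn)
fm_value = f_measure(tp, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(round(acc_value, 3), round(fm_value, 3), round(mcc_value, 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes have very different sizes, as in several of the stage pairs above.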

Figures 5 to 11 depict the transcripts listed in Tables 4 to 10, respectively, across different stages of prostate cancer. The x-axis shows the stages of prostate cancer, whereas the y-axis shows the median of FPKM values of samples in each stage. Of particular interest are transcripts that are significantly altered at the critical transition from stage T2 to T3/T4 (Figures 9 and 11). DOCK9 (Figure 9) and FLVCR2, IKZF3, USP13, PTGFR, and CLASP1 (Figure 11) are all transcripts that significantly increase at the T2 transition and remain elevated in advanced prostate cancer stages. These may represent novel biomarkers, either individually or combined as a signature. They may also represent novel targets for therapeutic intervention.
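The per-stage summary plotted in these figures (median FPKM of the samples in each stage) can be computed as in the minimal sketch below. The function name and the FPKM values are hypothetical and serve only to show the grouping step.

```python
from collections import defaultdict
from statistics import median


def median_by_stage(stages, fpkm_values):
    """Group one transcript's FPKM values by stage and return the median per stage."""
    groups = defaultdict(list)
    for stage, value in zip(stages, fpkm_values):
        groups[stage].append(value)
    return {stage: median(values) for stage, values in groups.items()}


# Hypothetical FPKM values of a single transcript in five samples.
stages = ["T2c", "T2c", "T2c", "T3/T4", "T3/T4"]
values = [1.0, 3.0, 2.0, 8.0, 6.0]

summary = median_by_stage(stages, values)
print(summary)
```

The median is less sensitive than the mean to a single sample with extreme expression, which matters for stages represented by only a few samples.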

Figure 5.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T1c and T2.

Figure 11.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2c and T3/T4.

Figure 9.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2c and T3a.

Figure 6.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2 and T2a.

Figure 7.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2a and T2b.

Figure 8.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T2b and T2c.

Figure 10.

Stage-specific expression level of transcripts that have been selected based on their significant expression changes between stages T3a and T3b.

Discussion

Identifying novel biomarkers to clearly distinguish between low and high-risk prostate cancer progression is a significant step toward directing treatment strategies that are efficacious yet minimally invasive. Using the power of NGS and machine learning techniques, we found several transcripts that have the potential to serve as prognostic indicators in guiding treatment decisions. These transcripts constitute a genomic and transcriptomic signature of prostate cancer and its progression, which has never been characterized before. Further studies using wet-lab experiments and clinical assays will be essential to confirm the presence of these biomarkers in particular biological processes involved in the disease and its progression. Some of our isolated genes have previously been linked to other forms of cancer. For example, NREP (P311) is a transcript upregulated in stages T3 to T4 as compared with T2c. Although there are no published reports on the role of NREP on prostate cancer, it has been shown to be involved in glioma motility and invasion via the reorganization of the actin cytoskeleton at the periphery of these cells.[29] Upregulation of NREP expression from stages T3 to T4 is consistent with the invasion of prostate cancer cells extending beyond the prostatic capsule during this stage. Our results also revealed upregulation of the gene expression of the small Cajal body-specific RNA (SCARNA22) from stages T2c to T3/T4. SCARNA22 is a noncoding RNA involved in the maturation of other RNA molecules, and along with other small nucleolar RNA, it has been linked to human cancers.[30] Typically located in the introns of host genes, upregulation of SCARNA22 was found in multiple myeloma harboring chromosomal translocations and may suppress oxidative stress, facilitate cell proliferation, and protect cells from the effects of chemotherapy.[31] Our study is the first to link SCARNA22 with prostate cancer and progression of the disease. 
In particular, we have isolated a set of transcripts that are significantly altered at the critical transition between stages T2 and T3/T4 and remain elevated. These are transcripts from the genes DOCK9, FLVCR2, IKZF3, USP13, PTGFR, and CLASP1. In the Human Protein Atlas, Dock9, Clasp1, and USP13 proteins are highly expressed in prostate cancer tissues. Dock9 is a Rho GEF responsible for activating Rho-GTPases and is known to be implicated in tumorigenesis.[32,33] Although the protein atlas has not detected PTGFR as highly stained in prostate cancer, gene expression of PTGFR is associated with cell proliferation and in vivo progression to castration-recurrent prostate cancer, an end stage of the disease.[34] PTGFR is a membrane receptor for prostaglandin F2alpha, a potent luteolytic agent. It has previously been shown to be highly expressed in endometrial adenocarcinomas.[35] In ovarian cancer, overexpression of PTGFR stimulates the spontaneous development and secretion of autoantibodies against the protein, as detected in the serum samples of patients with cancer.[36] Autoantibodies against PTGFR may serve as biomarkers for early serological detection of the disease. Overexpression of PTGFR has also been reported in human tumor-endothelial cells of renal cell carcinoma, where it is believed to be involved in tumor angiogenesis.[37] Whether these transcripts or their protein products can be used alone or in combination as a prognostic indicator for prostate cancer is an important next step of this work. It is also interesting to consider that these protein products may represent novel drug targets for advanced disease.

Table 5.

The list of the transcripts that differentiate stage T2 from T2A.

Transcript	Chr.	Gene	Gene description
NM_004860	17	FXR2	Fragile X mental retardation, autosomal homolog 2
NM_052850	19	GADD45GIP1	Growth arrest and DNA-damage-inducible, gamma interacting protein 1
NM_001272095	16	STX4	Syntaxin 4, transcript variant 1
NM_001261390	17	CALCOCO2	Calcium binding and coiled-coil domain 2, transcript variant 1
NM_153274	1	BEST4	Bestrophin 4
NM_001252641	19	URI1	Prefoldin-like chaperone, transcript variant 3
NR_038352	5	DCP2	Decapping mRNA 2, transcript variant 3

Table 6.

The list of the transcripts that differentiate stage T2A from T2B.

Transcript	Chr.	Gene	Gene description
NM_032023	10	RASSF4	Ras association (RalGDS/AF-6) domain family member 4
NM_080792	20	SIRPA	Signal-regulatory protein alpha (SIRPA), transcript variant 3
NM_000095	19	COMP	Cartilage oligomeric matrix protein
NM_003102	4	SOD3	Superoxide dismutase 3, extracellular
NM_080797	20	DIDO1	Death inducer-obliterator 1, transcript variant 3
NM_002725	1	PRELP	Proline/arginine-rich end leucine-rich repeat protein, transcript variant 1

Table 7.

The list of the transcripts that differentiate stage T2B from T2C.

Transcript	Chr.	Gene	Gene description
NM_001711	X	BGN	Homo sapiens biglycan
NM_032023	10	RASSF4	Ras association (RalGDS/AF-6) domain family member 4
NM_001014443	1	USP21	Ubiquitin-specific peptidase 21, transcript variant 3
NM_021724	17	NR1D1	Nuclear receptor subfamily 1 group D, member 1
NM_012098	9	ANGPTL2	Angiopoietin-like 2

Table 8.

The list of the transcripts that differentiate stage T2C from T3A.

Transcript	Chr.	Description	Gene
NM_001198979	1	Small ArfGAP2 (SMAP2), transcript variant 2	SMAP2
NM_001099285	2	Prothymosin, alpha (PTMA), transcript variant 1	TMSA
NM_001198899	1	YY1 associated protein 1 (YY1AP1), transcript variant 6	YY1AP1
NM_001130048	13	Dedicator of cytokinesis 9 (DOCK9), transcript variant 2	DOCK9
NM_000899	12	KIT ligand (KITLG), transcript variant b	KITLG

Table 9.

The list of the transcripts that differentiate stage T3A from T3B.

Transcript	Chr.	Description	Gene
NR_034169	2	Family with sequence similarity 133 member D pseudogene	FAM133DP
NM_015380	22	Sorting and assembly machinery component 50 homolog, protein coding	SAMM50
NR_046417	15	Olfactory receptor family 4 subfamily F member 13 pseudogene	OR4F13P
