Ana Carolina Mello1,2, Martiela Freitas1,2,3, Laura Coutinho1,2,4, Tiago Falcon1,2, Ursula Matte1,2,5.
Abstract
Uterine corpus endometrial carcinoma (UCEC) is the second most common type of gynecological tumor. Several recent studies have shown the potential of different ncRNAs as biomarkers for prognosis and diagnosis in different types of cancer, including UCEC. Thus, we hypothesized that long noncoding RNAs (lncRNAs) could discriminate solid primary tumor (TP) from normal adjacent tissue (NT) in UCEC with high accuracy. We performed an in silico differential expression analysis comparing TP and NT in a set of samples downloaded from The Cancer Genome Atlas (TCGA) database, targeting highly differentially expressed lncRNAs that could potentially serve as gene expression markers. All analyses were performed in R. Receiver operating characteristic (ROC) analyses and both supervised and unsupervised machine learning indicated a set of 14 lncRNAs that may serve as biomarkers for UCEC. Functions and putative pathways were assessed through a coexpression network and target enrichment analysis.
Year: 2020 PMID: 32420338 PMCID: PMC7199595 DOI: 10.1155/2020/3968279
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Boxplot after normalization of total RNA samples (training set data). Primary tumor samples (TP) (sky blue); adjacent normal tissue samples (NT) (purple). Black line inside the boxes indicates the median position. Dotted lines indicate the expression deviations.
Figure 2. Differentially expressed genes highlighted. (a) Volcano plot of the differential expression analysis of total RNA in primary tumor (TP) compared with adjacent tissue (NT). TP upregulated transcripts (67 genes) with log fold change (logFC) > 5 (red). TP downregulated transcripts (124 genes) with logFC < −5 (green). Horizontal purple line indicates the −log10(FDR) = 2 (FDR = 0.01) cutoff. Grey vertical lines indicate the logFC = −5 and logFC = 5 cutoffs. (b) Heat map of the expression of the 50 most differentially expressed transcripts. Dendrogram: the clustering of TP samples (sky blue); the clustering of NT samples (purple). Top left: color key for the expression quantification.
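The cutoffs described for the volcano plot can be expressed as a simple filter. The following is a minimal Python sketch, not the authors' R code; the function name `classify_transcript`, the label strings, and the example rows are hypothetical and were chosen only for illustration.

```python
import math

def classify_transcript(log_fc, fdr, lfc_cutoff=5.0, fdr_cutoff=0.01):
    """Label one transcript using the volcano-plot cutoffs from the caption."""
    if fdr >= fdr_cutoff:
        return "not highlighted"   # fails the FDR = 0.01 cutoff
    if log_fc > lfc_cutoff:
        return "up in TP"          # red points in the plot
    if log_fc < -lfc_cutoff:
        return "down in TP"        # green points in the plot
    return "not highlighted"       # passes FDR but |logFC| <= 5

# Hypothetical (gene, logFC, FDR) rows; these are NOT values from the study.
rows = [
    ("geneA", 6.2, 1e-8),
    ("geneB", -5.4, 3e-4),
    ("geneC", 7.0, 0.2),
    ("geneD", 1.1, 1e-10),
]
labels = {gene: classify_transcript(lfc, fdr) for gene, lfc, fdr in rows}

# The horizontal line in the plot sits at -log10(0.01) = 2.
neg_log10_cutoff = -math.log10(0.01)
```

Both conditions must hold for a transcript to be highlighted, which is why `geneC` (large fold change, high FDR) and `geneD` (low FDR, small fold change) are left out.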
Long noncoding RNAs that discriminate TP from NT samples.
| lncRNA | logFC | FDR | AUC |
|---|---|---|---|
| | 5.88492063377296 | 2.37514982980919 | 0.913 |
| | −5.05883480566089 | 2.3408685424323 | 0.820 |
| | −5.11057542483241 | 2.25688066033936 | 0.892 |
| | −5.15921206745447 | 8.93670161058092 | 0.744 |
| | −5.16386849714406 | 8.14134337771604 | 0.717 |
| | −5.17825782522368 | 2.06667461632931 | 0.633 |
| | −5.20814071478136 | 4.04097876674846 | 0.601 |
| | −5.29670533561579 | 2.78034526756707 | 0.535 |
| | −5.38175032295688 | 4.77496645881152 | 0.659 |
| | −5.38578795348022 | 2.25613841195626 | 0.694 |
| | −5.64269979170812 | 0.000474000179607125 | 0.516 |
| | −5.92292108357159 | 4.02699887083864 | 0.624 |
| | −6.32585914185104 | 9.39257125151877 | 0.643 |
| | −7.78029289441235 | 2.81489971171908 | 0.471 |
| | −7.93247628000136 | 1.81453269242651 | 0.674 |
| | −8.15747923329883 | 4.51757852734348 | 0.511 |
| | −9.64327159184569 | 2.76479236381279 | 0.578 |
Figure 3. ROC curves of the lncRNAs that obtained an AUC greater than 0.7; sensitivity (true positive rate) (y axis); 1 − specificity (false positive rate) (x axis).
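For readers unfamiliar with the quantities on the ROC axes, the sketch below shows one way to compute a single ROC point and the AUC from expression scores. It is an illustrative Python example, assuming the pairwise (Mann-Whitney) definition of AUC; it is not the R routine used in the study, and the function names and toy data are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) at one threshold.

    labels: 1 for the positive class (e.g. TP), 0 for the negative class (e.g. NT).
    A sample is called positive when its score is >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fp / n_neg, tp / n_pos  # x = 1 - specificity, y = sensitivity

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy expression values (hypothetical, not from the study).
scores = [0.9, 0.4, 0.8, 0.3]
labels = [1, 1, 0, 0]
auc = roc_auc(scores, labels)
auc_flipped = roc_auc([-s for s in scores], labels)  # reversed direction
fpr, tpr = roc_point(scores, labels, threshold=0.5)
```

Reversing the direction of a marker turns an AUC of a into 1 − a, so an AUC well below 0.5 can reflect a marker whose expression is lower, rather than higher, in the positive class.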
SVM analysis for the three tested sets of lncRNAs.
| Set | Accuracy | CI (95%) | P value | Kappa | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| First set | 0.9583 | 0.7888, 0.9989 | 1.49 × 10⁻⁶ | 0.9167 | 0.9167 | 1.0000 |
| Second set | 0.9167 | 0.6262, 0.9526 | 1.794 × 10⁻⁵ | 0.8333 | 0.9167 | 0.9167 |
| Third set | 0.9167 | 0.73, 0.9897 | 1.794 × 10⁻⁵ | 0.8333 | 0.9167 | 0.9167 |
The first set comprises the 14 lncRNAs; the second set comprises the five lncRNAs with AUC > 0.7; the third set comprises the two lncRNAs with the highest AUC, as shown in Table 1.
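The performance measures reported for the SVM models all derive from a 2 × 2 confusion matrix. The following Python sketch illustrates the arithmetic. The counts used (11 true positives, 1 false negative, 12 true negatives, 0 false positives) are an assumption chosen to be consistent with the accuracy, kappa, sensitivity, and specificity reported for the first set; the source does not state the counts or which class was treated as positive. The one-sided binomial test against a 0.5 no-information rate is likewise an assumption about how a P value for accuracy could be obtained.

```python
from math import comb

def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, Cohen's kappa, sensitivity, and specificity from 2x2 counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

def binomial_p_value(correct, n, null_rate=0.5):
    """One-sided P(X >= correct) for X ~ Binomial(n, null_rate)."""
    return sum(
        comb(n, k) * null_rate**k * (1 - null_rate) ** (n - k)
        for k in range(correct, n + 1)
    )

# Assumed counts (see lead-in); 24 test samples, 12 per class.
metrics = confusion_metrics(tp=11, fn=1, tn=12, fp=0)
p_value = binomial_p_value(correct=23, n=24)
```

With balanced classes the chance agreement is 0.5, so kappa reduces to 2 × accuracy − 1, which is why kappa and accuracy move together across the three sets.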
Figure 4. Scatter plot of the fitted support vector machine model based on the expression of the top two lncRNAs. Classes: primary tumor (TP) and adjacent tissue (NT). Support vectors: x. Expression values are on a log10 scale.
Figure 5. Hierarchical clustering analysis using the training set (23 patients, each with TP and NT samples) based on the expression of the 14 lncRNAs with |log fold change| > 5. The prefix before each sample ID indicates whether the sample belongs to the TP or NT group. TP: primary solid tumor. NT: adjacent normal tissue. Red values (au) represent the group support. Green values (bp) represent the bootstrap support. Grey values (edge) represent the limit of the branches. The red square highlights the clustering of NT samples.
Figure 6. Representative gene coexpression networks. Blue circles are the lncRNAs; green circles are the miRNAs; purple circles are the highlighted genes; white circles are the genes. Each edge represents an r > 0.7 or r < −0.7 and p value < 0.05. Red edges highlight the discussed pathways.
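The edge rule in the network caption (|r| > 0.7) can be illustrated with a short Python sketch. It assumes that r is the Pearson correlation coefficient, which the caption does not state, and it omits the accompanying p value < 0.05 test for brevity. The gene names and expression vectors are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coexpression_edges(expr, r_cutoff=0.7):
    """Return (gene_a, gene_b, r) for every pair with |r| above the cutoff."""
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson_r(expr[a], expr[b])
        if abs(r) > r_cutoff:
            edges.append((a, b, r))
    return edges

# Hypothetical expression profiles across five samples (not study data).
expr = {
    "lncX": [1, 2, 3, 4, 5],
    "geneY": [2, 4, 6, 8, 10.5],   # rises with lncX
    "geneZ": [5, 4, 3, 2, 1],      # falls as lncX rises
    "geneW": [1, -1, 1, -1, 1],    # unrelated to lncX
}
edges = coexpression_edges(expr)
edge_pairs = {(a, b) for a, b, _ in edges}
```

Both strong positive and strong negative correlations produce an edge, so `geneZ` is connected to `lncX` even though their expression moves in opposite directions.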