Enrico Capobianco1, Camilo Valdes2, Samanta Sarti3, Zhijie Jiang2, Laura Poliseno4, Nicolas F Tsinoremas2,5.
Abstract
We studied the transcriptome landscape of skin cutaneous melanoma (SKCM) using 103 primary tumor samples from TCGA, and measured the expression levels of both protein coding genes and non-coding RNAs (ncRNAs). In particular, we emphasized pseudogenes potentially relevant to this cancer. When the profiles were catalogued by known biotypes, the RNA-Seq methods employed yielded only a small consensus of significant biotypes. We therefore designed an approach to reconcile the profiles from all methods following a simple strategy: we selected genes that were confirmed as differentially expressed by the ensemble predictions obtained in a regression model. The main advantages of this approach are: 1) Selection of a high-confidence gene set identifying relevant pathways; 2) Use of a regression model whose covariates embed all method-driven outcomes to predict an averaged profile; 3) Method-specific assessment of prediction power and significance. Furthermore, the approach can be generalized to any biological system for which noisy RNA-Seq profiles are computed. As our analyses concerned bio-annotations of both high-quality protein coding genes and ncRNAs, we considered the associations between pseudogenes and parental genes (targets). Among the candidate targets that were validated, we identified PINK1, which is studied in patients with Parkinson's disease and cancer (especially melanoma).
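The regression idea in the abstract (method-driven outcomes as covariates, an averaged profile as response) can be illustrated with a minimal sketch. This is not the authors' LRM script: the function name `fit_linear_model`, the encoding of method outcomes as binary DE calls, and the toy values are all assumptions made for illustration, and ordinary least squares via NumPy stands in for whatever fitting procedure was actually used.

```python
import numpy as np

def fit_linear_model(calls, avg_profile):
    """Ordinary least squares of an averaged profile on per-method outcomes.

    calls:       (n_genes, n_methods) array, here 1 if a method called the gene DE
                 (hypothetical encoding, not taken from the paper)
    avg_profile: (n_genes,) array, the averaged expression change per gene
    """
    X = np.asarray(calls, dtype=float)
    y = np.asarray(avg_profile, dtype=float)
    # Prepend a column of ones so the model includes an intercept
    design = np.column_stack([np.ones(len(y)), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

# Hypothetical toy data: 6 genes (rows) scored by 3 methods (columns)
calls = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])
avg_profile = np.array([3.5, 1.5, 1.5, 2.5, -0.5, 0.5])

coef, fitted = fit_linear_model(calls, avg_profile)
residuals = avg_profile - fitted
print("intercept and per-method coefficients:", np.round(coef, 3))
print("largest absolute residual:", float(np.abs(residuals).max()))
```

In such a model, each coefficient gives one method's contribution to the predicted profile, which is one way to read the "method-specific assessment" mentioned above.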
Year: 2017 PMID: 29229974 PMCID: PMC5725464 DOI: 10.1038/s41598-017-17337-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Sankey flow diagram. The outcomes generated by our approach are visualized step by step. The first stage, at the left side, lists the methods applied to the RNA-Seq data with the corresponding profile, namely the number of DE biotypes (both genes and ncRNAs) detected in each individual application. A total profile of 9,729 DE biotypes is obtained from the sum of all such detections: Limma (5,290) and DESeq (4,326) provided the largest numbers, CuffDiff (3,568) and GeneSpring (2,758) smaller numbers, and NOISeq was the most conservative method (511) (Supplementary Tables 5.1.1-5). Two models are then reported, LRM and PCA, which deliver 896 and 752 DE biotypes, respectively, thus reducing the overall profile dramatically (by more than one order of magnitude). The rest of the diagram accounts for specific biotypes, relatively more represented by protein coding genes (906), antisense (216), lincRNA (84), and pseudogenes (136); in 129 cases the latter are associated with related parental genes (145) (Supplementary Fig. 5.2.1). Bioannotations (pathways, gene families) are indicated downstream as the endpoint of the diagram.
Figure 2. RNA-Seq flowchart. The samples were processed according to the five selected methods, whose quantifications included both read counts (simply the number of reads overlapping a given feature such as a gene) and FPKM (Fragments Per Kilobase of exon per Million reads). FPKM makes it possible to compare genes of different lengths, and 'per million reads' means that the value is normalized against the library size. Consensus was established between significant detections in the read-count and FPKM scenarios before reaching a global result (overall consensus). The mutational profile at the right side was implemented under simplified algorithmic conditions (CuffDiff), and for two major mutations (BRAF, NRAS) (Supplementary Tables 10.1-5). The validations refer to candidates coming from all the considered scenarios.
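The FPKM normalization described in this caption can be made concrete with a short sketch. The function name `fpkm` and the example counts are hypothetical; the formula is the standard definition (fragments, divided by feature length in kilobases and by library size in millions).

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments Per Kilobase of exon per Million mapped fragments.

    fragments:       fragments overlapping the feature (e.g. a gene)
    length_bp:       exonic length of the feature in base pairs
    total_fragments: library size (all mapped fragments in the sample)
    """
    if length_bp <= 0 or total_fragments <= 0:
        raise ValueError("length_bp and total_fragments must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (length_bp * total_fragments)

# Hypothetical example: a long gene collects twice the reads of a short one
# in the same library, yet both have the same length-normalized expression.
library_size = 10_000_000
short_gene = fpkm(fragments=500, length_bp=2_000, total_fragments=library_size)
long_gene = fpkm(fragments=1_000, length_bp=4_000, total_fragments=library_size)
print(short_gene, long_gene)
```

Raw read counts would rank the longer gene higher, whereas the normalized values are equal, which is the comparison across gene lengths that the caption refers to.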
Figure 3. Venn diagrams: multiple detection scenarios. Detections refer to all DE biotypes. (A) Overall consensus. (B) Pseudogene-parental gene detections. (C) LRM-driven detections. (D) PCA-driven detections. (E) LRM vs PCA comparison. (F) Biotype classifications in each model. Thresholded coefficients are obtained by boxplot-derived criteria: from 1.5 × IQR for LRM (Supplementary Tables 8.2.5-8), and from most outlying values for PCA (Supplementary Fig. 8.1.1-2).
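A boxplot-derived 1.5 × IQR criterion of the kind mentioned in this caption can be sketched as follows. The sketch assumes the conventional Tukey fences (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged); the exact rule used by the authors is given in their supplementary material, and the function name `iqr_outlier_mask` and the coefficient values are hypothetical.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Flag values outside the boxplot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical model coefficients: most are near zero, one is extreme
coefficients = np.array([-0.2, -0.1, 0.0, 0.1, 0.2, 0.15, -0.05, 3.0])
mask = iqr_outlier_mask(coefficients)
selected = coefficients[mask]
print("coefficients beyond the 1.5 x IQR fences:", selected)
```

Changing `k` tightens or loosens the fences, which corresponds to adopting different IQR levels.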
Figure 4. LRM output. Three panels report the performance of LRM. (A) Tree-map view of LRM vs the consensus DE space. The comparisons are localized in all regions in which individual methods and combinations of methods (from 2 to 5) have delivered their detections. LRM always occupies just a small fraction of the space. (B) Screenshot of the results from the LRM script in R. (C) Comparisons between multiple LRM statistics used to establish the predicted profile, given the adopted IQR levels (top right inset).
Figure 5. Pseudogene-parental gene analyses. (A) Joint profiles of pseudogenes and parental genes from consensus (with significance bars). Insets show examples of biotype associations. (B) Scatter plots at various expression levels, and empirical correlation patterns. (C) Protein network of DE target parental genes with associated at least expressed pseudogenes (Supplementary Fig. 8.6.3 and Supplementary Table 8.6.5). Enlarged networks in Supplementary Fig. 8.6.4.
Pathway terms ranked by methods (left) and LRM. * marks cancer-specific terms.
Cell Adhesion Molecules (6.53E-10); ECM-Receptor Interaction (5.68E-06); Focal Adhesion (1.54E-05); PI3K-Akt Signaling Pathway (2.52E-03); p53 Signaling Pathway (9.31E-03); ECM Organization (3.41E-02)

Phagosome (7.42E-12); Complement System (8.09E-11); Protein Digestion and Absorption (6.53E-10); Leishmaniasis (3.59E-08); Collagen Formation (1.26E-08); Allograft Rejection (1.24E-07); Viral myocarditis (1.55E-07); Graft-versus-host disease (1.75E-07); Type I diabetes mellitus (3.19E-07); Asthma (1.45E-07); Rheumatoid Arthritis (6.46E-06); Cell Cycle (5.68E-06); Autoimmune thyroid disease (4.02E-05); IgA Intestinal Immune Network Production (1.31E-05); Tuberculosis (4.31E-05); Hematopoietic cell lineage (2.02E-04); Pertussis (2.12E-04); Systemic Lupus Erythematosus (2.60E-04); HTLV-I infection (2.88E-03); Spinal Cord Injury (1.19E-03); Inflammatory Bowel Disease (IBD) (2.16E-03); Toxoplasmosis (5.02E-03); Natural killer cell mediated cytotoxicity (6.95E-03); Insulin Processing (2.43E-03); Leukocyte transendothelial migration (1.27E-02)

Classification by Model Biotypes.
| Biotypes | DESeq | NOISeq | CuffDiff | Limma | GeneSpring | LRM | PCA |
|---|---|---|---|---|---|---|---|
| 3′ Overlapping ncRNA | 4 | 0 | 1 | 5 | 1 | 0 | 1 |
| Antisense | 412 | 14 | 197 | 646 | 150 | 65 | 157 |
| lincRNA | 349 | 17 | 243 | 574 | 96 | 50 | 41 |
| miRNA | 14 | 0 | 25 | 5 | 11 | 2 | 11 |
| misc_RNA | 6 | 0 | 51 | 12 | 16 | 1 | 1 |
| Parental Genes | 355 | 62 | 431 | 549 | 631 | 115 | 49 |
| Processed Transcript | 213 | 8 | 169 | 302 | 55 | 26 | 36 |
| Protein Coding | 2,811 | 438 | 2,390 | 2,962 | 2,248 | 648 | 418 |
| Pseudogene | 468 | 32 | 250 | 618 | 151 | 85 | 72 |
| Sense Intronic | 25 | 0 | 9 | 107 | 14 | 9 | 1 |
| Sense Overlapping | 16 | 0 | 5 | 26 | 2 | 0 | 0 |
| snRNA | 2 | 0 | 2 | 4 | 2 | 2 | 2 |
| snoRNA | 6 | 0 | 1 | 23 | 12 | 7 | 4 |
Gene Families Associated with Protein Coding Genes.
| Gene Family | DESeq | NOISeq | CuffDiff | Limma | GeneSpring | PCA | LRM |
|---|---|---|---|---|---|---|---|
| Tumor Suppressors | 4 | 0 | 8 | 9 | 6 | 1 | 1 |
| Oncogenes | 52 | 5 | 57 | 65 | 56 | 12 | 14 |
| Translocated Cancer Genes | 48 | 5 | 54 | 59 | 51 | 10 | 14 |
| Protein Kinases | 70 | 7 | 79 | 101 | 82 | 15 | 18 |
| Cell Differentiation Markers | 112 | 27 | 94 | 105 | 45 | 23 | 22 |
| Homeodomain Proteins | 48 | 2 | 18 | 36 | 7 | 4 | 3 |
| Transcription Factors | 179 | 11 | 127 | 176 | 163 | 23 | 16 |
| Cytokines and Growth Factors | 127 | 25 | 83 | 83 | 26 | 23 | 13 |
Notes: each cell gives the number of differentially expressed detections. ENSEMBL classification (Table 2); other annotations (Table 3).
Figure 6. Validated evidence: from consensus (A) and LRM (B) (details in Supplementary Fig. 9.1 and Supplementary Text 9.2).