Francisco Garcia-Garcia, Joaquin Panadero, Joaquin Dopazo, David Montaner.
Abstract
MOTIVATION: Functional interpretation of miRNA expression data is currently done in a three-step procedure: select differentially expressed miRNAs, find their target genes, and carry out gene set overrepresentation analysis. Nevertheless, major limitations of this approach have already been described at the gene level, while some new ones arise in the miRNA scenario. Here, we propose an enhanced methodology that builds on the well-established gene set analysis paradigm. Evidence for differential expression at the miRNA level is transferred to a gene differential inhibition score which is easily interpretable in terms of gene sets or pathways. Such transferred indexes account for the additive effect of several miRNAs targeting the same gene, and also incorporate cancellation effects between cases and controls. Together, these two desirable characteristics allow for more accurate modeling of regulatory processes.
Year: 2016 PMID: 27324197 PMCID: PMC5018374 DOI: 10.1093/bioinformatics/btw334
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Analyzed datasets
| ID | Total | Cases | Controls | Paired | Description |
|---|---|---|---|---|---|
| BLCA | 271 | 252 | 19 | 19 | Bladder Urothelial Carcinoma |
| BRCA | 807 | 720 | 87 | 86 | Breast invasive carcinoma |
| CESC | 218 | 215 | 3 | 3 | Cervical squamous cell carcinoma |
| COAD | 243 | 235 | 8 | 0 | Colon adenocarcinoma |
| ESCA | 113 | 102 | 11 | 11 | Esophageal carcinoma |
| HNSC | 519 | 475 | 44 | 43 | Head and Neck squamous cell carcinoma |
| KICH | 91 | 66 | 25 | 25 | Kidney Chromophobe |
| KIRC | 311 | 240 | 71 | 68 | Kidney renal clear cell carcinoma |
| KIRP | 245 | 211 | 34 | 34 | Kidney renal papillary cell carcinoma |
| LIHC | 283 | 233 | 50 | 49 | Liver hepatocellular carcinoma |
| LUAD | 474 | 428 | 46 | 39 | Lung adenocarcinoma |
| LUSC | 376 | 331 | 45 | 45 | Lung squamous cell carcinoma |
| PAAD | 100 | 96 | 4 | 4 | Pancreatic adenocarcinoma |
| PCPG | 182 | 179 | 3 | 3 | Pheochromocytoma and Paraganglioma |
| PRAD | 117 | 100 | 17 | 17 | Prostate adenocarcinoma |
| READ | 93 | 90 | 3 | 0 | Rectum adenocarcinoma |
| SKCM | 75 | 74 | 1 | 0 | Skin Cutaneous Melanoma |
| STAD | 345 | 306 | 39 | 39 | Stomach adenocarcinoma |
| THCA | 558 | 499 | 59 | 59 | Thyroid carcinoma |
| UCEC | 418 | 386 | 32 | 19 | Uterine Corpus Endometrial Carcinoma |
Columns of the table display: TCGA disease ID, the total number of samples in the analysis, the number of tumoral samples, the number of control samples (solid normal tissue), the number of paired samples available in the dataset and the cancer type.
Fig. 1. Interpretation of the differential expression statistic at miRNA level and the transferred index at gene level
Fig. 2. Interpretation of the logistic regression model slope parameter in terms of genes and gene sets
Fig. 3. Example diagram of the analysis steps for the neurofilament cytoskeleton GO term (GO:0060053). Plot (A) represents the distribution of the ranking index computed as described in Equation 1. The white box shows the distribution for all miRNAs in the study. In our case, positive values correspond to miRNAs more expressed in tumors, while negative values correspond to miRNAs more expressed in controls. Each of the colored boxes represents the same index, but only for the subset of miRNAs targeting one gene in the GO term. Plot (B) represents the gene transferred index introduced in Equation 2. For each gene in the GO term, all miRNA-level indexes are added up into a single value. Each dot in plot (B) represents the gene-level transferred index computed from the miRNAs represented in the boxplot underneath (plot A). Plot (C) displays the distribution of the transferred index for the whole genome (left box) and for the genes within the neurofilament cytoskeleton GO term (right box and dots). Here, we can see that the overall distribution of the genes in the GO term is higher than the basal distribution of all genes. The logistic regression model detects this pattern and reports the GO term as enriched in tumor samples, meaning that the neurofilament cytoskeleton cellular component is more intercepted by miRNA action in cases than in controls.
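The transfer step described for plot (B) can be illustrated with a minimal sketch. It assumes only what the caption states: the gene-level index is the sum of the miRNA-level indexes of all miRNAs targeting that gene. The miRNA names, gene names, index values and the function `transfer_to_genes` are hypothetical and do not come from the study.

```python
from collections import defaultdict


def transfer_to_genes(mirna_index, targets):
    """Sum the miRNA-level index over all miRNAs that target each gene.

    mirna_index: dict mapping miRNA name -> ranking index (sign encodes
                 direction: positive = more expressed in cases).
    targets:     dict mapping miRNA name -> list of target gene names.
    """
    gene_index = defaultdict(float)
    for mirna, genes in targets.items():
        score = mirna_index.get(mirna)
        if score is None:
            continue  # miRNA has targets but no index in this dataset
        for gene in genes:
            gene_index[gene] += score
    return dict(gene_index)


# Hypothetical toy data (not from the paper).
mirna_index = {"miR-a": 2.0, "miR-b": -1.5, "miR-c": 0.5}
targets = {
    "miR-a": ["GENE1", "GENE2"],
    "miR-b": ["GENE1"],
    "miR-c": ["GENE2", "GENE3"],
}

gene_index = transfer_to_genes(mirna_index, targets)
print(gene_index)
```

In this toy case GENE2 accumulates two positive contributions (additive effect), while the opposite signs of miR-a and miR-b partly cancel for GENE1.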
Number of up, down and not differentially regulated miRNAs in each cancer type
| ID | Unpaired Down | Unpaired noDif | Unpaired Up | Paired Down | Paired noDif | Paired Up |
|---|---|---|---|---|---|---|
| BLCA | 128 | 337 | 353 | 127 | 343 | 219 |
| BRCA | 200 | 244 | 396 | 202 | 215 | 269 |
| CESC | 92 | 621 | 73 | 29 | 537 | 65 |
| COAD | 174 | 291 | 262 | |||
| ESCA | 98 | 443 | 152 | 62 | 464 | 133 |
| HNSC | 204 | 285 | 360 | 164 | 305 | 222 |
| KICH | 166 | 297 | 199 | 217 | 252 | 169 |
| KIRC | 169 | 191 | 323 | 213 | 180 | 215 |
| KIRP | 221 | 262 | 295 | 223 | 242 | 237 |
| LIHC | 120 | 278 | 407 | 200 | 283 | 213 |
| LUAD | 152 | 292 | 405 | 130 | 264 | 259 |
| LUSC | 169 | 215 | 462 | 180 | 313 | 244 |
| PAAD | 23 | 607 | 11 | 8 | 606 | 14 |
| PCPG | 70 | 608 | 43 | 40 | 507 | 55 |
| PRAD | 76 | 429 | 104 | 38 | 513 | 31 |
| READ | 136 | 307 | 204 | |||
| SKCM | 46 | 680 | 6 | |||
| STAD | 152 | 308 | 356 | 138 | 307 | 206 |
| THCA | 218 | 351 | 257 | 226 | 347 | 145 |
| UCEC | 243 | 284 | 347 | 211 | 272 | 229 |
Number of genes targeted by the up and down regulated miRNAs
| ID | Unpaired Down | Unpaired Common | Unpaired Up | Paired Down | Paired Common | Paired Up |
|---|---|---|---|---|---|---|
| BLCA | 8345 | 6763 | 8599 | 8087 | 5955 | 7528 |
| BRCA | 8968 | 7700 | 9465 | 9305 | 7724 | 9001 |
| CESC | 7834 | 5201 | 6525 | 4877 | 3178 | 5431 |
| COAD | 6981 | 6418 | 9998 | |||
| ESCA | 7992 | 5646 | 6959 | 8233 | 5207 | 6212 |
| HNSC | 9090 | 7496 | 8976 | 9065 | 7006 | 8013 |
| KICH | 8998 | 7044 | 8252 | 9594 | 7125 | 7902 |
| KIRC | 8838 | 7351 | 9056 | 9575 | 7543 | 8681 |
| KIRP | 9169 | 7388 | 8629 | 9311 | 7025 | 8267 |
| LIHC | 7466 | 6848 | 9560 | 8896 | 6851 | 7720 |
| LUAD | 8255 | 7354 | 9898 | 8150 | 6843 | 8848 |
| LUSC | 8535 | 7265 | 9447 | 8844 | 6710 | 8166 |
| PAAD | 3759 | 616 | 1169 | 1529 | 442 | 1748 |
| PCPG | 6303 | 4033 | 5295 | 4102 | 3110 | 5652 |
| PRAD | 7422 | 5932 | 8039 | 4997 | 1600 | 2374 |
| READ | 6938 | 6225 | 9672 | |||
| SKCM | 5983 | 631 | 857 | |||
| STAD | 8921 | 6761 | 8041 | 8947 | 6731 | 7855 |
| THCA | 8763 | 7244 | 8702 | 9064 | 7065 | 8056 |
| UCEC | 9182 | 7171 | 8436 | 9338 | 7069 | 8201 |
The Common column shows the number of genes that are targets of both the up- and down-regulated miRNAs. The total number of genes that are targets of at least one miRNA is 12084.
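As a small illustration of how the three columns relate, the sketch below treats Down and Up as sets of target genes and Common as their intersection. The function names and the toy gene lists are hypothetical; only the BLCA counts are taken from the table.

```python
def target_overlap(down_targets, up_targets):
    """Count genes targeted by down-regulated miRNAs, by up-regulated miRNAs, and by both."""
    down, up = set(down_targets), set(up_targets)
    return {"Down": len(down), "Common": len(down & up), "Up": len(up)}


def targeted_by_any(down, common, up):
    """Inclusion-exclusion: genes targeted by at least one differentially regulated miRNA."""
    return down + up - common


# Hypothetical toy gene lists (not from the paper).
counts = target_overlap(["G1", "G2", "G3"], ["G2", "G3", "G4", "G5"])
print(counts)

# BLCA, unpaired comparison, using the Down, Common and Up counts from the table.
blca_unpaired = targeted_by_any(8345, 6763, 8599)
print(blca_unpaired)
```

For BLCA (unpaired), this gives 8345 + 8599 - 6763 = 10181 genes, which stays below the 12084 genes targeted by at least one miRNA overall.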
Number of GO terms associated with the genes targeted by the up and down regulated miRNAs
| ID | Unpaired Down | Unpaired Common | Unpaired Up | Paired Down | Paired Common | Paired Up |
|---|---|---|---|---|---|---|
| BLCA | 5169 | 5169 | 5169 | 5169 | 5168 | 5168 |
| BRCA | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| CESC | 5169 | 5168 | 5168 | 5144 | 5138 | 5160 |
| COAD | 5168 | 5168 | 5169 | |||
| ESCA | 5169 | 5168 | 5168 | 5169 | 5167 | 5167 |
| HNSC | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| KICH | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| KIRC | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| KIRP | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| LIHC | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| LUAD | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| LUSC | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| PAAD | 5129 | 4578 | 4590 | 4870 | 4681 | 4915 |
| PCPG | 5166 | 5161 | 5164 | 5150 | 5146 | 5165 |
| PRAD | 5169 | 5169 | 5169 | 5159 | 4981 | 4990 |
| READ | 5168 | 5168 | 5169 | |||
| SKCM | 5169 | 4385 | 4385 | |||
| STAD | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| THCA | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
| UCEC | 5169 | 5169 | 5169 | 5169 | 5169 | 5169 |
Most GO terms are targeted in cases and controls at the same time, as can be seen in the Common column. The total number of GO terms annotated for the targeted genes is 5169.
Number of significant GO terms in the functional profiling analysis for the paired and unpaired comparisons
| ID | Unpaired Derg. | Unpaired noDif | Unpaired Inh. | Paired Derg. | Paired noDif | Paired Inh. |
|---|---|---|---|---|---|---|
| BLCA | 2 | 5167 | 0 | 2 | 5167 | 0 |
| BRCA | 3 | 5166 | 0 | 0 | 5167 | 2 |
| CESC | 0 | 5169 | 0 | 1 | 5167 | 1 |
| COAD | 18 | 4930 | 221 | |||
| ESCA | 2 | 5167 | 0 | 1 | 5168 | 0 |
| HNSC | 53 | 5116 | 0 | 0 | 5169 | 0 |
| KICH | 1 | 5167 | 1 | 30 | 5138 | 1 |
| KIRC | 0 | 5159 | 10 | 5 | 5163 | 1 |
| KIRP | 4 | 5165 | 0 | 13 | 5155 | 1 |
| LIHC | 7 | 5080 | 82 | 0 | 5169 | 0 |
| LUAD | 0 | 5169 | 0 | 0 | 5169 | 0 |
| LUSC | 0 | 5169 | 0 | 0 | 5169 | 0 |
| PAAD | 3 | 5165 | 1 | 0 | 5169 | 0 |
| PCPG | 0 | 5169 | 0 | 0 | 5166 | 3 |
| PRAD | 0 | 5168 | 1 | 1 | 5168 | 0 |
| READ | 0 | 5157 | 12 | |||
| SKCM | 121 | 5043 | 5 | |||
| STAD | 5 | 5164 | 0 | 0 | 5169 | 0 |
| THCA | 2 | 5167 | 0 | 2 | 5167 | 0 |
| UCEC | 89 | 5080 | 0 | 9 | 5160 | 0 |
The Inh. columns indicate the number of terms with a positive α coefficient in the logistic regression analysis; these are the terms inhibited or intercepted in cases. The Derg. columns indicate the number of terms with a negative α value; these are the terms inhibited in controls or deregulated in cases. The noDif columns indicate the number of GO terms with a non-significant slope coefficient.
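The sign convention for α can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a one-predictor model of the form logit P(gene in set) = κ + α · (transferred index), fits it by plain gradient ascent, and takes the significance decision as an input rather than computing it. The names `fit_logistic` and `label_term` and the toy data are hypothetical.

```python
import math


def fit_logistic(index, in_set, steps=2000, rate=0.1):
    """Fit logit P(in_set = 1) = kappa + alpha * index by gradient ascent.

    Returns (kappa, alpha). Assumed model form; no p-value is computed here.
    """
    kappa = alpha = 0.0
    n = len(index)
    for _ in range(steps):
        grad_kappa = grad_alpha = 0.0
        for x, y in zip(index, in_set):
            p = 1.0 / (1.0 + math.exp(-(kappa + alpha * x)))
            grad_kappa += y - p
            grad_alpha += (y - p) * x
        kappa += rate * grad_kappa / n
        alpha += rate * grad_alpha / n
    return kappa, alpha


def label_term(alpha, significant):
    """Map a slope and an externally supplied significance flag to a table column."""
    if not significant:
        return "noDif"
    return "Inh." if alpha > 0 else "Derg."


# Hypothetical toy data: genes in the set tend to have a higher transferred index.
index = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
in_set = [0, 0, 0, 1, 0, 1, 1, 1]

kappa, alpha = fit_logistic(index, in_set)
print(label_term(alpha, significant=True))
```

With these toy values the fitted slope is positive, so a significant term would be counted under Inh.; reversing the membership labels flips the sign and the term would fall under Derg.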