Alessia Buratin (1, 2), Chiara Romualdi (2), Stefania Bortoluzzi (1, 3), Enrico Gaffo (1).
Abstract
Finding differentially expressed circular RNAs (circRNAs) is instrumental to understanding the molecular basis of phenotypic variation between conditions linked to circRNA-involving mechanisms. To date, several methods have been developed to identify circRNAs, and combining multiple tools is becoming an established approach to improve the detection rate and robustness of results in circRNA studies. However, when a consensus strategy is used, it is unclear how circRNA expression estimates should be handled and integrated into downstream analyses such as differential expression assessment. This work presents a novel solution for testing circRNA differential expression using the quantifications of multiple algorithms simultaneously. Our approach analyzes the circRNA abundance counts of multiple tools within a single framework by leveraging generalized linear mixed models (GLMMs), which account for the sample correlation structure within and between the quantification tools. We compared the GLMM approach with three widely used differential expression models, showing its higher sensitivity in detecting significant differentially expressed circRNAs and its efficiency in ranking them. Our strategy is the first to consider combined estimates of multiple circRNA quantification methods, and we propose it as a powerful model to improve circRNA differential expression analysis.
Keywords: Circular RNAs; Differential expression; Generalized linear mixed models; RNA-seq
Abbreviations: AUC, area under the ROC curve; circRNAs, circular RNAs; DECs, differentially expressed circRNAs; DEMs, differential expression models; FDR, false discovery rate; GLMM, generalized linear mixed model; RNA-seq, RNA sequencing; TPR, true positive rate
Year: 2022 PMID: 35664231 PMCID: PMC9136258 DOI: 10.1016/j.csbj.2022.05.026
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
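The abstract describes fitting one model to the counts of several quantification tools at once. Below is a minimal sketch of the data preparation this implies, assuming per-tool count tables keyed by sample: the counts are stacked into a long table in which each sample contributes one row per tool, so that a GLMM can treat the repeated measurements of a sample as correlated. The names `to_long_format`, `rows_for`, `tool_A` and `tool_B` are hypothetical. The model one would then fit per circRNA (for example, a negative binomial GLMM written in R formula notation as `count ~ condition + (1 | sample)`) is an illustrative assumption, not necessarily the authors' exact specification.

```python
from typing import Dict, List

# Hypothetical input layout: counts[tool][sample] maps circRNA IDs to read counts.
Counts = Dict[str, Dict[str, Dict[str, int]]]


def to_long_format(counts: Counts, condition: Dict[str, str]) -> List[dict]:
    """Stack per-tool count tables into one long table.

    Each row is one (circRNA, sample, tool) observation, so every biological
    sample appears once per quantification tool that reported the circRNA.
    """
    rows = []
    for tool in sorted(counts):
        for sample in sorted(counts[tool]):
            for circ_id, n in sorted(counts[tool][sample].items()):
                rows.append({
                    "circ_id": circ_id,
                    "sample": sample,
                    "tool": tool,
                    "condition": condition[sample],
                    "count": n,
                })
    return rows


def rows_for(rows: List[dict], circ_id: str) -> List[dict]:
    """Select the observations of one circRNA (one model fit per circRNA)."""
    return [r for r in rows if r["circ_id"] == circ_id]


# Toy example: two placeholder tools quantifying the same two samples.
counts = {
    "tool_A": {"s1": {"circA": 10, "circB": 0}, "s2": {"circA": 25, "circB": 3}},
    "tool_B": {"s1": {"circA": 12}, "s2": {"circA": 30, "circB": 1}},
}
condition = {"s1": "control", "s2": "disease"}
rows = to_long_format(counts, condition)
print(len(rows))                     # 7 observations
print(len(rows_for(rows, "circA")))  # 4: 2 samples x 2 tools
```

Keeping every tool's estimate as its own row, rather than averaging across tools beforehand, is what lets a mixed model separate the variation between samples from the variation between tools.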
Fig. 1. Sample correlation between and within circRNA quantification methods. circRNA expression was estimated independently with four quantification methods, and pairwise Spearman's correlations were calculated between expression estimates from different quantification methods (Inter, red boxes) or from the same method (Intra, blue boxes) in real RNA-seq datasets from three independent human circRNA studies (x-axis). ALZ: brain tissue, Alzheimer's disease; DM1: skeletal muscle tissue, Myotonic Dystrophy Type 1; IPF: lung tissue, Idiopathic Pulmonary Fibrosis.
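Fig. 1 contrasts correlations computed within and between quantification methods. A stdlib-only sketch of that computation follows. It assumes each column holds one tool's expression estimates for one sample over the same circRNAs, and it assumes that "Intra" pairs are columns from the same tool while "Inter" pairs are columns from different tools. Spearman's rho is computed as the Pearson correlation of average ranks; all function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def intra_inter(columns: Dict[Tuple[str, str], Sequence[float]]) -> Dict[str, List[float]]:
    """Group pairwise correlations of (tool, sample) expression columns:
    'Intra' if both columns come from the same tool, else 'Inter'."""
    groups: Dict[str, List[float]] = {"Intra": [], "Inter": []}
    for (key1, col1), (key2, col2) in combinations(sorted(columns.items()), 2):
        label = "Intra" if key1[0] == key2[0] else "Inter"
        groups[label].append(spearman(col1, col2))
    return groups


# Toy columns keyed by (tool, sample); values are estimates for the same circRNAs.
columns = {
    ("tool_A", "s1"): [1, 2, 3],
    ("tool_A", "s2"): [2, 4, 6],
    ("tool_B", "s1"): [3, 2, 1],
}
print(intra_inter(columns))  # {'Intra': [1.0], 'Inter': [-1.0, -1.0]}
```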
Summary of simulation results. Average true positive rate (TPR, i.e., sensitivity), false discovery rate (FDR), and area under the ROC curve (AUC) across 30 simulated datasets from ALZ data, reported for each differential expression model (DEM).
| DEM | TPR | FDR | AUC |
|---|---|---|---|
| DESeq2 | 0.416 | 0.043 | 0.944 |
| edgeR | 0.591 | 0.066 | 0.785 |
| limma-voom | 0.614 | 0.101 | 0.796 |
| GLMM | 0.683 | 0.016 | 0.998 |
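The three metrics in the table can be computed from a method's P-values and the set of circRNAs simulated as truly differentially expressed. A minimal sketch for one simulated dataset follows. It assumes calls are made at a hypothetical 0.05 cutoff on (ideally multiplicity-adjusted) P-values, since the threshold used in the study is not stated here. AUC is computed in its Mann-Whitney form: the probability that a truly DE circRNA gets a smaller P-value than a non-DE one, with ties counted as one half.

```python
from typing import Dict, Set, Tuple


def evaluate(pvalues: Dict[str, float], truly_de: Set[str],
             alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (TPR, FDR, AUC) for one simulated dataset.

    pvalues: circRNA ID -> P-value reported by a DE method.
    truly_de: IDs simulated as differentially expressed (non-empty).
    alpha: hypothetical significance cutoff used to make calls.
    """
    called = {c for c, p in pvalues.items() if p < alpha}
    tp = len(called & truly_de)
    fp = len(called - truly_de)
    tpr = tp / len(truly_de)
    fdr = fp / len(called) if called else 0.0  # convention: 0 when nothing is called

    # AUC (Mann-Whitney form): chance that a truly DE circRNA has a smaller
    # P-value than a non-DE one; ties count as one half.
    pos = [p for c, p in pvalues.items() if c in truly_de]
    neg = [p for c, p in pvalues.items() if c not in truly_de]
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))
    return tpr, fdr, auc


pvals = {"c1": 0.001, "c2": 0.01, "c3": 0.2, "c4": 0.03, "c5": 0.5}
tpr, fdr, auc = evaluate(pvals, truly_de={"c1", "c2", "c3"})
print(round(tpr, 3), round(fdr, 3), round(auc, 3))  # 0.667 0.333 0.833
```

Averaging the returned triples over the 30 simulated datasets gives one row of a table like the one above.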
Fig. 2. Type I error control. Boxplots of the proportion of tests with P-values below the nominal α levels in 30 mock datasets obtained from ALZ data samples. The dashed red lines indicate the 0.01, 0.05, and 0.1 type I error thresholds. The y-axis is square-root scaled to improve visibility.
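Type I error on a mock comparison can be estimated as the fraction of tests that reject at level α when no true differences exist. The sketch below assumes a mock dataset is built by randomly splitting samples of a single condition into two groups (an assumption about the construction; `mock_split` is a hypothetical helper) and that the DE method returns raw P-values for those two groups.

```python
import random
from typing import List, Sequence, Tuple


def mock_split(samples: Sequence[str], seed: int) -> Tuple[List[str], List[str]]:
    """Randomly split samples from ONE condition into two pseudo-groups.

    Any difference between the groups is due to chance, so every rejection
    of the null hypothesis counts as a false positive.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def type_i_error(pvalues: Sequence[float], alpha: float) -> float:
    """Proportion of tests with a P-value below the nominal alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


# Under the null, P-values are roughly uniform, so the rate should be close to alpha.
null_p = [i / 100 for i in range(100)]
print(type_i_error(null_p, 0.05))  # 0.05
```

A well-calibrated method keeps the boxplots at or below each dashed line; values well above it indicate inflated false positives.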
Fig. 3. Sensitivity and precision in the real-data benchmark. Sensitivity (a) and precision (b) of each algorithm (horizontal axis), computed by taking one algorithm's predictions on the verification set as the reference truth (facet strip labels) and comparing them with each algorithm's predictions on the evaluation sets.
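The benchmark in Fig. 3 can be sketched as follows, assuming each algorithm's significant calls are available as sets of circRNA IDs for both the verification and the evaluation sets. The dictionary layout and the method names `M1` and `M2` are hypothetical toy data.

```python
from typing import Dict, Set, Tuple


def sensitivity_precision(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Sensitivity = TP / |reference|; precision = TP / |predicted| (NaN if empty)."""
    tp = len(predicted & reference)
    sens = tp / len(reference) if reference else float("nan")
    prec = tp / len(predicted) if predicted else float("nan")
    return sens, prec


def benchmark(verification: Dict[str, Set[str]],
              evaluation: Dict[str, Set[str]],
              reference_method: str) -> Dict[str, Tuple[float, float]]:
    """Score every method's evaluation-set calls against one method's
    verification-set calls, which are treated as the reference truth."""
    truth = verification[reference_method]
    return {m: sensitivity_precision(calls, truth) for m, calls in evaluation.items()}


# Toy calls (sets of circRNA IDs) for two hypothetical methods.
verification = {"M1": {"a", "b", "c"}, "M2": {"a", "d"}}
evaluation = {"M1": {"a", "b", "e"}, "M2": {"a"}}
for ref in verification:  # one facet per reference method
    print(ref, benchmark(verification, evaluation, ref))
```

Rotating the reference method avoids favoring any single algorithm's notion of truth.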
Fig. 4. Consistency and replicability of models. (a) Average between-method (off-diagonal cells) and within-method (main-diagonal cells) concordance of the 100 top-ranked calls of each DEM in replicated ALZ and IPF datasets. (b) Boxplots of the within-method concordance (WMC) on ALZ and IPF data, comparing the WMC of each model with that of the GLMM; P-values are from pairwise t-tests. CAT: concordance at the top 100 differentially expressed circRNAs.
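CAT and WMC can be sketched as follows. The sketch assumes each method returns circRNA IDs ranked from most to least significant, and that WMC is the average CAT over all pairs of replicate subsets analyzed by the same method; the exact averaging used in the study may differ. The caption uses k = 100, while the toy example below uses a smaller k.

```python
from itertools import combinations
from typing import List, Sequence


def cat(ranked_a: Sequence[str], ranked_b: Sequence[str], k: int = 100) -> float:
    """Concordance at the top k: IDs shared by the two top-k lists, divided by k."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k


def within_method_concordance(rankings: List[Sequence[str]], k: int = 100) -> float:
    """Average CAT over all pairs of rankings produced by the SAME method
    on different replicate subsets of a dataset."""
    pairs = list(combinations(rankings, 2))
    return sum(cat(a, b, k) for a, b in pairs) / len(pairs)


# Toy rankings (most significant first) from three hypothetical replicate subsets.
r1 = ["a", "b", "c", "d"]
r2 = ["b", "a", "e", "f"]
r3 = ["a", "c", "b", "g"]
print(cat(r1, r2, k=2))  # 1.0: both top-2 lists are {a, b}
print(within_method_concordance([r1, r2, r3], k=4))
```

Computing `cat` between rankings from two different methods instead gives the off-diagonal, between-method cells of panel (a).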
Fig. 5. Ranking of the methods based on three evaluation criteria. The type I error ranking was based on the 30 mock comparisons from the ALZ dataset; the within-method concordance (WMC) ranking was based on the average WMC across the 30 random subset comparisons for each of the two datasets used (ALZ and IPF); the power ranking was based on the disease vs. control comparisons in the ALZ and IPF datasets. Ranks range from 1 (best) to 4 (worst).
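Fig. 5 reduces each criterion to ranks from 1 to 4. A minimal sketch of that conversion follows, assuming a single numeric score per method and criterion. The caption does not say how ties were handled, so letting tied methods share the best rank is an assumption. The example reuses the simulation AUC and FDR values from the table purely for illustration; they are not the criteria actually ranked in Fig. 5.

```python
from typing import Dict


def rank_methods(scores: Dict[str, float], higher_is_better: bool = True) -> Dict[str, int]:
    """Convert one criterion's scores into ranks (1 = best); ties share the best rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {method: ordered.index(score) + 1 for method, score in scores.items()}


# Illustration with the simulation AUC and FDR values from the table.
auc = {"DESeq2": 0.944, "edgeR": 0.785, "limma-voom": 0.796, "GLMM": 0.998}
fdr = {"DESeq2": 0.043, "edgeR": 0.066, "limma-voom": 0.101, "GLMM": 0.016}
print(rank_methods(auc))                          # higher AUC is better
print(rank_methods(fdr, higher_is_better=False))  # lower FDR is better
```

The `higher_is_better` flag matters because criteria point in different directions: type I error rate and FDR should be low, whereas concordance and power should be high.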