Krzysztof J Szkop1,2, David S Moss1, Irene Nobeli1.
Abstract
MOTIVATION: We present flexible Modeling of Alternative PolyAdenylation (flexiMAP), a new beta-regression-based method implemented in R, for discovering differential alternative polyadenylation events in standard RNA-seq data.
Year: 2021 PMID: 33051680 PMCID: PMC8208744 DOI: 10.1093/bioinformatics/btaa854
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. flexiMAP detects differential polyadenylation events with a good balance of specificity and sensitivity. (A) Receiver operating characteristic (ROC) curves representing the accuracy of detecting differential APA events using flexiMAP, DaPars, APAtrap and Roar. DaPars and APAtrap make their own predictions of polyadenylation sites, which do not always agree with the annotated sites used in this study. To avoid inflating the error rate of these programs by including sites that do not map to the annotation (differential events called at these sites would automatically be counted as false positives), only transcripts where the polyadenylation site was correctly predicted by DaPars and APAtrap are included in this plot. In this simulated experiment, flexiMAP outperformed DaPars, APAtrap and Roar, achieving perfect specificity and higher sensitivity. Although applying the DaPars PDUI (Percentage of Distal polyA site Usage Index) post hoc filter (dark blue) and the APAtrap PD (Percentage Difference) filter (dark red) corrected the false-positive problem of these methods, it did so at a heavy cost to sensitivity. (B) Venn diagram showing the overlap of ‘true’ differential polyadenylation events in the MAQC samples PolyA-seq data (as called by DEXSeq; grey) with predictions from all four methods tested here: flexiMAP (orange), DaPars (light blue), APAtrap (pink) and Roar (green). (C) Example from the imbalanced simulated dataset of a situation where a covariate of no interest (in this case, sex) affects the ratio of reads assigned to short and long isoforms. Male samples display much higher expression of the short region of transcript NM_003613 than female samples, regardless of the condition group to which the samples belong. In addition, the dataset is imbalanced, with more males present in condition 1 than in condition 2. 
The mean expression for condition 1 is thus higher than the mean for condition 2, but the effect is due to the covariate sex, not the condition to which the samples belong. (D) DaPars, APAtrap and Roar report a large number of false positives for an imbalanced simulated dataset. In contrast, flexiMAP reports only one false positive in this case, highlighting its main advantage over alternative approaches.
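
The confounding described in panel (C) can be illustrated with a small numerical sketch. The code below is not flexiMAP and does not use beta regression; it swaps in simple stratification by sex as a stand-in for covariate adjustment. All sample values, the `samples` list and the `group_mean` helper are hypothetical and chosen only to mimic an imbalanced design in which sex, not condition, drives short-isoform usage.

```python
from statistics import mean

# Hypothetical samples: (condition, sex, short-isoform usage ratio).
# Condition 1 has 4 males and 1 female; condition 2 has 1 male and 4 females.
# Males are given a high ratio (~0.8) and females a low one (~0.3) in BOTH conditions.
samples = [
    (1, "M", 0.80), (1, "M", 0.82), (1, "M", 0.78), (1, "M", 0.80), (1, "F", 0.30),
    (2, "M", 0.80), (2, "F", 0.31), (2, "F", 0.29), (2, "F", 0.30), (2, "F", 0.30),
]


def group_mean(rows, condition, sex=None):
    """Mean ratio for one condition, optionally restricted to one sex."""
    values = [
        ratio
        for cond, s, ratio in rows
        if cond == condition and (sex is None or s == sex)
    ]
    return mean(values)


# Naive comparison: ignores sex, so the imbalance leaks into the condition effect.
naive_diff = group_mean(samples, 1) - group_mean(samples, 2)

# Stratified comparison: compare conditions within each sex, then average.
# This is a stand-in for including sex as a covariate in a regression model.
adjusted_diff = mean(
    [group_mean(samples, 1, sex) - group_mean(samples, 2, sex) for sex in ("M", "F")]
)

print(f"naive difference between conditions:    {naive_diff:.3f}")
print(f"sex-stratified difference (adjusted):   {abs(adjusted_diff):.3f}")
```

With these made-up values the naive difference between conditions is about 0.30, while the within-sex difference is essentially zero. A method that ignores the covariate would likely flag such a transcript as a differential event, which is the false-positive pattern panel (D) describes.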