Kristoffer Sahlin, Veli Mäkinen.
Abstract
MOTIVATION: Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses are dependent on read alignments. However, the error rate and sequencing length of the reads create new challenges for accurately aligning them, particularly around small exons.
Year: 2021 PMID: 34302453 PMCID: PMC8665758 DOI: 10.1093/bioinformatics/btab540
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of the uLTRA alignment algorithm. (Step 1) Segments (in color labeled sX, ), parts (in color) and flanks (in grey) are stored and indexed for alignment. Small exons and segments below a threshold (indicated with
Datasets included in evaluation
| Technology | Dataset | No. of reads | Median read length | Median error rate | Genome | Annotation |
|---|---|---|---|---|---|---|
| Simulated | ENS | 234 207 | 890 | 0.0% | GRCh38.p12 | Gencode v34 |
| Simulated | SIM_ANN | 1 000 000 | 864 | 8.6% | GRCh38.p12 | Gencode v34 |
| Simulated | SIM_NIC | 1 000 000 | 1272 | 8.6% | GRCh38.p12 | Gencode v34 |
| ONT | SIRV | 1 514 274 | 538 | 6.9% | SIRV genome | SIRV annotation C_170612a |
| ONT | DROS | 3 646 342 | 559 | 7.0% | BDGP6.28 | Ensembl v100 |
| Iso-Seq | ALZ | 4 277 293 | 2699 | 1.2% | GRCh38.p12 | Gencode v34 |
Measured from minimap2’s alignments. Due to biological sequence variations, the error rate may be lower than the number presented here.
Includes alternative haplotypes.
Fig. 2. Alignment results on simulated data for the SIM_ANN dataset. (A) Percentage of reads in each respective category. (B) The fraction of correctly aligned exons (y-axis) as a function of exon size (x-axis)
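The quantity plotted in panel (B) can be tabulated from simulated reads roughly as follows. This is a minimal sketch, not the authors' evaluation code: the record layout `(exon_size, correctly_aligned)` and the function name `accuracy_by_exon_size` are assumptions made for illustration, and the toy values are made up.

```python
from collections import defaultdict


def accuracy_by_exon_size(exon_records):
    """Return {exon_size: fraction of exons of that size aligned correctly}.

    exon_records: iterable of (exon_size, correctly_aligned) pairs. This is a
    hypothetical layout in which correctly_aligned is True when the aligned
    exon matches the simulated ground truth.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for size, ok in exon_records:
        totals[size] += 1
        if ok:
            correct[size] += 1
    # Sort by exon size so the result reads like the x-axis of the plot.
    return {size: correct[size] / totals[size] for size in sorted(totals)}


# Toy input (made-up values, not data from the paper).
records = [(5, False), (5, True), (20, True), (20, True), (20, False), (100, True)]
fractions = accuracy_by_exon_size(records)
```

Grouping by exact exon size keeps the sketch short; with real data one would likely bin sizes into ranges so that each point is supported by enough exons.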
Fig. 3. Number of reads annotated in different splicing categories for DROS (A) and ALZ (B)
Runtime of alignment using four cores
| Dataset | uLTRA | uLTRA_mm2 | minimap2 | minimap2_GTF | deSALT | deSALT_GTF |
|---|---|---|---|---|---|---|
| ENS | 52 min | 1 h 11 min | 43 min | 45 min | 18 min |
| SIM_ANN | 2 h 47 min | 4 h 00 min | 2 h 42 min | 2 h 48 min | 1 h 23 min |
| SIM_NIC | 3 h 21 min | 6 h 40 min | 4 h 35 min | 4 h 42 min | 1 h 55 min |
| SIRV | 35 min | 50 min | 13 min | 13 min | 7 min |
| ALZ | 16 h 9 min | 17 h 32 min | 9 h 47 min | 10 h 16 min | 10 h 17 min |
| DROS | 1 h 17 min | 1 h 37 min | 25 min | 23 min | 23 min |
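The runtimes above are written in mixed formats ("52 min", "2h 47 min", "16 h 9 min"), which makes them awkward to compare directly. The following is a small sketch for converting them to minutes; the helper name `runtime_to_minutes` is an assumption for illustration, and only the uLTRA and uLTRA_mm2 values for three datasets are used.

```python
import re


def runtime_to_minutes(text):
    """Convert strings such as '52 min', '2h 47 min' or '16 h 9 min' to minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*min)?\s*", text)
    # Reject strings that contain neither an hour nor a minute component.
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"unrecognised runtime: {text!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return 60 * hours + minutes


# uLTRA and uLTRA_mm2 runtimes as listed in the table for three datasets.
ultra = {"ENS": "52 min", "SIM_ANN": "2h 47 min", "ALZ": "16 h 9 min"}
ultra_mm2 = {"ENS": "1 h 11 min", "SIM_ANN": "4h 00 min", "ALZ": "17h 32 min"}

# Additional minutes taken by uLTRA_mm2 relative to uLTRA, per dataset.
extra_minutes = {
    name: runtime_to_minutes(ultra_mm2[name]) - runtime_to_minutes(ultra[name])
    for name in ultra
}
```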