Laura Oikkonen, Stefano Lise.
Abstract
Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile alternative to whole-genome sequencing. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline, so it is ideal when only limited time or computational resources are available.
Keywords: RNA-seq; SNP; software tools; variant calling
Year: 2017 PMID: 28239666 PMCID: PMC5322827 DOI: 10.12688/wellcomeopenres.10501.2
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Figure 1. Percentage of error nucleotides at first four positions (left column) and last four positions (right column) in the first strands.
RNA-seq data from GM12878 [11], mapped with TopHat2 v2.0.12.
Precision, sensitivity, and runtimes for the three different variant calling pipelines.
| Mapper | Variant calling pipeline | Runtime | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| TopHat2 | GATK Best Practices | 11 h 50 min | 97.04 | 90.08 |
| TopHat2 | Opossum + GATK | 13 h 35 min | 97.88 | 92.20 |
| TopHat2 | Opossum + Platypus | 5 h 40 min | 97.33 | 92.96 |
| STAR 2-pass | GATK Best Practices | 14 h 45 min | 96.37 | 88.47 |
| STAR 2-pass | Opossum + GATK | 15 h 35 min | 96.92 | 89.65 |
| STAR 2-pass | Opossum + Platypus | 7 h 0 min | 95.23 | 94.07 |
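
To make the comparison with the GATK Best Practices baseline easier to read, the table values can be expressed as relative runtimes and differences in precision and sensitivity. The following is a minimal sketch, not part of the original work: the names `RESULTS`, `runtime_minutes`, and `compare_to_baseline` are hypothetical, the numbers are copied from the table, and precision and sensitivity are assumed to be percentages.

```python
import re

# Values copied from the table: (mapper, pipeline) -> (runtime, precision, sensitivity).
# Precision and sensitivity are assumed to be percentages.
RESULTS = {
    ("TopHat2", "GATK Best Practices"): ("11 h 50 min", 97.04, 90.08),
    ("TopHat2", "Opossum + GATK"): ("13 h 35 min", 97.88, 92.20),
    ("TopHat2", "Opossum + Platypus"): ("5 h 40 min", 97.33, 92.96),
    ("STAR 2-pass", "GATK Best Practices"): ("14 h 45 min", 96.37, 88.47),
    ("STAR 2-pass", "Opossum + GATK"): ("15 h 35 min", 96.92, 89.65),
    ("STAR 2-pass", "Opossum + Platypus"): ("7 h 0 min", 95.23, 94.07),
}


def runtime_minutes(text):
    """Convert a runtime string such as '11 h 50 min' to minutes."""
    match = re.fullmatch(r"(\d+) h (\d+) min", text.strip())
    if match is None:
        raise ValueError(f"unrecognised runtime: {text!r}")
    hours, minutes = map(int, match.groups())
    return 60 * hours + minutes


def compare_to_baseline(mapper, pipeline, baseline="GATK Best Practices"):
    """Return (runtime ratio, precision change, sensitivity change) vs. the baseline.

    The changes are in percentage points, rounded to two decimals.
    """
    base_rt, base_prec, base_sens = RESULTS[(mapper, baseline)]
    rt, prec, sens = RESULTS[(mapper, pipeline)]
    ratio = runtime_minutes(rt) / runtime_minutes(base_rt)
    return ratio, round(prec - base_prec, 2), round(sens - base_sens, 2)


if __name__ == "__main__":
    for mapper, pipeline in RESULTS:
        ratio, d_prec, d_sens = compare_to_baseline(mapper, pipeline)
        print(f"{mapper:12s} {pipeline:20s} runtime x{ratio:.2f}  "
              f"precision {d_prec:+.2f}  sensitivity {d_sens:+.2f}")
```

On these figures, Opossum + Platypus takes a little under half the runtime of GATK Best Practices with either mapper, while Opossum + GATK is somewhat slower than the baseline.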
Figure 2. Precision as a function of the number of supporting bases.
RNA-seq data mapped with TopHat2 v2.0.12. GATK HC stands for GATK HaplotypeCaller v3.4.
Figure 3. Sensitivity as a function of the number of supporting bases.
RNA-seq data mapped with TopHat2 v2.0.12. GATK HC stands for GATK HaplotypeCaller v3.4.