Sarah Sandmann, Aniek O de Graaf, Martin Dugas.
Abstract
BACKGROUND: Deriving valid variant calling results from raw next-generation sequencing data is a particularly challenging task, especially with respect to clinical diagnostics and personalized medicine. However, when using classic variant calling software, the user usually obtains nothing more than a list of variants that pass the corresponding caller's internal filters. Any expected mutations (e.g. hotspot mutations) that have not been called by the software need to be investigated manually.
Keywords: Hotspot mutations; Next-generation sequencing; Personalized medicine; Variant calling; Visualization
Year: 2017 PMID: 28241736 PMCID: PMC5330023 DOI: 10.1186/s12859-017-1549-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Exemplary output file from real patient data generated by Illumina NextSeq. Relative number of reads at seven positions analyzed in case of sample “Example_Illumina”. Reference bases are plotted at the negative y axis, detected bases in the mapped reads are plotted at the positive y axis (marked 5% threshold). Likely SNV at chr1:115,258,747 (reference C, ∼70% of the reads with high-quality C and ∼30% of the reads with high-quality T). No variant at chr2:25,467,204 (reference G, ∼100% of the reads with high-quality G). Unlikely SNV at chr2:198,267,280 (reference C, ∼95% of the reads with low-quality C, ∼5% of the reads with low-quality A). Likely deletion at chr4:106,157,106 (reference A, ∼75% of the reads with high-quality A, ∼25% of the reads with deleted A). Known homozygous SNP at chr17:7,579,472 (reference G, polymorphism C displayed as additional reference base, ∼100% of the reads with high-quality C). Possible insertion of a “G”, but unlikely deletion at chr20:31,022,442 (reference G, ∼97% of the reads with high-quality G, ∼3% of the reads with deleted G, ∼30% of the reads with inserted high-quality G). Likely SNV at chr21:44,514,777 (reference T, ∼65% of the reads with high-quality T, ∼35% of the reads with high-quality G).
Fig. 2 Analysis of position chr20:31,022,442 with BBCAnalyzer. Relative number of reads (one bar plot per position; marked 20% threshold): UPN1 and UPN4 feature an inserted G in almost 30% of the reads, while samples UPN2, UPN3 and UPN5 feature no significant difference between the number of reads containing a deletion and the number of reads containing an insertion. Thus, only samples UPN1 and UPN4 are likely to feature the mutation chr20:31,022,441 A>AG.
Fig. 3 Analysis of position chr1:115,258,744 with BBCAnalyzer. Relative number of reads (one bar plot per position; marked 3% threshold): a: Simulated data. A low-frequency mutation (C>A) can be observed in case of samples SIM1, SIM2 and SIM3, but not in samples SIM4 and SIM5. b: Real data. Similar to the simulated data, the same low-frequency mutation can be observed in case of sample UPN1, but not in samples UPN2-UPN5. Thus, sample UPN1 is likely to feature the mutation chr1:115,258,744 C>A.
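The figure captions describe comparing the relative number of reads per base at a position against a marked threshold (5%, 20% or 3%). The following Python sketch illustrates that idea only; it is not BBCAnalyzer's implementation. The function names, the input format (one base call per read) and the example data are assumptions made for illustration, and base quality, which the figures also display, is not considered here.

```python
from collections import Counter


def relative_base_frequencies(bases):
    """Return the fraction of reads showing each observed base at one position."""
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


def candidate_variants(bases, reference, threshold):
    """Return non-reference bases whose relative frequency reaches the threshold."""
    freqs = relative_base_frequencies(bases)
    return {
        base: freq
        for base, freq in freqs.items()
        if base != reference and freq >= threshold
    }


# Hypothetical reads at one position: 70 reads with C, 30 reads with T.
reads = ["C"] * 70 + ["T"] * 30
print(candidate_variants(reads, reference="C", threshold=0.05))  # {'T': 0.3}
```

A deleted base could be encoded as an extra symbol such as "-" in the same list. Inserted bases would need a separate count, because in the captions they are reported in addition to the base observed at the position.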