Wouter De Coster1,2, Peter De Rijk2,3, Arne De Roeck1,2, Tim De Pooter2,3, Svenn D'Hert2,3, Mojca Strazisar2,3, Kristel Sleegers1,2, Christine Van Broeckhoven1,2.
Abstract
We sequenced the genome of the Yoruban reference individual NA19240 on the long-read sequencing platform Oxford Nanopore PromethION for evaluation and benchmarking of recently published aligners and germline structural variant calling tools, as well as a comparison with the performance of structural variant calling from short-read sequencing data. The structural variant caller Sniffles after NGMLR or minimap2 alignment provides the most accurate results, but additional confidence or sensitivity can be obtained by a combination of multiple variant callers. Sensitive and fast results can be obtained by minimap2 for alignment and a combination of Sniffles and SVIM for variant identification. We describe a scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long-read genome sequencing of an individual or population. By discussing the results of this well-characterized reference individual, we provide an approximation of what can be expected in future long-read sequencing studies aiming for structural variant identification.
Year: 2019 PMID: 31186302 PMCID: PMC6633254 DOI: 10.1101/gr.244939.118
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Library characteristics
Figure 1. Comparison of PromethION and MinION libraries. (A) Read lengths capped at 100 kb. (B) Percentage of identity after minimap2 alignment to the reference genome. (P) PromethION; (M) MinION; (N) nonsheared/native; (S) sheared before library preparation. Plots were made using NanoPack (De Coster et al. 2018).
Metrics of aligners
Figure 2. Comparison of aligners. (A) Aligned read lengths, plot limited to 100 kb. (B) Read percentage identity compared with the reference genome. Plots were made using NanoPack (De Coster et al. 2018).
Metrics of SV callers
F-measure of aligners and variant callers
Figure 3. Precision-recall comparison. Aligners are tagged with symbols, variant callers with colors.
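The precision, recall, and F-measure behind a comparison of this kind can be derived from the counts of true positives, false positives, and false negatives against a truth set. The following is a minimal sketch under that assumption; the function name and the counts are hypothetical and are not results from this study.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts.

    tp: calls that match a truth-set variant
    fp: calls without a match in the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only (hypothetical, not taken from the paper).
precision, recall, f_measure = precision_recall_f(tp=80, fp=20, fn=20)
```

Because the F-measure is a harmonic mean, a caller with very high precision but low recall (or the reverse) scores lower than one that balances both.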
Accuracy of zygosity of SV callers with their optimal aligner
Figure 4. SV validation status per length. SVs identified using Sniffles after NGMLR alignment, compared with the truth set. The top panel shows SVs up to 2 kb, binned per 10 bp; the bottom panel shows SVs up to 20 kb, binned per 100 bp, with a log-transformed number of variants.
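The binning described in this caption can be illustrated with a short sketch. It assumes SV lengths are available as integers (for example, taken from a VCF `SVLEN` field, where deletions are commonly reported as negative values); the function name and the example lengths are hypothetical.

```python
from collections import Counter


def bin_sv_lengths(lengths, bin_size, max_length):
    """Count SVs per length bin and return {bin_start: count}.

    Lengths above max_length are ignored, mirroring a plot
    that is limited to a maximum SV size.
    """
    counts = Counter()
    for length in lengths:
        size = abs(length)  # deletions are often stored with a negative length
        if size <= max_length:
            counts[(size // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


# Hypothetical SV lengths in bp, for illustration only.
example_lengths = [55, 58, 301, -305, 2500, 19950]

top_panel = bin_sv_lengths(example_lengths, bin_size=10, max_length=2_000)
bottom_panel = bin_sv_lengths(example_lengths, bin_size=100, max_length=20_000)
```

For the bottom panel, the counts would additionally be shown on a logarithmic scale, which keeps the rare, long SVs visible next to the far more numerous short ones.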
Figure 5. Precision-recall comparison of combined variant sets. Combinations of all compatible variant callers per aligner are tagged with plus signs, pairwise combinations of variant callers with dots.
Figure 6. Upset plot of variant calls obtained after alignment using minimap2. The height of the vertical bars indicates the number of variants in each set overlap, as indicated by the colored dots and connecting lines in the bottom panel. The height of the horizontal bars indicates the total number of variants per set.
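The quantities shown in an upset plot can be sketched as follows. This example assumes that calls from different callers have already been merged so that matching variants share one identifier, and that the vertical bars count variants found by exactly one combination of callers (exclusive intersections). The function name, the variant identifiers, and the third caller label are hypothetical.

```python
from collections import Counter


def exclusive_intersections(call_sets):
    """Map each combination of callers to the number of variants
    reported by exactly that combination (and by no other caller)."""
    membership = {}
    for caller, variants in call_sets.items():
        for variant in variants:
            membership.setdefault(variant, set()).add(caller)
    return Counter(frozenset(callers) for callers in membership.values())


# Hypothetical merged variant identifiers, for illustration only.
calls = {
    "Sniffles": {"sv1", "sv2", "sv3"},
    "SVIM": {"sv2", "sv3", "sv4"},
    "other_caller": {"sv3"},  # placeholder for any additional caller
}

overlap_counts = exclusive_intersections(calls)                  # vertical bars
set_totals = {name: len(v) for name, v in calls.items()}         # horizontal bars
```

Each variant contributes to exactly one vertical bar, so the vertical bar heights sum to the number of distinct variants across all callers.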
Figure 7. Precision and recall with parameter variation. (A) Specifying the minimal number of supporting reads. (B) Influence of the median genome coverage after down-sampling to various fractions. Both sets use Sniffles SV calling and minimap2 alignment.
Figure 8. Length profile of SV calls made by Sniffles after minimap2 alignment. The top panel shows SVs up to 2 kb; the bottom panel shows SVs up to 20 kb, with a log-transformed number of variants.