Wouter De Coster, Mojca Strazisar, Peter De Rijk.
Abstract
Long-read sequencing has substantial advantages over short-read technologies for structural variant discovery and variant phasing, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best-practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of at least 20 kb. Haplotyping variants across genes reaches its optimum only with reads of 100 kb. These findings are important for the design of future long-read sequencing projects.
Year: 2020 PMID: 33575574 PMCID: PMC7671308 DOI: 10.1093/nargab/lqz027
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. Precision (with and without filtering on duphold annotation), recall and F-measure (y-axis) for SV call sets of simulated reads from the HG00733 assembly with increasing read length (x-axis). Average of triplicate simulations.
Figure 2. Ridge plot showing the distribution of the length of phase blocks with increasing read length simulated from HG00733. The x-axis is the genomic size of phase blocks, and the y-axis shows the length distribution. Datasets are stacked vertically on separate lines.
Figure 3. The fraction of genes entirely contained in a single phase block, reaching a plateau at 100 kb, enabling phasing of variants across the entire gene.
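The metric plotted in Figure 3 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the half-open `(chromosome, start, end)` interval convention, and the toy coordinates are all assumptions made for illustration. A gene counts as phased across its full length only if one phase block covers it from start to end.

```python
from typing import Dict, List, Tuple

# Hypothetical interval type: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def fraction_genes_in_single_block(
    genes: List[Interval], blocks: List[Interval]
) -> float:
    """Return the fraction of genes fully contained in one phase block."""
    if not genes:
        return 0.0

    # Group phase blocks by chromosome so each gene is only compared
    # against blocks on its own chromosome.
    blocks_by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in blocks:
        blocks_by_chrom.setdefault(chrom, []).append((start, end))

    contained = 0
    for chrom, gene_start, gene_end in genes:
        # A gene qualifies only if a single block spans it entirely.
        if any(
            block_start <= gene_start and gene_end <= block_end
            for block_start, block_end in blocks_by_chrom.get(chrom, [])
        ):
            contained += 1

    return contained / len(genes)


# Toy data (invented): the second gene straddles two phase blocks,
# so it is not counted as fully phased.
genes = [("chr1", 100, 500), ("chr1", 900, 1500), ("chr2", 50, 80)]
blocks = [("chr1", 0, 1000), ("chr1", 1000, 2000), ("chr2", 0, 100)]
fraction = fraction_genes_in_single_block(genes, blocks)
```

The linear scan over blocks keeps the sketch short. For genome-scale gene and phase-block sets, sorted intervals with binary search or an interval tree would be the more practical choice.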