Mitchell R Vollger, Philip C Dishuck, Melanie Sorensen, AnneMarie E Welch, Vy Dang, Max L Dougherty, Tina A Graves-Lindsay, Richard K Wilson, Mark J P Chaisson, Evan E Eichler.
Abstract
We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8% identity) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
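The graph construction described in the abstract can be illustrated with a toy sketch. It assumes each read has already been reduced to a set of calls at paralogous sequence variant (PSV) positions (True if the read carries the variant, False if it spans the position without it). The function names, the input layout, and the clustering step are all hypothetical: connected components over edges where attraction outweighs repulsion are used here as a simple stand-in and should not be read as SDA's actual algorithm or API.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Count attraction and repulsion evidence between PSV pairs.

    reads: dict mapping read id -> dict mapping PSV id -> bool
           (hypothetical input layout, not SDA's file format).
    """
    attraction = defaultdict(int)
    repulsion = defaultdict(int)
    for calls in reads.values():
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                # The read carries both variants: evidence for the same paralog.
                attraction[(a, b)] += 1
            elif calls[a] != calls[b]:
                # The read carries one variant but not the other: evidence
                # that the two variants lie on different paralogs.
                repulsion[(a, b)] += 1
    return attraction, repulsion


def cluster_psvs(psvs, attraction, repulsion):
    """Group PSVs with union-find, merging pairs where attraction > repulsion.

    This is a deliberately simple stand-in for a real clustering step.
    """
    parent = {p: p for p in psvs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in attraction.items():
        if weight > repulsion.get((a, b), 0):
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in psvs:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


def partition_reads(reads, clusters):
    """Assign each read to the cluster whose PSVs it carries most often."""
    assignment = {}
    for read_id, calls in reads.items():
        carried = {p for p, present in calls.items() if present}
        scores = [len(carried & set(c)) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__)
        if scores[best] > 0:  # leave reads with no informative PSV unassigned
            assignment[read_id] = best
    return assignment


# Toy data: two paralogs, one tagged by p1/p2 and the other by p3/p4.
reads = {
    "r1": {"p1": True, "p2": True, "p3": False},
    "r2": {"p1": True, "p2": True, "p4": False},
    "r3": {"p3": True, "p4": True, "p1": False},
    "r4": {"p3": True, "p4": True, "p2": False},
}
psvs = {p for calls in reads.values() for p in calls}
attraction, repulsion = build_psv_graph(reads)
clusters = cluster_psvs(psvs, attraction, repulsion)
assignment = partition_reads(reads, clusters)
```

In this toy case the PSVs separate into two groups and each read falls into the group matching the variants it carries; each group of reads could then be assembled independently, which is the partition-then-assemble idea the abstract outlines.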
Year: 2018 PMID: 30559433 PMCID: PMC6382464 DOI: 10.1038/s41592-018-0236-3
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547