Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications.

Timofey Prodanov, Vikas Bansal.

Abstract

The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs), which are sequence differences between paralogous sequences, to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners, including Minimap2 (from 74.3% to 90.6%) and BLASR (from 82.9% to 90.7%), while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8% to 21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14 713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
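
The abstract does not spell out the probabilistic model, so the following is only an illustrative sketch of the general idea of using PSVs to choose between candidate alignment locations. It is not DuploMap's actual algorithm. The function names (`psv_posteriors`, `mapping_quality`), the uniform prior over locations, the fixed per-base error rate and the quality cap of 60 are all assumptions made for this example.

```python
import math

def psv_posteriors(observed, expected_by_location, error_rate=0.05):
    """Posterior probability of each candidate location (hypothetical model).

    observed: alleles seen in the read at each PSV site, e.g. "ACGT".
    expected_by_location: maps a location name to the reference alleles
        that location carries at the same PSV sites.
    error_rate: assumed per-base sequencing error rate (illustrative value).
    A uniform prior over locations is assumed.
    """
    log_liks = {}
    for loc, expected in expected_by_location.items():
        ll = 0.0
        for obs, exp in zip(observed, expected):
            if obs == exp:
                ll += math.log(1.0 - error_rate)
            else:
                # An error is assumed to yield any of the 3 other bases equally.
                ll += math.log(error_rate / 3.0)
        log_liks[loc] = ll
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_liks.values())
    weights = {loc: math.exp(ll - top) for loc, ll in log_liks.items()}
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}

def mapping_quality(posteriors, cap=60):
    """Best location and a Phred-scaled confidence, capped at `cap`."""
    best = max(posteriors, key=posteriors.get)
    p_wrong = 1.0 - posteriors[best]
    if p_wrong <= 0.0:
        return best, cap
    return best, min(cap, int(round(-10.0 * math.log10(p_wrong))))

# A read covering four PSV sites, with two candidate paralogous copies.
read_alleles = "ACGT"
candidates = {"copyA": "ACGT", "copyB": "ACTA"}
post = psv_posteriors(read_alleles, candidates)
best_location, mapq = mapping_quality(post)
```

In this toy case the read matches `copyA` at all four sites and `copyB` at only two, so nearly all posterior mass goes to `copyA`. A read that overlaps no informative PSVs would receive equal posteriors and a low quality, which is the situation where a conventional aligner cannot separate the copies.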

Year:  2020        PMID: 33035301      PMCID: PMC7641771          DOI: 10.1093/nar/gkaa829

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971

