
Resolving multicopy duplications de novo using polyploid phasing.

Mark J Chaisson, Sudipto Mukherjee, Sreeram Kannan, Evan E Eichler.

Abstract

While single-molecule sequencing systems have enabled an unprecedented improvement in the ability to assemble complex regions of the genome, long segmental duplications remain a challenging frontier in assembly. Segmental duplications are both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variation in multicopy long segmental duplications by developing and applying algorithms for polyploid phasing. We develop two algorithms. The first maximizes the likelihood of observing the reads given the underlying haplotypes, using discrete matrix completion. The second is based on correlation clustering and exploits an assumption, often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., the fraction of reads that are clustered correctly. On both performance metrics, our algorithms dominate existing algorithms on more than 93% of the datasets. While discrete matrix completion performs better on likelihood score, the correlation clustering algorithm performs better on reconstruction accuracy because of the stronger regularization inherent in the algorithm. We also show that our correlation clustering algorithm reconstructs on average 7.0 haplotypes in 10-copy duplication datasets, whereas existing algorithms reconstruct fewer than one copy on average.
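
The reconstruction accuracy metric mentioned in the abstract (the fraction of reads clustered correctly) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `reconstruction_accuracy`, the label encoding, and the brute-force search over cluster relabelings are all assumptions made for illustration. Because cluster names produced by a phasing algorithm are arbitrary, the sketch scores the best one-to-one mapping from predicted clusters to true paralogs.

```python
from itertools import permutations


def reconstruction_accuracy(true_labels, predicted_labels):
    """Fraction of reads placed in the correct paralog cluster.

    Hypothetical helper: predicted cluster names are arbitrary, so the
    score is maximized over one-to-one mappings from predicted clusters
    to true paralogs. True labels are assumed never to be None.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0

    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))

    # If there are more predicted clusters than true paralogs, the extra
    # clusters map to None and their reads count as incorrect.
    padding = [None] * max(0, len(pred_ids) - len(true_ids))
    targets = true_ids + padding

    best = 0
    # Brute force over mappings: only practical for small copy numbers.
    for assignment in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, assignment))
        correct = sum(
            1 for t, p in zip(true_labels, predicted_labels) if mapping[p] == t
        )
        best = max(best, correct)
    return best / len(true_labels)


# Six reads from three paralogs; the last read is placed in the wrong cluster.
true_paralogs = [0, 0, 1, 1, 2, 2]
predicted_clusters = ["b", "b", "a", "a", "c", "a"]
accuracy = reconstruction_accuracy(true_paralogs, predicted_clusters)
print(f"{accuracy:.3f}")  # prints 0.833
```

The exhaustive search grows factorially with the number of clusters, so for higher copy numbers an assignment solver (for example, the Hungarian algorithm) would be the more practical choice.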

Year:  2017        PMID: 28808695      PMCID: PMC5553120          DOI: 10.1007/978-3-319-56970-3_8

Source DB:  PubMed          Journal:  Res Comput Mol Biol


