Hercules: a profile HMM-based hybrid error correction algorithm for long reads.

Can Firtina, Ziv Bar-Joseph, Can Alkan, A Ercument Cicek.

Abstract

The choice between second- and third-generation sequencing platforms involves a trade-off between accuracy and read length. Several types of studies require reads that are both long and accurate. In such cases, researchers often combine both technologies and correct the erroneous long reads using the short reads. Current approaches rely on various graph- or alignment-based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms and, among those, it achieves the highest accuracy.
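
To make the idea of per-read emission distributions concrete, the following is a minimal sketch and not the Hercules implementation. It assumes substitution errors only (no insertion or deletion states and no transition learning), ungapped short-read alignments given as hypothetical `(start, sequence)` pairs, and illustrative accuracy values (`match_prob`, `short_read_acc`) that do not come from the paper. Each long-read position gets one match state whose prior favours the observed base; aligned short-read bases update that prior into a posterior, and the corrected read is the most probable base per state.

```python
BASES = "ACGT"


def init_emissions(long_read, match_prob=0.85):
    """One match state per long-read position.

    The prior puts `match_prob` (an assumed, illustrative value) on the
    observed base and splits the remainder evenly over the other bases.
    """
    other = (1.0 - match_prob) / 3
    return [{b: (match_prob if b == base else other) for b in BASES}
            for base in long_read]


def update_emissions(emissions, alignments, short_read_acc=0.99):
    """Bayesian update of each state's emission distribution.

    `alignments` is a list of (start, sequence) pairs: hypothetical,
    ungapped placements of short reads on the long read.
    """
    posterior = [dict(state) for state in emissions]
    miss = (1.0 - short_read_acc) / 3
    for start, seq in alignments:
        for offset, observed in enumerate(seq):
            pos = start + offset
            if not 0 <= pos < len(posterior):
                continue  # ignore bases hanging off either end
            for b in BASES:
                posterior[pos][b] *= short_read_acc if b == observed else miss
    # Normalise so every state's emissions sum to 1.
    for state in posterior:
        total = sum(state.values())
        for b in BASES:
            state[b] /= total
    return posterior


def decode(posterior):
    """Emit the most probable base for every state."""
    return "".join(max(BASES, key=lambda b: state[b]) for state in posterior)


def correct(long_read, alignments):
    return decode(update_emissions(init_emissions(long_read), alignments))


if __name__ == "__main__":
    # Position 4 of the long read is a substitution error (T instead of A).
    print(correct("ACGTTCGA", [(2, "GTAC"), (3, "TACG")]))  # ACGTACGA
```

Positions with no short-read coverage keep the long read's base, because the prior is left unchanged there. A full profile HMM would also carry insertion and deletion states and decode the most probable state path rather than taking an independent maximum at each position.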

Year:  2018        PMID: 30124947      PMCID: PMC6265270          DOI: 10.1093/nar/gky724

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


