
Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment.

Benedict Paten, Javier Herrero, Kathryn Beal, Ewan Birney.

Abstract

MOTIVATION: Multiple sequence alignment is a cornerstone of comparative genomics. Much work has been done to improve methods for this task, particularly for the alignment of short sequences, and especially of amino acid sequences. However, less work has been done on making methods that perform well at small scale practical for aligning much larger genomic sequences.
RESULTS: We take the method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In doing so we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment' because it allows us to compute an alignment iteratively by passing over the input sequences from left to right. As a result, the memory consumption of the program is greatly reduced relative to a naive implementation. The engineering challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We also show the strong performance of the resulting program, Pecan, using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs and has been, and continues to be, used by a number of whole-genome comparative genomics projects.
AVAILABILITY: The Pecan program is freely available at http://www.ebi.ac.uk/~bjp/pecan/. Pecan whole-genome alignments can be found in the Ensembl genome browser.
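The abstract names probabilistic consistency alignment but does not define it. As a rough illustration only, the sketch below shows the generic probabilistic consistency transformation popularized by ProbCons, in which each pairwise match-posterior matrix is re-estimated through every sequence z in the set S: P'_xy = (1/|S|) Σ_z P_xz P_zy, with P_xx taken as the identity. It is not Pecan's code. It assumes dense posterior matrices are already available, and the helper names (`consistency_transform`, `matmul`, `transpose`, `identity`) are hypothetical. Because it keeps a full |x| × |y| matrix for every pair, it also suggests why a naive implementation becomes memory-hungry on genomic-length sequences.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Pair = Tuple[str, str]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (n x m) matrix and an (m x p) matrix."""
    p = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(p)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(lengths: Dict[str, int], post: Dict[Pair, Matrix]) -> Dict[Pair, Matrix]:
    """Hypothetical helper: re-estimate every pairwise posterior matrix through all z,
    P'_xy = (1/|S|) * sum_z P_xz @ P_zy, with P_xx taken as the identity.

    `post` needs one entry per unordered pair; P_yx is derived as transpose(P_xy).
    """
    names = list(lengths)

    def p(x: str, z: str) -> Matrix:
        if x == z:
            return identity(lengths[x])  # a residue is always aligned to itself
        if (x, z) in post:
            return post[(x, z)]
        return transpose(post[(z, x)])

    result: Dict[Pair, Matrix] = {}
    for x in names:
        for y in names:
            if x == y:
                continue
            # Sum the products P_xz @ P_zy over every sequence z, including x and y.
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in names:
                prod = matmul(p(x, z), p(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += prod[i][j]
            result[(x, y)] = [[v / len(names) for v in row] for row in acc]
    return result


if __name__ == "__main__":
    # Toy input: three one-residue sequences whose pairwise posteriors disagree.
    lengths = {"x": 1, "y": 1, "z": 1}
    post = {("x", "y"): [[0.5]], ("x", "z"): [[1.0]], ("z", "y"): [[1.0]]}
    print(consistency_transform(lengths, post)[("x", "y")])
```

In the toy run, the direct posterior P_xy = 0.5 is pulled up to (0.5 + 0.5 + 1.0) / 3 = 2/3, because the route through z strongly supports the x–y match. With only two sequences the transform returns the input unchanged, since both terms of the sum reduce to P_xy.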

Year:  2008        PMID: 19056777     DOI: 10.1093/bioinformatics/btn630

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


30 articles in total (10 listed below):

1.  PSAR-align: improving multiple sequence alignment using probabilistic sampling.

Authors:  Jaebum Kim; Jian Ma
Journal:  Bioinformatics       Date:  2013-11-12       Impact factor: 6.937

2.  Detection of nonneutral substitution rates on mammalian phylogenies.

Authors:  Katherine S Pollard; Melissa J Hubisz; Kate R Rosenbloom; Adam Siepel
Journal:  Genome Res       Date:  2009-10-26       Impact factor: 9.043

3.  Cactus: Algorithms for genome multiple sequence alignment.

Authors:  Benedict Paten; Dent Earl; Ngan Nguyen; Mark Diekhans; Daniel Zerbino; David Haussler
Journal:  Genome Res       Date:  2011-06-10       Impact factor: 9.043

Review 4.  The Genome 10K Project: a way forward.

Authors:  Klaus-Peter Koepfli; Benedict Paten; Stephen J O'Brien
Journal:  Annu Rev Anim Biosci       Date:  2015       Impact factor: 8.923

5.  Population genomic sequencing of Coccidioides fungi reveals recent hybridization and transposon control.

Authors:  Daniel E Neafsey; Bridget M Barker; Thomas J Sharpton; Jason E Stajich; Daniel J Park; Emily Whiston; Chiung-Yu Hung; Cody McMahan; Jared White; Sean Sykes; David Heiman; Sarah Young; Qiandong Zeng; Amr Abouelleil; Lynne Aftuck; Daniel Bessette; Adam Brown; Michael FitzGerald; Annie Lui; J Pendexter Macdonald; Margaret Priest; Marc J Orbach; John N Galgiani; Theo N Kirkland; Garry T Cole; Bruce W Birren; Matthew R Henn; John W Taylor; Steven D Rounsley
Journal:  Genome Res       Date:  2010-06-01       Impact factor: 9.043

6.  PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences.

Authors:  Sayed Mohammad Ebrahim Sahraeian; Byung-Jun Yoon
Journal:  Nucleic Acids Res       Date:  2010-04-22       Impact factor: 16.971

7.  Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.

Authors:  Kishwar Shafin; Trevor Pesout; Ryan Lorig-Roach; Marina Haukness; Hugh E Olsen; Colleen Bosworth; Joel Armstrong; Kristof Tigyi; Nicholas Maurer; Sergey Koren; Fritz J Sedlazeck; Tobias Marschall; Simon Mayes; Vania Costa; Justin M Zook; Kelvin J Liu; Duncan Kilburn; Melanie Sorensen; Katy M Munson; Mitchell R Vollger; Jean Monlong; Erik Garrison; Evan E Eichler; Sofie Salama; David Haussler; Richard E Green; Mark Akeson; Adam Phillippy; Karen H Miga; Paolo Carnevali; Miten Jain; Benedict Paten
Journal:  Nat Biotechnol       Date:  2020-05-04       Impact factor: 54.908

Review 8.  Upcoming challenges for multiple sequence alignment methods in the high-throughput era.

Authors:  Carsten Kemena; Cedric Notredame
Journal:  Bioinformatics       Date:  2009-07-30       Impact factor: 6.937

9.  Ensembl's 10th year.

Authors:  Paul Flicek; Bronwen L Aken; Benoit Ballester; Kathryn Beal; Eugene Bragin; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Julio Fernandez-Banet; Leo Gordon; Stefan Gräf; Syed Haider; Martin Hammond; Kerstin Howe; Andrew Jenkinson; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Gautier Koscielny; Eugene Kulesha; Daniel Lawson; Ian Longden; Tim Massingham; William McLaren; Karine Megy; Bert Overduin; Bethan Pritchard; Daniel Rios; Magali Ruffier; Michael Schuster; Guy Slater; Damian Smedley; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Albert Vilella; Jan Vogel; Simon White; Steven P Wilder; Amonida Zadissa; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; James Smith; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2009-11-11       Impact factor: 16.971

10.  Towards realistic benchmarks for multiple alignments of non-coding sequences.

Authors:  Jaebum Kim; Saurabh Sinha
Journal:  BMC Bioinformatics       Date:  2010-01-26       Impact factor: 3.169
