Literature DB >> 19188606

Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions.

Gregory E Sims1, Se-Ran Jun, Guohong A Wu, Sung-Hou Kim.   

Abstract

For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison-a variation of a text or book comparison method, using word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes compared. The optimum FFP method is applicable for comparing whole genomes or large genomic regions even when there are no common genes with high homology. We outline the method in 3 stages: (i) We first show how the optimal resolution range can be determined with English books which have been transformed into long character strings by removing all punctuation and spaces. (ii) Next, we test the robustness of the optimized FFP method at the nucleotide level, using a mutation model with a wide range of base substitutions and rearrangements. (iii) Finally, to illustrate the utility of the method, phylogenies are reconstructed from concatenated mammalian intronic genomes; the FFP derived intronic genome topologies for each l within the optimal range are all very similar. The topology agrees with the established mammalian phylogeny revealing that intron regions contain a similar level of phylogenic signal as do coding regions.

Entities:  

Mesh:

Year:  2009        PMID: 19188606      PMCID: PMC2634796          DOI: 10.1073/pnas.0813249106

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  12 in total

Review 1.  Alignment-free sequence comparison-a review.

Authors:  Susana Vinga; Jonas Almeida
Journal:  Bioinformatics       Date:  2003-03-01       Impact factor: 6.937

2.  Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.

Authors:  Tiee-Jian Wu; Ying-Hsueh Huang; Lung-An Li
Journal:  Bioinformatics       Date:  2005-09-06       Impact factor: 6.937

3.  The average common substring approach to phylogenomic reconstruction.

Authors:  Igor Ulitsky; David Burstein; Tamir Tuller; Benny Chor
Journal:  J Comput Biol       Date:  2006-03       Impact factor: 1.479

Review 4.  Measuring genome evolution.

Authors:  M A Huynen; P Bork
Journal:  Proc Natl Acad Sci U S A       Date:  1998-05-26       Impact factor: 11.205

5.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

6.  A measure of the similarity of sets of sequences not requiring sequence alignment.

Authors:  B E Blaisdell
Journal:  Proc Natl Acad Sci U S A       Date:  1986-07       Impact factor: 11.205

7.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors:  Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal:  Nature       Date:  2007-06-14       Impact factor: 49.962

8.  Genomics, biogeography, and the diversification of placental mammals.

Authors:  Derek E Wildman; Monica Uddin; Juan C Opazo; Guozhen Liu; Vincent Lefort; Stephane Guindon; Olivier Gascuel; Lawrence I Grossman; Roberto Romero; Morris Goodman
Journal:  Proc Natl Acad Sci U S A       Date:  2007-08-29       Impact factor: 11.205

9.  Confirming the phylogeny of mammals by use of large comparative sequence data sets.

Authors:  Arjun B Prasad; Marc W Allard; Eric D Green
Journal:  Mol Biol Evol       Date:  2008-05-02       Impact factor: 16.240

10.  Pattern-based phylogenetic distance estimation and tree reconstruction.

Authors:  Michael Höhl; Isidore Rigoutsos; Mark A Ragan
Journal:  Evol Bioinform Online       Date:  2007-02-25       Impact factor: 1.625

View more
  126 in total

1.  Normal and compound poisson approximations for pattern occurrences in NGS reads.

Authors:  Zhiyuan Zhai; Gesine Reinert; Kai Song; Michael S Waterman; Yihui Luan; Fengzhu Sun
Journal:  J Comput Biol       Date:  2012-06       Impact factor: 1.479

2.  Previously unknown and highly divergent ssDNA viruses populate the oceans.

Authors:  Jessica M Labonté; Curtis A Suttle
Journal:  ISME J       Date:  2013-07-11       Impact factor: 10.302

Review 3.  Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

Authors:  Oliver Bonham-Carter; Joe Steele; Dhundy Bastola
Journal:  Brief Bioinform       Date:  2013-07-31       Impact factor: 11.622

Review 4.  New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing.

Authors:  Kai Song; Jie Ren; Gesine Reinert; Minghua Deng; Michael S Waterman; Fengzhu Sun
Journal:  Brief Bioinform       Date:  2013-09-23       Impact factor: 11.622

Review 5.  Sequence analysis by iterated maps, a review.

Authors:  Jonas S Almeida
Journal:  Brief Bioinform       Date:  2013-10-25       Impact factor: 11.622

6.  K-mer natural vector and its application to the phylogenetic analysis of genetic sequences.

Authors:  Jia Wen; Raymond H F Chan; Shek-Chung Yau; Rong L He; Stephen S T Yau
Journal:  Gene       Date:  2014-05-22       Impact factor: 3.688

7.  Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences.

Authors:  Chris-Andre Leimeister; Jendrik Schellhorn; Svenja Dörrer; Michael Gerth; Christoph Bleidorn; Burkhard Morgenstern
Journal:  Gigascience       Date:  2019-03-01       Impact factor: 6.524

8.  Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.

Authors:  Se-Ran Jun; Gregory E Sims; Guohong A Wu; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2009-12-14       Impact factor: 11.205

9.  Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method.

Authors:  Guohong Albert Wu; Se-Ran Jun; Gregory E Sims; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2009-06-24       Impact factor: 11.205

10.  Monte Carlo simulation of mechanical unfolding of proteins based on a simple two-state model.

Authors:  William T King; Meihong Su; Guoliang Yang
Journal:  Int J Biol Macromol       Date:  2009-12-23       Impact factor: 6.953

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.