BACKGROUND: Genome sequencing projects increasingly broaden their phylogenetic scope without finishing the sequence of each genome, so a growing number of genomes are published only in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole-genome data on gene order or syntenic block order. How, then, can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes?

RESULTS: Our method optimally fills in genes missing from the scaffolds and incorporates the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome to those merely unsequenced, on the one hand, and the increase in genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.

CONCLUSIONS: The algorithm solves the comparison of genomes with 18,300 genes, including 4,500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.