
Space-efficient whole genome comparisons with Burrows-Wheeler transforms.

Ross A Lippert

Abstract

The starting point for any alignment of mammalian genomes is the computation of exact matches satisfying various criteria. Time-efficient, O(n), data structures for this computation, such as the suffix tree, require O(n log(n)) space, several times the space of the genomes themselves. Thus, any reasonable whole-genome comparative project finds itself requiring tens of Gigabytes of RAM to maintain time-efficiency. This is beyond most modern workstations. With a new data structure, the compressed suffix array (CSA) implemented via the Burrows-Wheeler transform, we can trade time-efficiency for space-efficiency, taking O(n log(n)) time, but running in O(n) space, typically in total space less than or equal to that of the genomes themselves. If space is more expensive than time, this is an appropriate approach to consider. The most space-efficient implementation of this data structure requires 5 bits per nucleotide character to build on-line, in the worst case, and 2.5 bits per character to store once built. We present a description of this data structure and how it is used to obtain matches. An implementation (called bbbwt) is demonstrated by aligning two mammalian genomes on a modest workstation equipped with under 2 GB of free RAM in time superior to that of the implementations of other data structures.
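The exact-match step described in the abstract can be illustrated with a small Python sketch of a Burrows-Wheeler transform and backward search. This is not the bbbwt implementation and makes no attempt at its space bounds: it builds a naive suffix array and stores full, uncompressed occurrence tables, which a compressed suffix array would replace with sampled or succinct structures. The function names `bwt_index` and `backward_search`, the `$` sentinel, and the example sequence are assumptions made for illustration only.

```python
def bwt_index(text):
    """Build a toy BWT index for `text`. Assumes '$' does not occur in `text`."""
    s = text + "$"  # sentinel, sorts before A, C, G, T in ASCII
    # Naive suffix array by sorting suffixes; fine for a demo, not for genomes.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))
    # C[c] = number of characters in s that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i] (uncompressed table).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, bwt, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of `pattern`."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, bwt, C, occ = bwt_index("ACGTACGTTACG")
    print(backward_search("ACG", sa, C, occ))
```

Each step of the search narrows a range of suffix-array rows using only the counts table and the occurrence table, which is why an index of this family can answer exact-match queries without keeping a suffix tree in memory.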

Year:  2005        PMID: 15882139     DOI: 10.1089/cmb.2005.12.407

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  10 in total

1.  Short Read Mapping: An Algorithmic Tour.

Authors:  Stefan Canzar; Steven L Salzberg
Journal:  Proc IEEE Inst Electr Electron Eng       Date:  2015-09-07       Impact factor: 10.961

2.  Aging Human Hematopoietic Stem Cells Manifest Profound Epigenetic Reprogramming of Enhancers That May Predispose to Leukemia.

Authors:  Hsuan-Ting Huang; Alejandro Roisman; Emmalee R Adelman; André Olsson; Antonio Colaprico; Tingting Qin; R Coleman Lindsley; Rafael Bejar; Nathan Salomonis; H Leighton Grimes; Maria E Figueroa
Journal:  Cancer Discov       Date:  2019-05-13       Impact factor: 39.397

3.  kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph.

Authors:  Ze-Gang Wei; Xing-Guo Fan; Hao Zhang; Xiao-Dan Zhang; Fei Liu; Yu Qian; Shao-Wu Zhang
Journal:  Front Genet       Date:  2022-05-05       Impact factor: 4.772

4.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors:  Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-03-04       Impact factor: 13.583

5.  Towards computational improvement of DNA database indexing and short DNA query searching.

Authors:  Done Stojanov; Sašo Koceski; Aleksandra Mileva; Nataša Koceska; Cveta Martinovska Bande
Journal:  Biotechnol Biotechnol Equip       Date:  2014-10-31       Impact factor: 1.632

6.  An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology. (Review)

Authors:  Jian Li; Aarif Mohamed Nazeer Batcha; Björn Grüning; Ulrich R Mansmann
Journal:  Cancer Inform       Date:  2016-04-10

7.  CHOP: haplotype-aware path indexing in population graphs.

Authors:  Tom Mokveld; Jasper Linthorst; Zaid Al-Ars; Henne Holstege; Marcel Reinders
Journal:  Genome Biol       Date:  2020-03-11       Impact factor: 13.583

8.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

9.  Entropic Profiler - detection of conservation in genomes using information theory.

Authors:  Francisco Fernandes; Ana T Freitas; Jonas S Almeida; Susana Vinga
Journal:  BMC Res Notes       Date:  2009-05-05

10.  GSAlign: an efficient sequence alignment tool for intra-species genomes.

Authors:  Hsin-Nan Lin; Wen-Lian Hsu
Journal:  BMC Genomics       Date:  2020-02-24       Impact factor: 3.969

